Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process kafka requests in a separate scheduling group #24973

Open
wants to merge 4 commits into
base: dev
Choose a base branch
from

Conversation

mmaslankaprv
Copy link
Member

@mmaslankaprv mmaslankaprv commented Jan 29, 2025

Added code that allows using separate scheduling group for Kafka
requests processing. The scheduling group can be switched based on the
API type. By default processing is done in the group returned by
kafka::server::get_request_handler_sg

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

Added property allowing us to control if `kafka` scheduling group should
be used to handle parsing and handling Kafka API requests.

Signed-off-by: Michał Maślanka <[email protected]>
Added code that allows using separate scheduling group for Kafka
requests processing. The scheduling group can be switched based on the
API type. By default processing is done in the group returned by
`kafka::server::get_request_handler_sg`

Signed-off-by: Michał Maślanka <[email protected]>
@mmaslankaprv mmaslankaprv requested a review from a team as a code owner January 29, 2025 13:15
@vbotbuildovich
Copy link
Collaborator

CI test results

test results on build#61350
test_id test_kind job_url test_status passed
rptest.tests.cloud_storage_timing_stress_test.CloudStorageTimingStressTest.test_cloud_storage.cleanup_policy=compact.delete ducktape https://buildkite.com/redpanda/redpanda/builds/61350#0194b27d-3fa4-4236-9700-4026364e9e88 FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61350#0194b27d-3fa6-4560-9166-e42bc00c4ad4 FLAKY 1/3
rptest.tests.datalake.simple_connect_test.RedpandaConnectIcebergTest.test_translating_avro_serialized_records.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/61350#0194b27d-3fa4-4236-9700-4026364e9e88 FLAKY 1/2
storage_single_thread_rpunit.storage_single_thread_rpunit unit https://buildkite.com/redpanda/redpanda/builds/61350#0194b234-9348-4a86-8544-18742a9bcac9 FLAKY 1/2

@@ -682,6 +684,10 @@ connection_context::dispatch_method_once(request_header hdr, size_t size) {
? std::make_optional<ss::sstring>(*hdr.client_id)
: std::nullopt,
};

co_await ss::coroutine::switch_to(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@travisdowns related to the discussion from yesterday, does this even allocate the coroutine frame if await_ready returns true (because we are already in the right group - ignoring for a second whether we actually are here or not)? I thought not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which coroutine do you refer to? In any case there shouldn't be a coroutine frame allocated per switch_to, whether the switch happens or not. What await_ready returning true does is avoids an unnecessary task switch (which is probably more expensive even than a coro frame in "micro" costs, and certainly in macro costs as it breaks out of the current executing request, etc).

coordinator_ntp_mapper& server::coordinator_mapper() {
return _group_router.local().coordinator_mapper().local();
}

ss::scheduling_group server::get_request_scheduling_group(api_key key) const {
if (key == produce_request::api_type::key) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this kind of stuff usually we want to add a handler method I think, rather than having the kafka server know anything about the different keys. E.g., a get_request_sg() method on the generic handler which returns the group (or none/default) to use?

Is the problem that the handlers shouldn't access config?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, i will change that

/**
* Scheduling group to process requests received via the REST API of
* admin server.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love to see these comments, thanks!

Copy link
Member

@travisdowns travisdowns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good pending moving the SG definition into the handler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants