Fetch partitions in parallel #683
Hi, thanks for your response. My understanding of this refactoring is: the consumer will process batches immediately after the response to each fetch request. So, if I've understood correctly, a slow task on one partition will block new fetches for all other partitions. I tried the test with the new beta version, and all partitions stay blocked by the slower partition's task. Test code updated with the beta KafkaJS: https://github.com/hmagarotto/kafkajs-test-partition-parallelism A similar need in another library/language: akka/alpakka-kafka#110
Your understanding is correct, @hmagarotto. Currently the fetch loop will wait until all partitions have been processed before issuing new fetch requests. I've been having the same discussion on our Slack today, and I think it makes a lot of sense to start the next fetch for a broker as soon as we have processed the last response from it, to avoid having to wait for the slowest one.
@Nevon, we recently tried a beta version of KafkaJS and noticed that another fetch started before the currently processing fetch completed. The problem we encountered was that the second fetch received the same message/offset from the same topic/partition that was still being processed by the yet-to-finish fetch. This resulted in the same message being processed twice "concurrently", within <50 ms of each other. This seems like a bug; should I raise a bug report against the beta version?

There are a number of concerns around starting the next fetch before the previous one finishes: for example, preventing the behaviour above (duplicate concurrent processing of the same message), or guaranteeing FIFO processing for each topic/partition a consumer is assigned. If this feature is added, I believe you would need to ensure the second fetch does not fetch from topic/partitions that are still being processed by the current unfinished fetch. This seems like it could be difficult to ensure.
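The duplicate-delivery hazard described above can be sketched with a toy model in plain JavaScript (illustrative names only, not KafkaJS internals): if a second fetch for a partition is issued before the in-flight batch has been processed and its offset advanced, both fetches start at the same offset and return the same records.

```javascript
// Toy single-partition log and fetch (not KafkaJS code).
const log = ['m0', 'm1', 'm2']; // records in one partition
const fetchFrom = (offset) => ({ offset, records: log.slice(offset) });

let processedOffset = 0;       // not advanced until the batch completes
const inFlight = new Set([0]); // partition 0's batch is still processing

// A second fetch issued before the first batch finishes starts at the
// same offset, so the same records are delivered twice:
const first = fetchFrom(processedOffset);
const second = fetchFrom(processedOffset);
console.log(second.records[0] === first.records[0]); // true -> duplicate

// The guard proposed above: only refetch partitions with no batch in flight.
const partitionsToFetch = [0].filter((p) => !inFlight.has(p));
console.log(partitionsToFetch.length); // 0 -> no premature refetch
```

This also illustrates why the guard is awkward to implement: the fetch request builder has to know, per partition, whether processing of the previous batch has completed.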
Please do. Especially if you can produce a failing test or at least a repro case for this. To be clear, we would never start a fetch for a partition for which we haven't processed all messages in the current batch yet, as partitions are the unit of concurrency in Kafka. However, there's theoretically nothing stopping us from issuing a new fetch request for partition 1 just because we haven't finished processing the messages for partition 2 yet.
Hi all - are there any plans to work on this? Regarding workarounds:
Workaround 2 doesn't work for me. If I put each consumer in its own consumer group then it works (although I prefer workaround 1).
Check whether you subscribed to the correct topics in all your consumers.
I ran into this bug. With 1 Kafka broker I get messages pretty much instantly (minBytes is 1); with 2 brokers it hits the maxWaitTimeInMs.
I'm also hitting this bug. Does this need repro steps, or anything else I can do to prioritize fixing this?
Steps to reproduce would be useful. I'm especially interested in how the messages get produced to the topic. E.g. are there multiple producers? How many messages are produced at a time?
I'll try to work on a demo video and source code, but just to confirm: in my case, there is one producer (one Kafka Connect task/job) that sends to any number of partitions. The problem only happens if there are 2+ Kafka brokers, and it works correctly with 1 Kafka broker.
Maybe the example added when the issue was opened is still valid. I've added a repository with a simple example to reproduce the issue (with docker-compose): https://github.com/hmagarotto/kafkajs-test-partition-parallelism This example shows that the fetch of the partitions is synchronous, so when the processing of one partition is slow, it impacts the consumption of all other partitions. It happens even on a single broker.
Actually, now testing with 1.15.0 and 1.16.0-beta.21, I can reproduce the bug (or maybe a different bug?) with a single-node Kafka cluster. The code is pretty much the same as the one in the OP's repro repo; for clarity, I'm pasting it here. I will upload videos shortly.
Uploaded a repro video, in case there's any issue with getting the same results elsewhere: https://www.youtube.com/watch?v=4qr64l3Fg-8 (Please note it might take a few hours until YouTube processes the higher-resolution versions; until then it's in unreadable 360p.) To sum up the video:
This isn't a bug; it's the expected behaviour of the current implementation. The partitionsConsumedConcurrently feature processes partitions concurrently within the set of batches retrieved by each fetch loop iteration, but all processing still needs to complete before the next fetch request is issued. A feature request for a 'more concurrent' consumer makes complete sense, but it's a fairly substantial piece of work.
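The semantics described above can be modelled in a few lines of plain JavaScript (a sketch of the behaviour, not KafkaJS source): one fetch delivers a batch per partition, a limited pool of workers processes them, and the next fetch is only issued once every batch is done, so the slowest partition gates everyone.

```javascript
// Toy model of the current semantics (not KafkaJS source): `concurrency`
// workers drain the batches from one fetch; the next fetch only happens
// after ALL of them are processed.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const events = [];

async function oneFetchIteration(batches, concurrency) {
  const queue = [...batches];
  const worker = async () => {
    while (queue.length > 0) {
      const batch = queue.shift();
      await sleep(batch.processingMs); // simulate eachBatch/eachMessage work
      events.push(`done:p${batch.partition}`);
    }
  };
  // Barrier: wait for every partition before fetching again.
  await Promise.all(Array.from({ length: concurrency }, worker));
  events.push('next-fetch');
}

const done = oneFetchIteration(
  [
    { partition: 0, processingMs: 200 }, // slow partition
    { partition: 1, processingMs: 10 },
    { partition: 2, processingMs: 10 },
  ],
  3 // like partitionsConsumedConcurrently: 3
);
// Partitions 1 and 2 finish early, but 'next-fetch' still waits for
// partition 0, so it is always the last event recorded.
```

Note that raising the concurrency does not help here: the barrier after `Promise.all` is what delays the next fetch, which is exactly the limitation being discussed.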
It baffles me how you can claim this is "expected behaviour". Furthermore, this clearly goes against the Kafka architecture, where individual partitions are processed concurrently, regardless of how many clients they are spread across. If I have 20 partitions and one Node.js process, I expect it to process 20 partitions concurrently by default.
* split runner into partitionsConsumedConcurrently promises
* fetch batches independently in each concurrent task
* clean up old barrier and concurrency logic
tulios#683 Improve concurrency
Per #1258 (comment), is this resolved?
There's always more that could be done, but I would consider it resolved by #1258. The current state is that essentially each broker gets its own fetch loop, so there's no dependency between the processing of partitions fetched from different brokers. If you are fetching partitions 1 and 2 from the same broker, you'll still wait for both partitions to finish processing before fetching the next batches for those partitions, but that won't prevent fetching new data from other partitions.

Increasing parallelism further would increase the chattiness towards the brokers. For example, say you finished processing partition 1 but are still working on partition 2; you could then make a new fetch request for just partition 1. But that would mean multiplying the number of fetch requests by the number of subscribed partitions, which is not how the API is meant to be used. It might be possible to come up with a design that uses some kind of buffer of fetched-but-unprocessed batches and cleverly refills the buffers in a way that doesn't require making one fetch request per partition, but it seems wildly complex, and I'm not sure it would actually yield much better results in the real world.
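The per-broker design described above can be sketched with another toy model (again plain JavaScript, not actual KafkaJS code): each broker runs its own independent fetch loop, so a slow partition only delays the partitions fetched from the same broker.

```javascript
// Toy model of the post-#1258 design (not actual KafkaJS code): one
// independent fetch loop per broker. Partitions on the same broker still
// wait for each other, but brokers no longer wait for one another.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const finishOrder = [];

async function brokerFetchLoop(broker, partitionProcessingMs, iterations) {
  for (let i = 0; i < iterations; i++) {
    // One fetch per iteration; all of this broker's partitions must be
    // processed before its next fetch (the per-broker barrier).
    await Promise.all(partitionProcessingMs.map((ms) => sleep(ms)));
  }
  finishOrder.push(broker);
}

const done = Promise.all([
  brokerFetchLoop('broker-A', [150, 5], 2), // has a slow partition
  brokerFetchLoop('broker-B', [5], 10),     // fast partitions only
]);
// broker-B completes all 10 of its fetch iterations while broker-A is
// still stuck behind its slow partition, so broker-B finishes first.
```

This is why the maintainer considers the issue resolved for the cross-broker case while the same-broker case still shares one barrier per fetch loop.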
When the processing of one partition is slow, all partitions are affected.
This is because the fetch for the other partitions is only performed after processing completes for all partitions.
The fetch for all topics and partitions is synchronous.
So it's not possible to use partitions as the unit of parallelism.
Can we deal with this using some setting, or with a different strategy?
Test with a slow process (exaggerated) on partition 0:
https://github.com/hmagarotto/kafkajs-test-partition-parallelism