[Bug Report]: RetryForever stuck after partitions rebalance #151
Comments
Hello @SonicGD. The issue lies in the partition-revocation handling code. The service responsible for pausing/resuming the consumer is being stopped before the retry loop has a chance to resume the consumer, so nothing actually resumes it. I've opened an issue there where you can read about it in a bit more detail, and I've also opened a PR with a fix.
Also, if you'd like to see the problem manifesting in your repo, change your handler to the following code:
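(The original snippet didn't survive in this capture of the thread; below is a minimal sketch of such a handler, assuming KafkaFlow's `IMessageHandler<T>`, `IConsumerAccessor.GetConsumer`, `IConsumerContext.ConsumerName`, and `IMessageConsumer.PausedPartitions`. `TestMessage` and the handler name are placeholders.)

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using KafkaFlow;
using KafkaFlow.Consumers;

public class TestMessageHandler : IMessageHandler<TestMessage>
{
    private readonly IConsumerAccessor consumerAccessor;

    public TestMessageHandler(IConsumerAccessor consumerAccessor)
    {
        this.consumerAccessor = consumerAccessor;
    }

    public Task Handle(IMessageContext context, TestMessage message)
    {
        var consumer = this.consumerAccessor.GetConsumer(context.ConsumerContext.ConsumerName);

        // Log how many partitions the consumer currently has paused.
        // This prints 0 right before the last retry, once the internal list
        // of paused topic+partitions has been cleared by the rebalance.
        Console.WriteLine($"Paused partitions: {consumer.PausedPartitions.Count()}");

        // Keep failing so the message stays in the RetryForever loop.
        throw new Exception("Simulated handler failure");
    }
}
```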
It prints the number of paused partitions. You will see it print 0 right before the last retry of the loop; that is when the internal list of topic+partitions was cleared, and why the resume doesn't work.
Thank you so much! This issue caused us many problems. Our current fix is a custom retry middleware that stores the list of paused partitions, plus an assignment handler that restarts the consumer if those paused partitions are assigned again. We'll wait for your fix to be merged :)
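(A very rough sketch of that workaround is below. It is not our exact code: it only assumes KafkaFlow's `IMessageMiddleware`/`MiddlewareDelegate` contract and Confluent.Kafka's `TopicPartition`; the real pause/resume calls and the consumer restart performed from the partitions-assigned handler are omitted.)

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Confluent.Kafka;
using KafkaFlow;

public class TrackingRetryForeverMiddleware : IMessageMiddleware
{
    // Topic/partitions that currently have a message stuck in the retry loop.
    private static readonly ConcurrentDictionary<(string Topic, int Partition), byte> Retrying = new();

    public async Task Invoke(IMessageContext context, MiddlewareDelegate next)
    {
        var key = (context.ConsumerContext.Topic, context.ConsumerContext.Partition);

        while (true)
        {
            try
            {
                Retrying[key] = 0;
                await next(context);            // run the rest of the pipeline (the handler)
                Retrying.TryRemove(key, out _);
                return;
            }
            catch (Exception)
            {
                await Task.Delay(TimeSpan.FromSeconds(5));  // back off before the next attempt
            }
        }
    }

    // Called from the consumer's partitions-assigned handler after a rebalance:
    // if any partition we were retrying is assigned back, the handler restarts
    // the consumer (via IConsumerAccessor; omitted here).
    public static bool ShouldRestartConsumer(IEnumerable<TopicPartition> assignedPartitions) =>
        assignedPartitions.Any(tp => Retrying.ContainsKey((tp.Topic, tp.Partition.Value)));
}
```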
Description
Hello. We are using RetryForeverMiddleware in our project, and we ran into this strange behavior: if a partition rebalance happens (another worker joins/leaves the group) while a message is in the retry loop, the loop stops and processing of that partition does not continue.
I'm not sure this is a retry middleware problem; maybe it is caused by KafkaFlow/Confluent.Kafka/librdkafka. But let's start here =)
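For context, the consumer pipeline is configured roughly as in the KafkaFlow Retry Extensions samples. The sketch below is illustrative, not our exact setup: broker, topic, group, delays, and the `TestMessageHandler` type are placeholders, and builder method names/namespaces can differ slightly between KafkaFlow versions.

```csharp
using System;
using KafkaFlow;
using KafkaFlow.Retry;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

services.AddKafka(kafka => kafka
    .AddCluster(cluster => cluster
        .WithBrokers(new[] { "localhost:9092" })
        .AddConsumer(consumer => consumer
            .Topic("test-topic")
            .WithGroupId("test-group")
            .WithBufferSize(100)
            .WithWorkersCount(1)
            .AddMiddlewares(middlewares => middlewares
                // deserialization middleware omitted for brevity
                .RetryForever(config => config
                    .Handle<Exception>()                            // retry on any handler exception
                    .WithTimeBetweenTriesPlan(TimeSpan.FromSeconds(1)))
                .AddTypedHandlers(handlers => handlers
                    .AddHandler<TestMessageHandler>())))));
```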
Steps to reproduce
Use the `docker-compose.yml` from the sample repo to start Kafka and ZooKeeper, then run the built sample from the `bin/Debug/net7.0` directory.
After the rebalance, nothing happens. We can see that there is lag for the partition, but there are no attempts to continue processing.
If we restart the stuck consumer, processing begins again.
Expected behavior
After the rebalance is complete, the consumer should start processing the "bad" message again.
Actual behavior
The consumer is stuck and processing is stopped.
KafkaFlow Retry Extensions version
3.0.1