Invalid Offsets Committed during a rebalance on shutdown #3710
Comments
Checking to see if there's anything more needed to help get to the bottom of this? This is happening daily in our environment.
I believe this is the same issue as fixed by this PR:
Awesome @edenhill thanks for the update :) Is there any ETA on getting that PR merged? Asking as it’s causing serious issues in our environment.
It will be merged this week. The v1.9.0 release should be out in 2-3 weeks. You will probably need to change your application anyway to handle the new error returned from offsets_store(), and perhaps not even attempt offsets_store() for partitions that are no longer assigned - and you could do this now as a workaround on the current version.
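For anyone wanting to try the workaround described above, here is a minimal sketch of the idea in Go: check the current assignment first, store offsets only for partitions that are still assigned, and surface the store error instead of ignoring it. Everything named here (`TopicPartition`, `OffsetStore`, `StoreAssignedOnly`, `fakeConsumer`) is a hypothetical, simplified stand-in written for illustration, not the client's actual API; the real consumer's types and method signatures differ and should be checked against the client documentation. The assignment can also still change between the check and the store, so the error path has to be handled either way.

```go
package main

import (
	"errors"
	"fmt"
)

// TopicPartition is a simplified, hypothetical stand-in for the client's
// topic-partition type; only the fields needed here are modeled.
type TopicPartition struct {
	Topic     string
	Partition int32
	Offset    int64
}

// OffsetStore is a hypothetical interface covering the two consumer
// operations the workaround needs. The real client's signatures may differ.
type OffsetStore interface {
	Assignment() ([]TopicPartition, error)
	StoreOffsets(offsets []TopicPartition) error
}

// tpKey identifies a partition independently of its offset.
type tpKey struct {
	topic     string
	partition int32
}

// StoreAssignedOnly stores offsets only for partitions that are in the
// current assignment. It returns the offsets it skipped because their
// partitions are no longer assigned.
func StoreAssignedOnly(c OffsetStore, offsets []TopicPartition) ([]TopicPartition, error) {
	assigned, err := c.Assignment()
	if err != nil {
		return nil, fmt.Errorf("reading assignment: %w", err)
	}
	owned := make(map[tpKey]bool, len(assigned))
	for _, tp := range assigned {
		owned[tpKey{tp.Topic, tp.Partition}] = true
	}

	var toStore, skipped []TopicPartition
	for _, tp := range offsets {
		if owned[tpKey{tp.Topic, tp.Partition}] {
			toStore = append(toStore, tp)
		} else {
			skipped = append(skipped, tp)
		}
	}
	if len(toStore) == 0 {
		return skipped, nil
	}
	// The assignment may have changed since the check above, so the
	// store can still fail; report the error to the caller.
	if err := c.StoreOffsets(toStore); err != nil {
		return skipped, fmt.Errorf("storing offsets: %w", err)
	}
	return skipped, nil
}

// fakeConsumer is an in-memory test double. It rejects stores for
// unassigned partitions, which is an assumption about the new behavior.
type fakeConsumer struct {
	assigned []TopicPartition
	stored   []TopicPartition
}

func (f *fakeConsumer) Assignment() ([]TopicPartition, error) {
	return f.assigned, nil
}

func (f *fakeConsumer) StoreOffsets(offsets []TopicPartition) error {
	for _, tp := range offsets {
		ok := false
		for _, a := range f.assigned {
			if a.Topic == tp.Topic && a.Partition == tp.Partition {
				ok = true
			}
		}
		if !ok {
			return errors.New("partition not assigned")
		}
	}
	f.stored = append(f.stored, offsets...)
	return nil
}

func main() {
	c := &fakeConsumer{assigned: []TopicPartition{{Topic: "events", Partition: 0}}}
	skipped, err := StoreAssignedOnly(c, []TopicPartition{
		{Topic: "events", Partition: 0, Offset: 101},
		{Topic: "events", Partition: 1, Offset: 55}, // no longer assigned
	})
	fmt.Println(len(c.stored), len(skipped), err)
}
```

In a real application the skipped offsets would typically just be logged and dropped, since whichever consumer now owns the partition is responsible for committing its progress.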
Ah so the confluent-kafka-go lib still needs to be updated to handle this new error 👍 TY for the quick response.
The go client will not need updating, but your application will.
Maybe that's a separate issue in the confluent-kafka-go library then, because we see it most often when pushing a new update to our application, which does a rolling restart. The confluent-kafka-go client, on [...]
Good to know that with the PR the other application will not be able to cause the issue, but I'm curious how the other application could pick up the currently assigned partition.
The fix was merged and released in v1.9.0. Closing the issue. Please raise a new issue if you face any problems.
Description
Invalid offsets are committed for a partition when the partition has been `postponed` while a rebalance is in progress while using the `EAGER` load balancing strategy.
This is causing a single partition's offsets to be either committed out of range or committed back in time to a previous offset. This is causing our consumers to re-process millions of messages that were already processed.
We are using kafka-client-go version 1.8.2, for reference.
See details below about `EAGER` vs `COOPERATIVE` rebalance strategies. We switched to using `COOPERATIVE` and it happens far less than with `EAGER`; however, it still happens daily.
How to reproduce
When a consumer in a consumer group shuts down, it sometimes causes a chain of events that commits incorrect offsets for a partition.
Checklist
Please provide the following information:
librdkafka version: v1.8.2
Apache Kafka version: 2.6.1
librdkafka client configuration:
Operating system: Debian Linux (x64)
Provide logs (with `debug=..` as necessary) from librdkafka
Provide broker log excerpts
Critical issue
The first line shows that the correct offsets were fetched, but the old version's offset (-1) is what was stored.
The stored offset `411018846` is what was committed instead of `411193513`, which was the previously committed offset.
Full debug logs from a single consumer