Burrow stops emitting metric after Kafka upgrade #827

Open
arushi315 opened this issue Aug 8, 2024 · 2 comments

Comments

@arushi315

Burrow Version: 1.8.0

Issue:
After upgrading Kafka from version 3.6.x to 3.7.x, we observed that the Burrow service stopped emitting the consumer lag metric. Restarting the Burrow service temporarily resolved the issue.

Logs:
The following warnings and errors were observed in the Burrow logs:

{"level":"error","ts":1720927697.84136,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":7}
.......
{"level":"warn","ts":1720927137.8406005,"msg":"error in OffsetResponse","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"kafka server: Tried to send a message to a replica that is not the leader for some partition. Your metadata is out of date","broker":3,"topic":"kafka-connect-offsets.internal","partition":4}

The Kafka upgrade was performed in a rolling fashion, one broker at a time. We expected communication disruptions with the broker being upgraded, but the other brokers should have remained available.

Burrow Configuration:
Here is the configuration we are using:

[client-profile.profile]
kafka-version = "3.6.1"

[cluster.local-cluster]
client-profile = "profile"
class-name = "kafka"
topic-refresh = 60
offset-refresh = 10
groups-reaper-refresh = 10

[consumer.local-kafka]
class-name = "kafka"
cluster = "local-cluster"

[consumer.local-kafka-zk]
class-name = "kafka_zk"
cluster = "local-cluster"

[httpserver.default]
address = "{{ $http_address }}"

[logging]
level = "{{ $log_level }}"

Note: kafka-version is set to 3.6.1, but as mentioned above, Burrow works fine with Kafka 3.7.x after a restart, so this setting does not seem to be the root cause.

Request:

  • Has anyone encountered a similar issue, particularly after upgrading Kafka?
  • Are there any recommendations on how to troubleshoot this further?
  • Does anything in our configuration appear to be problematic?

Please let me know if additional information is required.
Thank you!

@arushi315
Author

It looks like the issue is intermittent: I am not able to reproduce it when upgrading Kafka.
During the upgrade the metric does stop for a short while, but once the upgrade has completed it starts showing up again without a Burrow restart.
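
To tell whether Burrow itself has stopped evaluating a group, or only the metric pipeline downstream of it, one option is to poll Burrow's HTTP API while the upgrade is running. The sketch below assumes the v3 consumer lag endpoint (`/v3/kafka/{cluster}/consumer/{group}/lag`) and the `status` and `totallag` response fields as described in the Burrow wiki. The base URL and the group name `my-group` are placeholders, not values from this setup.

```python
import json
import urllib.request


def lag_url(base_url, cluster, group):
    """Build the (assumed) Burrow v3 lag endpoint URL for one consumer group."""
    return f"{base_url.rstrip('/')}/v3/kafka/{cluster}/consumer/{group}/lag"


def summarize_status(payload):
    """Reduce a Burrow lag response to one line for quick eyeballing."""
    if payload.get("error"):
        return f"error: {payload.get('message')}"
    status = payload.get("status", {})
    return (
        f"{status.get('group')}: "
        f"status={status.get('status')} totallag={status.get('totallag')}"
    )


def fetch_status(base_url, cluster, group, timeout=5.0):
    """Fetch and decode the lag response; raises on connection errors."""
    url = lag_url(base_url, cluster, group)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example use (placeholder address and group name):
#   payload = fetch_status("http://localhost:8000", "local-cluster", "my-group")
#   print(summarize_status(payload))
```

If this keeps returning a current status while the metric is missing, the problem is more likely on the exporting side than in Burrow's offset fetching.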

@arushi315
Author

For the Kafka cluster where we originally noticed the issue, we have 9 brokers and observed EOF errors from all 9 brokers:

{"level":"error","ts":1720926917.8395112,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":1}
{"level":"error","ts":1720927037.8437417,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":2}
{"level":"error","ts":1720927147.84239,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":3}
{"level":"error","ts":1720927257.8391266,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":4}
{"level":"error","ts":1720927377.8412945,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":5}
{"level":"error","ts":1720927507.8451424,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":6}
{"level":"error","ts":1720927697.84136,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":7}
{"level":"error","ts":1720927887.8561947,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":8}
{"level":"error","ts":1720928077.8389344,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":9}
....
{"level":"error","ts":1720928077.8438833,"msg":"failed to get the list of available consumer groups","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","error":"dial tcp 10.104.7.186:9092: connect: connection refused"}
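
The EOF errors above arrive one broker at a time, roughly 110 to 190 seconds apart, which matches the pace of a rolling restart. A minimal sketch for pulling that timeline out of the Burrow log follows. The function name `summarize_eof` is made up for illustration, and the three sample lines are copied from the log excerpt above.

```python
import json
from datetime import datetime, timezone

# First three EOF lines from the excerpt above, copied verbatim.
LOG_LINES = [
    '{"level":"error","ts":1720926917.8395112,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":1}',
    '{"level":"error","ts":1720927037.8437417,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":2}',
    '{"level":"error","ts":1720927147.84239,"msg":"failed to fetch offsets from broker","type":"module","coordinator":"cluster","class":"kafka","name":"local-cluster","sarama_error":"EOF","broker":3}',
]


def summarize_eof(lines):
    """Return (broker, UTC time, seconds since previous EOF) for each EOF error."""
    events = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip elided or non-JSON lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("sarama_error") == "EOF" and "broker" in entry:
            events.append((entry["ts"], entry["broker"]))

    events.sort()  # order by timestamp
    summary = []
    previous_ts = None
    for ts, broker in events:
        when = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%H:%M:%S")
        gap = None if previous_ts is None else round(ts - previous_ts)
        summary.append((broker, when, gap))
        previous_ts = ts
    return summary


for broker, when, gap in summarize_eof(LOG_LINES):
    print(f"broker {broker}: EOF at {when} UTC, gap {gap}s")
```

Lining these times up against the broker restart times from the upgrade should show whether each EOF is simply the connection to the restarting broker being closed.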
