Hi @andreas-schroeder,
Thanks for building this check! I have some questions about behavior I have been seeing while using this tool, and I want to make sure I am using it properly.
The replication factor and partition count are both set to 1 on the broker-replication-check topic. This doesn't seem correct, as the check wouldn't be able to determine whether all brokers are in the ISR.
I recently encountered a Kafka outage. After I recovered the cluster, I had to restart the Kafka health check service on ALL of my brokers before it would detect that the brokers were healthy again. This is most likely due to losing the connection to the cluster/ZooKeeper. However, in one of my environments where I am having trouble getting kafka-health-check to report a healthy cluster, I can see that it does make reconnect attempts:
INFO[0037] closing connection and reconnecting
INFO[0042] found partition id 1 for broker 0 in topic "broker-0-health-check"
INFO[0042] found partition id 2 for broker 0 in topic "broker-replication-check"
INFO[0042] reconnected
I am still unable to figure out why kafka-health-check will not report green on this cluster. I have recompiled the check with an increased timeout, without any progress. This is a fresh Kafka cluster with only the consumer_offsets partition. The check just reports NOOK and continues in a loop, as shown in the log above.
Thank you!
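Thanks for trying kafka-health-check :)
Concerning the broker-replication-check topic: its replica set is expanded when a broker's health check starts and shrunk again when that health check shuts down, so the replication factor grows beyond 1 as more health checks come up.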
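To illustrate the expand-on-start / shrink-on-shutdown idea, here is a rough sketch in Go. It is not the actual kafka-health-check code; `expandReplicas` and `shrinkReplicas` are hypothetical helper names, and the broker IDs are made up. It only shows how a replica list that starts with one broker could grow and shrink as health checks come and go.

```go
package main

import "fmt"

// expandReplicas returns the replica list with broker appended,
// unless broker is already part of it. Hypothetical helper, for
// illustration only.
func expandReplicas(replicas []int32, broker int32) []int32 {
	for _, r := range replicas {
		if r == broker {
			return replicas
		}
	}
	// Copy before appending so the caller's slice is not modified.
	return append(append([]int32{}, replicas...), broker)
}

// shrinkReplicas returns the replica list without broker.
// Hypothetical helper, for illustration only.
func shrinkReplicas(replicas []int32, broker int32) []int32 {
	out := make([]int32, 0, len(replicas))
	for _, r := range replicas {
		if r != broker {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// The topic starts out with a single replica (broker 0).
	replicas := []int32{0}

	// Health checks for brokers 1 and 2 start up.
	replicas = expandReplicas(replicas, 1)
	replicas = expandReplicas(replicas, 2)
	fmt.Println(replicas) // prints [0 1 2]

	// The health check for broker 1 shuts down.
	replicas = shrinkReplicas(replicas, 1)
	fmt.Println(replicas) // prints [0 2]
}
```

So seeing a replication factor of 1 would be expected while only one health check instance has registered itself.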
It's of course sub-par to have to restart the health-check on each and every node; can you provide details on the health-check output? Maybe that's because the health check topics are only auto-created on initial startup; if they vanish during runtime, the health-check will not re-create them.
As for why the cluster isn't reported as healthy, I'd assume that there is still some state in ZooKeeper, since it finds partition id 2 in broker-replication-check. Can you give details on what JSON is returned by the health check endpoint / and the /cluster endpoint? This could give more insight into what bothers the health check.
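If it helps, something like the following small Go program could be used to collect that output. It is only a sketch: the base URL `http://localhost:8000` is an assumption about where the health check is listening, so pass your actual address as the first argument. `prettyJSON` and `fetch` are names made up for this example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// prettyJSON re-indents a raw JSON document for easier reading.
func prettyJSON(raw []byte) (string, error) {
	var buf bytes.Buffer
	if err := json.Indent(&buf, raw, "", "  "); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// fetch performs a GET request and returns the HTTP status code and body.
func fetch(client *http.Client, url string) (int, []byte, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, body, err
}

func main() {
	// Assumed default address; override with the first argument.
	base := "http://localhost:8000"
	if len(os.Args) > 1 {
		base = os.Args[1]
	}
	client := &http.Client{Timeout: 5 * time.Second}

	// "/" is the broker health endpoint, "/cluster" the cluster one.
	for _, path := range []string{"/", "/cluster"} {
		status, body, err := fetch(client, base+path)
		if err != nil {
			fmt.Printf("GET %s failed: %v\n", path, err)
			continue
		}
		out, err := prettyJSON(body)
		if err != nil {
			out = string(body) // not valid JSON, print the body as-is
		}
		fmt.Printf("GET %s -> HTTP %d\n%s\n", path, status, out)
	}
}
```

Running it once on each broker and pasting the output here should show which endpoint reports the unhealthy state.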