Questions on functionality #18

Open
lconnell opened this issue Nov 3, 2017 · 1 comment

lconnell commented Nov 3, 2017

Hi @andreas-schroeder ,

Thanks for building this check! I have some questions about behavior I have seen while using this tool, and I want to make sure I am using it properly.

The replication factor and partition count are both set to 1 on the broker-replication-check topic. This doesn't seem correct, as the check wouldn't be able to determine whether all brokers are in the ISR.

I recently had a Kafka outage, and after I recovered the cluster I had to restart the Kafka health check service on ALL of my brokers before it would detect that the brokers were healthy again. This is most likely due to losing the connection to the cluster/ZooKeeper. However, in one of my environments where I cannot get kafka-health-check to report a healthy cluster, I can see that it does make reconnect attempts:

INFO[0037] closing connection and reconnecting
INFO[0042] found partition id 1 for broker 0 in topic "broker-0-health-check"
INFO[0042] found partition id 2 for broker 0 in topic "broker-replication-check"
INFO[0042] reconnected

I still cannot figure out why kafka-health-check will not report green on this cluster. I recompiled the check with an increased timeout, without any improvement. This is a fresh Kafka cluster containing only the __consumer_offsets topic. The check just reports NOOK and keeps looping through the reconnect sequence shown in the log lines above.

Thank you!

Owner

andreas-schroeder commented Nov 13, 2017

Hi @lconnell ,

Thanks for trying kafka-health-check :)
Concerning the broker-replication-check topic: its replica set is expanded when a broker's health check starts and shrunk again when that health check shuts down.

It's of course sub-par to have to restart the health check on each and every node. Can you provide details on the health-check output? The cause may be that the health check topics are only auto-created on initial startup; if they vanish during runtime, the health check will not re-create them.

As for why the cluster isn't reported as healthy, I'd assume that there is still some state in ZooKeeper, since it finds partition id 2 in broker-replication-check. Can you share the JSON returned by the health check's / and /cluster endpoints? That could give more insight into what bothers the health check.
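
One way to collect the requested output is a small Go program that fetches both endpoints and pretty-prints whatever JSON comes back. This is only a sketch: the base URL `http://localhost:8000` is an assumption about where the health check listens (pass your own as the first argument), and the helper names are made up for this example. It makes no assumption about the JSON schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// joinURL joins base and path with exactly one slash between them.
func joinURL(base, path string) string {
	return strings.TrimRight(base, "/") + "/" + strings.TrimLeft(path, "/")
}

// prettyJSON re-indents a raw JSON document without assuming any schema.
func prettyJSON(raw []byte) (string, error) {
	var buf bytes.Buffer
	if err := json.Indent(&buf, raw, "", "  "); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// fetch performs a GET request and returns the HTTP status code and body.
func fetch(client *http.Client, url string) (int, []byte, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, body, err
}

func main() {
	// Assumption: the health check is reachable here; override via argument.
	base := "http://localhost:8000"
	if len(os.Args) > 1 {
		base = os.Args[1]
	}

	client := &http.Client{Timeout: 5 * time.Second}
	for _, path := range []string{"/", "/cluster"} {
		url := joinURL(base, path)
		code, body, err := fetch(client, url)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", url, err)
			continue
		}
		out, err := prettyJSON(body)
		if err != nil {
			out = string(body) // not valid JSON; print it as received
		}
		fmt.Printf("GET %s -> HTTP %d\n%s\n", url, code, out)
	}
}
```

Running it on each broker and pasting the output into the issue gives both the broker-level and the cluster-level view at the same point in time.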
