Health Check Just Stalls #21

Open
josh-padnick opened this issue Apr 30, 2018 · 0 comments


First, thanks for creating this tool. It's a big improvement over a simple TCP listener check.

I'm running into a sporadic issue where, when I reboot my Kafka cluster, the kafka-health-check daemon on one of the nodes stalls without doing anything. Here's the only log output I see:

time="2018-04-30T09:43:33Z" level=info msg="using topic broker-2-health-check for broker 2 health check"
time="2018-04-30T09:43:33Z" level=info msg="using topic broker-2-health-check for broker 2 replication check"
time="2018-04-30T09:43:38Z" level=info msg="unable to connect to broker, retrying in 5s (cannot connect)"
time="2018-04-30T09:43:43Z" level=info msg="unable to connect to broker, retrying in 5s (cannot connect)"
time="2018-04-30T09:45:08Z" level=info msg="using topic broker-2-health-check for broker 2 health check"
time="2018-04-30T09:45:08Z" level=info msg="using topic broker-2-health-check for broker 2 replication check"

Note that the 09:45 timestamps are from when I manually restarted the supervisord service that runs kafka-health-check.

This issue only occurs after I have an initially healthy cluster and then begin rolling out an update across the cluster. Notably, each replacement Kafka broker retains the same broker-id, so I'm wondering if that's what's tripping up kafka-health-check?

Here's the command I'm using to run it:

kafka-health-check -zookeeper 172.31.4.179:2181,172.31.25.146:2181,172.31.20.115:2181 -broker-port 9094 -broker-id 2 -broker-host 127.0.0.1

And of course this works fine on other Kafka brokers. If it's any help, here's my Kafka broker config:

broker.id=2
listeners=EXTERNAL://0.0.0.0:9092,INTERNAL://0.0.0.0:9093,HEALTHCHECK://127.0.0.1:9094
advertised.listeners=EXTERNAL://13.127.215.56:9092,INTERNAL://172.31.17.29:9093,HEALTHCHECK://127.0.0.1:9094
listener.security.protocol.map=EXTERNAL:PLAINTEXT,INTERNAL:PLAINTEXT,HEALTHCHECK:PLAINTEXT
inter.broker.listener.name=INTERNAL
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/opt/kafka/kafka-logs/data
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
min.insync.replicas=1
default.replication.factor=1
unclean.leader.election.enable=true
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=172.31.20.115:2181,172.31.4.179:2181,172.31.25.146:2181
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0

Any help is much appreciated!
