No Route to Host on StatefulSet Update #75

Closed
stevenpall opened this issue Oct 13, 2017 · 6 comments

@stevenpall

Hi there,

I've noticed that when I update the kafka StatefulSet configuration and do a rolling deploy, some or all of the brokers get into a bad state in which they appear to be unable to reach the network. These are the errors I'm seeing:

[2017-10-13 19:24:36,928] INFO Opening socket connection to server zookeeper.kafka.svc.cluster.local/100.64.135.230:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2017-10-13 19:24:38,051] WARN Session 0x15f1678eb8a0001 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.NoRouteToHostException: No route to host
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)

Anyone have an idea of what's going on here?

Thanks!

@solsson
Contributor

solsson commented Oct 14, 2017

What kind of k8s cluster is it you're using? Do you mean the kind of rolling update that was new to StatefulSet in 1.7?

Have you tried execing into the failing brokers to try ping/curl, or do they die too fast for that?

@stevenpall
Author

@solsson This is a private topology AWS cluster. And yes, I mean the rolling update mechanism introduced for StatefulSets in 1.7. That said, I was also seeing this behaviour when I was previously manually deleting pods.

I have exec'd into the pods, and they are unable to reach any other addresses (including intra-cluster ones), which is very odd to me. I also checked that other pods on the same node could still reach the network, and they could. This is also not isolated to a single machine, which makes me think there is a bug either in something this Kafka image is doing or in the way Kubernetes is handling the associated StatefulSets.

@solsson
Contributor

solsson commented Oct 14, 2017

I have no clue but I can keep guessing :) Can they ping cluster IPs, external IPs? I've seen issues with Kubernetes networking, in particular DNS resolution, on Alpine based images (mentioned in #46 and solsson/dockerfiles#5). What kind of Kafka image are you using?
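For anyone triaging a similar failure, the distinction raised above (DNS resolution versus IP-level reachability) can be checked from inside a pod with a small script. The following is only a sketch: the function name `classify_connect` and its labels are made up for illustration, it assumes Python is available in the container (which may not be true for a Kafka image), and the target host and port are simply the ones from the log in the original report.

```python
import errno
import socket


def classify_connect(host, port, timeout=3.0):
    """Return a short label describing how a TCP connect to host:port ends."""
    # Step 1: name resolution. A failure here points at DNS, not routing.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"

    # Step 2: TCP connect to the first resolved address.
    family, socktype, proto, _, addr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(addr)
        return "ok"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # EHOSTUNREACH is the errno behind Java's NoRouteToHostException.
        if exc.errno == errno.EHOSTUNREACH:
            return "no-route-to-host"
        if exc.errno == errno.ECONNREFUSED:
            return "refused"
        return "error: " + errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()


if __name__ == "__main__":
    # Host and port taken from the log in the original report.
    print(classify_connect("zookeeper.kafka.svc.cluster.local", 2181))
```

A result of `dns-failure` would suggest a resolver problem like the Alpine ones mentioned, while `no-route-to-host` with a successfully resolved name points at the pod network instead.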

@stevenpall
Author

Sorry, I think this might actually be related to the CNI provider I'm using (Calico). I found this chain of related issues/PRs: kubernetes/kops#2538. I'll test with the updated version of Calico and report back.

Thanks!

@stevenpall
Author

stevenpall commented Oct 16, 2017

Got this figured out. As those issues mention, there was a race condition in the Calico policy controller that would add and then remove new pod endpoints (when the opposite was supposed to happen). I updated to calico/node:v2.6.1 and calico/cni:v1.8.3 in the calico-node DaemonSet and to calico/kube-policy-controller:master in the calico-policy-controller Deployment. Since then I've had no issues running rolling restarts of the Kafka and Zookeeper StatefulSets.

@eskuai

eskuai commented Feb 25, 2019

I am using Weave and I got the same problem.
Ports 2888 and 3888 are open.
I tried using an hs service and got the same problem.
