Debugging FAQ
Tips that may help you debug why Kubernetes isn't working.
Of course, also take a look at the documentation, especially the getting-started guides.
When asking for help, please indicate your hosting platform (GCE, Vagrant, etc.) and OS distribution (Debian, CoreOS, Fedora, etc.).
Depending on the Linux distribution, the logs of system components, including Docker, will be in /var/log or /tmp, or can be accessed using journalctl on systemd-based systems, such as Fedora, RHEL7, or CoreOS. Salt logs on minions are in /var/log/salt/minion.
If the logs don't show anything useful, try turning on verbose logging for the Kubernetes component you suspect has a problem. See https://github.com/golang/glog for more details.
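As a rough sketch of what turning on verbose logging looks like: glog registers a -v flag (verbosity level) and a -logtostderr flag, so they can be appended to a component's command line. The helper name with_verbosity and the example etcd address are hypothetical, and the helper only prints the command line rather than running it.

```shell
# with_verbosity LEVEL CMD [ARGS...]
# Prints CMD and its arguments with glog's --v and --logtostderr flags appended.
# It does not run the command; copy the output or wrap it in eval if desired.
with_verbosity() {
  level="$1"
  shift
  printf '%s --v=%s --logtostderr=true\n' "$*" "$level"
}

# Example (hypothetical etcd address):
#   with_verbosity 4 kubelet --etcd_servers=http://10.0.0.1:4001
```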
You can see what containers have been created on a node using docker ps -a.
- Ensure all backend components are running
  - on master: apiserver, controller, scheduler, etcd
    - IMPORTANT: Some older turnup instructions don't include the scheduler. Ensure the scheduler is running on the master host.
  - on nodes: proxy, kubelet, docker
- Ensure all k8s components have --etcd_servers set correctly on the command line (if it isn't, you should see error messages in their logs)
  - If it's not set, your networking setup may be broken, since it is usually initialized from the IP address of kubernetes-master, such as in cluster/saltbase/salt/apiserver/default
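The component check above can be sketched as a small filter over a process listing. The function name missing_components is hypothetical, and it matches the component names from the list as plain substrings, which is an assumption: a short name such as proxy may also match unrelated processes, so treat the result as a hint.

```shell
# missing_components "name1 name2 ..." < process-listing
# Prints each expected name that does not appear anywhere in the listing on stdin.
missing_components() {
  listing="$(cat)"
  for name in $1; do
    printf '%s\n' "$listing" | grep -q -- "$name" || printf '%s\n' "$name"
  done
}

# Example, on the master:
#   ps -eo args | missing_components "apiserver controller scheduler etcd"
# Example, on a node:
#   ps -eo args | missing_components "proxy kubelet docker"
```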
- dev-build-and-up.sh waits forever at "Waiting for cluster initialization"
  - Try cluster/kube-down.sh and hack/dev-build-and-up.sh again
    - If it still hangs, ctrl-c and try hack/dev-build-and-push.sh
      - Check whether all the VMs exist -- typically one master VM and N minions
      - If so, check whether you can ssh into them
      - Check serial console output, if available
    - If it still doesn't work, see provider-specific issues below
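The "check whether you can ssh into them" step can be sketched as a loop over the VM names. This assumes plain ssh access with key-based login; on some providers you would use the provider's own ssh wrapper instead. The function name unreachable_hosts is hypothetical, and the host names in the example are illustrative.

```shell
# unreachable_hosts HOST...
# Prints each host that does not accept a non-interactive ssh login within 5 seconds.
unreachable_hosts() {
  for host in "$@"; do
    # BatchMode=yes disables password prompts so the check cannot hang on input.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
      </dev/null >/dev/null 2>&1 || printf '%s\n' "$host"
  done
}

# Example:
#   unreachable_hosts kubernetes-master kubernetes-minion-1 kubernetes-minion-2
```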
- dev-build-and-up.sh reports "Docker failed to install on kubernetes-minion-1"
  - Verify that you can ssh into the minions
  - Check /var/log/salt/minion to see what part of the installation failed
- kubecfg cannot reach apiserver
  - Ensure KUBERNETES_MASTER or KUBE_MASTER_IP is set, or use -h
  - Ensure apiserver is running
    - Check that the process is running on the master
    - Check its logs
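A quick way to confirm that one of the two environment variables is set before running kubecfg might look like the following. The function name master_address is hypothetical, and preferring KUBERNETES_MASTER over KUBE_MASTER_IP is an assumption made for this sketch, not necessarily the order kubecfg itself uses.

```shell
# master_address
# Prints the configured master address, or returns 1 if neither variable is set.
master_address() {
  if [ -n "${KUBERNETES_MASTER:-}" ]; then
    printf '%s\n' "$KUBERNETES_MASTER"
  elif [ -n "${KUBE_MASTER_IP:-}" ]; then
    printf '%s\n' "$KUBE_MASTER_IP"
  else
    echo "neither KUBERNETES_MASTER nor KUBE_MASTER_IP is set" >&2
    return 1
  fi
}

# Example:
#   master_address || echo "set one of the variables, or pass -h to kubecfg"
```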
- You were able to create a replicationController but see no pods
  - The replication controller didn't create the pods. Check that the controller is running, and look at its logs.
- kubecfg hangs forever or a pod is in state Waiting forever
  - Check whether hosts are being assigned to your pods. If not, then they aren't being scheduled.
  - Ensure kubelet is looking in the right place in etcd for its pods. If you see something like "DEBUG: get /registry/hosts/127.0.0.1/kubelet" in a kubelet's logs, then check whether the apiserver is using the same name or IP for that minion. If not, check the value of the --hostname_override command-line flag on kubelet.
  - It could also be that the image fetch is not working. Check Docker logs.
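To see which host name a kubelet is actually using in etcd, the registry path can be pulled out of its log. This sketch assumes log lines shaped like the DEBUG example above; the function name kubelet_registry_hosts is hypothetical, and where the kubelet log lives depends on your distribution.

```shell
# kubelet_registry_hosts < kubelet-log
# Prints the distinct host names found in "get /registry/hosts/<name>/kubelet" lines.
kubelet_registry_hosts() {
  sed -n 's|.*get /registry/hosts/\([^/]*\)/kubelet.*|\1|p' | sort -u
}

# Example on a systemd-based system (the unit name is an assumption):
#   journalctl -u kubelet | kubelet_registry_hosts
```

If the name printed here differs from the name the apiserver uses for that minion, that mismatch is the likely cause.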
- apiserver reports "Error synchronizing container: Get http://:10250/podInfo?podID=foo: dial tcp :10250: connection refused"
  - Just means that pod foo has not yet been scheduled (see #1285)
  - Check whether the scheduler is running properly
  - If the scheduler is running, possibly no minion addresses were passed to the apiserver using --machines (see hack/local-cluster-up.sh for an example)
- Cannot connect to the container
  - Try to telnet to the minion at its service port, and/or to the pod's IP and port
  - Check whether the container has been created in Docker: sudo docker ps -a
    - If you don't see the container, there could be a problem with the pod configuration, image, Docker, or Kubelet
    - If you see containers created every 10 seconds, then container creation is failing or the container's process is failing
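One way to spot a container that keeps being re-created is to count containers per image: an image with a steadily growing count is a likely culprit. The sketch below assumes a Docker version whose ps command supports --format; on older versions you would need to cut the image column out of the regular docker ps -a output instead. The function name count_by_image is hypothetical.

```shell
# count_by_image < image-names
# Reads one image name per line and prints "COUNT IMAGE", highest count first.
count_by_image() {
  sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

# Example:
#   sudo docker ps -a --format '{{.Image}}' | count_by_image
```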
- Why does PUT return {"kind":"Status","creationTimestamp":null,"apiVersion":"v1beta1","status":"failure","message":"replicationController \"fooController\" cannot be updated: 105: Key already exists (/registry/controllers/fooController) [25464]","reason":"conflict","details":{"id":"fooController","kind":"replicationController"},"code":409}?
  - We use resourceVersion for optimistic concurrency. The value assigned by the system at the last mutation of the object needs to be provided when performing an update, in order to prevent accidentally clobbering another update. kubecfg achieves this by doing a GET of the object, extracting the resourceVersion, and inserting it into the JSON of the PUT, which defeats the purpose of the concurrency control, but works for single-user scenarios.
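The GET, extract, insert sequence described above can be sketched with two small text filters. This is not how kubecfg is implemented; it assumes resourceVersion appears as a bare number in the object's JSON and that the update body starts with an opening brace on its first line. The function names are hypothetical, and a proper JSON tool would be more robust than sed.

```shell
# resource_version < object.json
# Prints the first numeric resourceVersion found in the JSON on stdin.
resource_version() {
  sed -n 's/.*"resourceVersion":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# with_resource_version VERSION < update.json
# Inserts a resourceVersion field right after the first opening brace on line 1.
with_resource_version() {
  sed "1s/{/{\"resourceVersion\":$1,/"
}

# Example flow: save the GET response to current.json, then
#   v="$(resource_version < current.json)"
#   with_resource_version "$v" < update.json > put-body.json
```

If another client updates the object between the GET and the PUT, the PUT should still fail with a conflict, which is the point of the check.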
make clean or rm -rf Godeps/_workspace/pkg output _output
TODO
TODO
- Ensure you can ssh to an instance, which may require enabling billing and/or creating an ssh key. Create an instance if you don't have one, then use gcutil ssh to ssh into it.
- gcutil listfirewalls ; gcutil getfirewall default-ssh
  - If default-ssh doesn't exist, do gcutil addfirewall --description "SSH allowed from anywhere" --allowed=tcp:22 default-ssh
- If
gcutil listnetworks