This is part of the How-To guide collection. This guide covers KNE troubleshooting techniques and common issues.
The first step in troubleshooting general issues is familiarizing yourself with common kubectl commands.
- get pods/services: Useful for determining basic state info
- describe pods: Useful for more verbose state info
- logs: Useful to get a dump of all pod logs
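As a minimal sketch, the three commands above can be bundled into one helper for a first look at a single pod. The function name kne_pod_snapshot and its arguments are hypothetical and not part of KNE; the --all-containers and --tail options are standard kubectl logs flags chosen here for illustration.

```shell
# Hypothetical helper (not part of KNE): first-look state for one pod.
# Usage: kne_pod_snapshot <pod> <namespace>
kne_pod_snapshot() {
  pod="$1"
  ns="$2"
  # Basic state: READY, STATUS, RESTARTS, AGE.
  kubectl get pod "$pod" -n "$ns"
  # Verbose state, including the Events section at the bottom.
  kubectl describe pod "$pod" -n "$ns"
  # Last 50 log lines from each container in the pod.
  kubectl logs "$pod" -n "$ns" --all-containers --tail=50
}
```

For example, kne_pod_snapshot r1 multivendor prints all three views for pod r1 in the multivendor namespace.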
The -n <namespace> flag is necessary to specify the namespace to inspect. In KNE there are several namespaces:
- one namespace per topology
- one namespace for meshnet CNI
- one namespace for metallb ingress
- one namespace per vendor controller
- default kube namespace
For an exhaustive list use the -A flag instead of -n.
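With -A the listing can get long, so it may help to filter it down to pods that need attention. The following is a sketch only: unhealthy_pods is a hypothetical name, and it assumes the default column layout of kubectl get pods -A, where STATUS is the fourth column.

```shell
# Hypothetical filter (not part of KNE): keep the header line plus any pod
# whose STATUS is neither Running nor Completed.
# Reads `kubectl get pods -A` output on stdin.
unhealthy_pods() {
  # Columns with -A: NAMESPACE NAME READY STATUS RESTARTS AGE
  awk 'NR == 1 || ($4 != "Running" && $4 != "Completed")'
}
```

Pipe the listing through it, for example: kubectl get pods -A | unhealthy_pods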
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
multivendor r1 0/1 ImagePullBackOff 0 44m
An ImagePullBackOff status means your cluster cannot fetch a container image. Follow the container image access steps and then delete/recreate the topology. To check which image is causing the issue, use the kubectl describe command on the problematic pod.
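As an alternative to reading the full describe output, a jsonpath query can print just the image and waiting reason for each container. This is a sketch under assumptions: pod_image_status is a hypothetical helper name, and the query only covers regular containers (init containers are reported under .status.initContainerStatuses instead).

```shell
# Hypothetical helper (not part of KNE): list each container's name, image
# and, if the container is waiting, the reason (for example ImagePullBackOff).
# Usage: pod_image_status <pod> <namespace>
pod_image_status() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.image}{"\t"}{.state.waiting.reason}{"\n"}{end}'
}
```

For the pod shown above this would be invoked as pod_image_status r1 multivendor.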
After creating a topology, some pods may get stuck in an Init:0/1 state while others may be stuck Pending.
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
multivendor r1 0/1 Init:0/1 0 12m
multivendor r2 0/1 Pending 0 12m
...
You may also see logs for the init-container on one of the Init:0/1 pods that look like this:
$ kubectl logs r1 init-r1 -n multivendor
Waiting for all 2 interfaces to be connected
Connected 1 interfaces out of 2
Connected 1 interfaces out of 2
Connected 1 interfaces out of 2
...
The Pending pods may have an error like the following:
$ kubectl describe pod r2 -n multivendor
...
Events:
Type Reason Age From Message
Warning FailedScheduling 9s (x19 over 18m) default-scheduler 0/1 nodes are available: 1 Insufficient cpu.
To fix this issue you will need a VM instance with more vCPUs and memory. A machine with 16 vCPUs should be sufficient. Alternatively, you can deploy a smaller topology.
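Before recreating the topology, it can be useful to confirm how many vCPUs the host actually offers. The check below is a rough sketch: has_enough_vcpus is a hypothetical name, the default of 16 simply mirrors the figure above, and nproc reports the processing units available to the current process, which may be fewer than the machine total.

```shell
# Hypothetical pre-flight check (not part of KNE): succeed only if the host
# offers at least the requested number of vCPUs (default 16).
# Usage: has_enough_vcpus [min_vcpus]
has_enough_vcpus() {
  min="${1:-16}"
  # nproc prints the number of processing units available to this process.
  have="$(nproc)"
  echo "host vCPUs: $have (want >= $min)"
  [ "$have" -ge "$min" ]
}
```

The exit status makes it usable in a script, for example: has_enough_vcpus 16 || echo "consider a larger VM or a smaller topology". For the cluster's own view, kubectl describe nodes lists each node's allocatable CPU and current requests.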
$ ssh 192.168.18.100
ssh: connect to host 192.168.18.100 port 22: Connection refused
Validate the pod is running:
$ kubectl get pod r1 -n multivendor
NAME READY STATUS RESTARTS AGE
r1 1/1 Running 0 4m47s
Validate the service is exposed:
$ kubectl get services service-r1 -n multivendor
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service-r1 LoadBalancer 10.96.134.70 192.168.18.100 443:30001/TCP,22:30004/TCP,6030:30005/TCP 4m22s
Validate you can exec into the container:
$ kubectl exec -it r1 -n multivendor -- Cli
Defaulted container "r1" out of: r1, init-r1 (init)
error: Internal error occurred: error executing command in
container: failed to exec in container: failed to start exec
"68abfa4f3742c86f49ec00dff629728d96e589c6848d5247e29d396365d6b697": OCI
runtime exec failed: exec failed: container_linux.go:370: starting container
process caused: exec: "Cli": executable file not found in $PATH: unknown
Fix the systemd path and cgroups directory and then delete/recreate the topology.
If you see something similar to the following on a cptx Juniper node:
$ kubectl exec -it r4 -n multivendor -- cli
Defaulted container "r4" out of: r4, init-r4 (init)
System is not yet ready...
then your host likely does not support nested virtualization. Run the following to confirm. If the output is 0, the host does not support nested virtualization.
$ grep -cw vmx /proc/cpuinfo
0
Enable nested virtualization or move to a new machine that supports it. When done, run the following and confirm the output is non-zero:
$ grep -cw vmx /proc/cpuinfo
16
Then delete your cluster and start again.
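The grep check above looks only for vmx, which is the Intel flag. As a hedged sketch, the variant below also counts svm, the AMD equivalent. The svm check is an addition not covered by this guide, and virt_flag_count is a hypothetical helper name.

```shell
# Hypothetical helper (not part of KNE): count the lines of a cpuinfo file
# that mention a hardware virtualization flag. vmx is the Intel flag used in
# this guide; svm is the AMD equivalent (an assumption added here).
# Usage: virt_flag_count [cpuinfo_file]
virt_flag_count() {
  # grep -c prints 0 and exits non-zero when nothing matches, so ignore the
  # exit status and rely on the printed count.
  grep -cwE 'vmx|svm' "${1:-/proc/cpuinfo}" || true
}
```

As with the original command, what matters is whether the count is zero or non-zero, not its exact value.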