In the Talos integration tests, the tests do a series of (pretty frequent) reboots, specifically the tests which run with encrypted volumes, as encryption config changes require a reboot.
This got specifically triggered by the test in #9834, which adds two more reboots and, due to the order of the tests, comes right after the encryption tests.
The issue is somewhat random and pops up as the test times out on the cluster health check with the error that the number of ready kube-proxy pods doesn't reach the desired value (3 out of 4).
Analysis
When the node goes into a reboot cycle, Talos instructs the kubelet to do a graceful shutdown, which terminates the pods, including DaemonSet pods. There is a bit of a race with kube-scheduler there, but in the end there will be a pod in the phase Failed, because the kubelet refuses to run new pods while it is itself in the graceful shutdown phase:
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-12-03T14:45:34Z"
    message: Pod was terminated in response to imminent node shutdown.
    reason: TerminationByKubelet
    status: "True"
    type: DisruptionTarget
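For illustration, a minimal client-go sketch (not from the test suite; the kubeconfig path and the kube-system namespace are assumptions) that lists the leftover Failed pods carrying the DisruptionTarget condition shown above:

package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumption: a local kubeconfig; in-cluster config would work the same way.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Failed pods linger in kube-system after the node's graceful shutdown.
	pods, err := client.CoreV1().Pods("kube-system").List(context.Background(),
		metav1.ListOptions{FieldSelector: "status.phase=Failed"})
	if err != nil {
		panic(err)
	}

	for _, pod := range pods.Items {
		for _, cond := range pod.Status.Conditions {
			// The condition set by the kubelet on node-shutdown termination.
			if cond.Type == corev1.DisruptionTarget && cond.Reason == "TerminationByKubelet" {
				fmt.Printf("%s on %s: %s\n", pod.Name, pod.Spec.NodeName, cond.Message)
			}
		}
	}
}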
As the machine comes back up after the reboot, an existing pod in the Failed state prevents a new pod from being scheduled on the node for some time. The Failed pods are supposed to be cleaned up by the DaemonSetsController in the kube-controller-manager: https://github.com/kubernetes/kubernetes/blob/8046362e6ff74ee18776e0cdb90ead62c577d607/pkg/controller/daemon/daemon_controller.go#L804-L826
That cleanup has a backoff, introduced in kubernetes/kubernetes#65309 to fight other issues related to misconfigured pods.
But after a series of reboots, the failed pod cleanup gets delayed by the backoff mechanism long enough for the Talos cluster health checks to fail:
I1203 15:12:52.834783 1 daemon_controller.go:813] "Deleting failed pod on node has been limited by backoff" logger="daemonset-controller" pod="kube-system/kube-flannel-zftvs" node="talos-default-worker-1" currentDelay="4m16s"
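To see why the delay reaches minutes, here is a minimal sketch of that backoff behaviour using client-go's flowcontrol.Backoff; the 1s initial and 15m max values are assumptions standing in for whatever kube-controller-manager hardcodes, and the key format is purely illustrative:

package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	// Assumed initial/max values; the real DaemonSet controller wires its own hardcoded ones.
	backoff := flowcontrol.NewBackOff(1*time.Second, 15*time.Minute)

	// The backoff is tracked per DaemonSet/node (key format here is illustrative),
	// so repeated failures on the same worker keep doubling the delay.
	key := "kube-flannel/talos-default-worker-1"

	for reboot := 1; reboot <= 9; reboot++ {
		backoff.Next(key, time.Now()) // record another failed-pod cleanup attempt
		fmt.Printf("after reboot %d: next cleanup delayed by %s\n", reboot, backoff.Get(key))
	}
}

With doubling from one second, the ninth consecutive failure already delays the cleanup by 256s, which lines up with the 4m16s currentDelay in the log above.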
The higher the reboot rate is, the more often the issue pops up.
Solutions
It looks like the backoff is hardcoded and can't be removed/reconfigured via any options: https://github.com/kubernetes/kubernetes/blob/8046362e6ff74ee18776e0cdb90ead62c577d607/cmd/kube-controller-manager/app/apps.go#L51-L52
- Restarting kube-controller-manager on reboots (e.g. on non-controlplane reboots). It should help, as the backoff is in memory.
- Adding more worker nodes to the tests. We have just one, and that bothers me. I would prefer to have at least two. As the backoff key depends on the nodeName, this would give us roughly twice the reboot rate.