Current Behaviour
When deleting nodes from a k8s cluster, we first drain the node, which evicts its pods to other nodes of the cluster. However, if the drain would violate a pod disruption budget (PDB), it gets stuck indefinitely because we do not use any timeout.
A PDB can also be violated because of an unhealthy pod, and in that case deleting the pod can sometimes "unstick" the eviction. This happened during a CI run, where manually deleting the unhealthy pod unstuck the eviction.
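Why an unhealthy pod can block a drain: a PDB's `status.disruptionsAllowed` is the number of currently healthy pods above the required minimum (`desiredHealthy`), floored at zero, and the eviction API refuses evictions while it is zero. The sketch below shows only that arithmetic for a hypothetical deployment with 2 replicas and a PDB with `minAvailable: 1`. It is not kuber code.

```python
def disruptions_allowed(current_healthy: int, desired_healthy: int) -> int:
    """Voluntary evictions a PDB currently permits: healthy pods above the
    required minimum, never below zero (mirrors status.disruptionsAllowed)."""
    return max(current_healthy - desired_healthy, 0)


MIN_AVAILABLE = 1  # hypothetical PDB spec.minAvailable for a 2-replica deployment

# Both replicas healthy: one eviction is allowed, so the drain can proceed.
print(disruptions_allowed(current_healthy=2, desired_healthy=MIN_AVAILABLE))  # 1

# One replica failing its readiness probe: no eviction is allowed, so the
# healthy replica on the node being drained cannot be evicted and, without a
# timeout, the drain keeps retrying forever.
print(disruptions_allowed(current_healthy=1, desired_healthy=MIN_AVAILABLE))  # 0
```

Deleting the unhealthy pod lets its controller recreate it. Once the replacement becomes Ready, the healthy count rises back to 2 and the blocked eviction can go through, which is consistent with what was observed in CI.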
Expected Behaviour
There should be a timeout when draining a node so that we do not wait for it indefinitely. After the timeout, we should check the drain output for eviction errors. We could then check whether any pods of the affected deployment are unhealthy and try to restart them before retrying the drain on the node.
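A minimal sketch of the proposed drain, timeout, inspect and retry flow. It assumes the drain is done by shelling out to `kubectl drain`, because the issue does not say how kuber drains nodes. `--timeout`, `--ignore-daemonsets` and `--delete-emptydir-data` are real `kubectl drain` flags. The helper names (`kubectl_drain`, `pdb_blocked_pods`, `drain_with_retry`) and the `recover` hook, which would look up and delete unhealthy pods of the same workload, are hypothetical. The matched error text is also an assumption and may differ between kubectl versions.

```python
import re
import subprocess
from typing import Callable, List

# Assumed shape of the line kubectl drain prints when a PDB refuses an eviction.
PDB_BLOCKED = re.compile(
    r'error when evicting pods/"(?P<pod>[^"]+)" -n "(?P<ns>[^"]+)".*disruption budget'
)


def pdb_blocked_pods(drain_output: str) -> List[str]:
    """Return unique "namespace/pod" keys whose eviction a PDB refused."""
    found: List[str] = []
    for m in PDB_BLOCKED.finditer(drain_output):
        key = f"{m.group('ns')}/{m.group('pod')}"
        if key not in found:
            found.append(key)
    return found


def kubectl_drain(node: str, timeout: str = "5m") -> subprocess.CompletedProcess:
    """Run kubectl drain with a bounded timeout instead of waiting forever."""
    return subprocess.run(
        ["kubectl", "drain", node, "--ignore-daemonsets",
         "--delete-emptydir-data", f"--timeout={timeout}"],
        capture_output=True, text=True,
    )


def drain_with_retry(
    drain: Callable[[], subprocess.CompletedProcess],
    recover: Callable[[List[str]], None],
    attempts: int = 3,
) -> bool:
    """Drain the node; after a failed (e.g. timed-out) attempt, hand the
    PDB-blocked pods to a recovery hook before retrying."""
    for _ in range(attempts):
        result = drain()
        if result.returncode == 0:
            return True
        blocked = pdb_blocked_pods((result.stdout or "") + (result.stderr or ""))
        if blocked:
            # Hypothetical hook: restart unhealthy pods of the same workload.
            recover(blocked)
    return False
```

Usage would look like `drain_with_retry(lambda: kubectl_drain("worker-1"), recover)`. Passing `drain` and `recover` in as callables keeps the retry policy testable without a live cluster.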
Steps To Reproduce
Create a k8s cluster.
Deploy a pod disruption budget that would be violated when deleting a node from the k8s cluster.
Delete a node from the k8s cluster.
See the eviction stuck indefinitely in kuber.
There's a suspicion that the issue is not with the PDB but rather with the coredns probes (or another coredns issue). We will reopen with more details if we come across this issue again.