While looking into node failure detection and pod failover behavior and performance, we stumbled upon the following:
https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
The tolerationSeconds parameter allows you to specify how long a pod stays bound to a node that has a node condition. If the condition still exists after the tolerationSeconds period, the taint remains on the node and the pods with a matching toleration are evicted. If the condition clears before the tolerationSeconds period, pods with matching tolerations are not removed.
As far as I understand, we don't explicitly set tolerationSeconds anywhere, so each pod gets the default of 300s for the not-ready and unreachable taints. As a result, workloads on a failed node can take more than five minutes to be rescheduled.
We are planning to document this and provide recommendations on how to accelerate failover where required. Still, we thought it was worth raising here to see whether we want to expose this parameter so that each operator can set a more appropriate default value.
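
For illustration, here is a minimal sketch of what overriding the default could look like at the pod level. The pod name, image, and the 30s value are assumptions chosen for the example, not recommendations, and this is not how any operator currently sets it. The sketch relies on Kubernetes only adding its default 300s tolerations when the pod does not already declare a toleration for the same taint.

```yaml
# Sketch only: shortens the eviction delay for a single pod.
# The name, image, and 30s value below are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: example-workload
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest
  tolerations:
    # Evict 30s after the node is tainted as not ready
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 30
    # Evict 30s after the node is tainted as unreachable
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 30
```

For a Deployment or StatefulSet the same `tolerations` list would go under `spec.template.spec`. Note that tolerationSeconds only starts counting once the taint has been applied to the node, so the total failover time also includes the time it takes to detect the node failure.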