User Story
During the migration, after the new worker nodes have been created on the CAPA WC, we drain the old Vintage WC nodes.
Currently the drain timeout is set to 5 minutes. This causes problems for customer workloads, e.g. Java applications that need more time to start. With such a short timeout the workloads can be affected: once the timeout is reached, PDBs are no longer respected, which brings downtime for the 'slow' workloads.
Here is the snippet with the configuration from the tool itself:
func getNodeShutdownHelper(ctx context.Context, client kubernetes.Interface) drain.Helper {
	return drain.Helper{
		Ctx:                             ctx,             // pass the current context
		Client:                          client,          // the k8s client for making the API calls
		Force:                           true,            // forcing the draining
		GracePeriodSeconds:              60,              // 60 seconds of timeout before deleting the pod
		IgnoreAllDaemonSets:             true,            // ignore the daemonsets
		Timeout:                         5 * time.Minute, // give a 5 minutes timeout
		DeleteEmptyDirData:              true,            // delete all the emptyDir volumes
		DisableEviction:                 false,           // we want to evict and not delete (might be different for the master nodes)
		SkipWaitForDeleteTimeoutSeconds: 15,              // in case a node is NotReady then the pods won't be deleted, so don't wait too long
		Out:                             os.Stdout,
		ErrOut:                          os.Stderr,
	}
}
Ideally, the timeout would be configurable via a flag when triggering the migration (see the sketch below). This would let us adjust the migration speed to the customer's setup. A sensible default would then be 15 minutes if the flag is not set.
If it is not possible to make it configurable, let's set the timeout to 30 minutes to ensure all workloads have time to move and start up.
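As an illustration of the configurable option, here is a minimal sketch using Go's standard flag package; the --drain-timeout flag name and the 15-minute default are assumptions for this sketch, not the tool's current interface:

package main

import (
	"flag"
	"time"
)

// drainTimeout is the maximum time to wait for a node to drain.
// The flag name and the 15-minute default are assumed for illustration.
var drainTimeout = flag.Duration("drain-timeout", 15*time.Minute,
	"maximum time to wait for a node to drain before giving up")

func main() {
	flag.Parse()
	// The parsed value would then be passed down to getNodeShutdownHelper
	// (see the sketch under Implementation details).
	_ = *drainTimeout
}

The migration could then be triggered with e.g. --drain-timeout=30m for clusters with slow-starting workloads (flag name assumed).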
Acceptance Criteria
Adjust the migration CLI tool to cater for the requirement of a longer timeout for node drainage.
Implementation details
func getNodeShutdownHelper(ctx context.Context, client kubernetes.Interface) drain.Helper {
	return drain.Helper{
		Ctx:                             ctx,             // pass the current context
		Client:                          client,          // the k8s client for making the API calls
		Force:                           true,            // forcing the draining
		GracePeriodSeconds:              60,              // 60 seconds of timeout before deleting the pod
		IgnoreAllDaemonSets:             true,            // ignore the daemonsets
		Timeout:                         5 * time.Minute, // give a 5 minutes timeout
		DeleteEmptyDirData:              true,            // delete all the emptyDir volumes
		DisableEviction:                 false,           // we want to evict and not delete (might be different for the master nodes)
		SkipWaitForDeleteTimeoutSeconds: 15,              // in case a node is NotReady then the pods won't be deleted, so don't wait too long
		Out:                             os.Stdout,
		ErrOut:                          os.Stderr,
	}
}
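A minimal sketch of the change, assuming the timeout is threaded through as a parameter instead of being hard-coded; the new signature is an assumption, not the tool's current API:

// Sketch: the drain timeout becomes a parameter supplied by the caller,
// e.g. from a CLI flag with a 15-minute default.
func getNodeShutdownHelper(ctx context.Context, client kubernetes.Interface, drainTimeout time.Duration) drain.Helper {
	return drain.Helper{
		Ctx:                             ctx,
		Client:                          client,
		Force:                           true,
		GracePeriodSeconds:              60,
		IgnoreAllDaemonSets:             true,
		Timeout:                         drainTimeout, // configurable instead of the fixed 5 * time.Minute
		DeleteEmptyDirData:              true,
		DisableEviction:                 false,
		SkipWaitForDeleteTimeoutSeconds: 15,
		Out:                             os.Stdout,
		ErrOut:                          os.Stderr,
	}
}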