
Increase or make timeout on node drainage configurable in capi-migration-cli #3602

Closed
T-Kukawka opened this issue Jul 25, 2024 · 1 comment

T-Kukawka (Contributor) commented Jul 25, 2024

User Story

During migration, after the new worker nodes have been created on the CAPA workload cluster (WC), we drain the old Vintage WC nodes.

Currently the drain timeout is set to 5 minutes. This is too short for some customer workloads, e.g. Java applications that need a long time to start: once the timeout expires, PDBs are no longer respected, which can bring downtime to these 'slow' workloads.

Here is the snippet with the configuration from the tool itself.

// Imports needed by this snippet (the package declaration is omitted, as in the original excerpt):
import (
	"context"
	"os"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/kubectl/pkg/drain"
)

func getNodeShutdownHelper(ctx context.Context, client kubernetes.Interface) drain.Helper {
	return drain.Helper{
		Ctx:                             ctx,             // pass the current context
		Client:                          client,          // the k8s client for making the API calls
		Force:                           true,            // force the draining
		GracePeriodSeconds:              60,              // give each pod 60 seconds to terminate gracefully before it is deleted
		IgnoreAllDaemonSets:             true,            // ignore the daemonsets
		Timeout:                         5 * time.Minute, // give up draining after 5 minutes
		DeleteEmptyDirData:              true,            // delete all the emptyDir volumes
		DisableEviction:                 false,           // we want to evict, not delete (might be different for the master nodes)
		SkipWaitForDeleteTimeoutSeconds: 15,              // if a node is NotReady the pods won't be deleted, so don't wait too long
		Out:                             os.Stdout,
		ErrOut:                          os.Stderr,
	}
}

Ideally the timeout would be configurable via a flag passed when triggering the migration. This would allow us to speed up migrations based on the customer's setup. A sensible default, if the flag is not set, would be 15 minutes (see the sketch under Implementation details below).

If it is not possible to make it configurable, let's set the timeout to 30 minutes to ensure all workloads have time to move and start up.
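
For reference, the hardcoded fallback would be a one-line change to the snippet above, everything else staying as-is:

		Timeout: 30 * time.Minute, // raised from 5 minutes to give slow workloads time to reschedule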

Acceptance Criteria

  • Adjust the migration CLI tool to support the longer (ideally configurable) timeout for node drainage

Implementation details

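The hardcoded value lives in getNodeShutdownHelper, shown in the snippet above. A minimal sketch of one way to make it configurable, assuming the standard library flag package is used for argument parsing; the flag name --node-drain-timeout, the 15 minute default, and the extra timeout parameter are illustrative assumptions, not the tool's actual API:

import (
	"context"
	"flag"
	"os"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/kubectl/pkg/drain"
)

// Hypothetical flag; the real capi-migration-cli may use a different CLI framework.
var nodeDrainTimeout = flag.Duration(
	"node-drain-timeout",
	15*time.Minute, // proposed default when the flag is not set
	"how long to wait for a node to finish draining before giving up",
)

// getNodeShutdownHelper takes the timeout as a parameter instead of hardcoding 5 minutes.
func getNodeShutdownHelper(ctx context.Context, client kubernetes.Interface, timeout time.Duration) drain.Helper {
	return drain.Helper{
		Ctx:                             ctx,
		Client:                          client,
		Force:                           true,
		GracePeriodSeconds:              60,
		IgnoreAllDaemonSets:             true,
		Timeout:                         timeout, // configurable instead of a fixed 5 * time.Minute
		DeleteEmptyDirData:              true,
		DisableEviction:                 false,
		SkipWaitForDeleteTimeoutSeconds: 15,
		Out:                             os.Stdout,
		ErrOut:                          os.Stderr,
	}
}

After flag.Parse() runs in main, the helper would be called as getNodeShutdownHelper(ctx, client, *nodeDrainTimeout).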
@T-Kukawka T-Kukawka added team/phoenix Team Phoenix provider/cluster-api-aws Cluster API based running on AWS capi/migration and removed provider/cluster-api-aws Cluster API based running on AWS labels Jul 25, 2024
@mnitchev mnitchev self-assigned this Jul 29, 2024
mnitchev (Member) commented

Tested it out with this app: https://github.com/mnitchev/delayed-start-app. Seems to do what we want.
