
Increase or make timeout on node drainage configurable in capi-migration-cli #3602

Closed
T-Kukawka opened this issue Jul 25, 2024 · 1 comment

T-Kukawka (Contributor) commented Jul 25, 2024

User Story

During migration, after the new worker nodes have been created on the CAPA workload cluster (WC), we drain the old Vintage WC nodes.

Currently the drain timeout is set to 5 minutes. This is too short for some customer workloads, e.g. Java applications that need a long time to start: once the timeout expires, PDBs are no longer respected, which can bring downtime to these 'slow' workloads.

Here is the snippet with the configuration from the tool itself.

// Imports needed by this snippet (the package declaration is omitted, as in the original excerpt):
import (
	"context"
	"os"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/kubectl/pkg/drain"
)

func getNodeShutdownHelper(ctx context.Context, client kubernetes.Interface) drain.Helper {
	return drain.Helper{
		Ctx:                             ctx,             // pass the current context
		Client:                          client,          // the k8s client for making the API calls
		Force:                           true,            // force the draining
		GracePeriodSeconds:              60,              // give each pod 60 seconds to terminate gracefully before it is deleted
		IgnoreAllDaemonSets:             true,            // ignore the daemonsets
		Timeout:                         5 * time.Minute, // give up draining after 5 minutes
		DeleteEmptyDirData:              true,            // delete all the emptyDir volumes
		DisableEviction:                 false,           // we want to evict, not delete (might be different for the master nodes)
		SkipWaitForDeleteTimeoutSeconds: 15,              // if a node is NotReady the pods won't be deleted, so don't wait too long
		Out:                             os.Stdout,
		ErrOut:                          os.Stderr,
	}
}

Ideally the timeout would be configurable via a flag passed when triggering the migration. This would allow us to speed up migrations based on the customer's setup. A sensible default, if the flag is not set, would be 15 minutes (see the sketch under Implementation details below).

If it is not possible to make it configurable, let's set the timeout to 30 minutes to ensure all workloads have time to move and start up.
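
For reference, the hardcoded fallback would be a one-line change to the snippet above, everything else staying as-is:

		Timeout: 30 * time.Minute, // raised from 5 minutes to give slow workloads time to reschedule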

Acceptance Criteria

  • Adjust the migration CLI tool to support the longer (ideally configurable) timeout for node drainage

Implementation details

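The hardcoded value lives in getNodeShutdownHelper, shown in the snippet above. A minimal sketch of one way to make it configurable, assuming the standard library flag package is used for argument parsing; the flag name --node-drain-timeout, the 15 minute default, and the extra timeout parameter are illustrative assumptions, not the tool's actual API:

import (
	"context"
	"flag"
	"os"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/kubectl/pkg/drain"
)

// Hypothetical flag; the real capi-migration-cli may use a different CLI framework.
var nodeDrainTimeout = flag.Duration(
	"node-drain-timeout",
	15*time.Minute, // proposed default when the flag is not set
	"how long to wait for a node to finish draining before giving up",
)

// getNodeShutdownHelper takes the timeout as a parameter instead of hardcoding 5 minutes.
func getNodeShutdownHelper(ctx context.Context, client kubernetes.Interface, timeout time.Duration) drain.Helper {
	return drain.Helper{
		Ctx:                             ctx,
		Client:                          client,
		Force:                           true,
		GracePeriodSeconds:              60,
		IgnoreAllDaemonSets:             true,
		Timeout:                         timeout, // configurable instead of a fixed 5 * time.Minute
		DeleteEmptyDirData:              true,
		DisableEviction:                 false,
		SkipWaitForDeleteTimeoutSeconds: 15,
		Out:                             os.Stdout,
		ErrOut:                          os.Stderr,
	}
}

After flag.Parse() runs in main, the helper would be called as getNodeShutdownHelper(ctx, client, *nodeDrainTimeout).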
@T-Kukawka T-Kukawka added team/phoenix Team Phoenix provider/cluster-api-aws Cluster API based running on AWS capi/migration and removed provider/cluster-api-aws Cluster API based running on AWS labels Jul 25, 2024
@mnitchev mnitchev self-assigned this Jul 29, 2024
mnitchev (Member) commented

Tested it out with this app: https://github.com/mnitchev/delayed-start-app. Seems to do what we want.
