Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Eviction Timeout Field for Maintenance Time #122

Closed
wants to merge 4 commits into from

Conversation

razo7
Copy link
Member

@razo7 razo7 commented Feb 28, 2024

What this PR does / why we need it:

Adds a new Spec field for the pod's eviction timeout. Configurable pod's eviction drain/deletion timeout for the user, while previously it was set to 30s (the default value).

Changes made

  • Updating API with the new field
  • Using the new timeout field in the createDrainer function in reconcile and UTs.
  • Update nm sample
  • Update Readme

Which issue(s) this PR fixes:

ECOPROJECT-1668

Test plan

Add e2e test for different evicitonTimeout values - high (50s) vs. low (10s)

Configurable drain/deletion timeout for the user. ATM it was set to 30s
Copy link
Contributor

openshift-ci bot commented Feb 28, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Contributor

openshift-ci bot commented Feb 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: razo7

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@razo7
Copy link
Member Author

razo7 commented Feb 28, 2024

/test 4.15-openshift-e2e

@@ -229,7 +229,7 @@ func (r *NodeMaintenanceReconciler) Reconcile(ctx context.Context, req ctrl.Requ
}

// createDrainer creates a drain.Helper struct for external cordon and drain API
func createDrainer(ctx context.Context, mgrConfig *rest.Config) (*drain.Helper, error) {
func createDrainer(ctx context.Context, evicitonTimeout time.Duration, mgrConfig *rest.Config) (*drain.Helper, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo evicition

//The length of time to wait before giving up, zero means infinite
drainer.Timeout = DrainerTimeout
drainer.Timeout = evicitonTimeout
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's going on with the old const now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant and can be removed. The old value (30s) is the default value of the evicitonTimeout field.

Copy link
Contributor

@clobrano clobrano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Just as reminder for this or next change, since it affects old code, but there are some occurrences of "maintanance", which should be corrected into "maintenance".

razo7 added 2 commits March 4, 2024 18:49
s/eviciton/evicition, s/maintanance/maintenance. Redundant space and const
Using 50s evicitonTimeout to demonstrate a normal maintenance that won't reach eviction timeout and 10s evicitonTimeout for maintenance that will reach the eviction timeout
@razo7
Copy link
Member Author

razo7 commented Mar 4, 2024

/test 4.15-openshift-e2e
There are more (possible) changes for the e2e testing, but it can be done afterward and mostly when there are maintenance events.

@razo7
Copy link
Member Author

razo7 commented Mar 5, 2024

/test 4.15-openshift-e2e

Copy link
Contributor

openshift-ci bot commented Mar 5, 2024

@razo7: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/4.15-openshift-e2e 410b51c link true /test 4.15-openshift-e2e

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@@ -58,7 +58,7 @@ var _ = Describe("Starting Maintenance", func() {
if controPlaneMaintenance == nil {
// do this once only
controlPlaneNode = controlPlaneNodes[0]
controPlaneMaintenance = getNodeMaintenance(fmt.Sprintf("test-1st-control-plane-%s", controlPlaneNode), controlPlaneNode)
controPlaneMaintenance = getNodeMaintenance(fmt.Sprintf("test-1st-control-plane-%s", controlPlaneNode), controlPlaneNode, evicitonTimeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo (multiple occurrences) evicitonTimeout -> evictionTimeout

DeferCleanup(Client.Delete, context.TODO(), nodeMaintenance)
})

When("evction timeout is low", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: evction -> eviction

// check operator log showing it reconciled with fixed duration because of drain timeout
// it should be caused by the test deployment's termination graceperiod > drain timeout
Expect(getOperatorLogs()).To(ContainSubstring(nodemaintenance.FixedDurationReconcileLog))
Expect(getOperatorLogs()).NotTo(ContainSubstring(nodemaintenance.FixedDurationReconcileLog))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment doesn't match

By("verify printed log")
// check operator log showing it reconciled with fixed duration because of drain timeout
// it should be caused by the test deployment's termination graceperiod > drain timeout
Expect(getOperatorLogs()).To(ContainSubstring(nodemaintenance.FixedDurationReconcileLog))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • comment doesn't match because you removed the termination graceperiod
  • how do you ensure the pod isn't evicted within the drain timeout without termination graceperiod?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment doesn't match because you removed the termination graceperiod

There is a default value of 30 seconds for the graceperiod when it isn't set

@razo7
Copy link
Member Author

razo7 commented Mar 7, 2024

Closing this PR as its goal is minor, it introduces a possible longer reconciliation (while we strive for a short reconcile), an API change, and the gain doesn't worth the effort

@razo7 razo7 closed this Mar 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants