Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Events for the Maintenance Process #113

Merged
merged 5 commits into from
Mar 24, 2024

Conversation

razo7
Copy link
Member

@razo7 razo7 commented Jan 25, 2024

  • Add maintenance events and functions for verifying events' existence.
  • Fail Maintenance on Missing Node
  • Use events for verification of the maintenance flow in e2e

ECOPROJECT-1745
ECOPROJECT-1898

Copy link
Contributor

openshift-ci bot commented Jan 25, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Contributor

openshift-ci bot commented Jan 25, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: razo7

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@razo7 razo7 force-pushed the events-maintenance branch from 5514278 to 673b6dc Compare January 25, 2024 16:17
@razo7 razo7 force-pushed the events-maintenance branch from 673b6dc to add8cb5 Compare January 28, 2024 17:25
@razo7 razo7 changed the title Add Events Add Events for the Maintenance Process Jan 28, 2024
Copy link
Contributor

@clobrano clobrano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some minor comments.
Moreover, could you rephrase the commit message regarding the test refactoring? It is a huge change and the explaination is not clear enough IMHO

controllers/nodemaintenance_controller.go Outdated Show resolved Hide resolved
controllers/nodemaintenance_controller.go Outdated Show resolved Hide resolved
@razo7 razo7 force-pushed the events-maintenance branch 2 times, most recently from d826ba1 to 6ec0366 Compare January 29, 2024 14:58
@razo7
Copy link
Member Author

razo7 commented Jan 30, 2024

/test 4.15-openshift-e2e

@razo7
Copy link
Member Author

razo7 commented Feb 7, 2024

/hold
The PR would be better reviewed in smaller chunks, so I am moving the refactoring of unit-test and reconciling to different PRs, and keeping the current one for adding the events (needs the unit-test refactoring).

The events will help to identify the maintenance process by looking on nm CR and node events.
@razo7
Copy link
Member Author

razo7 commented Feb 28, 2024

After rebasing it is ready for review

Check if nodeName represent an existing node in the cluster. If it doesn't fail the maintenance process and emit an event
@razo7 razo7 force-pushed the events-maintenance branch from 8f4f06a to 249a327 Compare February 28, 2024 15:59
@razo7
Copy link
Member Author

razo7 commented Feb 28, 2024

/test 4.15-openshift-e2e

@razo7 razo7 force-pushed the events-maintenance branch from cc79551 to 813a86a Compare March 13, 2024 15:35
@razo7
Copy link
Member Author

razo7 commented Mar 13, 2024

/test 4.15-openshift-e2e

@clobrano
Copy link
Contributor

Why are we not using common api for events?

@razo7
Copy link
Member Author

razo7 commented Mar 13, 2024

Why are we not using common api for events?

Because it is kind of meant for remediation events (include [remediation] in the message), and here is for maintenance events.
We could modify that in common, but it will require some adaptations in the remediator repos afterward. I didn't think it was worth the effort but if you disagree then I would recommend first generalizing the events common API.

@clobrano
Copy link
Contributor

(include [remediation] in the message)

I understand your point. The idea of the [remediation] tag was to allow filter and collect all the messages from our operators (it was [medik8s] at the beginning), but I understand that can cause misunderstanding now.
We might need to reflect on a better tag 🤔

@razo7
Copy link
Member Author

razo7 commented Mar 13, 2024

We might need to reflect on a better tag 🤔

I agree. Something for a follow up :)

@slintes
Copy link
Member

slintes commented Mar 13, 2024

We might need to reflect on a better tag 🤔

I agree. Something for a follow up :)

I don't think so. The idea was to filter remediation events easily indeed, not medik8s

@clobrano
Copy link
Contributor

I don't think so. The idea was to filter remediation events easily indeed, not medik8s

Then I misunderstood

@mshitrit
Copy link
Member

/lgtm
/hold
Giving a chance for others to add more comments.
Feel free to unhold.

Copy link
Contributor

@clobrano clobrano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small nit

controllers/nodemaintenance_controller.go Outdated Show resolved Hide resolved
@openshift-ci openshift-ci bot removed the lgtm label Mar 20, 2024
@razo7
Copy link
Member Author

razo7 commented Mar 20, 2024

/test 4.15-openshift-e2e

@razo7 razo7 force-pushed the events-maintenance branch from 1547cad to ffdafa7 Compare March 20, 2024 14:14
@razo7
Copy link
Member Author

razo7 commented Mar 20, 2024

/test 4.15-openshift-e2e

The error message is emitted from drainer.Client.CoreV1().Nodes().Get and fetchNode function, then it propagated to onReconcileError. Thus a better var name could be used
@razo7 razo7 force-pushed the events-maintenance branch from ffdafa7 to 4583f50 Compare March 20, 2024 14:50
@razo7
Copy link
Member Author

razo7 commented Mar 21, 2024

/test 4.15-openshift-e2e

@clobrano
Copy link
Contributor

/lgtm
Giving a chance to get-more/close-pending reviews
/hold

@openshift-ci openshift-ci bot added the lgtm label Mar 21, 2024
@razo7 razo7 marked this pull request as ready for review March 21, 2024 07:24
@openshift-ci openshift-ci bot requested review from clobrano and mshitrit March 21, 2024 07:24
@razo7
Copy link
Member Author

razo7 commented Mar 21, 2024

/retest

@razo7
Copy link
Member Author

razo7 commented Mar 24, 2024

No pending reviews or new comments, so I am upholding
/unhold

@openshift-merge-bot openshift-merge-bot bot merged commit 70c6a5a into medik8s:main Mar 24, 2024
14 checks passed
@razo7 razo7 mentioned this pull request Mar 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants