
NodeHealthCheck status is not updated when remediation CR is deleted by remediator #266

Open
aibarbetta opened this issue Nov 8, 2023 · 4 comments

Comments

@aibarbetta

Hi all, I'm using NHC with a custom remediator. In some cases my Kubernetes nodes are deleted, and, as the documentation says, my remediator then deletes the remediation Custom Resource. The issue is that the NHC resource still shows these old remediations in its phase, reason, and inFlightRemediations status fields:

    inFlightRemediations:
      yul1-r11-u14: "2023-11-07T21:53:04Z"
      yul1-r11-u15: "2023-11-07T02:49:42Z"
    observedNodes: 131
    phase: Remediating
    reason: NHC is remediating 2 nodes

This blocks all updates to, and deletion of, the NHC resource, because the validating webhook thinks a remediation is still in progress and responds with:

    admission webhook "vnodehealthcheck.kb.io" denied the request: selector update prohibited due to running remediation

Am I missing a configuration option that signals these deletions to NHC?

@slintes
Member

slintes commented Nov 8, 2023

Hi @aibarbetta, thanks for reaching out.

Unfortunately, that doc is outdated, sorry about that!
You no longer need to delete the CR yourself; instead, add a condition to it, similar to the Succeeded condition. The condition type should be PermanentNodeDeletionExpected and its status True. You might want to use the const from here: https://github.com/medik8s/common/blob/main/pkg/conditions/conditions.go#L11.

I hope this helps! I will update the doc soonish 🙂

@slintes
Member

slintes commented Nov 8, 2023

Just curious, what kind of remediator are you using? 🙂

@aibarbetta
Author

Thanks @slintes! An NHC upgrade and setting that condition fixed the issue :)

> Just curious, what kind of remediator are you using? 🙂

I needed something like self-node-remediation, but I run k8s on-premises with a custom storage provider, so the remediation requires some extra storage operations. I also need to send remediation notifications to Slack and JIRA, run some IPMI commands, etc., so I decided to use NHC integrated with a custom operator (written by me) that contains all that special remediation logic :)

@slintes
Member

slintes commented Nov 20, 2023

Glad that it worked 🙂
Interesting remediator 👍🏼 Is it open source? Maybe we should add a "3rd party remediators" section to our docs... 🤔
