gardener-node-agent might end in a crash-loop in case of breaking changes affecting its own configuration #11025

Open
oliver-goetz opened this issue Dec 11, 2024 · 1 comment
Assignees
Labels
area/robustness Robustness, reliability, resilience related kind/bug Bug triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@oliver-goetz
Member

How to categorize this issue?

/area robustness
/kind bug

What happened:
gardener-node-agent (GNA) updates both its own binary and its own configuration. The config changes are usually applied before the binary is updated because of the order in which they appear in the OperatingSystemConfig.

If a new gardener-node-agent version introduces a breaking configuration change (such as a new feature gate), it can end up in a crash-loop in the following scenario:

  1. GNA saves its new configuration to disk. This configuration includes the activation of a new feature gate (NodeAgentAuthorizer in the concrete case).
  2. Pulling the new GNA version fails, so the previous binary is still on disk.
  3. GNA is restarted.

In this case, the GNA configuration on disk already contains the new feature gate, but the old GNA binary does not recognize it and refuses to start. Manual intervention is required to recover.
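To illustrate the failure mode, here is a minimal, self-contained sketch of strict feature gate validation. It is not GNA's actual code: `knownGates`, `validateFeatureGates`, and the gate name `SomeOlderGate` are hypothetical. Only `NodeAgentAuthorizer` comes from this issue.

```go
package main

import (
	"fmt"
	"sort"
)

// knownGates is a hypothetical set of feature gates compiled into the OLD binary.
// NodeAgentAuthorizer is deliberately absent because only the new binary knows it.
var knownGates = map[string]bool{
	"SomeOlderGate": true, // illustrative name, not a real gate
}

// validateFeatureGates mimics strict validation: a gate the binary
// does not know is treated as an error instead of being ignored.
func validateFeatureGates(configured map[string]bool) error {
	names := make([]string, 0, len(configured))
	for name := range configured {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic error for multiple unknown gates
	for _, name := range names {
		if _, ok := knownGates[name]; !ok {
			return fmt.Errorf("unrecognized feature gate: %s", name)
		}
	}
	return nil
}

func main() {
	// Config written in step 1, read by the old binary after the restart in step 3.
	newConfig := map[string]bool{"NodeAgentAuthorizer": true}
	if err := validateFeatureGates(newConfig); err != nil {
		fmt.Println("old binary refuses to start:", err)
	}
}
```

Because the process exits on every start and the service manager restarts it, the result is a crash-loop until the config or the binary is fixed by hand.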

What you expected to happen:
gardener-node-agent should remain resilient in this update scenario.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
The issue could be solved by adding a version suffix to the GNA config files and letting GNA load only the config that matches its own version.

Environment:

  • Gardener version: v1.109.0
  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others:
@gardener-prow gardener-prow bot added area/robustness Robustness, reliability, resilience related kind/bug Bug labels Dec 11, 2024
@LucaBernstein
Member

/assign @timuthy
/triage accepted

@gardener-prow gardener-prow bot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Dec 12, 2024