Add cluster-wide in-place upgrade proposal #55

HomayoonAlimohammadi · 2024-09-11T12:46:36Z

This PR adds the cluster-wide in-place upgrade proposal, a follow-up to the original in-place upgrade proposal. It should be merged after that proposal.

berkayoz

Great work, and thank you for picking this up. In previous discussions we talked about the need for phases/steps in upgrades, so that control-plane nodes and worker nodes can be upgraded separately.

For this I believe we can utilize the CK8sControlPlane and MachineDeployment objects and check the annotations on these objects. This would mean creating 2 reconcilers. The phasing/step control would be left to the user: the user can choose to annotate the CK8sControlPlane first and the MachineDeployments later. What do you think?
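A minimal sketch of the per-object annotation check suggested above, using only the standard library. The annotation key, the `object` type, the `pendingUpgrades` helper, and the `channel=...` value format are all assumptions made for illustration; the proposal defines the real names, and a real reconciler would read the objects through the Kubernetes API rather than from a slice.

```go
package main

import "fmt"

// upgradeToKey is a hypothetical annotation key used only in this sketch.
const upgradeToKey = "example.com/in-place-upgrade-to"

// object is a stand-in for the metadata of a CK8sControlPlane or
// MachineDeployment; it is not a real API type.
type object struct {
	Kind        string
	Name        string
	Annotations map[string]string
}

// pendingUpgrades returns, for one kind, the objects that carry the upgrade
// annotation, as a map from object name to the requested upgrade option.
// Each of the two reconcilers would only look at its own kind.
func pendingUpgrades(objs []object, kind string) map[string]string {
	out := map[string]string{}
	for _, o := range objs {
		if o.Kind != kind {
			continue
		}
		if v, ok := o.Annotations[upgradeToKey]; ok && v != "" {
			out[o.Name] = v
		}
	}
	return out
}

func main() {
	// The user annotates only the control plane first; the workers stay untouched.
	objs := []object{
		{Kind: "CK8sControlPlane", Name: "cp", Annotations: map[string]string{upgradeToKey: "channel=1.31/stable"}},
		{Kind: "MachineDeployment", Name: "workers"},
	}
	fmt.Println(pendingUpgrades(objs, "CK8sControlPlane"))
	fmt.Println(pendingUpgrades(objs, "MachineDeployment"))
}
```

Because each reconciler reacts only to annotations on its own kind, the order of the upgrade follows from the order in which the user adds the annotations.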

HomayoonAlimohammadi · 2024-09-11T13:56:27Z

Thanks a lot for the suggestion @berkayoz!
So IIUC, you're suggesting that instead of applying the UpgradeTo annotation to the cluster object, we apply it to the CK8sControlPlane and MachineDeployment objects (exactly as we do for a rolling upgrade, where we change the version field on those manifests) so that the controlplane and bootstrap reconcilers can take care of this?
If I understood that correctly, I'm totally up for it. If we get another confirmation, maybe from @bschimke95, I'll go ahead and make the changes.

berkayoz · 2024-09-11T14:02:34Z

Yes exactly @HomayoonAlimohammadi, we would annotate those objects. We would need to update the existing controlplane reconciler and create a new MachineDeployment reconciler to act on these annotations.

bschimke95 · 2024-09-12T07:42:25Z

+1, this sounds reasonable. IIRC, there was also the idea to still support annotating the Cluster object, but this would basically just be a convenience: the annotation would be passed through to the CK8sControlPlane and MachineDeployments respectively.
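The pass-through idea could look roughly like the sketch below. The annotation key, the `object` type, and the `propagate` helper are hypothetical, and the rule that a child's own annotation wins over the Cluster's is an assumption of this sketch, not something decided in this thread.

```go
package main

import "fmt"

// upgradeToKey is a hypothetical annotation key used only in this sketch.
const upgradeToKey = "example.com/in-place-upgrade-to"

// object is a stand-in for Kubernetes object metadata, not a real API type.
type object struct {
	Name        string
	Annotations map[string]string
}

// propagate copies the upgrade annotation from the Cluster to every child
// (CK8sControlPlane, MachineDeployments) that does not set it already.
// It returns the names of the children it changed.
func propagate(cluster object, children []*object) []string {
	want, ok := cluster.Annotations[upgradeToKey]
	if !ok {
		return nil // nothing requested on the Cluster
	}
	var updated []string
	for _, c := range children {
		if _, set := c.Annotations[upgradeToKey]; set {
			continue // assumption: an explicit annotation on the child wins
		}
		if c.Annotations == nil {
			c.Annotations = map[string]string{}
		}
		c.Annotations[upgradeToKey] = want
		updated = append(updated, c.Name)
	}
	return updated
}

func main() {
	cluster := object{Name: "c1", Annotations: map[string]string{upgradeToKey: "channel=1.31/stable"}}
	cp := &object{Name: "cp"}
	md := &object{Name: "workers", Annotations: map[string]string{upgradeToKey: "channel=1.30/stable"}}
	fmt.Println(propagate(cluster, []*object{cp, md}))
	fmt.Println(cp.Annotations[upgradeToKey], md.Annotations[upgradeToKey])
}
```

With this shape the Cluster annotation stays a convenience: the two reconcilers still act only on the annotations of their own objects.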

docs/proposals/002-cluster-controller.md

bschimke95

LGTM, great work!
Congrats on your first proposal.

berkayoz

Amazing work! Thank you for picking this up.
Some points of consideration:

We will be getting/creating a list of machines in the reconcilers. We should make sure these are ordered, and we should skip machines whose release annotation is already set to our upgrade option, since that means the upgrade has been performed there (unless the option is a local path).
We should add events to the resources to indicate which machine is being worked on, which machine had a failed upgrade, etc.
If we are not going to retry the operation, we should also make sure to remove the upgrade-to annotation from the machine that failed, since upgrade retries are done in the machine reconciler.
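The ordering, skipping, and cleanup points above can be sketched as follows. Both annotation keys, the `localPath=` prefix, the `machine` type, and the helper names are assumptions for illustration only. Sorting by name is just one possible stable order, and event recording is left out.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Hypothetical annotation keys used only in this sketch.
const (
	upgradeToKey = "example.com/in-place-upgrade-to"
	releaseKey   = "example.com/in-place-upgrade-release"
)

// machine is a stand-in for a Machine's metadata, not a real API type.
type machine struct {
	Name        string
	Annotations map[string]string
}

// machinesToUpgrade returns the names of the machines that still need the
// upgrade, in a stable (sorted) order. A machine whose release annotation
// already equals the option is skipped, unless the option is a local path,
// in which case an equal value does not prove the upgrade was performed.
func machinesToUpgrade(machines []machine, option string) []string {
	isLocalPath := strings.HasPrefix(option, "localPath=") // assumed format
	var names []string
	for _, m := range machines {
		if !isLocalPath && m.Annotations[releaseKey] == option {
			continue
		}
		names = append(names, m.Name)
	}
	sort.Strings(names)
	return names
}

// clearFailedUpgrade removes the upgrade-to annotation from a failed machine
// so that the machine reconciler does not keep retrying the upgrade.
func clearFailedUpgrade(m *machine) {
	delete(m.Annotations, upgradeToKey)
}

func main() {
	option := "channel=1.31/stable"
	machines := []machine{
		{Name: "c"},
		{Name: "a"},
		{Name: "b", Annotations: map[string]string{releaseKey: option}},
	}
	fmt.Println(machinesToUpgrade(machines, option))

	failed := machine{Name: "a", Annotations: map[string]string{upgradeToKey: option}}
	clearFailedUpgrade(&failed)
	fmt.Println(len(failed.Annotations))
}
```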

Add cluster-wide in-place upgrade proposal

ef68d20

HomayoonAlimohammadi requested a review from a team as a code owner September 11, 2024 12:46

HomayoonAlimohammadi mentioned this pull request Sep 11, 2024

Add in-place upgrade proposal #30

Merged

berkayoz reviewed Sep 11, 2024

Update proposal with comments and suggestions

02a2414

bschimke95 reviewed Sep 13, 2024

Address comments

87cf751

bschimke95 approved these changes Sep 16, 2024

Mark as accepted

39ecc83

berkayoz approved these changes Sep 16, 2024

Add further implementation guidance

1a131d5

HomayoonAlimohammadi merged commit fe015b7 into main Sep 16, 2024
6 of 7 checks passed
