From 490b0ae882044de5348fe08ccdb9976dd0273638 Mon Sep 17 00:00:00 2001
From: Berkay Tekin Oz
Date: Mon, 29 Jul 2024 06:48:51 +0000
Subject: [PATCH 1/7] Add in-place upgrade proposal

---
 docs/proposals/000-template.md          | 137 +++++++++++++++
 docs/proposals/001-in-place-upgrades.md | 221 ++++++++++++++++++++++++
 2 files changed, 358 insertions(+)
 create mode 100644 docs/proposals/000-template.md
 create mode 100644 docs/proposals/001-in-place-upgrades.md

diff --git a/docs/proposals/000-template.md b/docs/proposals/000-template.md
new file mode 100644
index 00000000..07bf0969
--- /dev/null
+++ b/docs/proposals/000-template.md
@@ -0,0 +1,137 @@
+
+
+# Proposal information
+
+
+- **Index**: 000
+
+
+- **Status**:
+
+
+- **Name**: Feature name
+
+
+- **Owner**: FirstName LastName /
+
+# Proposal Details
+
+## Summary
+
+
+## Rationale
+
+
+## User facing changes
+
+
+none
+
+## Alternative solutions
+
+
+none
+
+## Out of scope
+
+
+none
+
+# Implementation Details
+
+## API Changes
+
+none
+
+## Bootstrap Provider Changes
+
+none
+
+## ControlPlane Provider Changes
+
+none
+
+## Configuration Changes
+
+none
+
+## Documentation Changes
+
+none
+
+## Testing
+
+
+## Considerations for backwards compatibility
+
+
+## Implementation notes and guidelines
+
diff --git a/docs/proposals/001-in-place-upgrades.md b/docs/proposals/001-in-place-upgrades.md
new file mode 100644
index 00000000..38b6ee9b
--- /dev/null
+++ b/docs/proposals/001-in-place-upgrades.md
@@ -0,0 +1,221 @@
+
+
+# Proposal information
+
+
+- **Index**: 001
+
+
+- **Status**: **DRAFTING**
+
+
+- **Name**: ClusterAPI In-Place Upgrades
+
+
+- **Owner**: Berkay Tekin Oz [@berkayoz](https://github.com/berkayoz)
+
+# Proposal Details
+
+## Summary
+
+
+Canonical Kubernetes CAPI providers should reconcile workload clusters and perform in-place upgrades based on the metadata in the cluster manifest.
+
+This can be used in environments where rolling upgrades are not a viable option such as edge deployments.
+
+## Rationale
+
+
+The current Cluster API implementation does not provide a way of updating machines in-place and instead follows a rolling upgrade strategy.
+
+This means that a version upgrade would trigger a rolling upgrade, which is the process of creating new machines with desired configuration and removing older ones. This strategy is acceptable in most-cases for clusters that are provisioned on public or private clouds where having extra resources are not a concern.
+
+However this strategy is not viable for smaller bare-metal or edge deployments where resources are limited. This makes Cluster API not suitable out of the box for most of the use cases in industries like telco.
+
+We can enable the use of Cluster API in these use-cases by updating our providers to perform in-place upgrades.
+
+
+## User facing changes
+
+
+Users will be able to perform in-place upgrades on a per-machine basis by running:
+```sh
+kubectl annotate machine <machine-name> k8sd.io/in-place-upgrade={upgrade-option}
+```
+
+Users can also perform in-place upgrades on the entire cluster by running:
+```sh
+kubectl annotate cluster <cluster-name> k8sd.io/in-place-upgrade={upgrade-option}
+```
+This would upgrade machines belonging to `<cluster-name>` one by one.
+
+`{upgrade-option}` can be one of:
+* `channel=<channel>` which would refresh the machine to the provided channel e.g. `channel=1.31-classic/stable`
+* `revision=<revision>` which would refresh the machine to the provided revision e.g. `revision=640`
+* `localPath=<absolute-path>` which would refresh the machine to the provided local `*.snap` file e.g.
`localPath=/path/to/k8s.snap`
+
+## Alternative solutions
+
+
+We could alternatively use the `version` fields defined in `ControlPlane` and `MachineDeployment` manifests instead of annotations which could be a better/more native user experience.
+
+However at the time of writing CAPI does not have support for changing upgrade strategies which means changes to the `version` fields trigger a rolling update.
+
+This behaviour can be adjusted on `ControlPlane` objects as our provider has more/full control but cannot be easily adjusted on `MachineDeployment` objects which causes issues for worker nodes.
+
+Switching to using the `version` field should take place when upstream implements support for different upgrade strategies.
+
+## Out of scope
+
+
+none
+
+# Implementation Details
+
+## API Changes
+
+### `POST /x/capi/snap-refresh`
+
+```go
+type SnapRefreshRequest struct {
+	// Channel is the channel to refresh the snap to.
+	Channel string
+	// Revision is the revision number to refresh the snap to.
+	Revision string
+	// LocalPath is the local path to use to refresh the snap.
+	LocalPath string
+}
+```
+
+`POST /x/capi/snap-refresh` performs the in-place upgrade with the given options. The upgrade can be either done with a `Channel`, `Revision` or a local `*.snap` file provided via `LocalPath`.
+
+This endpoint should use `ValidateCAPIAuthTokenAccessHandler("capi-auth-token")` for authentication.
+
+## Bootstrap Provider Changes
+
+
+A machine controller called `MachineReconciler` is added which would perform the in-place upgrade if `k8sd.io/in-place-upgrade` annotation is set on the machine.
+
+The controller would use the value of this annotation to make an endpoint call to the `/x/capi/snap-refresh` through `k8sd-proxy`.
+
+The result of this operation will be communicated back to the user via the `k8sd.io/in-place-upgrade-status` annotation. Values being:
+
+* `done` for a successful upgrade
+* `failed` for a failed upgrade
+
+A failed upgrade could be re-triggered by removing the `k8sd.io/in-place-upgrade` annotation and re-adding it to the machine.
+
+The `ck8sconfig_controller` should check for the `k8sd.io/in-place-upgrade` annotation both on the `Machine` and on the owner `Cluster` object. The value of one of these annotations should be used instead of the `version` field while generating a cloud-init script for a new machine. The annotation on the `Machine` object should take precedence.
+This would prevent adding nodes with an outdated version and possibly breaking the cluster due to a version mismatch.
+
+## ControlPlane Provider Changes
+
+A cluster controller called `ClusterReconciler` is added which would perform the one-by-one in-place upgrade of the entire workload cluster.
+
+The controller would propagate the `k8sd.io/in-place-upgrade` annotation on the `Cluster` object by adding this annotation one-by-one to all the machines that is owned by this cluster.
+
+A Kubernetes API call listing the objects of type `Machine` and filtering with `ownerRef` would produce the list of machines owned by the cluster. The controller then would iterate over this list, annotating machines and waiting for the operation to complete on each iteration.
+
+
+## Configuration Changes
+
+none
+
+## Documentation Changes
+
+`How-To` page on performing in-place upgrades should be created.
+
+`Reference` page listing the annotations and possible values should be created/updated.
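To make the annotation-driven flow described under "Bootstrap Provider Changes" concrete, the following is a minimal sketch of what the `MachineReconciler` loop could look like. It assumes controller-runtime and uses a hypothetical `SnapRefresher` interface as a stand-in for the k8sd-proxy call; the names and wiring are illustrative, not the provider's actual code.

```go
package controllers

import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const (
	upgradeAnnotation       = "k8sd.io/in-place-upgrade"
	upgradeStatusAnnotation = "k8sd.io/in-place-upgrade-status"
)

// SnapRefresher abstracts the call to /x/capi/snap-refresh via k8sd-proxy.
// Hypothetical interface: the real transport is not specified here.
type SnapRefresher interface {
	SnapRefresh(ctx context.Context, machine *clusterv1.Machine, option string) error
}

// MachineReconciler performs an in-place upgrade when the upgrade
// annotation is present on a Machine.
type MachineReconciler struct {
	client.Client
	Refresher SnapRefresher
}

func (r *MachineReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	machine := &clusterv1.Machine{}
	if err := r.Get(ctx, req.NamespacedName, machine); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	option, ok := machine.Annotations[upgradeAnnotation]
	if !ok {
		// No upgrade requested for this machine.
		return ctrl.Result{}, nil
	}

	// Trigger the refresh and record the outcome as "done" or "failed".
	status := "done"
	if err := r.Refresher.SnapRefresh(ctx, machine, option); err != nil {
		status = "failed"
	}

	// Report the result back to the user on the Machine object.
	patch := client.MergeFrom(machine.DeepCopy())
	machine.Annotations[upgradeStatusAnnotation] = status
	if err := r.Patch(ctx, machine, patch); err != nil {
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}

func (r *MachineReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&clusterv1.Machine{}).
		Complete(r)
}
```

Patching only the status annotation keeps the loop idempotent: re-running it is a no-op until the user removes and re-adds the upgrade annotation, matching the re-trigger behaviour described above.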
+ +## Testing + +The new feature can be tested manually by applying an annotation on the machine/node, waiting for the process to finish by checking for the `k8sd.io/in-place-upgrade-status` annotation and then checking for the version of the node through the Kubernetes API e.g. `kubectl get node`. A timeout should be set for waiting on the upgrade process. + +The tests can be integrated into the CI with the CAPD infrastructure provider. + +The upgrade should be performed with the `localPath` option. Under Pebble the process would replace the `kubernetes` binary with the binary provided in the annotation value. + +This means a docker image containing 2 versions should be created. The different/new version of the `kubernetes` binary would also be built and put into a path. + + +## Considerations for backwards compatibility + + +## Implementation notes and guidelines + + +The annotation method is chosen due to the "immutable infrastructure" assumption CAPI currently has. Which means updates are always done by creating new machines and fields are immutable. This might also pose some challenges on displaying accurate Kubernetes version information through CAPI. + +We should be aware of the [metadata propagation](https://cluster-api.sigs.k8s.io/developer/architecture/controllers/metadata-propagation) performed by the upstream controllers. Some meetadata is propagated in-place, which can ultimately propagate all the way down to the `Machine` objects. This could potentially flood the cluster with upgrades if machines get annotated at the same time. The cluster wide upgrade is handled through the annotation on the actual Cluster object due to this reason. + +Updating the `version` field would trigger rolling updates by default, with the only difference than upstream being the precedence of the version value provided in the annotations. + +The in-place upgrades only address the upgrades of Canonical Kubernetes and it's respective dependencies. Which means changes on the OS front/image would not be handled since the underlying machine image stays the same. This would be handled by a rolling upgrade as usual. From e67420c697bdea00f9bfcd656b3b2924013abe90 Mon Sep 17 00:00:00 2001 From: Berkay Tekin Oz Date: Mon, 29 Jul 2024 12:20:05 +0000 Subject: [PATCH 2/7] Update proposal with new annotations --- docs/proposals/001-in-place-upgrades.md | 64 +++++++++++++++++++------ 1 file changed, 50 insertions(+), 14 deletions(-) diff --git a/docs/proposals/001-in-place-upgrades.md b/docs/proposals/001-in-place-upgrades.md index 38b6ee9b..0ca617c1 100644 --- a/docs/proposals/001-in-place-upgrades.md +++ b/docs/proposals/001-in-place-upgrades.md @@ -27,7 +27,7 @@ it is attempting to solve. Canonical Kubernetes CAPI providers should reconcile workload clusters and perform in-place upgrades based on the metadata in the cluster manifest. -This can be used in environments where rolling upgrades are not a viable option such as edge deployments. +This can be used in environments where rolling upgrades are not a viable option such as edge deployments and non-HA clusters. ## Rationale -none +The in-place upgrades only address the upgrades of Canonical Kubernetes and it's respective dependencies. Which means changes on the OS front/image would not be handled since the underlying machine image stays the same. This would be handled by a rolling upgrade as usual. 
# Implementation Details @@ -124,7 +124,9 @@ type SnapRefreshRequest struct { } ``` -`POST /x/capi/snap-refresh` performs the in-place upgrade with the given options. The upgrade can be either done with a `Channel`, `Revision` or a local `*.snap` file provided via `LocalPath`. +`POST /x/capi/snap-refresh` performs the in-place upgrade with the given options. + +The upgrade can be either done with a `Channel`, `Revision` or a local `*.snap` file provided via `LocalPath`. The value of `LocalPath` should be an absolute path. This endpoint should use `ValidateCAPIAuthTokenAccessHandler("capi-auth-token")` for authentication. @@ -133,18 +135,49 @@ This endpoint should use `ValidateCAPIAuthTokenAccessHandler("capi-auth-token")` This section MUST mention any changes to the bootstrap provider. --> -A machine controller called `MachineReconciler` is added which would perform the in-place upgrade if `k8sd.io/in-place-upgrade` annotation is set on the machine. +A machine controller called `MachineReconciler` is added which would perform the in-place upgrade if `k8sd.io/in-place-upgrade-to` annotation is set on the machine. The controller would use the value of this annotation to make an endpoint call to the `/x/capi/snap-refresh` through `k8sd-proxy`. The result of this operation will be communicated back to the user via the `k8sd.io/in-place-upgrade-status` annotation. Values being: +* `in-progress` for an upgrade currently in progress * `done` for a successful upgrade * `failed` for a failed upgrade -A failed upgrade could be re-triggered by removing the `k8sd.io/in-place-upgrade` annotation and re-adding it to the machine. +After an upgrade process begins: +* `k8sd.io/in-place-upgrade-status` annotation on the `Machine` would be added/updated with `in-progress` + +After a successfull upgrade: +* `k8sd.io/in-place-upgrade-to` annotation on the `Machine` would be removed +* `k8sd.io/in-place-upgrade-current` annotation on the `Machine` would be added/updated with the used `{upgrade-option}`. +* `k8sd.io/in-place-upgrade-status` annotation on the `Machine` would be added/updated with `done` + +After a failed upgrade: +* `k8sd.io/in-place-upgrade-failure` annotation on the `Machine` would be added/updated with the failure message +* `k8sd.io/in-place-upgrade-status` annotation on the `Machine` would be added/updated with `failed` + +The reconciler should ignore the upgrade if `k8sd.io/in-place-upgrade-status` is already set to `in-progress` on the machine. + +#### Changes for Rolling Upgrades and Creating New Machines +In case of a rolling upgrade or when creating new machines the `CK8sConfigReconciler` should check for the `k8sd.io/in-place-upgrade-current` annotation both on the `Machine` and on the owner `Cluster` object. + +The value of one of these annotations should be used instead of the `version` field while generating a cloud-init script for a machine. The precedence of version fields are: +1. Annotation on the `Machine` +2. Annotation on the `Cluster` +3. The `version` field + +Which means the value from the annotation on the `Machine` would be used first if found. + +Using an annotation value requires changing the `install.sh` file to perform the relevant snap operation based on the option. 
+* `snap install k8s --classic --channel <channel>` for `Channel`
+* `snap install k8s --classic --revision <revision>` for `Revision`
+* `snap install <path-to-snap> --classic --dangerous --name k8s` for `LocalPath`
+
+When a rolling upgrade is triggered the `LocalPath` option requires the newly created machine to contain the local `*.snap` file. This usually means the machine image used by the infrastructure provider should be updated to contain this image. This file could possibly be sideloaded in the cloud-init script before installation.
+
+This operation should not be performed if `install.sh` script is overridden by the user in the manifests.

-The `ck8sconfig_controller` should check for the `k8sd.io/in-place-upgrade` annotation both on the `Machine` and on the owner `Cluster` object. The value of one of these annotations should be used instead of the `version` field while generating a cloud-init script for a new machine. The annotation on the `Machine` object should take precedence.
 This would prevent adding nodes with an outdated version and possibly breaking the cluster due to a version mismatch.

## ControlPlane Provider Changes
@@ -153,10 +186,15 @@ This section MUST mention any changes to the controlplane provider.
 A cluster controller called `ClusterReconciler` is added which would perform the one-by-one in-place upgrade of the entire workload cluster.

-The controller would propagate the `k8sd.io/in-place-upgrade` annotation on the `Cluster` object by adding this annotation one-by-one to all the machines that is owned by this cluster.
+The controller would propagate the `k8sd.io/in-place-upgrade-to` annotation on the `Cluster` object by adding this annotation one-by-one to all the machines that is owned by this cluster.

 A Kubernetes API call listing the objects of type `Machine` and filtering with `ownerRef` would produce the list of machines owned by the cluster. The controller then would iterate over this list, annotating machines and waiting for the operation to complete on each iteration.

+The reconciler should ignore a machine if `k8sd.io/in-place-upgrade-status` is already set to `in-progress`.
+
+Once upgrades of the underlying machines are finished:
+* `k8sd.io/in-place-upgrade-to` annotation on the `Cluster` would be removed
+* `k8sd.io/in-place-upgrade-current` annotation on the `Cluster` would be added/updated with the used `{upgrade-option}`.

 ## Configuration Changes

 The new feature can be tested manually by applying an annotation on the machine/node, waiting for the process to finish by checking for the `k8sd.io/in-place-upgrade-status` annotation and then checking for the version of the node through the Kubernetes API e.g. `kubectl get node`. A timeout should be set for waiting on the upgrade process.

-The tests can be integrated into the CI with the CAPD infrastructure provider.
+The tests can be integrated into the CI the same way with the CAPD infrastructure provider.

 The upgrade should be performed with the `localPath` option. Under Pebble the process would replace the `kubernetes` binary with the binary provided in the annotation value.

@@ -214,8 +252,6 @@ implements it.

 The annotation method is chosen due to the "immutable infrastructure" assumption CAPI currently has. Which means updates are always done by creating new machines and fields are immutable. This might also pose some challenges on displaying accurate Kubernetes version information through CAPI.
-We should be aware of the [metadata propagation](https://cluster-api.sigs.k8s.io/developer/architecture/controllers/metadata-propagation) performed by the upstream controllers. Some meetadata is propagated in-place, which can ultimately propagate all the way down to the `Machine` objects. This could potentially flood the cluster with upgrades if machines get annotated at the same time. The cluster wide upgrade is handled through the annotation on the actual Cluster object due to this reason. +We should be aware of the [metadata propagation](https://cluster-api.sigs.k8s.io/developer/architecture/controllers/metadata-propagation) performed by the upstream controllers. Some metadata is propagated in-place, which can ultimately propagate all the way down to the `Machine` objects. This could potentially flood the cluster with upgrades if machines get annotated at the same time. The cluster wide upgrade is handled through the annotation on the actual Cluster object due to this reason. Updating the `version` field would trigger rolling updates by default, with the only difference than upstream being the precedence of the version value provided in the annotations. - -The in-place upgrades only address the upgrades of Canonical Kubernetes and it's respective dependencies. Which means changes on the OS front/image would not be handled since the underlying machine image stays the same. This would be handled by a rolling upgrade as usual. From aab94ba3c60c400228914b5de11a5ba543c471d5 Mon Sep 17 00:00:00 2001 From: Berkay Tekin Oz Date: Wed, 31 Jul 2024 10:39:15 +0000 Subject: [PATCH 3/7] Update annotations, add more clarification --- docs/proposals/001-in-place-upgrades.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/docs/proposals/001-in-place-upgrades.md b/docs/proposals/001-in-place-upgrades.md index 0ca617c1..45c92f48 100644 --- a/docs/proposals/001-in-place-upgrades.md +++ b/docs/proposals/001-in-place-upgrades.md @@ -150,17 +150,18 @@ After an upgrade process begins: After a successfull upgrade: * `k8sd.io/in-place-upgrade-to` annotation on the `Machine` would be removed -* `k8sd.io/in-place-upgrade-current` annotation on the `Machine` would be added/updated with the used `{upgrade-option}`. +* `k8sd.io/in-place-upgrade-release` annotation on the `Machine` would be added/updated with the used `{upgrade-option}`. +* `k8sd.io/in-place-upgrade-status-message` annotation on the `Machine` would be added/updated with the success message * `k8sd.io/in-place-upgrade-status` annotation on the `Machine` would be added/updated with `done` After a failed upgrade: -* `k8sd.io/in-place-upgrade-failure` annotation on the `Machine` would be added/updated with the failure message +* `k8sd.io/in-place-upgrade-status-message` annotation on the `Machine` would be added/updated with the failure message * `k8sd.io/in-place-upgrade-status` annotation on the `Machine` would be added/updated with `failed` The reconciler should ignore the upgrade if `k8sd.io/in-place-upgrade-status` is already set to `in-progress` on the machine. -#### Changes for Rolling Upgrades and Creating New Machines -In case of a rolling upgrade or when creating new machines the `CK8sConfigReconciler` should check for the `k8sd.io/in-place-upgrade-current` annotation both on the `Machine` and on the owner `Cluster` object. 
+#### Changes for Rolling Upgrades, Scaling Up and Creating New Machines +In case of a rolling upgrade or when creating new machines the `CK8sConfigReconciler` should check for the `k8sd.io/in-place-upgrade-release` annotation both on the `Machine` and on the owner `Cluster` object. The value of one of these annotations should be used instead of the `version` field while generating a cloud-init script for a machine. The precedence of version fields are: 1. Annotation on the `Machine` @@ -194,7 +195,7 @@ The reconciler should ignore a machine if `k8sd.io/in-place-upgrade-status` is a Once upgrades of the underlying machines are finished: * `k8sd.io/in-place-upgrade-to` annotation on the `Cluster` would be removed -* `k8sd.io/in-place-upgrade-current` annotation on the `Cluster` would be added/updated with the used `{upgrade-option}`. +* `k8sd.io/in-place-upgrade-release` annotation on the `Cluster` would be added/updated with the used `{upgrade-option}`. ## Configuration Changes +### Cluster-wide Orchestration +A cluster controller called `ClusterReconciler` is added which would perform the one-by-one in-place upgrade of the entire workload cluster. + +The controller would propagate the `k8sd.io/in-place-upgrade-to` annotation on the `Cluster` object by adding this annotation one-by-one to all the machines that is owned by this cluster. + +The reconciler would perform upgrades in 2 separate phases for control-plane and worker machines. + +A Kubernetes API call listing the objects of type `Machine` and filtering with `ownerRef` would produce the list of machines owned by the cluster. For each phase controller would iterate over this list filtering by the machine type, annotating the machines and waiting for the operation to complete on each iteration. + +The reconciler should ignore a machine if `k8sd.io/in-place-upgrade-status` is already set to `in-progress`. + +Once upgrades of the underlying machines are finished: +* `k8sd.io/in-place-upgrade-to` annotation on the `Cluster` would be removed +* `k8sd.io/in-place-upgrade-release` annotation on the `Cluster` would be added/updated with the used `{upgrade-option}`. + +This will be introduced and explained more extensively in another proposal. + +### Upgrades of Underlying OS and Dependencies The in-place upgrades only address the upgrades of Canonical Kubernetes and it's respective dependencies. Which means changes on the OS front/image would not be handled since the underlying machine image stays the same. This would be handled by a rolling upgrade as usual. # Implementation Details @@ -128,7 +146,9 @@ type SnapRefreshRequest struct { The upgrade can be either done with a `Channel`, `Revision` or a local `*.snap` file provided via `LocalPath`. The value of `LocalPath` should be an absolute path. -This endpoint should use `ValidateCAPIAuthTokenAccessHandler("capi-auth-token")` for authentication. +A refresh token per node will be generated at bootstrap time, which gets seeded into to the node under the `/capi/etc/refresh-token` file. The generated token will be stored on the management cluster in the `$clustername-token` secret, with keys formatted as `refresh-token::$nodename`. + +This endpoint will use `ValidateCAPIRefreshTokenAccessHandler("capi-refresh-token")` to check the `capi-refresh-token` header to match against the token in the `/capi/etc/refresh-token` file. ## Bootstrap Provider Changes -A cluster controller called `ClusterReconciler` is added which would perform the one-by-one in-place upgrade of the entire workload cluster. 
- -The controller would propagate the `k8sd.io/in-place-upgrade-to` annotation on the `Cluster` object by adding this annotation one-by-one to all the machines that is owned by this cluster. - -A Kubernetes API call listing the objects of type `Machine` and filtering with `ownerRef` would produce the list of machines owned by the cluster. The controller then would iterate over this list, annotating machines and waiting for the operation to complete on each iteration. - -The reconciler should ignore a machine if `k8sd.io/in-place-upgrade-status` is already set to `in-progress`. - -Once upgrades of the underlying machines are finished: -* `k8sd.io/in-place-upgrade-to` annotation on the `Cluster` would be removed -* `k8sd.io/in-place-upgrade-release` annotation on the `Cluster` would be added/updated with the used `{upgrade-option}`. +none ## Configuration Changes -### `POST /x/capi/snap-refresh` +### `POST /snap/refresh` ```go type SnapRefreshRequest struct { // Channel is the channel to refresh the snap to. - Channel string + Channel string `json:"channel"` // Revision is the revision number to refresh the snap to. - Revision string + Revision string `json:"revision"` // LocalPath is the local path to use to refresh the snap. - LocalPath string + LocalPath string `json:"localPath"` +} + +// SnapRefreshResponse is the response message for the SnapRefresh RPC. +type SnapRefreshResponse struct { + // The change id belonging to a snap refresh/install operation. + ChangeID string `json:"changeId"` } ``` -`POST /x/capi/snap-refresh` performs the in-place upgrade with the given options. +`POST /snap/refresh` performs the in-place upgrade with the given options and returns the change id of the snap operation. The upgrade can be either done with a `Channel`, `Revision` or a local `*.snap` file provided via `LocalPath`. The value of `LocalPath` should be an absolute path. -A refresh token per node will be generated at bootstrap time, which gets seeded into to the node under the `/capi/etc/refresh-token` file. The generated token will be stored on the management cluster in the `$clustername-token` secret, with keys formatted as `refresh-token::$nodename`. +### `POST /snap/refresh-status` + +```go +// SnapRefreshStatusRequest is the request message for the SnapRefreshStatus RPC. +type SnapRefreshStatusRequest struct { + // The change id belonging to a snap refresh/install operation. + ChangeID string `json:"changeId"` +} + +// SnapRefreshStatusResponse is the response message for the SnapRefreshStatus RPC. +type SnapRefreshStatusResponse struct { + // Status is the status of the snap refresh/install operation. + Status string `json:"status"` + // Completed is a boolean indicating if the snap refresh/install operation has completed. + // The status should be considered final when this is true. + Completed bool `json:"completed"` + // ErrorMessage is the error message if the snap refresh/install operation failed. + ErrorMessage string `json:"errorMessage"` +} +``` +`POST /snap/refresh-status` returns the status of the refresh operation for the given change id. + +The operation is considered fully complete once `Completed=true`. + +The `Status` field will contain the status of the operation, with `Done` and `Error` being statuses of interest. + +The `ErrorMessage` field is populated if the operation could not be completed successfully. + + +### Node Token Authentication + +A node token per node will be generated at bootstrap time, which gets seeded into the node under the `/capi/etc/node-token` file. 
On bootstrap the token under `/capi/etc/node-token` will be copied over to `/var/snap/k8s/common/node-token` with the help of the `k8s x-capi set-node-token <token>` command. The generated token will be stored on the management cluster in the `$clustername-token` secret, with keys formatted as `refresh-token::$nodename`.
+
+The endpoints will use `ValidateNodeTokenAccessHandler("node-token")` to check the `node-token` header to match against the token in the `/var/snap/k8s/common/node-token` file.

-This endpoint will use `ValidateCAPIRefreshTokenAccessHandler("capi-refresh-token")` to check the `capi-refresh-token` header to match against the token in the `/capi/etc/refresh-token` file.

 ## Bootstrap Provider Changes
<!--
This section MUST mention any changes to the bootstrap provider.
-->
-A machine controller called `MachineReconciler` is added which would perform the in-place upgrade if `k8sd.io/in-place-upgrade-to` annotation is set on the machine.
+A machine controller called `MachineReconciler` is added which would perform the in-place upgrade if `v1beta2.k8sd.io/in-place-upgrade-to` annotation is set on the machine.

-The controller would use the value of this annotation to make an endpoint call to the `/snap/refresh` through `k8sd-proxy`.
+The controller would use the value of this annotation to make an endpoint call to the `/snap/refresh` through `k8sd-proxy`. The controller then would periodically query the `/snap/refresh-status` with the change id of the operation until the operation fully is completed(`Completed=true`).
+
+A failed request to `/snap/refresh` endpoint would requeue the requested upgrade without setting any annotations.

-The result of this operation will be communicated back to the user via the `k8sd.io/in-place-upgrade-status` annotation. Values being:
+The result of the refresh operation will be communicated back to the user via the `v1beta2.k8sd.io/in-place-upgrade-status` annotation. Values being:

 * `in-progress` for an upgrade currently in progress
 * `done` for a successful upgrade
 * `failed` for a failed upgrade

 After an upgrade process begins:
-* `k8sd.io/in-place-upgrade-status` annotation on the `Machine` would be added/updated with `in-progress`
+* `v1beta2.k8sd.io/in-place-upgrade-status` annotation on the `Machine` would be added/updated with `in-progress`
+* `v1beta2.k8sd.io/in-place-upgrade-change-id` annotation on the `Machine` would be updated with the change id returned from the refresh response.
 * An `InPlaceUpgradeInProgress` event is added to the `Machine` with the `Performing in place upgrade with {upgrade-option}` message.

 After a successfull upgrade:
-* `k8sd.io/in-place-upgrade-to` annotation on the `Machine` would be removed
-* `k8sd.io/in-place-upgrade-release` annotation on the `Machine` would be added/updated with the used `{upgrade-option}`.
-* `k8sd.io/in-place-upgrade-status` annotation on the `Machine` would be added/updated with `done`
+* `v1beta2.k8sd.io/in-place-upgrade-to` annotation on the `Machine` would be removed
+* `v1beta2.k8sd.io/in-place-change-id` annotation on the `Machine` would be removed
+* `v1beta2.k8sd.io/in-place-upgrade-release` annotation on the `Machine` would be added/updated with the used `{upgrade-option}`.
+* `v1beta2.k8sd.io/in-place-upgrade-status` annotation on the `Machine` would be added/updated with `done`
 * An `InPlaceUpgradeDone` event is added to the `Machine` with the `Successfully performed in place upgrade with {upgrade-option}` message.
After a failed upgrade: -* `k8sd.io/in-place-upgrade-status` annotation on the `Machine` would be added/updated with `failed` +* `v1beta2.k8sd.io/in-place-upgrade-status` annotation on the `Machine` would be added/updated with `failed` +* `v1beta2.k8sd.io/in-place-change-id` annotation on the `Machine` would be removed * An `InPlaceUpgradeFailed` event is added to the `Machine` with the `Failed to perform in place upgrade with option {upgrade-option}: {error}` message. A custom condition with type `InPlaceUpgradeStatus` can also be added to relay these information. -The reconciler should ignore the upgrade if `k8sd.io/in-place-upgrade-status` is already set to `in-progress` on the machine. The reconciler should also perform the upgrades one by one, requeuing other requests if an in-place upgrade is already in progress. +The reconciler should not trigger the upgrade endpoint if `v1beta2.k8sd.io/in-place-upgrade-status` is already set to `in-progress` on the machine. Instead #### Changes for Rolling Upgrades, Scaling Up and Creating New Machines -In case of a rolling upgrade or when creating new machines the `CK8sConfigReconciler` should check for the `k8sd.io/in-place-upgrade-release` annotation both on the `Machine` and on the owner `Cluster` object. - -The value of one of these annotations should be used instead of the `version` field while generating a cloud-init script for a machine. The precedence of version fields are: -1. Annotation on the `Machine` -2. Annotation on the `Cluster` -3. The `version` field +In case of a rolling upgrade or when creating new machines the `CK8sConfigReconciler` should check for the `v1beta2.k8sd.io/in-place-upgrade-release` annotation both on the `Machine` object. -Which means the value from the annotation on the `Machine` would be used first if found. +The value of one of the annotation should be used instead of the `version` field while generating a cloud-init script for a machine. Using an annotation value requires changing the `install.sh` file to perform the relevant snap operation based on the option. * `snap install k8s --classic --channel ` for `Channel` @@ -234,7 +272,7 @@ updated (e.g. command outputs). -The new feature can be tested manually by applying an annotation on the machine/node, waiting for the process to finish by checking for the `k8sd.io/in-place-upgrade-status` annotation and then checking for the version of the node through the Kubernetes API e.g. `kubectl get node`. A timeout should be set for waiting on the upgrade process. +The new feature can be tested manually by applying an annotation on the machine/node and checking for the `v1beta2.k8sd.io/in-place-upgrade-status` annotation's value to be `done`. A timeout should be set for waiting on the upgrade process. The tests can be integrated into the CI the same way with the CAPD infrastructure provider. @@ -242,7 +280,6 @@ The upgrade should be performed with the `localPath` option. Under Pebble the pr This means a docker image containing 2 versions should be created. The different/new version of the `kubernetes` binary would also be built and put into a path. - ## Considerations for backwards compatibility - # Proposal information @@ -41,7 +36,7 @@ You can also provide examples of how this feature may be used. The current Cluster API implementation does not provide a way of updating machines in-place and instead follows a rolling upgrade strategy. 
-This means that a version upgrade would trigger a rolling upgrade, which is the process of creating new machines with desired configuration and removing older ones. This strategy is acceptable in most-cases for clusters that are provisioned on public or private clouds where having extra resources are not a concern. +This means that a version upgrade would trigger a rolling upgrade, which is the process of creating new machines with desired configuration and removing older ones. This strategy is acceptable in most cases for clusters that are provisioned on public or private clouds where having extra resources is not a concern. However this strategy is not viable for smaller bare-metal or edge deployments where resources are limited. This makes Cluster API not suitable out of the box for most of the use cases in industries like telco. @@ -73,7 +68,7 @@ should be considered. If required, add more details about why these alternative solutions were discarded. --> -We could alternatively use the `version` fields defined in `ControlPlane` and `MachineDeployment` manifests instead of annotations which could be a better/more native user experience. +We could alternatively use the `version` fields defined in `CK8sControlPlane` and `MachineDeployment` manifests instead of annotations which could be a better/more native user experience. However at the time of writing CAPI does not have support for changing upgrade strategies which means changes to the `version` fields trigger a rolling update. @@ -195,7 +190,7 @@ This section MUST mention any changes to the bootstrap provider. A machine controller called `MachineReconciler` is added which would perform the in-place upgrade if `v1beta2.k8sd.io/in-place-upgrade-to` annotation is set on the machine. -The controller would use the value of this annotation to make an endpoint call to the `/snap/refresh` through `k8sd-proxy`. The controller then would periodically query the `/snap/refresh-status` with the change id of the operation until the operation fully is completed(`Completed=true`). +The controller would use the value of this annotation to make an endpoint call to the `/snap/refresh` through `k8sd-proxy`. The controller then would periodically query the `/snap/refresh-status` with the change id of the operation until the operation is fully completed(`Completed=true`). A failed request to `/snap/refresh` endpoint would requeue the requested upgrade without setting any annotations. @@ -224,7 +219,7 @@ After a failed upgrade: A custom condition with type `InPlaceUpgradeStatus` can also be added to relay these information. -The reconciler should not trigger the upgrade endpoint if `v1beta2.k8sd.io/in-place-upgrade-status` is already set to `in-progress` on the machine. Instead +The reconciler should not trigger the upgrade endpoint if `v1beta2.k8sd.io/in-place-upgrade-status` is already set to `in-progress` on the machine. #### Changes for Rolling Upgrades, Scaling Up and Creating New Machines In case of a rolling upgrade or when creating new machines the `CK8sConfigReconciler` should check for the `v1beta2.k8sd.io/in-place-upgrade-release` annotation both on the `Machine` object. 
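For illustration, below is a minimal sketch of how a controller could drive the `/snap/refresh` and `/snap/refresh-status` endpoints introduced in this series. The `Client` interface is a hypothetical stand-in for the k8sd-proxy transport; only the request/response shapes and the `Completed`, `Done` and `Error` semantics come from the proposal itself.

```go
package upgrade

import (
	"context"
	"fmt"
	"time"
)

// Request/response types mirroring the proposal's /snap/refresh and
// /snap/refresh-status messages.
type SnapRefreshRequest struct {
	Channel   string `json:"channel"`
	Revision  string `json:"revision"`
	LocalPath string `json:"localPath"`
}

type SnapRefreshResponse struct {
	ChangeID string `json:"changeId"`
}

type SnapRefreshStatusRequest struct {
	ChangeID string `json:"changeId"`
}

type SnapRefreshStatusResponse struct {
	Status       string `json:"status"`
	Completed    bool   `json:"completed"`
	ErrorMessage string `json:"errorMessage"`
}

// Client is a placeholder for the k8sd-proxy transport. Do is assumed to
// POST the request body to the given path on the target node and decode
// the JSON response into out.
type Client interface {
	Do(ctx context.Context, path string, in, out any) error
}

// RefreshAndWait triggers a snap refresh and polls its status until the
// operation reports Completed=true or the context is cancelled.
func RefreshAndWait(ctx context.Context, c Client, req SnapRefreshRequest, interval time.Duration) error {
	var refresh SnapRefreshResponse
	if err := c.Do(ctx, "/snap/refresh", req, &refresh); err != nil {
		// Per the proposal, a failed trigger is requeued without
		// touching any annotations.
		return fmt.Errorf("triggering refresh: %w", err)
	}

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			var status SnapRefreshStatusResponse
			err := c.Do(ctx, "/snap/refresh-status", SnapRefreshStatusRequest{ChangeID: refresh.ChangeID}, &status)
			if err != nil {
				continue // transient poll errors: keep polling
			}
			if !status.Completed {
				continue // status is only final once Completed=true
			}
			if status.Status == "Error" {
				return fmt.Errorf("refresh failed: %s", status.ErrorMessage)
			}
			return nil
		}
	}
}
```

A caller would map a nil return to the `done` status annotation and an error to `failed`, with the error text feeding the `InPlaceUpgradeFailed` event message.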
From 55d4efe073be24d78ba2bd89ad39bad1a3935b18 Mon Sep 17 00:00:00 2001 From: Berkay Tekin Oz Date: Thu, 12 Sep 2024 07:02:49 +0000 Subject: [PATCH 7/7] Mark as accepted --- docs/proposals/001-in-place-upgrades.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/proposals/001-in-place-upgrades.md b/docs/proposals/001-in-place-upgrades.md index ee347d22..f0391e9c 100644 --- a/docs/proposals/001-in-place-upgrades.md +++ b/docs/proposals/001-in-place-upgrades.md @@ -4,7 +4,7 @@ - **Index**: 001 -- **Status**: **DRAFTING** +- **Status**: **ACCEPTED** - **Name**: ClusterAPI In-Place Upgrades