From 866317cc72f052d56608f9257a7631c42e86e965 Mon Sep 17 00:00:00 2001
From: Mateo Florido
Date: Mon, 9 Sep 2024 00:37:10 -0500
Subject: [PATCH 1/2] Add Certs Refresh Proposal

---
 docs/002-refresh-certs.md | 347 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 347 insertions(+)
 create mode 100644 docs/002-refresh-certs.md

diff --git a/docs/002-refresh-certs.md b/docs/002-refresh-certs.md
new file mode 100644
index 00000000..6a486ccc
--- /dev/null
+++ b/docs/002-refresh-certs.md
@@ -0,0 +1,347 @@
+
+
+# Proposal information
+
+
+- **Index**: 002
+
+
+- **Status**: **DRAFTING**
+
+
+
+- **Name**: ClusterAPI Certificates Refresh
+
+
+- **Owner**: Mateo Florido [@mateoflorido](https://github.com/mateoflorido)
+
+
+# Proposal Details
+
+## Summary
+
+
+The proposal aims to enhance Canonical Kubernetes Cluster API Providers by
+enabling administrators to refresh or renew certificates on cluster nodes
+without the need for a rolling upgrade. This feature is particularly beneficial
+in resource-constrained environments, such as private or edge clouds, where
+performing a full node replacement may not be feasible.
+
+## Rationale
+
+
+Currently, Cluster API lacks a mechanism for refreshing certificates on cluster
+nodes without triggering a full rolling update. For example, while the Kubeadm
+provider offers the ability to renew certificates, it requires a rolling update
+of the cluster nodes or manual intervention before the certificates expire.
+
+This proposal aims to address this gap by enabling certificate renewal on
+cluster nodes without requiring a rolling update. By providing administrators
+with the ability to refresh certificates independently of node upgrades, this
+feature improves cluster operation, especially in environments with limited
+resources, such as private or edge clouds.
+
+It will enhance the user experience by minimizing downtime, reducing the need
+for additional resources, and simplifying certificate management. This is
+particularly valuable for users who need to maintain continuous availability
+or operate in environments where rolling updates are not practical due to
+resource constraints.
+
+
+## User facing changes
+
+
+Administrators will be able to renew certificates on cluster nodes without
+triggering a full rolling update. This can be achieved by annotating the Machine
+object, which will initiate the certificate renewal process:
+
+```
+kubectl annotate machine k8sd.io/refresh-certificates={expires-in}
+```
+
+`expires-in` specifies how long the certificate will remain valid. It can be
+expressed in years, months, days, or any other time unit supported by the
+`time.ParseDuration`.
+
+For tracking the validity of certificates, the Machine object will include a
+`machine.cluster.x-k8s.io/certificates-expiry` annotation that indicates the
+expiry date of the certificates. This annotation will be added when the cluster
+is deployed and updated when certificates are renewed. The value of this
+annotation will be an RFC3339 timestamp.
+
+## Alternative solutions
+
+
+**Kubeadm Control Plane provider (KCP)** automates certificate rotations for
+control plane machines by triggering a machine rollout when certificates are
+close to expiration.
+
+### How to configure:
+- In the KCP configuration, set the `rolloutBefore.certificatesExpiryDays`
+field. This tells KCP when to trigger the rollout before certificates expire:
+
+```yaml
+spec:
+  rolloutBefore:
+    certificatesExpiryDays: 21 # Trigger rollout when certificates expire within 21 days
+```
+
+### How it works:
+- **Automatic Rollouts**: KCP monitors the certificate expiry dates of control
+plane machines using the `Machine.Status.CertificatesExpiryDate`. If
+certificates are about to expire (based on a configured threshold), KCP
+triggers a machine rollout to refresh them.
+- **Certificate Expiry Check**: The expiry date is sourced from the
+`machine.cluster.x-k8s.io/certificates-expiry` annotation on the Machine or
+Bootstrap Config object.
+
+For manual rotations, the administrator should run the `kubeadm certs renew`
+command, ensure all control plane components are restarted, and remove the
+expiry annotation for KCP to detect the updated certificate expiry date.
+
+
+## Out of scope
+
+
+This proposal does not cover the orchestration of certificate renewal for the
+whole cluster. It focuses on renewing certificates on individual cluster nodes
+via the Machine object.
+
+Rolling updates of the cluster nodes are out of scope. This proposal aims to
+renew certificates without triggering a full rolling update of the cluster.
+
+External certificate authorities (CAs) are also out of scope. This proposal
+focuses on renewing self-signed certificates generated by Canonical Kubernetes.
+
+# Implementation Details
+
+## API Changes
+
+
+### `GET /x/capi/certificates-expiry`
+
+This endpoint will return the expiry date of the certificates on a specific
+cluster node. The response will include the expiry date of the certificates
+in RFC3339 format.
+
+```go
+type CertificatesExpiryResponse struct {
+    // ExpiryDate is the expiry date of the certificates on the node.
+    ExpiryDate string `json:"expiry-date"`
+}
+```
+
+### `POST /x/capi/request-certificates`
+
+This endpoint will create the necessary Certificate Signing Request (CSR) for
+a worker node. The request will include the duration after which the
+certificates will expire.
+
+```go
+type RequestCertificatesRequest struct {
+    // ExpirationSeconds is the duration after which the certificates will expire.
+    ExpirationSeconds int `json:"expiration-seconds"`
+}
+```
+
+### `POST /x/capi/refresh-certificates`
+
+This endpoint will trigger the renewal of certificates on a specific node.
+The request will include the duration after which the certificates will expire
+and a list of additional Subject Alternative Names (SANs) to include in the
+certificate.
+
+This endpoint is applicable to both control plane and worker nodes. For worker
+nodes, the request will include the seed used to generate the CSR.
+
+```go
+type RefreshCertificatesRequest struct {
+    // Seed is the seed used to generate the CSR.
+    Seed string `json:"seed"`
+    // ExpirationSeconds is the duration after which the certificates will expire.
+    ExpirationSeconds int `json:"expiration-seconds"`
+    // ExtraSANs is a list of additional Subject Alternative Names to include in the certificate.
+    ExtraSANs []string `json:"extra-sans"`
+}
+```
+
+### `POST /x/capi/approve-certificates`
+
+This endpoint will approve the renewal of certificates for a worker node and
+will be run by a control plane node. The request will include a list of the CSR
+names to approve, recovered from the `k8sd.io/refresh-certificates-csr-names`.
+
+```go
+type ApproveCertificatesRequest struct {
+    // CertificateSigningRequests is a list of CSR names to approve.
+    CertificateSigningRequests []string `json:"certificate-signing-requests"`
+}
+```
+
+## Bootstrap Provider Changes
+
+
+A controller called `CertificatesController` will be added to the bootstrap
+provider. This controller will watch for the `k8sd.io/refresh-certificates`
+annotation on the Machine object and trigger the certificate renewal process
+when the annotation is present.
+
+### Control Plane Nodes
+
+The controller would use the value of the `k8sd.io/refresh-certificates` annotation
+to determine the duration after which the certificates will expire. It will then
+call the `POST /x/capi/refresh-certificates` endpoint to trigger the certificate
+renewal process.
+
+The controller will share the status of the certificate renewal process with the
+Machine object by updating the `k8sd.io/refresh-certificates-status` annotation
+with the status of the renewal process. The value of this annotation will be
+one of the following:
+
+- `in-progress`: The certificate renewal process is in progress.
+- `done`: The certificate renewal process is complete.
+- `failed`: The certificate renewal process has failed.
+
+After the certificate renewal process is complete, the controller will update
+the `machine.cluster.x-k8s.io/certificates-expiry` annotation on the Machine
+object with the new expiry date of the certificates.
+
+Finally, the controller will remove the `k8sd.io/refresh-certificates` annotation
+from the Machine object to indicate that the certificate renewal process is
+complete.
+
+### Worker Nodes
+
+The controller would use the value of the `k8sd.io/refresh-certificates` annotation
+to determine the duration after which the certificates will expire. It will then
+call the `POST /x/capi/request-certificates` endpoint to create the Certificate
+Signing Request (CSR) for the worker node.
+
+The controller will share the CSR names with the control plane node by updating
+the `k8sd.io/refresh-certificates-csr-names` annotation with the list of CSR names
+to approve. The control plane node will then call the `POST /x/capi/approve-certificates`
+endpoint to approve the Certificate Signing Requests list provided by the annotation
+value.
+
+The controller will share the status similar to the control plane nodes by updating
+the `k8sd.io/refresh-certificates-status` annotation with the status of the renewal.
+The value of this annotation will be the same as the control plane nodes.
+
+After the CSR approval process is complete, the worker node will call the
+`POST /x/capi/refresh-certificates` endpoint to trigger the certificate renewal
+process, using the seed generated to recover the certificates.
+
+After the certificate renewal process is complete, the controller will update
+the `machine.cluster.x-k8s.io/certificates-expiry` annotation on the Machine
+object with the new expiry date of the certificates.
+
+Finally, the controller will remove the `k8sd.io/refresh-certificates` annotation
+from the Machine object to indicate that the certificate renewal process is
+complete.
+
+## ControlPlane Provider Changes
+
+
+None
+
+## Configuration Changes
+
+
+None
+
+## Documentation Changes
+
+
+This implementation will require adding the following documentation:
+- How-to guide for renewing certificates on cluster nodes
+- Reference page of the `k8sd.io/renew-certificates` annotation
+
+## Testing
+
+
+Integration tests will be added to the current test suite. The tests will
+create a cluster, annotate the Machine object with the `k8sd.io/renew-certificates`
+annotation, and verify that the certificates are renewed in the target node.
+
+## Considerations for backwards compatibility
+
+
+None
+
+## Implementation notes and guidelines
+
+
+We can leverage the existing certificate renewal logic in the k8s-snap.
+For worker nodes, we need to modify the existing code to avoid blocking
+the request until the certificates have been approved and issued. Instead,
+we can use a multi-step process: generating the CSRs, approving them, and
+then triggering the certificate renewal process.
+

From 1706c75578669eca7ad57ef04d23a82a10828338 Mon Sep 17 00:00:00 2001
From: Mateo Florido
Date: Tue, 17 Sep 2024 15:11:37 -0500
Subject: [PATCH 2/2] Address comments

---
 .../003-refresh-certs.md} | 88 ++++++++++---------
 1 file changed, 47 insertions(+), 41 deletions(-)
 rename docs/{002-refresh-certs.md => proposals/003-refresh-certs.md} (80%)

diff --git a/docs/002-refresh-certs.md b/docs/proposals/003-refresh-certs.md
similarity index 80%
rename from docs/002-refresh-certs.md
rename to docs/proposals/003-refresh-certs.md
index 6a486ccc..b5785a48 100644
--- a/docs/002-refresh-certs.md
+++ b/docs/proposals/003-refresh-certs.md
@@ -6,7 +6,7 @@ fill out the sections below.
 # Proposal information

-- **Index**: 002
+- **Index**: 003

 - **Status**: **DRAFTING**

@@ -74,12 +74,12 @@ triggering a full rolling update. This can be achieved by annotating the Machine
 object, which will initiate the certificate renewal process:

 ```
-kubectl annotate machine k8sd.io/refresh-certificates={expires-in}
+kubectl annotate machine v1beta2.k8sd.io/refresh-certificates={expires-in}
 ```

 `expires-in` specifies how long the certificate will remain valid. It can be
-expressed in years, months, days, or any other time unit supported by the
-`time.ParseDuration`.
+expressed in years, months, or days, in addition to the time units supported by
+Go's `time.ParseDuration`.

 For tracking the validity of certificates, the Machine object will include a
 `machine.cluster.x-k8s.io/certificates-expiry` annotation that indicates the
@@ -155,11 +155,12 @@ APIs endpoints instead of breaking the existing APIs, such that API clients are
 not affected.
 -->

-### `GET /x/capi/certificates-expiry`
+### `GET /k8sd/certificates-expiry`

 This endpoint will return the expiry date of the certificates on a specific
 cluster node. The response will include the expiry date of the certificates
-in RFC3339 format.
+in RFC3339 format. The value will be sourced from the Kubernetes API server
+certificate.
 ```go
 type CertificatesExpiryResponse struct {
@@ -205,13 +206,13 @@ type RefreshCertificatesRequest struct {
 ### `POST /x/capi/approve-certificates`

 This endpoint will approve the renewal of certificates for a worker node and
-will be run by a control plane node. The request will include a list of the CSR
-names to approve, recovered from the `k8sd.io/refresh-certificates-csr-names`.
+will be run by a control plane node. The request will include the seed used to
+generate the CSR.

 ```go
 type ApproveCertificatesRequest struct {
-    // CertificateSigningRequests is a list of CSR names to approve.
-    CertificateSigningRequests []string `json:"certificate-signing-requests"`
+    // Seed is the seed used to generate the CSR.
+    Seed string `json:"seed"`
 }
 ```

@@ -221,60 +222,64 @@ This section MUST mention any changes to the bootstrap provider.
 -->

 A controller called `CertificatesController` will be added to the bootstrap
-provider. This controller will watch for the `k8sd.io/refresh-certificates`
+provider. This controller will watch for the `v1beta2.k8sd.io/refresh-certificates`
 annotation on the Machine object and trigger the certificate renewal process
 when the annotation is present.

 ### Control Plane Nodes

-The controller would use the value of the `k8sd.io/refresh-certificates` annotation
-to determine the duration after which the certificates will expire. It will then
-call the `POST /x/capi/refresh-certificates` endpoint to trigger the certificate
+The controller will use the value of the
+`v1beta2.k8sd.io/refresh-certificates` annotation to determine the duration
+after which the certificates will expire. It will then call the
+`POST /x/capi/refresh-certificates` endpoint to trigger the certificate
 renewal process.

-The controller will share the status of the certificate renewal process with the
-Machine object by updating the `k8sd.io/refresh-certificates-status` annotation
-with the status of the renewal process. The value of this annotation will be
-one of the following:
+The controller will share the status of the certificate renewal process by
+adding events to the Machine object. The events will indicate the progress of
+the renewal process following this pattern:

-- `in-progress`: The certificate renewal process is in progress.
-- `done`: The certificate renewal process is complete.
-- `failed`: The certificate renewal process has failed.
+- `RefreshCertsInProgress`: The certificate renewal process is in progress; the
+  event will include the `Refreshing certificates in progress` message.
+- `RefreshCertsDone`: The certificate renewal process is complete; the event
+  will include the `Certificates have been refreshed` message.
+- `RefreshCertsFailed`: The certificate renewal process has failed; the event
+  will include the `Certificates renewal failed: {reason}` message.

 After the certificate renewal process is complete, the controller will update
 the `machine.cluster.x-k8s.io/certificates-expiry` annotation on the Machine
 object with the new expiry date of the certificates.

-Finally, the controller will remove the `k8sd.io/refresh-certificates` annotation
-from the Machine object to indicate that the certificate renewal process is
-complete.
+Finally, the controller will remove the `v1beta2.k8sd.io/refresh-certificates`
+annotation from the Machine object to indicate that the certificate renewal
+process is complete.

 ### Worker Nodes

-The controller would use the value of the `k8sd.io/refresh-certificates` annotation
-to determine the duration after which the certificates will expire. It will then
-call the `POST /x/capi/request-certificates` endpoint to create the Certificate
-Signing Request (CSR) for the worker node.
+The controller will use the value of the `v1beta2.k8sd.io/refresh-certificates`
+annotation to determine the duration after which the certificates will expire.
+It will then call the `POST /x/capi/request-certificates` endpoint to create
+the Certificate Signing Request (CSR) for the worker node.

-The controller will share the CSR names with the control plane node by updating
-the `k8sd.io/refresh-certificates-csr-names` annotation with the list of CSR names
-to approve. The control plane node will then call the `POST /x/capi/approve-certificates`
-endpoint to approve the Certificate Signing Requests list provided by the annotation
-value.
+Using the `k8sd` proxy, the controller can call the
+`POST /x/capi/approve-certificates` endpoint with the seed generated in the
+previous step to approve the CSRs for the worker node.

-The controller will share the status similar to the control plane nodes by updating
-the `k8sd.io/refresh-certificates-status` annotation with the status of the renewal.
-The value of this annotation will be the same as the control plane nodes.
+The controller will share the status similar to the control plane nodes by
+emitting events to the `Machine` object. The events will indicate the progress
+of the renewal process following the same pattern as in the control plane
+nodes.

 After the CSR approval process is complete, the worker node will call the
 `POST /x/capi/refresh-certificates` endpoint to trigger the certificate renewal
-process, using the seed generated to recover the certificates.
+process, using the seed generated to recover the certificates from the CSR
+resources.

 After the certificate renewal process is complete, the controller will update
 the `machine.cluster.x-k8s.io/certificates-expiry` annotation on the Machine
 object with the new expiry date of the certificates.

-Finally, the controller will remove the `k8sd.io/refresh-certificates` annotation
+Finally, the controller will remove the `v1beta2.k8sd.io/refresh-certificates`
+annotation
 from the Machine object to indicate that the certificate renewal process is
 complete.

@@ -305,7 +310,7 @@ updated (e.g. command outputs).
 This implementation will require adding the following documentation:
 - How-to guide for renewing certificates on cluster nodes
-- Reference page of the `k8sd.io/renew-certificates` annotation
+- Reference page of the `v1beta2.k8sd.io/refresh-certificates` annotation

 ## Testing

 Integration tests will be added to the current test suite. The tests will
-create a cluster, annotate the Machine object with the `k8sd.io/renew-certificates`
-annotation, and verify that the certificates are renewed in the target node.
+create a cluster, annotate the Machine object with the
+`v1beta2.k8sd.io/refresh-certificates` annotation, and verify that the
+certificates are renewed in the target node.

 ## Considerations for backwards compatibility