# Karpenter Drift Hash Versioning with CRD

## Summary of Drift Hashing Mechanism

Drift utilizes a hashing mechanism to identify when a `NodeClaim` has drifted from its owning `NodePool`. Karpenter will hash fields that are within the `NodePool.Spec.Template`, excluding [special cases of drift](https://karpenter.sh/docs/concepts/disruption/#special-cases-on-drift) or [behavioral fields](https://karpenter.sh/docs/concepts/disruption/#behavioral-fields). The `NodePool` will be annotated with the hash. When a `NodeClaim` is created, the `NodePool` hash will be annotated on the `NodeClaim`. More details can be found in the [drift documentation](https://karpenter.sh/docs/concepts/disruption/#drift).

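As a rough illustration of this mechanism, here is a minimal Go sketch of structural hashing, assuming a simplified stand-in template type and the `hashstructure/v2` library; Karpenter's actual types and hash options may differ:

```
// Minimal sketch of template hashing. The Template type and its
// fields are illustrative stand-ins for NodePool.Spec.Template,
// not Karpenter's real API types.
package main

import (
	"fmt"

	"github.com/mitchellh/hashstructure/v2"
)

type Template struct {
	Labels       map[string]string
	Requirements []string
	// Behavioral fields are excluded from the hash entirely.
	ExpireAfter string `hash:"ignore"`
}

// templateHash computes a stable hash of the drift-relevant fields.
func templateHash(t Template) (string, error) {
	h, err := hashstructure.Hash(t, hashstructure.FormatV2, &hashstructure.HashOptions{
		SlicesAsSets:    true, // slice ordering does not affect the hash
		IgnoreZeroValue: true, // unset fields do not affect the hash
	})
	if err != nil {
		return "", err
	}
	return fmt.Sprint(h), nil
}

func main() {
	hash, err := templateHash(Template{Labels: map[string]string{"team": "a"}})
	if err != nil {
		panic(err)
	}
	// The hash is annotated on the NodePool and copied onto each
	// NodeClaim at creation; drift later compares the two annotations.
	fmt.Println("karpenter.sh/nodepool-hash:", hash)
}
```
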
When a `NodePool` is updated, a new hash will be annotated on the `NodePool`. The `NodeClaim` disruption controller will periodically check whether the `NodePool` hash annotations match to determine when a `NodeClaim` is drifted.

## Background

Karpenter's static drift mechanism works well for additive fields, but the hash will change for all nodes when a field is removed or a new default is set. Currently, the team is looking to set defaults for the `kind` and `apiVersion` fields on the `nodeClassRef` of `nodepool.spec.template.spec` ([#909](https://github.com/kubernetes-sigs/karpenter/issues/909)). This will result in Karpenter drifting whole clusters, as the `NodePool` CRD will be updated and the `NodePool` hash will be re-calculated. The AWS team uses the same drift mechanism for the `EC2NodeClass` and has run into several issues. The AWS team is blocked from fixing issue [#5447](https://github.com/aws/karpenter-provider-aws/issues/5447); the fix is adding `volumeSize` to the hashing mechanism of the `EC2NodeClass` ([#5454](https://github.com/aws/karpenter-provider-aws/pull/5454)). However, this will cause all `NodeClaims` to be drifted as soon as a customer upgrades the Karpenter CRDs in their cluster. These two issues raise the need for Karpenter to understand different versions of the CRDs during a drift evaluation. The problem is that all nodes on a customer's cluster will be drifted without any changes to the node configuration.

More generally, Karpenter shouldn't induce drift when a Karpenter upgrade introduces any of these changes:

1. A `NodePool` adds default values to existing fields that are included in the hash calculations
2. A `NodePool` field is added to the hash calculation with an already-set value
3. A `NodePool` field is removed from the hash calculations

## Considerations

Karpenter currently has special cases of drift for certain fields as well as behavioral fields. These fields do not rely on the hashing mechanism, so hash versioning will not change how they are handled.

## Proposed changes

### Option 1: Add a CRD hash version alongside the existing drift hash

```
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
  annotations:
    karpenter.sh/nodepool-hash: abcdef
    karpenter.sh/nodepool-hash-version: v1
...
---
apiVersion: karpenter.sh/v1beta1
kind: NodeClaim
metadata:
  name: default
  annotations:
    karpenter.sh/nodepool-hash: abcdef
    karpenter.sh/nodepool-hash-version: v1
...
```

As the Karpenter CRDs change, Karpenter's underlying hashing function may also change. To handle the upgrade path, Karpenter will add a `nodepool-hash-version` to the `NodePool` CRD. The `nodepool-hash-version` will need to be updated manually whenever one of the conditions described above occurs. Karpenter will propagate the annotation down from the `NodePool` CRD onto the `NodePool` and `NodeClaim`. Karpenter will assert that the `nodepool-hash-version` annotation is equivalent on both the `NodePool` and `NodeClaim` before doing a drift validation.

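A minimal Go sketch of this assertion (the helper and its signature are hypothetical; only the annotation keys come from the example above):

```
// Sketch of the Option 1 drift gate: the hash comparison only counts
// when both objects agree on the hash version.
package main

import "fmt"

const (
	hashAnnotation        = "karpenter.sh/nodepool-hash"
	hashVersionAnnotation = "karpenter.sh/nodepool-hash-version"
)

// hashDrifted reports whether the NodeClaim is drifted by hash, and
// whether the comparison was evaluable at all. If the hash versions
// disagree, the two hashes were produced by different hashing schemes
// and must not be compared.
func hashDrifted(nodePoolAnn, nodeClaimAnn map[string]string) (drifted, evaluable bool) {
	if nodePoolAnn[hashVersionAnnotation] != nodeClaimAnn[hashVersionAnnotation] {
		return false, false
	}
	return nodePoolAnn[hashAnnotation] != nodeClaimAnn[hashAnnotation], true
}

func main() {
	nodePool := map[string]string{hashAnnotation: "abcdef", hashVersionAnnotation: "v2"}
	nodeClaim := map[string]string{hashAnnotation: "abcdef", hashVersionAnnotation: "v1"}
	drifted, ok := hashDrifted(nodePool, nodeClaim)
	fmt.Println(drifted, ok) // false false: versions differ, skip hash-based drift
}
```
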
As a result, two annotations will be used to analyze and orchestrate drift:

* `nodepool-hash`: (hash of the `NodeClaim` template on the `NodePool`)
* `nodepool-hash-version`: (the `NodePool` CRD version defined by Karpenter)

As part of Option 1, Karpenter will need to expand its `clusterrole` to have read permissions for the CRDs. The `clusterrole` permission will be scoped down to the `NodePool` CRD. This will look as such:

```
...
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  verbs: ["get", "watch"]
  resourceNames: ["nodepools.karpenter.sh"]
...
```

### Option 1A (Recommended): Fall back to rehashing on the NodeClaim

Option 1 will have two cases for drift:

1. When the CRD `nodepool-hash-version` matches between the `NodePool` and `NodeClaim`:
   1. Drift will work as normal, and the `NodePool` hash will be compared to evaluate for drift
2. When the CRD `nodepool-hash-version` does not match between the `NodePool` and `NodeClaim`:
   1. Karpenter will recalculate the hash of the `NodePool` and back-propagate the annotation to the `NodeClaims` (see the sketch below)
      - Karpenter will not back-propagate the `nodepool-hash` annotation to `NodeClaims` that are already considered drifted
   2. Drift will work as normal once a `NodePool` field is updated

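A minimal Go sketch of this fallback, under the same assumptions as the earlier snippets (hypothetical helper, illustrative annotation maps):

```
// Sketch of Option 1A's fallback: when hash versions disagree, the
// NodePool's freshly computed hash and version are back-propagated
// onto the NodeClaim instead of marking it drifted.
package main

import "fmt"

const (
	hashAnnotation        = "karpenter.sh/nodepool-hash"
	hashVersionAnnotation = "karpenter.sh/nodepool-hash-version"
)

func reconcileHashVersion(nodePoolAnn, nodeClaimAnn map[string]string, alreadyDrifted bool) {
	if nodePoolAnn[hashVersionAnnotation] == nodeClaimAnn[hashVersionAnnotation] {
		return // versions match: the normal hash comparison applies
	}
	if alreadyDrifted {
		return // never back-propagate onto NodeClaims already considered drifted
	}
	// Adopt the NodePool's current hash so the hashing-mechanism
	// change alone does not drift this NodeClaim.
	nodeClaimAnn[hashAnnotation] = nodePoolAnn[hashAnnotation]
	nodeClaimAnn[hashVersionAnnotation] = nodePoolAnn[hashVersionAnnotation]
}

func main() {
	nodePool := map[string]string{hashAnnotation: "uvwxyz", hashVersionAnnotation: "v2"}
	nodeClaim := map[string]string{hashAnnotation: "abcdef", hashVersionAnnotation: "v1"}
	reconcileHashVersion(nodePool, nodeClaim, false)
	fmt.Println(nodeClaim[hashAnnotation]) // uvwxyz: re-stamped, not drifted
}
```
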
Pros/Cons

- 👍👍 Simple implementation effort
- 👍 Covers general cases of updates to the `NodePool` CRD or the hashing mechanism
- 👎 Nodes that were considered drifted before the Karpenter upgrade cannot be un-drifted after the upgrade
- 👎 Does not handle cases where users change `NodePool` resources while Karpenter is updating

### Option 1B: Fall back to individual field evaluation for drift

Similar to Option 1A, there will be two cases for drift:

1. When the `nodepool-hash-version` matches between the `NodePool` and `NodeClaim`:
   1. Drift will work as normal, and the `NodePool` hash will be compared to evaluate for drift
2. When the `nodepool-hash-version` does not match between the `NodePool` and `NodeClaim`:
   1. Karpenter will individually check each field for drift (sketched below). *This would be an adaptation of the [Deployment controller](https://github.com/kubernetes/kubernetes/blob/83e6636096548f4b49a94ece6bc4a26fd67e4449/pkg/controller/deployment/util/deployment_util.go#L603)'s podTemplateSpec-to-Pod comparison.*
   2. Karpenter will need to populate the `NodeClaim` with the values that were used to launch the instance, so that drift evaluation has a point of reference for manual field checking.

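A minimal Go sketch of this per-field fallback; the `FieldSnapshot` type and its fields are hypothetical stand-ins for the launch values recorded on the `NodeClaim`:

```
// Sketch of Option 1B's per-field comparison, in the spirit of the
// Deployment controller's pod-template comparison.
package main

import (
	"fmt"
	"reflect"
)

// FieldSnapshot is a hypothetical record of the drift-relevant values.
type FieldSnapshot struct {
	Taints       []string
	Labels       map[string]string
	Requirements []string
}

// fieldsDrifted compares the NodePool's current template against the
// values recorded on the NodeClaim. reflect.DeepEqual keeps the sketch
// short; a real implementation would need per-field semantics
// (defaults, set-vs-list ordering, and so on).
func fieldsDrifted(nodePool, nodeClaim FieldSnapshot) bool {
	return !reflect.DeepEqual(nodePool, nodeClaim)
}

func main() {
	current := FieldSnapshot{Labels: map[string]string{"team": "a"}}
	launched := FieldSnapshot{Labels: map[string]string{"team": "b"}}
	fmt.Println(fieldsDrifted(current, launched)) // true
}
```
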
Pros/Cons

- 👍 Covers general cases of updates to the `NodePool` CRD or the hashing mechanism
- 👍 Handles cases where users change `NodePool` resources while Karpenter is updating
- 👎 Cloud providers would need to propagate launch values onto the `NodeClaim` to adopt this method for static drift checking
- 👎👎 Negative impact on Karpenter's performance until all `NodeClaims` match the `nodepool-hash-version`
- 👎👎 High maintenance effort required to support individual field evaluation

### Option 2: Update the NodePool API version and utilize conversion webhooks to update the `nodepool-hash`

The Karpenter API version of the CRDs will be changed each time one of the following changes occurs:

1. A `NodePool` adds default values to existing fields that are included in the hash calculations
2. A `NodePool` field is added to the hash calculation with an already-set value
3. A `NodePool` field is removed from the hash calculations

As part of retrieving the `NodePool` from the API server, a conversion webhook would recalculate the hash and propagate the new hash to the `NodeClaims`.

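A self-contained Go sketch of the idea, with plain types standing in for the versioned `NodePool` APIs and an arbitrary stand-in hash function; a real implementation would hang off the API server's conversion webhook machinery (for example, controller-runtime's conversion interfaces):

```
// Sketch of Option 2: converting a NodePool between API versions
// recomputes the drift hash under the new version's scheme. The types
// and the hash function here are illustrative only.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

type NodePoolOld struct {
	Annotations map[string]string
	Template    string // stand-in for spec.template
}

type NodePoolNew struct {
	Annotations map[string]string
	Template    string
}

// hashNew models the new API version's hashing scheme.
func hashNew(template string) string {
	sum := sha256.Sum256([]byte("new-scheme:" + template))
	return hex.EncodeToString(sum[:6])
}

// convert recomputes the hash annotation so existing NodeClaims can be
// reconciled against the new scheme rather than drifted.
func convert(src NodePoolOld) NodePoolNew {
	dst := NodePoolNew{Annotations: map[string]string{}, Template: src.Template}
	for k, v := range src.Annotations {
		dst.Annotations[k] = v
	}
	dst.Annotations["karpenter.sh/nodepool-hash"] = hashNew(src.Template)
	return dst
}

func main() {
	out := convert(NodePoolOld{Annotations: map[string]string{}, Template: "default"})
	fmt.Println(out.Annotations["karpenter.sh/nodepool-hash"])
}
```
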
Pros/Cons

- 👍 No additional permissions required on the Karpenter `clusterrole`
- 👍 Covers general cases of updates to the `NodePool` CRD or the hashing mechanism
- 👎 Conversion webhooks must be maintained
- 👎👎 In the case of adding a new field with defaults, the API version would be bumped for non-breaking changes

### Option 3: Add Versioned Hash Annotations

```
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
  annotations:
    karpenter.sh/nodepool-hash: abcdefj
    karpenter.sh/nodepool-hash-v1: ghijkl
    karpenter.sh/nodepool-hash-v2: mnopqr
    karpenter.sh/nodepool-hash-v3: stuvwx
...
---
apiVersion: karpenter.sh/v1beta1
kind: NodeClaim
metadata:
  name: default
  annotations:
    karpenter.sh/nodepool-hash: abcdefj
    karpenter.sh/nodepool-hash-v1: ghijkl
    karpenter.sh/nodepool-hash-v2: mnopqr
    karpenter.sh/nodepool-hash-v3: stuvwx
...
```

On a Karpenter upgrade, if the hashing mechanism has been updated for a new version of the CRDs, a new versioned hash annotation will be added onto the `NodePool`. The new hash will then be back-propagated to the `NodeClaims` owned by the `NodePool`. During drift evaluation, Karpenter will make sure that all hashes on the `NodeClaims` match the hashes present on the `NodePool`. If the values do not match, the `NodeClaims` will be considered drifted.

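A minimal Go sketch of this evaluation (hypothetical helper; the annotation keys follow the example above):

```
// Sketch of Option 3's drift evaluation: every versioned hash
// annotation present on the NodePool must match the NodeClaim.
package main

import (
	"fmt"
	"strings"
)

func driftedAcrossVersions(nodePoolAnn, nodeClaimAnn map[string]string) bool {
	for key, want := range nodePoolAnn {
		if !strings.HasPrefix(key, "karpenter.sh/nodepool-hash") {
			continue // not a drift hash annotation
		}
		if nodeClaimAnn[key] != want {
			return true // any mismatched version marks the NodeClaim drifted
		}
	}
	return false
}

func main() {
	nodePool := map[string]string{
		"karpenter.sh/nodepool-hash-v1": "ghijkl",
		"karpenter.sh/nodepool-hash-v2": "mnopqr",
	}
	nodeClaim := map[string]string{
		"karpenter.sh/nodepool-hash-v1": "ghijkl",
		"karpenter.sh/nodepool-hash-v2": "stale",
	}
	fmt.Println(driftedAcrossVersions(nodePool, nodeClaim)) // true
}
```
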
Prior to every release, the Karpenter team will validate whether a new version of the drift hash is needed based on changes to the CRDs. Each `NodePool` hash annotation will be versioned. We will need to implement proper testing to catch CRD changes that would be breaking and to validate that drift works as expected.

As a result, multiple annotations can potentially be used to analyze and orchestrate drift:

* `karpenter.sh/nodepool-hash-v...`: (nodepool-hash)

Pros/Cons

- 👍 Covers general cases of updates to the `NodePool` CRD or the hashing mechanism
- 👎 Nodes that were considered drifted before the Karpenter upgrade cannot be un-drifted after the upgrade
- 👎👎 Does not handle cases where users change `NodePool` resources while Karpenter is updating

### Option 4: Switch to individual field evaluation for drift

Another option is for Karpenter to switch completely to checking each field on the `NodePool` against each field on the `NodeClaim`, as sketched under Option 1B. This option assumes that all fields used to validate drift can be found on the `NodeClaim`.

Pros/Cons

- 👍 Consistent drift behavior between upgrading the Karpenter CRDs and standard drift evaluation
- 👎 Cloud providers would need to propagate launch values onto the `NodeClaim` to adopt this method for static drift checking
- 👎👎 High maintenance effort required to support individual field evaluation
- 👎👎👎 Karpenter will experience a performance decrease, as each field of the `NodePool` will need to be compared to each field of the `NodeClaim`

## Recommendation

The main goal of drift hash versioning is to avoid node churn on clusters during a Karpenter upgrade that updates the `NodePool` CRDs. All four options will prevent Karpenter from automatically drifting nodes during a Karpenter upgrade. Option 1A is recommended: it gives Karpenter the ability to correctly identify drifted nodes when `NodePools` are changed during a Karpenter upgrade, and it allows us to make updates to Karpenter for future cases without any additional API surface changes. However, customers using deployment tools such as Terraform could run into a race condition: if they update their `NodePools` while also upgrading Karpenter, Karpenter would consider those `NodeClaims` not drifted. The team will mitigate this issue by warning users and documenting this side effect.

## Rollback Concerns

During a rollback to an older version of v1beta1 Karpenter that does not contain a method of drift hash versioning, we will need to back-port changes that keep drift working as expected. Since the proposed changes alter the format of the current drift hash, older versions of Karpenter do not support any updates to the drift hash on the `NodeClaim`: once users roll back to such a version, all of their nodes will be considered drifted. The approach going forward will be to back-port the hash-versioning changes to prior releases of Karpenter, and to document for customers which versions of Karpenter they can roll back to with drift working as expected.

## Future Implications

The addition of drift hash versioning gives several open issues a solution path. With versioned hashes, defaults for `Kind` and `apiVersion` on the `nodeClassRef` ([#909](https://github.com/kubernetes-sigs/karpenter/issues/909)) can be added. **We will need to coordinate with the Azure team to make sure that `Kind` and `apiVersion` are also set in [karpenter-provider-azure](https://github.com/Azure/karpenter-provider-azure).** These additions also allow the team to improve `NodeClass` drift checking ([#337](https://github.com/kubernetes-sigs/karpenter/issues/337)) and provide a path to surfacing health conditions on resources ([#493](https://github.com/kubernetes-sigs/karpenter/issues/493)).