# Karpenter Drift Hash Versioning with CRD

## Summary of Drift Hashing Mechanism

Drift utilizes a hashing mechanism to identify when a `NodeClaim` has drifted from its owning `NodePool/EC2NodeClass`. Karpenter will hash the fields within `NodePool.Spec`/`EC2NodeClass.Spec`, excluding [special cases of drift](https://karpenter.sh/docs/concepts/disruption/#special-cases-on-drift) and [behavioral fields](https://karpenter.sh/docs/concepts/disruption/#behavioral-fields). The `NodePool/EC2NodeClass` will be annotated with the resulting hash. When a `NodeClaim` is created, the `NodePool` and `EC2NodeClass` hashes will be annotated on the `NodeClaim`. More details can be found in the [drift documentation](https://karpenter.sh/docs/concepts/disruption/#drift).

When a `NodePool/EC2NodeClass` is updated, a new hash will be annotated on the `NodePool/EC2NodeClass`. The `NodeClaim` disruption controller will periodically check whether the hash annotated on the `NodeClaim` matches the hash on its `NodePool/EC2NodeClass` to determine whether the `NodeClaim` is drifted.
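
The snippet below is a minimal, hypothetical sketch of this comparison: it hashes a heavily simplified spec struct and compares the result against the hash stamped on the `NodeClaim` at creation. The struct, field names, and helpers are illustrative only; Karpenter's actual hashing implementation and field set differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// NodePoolSpec is a hypothetical, heavily simplified stand-in for the real
// NodePool spec. Behavioral fields (e.g. disruption settings) are simply not
// part of the hashed struct, mirroring how they are excluded from drift.
type NodePoolSpec struct {
	InstanceCategory string            `json:"instanceCategory"`
	Labels           map[string]string `json:"labels"`
}

// hashSpec produces a stable hash of the drift-relevant fields.
func hashSpec(spec NodePoolSpec) string {
	raw, _ := json.Marshal(spec) // deterministic for a fixed struct definition
	h := fnv.New64a()
	h.Write(raw)
	return fmt.Sprintf("%x", h.Sum64())
}

// isDrifted compares the hash annotated on the NodeClaim at creation time
// against the hash currently annotated on its owning NodePool.
func isDrifted(nodeClaimHash, nodePoolHash string) bool {
	return nodeClaimHash != nodePoolHash
}

func main() {
	original := NodePoolSpec{InstanceCategory: "c", Labels: map[string]string{"team": "a"}}
	updated := NodePoolSpec{InstanceCategory: "m", Labels: map[string]string{"team": "a"}}

	nodeClaimHash := hashSpec(original)                       // stamped on the NodeClaim at creation
	fmt.Println(isDrifted(nodeClaimHash, hashSpec(original))) // false: spec unchanged
	fmt.Println(isDrifted(nodeClaimHash, hashSpec(updated)))  // true: NodePool was updated
}
```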

## Background

Karpenter's static drift mechanism works well for additive fields, but the hash will change for all nodes when a field is removed or a new default is set. Currently, the team is looking to set defaults for the `kind` and `apiVersion` fields on the `nodeClassRef` of `nodepool.spec.template.spec` ([#909](https://github.com/kubernetes-sigs/karpenter/issues/909)). This will result in Karpenter drifting whole clusters, as the `NodePool` CRD will be updated and the `NodePool/EC2NodeClass` hash will be re-calculated. The team is also blocked from fixing issue [#5447](https://github.com/aws/karpenter-provider-aws/issues/5447). The fix adds `volumeSize` to the hashing mechanism of the `EC2NodeClass` ([#5454](https://github.com/aws/karpenter-provider-aws/pull/5454)); however, this will cause all `NodeClaims` to be drifted as soon as a customer upgrades the Karpenter CRDs in their cluster. These two issues raise the need for Karpenter to understand different versions of the `NodePool/EC2NodeClass` CRD during drift evaluation. The problem is that all nodes on a customer's cluster will be drifted without any changes to the node configuration.

More generally, Karpenter will need to guard against the following cases during a Karpenter upgrade:

1. A `NodePool/EC2NodeClass` adds default values to existing fields that are included in the hash calculations
2. A `NodePool/EC2NodeClass` field is added to the hash calculation with an already-set value
3. A `NodePool/EC2NodeClass` field is removed from the hash calculations

## Consideration

Karpenter currently has special cases of drift for certain fields, as well as behavioral fields. These fields do not rely on the hashing mechanism, so hash versioning will not change how they are evaluated.

## Proposed changes

### Add a CRD hash to the existing drift hash

```
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
  annotations:
    karpenter.sh/nodepool-hash: abcdef-ghijkl
  ...
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
  annotations:
    karpenter.k8s.aws/ec2nodeclass-hash: mnopqr-stuvwx
  ...
---
apiVersion: karpenter.sh/v1beta1
kind: NodeClaim
metadata:
  name: default
  annotations:
    karpenter.sh/nodepool-hash: abcdef-ghijkl
    karpenter.k8s.aws/ec2nodeclass-hash: mnopqr-stuvwx
  ...
```

As the Karpenter CRDs change, Karpenter's underlying hashing function may also change. To handle the upgrade path, Karpenter will prepend a hash of the CRD to the resource spec hash. Karpenter will assert that this struct-version component is equivalent before comparing the spec hashes.

As a result, two annotations will be used to analyze and orchestrate drift (a short sketch of the format follows this list):

* `karpenter.sh/nodepool-hash`: (nodepool-crd-version)-(nodepool-hash)
* `karpenter.k8s.aws/ec2nodeclass-hash`: (ec2nodeclass-crd-version)-(ec2nodeclass-hash)
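
A minimal sketch of composing and splitting this combined annotation value is shown below; the helper names are hypothetical and only illustrate the `(crd-version)-(hash)` format described above.

```go
package main

import (
	"fmt"
	"strings"
)

// composeDriftHash joins the CRD-version component and the spec hash into the
// single annotation value described above, e.g. "abcdef-ghijkl".
func composeDriftHash(crdVersionHash, specHash string) string {
	return fmt.Sprintf("%s-%s", crdVersionHash, specHash)
}

// splitDriftHash recovers the two components: the first identifies the CRD
// version, the second is the hashed resource spec.
func splitDriftHash(value string) (crdVersionHash, specHash string, ok bool) {
	parts := strings.SplitN(value, "-", 2)
	if len(parts) != 2 {
		return "", "", false
	}
	return parts[0], parts[1], true
}

func main() {
	annotation := composeDriftHash("abcdef", "ghijkl")
	crd, spec, _ := splitDriftHash(annotation)
	fmt.Println(annotation, crd, spec) // abcdef-ghijkl abcdef ghijkl
}
```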

### Option 1 (Recommended): Fall back to re-hashing on the NodeClaim

There will be two cases for drift:

1. When the `nodepool-crd-version`/`ec2nodeclass-crd-version` components match:
   1. Drift will work as normal, and the `NodePool/EC2NodeClass` hash will be compared to evaluate drift.
2. When the `nodepool-crd-version`/`ec2nodeclass-crd-version` components do not match:
   1. Karpenter will recalculate the hash of the `NodePool/EC2NodeClass` and back-propagate the annotation to its `NodeClaims` (see the sketch after this list).
   2. Karpenter will need to lock drift on `NodeClaims` that were considered drifted prior to the Karpenter upgrade.
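
The following is a hypothetical sketch of Option 1's decision flow; the names and types are illustrative and not part of Karpenter's codebase. Matching CRD versions fall through to the normal hash comparison, while mismatched versions trigger a re-hash and back-propagation instead of marking the `NodeClaim` drifted.

```go
// A hypothetical sketch of Option 1's decision flow; names and types are
// illustrative and not part of Karpenter's codebase.
package main

import "fmt"

type DriftResult int

const (
	NotDrifted DriftResult = iota
	Drifted
	// RehashRequired signals that the CRD version component changed: instead of
	// marking the NodeClaim drifted, the NodePool/EC2NodeClass is re-hashed and
	// the new hash is back-propagated to its NodeClaims.
	RehashRequired
)

func evaluateDrift(nodePoolCRDVersion, nodeClaimCRDVersion, nodePoolHash, nodeClaimHash string) DriftResult {
	// Case 2: CRD versions differ, so fall back to re-hashing and back-propagation.
	if nodePoolCRDVersion != nodeClaimCRDVersion {
		return RehashRequired
	}
	// Case 1: CRD versions match, so compare the spec hashes as usual.
	if nodePoolHash != nodeClaimHash {
		return Drifted
	}
	return NotDrifted
}

func main() {
	fmt.Println(evaluateDrift("v2crd", "v1crd", "abcdef", "abcdef") == RehashRequired) // true
	fmt.Println(evaluateDrift("v2crd", "v2crd", "abcdef", "ghijkl") == Drifted)        // true
}
```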

Pros/Cons

- 👍👍 Simple implementation effort
- 👍 Covers general cases of updates to the `NodePool/EC2NodeClass` CRD or the hashing mechanism
- 👎 Nodes that were considered drifted before the Karpenter upgrade cannot be un-drifted after the upgrade
- 👎 Does not handle cases where users change `NodePool/EC2NodeClass` resources as Karpenter is updating

### Option 2: Fall back to individual field evaluation for drift

There will be two cases for struct-based hash evaluation:

1. When the `nodepool-crd-version`/`ec2nodeclass-crd-version` components match:
   1. Drift will work as normal, and the `NodePool/EC2NodeClass` hash will be compared to evaluate drift.
2. When the `nodepool-crd-version`/`ec2nodeclass-crd-version` components do not match:
   1. Karpenter will individually check each field for drift (a sketch follows this list). *This would be an adaptation of how the [Deployment controller](https://github.com/kubernetes/kubernetes/blob/83e6636096548f4b49a94ece6bc4a26fd67e4449/pkg/controller/deployment/util/deployment_util.go#L603) compares the podTemplateSpec to its pods.*
   2. Karpenter will need to populate the `NodeClaim` with the values that were used to launch the instance, so that drift evaluation has a point of reference for these manual checks.
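
Below is a hypothetical sketch of the per-field fallback, assuming a simplified `LaunchSnapshot` of values recorded on the `NodeClaim` at launch; the real field set and comparison logic would be considerably larger.

```go
package main

import "fmt"

// LaunchSnapshot is a hypothetical subset of values the NodeClaim would need
// to record at launch time to serve as the point of reference for per-field
// drift checks.
type LaunchSnapshot struct {
	AMIFamily  string
	VolumeSize string
	Tags       map[string]string
}

// fieldsDrifted compares the current NodeClass values against the values
// recorded when the instance was launched, field by field.
func fieldsDrifted(current, launched LaunchSnapshot) bool {
	if current.AMIFamily != launched.AMIFamily {
		return true
	}
	if current.VolumeSize != launched.VolumeSize {
		return true
	}
	if len(current.Tags) != len(launched.Tags) {
		return true
	}
	for k, v := range current.Tags {
		if launched.Tags[k] != v {
			return true
		}
	}
	return false
}

func main() {
	launched := LaunchSnapshot{AMIFamily: "AL2", VolumeSize: "20Gi"}
	current := LaunchSnapshot{AMIFamily: "AL2", VolumeSize: "40Gi"}
	fmt.Println(fieldsDrifted(current, launched)) // true: volumeSize changed since launch
}
```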

Pros/Cons

- 👍 Covers general cases of updates to the `NodePool/EC2NodeClass` CRD or the hashing mechanism
- 👍 Handles cases where users change `NodePool/EC2NodeClass` resources as Karpenter is updating
- 👎👎 Significant implementation and maintenance effort required to support individual field evaluation

### Option 3: Add Versioned Hash Annotations

```
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
  annotations:
    karpenter.sh/nodepool-hash: abcdef
    karpenter.sh/nodepool-hash-v1: ghijkl
    karpenter.sh/nodepool-hash-v2: mnopqr
    karpenter.sh/nodepool-hash-v3: stuvwx
  ...
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
  annotations:
    karpenter.k8s.aws/ec2nodeclass-hash: abcdef
    karpenter.k8s.aws/ec2nodeclass-hash-v1: ghijkl
    karpenter.k8s.aws/ec2nodeclass-hash-v2: mnopqr
    karpenter.k8s.aws/ec2nodeclass-hash-v3: stuvwx
  ...
---
apiVersion: karpenter.sh/v1beta1
kind: NodeClaim
metadata:
  name: default
  annotations:
    karpenter.sh/nodepool-hash: abcdef
    karpenter.sh/nodepool-hash-v1: ghijkl
    karpenter.sh/nodepool-hash-v2: mnopqr
    karpenter.sh/nodepool-hash-v3: stuvwx
    karpenter.k8s.aws/ec2nodeclass-hash: abcdef
    karpenter.k8s.aws/ec2nodeclass-hash-v1: ghijkl
    karpenter.k8s.aws/ec2nodeclass-hash-v2: mnopqr
    karpenter.k8s.aws/ec2nodeclass-hash-v3: stuvwx
  ...
```

On a Karpenter upgrade, if the hashing mechanism has been updated alongside a new version of the CRDs, a new hash annotation will be added to the `NodePool/EC2NodeClass`. The new hash will then be back-propagated to the `NodeClaims` owned by that `NodePool/EC2NodeClass`. During drift evaluation, Karpenter will make sure that all hashes on a `NodeClaim` match the hashes present on its `NodePool/EC2NodeClass`. If any value does not match, the `NodeClaim` will be considered drifted.
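
A minimal, hypothetical sketch of this check is shown below, assuming the annotations are available as plain maps; the annotation keys are taken from the example above, and any hash present on the `NodePool` that is missing or different on the `NodeClaim` marks it as drifted.

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative key prefix; matches both the base hash annotation and the
// versioned -v1/-v2/... annotations shown above.
const nodePoolHashPrefix = "karpenter.sh/nodepool-hash"

// isDrifted reports drift if any hash annotation on the NodePool is missing
// from, or different on, the NodeClaim.
func isDrifted(nodePoolAnnotations, nodeClaimAnnotations map[string]string) bool {
	for key, nodePoolHash := range nodePoolAnnotations {
		if !strings.HasPrefix(key, nodePoolHashPrefix) {
			continue
		}
		if nodeClaimAnnotations[key] != nodePoolHash {
			return true
		}
	}
	return false
}

func main() {
	nodePool := map[string]string{
		"karpenter.sh/nodepool-hash":    "abcdef",
		"karpenter.sh/nodepool-hash-v2": "mnopqr", // added by a newer Karpenter version
	}
	nodeClaim := map[string]string{
		"karpenter.sh/nodepool-hash":    "abcdef",
		"karpenter.sh/nodepool-hash-v2": "stuvwx", // stale: NodePool changed since this hash was stamped
	}
	fmt.Println(isDrifted(nodePool, nodeClaim)) // true
}
```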

Prior to every release, the Karpenter team will check whether a new version of the drift hash is needed based on changes to the CRDs. Each `NodePool/EC2NodeClass` hash annotation will be versioned. We will need to implement proper testing to catch CRD changes that would be breaking and to validate that drift works as expected.

As a result, multiple annotations can potentially be used to analyze and orchestrate drift:

* `karpenter.sh/nodepool-hash-v...`: (nodepool-hash)
* `karpenter.k8s.aws/ec2nodeclass-hash-v...`: (ec2nodeclass-hash)

Pros/Cons

- 👍 Covers general cases of updates to the `NodePool/EC2NodeClass` CRD or the hashing mechanism
- 👎 Nodes that were considered drifted before the Karpenter upgrade cannot be un-drifted after the upgrade
- 👎👎 Does not handle cases where users change `NodePool/EC2NodeClass` resources while Karpenter is updating

## Recommendation

The main goal of drift hash versioning is to avoid churning nodes across clusters during a Karpenter upgrade that updates the `NodePool/EC2NodeClass` CRDs. All three options prevent Karpenter from automatically drifting nodes during such an upgrade. Option 1 is recommended. Its advantage is that it gives Karpenter the ability to correctly identify drifted nodes when a `NodePool/EC2NodeClass` is changed during a Karpenter upgrade, and it allows us to handle future cases without any additional API surface changes. However, customers using deployment tools such as Terraform could run into a race condition where Karpenter considers `NodeClaims` not drifted if they update their `NodePools/EC2NodeClasses` while also upgrading Karpenter. The team will mitigate this issue by warning users and documenting this side effect.

## Rollback Concerns

During a rollback to an older v1beta1 version of Karpenter that does not contain drift hash versioning, we will need to back-port changes so that drift continues to work as expected. Since the proposed changes alter the format of the current drift hash, older versions of Karpenter will not understand the updated drift hash on the `NodeClaim`. This might require customers to roll some or all of the nodes in their clusters. The approach going forward will be to back-port the hash-versioning changes to prior releases of Karpenter and to document which versions of Karpenter customers can roll back to while keeping drift working as expected.


## Future Implications

The addition of drift hash versioning gives several open issues a solution path. With versioned hashes, defaults for `Kind` and `apiVersion` on the `nodeClassRef` ([#909](https://github.com/kubernetes-sigs/karpenter/issues/909)) can be added. **We will need to coordinate with the Azure team to make sure that `Kind` and `apiVersion` are also set in [karpenter-provider-azure](https://github.com/Azure/karpenter).** These additions also allow the team to improve `NodeClass` drift checking ([#377](https://github.com/kubernetes-sigs/karpenter/issues/337)) and provide a path to surfacing health conditions on resources ([#493](https://github.com/kubernetes-sigs/karpenter/issues/493)).
