Scale up storage capacity #1

Open
wants to merge 8 commits into master

Conversation

@Flask commented on Sep 23, 2022

Which component does this PR apply to?

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


pohly and others added 8 commits on September 20, 2022 at 14:59:

Whether a pod has unbound volumes influences scheduling decisions and
thus the scale-up decisions in the cluster autoscaler.

These three new test cases cover:
- a pod with an unbound PVC using late binding -> can scale up
  (see the StorageClass sketch after this list)
- the same with the storage capacity feature enabled -> cannot scale up
  without CSIStorageCapacity
- the same with manually configured CSIStorageCapacity -> can scale up
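
"Late binding" here means a StorageClass with
volumeBindingMode: WaitForFirstConsumer, so provisioning is deferred until a
pod using the PVC is scheduled. A minimal sketch of such a class (the
csi-hostpath-fast name is taken from the example further below; the
hostpath.csi.k8s.io provisioner is an assumption based on csi-driver-host-path,
adjust it for the actual driver):

   kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-hostpath-fast
provisioner: hostpath.csi.k8s.io
# Defer binding and provisioning until a pod using the PVC is scheduled.
# An unbound PVC with this class is what the first test case relies on.
volumeBindingMode: WaitForFirstConsumer
EOF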
The assumption that all node labels except for the hostname label can be copied
verbatim does not hold for CSI drivers that manage local storage: those
drivers have a topology label whose value also depends on the hostname. It
might be the same as the Kubernetes hostname, but that cannot be assumed.
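
For illustration (the label key is the one used below; the node names it would
show are hypothetical), the two values can be compared side by side, and on a
typical node they happen to match:

   kubectl get nodes -L kubernetes.io/hostname -L topology.hostpath.csi/node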

To solve this, search/replace rules based on regular expressions can be defined
to modify those labels. This can then be used to inform the autoscaler about
the capacity available on new nodes:

   --replace-labels ';^topology.hostpath.csi/node=aks-workerpool.*;topology.hostpath.csi/node=aks-workerpool-template;'

   kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1beta1
kind: CSIStorageCapacity
metadata:
  name: aks-workerpool-fast-storage
  namespace: kube-system
capacity: 100Gi
maximumVolumeSize: 100Gi
nodeTopology:
  matchLabels:
    # This never matches a real node, only the node templates
    # inside cluster-autoscaler.
    topology.hostpath.csi/node: aks-workerpool-template
storageClassName: csi-hostpath-fast
EOF
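
To verify that the object was stored as intended (same names as above), it can
be listed afterwards:

   kubectl get csistoragecapacities -n kube-system
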
When a new node becomes ready, a CSI driver is not going to be running on it
immediately. This can cause the cluster autoscaler to scale up once more
because of pending pods that can run on that new node once the driver is ready.

The actual check is about CSIStorageCapacity. By comparing the published
information about the new node against the information for a template node, we
can determine whether the CSI driver has finished starting up on the node.
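
One way to observe this from the outside is to watch the CSIStorageCapacity
objects as the driver publishes them for the new node (the kube-system
namespace is just the one used in the example above; it depends on where the
driver is deployed):

   kubectl get csistoragecapacities -n kube-system -w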

The new CSI processor needs information about the existing CSIStorageCapacity
objects in the cluster, just like the scheduler predicate. Both can share the
same informer. For that to work, management of the informer factory must be
moved up the call chain so that the setup code for both can use the same factory.

mweibel pushed a commit that referenced this pull request on Apr 17, 2024:
Update CA_with_AWS_IAM_OIDC.md