Scale up storage capacity #1

Open
wants to merge 8 commits into
base: master

Commits on Sep 20, 2022

  1. scale up: tests for pods with volumes

    Whether a pod has unbound volumes influences scheduling decisions and
    thus the scale up decisions in cluster autoscaler.
    
    These three new test cases cover:
    - a pod with an unbound PVC using late binding -> can scale up
    - the same with storage capacity feature enabled -> cannot scale up
      without CSIStorageCapacity
    - the same with manually configured CSIStorageCapacity -> can scale up
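
    The three cases above can be summarized as a small decision table. The
    following is only an illustrative sketch, not the test code from this
    commit: the names scaleUpCase, canScaleUp and cases are hypothetical, and
    the one-line rule is an assumption derived from the expected outcomes
    listed above.

```go
package main

import "fmt"

// scaleUpCase is a hypothetical table entry mirroring the three cases above.
type scaleUpCase struct {
	name             string
	capacityCheck    bool // storage capacity feature enabled
	templateCapacity bool // a CSIStorageCapacity object matches the node template
	wantScaleUp      bool
}

// canScaleUp encodes the assumed rule: without the capacity check any new
// node is considered suitable; with it, capacity must be published for the
// node template.
func canScaleUp(capacityCheck, templateCapacity bool) bool {
	return !capacityCheck || templateCapacity
}

// cases returns the three scenarios described in the commit message.
func cases() []scaleUpCase {
	return []scaleUpCase{
		{"late binding, no capacity check", false, false, true},
		{"capacity check, no CSIStorageCapacity", true, false, false},
		{"capacity check, manual CSIStorageCapacity", true, true, true},
	}
}

func main() {
	for _, c := range cases() {
		got := canScaleUp(c.capacityCheck, c.templateCapacity)
		fmt.Printf("%-45s scale up: %v (want %v)\n", c.name, got, c.wantScaleUp)
	}
}
```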
    pohly authored and Flask committed Sep 20, 2022
    212e133
  2. cluster-autoscaler: support modifying node labels

    The assumption that all node labels except for the hostname label can be copied
    verbatim does not hold for CSI drivers which manage local storage: those
    drivers have a topology label where the value also depends on the hostname. It
    might be the same as the Kubernetes hostname, but that cannot be assumed.
    
    To solve this, search/replace rules based on regular expressions can be
    defined to modify those labels. This can then be used to inform the
    autoscaler about available capacity on new nodes:
    
       --replace-labels ';^topology.hostpath.csi/node=aks-workerpool.*;topology.hostpath.csi/node=aks-workerpool-template;'
    
       kubectl apply -f - <<EOF
    apiVersion: storage.k8s.io/v1beta1
    kind: CSIStorageCapacity
    metadata:
      name: aks-workerpool-fast-storage
      namespace: kube-system
    capacity: 100Gi
    maximumVolumeSize: 100Gi
    nodeTopology:
      matchLabels:
        # This never matches a real node, only the node templates
        # inside cluster-autoscaler.
        topology.hostpath.csi/node: aks-workerpool-template
    storageClassName: csi-hostpath-fast
    EOF
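
    To illustrate how such a rule might be applied to the labels of a node
    template, here is a minimal sketch. It is not the code from this commit.
    It assumes the sed-like form shown above (the first character is the
    separator) and that the regular expression is matched against each label
    rendered as "key=value". The names labelRule, parseRule and applyRules
    are hypothetical, as is the example node name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// labelRule is a hypothetical representation of one --replace-labels entry.
type labelRule struct {
	pattern     *regexp.Regexp
	replacement string
}

// parseRule assumes the form "<sep>regex<sep>replacement<sep>", where the
// first character is the separator (';' in the example above).
func parseRule(s string) (labelRule, error) {
	if len(s) < 3 {
		return labelRule{}, fmt.Errorf("rule too short: %q", s)
	}
	sep := s[:1]
	parts := strings.Split(s[1:], sep)
	if len(parts) != 3 || parts[2] != "" {
		return labelRule{}, fmt.Errorf("malformed rule: %q", s)
	}
	re, err := regexp.Compile(parts[0])
	if err != nil {
		return labelRule{}, err
	}
	return labelRule{pattern: re, replacement: parts[1]}, nil
}

// applyRules rewrites each label as "key=value", applies all rules in order,
// and splits the result back into key and value.
func applyRules(labels map[string]string, rules []labelRule) map[string]string {
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		kv := k + "=" + v
		for _, r := range rules {
			kv = r.pattern.ReplaceAllString(kv, r.replacement)
		}
		newKey, newValue, ok := strings.Cut(kv, "=")
		if !ok {
			// Assumption: a result without "=" drops the label.
			continue
		}
		out[newKey] = newValue
	}
	return out
}

func main() {
	rule, err := parseRule(";^topology.hostpath.csi/node=aks-workerpool.*;topology.hostpath.csi/node=aks-workerpool-template;")
	if err != nil {
		panic(err)
	}
	labels := map[string]string{
		"topology.hostpath.csi/node": "aks-workerpool-12345-vmss000000", // made-up node name
		"kubernetes.io/os":           "linux",
	}
	fmt.Println(applyRules(labels, []labelRule{rule}))
}
```

    With this rule, the topology label of every template node ends up with the
    value that the CSIStorageCapacity object above selects, while unrelated
    labels are copied unchanged.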
    pohly authored and Flask committed Sep 20, 2022
    4773b3a
  3. filter out nodes waiting for CSI driver

    When a new node becomes ready, a CSI driver is not going to be running on it
    immediately. This can cause the cluster autoscaler to scale up once more
    because of pending pods that can run on that new node once the driver is ready.
    
    The actual check is about CSIStorageCapacity. By comparing the published
    information about the new node against the information for a template node, we
    can determine whether the CSI driver is done with starting up on the node.
    
    The new CSI processor needs information about existing CSIStorageCapacity
    objects in the cluster, just like the scheduler predicate. Both can share the
    same informer. For that to work, managing the informer factory must be moved up
    the call chain so that the setup code for both can use the same factory.
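
    The readiness check described above could look roughly like the sketch
    below. It is a simplified illustration under stated assumptions, not the
    processor added by this commit: the capacity type is a stand-in for
    CSIStorageCapacity that keeps only the storage class and the matchLabels
    selector, and the rule "every storage class with capacity for the template
    node must also have capacity published for the real node" is one plausible
    reading of the comparison. The names capacity, matches and driverReady are
    hypothetical.

```go
package main

import "fmt"

// capacity is a simplified stand-in for a CSIStorageCapacity object.
type capacity struct {
	storageClass string
	nodeLabels   map[string]string // stand-in for nodeTopology.matchLabels
}

// matches reports whether all selector labels are present in labels.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// driverReady reports whether every storage class that publishes capacity for
// the template node also has capacity published for the real node.
func driverReady(templateLabels, nodeLabels map[string]string, caps []capacity) bool {
	for _, want := range caps {
		if !matches(want.nodeLabels, templateLabels) {
			continue // not relevant for this node group
		}
		found := false
		for _, have := range caps {
			if have.storageClass == want.storageClass && matches(have.nodeLabels, nodeLabels) {
				found = true
				break
			}
		}
		if !found {
			return false // driver has not published capacity for the node yet
		}
	}
	return true
}

func main() {
	template := map[string]string{"topology.hostpath.csi/node": "aks-workerpool-template"}
	node := map[string]string{"topology.hostpath.csi/node": "aks-workerpool-0"} // made-up node name
	caps := []capacity{{storageClass: "csi-hostpath-fast", nodeLabels: template}}
	fmt.Println("ready before driver start:", driverReady(template, node, caps))

	// Once the driver runs on the node, it publishes capacity for that node.
	caps = append(caps, capacity{storageClass: "csi-hostpath-fast", nodeLabels: node})
	fmt.Println("ready after driver start:", driverReady(template, node, caps))
}
```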
    pohly authored and Flask committed Sep 20, 2022
    1ca91c0
  4. fix some stuff

    Flask committed Sep 20, 2022
    8771fb8
  5. typo

    Flask committed Sep 20, 2022
    d7122bc

Commits on Sep 21, 2022

  1. more fixes

    Flask committed Sep 21, 2022
    bcb3f8d
  2. context

    Flask committed Sep 21, 2022
    8293808
  3. versions

    Flask committed Sep 21, 2022
    da78d17