| copyright | lastupdated | keywords | subcollection |
|---|---|---|---|
| | 2023-01-30 | kubernetes, node scaling, ca, autoscaler | containers |
{{site.data.keyword.attribute-definition-list}}
{: #cluster-scaling-deploy-apps}
To limit a pod deployment to a specific worker pool that is managed by the cluster autoscaler, use a combination of labels and `nodeSelector` or `nodeAffinity` to deploy apps only to the autoscaled worker pools. With `nodeAffinity`, you have more control over how the scheduler matches pods to worker nodes. Then, use taints and tolerations so that only these apps can run on your autoscaled worker pools.
{: shortdesc}
For more information, see the following Kubernetes docs:
- Assigning pods to nodes{: external}
- Taints and tolerations{: external}
Before you begin:
- Install the `ibm-iks-cluster-autoscaler` plug-in.
- Log in to your account. If applicable, target the appropriate resource group. Set the context for your cluster.
To limit pods to run on certain autoscaled worker pools:
- Make sure that you labeled and tainted your autoscaled worker pool as described in Preparing your cluster for autoscaling.
- In your pod spec template, match the `nodeSelector` or `nodeAffinity` to the label that you used in your worker pool.

    Example of `nodeSelector`:

    ```yaml
    ...
    spec:
      containers:
      - name: nginx
        image: nginx
        imagePullPolicy: IfNotPresent
      nodeSelector:
        app: nginx
    ...
    ```
    {: codeblock}

    Example of `nodeAffinity`:

    ```yaml
    spec:
      containers:
      - name: nginx
        image: nginx
        imagePullPolicy: IfNotPresent
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: use
                operator: In
                values:
                - autoscale
    ```
    {: codeblock}
- In your pod spec template, match the `toleration` to the taint that you set on your worker pool.

    Example `NoExecute` toleration:

    ```yaml
    tolerations:
      - key: use
        operator: "Exists"
        effect: "NoExecute"
    ```
    {: codeblock}
- Deploy the pod. Because of the matching label, the pod is scheduled onto a worker node that is in the labeled worker pool. Because of the matching toleration, the pod can run on the tainted worker pool.

    ```sh
    kubectl apply -f pod.yaml
    ```
    {: pre}
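The label match and the toleration from the preceding steps can be combined in one manifest. The following is a minimal sketch, not a definitive configuration. It assumes that the worker pool is labeled `use=autoscale` and tainted with the key `use` and the `NoExecute` effect, as in the preceding fragments. The Deployment name `nginx-autoscale`, the replica count, and the resource requests are hypothetical placeholders.

```yaml
# Hypothetical Deployment: name, replicas, and resource requests are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-autoscale
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        imagePullPolicy: IfNotPresent
        resources:
          requests:        # The cluster autoscaler acts on requested resources.
            cpu: 100m
            memory: 128Mi
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: use             # Assumed worker pool label key.
                operator: In
                values:
                - autoscale          # Assumed worker pool label value.
      tolerations:
      - key: use                     # Assumed worker pool taint key.
        operator: "Exists"
        effect: "NoExecute"
```
{: codeblock}

Setting resource requests on the container matters here because the cluster autoscaler reacts to pods that can't be scheduled based on what they request, not on what they currently use.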
{: #ca_scaleup}
As described in the Understanding how the cluster autoscaler works topic and the Kubernetes Cluster Autoscaler FAQs{: external}, the cluster autoscaler scales up your worker pools when the resources that your workloads request exceed the available resources of the worker pool. However, you might want the cluster autoscaler to scale up worker nodes before the worker pool runs out of resources. In this case, your workload does not need to wait as long for worker nodes to be provisioned because the worker pool is already scaled up to meet the resource requests.
{: shortdesc}
The cluster autoscaler does not support early scaling (overprovisioning) of worker pools. However, you can configure other Kubernetes resources to work with the cluster autoscaler to achieve early scaling.
{: #pause-pods-ca}
You can create a deployment that deploys pause containers{: external} in pods with specific resource requests, and assign the deployment a low pod priority. When these resources are needed by higher priority workloads, the pause pod is preempted and becomes a pending pod. This event triggers the cluster autoscaler to scale up.
{: shortdesc}
For more information about setting up a pause pod deployment, see the Kubernetes FAQ{: external}. You can use this example overprovisioning configuration file{: external} to create the priority class, service account, and deployments.
If you use this method, make sure that you understand how pod priority works and how to set pod priority for your deployments. For example, if the pause pod does not hold enough resources for a higher priority pod, the pause pod is not preempted. The higher priority workload remains pending, so the cluster autoscaler is triggered to scale up. However, in this case, the scale-up action is not early because the workload that you want to run can't be scheduled because of insufficient resources. The pause pod must have a `nodeAffinity` or `nodeSelector` that matches the label, and tolerations that match the taints, that you set for your worker pool.
{: note}
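As an illustration of the pause pod approach, the following sketch defines a low priority class and a Deployment of pause pods that reserve capacity. It is not the linked example configuration file. The names `overprovisioning` and `reserve-resources`, the priority value, the pause image tag, the replica count, and the resource requests are all assumptions to adjust for your cluster. The affinity and toleration assume a worker pool that is labeled `use=autoscale` and tainted with the key `use` and the `NoExecute` effect.

```yaml
# Hypothetical priority class: the name and value are assumptions.
# The value must be lower than the priority of your real workloads.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10
globalDefault: false
description: "Low priority class for pause pods that reserve spare capacity."
---
# Hypothetical Deployment of pause pods that hold the reserved resources.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 1
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: reserve-resources
        image: registry.k8s.io/pause:3.9   # Assumed image tag.
        resources:
          requests:                        # Size of the reserved headroom per pod.
            cpu: "1"
            memory: 1Gi
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: use                   # Assumed worker pool label key.
                operator: In
                values:
                - autoscale
      tolerations:
      - key: use                           # Assumed worker pool taint key.
        operator: "Exists"
        effect: "NoExecute"
```
{: codeblock}

The requests of each pause pod multiplied by the replica count determine how much headroom is reserved, so size them to the largest workload that you expect to schedule without waiting.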
{: #hpca}
Because horizontal pod autoscaling is based on the average CPU usage of the pods, the CPU usage limit that you set is reached before the worker pool runs out of resources.
{: shortdesc}
More pods are requested, which then triggers the cluster autoscaler to scale up the worker pool. For more information about setting up HPA, see the Kubernetes docs{: external}.
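To make the interaction concrete, the following is a minimal HPA sketch that uses the `autoscaling/v2` API. It assumes a hypothetical Deployment named `nginx` whose containers set CPU requests, because CPU utilization is calculated as a percentage of the requested CPU. The HPA name, the replica bounds, and the 50% target are placeholder values.

```yaml
# Hypothetical HPA: the target Deployment name, replica bounds, and CPU target are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # Add pods when average CPU exceeds 50% of requests.
```
{: codeblock}

When the added replicas no longer fit on the existing worker nodes, they remain pending, which is the signal that the cluster autoscaler acts on.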