PAASTA-17941: add topology spread constraints option #3641

Merged



@gmdfalk gmdfalk commented Jul 3, 2023

Ticket: PAASTA-17941

Problem

On EKS, we cannot configure the default Kubernetes scheduler with Pod topology spread constraints.
To support the Karpenter migration and to tune how Pods are spread across zones and nodes, we want to be able to configure cluster-wide topology spread constraints via PaaSTA.

Solution

Add a topology_spread_constraints option to the PaaSTA system config that allows defining per-cluster rules for spreading Pods across topology domains.
For example, to try to spread Pods evenly across both nodes and availability zones, we'd set:

topology_spread_constraints:
    - max_skew: 1
      topology_key: "topology.kubernetes.io/zone"
      when_unsatisfiable: "ScheduleAnyway"
    - max_skew: 1
      topology_key: "kubernetes.io/hostname"
      when_unsatisfiable: "ScheduleAnyway"

This can be configured once per cluster (or globally) and will be added to every Pod spec template (i.e. for both Deployments and StatefulSets), using paasta.yelp.com/service and paasta.yelp.com/instance as label selectors, so each constraint is evaluated among the Pods of the same service instance.
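
As a rough illustration of the mapping described above, the following sketch turns the snake_case config entries into the camelCase fields that Kubernetes expects under a Pod spec's topologySpreadConstraints. It is not the code from this PR: the function name build_topology_spread_constraints and the fallback defaults (max_skew of 1, ScheduleAnyway) are assumptions made for the example, and plain dicts stand in for Kubernetes client objects.

```python
from typing import Any, Dict, List


def build_topology_spread_constraints(
    config_entries: List[Dict[str, Any]],
    service: str,
    instance: str,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: map system-config entries to Pod spec constraints."""
    # Every constraint selects the Pods of one service instance via these labels.
    label_selector = {
        "matchLabels": {
            "paasta.yelp.com/service": service,
            "paasta.yelp.com/instance": instance,
        }
    }
    constraints = []
    for entry in config_entries:
        constraints.append(
            {
                # Defaults here are assumptions for the sketch, not PaaSTA's.
                "maxSkew": entry.get("max_skew", 1),
                "topologyKey": entry["topology_key"],
                "whenUnsatisfiable": entry.get(
                    "when_unsatisfiable", "ScheduleAnyway"
                ),
                "labelSelector": label_selector,
            }
        )
    return constraints


if __name__ == "__main__":
    system_config = [
        {
            "max_skew": 1,
            "topology_key": "topology.kubernetes.io/zone",
            "when_unsatisfiable": "ScheduleAnyway",
        },
        {
            "max_skew": 1,
            "topology_key": "kubernetes.io/hostname",
            "when_unsatisfiable": "ScheduleAnyway",
        },
    ]
    # "example_service" and "main" are placeholder names.
    for constraint in build_topology_spread_constraints(
        system_config, "example_service", "main"
    ):
        print(constraint)
```

Because the selector is built from the service and instance labels, the same cluster-wide config yields a distinct constraint set for each service instance.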

Future Work

This new configuration option can conflict with deploy_whitelist and deploy_blacklist. Those options use node affinities and could constrain a deployment to one specific habitat/AZ, while the topology spread constraint might be configured to spread Pods across multiple AZs. If when_unsatisfiable: "DoNotSchedule" is set, this would leave Pods unable to be scheduled.

For now, this will be handled by not defining any default spread constraints and by only using ScheduleAnyway. We'll probably open a follow-up PR that changes the implementation of the whitelist/blacklist options from node affinities (which are somewhat expensive anyway) to topology spread constraints and gives priority to the whitelist/blacklist options, if defined.
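
To make the conflict described above concrete, here is a small simulation of the skew check. It is a simplified model, not scheduler code: the function names are hypothetical, and it assumes that zones excluded by a whitelist are still counted as eligible topology domains (whether they are depends on the Kubernetes version and the constraint's node affinity handling). Skew is computed as the Pod count in the target domain after placement minus the smallest count in any domain.

```python
from typing import Dict


def skew_after_placement(pods_per_domain: Dict[str, int], target: str) -> int:
    """Skew if one more Pod lands in `target`: target count minus global minimum."""
    counts = dict(pods_per_domain)
    counts[target] += 1
    return counts[target] - min(counts.values())


def can_schedule(
    pods_per_domain: Dict[str, int],
    target: str,
    max_skew: int,
    when_unsatisfiable: str,
) -> bool:
    """Simplified model: only DoNotSchedule turns a skew violation into a rejection."""
    if when_unsatisfiable == "ScheduleAnyway":
        return True  # the constraint only influences scoring, never blocks
    return skew_after_placement(pods_per_domain, target) <= max_skew


if __name__ == "__main__":
    # Hypothetical zone names; one Pod already runs in zone-a, and a
    # whitelist-style node affinity pins all further Pods to zone-a as well.
    zones = {"zone-a": 1, "zone-b": 0, "zone-c": 0}
    print(can_schedule(zones, "zone-a", 1, "DoNotSchedule"))   # False: skew would be 2
    print(can_schedule(zones, "zone-a", 1, "ScheduleAnyway"))  # True
```

Under these assumptions the second Pod is rejected with DoNotSchedule because the only zone the affinity allows would push the skew to 2, which is why sticking to ScheduleAnyway avoids the problem.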

Signed-off-by: Max Falk [email protected]

@gmdfalk gmdfalk marked this pull request as ready for review July 3, 2023 10:02
gmdfalk added 14 commits July 3, 2023 17:02
@gmdfalk gmdfalk merged commit 511c718 into master Jul 5, 2023