CAPI Cluster Autoscaling with Machine Deployments #1376

alex-dabija · 2022-09-07T09:15:07Z

Story

As a cluster admin, I want a CAP(O|G|V|VCD) cluster to autoscale depending on the required resources in order to ensure the stability of applications running on the cluster.

Background

CAP(O|G|V|VCD) clusters don't support machine pools, which means we can't leverage the cloud provider's support for autoscaling. Machine deployments are supported, and cluster-autoscaler does have support for them.
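
As a rough illustration of how a machine deployment can be opted in to autoscaling: the upstream cluster-autoscaler Cluster API provider documents a pair of min/max size annotations on the MachineDeployment. The Python sketch below builds such a manifest as a plain dictionary. The helper name `with_autoscaling_bounds` and the example names `worker-pool`, `org-example` and `example` are hypothetical; the annotation keys come from the upstream provider documentation and should be verified against the version in use.

```python
import copy

# Annotation keys documented by the upstream cluster-autoscaler
# Cluster API provider; verify them against the version you deploy.
MIN_SIZE = "cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size"
MAX_SIZE = "cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size"


def with_autoscaling_bounds(machine_deployment, min_size, max_size):
    """Return a copy of the manifest annotated with autoscaling bounds."""
    if min_size < 0 or max_size < min_size:
        raise ValueError("expected 0 <= min_size <= max_size")
    result = copy.deepcopy(machine_deployment)
    annotations = result.setdefault("metadata", {}).setdefault("annotations", {})
    # Kubernetes annotation values are always strings.
    annotations[MIN_SIZE] = str(min_size)
    annotations[MAX_SIZE] = str(max_size)
    return result


# Hypothetical, heavily trimmed MachineDeployment manifest.
example = {
    "apiVersion": "cluster.x-k8s.io/v1beta1",
    "kind": "MachineDeployment",
    "metadata": {"name": "worker-pool", "namespace": "org-example"},
    "spec": {"clusterName": "example", "replicas": 3},
}
annotated = with_autoscaling_bounds(example, min_size=1, max_size=10)
```

The helper returns a copy rather than mutating its input, so the same base manifest can be reused for several node groups with different bounds.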

Giant Swarm has been running the cluster-autoscaler on the workload cluster in order to enable the cluster to still scale in case the communication between the management cluster and the workload cluster is severed.

Unfortunately, the cluster-autoscaler needs to run on the management cluster in order to be able to update the workload cluster's machine deployment resources.
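
To make the management-cluster topology more concrete, here is a minimal Python sketch that assembles the command-line arguments for a cluster-autoscaler that runs on the management cluster and watches a separate workload cluster. The function name `autoscaler_args`, the kubeconfig path and the namespace and cluster names are hypothetical. The flags are those described in the upstream Cluster API provider README as far as I know; treat the exact combination as an assumption and check it against the autoscaler version being deployed.

```python
def autoscaler_args(workload_kubeconfig, namespace, cluster_name):
    """Build cluster-autoscaler arguments for the management-cluster topology."""
    discovery = f"clusterapi:namespace={namespace},clusterName={cluster_name}"
    return [
        "--cloud-provider=clusterapi",
        # Kubeconfig of the workload cluster whose nodes and pods are observed.
        f"--kubeconfig={workload_kubeconfig}",
        # Assumption (per the upstream README): with this flag and no
        # --cloud-config, the provider uses the in-cluster credentials of the
        # management cluster to read and scale the Cluster API resources.
        "--clusterapi-cloud-config-authoritative",
        # Limit node group discovery to one workload cluster's resources.
        f"--node-group-auto-discovery={discovery}",
    ]


# Hypothetical values for illustration only.
args = autoscaler_args("/mnt/kubeconfig", "org-example", "example")
```

One autoscaler instance per workload cluster keeps the discovery filter simple, at the cost of running more deployments on the management cluster.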

Resources

Cluster Autoscaler documentation

Stories

Cluster Autoscaler Investigation #2618 (area/kaas, team/turtles)
Upgrade cluster-api apps #2691 (area/kaas, team/turtles)
Implement cluster autoscaling in machine deployments #2693 (Ready; area/kaas, team/turtles)
Create cluster autoscaler e2e test suite for machine deployments #2695 (Ready; area/kaas, team/turtles)
cluster-autoscaler-app should validate values such as newPodScaleUpDelay #2640 (area/kaas, team/turtles)
https://github.com/giantswarm/giantswarm/issues/27571

cornelius-keller · 2022-09-27T13:17:36Z

@alex-dabija Do we also need a story about making the scheduler configurable to optimize for better usage of existing nodes before creating new ones? How is this currently handled in the cloud installations?

alex-dabija · 2022-09-28T12:32:54Z

We don't have any special configuration for the Kubernetes scheduler to optimize for better usage. We do have the vertical pod autoscaler running by default on vintage clusters, which might have an impact on how many pods are running on a node.

This story is only meant to get the cluster autoscaler running on the management cluster to handle autoscaling of machine deployments.

I wouldn't worry for now about packing pods more efficiently on the nodes. I would just keep things simple for now.

gawertm · 2022-11-16T08:58:08Z

@alex-dabija I changed ownership from Hydra to Rocket, as Rocket will most likely implement it first. We are already looking at the documentation and trying it out.

gawertm · 2022-11-30T07:03:46Z

blocked until https://github.com/giantswarm/giantswarm/issues/23820

Rotfuks · 2022-12-13T11:17:42Z

Done in Clippy via: #1793

brinker211 · 2022-12-21T14:00:57Z

@alex-dabija I see the new tickets for CAPA and CAPZ. Is there still a ticket for CAPG? This was the provider this ticket originally started with.

brinker211 · 2023-01-12T12:07:54Z

@gawertm is the referenced internal ticket https://github.com/giantswarm/giantswarm/issues/23820 for CAPG? I see Hydra and Clippy referenced for CAPA and CAPZ but looking for the status of CAPG as well. Thanks!

gawertm · 2023-01-19T09:09:58Z

Hi @brinker211, yes, this was for CAPG but also for the Rocket providers. We initially planned to help Hydra with that, even though it's not a Rocket priority anymore.

gawertm · 2023-05-15T10:49:40Z

@Rotfuks would the autoscaler topic go to Turtles? Let's move it from the Rocket backlog then :)
cc @alex-dabija

alex-dabija · 2023-06-01T10:01:21Z

> @Rotfuks would the autoscaler topic go to Turtles? Let's move it from the Rocket backlog then :) cc @alex-dabija

Yes, Turtles makes sense. It would be great to have a common autoscaling solution for all providers (at least for CAPZ, CAPV, CAPVCD). CAPA is still using machine pools which require a different implementation.

Rotfuks · 2023-06-26T14:54:49Z

Here's the information transfer from the CAPZ specific Autoscaling Epic:

We already had a first discussion on it: https://gigantic.slack.com/archives/C04887ZSU20/p1670926088947149
Conclusion:
All information about cluster-autoscaler issues with machine pools can be found here: https://github.com/giantswarm/giantswarm/issues/19313
We should document all further technical information in that ticket.
CAPZ already has a feature merged for managed machine pools: "When creating AKS clusters using autoscaler enabled, do not make an update api call to agentpool service based on difference in node count" (kubernetes-sigs/cluster-api-provider-azure#2444)
But it is currently being refactored to match the overall CAPI approach: "✨ MachinePool annotation for externally managed autoscaler" (kubernetes-sigs/cluster-api#7107)
CAPZ already has an issue open for support of autoscaling on unmanaged machine pools: "feat: respect externally managed annotation on unmanaged MachinePools" (kubernetes-sigs/cluster-api-provider-azure#2588)
Let's also remember to double-check scale-to-0 support when the code support lands.
Helpful Slack thread in the CAPZ upstream community: https://kubernetes.slack.com/archives/CEX9HENG7/p1674083817745989
As Vintage is looking into Karpenter (https://github.com/giantswarm/flexshopper/issues/359), this might also be an interesting tool for this capability.
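
For readers unfamiliar with the "externally managed" annotation mentioned in the upstream issues above, here is a minimal Python sketch of how a controller might decide whether to leave a machine pool's replica count alone. The annotation key and its "present and not `false`" semantics reflect my understanding of upstream Cluster API and are assumptions to verify. The helper name `replicas_externally_managed` and the sample manifests are hypothetical.

```python
# Annotation key as understood from upstream Cluster API; an assumption.
REPLICAS_MANAGED_BY = "cluster.x-k8s.io/replicas-managed-by"


def replicas_externally_managed(machine_pool):
    """Return True if an external autoscaler is declared to own the replicas."""
    annotations = (machine_pool.get("metadata") or {}).get("annotations") or {}
    value = annotations.get(REPLICAS_MANAGED_BY)
    # Assumed semantics: annotation present and not the literal string "false".
    return value is not None and value != "false"


# Hypothetical, heavily trimmed manifests.
managed = {
    "kind": "MachinePool",
    "metadata": {"annotations": {REPLICAS_MANAGED_BY: "cluster-autoscaler"}},
}
unmanaged = {"kind": "MachinePool", "metadata": {"name": "pool-a"}}

managed_result = replicas_externally_managed(managed)
unmanaged_result = replicas_externally_managed(unmanaged)
```

A provider that respects the annotation would skip its own scale calls when the check returns true and only observe the replica count set by the external system.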

alex-dabija · 2023-08-22T07:29:18Z

@Rotfuks I would keep this issue scoped to autoscaling for machine deployments because of our current focus on CAPA. I moved #2692 to #1798. It should make it easier to track the status of CAPA for our first release.

alex-dabija added the team/hydra, area/kaas (Mission: Cloud Native Platform - Self-driving Kubernetes as a Service), kind/story, topic/capi, and provider/cluster-api-gcp (Cluster API based running on GCP) labels Sep 7, 2022

cornelius-keller changed the title ~~CAPG cluster autoscaling~~ CAPI cluster autoscaling Sep 23, 2022

cornelius-keller changed the title ~~CAPI cluster autoscaling~~ CAPI cluster autoscaling MachinePools Sep 23, 2022

cornelius-keller changed the title ~~CAPI cluster autoscaling MachinePools~~ CAPI cluster autoscaling MachineDeployments Sep 23, 2022

alex-dabija added the provider/openstack Related to provider OpenStack label Sep 23, 2022

alex-dabija added team/rocket Team Rocket and removed team/hydra labels Nov 16, 2022

gawertm added the kind/cross-team Epics that span across teams label Nov 23, 2022

gawertm added team/turtles Team Turtles and removed team/rocket Team Rocket labels May 17, 2023

alex-dabija added team/phoenix Team Phoenix team/hydra and removed team/hydra labels May 23, 2023

Rotfuks mentioned this issue Jun 19, 2023

Autoscaling Clusters #2596

Closed

Rotfuks mentioned this issue Jun 26, 2023

Autoscaling CAPZ Clusters #1793

Closed

1 task

Rotfuks changed the title ~~CAPI cluster autoscaling MachineDeployments~~ CAPI Cluster Autoscaling MachineDeployments Jun 27, 2023

alex-dabija mentioned this issue Jul 3, 2023

CAPI SR Release #2352

Closed

alex-dabija changed the title ~~CAPI Cluster Autoscaling MachineDeployments~~ CAPI MachineDeployments Autoscaling Jul 3, 2023

Rotfuks changed the title ~~CAPI MachineDeployments Autoscaling~~ CAPI Cluster Autoscaling Jul 31, 2023

alex-dabija mentioned this issue Aug 21, 2023

CAPZ SR release #2738

Open

alex-dabija changed the title ~~CAPI Cluster Autoscaling~~ CAPI Cluster Autoscaling with Machine Deployments Aug 22, 2023
