
Karpenter GA for AWS Vintage and CAPI #2705

Open
9 of 10 tasks
Tracked by #2914
T-Kukawka opened this issue Aug 7, 2023 · 6 comments

Comments


T-Kukawka commented Aug 7, 2023

Current Handbook: https://handbook.giantswarm.io/docs/product/managed-apps/karpenter/

  • IAM/Roles - create cloud resources for Karpenter (Role, SQS) - essentially an automatic installation of the prerequisites

    • we need to ensure that this works on both Vintage and CAPA
  • The Karpenter app will have to be installed in the WCs

  • Provisioners - right now we have the concept of a node pool (NP), which creates an ASG with given subnets, instance types etc. In Karpenter this is replaced by provisioners, which take in labels, taints, instance types, spot settings etc.

  • PDBs - Karpenter will never kill a machine where pods have overly restrictive PDBs defined - this results in hanging nodes. We could have a cronjob that handles timeouts on the PDBs to free up the machines.

  • Upgrades - currently we do upgrades via ASGs on AWS, and Karpenter nodes are outside of that loop. We could set a TTL on the Karpenter nodes, so that machines are rolled after the TTL expires.

  • Networking - right now we create the NP, scale it to 1 node and feed Karpenter with data from the NP (such as role, subnet etc.).

  • Cluster-autoscaler will have to be tweaked, as in the current state both Karpenter and the autoscaler act when a pod is pending. For now we have a 5-minute delay in cluster-autoscaler to favor Karpenter for scaling.
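As a sketch of how the NP/ASG settings above could map onto Karpenter, the (v1alpha5) Provisioner CR covers labels, taints, instance types, spot vs. on-demand, and a node TTL in one object. All names and values here are illustrative, not our actual configuration:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  labels:
    giantswarm.io/machine-pool: example    # hypothetical label
  taints:
    - key: example.com/dedicated           # hypothetical taint
      value: workload
      effect: NoSchedule
  requirements:
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["m5.xlarge", "m5.2xlarge"]
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
  # roll nodes after 7 days, addressing the upgrades point above
  ttlSecondsUntilExpired: 604800
  providerRef:
    name: default                          # references the AWS node template
```

The `ttlSecondsUntilExpired` field is what the upgrades bullet refers to: expired nodes are drained and replaced, giving a rolling-upgrade behaviour without ASGs.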
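For the 5-minute delay mentioned above, one mechanism cluster-autoscaler offers is the `--new-pod-scale-up-delay` flag; whether this is the exact mechanism used here is an assumption. As a deployment-args fragment:

```yaml
# cluster-autoscaler container args (fragment, illustrative)
args:
  - --new-pod-scale-up-delay=300s   # ignore pods younger than 5 minutes,
                                    # giving Karpenter first chance to react
```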

The Networking and Provisioners points should be properly integrated with the product, e.g. via a custom machine pool (a KarpenterMachinePool exposing the full Karpenter provisioner implementation) or a label on the MachinePool. What we need here is the Role, LaunchTemplate and Subnet, which are created by either CAPA or Vintage when creating the MachinePool. There are upstream discussions on how to integrate this in the CAPI world - kubernetes-sigs/karpenter#747

Tasks

  1. capi/migration · epic/karpenter · team/phoenix
  2. (4 of 4) epic/karpenter · provider/cluster-api-aws · team/phoenix - paurosello
  3. capi/migration · team/phoenix - paurosello
  4. team/honeybadger · team/phoenix - paurosello
  5. epic/karpenter · phoenix-size/s · provider/aws-china · provider/cluster-api-aws · provider/eks · team/phoenix · topic/cost-savings
@T-Kukawka

next steps: Spec for CAPA/Vintage implementation


elmiko commented Nov 3, 2023

👋

@T-Kukawka this sounds like a really interesting direction, i just wanted to share that we have recently formed a feature group in the cluster-api community to address karpenter integration. we are planning to have our first meeting after kubecon next week, more details here https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/community/20231018-karpenter-integration.md

this topic would certainly be welcome if you are interested in having a wider discussion about karpenter and CAPA.

@paurosello

The main issue we have with the new CRs (NodeClasses, NodePools) is that we can't use LaunchTemplates anymore, as they have been deprecated.

Not being able to reference a LaunchTemplate like in the old Provider CR means that we need an operator to create and manage a NodeClass (where we set the userData with the required values to join the cluster), which is a bit more involved.
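For context, the newer (v1beta1) API replaces the LaunchTemplate reference with an EC2NodeClass whose `userData` carries the bootstrap script. A minimal sketch, with all selector tags and the bootstrap contents as placeholder assumptions:

```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: Custom
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster   # hypothetical discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  userData: |
    #!/bin/bash
    # cluster-join bootstrap would go here (illustrative only)
```

This is what makes an operator attractive: the `userData` has to be kept in sync with per-cluster join credentials, which a static manifest can't do.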

For now I am more inclined to keep using the old releases, and to see if we can find a solution as a community in the new feature group that @elmiko is leading.


elmiko commented Dec 4, 2023

@paurosello one of the top concerns for the karpenter feature group is ensuring that cluster api users continue to have the experience of using that (CAPI) api to manage their infrastructure. we are still figuring out what that means, but i think we at least agree that it's a top goal for the group.

@paurosello

Yeah, 100% - we are committed to working with the community to evolve the Karpenter integration in CAPA. We will need to work with the old API for a while until we get there with the full integration.

@paurosello

Currently the main issue we are facing is that the CAPI taint on the nodes does not get removed, because the node is not backed by a Machine in the API, and the taint can't be disabled. More info: kubernetes-sigs/cluster-api#9858

@T-Kukawka T-Kukawka moved this from Backlog 📦 to In Progress ⛏️ in Roadmap Mar 4, 2024