Karpenter GA for AWS Vintage and CAPI #2705
Comments
Next steps: spec for the CAPA/Vintage implementation.
👋 @T-Kukawka this sounds like a really interesting direction. I just wanted to share that we have recently formed a feature group in the Cluster API community to address Karpenter integration. We are planning to have our first meeting after KubeCon next week; more details here: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/community/20231018-karpenter-integration.md. This topic would certainly be welcome if you are interested in having a wider discussion about Karpenter and CAPA.
The main issue we have with the new CRs (NodeClasses, NodePools) is that we can't use LaunchTemplates anymore, as direct references to them have been deprecated. Not being able to reference a LaunchTemplate like in the old Provider CR means we need an operator to create and manage a NodeClass (where we set the userData with the values required to join the cluster), which is a bit more involved. For now I am more inclined to keep using the old releases and see if we can find a solution as a community in the new feature group that @elmiko is leading.
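For reference, a minimal sketch of the kind of EC2NodeClass such an operator would have to manage, assuming the v1beta1 API; all names, selector tags, and the bootstrap command are illustrative, not our actual configuration:

```yaml
# Hypothetical EC2NodeClass replacing the old LaunchTemplate reference;
# the operator would have to template the join-the-cluster userData itself.
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: example-workload-class
spec:
  amiFamily: AL2
  role: example-karpenter-node-role        # node IAM role previously carried by the LaunchTemplate
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: example-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: example-cluster
  userData: |
    #!/bin/bash
    # Bootstrap that the LaunchTemplate used to provide (illustrative):
    # cluster endpoint, CA, and join parameters have to be filled in here.
    /etc/eks/bootstrap.sh example-cluster
```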
@paurosello one of the top concerns for the Karpenter feature group is ensuring that Cluster API users continue to have the experience of using that (CAPI) API to manage their infrastructure. We are still figuring out what that means, but I think we at least agree that it's a top goal for the group.
Yeah, 100%, and we are committed to working with the community to evolve the Karpenter integration in CAPA. We will need to work with the old API for a while until we get there with the full integration.
Currently the main issue we are facing is that the CAPI taint on the nodes never gets removed, because a Karpenter-provisioned node is not backed by a Machine in the API, and the taint can't be disabled. More info: kubernetes-sigs/cluster-api#9858
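For illustration, this is roughly what ends up stuck on a Karpenter-provisioned node, assuming the uninitialized taint discussed in the linked issue; the node name is made up:

```yaml
# The CAPI uninitialized taint is applied at node registration, but the
# Machine controller that would normally remove it never does here,
# since no Machine object exists for the Karpenter node.
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-42-1.eu-west-1.compute.internal   # illustrative node name
spec:
  taints:
    - key: node.cluster.x-k8s.io/uninitialized
      effect: NoSchedule
```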
Current Handbook: https://handbook.giantswarm.io/docs/product/managed-apps/karpenter/
IAM/Roles - create the cloud resources Karpenter needs (IAM role, SQS queue) - basically automatic installation of the prerequisites.
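A rough, CloudFormation-style sketch of what that prerequisite automation would have to create; resource names and the exact managed-policy set are illustrative, not a definitive list:

```yaml
# Illustrative prerequisites: a node role and an interruption queue.
Resources:
  KarpenterNodeRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal: { Service: ec2.amazonaws.com }
            Action: sts:AssumeRole
      ManagedPolicyArns:   # illustrative policy set
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
  KarpenterInterruptionQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: example-cluster-karpenter
      MessageRetentionPeriod: 300
```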
The karpenter-app will have to be installed in the workload clusters (WCs).
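Hypothetically, that installation could go through the usual App CR, assuming the standard Giant Swarm app platform shape; catalog, version, and kubeconfig secret names are placeholders:

```yaml
# Sketch of an App CR installing karpenter-app into a workload cluster;
# all names and the version are illustrative.
apiVersion: application.giantswarm.io/v1alpha1
kind: App
metadata:
  name: karpenter
  namespace: example-wc
spec:
  catalog: giantswarm
  name: karpenter-app
  namespace: kube-system
  version: 1.0.0            # illustrative version
  kubeConfig:
    inCluster: false
    secret:
      name: example-wc-kubeconfig   # placeholder secret name
      namespace: example-wc
```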
The Provisioners - right now we have the concept of a node pool (NP), which creates an ASG with given subnets, instance types, etc. In Karpenter this is replaced by Provisioners, which take in labels, taints, instance types, spot/on-demand capacity, and so on.
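A minimal sketch of such a Provisioner on the old v1alpha5 API (the one we currently run); the labels, taints, and instance types are illustrative:

```yaml
# Roughly what a node pool maps to on the old Karpenter API.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: example-pool
spec:
  labels:
    giantswarm.io/machine-pool: example-pool   # hypothetical label
  taints:
    - key: example.com/dedicated               # illustrative taint
      value: batch
      effect: NoSchedule
  requirements:
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["m5.xlarge", "m5.2xlarge"]
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
  providerRef:
    name: example-node-template   # AWSNodeTemplate carrying subnets/launch data
```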
PDBs - Karpenter will never kill a machine whose pods have overly restrictive PDBs defined; this results in hanging nodes. We could have a cronjob that handles timeouts on such PDBs to free up the machines.
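For example, a PDB like the following (illustrative) permanently blocks eviction, so Karpenter can never drain the node hosting the selected pods:

```yaml
# With maxUnavailable: 0, no replica may ever be evicted voluntarily,
# so any node running these pods hangs until something else intervenes.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-blocking-pdb
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: example
```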
Upgrades - currently we do upgrades by rolling ASGs on AWS, and Karpenter nodes are outside this loop. We could set a TTL on the Karpenter nodes so that the machines are rolled after the TTL expires.
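On the old v1alpha5 API this would be the expiry knob; the 24h value is just an example:

```yaml
# Sketch: after the TTL, Karpenter replaces the node, which would pick up
# a new AMI/launch configuration as part of an upgrade.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: example-pool
spec:
  ttlSecondsUntilExpired: 86400   # illustrative 24h node lifetime
```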
The Networking - right now we create the NP, scale it to one node, and feed Karpenter the data from the NP (role, subnet, etc.).
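A sketch of how that harvested data could be handed to Karpenter on the old API, assuming the v1alpha1 AWSNodeTemplate, which still accepted a launch template name; the tag selector and names are illustrative:

```yaml
# Hypothetical AWSNodeTemplate carrying the launch template and subnets
# created by CAPA/Vintage for the node pool.
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: example-node-template
spec:
  launchTemplate: example-nodepool-launch-template   # created with the node pool
  subnetSelector:
    giantswarm.io/cluster: example-cluster           # hypothetical subnet tag
```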
Cluster-autoscaler will have to be tweaked, as in the current state both Karpenter and the autoscaler act when a pod is pending. For now we have a 5-minute delay in cluster-autoscaler to favor Karpenter for scaling.
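One way such a delay can be expressed (a sketch, not our actual manifest) is cluster-autoscaler's --new-pod-scale-up-delay flag, which makes it ignore pods younger than the given duration:

```yaml
# Fragment of a cluster-autoscaler Deployment pod spec; only the relevant
# flag is shown, and the image tag is illustrative.
spec:
  containers:
    - name: cluster-autoscaler
      image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0
      command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --new-pod-scale-up-delay=5m   # matches the 5-minute delay above
```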
The Networking and Provisioners points should be properly integrated with the product, e.g. with a custom machine pool (a KarpenterMachinePool exposing the full Karpenter provisioner implementation) or a label on the MachinePool. What we need here are the role, LaunchTemplate, and subnet, which are created by either CAPA or Vintage when creating the MachinePool. There are talks upstream about how to integrate this in the CAPI world: kubernetes-sigs/karpenter#747
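Purely as a sketch of the lighter-weight option, a label on the existing MachinePool could opt it into Karpenter management; the label key is invented and no such API exists today:

```yaml
# Hypothetical opt-in: a standard CAPI MachinePool tagged so that an
# operator knows to generate the Karpenter provisioner config from it.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: example-pool
  namespace: org-example
  labels:
    alpha.giantswarm.io/karpenter: "true"   # invented label, not a real API
spec:
  clusterName: example-cluster
  template:
    spec:
      clusterName: example-cluster
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfig
          name: example-pool
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachinePool
        name: example-pool
```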