
chore: improve disruption for underutilization #992

Closed
wants to merge 13 commits

Conversation


@Luke-Smartnews Luke-Smartnews commented Feb 7, 2024

Background

We're trying to use Karpenter in our production environment. The scaling-out feature is working well, but scaling-in (disruption) is a problem for us. There are only two options: WhenUnderutilized | WhenEmpty.

Proposals

  • Add an option to check node utilization first; normally, if a node is fully utilized there is no need to disrupt it (a rough sketch of such a check follows this list).
  • Support consolidateAfter for WhenUnderutilized.
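
As a rough illustration of the first proposal (a sketch only, not this PR's implementation; the utilizationThreshold name is taken from the test report below, everything else here is hypothetical), the per-node check could look like:

    package sketch

    import "k8s.io/apimachinery/pkg/api/resource"

    // isUnderutilized reports whether a node's total requested resources fall
    // below a percentage threshold of its allocatable resources. Sketch only:
    // it looks at a single resource and ignores daemonsets, overhead, etc.
    func isUnderutilized(requested, allocatable resource.Quantity, thresholdPercent int64) bool {
        if allocatable.IsZero() {
            return false // no allocatable capacity reported; don't mark as underutilized
        }
        pct := float64(requested.MilliValue()) / float64(allocatable.MilliValue()) * 100
        return pct < float64(thresholdPercent)
    }

A node that passes such a check would then only become a disruption candidate after the consolidateAfter window proposed in the second bullet.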


linux-foundation-easycla bot commented Feb 7, 2024

CLA Not Signed

@Luke-Smartnews Luke-Smartnews marked this pull request as draft February 7, 2024 03:21
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Luke-Smartnews
Once this PR has been reviewed and has the lgtm label, please assign njtran for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Feb 7, 2024
@k8s-ci-robot
Contributor

Welcome @Luke-Smartnews!

It looks like this is your first PR to kubernetes-sigs/karpenter 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/karpenter has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 7, 2024
@k8s-ci-robot
Contributor

Hi @Luke-Smartnews. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 7, 2024
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 7, 2024
@@ -3,7 +3,8 @@ apiVersion: apiextensions.k8s.io/v1
 kind: CustomResourceDefinition
 metadata:
   annotations:
-    controller-gen.kubebuilder.io/version: v0.14.0
+    controller-gen.kubebuilder.io/version: v0.8.0
Member


please run make toolchain to update your controller-gen version

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 7, 2024
@Luke-Smartnews
Author

Tested in our dev env, working without any issues. Will do more load tests.

@almson

almson commented Feb 13, 2024

Can you write a description of how you've implemented this? This seems to drastically change how "WhenUnderutilized" works. Previously, it took effect when the cluster as a whole was underutilized, but now it seems to just mean that the node is underutilized?

@Bryce-Soghigian
Member

Bryce-Soghigian commented Feb 13, 2024

> Can you write a description of how you've implemented this? This seems to drastically change how "WhenUnderutilized" works.

Excited to see a POC!

Beyond a description, it should probably first be proposed as an RFC so the community can review it and discuss alternative approaches, or whether this is behavior the community wants to accept. Seems interesting though :)

@Luke-Smartnews
Author

Luke-Smartnews commented Feb 15, 2024

Will do after I confirm it works at a large scale. Also, is there a template for the RFC?

@Luke-Smartnews
Author

Test Report

  • nodepool settings

     spec:
       disruption:
         budgets:
         - nodes: 10%
         consolidateAfter: 5m
         consolidationPolicy: WhenUnderutilized
         expireAfter: Never
         utilizationThreshold: 65
  • confirmed the Underutilized condition is added to target nodes

      conditions:
      - lastTransitionTime: "2024-02-23T11:24:05Z"
        status: "True"
        type: Initialized
      - lastTransitionTime: "2024-02-23T11:22:54Z"
        status: "True"
        type: Launched
      - lastTransitionTime: "2024-02-23T11:24:05Z"
        status: "True"
        type: Ready
      - lastTransitionTime: "2024-02-23T11:23:40Z"
        status: "True"
        type: Registered
      - lastTransitionTime: "2024-02-23T11:31:53Z"
        severity: Warning
        status: "True"
        type: Underutilized
  • pending pods scheduled in 5m
    (screenshot attached)

  • nodes are removed 5m after the pods are deleted
    (screenshot attached)

@njtran
Contributor

njtran commented Mar 4, 2024

Hey @Luke-Smartnews, this is a core difference in how Karpenter considers underutilization. We've intentionally not surfaced a utilization threshold due to its edge cases and how it ends up driving overall lower utilization. This would be a huge change to our disruption logic, and would definitely require opening up a design + RFC. I'm not sure we would be accepting this feature (at least in its current state without a design), but I would love to hear about the use-cases you're trying to solve here.

@barryrobison

> Hey @Luke-Smartnews, this is a core difference in how Karpenter considers underutilization. We've intentionally not surfaced a utilization threshold due to its edge cases and how it ends up driving overall lower utilization. This would be a huge change to our disruption logic, and would definitely require opening up a design + RFC. I'm not sure we would be accepting this feature (at least in its current state without a design), but I would love to hear about the use-cases you're trying to solve here.

@njtran is the current utilisation logic documented somewhere? From my brief reading of the source, it looks like Karpenter does a "fake" scheduling run, and if the node's pods can be scheduled elsewhere it consolidates the node?

Our use case is Flink jobs performing data transfer. Those pods are bursty in nature, and we want to ensure they run for at least 10 minutes so they can reach a checkpoint before being rescheduled. The current system is reclaiming nodes in under 90s, so they barely have time to get scheduled before Karpenter pulls the rug.

Thank you!

@njtran
Contributor

njtran commented Mar 5, 2024

@barryrobison I think you want #752. And you're correct, that's how Karpenter considers consolidation.
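
For context, the simulated-scheduling idea discussed above can be pictured roughly like this. It is an illustrative sketch, not Karpenter's actual code: a node is a consolidation candidate when a simulated scheduling pass can place all of its reschedulable pods onto the remaining nodes.

    package sketch

    // pod and node are simplified stand-ins for this sketch; real scheduling also
    // considers memory, affinity, topology spread, taints, PDBs, and more.
    type pod struct{ cpuMilli int64 }

    type node struct{ spareCPUMilli int64 }

    // canConsolidate reports whether every pod from the candidate node fits into
    // the spare capacity of the other nodes in a first-fit simulated pass.
    func canConsolidate(candidatePods []pod, others []node) bool {
        spare := make([]int64, len(others))
        for i, n := range others {
            spare[i] = n.spareCPUMilli
        }
        for _, p := range candidatePods {
            placed := false
            for i := range spare {
                if spare[i] >= p.cpuMilli {
                    spare[i] -= p.cpuMilli
                    placed = true
                    break
                }
            }
            if !placed {
                return false
            }
        }
        return true
    }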

@Luke-Smartnews
Author

@njtran
My change only takes effect when the NodePool is configured with UtilizationThreshold; it doesn't change the existing behavior at all.

https://github.com/kubernetes-sigs/karpenter/pull/992/files#diff-c11b5a9240bbfac3dd14fdfff84e098ade0f3bf0a5fc63673de41ff710d3d308R105
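
To make the opt-in behavior described above concrete, here is a minimal sketch (hypothetical names, not the PR's actual diff): when the threshold is not configured, the new check filters nothing, so existing candidacy logic is untouched.

    package sketch

    // disruptionSpec stands in for the proposed NodePool disruption block;
    // nil means UtilizationThreshold was not configured.
    type disruptionSpec struct {
        UtilizationThreshold *int64 // percentage, 0-100
    }

    // passesUtilizationGate returns true when the node should still be considered
    // a disruption candidate as far as the new check is concerned. With no
    // threshold set it always returns true, leaving existing behavior unchanged.
    func passesUtilizationGate(spec disruptionSpec, nodeUtilizationPercent float64) bool {
        if spec.UtilizationThreshold == nil {
            return true
        }
        return nodeUtilizationPercent < float64(*spec.UtilizationThreshold)
    }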

@njtran
Contributor

njtran commented Mar 15, 2024

Hey @Luke-Smartnews, even if this is just a change to the candidacy, we'd need to review a design/RFC before continuing on this. Would you be able to open a design PR first?


This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 30, 2024
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 4, 2024
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 5, 2024
@njtran
Contributor

njtran commented Apr 8, 2024

Will close this until there's an RFC for this. Please feel free to re-open when you cut the RFC!

@njtran njtran closed this Apr 8, 2024
@booleanbetrayal

@njtran - Could you point us to the RFC process needed to unblock this PR? We are seeing worse bin-packing performance from Karpenter than from cluster-autoscaler and are debating switching back. Being able to leverage advanced consolidation features (eval time + thresholds) would put Karpenter on an equal footing for us. Please advise.
