feat: Add options to configure consolidation timeouts #1031
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: domgoodwin. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Welcome @domgoodwin!
Hi @domgoodwin. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This feature starts to look similar to our batch duration parameters, which we were a little iffy on whether we should have surfaced in the first place. Do you also set the batch duration parameters to custom values overriding the defaults? The other question I have: what would you set these values to if you could override them as proposed here? And what's the size of your cluster that doesn't work well with the current timeout values?
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity. |
I haven't changed the batch duration parameters currently, as I figured those were more around scale-out, which seems to work OK. We do get a bit of rubber-banding where we add more nodes than we need and then consolidate down, but honestly the speed trade-off from not having to run any headroom pods seems worth it. We are actually running this code, as our cluster just wasn't scaling in at all with the current timeouts. We've set them to 10m, although based on metrics 5m would be fine. We initially just threw more CPU at Karpenter hoping it would help, but it seemingly only ever uses 2 CPU cores.
I'm hesitant to make this configurable, as changing these knobs can have unknown impacts on the performance of consolidation. Can you share what cluster sizes you're running at? Are you using any complex scheduling constraints like anti-affinity? If anything, I can see making this scale with the size of the cluster in the long run, but I'd be concerned with giving users free rein over this.
The cluster runs anywhere from 25k pods at peak down to 15k overnight, across 250-500 nodes. We used to have anti-affinities but removed them across the board and saw a significant scheduling performance increase.
How did you come to these numbers? Did you test them out yourself? I can understand that these timeouts shouldn't be one-size-fits-all, and it sounds like the numbers you're proposing would work better for your situation. Especially when adding an API surface (even behind a feature gate), this is the type of feature that would still require an RFC. Do you mind writing one up? Feel free to reach out to the maintainers on the Kubernetes Slack to figure out how to write one, or check out the existing RFCs in the repo.
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity. |
Issue: #903
Description
Adds option values to configure the timeouts for multi-node and single-node consolidation. Depending on the size of a cluster and its various affinities/topology spreads, consolidation can take longer than the current hard-coded values allow. Making them configurable means large, complex clusters can opt into longer timeouts if desired.
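
For illustration only, here is a minimal Go sketch of how such timeouts might be surfaced as operator settings. The flag names and default values below are hypothetical stand-ins for the constants currently hard-coded in the consolidation controllers, not the actual option names added by this PR:

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// ConsolidationTimeouts holds the proposed tunables. When the flags are not
// set, the fields fall back to defaults standing in for the existing
// hard-coded values.
type ConsolidationTimeouts struct {
	MultiNode  time.Duration
	SingleNode time.Duration
}

// parseConsolidationTimeouts parses hypothetical duration flags from args.
func parseConsolidationTimeouts(args []string) (ConsolidationTimeouts, error) {
	t := ConsolidationTimeouts{}
	fs := flag.NewFlagSet("karpenter", flag.ContinueOnError)
	// Flag names and defaults are illustrative, not the PR's real API surface.
	fs.DurationVar(&t.MultiNode, "multi-node-consolidation-timeout", 1*time.Minute,
		"timeout for a multi-node consolidation attempt")
	fs.DurationVar(&t.SingleNode, "single-node-consolidation-timeout", 3*time.Minute,
		"timeout for a single-node consolidation attempt")
	if err := fs.Parse(args); err != nil {
		return ConsolidationTimeouts{}, err
	}
	return t, nil
}

func main() {
	// Example: a large cluster raising both timeouts to 10 minutes,
	// as described in the discussion above.
	t, err := parseConsolidationTimeouts([]string{
		"--multi-node-consolidation-timeout=10m",
		"--single-node-consolidation-timeout=10m",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("multi-node: %s, single-node: %s\n", t.MultiNode, t.SingleNode)
}
```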
How was this change tested?
I set this version as the Karpenter version in the aws-provider repo, built an image, and deployed it on a test cluster. With the value both set and unset (defaulting to the existing values), everything worked as expected.
i.e.
The unit test changes also cover setting the value to a non-default and verifying that the run times out after that duration.
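
As a sketch of the behaviour such a test would exercise (this is not the PR's actual code or tests, and the `consolidate` function and candidate count are purely illustrative), a configurable timeout bounding a consolidation pass via `context.WithTimeout` looks roughly like this:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// consolidate stands in for a single consolidation pass; it checks the
// context between candidate evaluations and stops once the deadline hits.
func consolidate(ctx context.Context, candidates int) error {
	for i := 0; i < candidates; i++ {
		select {
		case <-ctx.Done():
			return ctx.Err() // timed out: give up on the remaining candidates
		default:
			time.Sleep(10 * time.Millisecond) // simulate evaluating one candidate
		}
	}
	return nil
}

func main() {
	// A non-default timeout, analogous to overriding the hard-coded value.
	timeout := 50 * time.Millisecond
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	if err := consolidate(ctx, 100); errors.Is(err, context.DeadlineExceeded) {
		fmt.Println("consolidation pass timed out after", timeout)
	}
}
```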