Add c6id and r6id adjusted limits to volume_limits.go #1961
Conversation
Correct volume limits for c6id and r6id instance types
Similar to m6id.* instance types, we need to make sure that volume limits are correct for c6id.* and r6id.* instance types.
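For context, a minimal sketch of the kind of change this involves. The names and numbers below (the map `dedicatedVolumeLimits`, the specific limit values, and the lookup helper) are illustrative assumptions, not copied from the actual volume_limits.go:

```go
package cloud

// Sketch only: instance types whose NVMe instance-store disks consume
// shared EBS attachment slots need an explicitly reduced volume limit.
// All names and numbers here are illustrative, not the real values.
var dedicatedVolumeLimits = map[string]int{
	"c6id.32xlarge": 19, // hypothetical: shared slots minus instance-store disks
	"r6id.32xlarge": 19, // hypothetical
}

// getVolumeLimit returns the adjusted limit for instance types with a
// dedicated entry, falling back to the generic default otherwise.
func getVolumeLimit(instanceType string, defaultLimit int) int {
	if limit, ok := dedicatedVolumeLimits[instanceType]; ok {
		return limit
	}
	return defaultLimit
}
```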
Hi @talnevo. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
Thank you @talnevo!
/lgtm
Thank you for this @talnevo! We will open a separate PR afterwards for other instance types like r6idn, but for now we can merge this.
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ConnorJC3

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Code Coverage Diff
Is this a bug fix or adding new feature?
This is a bug fix.
What is this PR about? / Why do we need it?
This PR fixes a situation where the Kubernetes scheduler sends more pods with volumes to a node than the node's volume limit allows, because the scheduler is unaware of the true volume limit of nodes based on these instance types.
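As an illustrative example with made-up numbers: if the driver reports an attachable-volume limit of 25 for a node whose hardware actually supports only 19 EBS attachments (because NVMe instance-store disks consume shared attachment slots), the scheduler may place up to 6 extra volume-bearing pods on the node, and their volumes can never attach.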
What testing is done?
No direct testing was performed. Kubernetes tests on nodes based on the r6id.32xlarge instance type showed a gap between the number of volume-requiring pods the Kubernetes scheduler allows on a node and the number of pods that are actually able to attach their volumes. Our investigation determined that volume_limits.go is the place to make the change that fixes this problem. Similar issues were observed last year with nodes based on m5d.16xlarge and m5d.24xlarge, and later with m6id.16xlarge and m6id.32xlarge; those older issues no longer exist. We concluded that a change to volume_limits.go introduced in March 2022 fixed the problem for m5d instance types, and our own PR for m6id in December 2022 fixed it for m6id instance types, so we want to do the same for r6id- and c6id-based nodes.
A similar PR for i4i instance types was introduced in July 2023.