Fix gpus not being considered when counting allocatables #2108

ElijahQuinones · 2024-08-12T15:17:54Z

Is this a bug fix or adding new feature?
Bug fix. Addresses part of #2105: GPUs were not being considered when counting allocatable volumes.

What is this PR about? / Why do we need it?
This PR fixes the bug referenced above.

What testing is done?
Compared the results manually against the list of GPU instances, and created unit tests for the instances mentioned in the ticket and for a GPU instance with 4 GPUs.

k8s-ci-robot · 2024-08-12T15:18:03Z

Hi @ElijahQuinones. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

torredil · 2024-08-12T15:19:26Z

/ok-to-test

pkg/driver/node_test.go

hack/generate-gpu-count-table.sh

rdpsin · 2024-08-12T19:56:50Z

Should we do some manual testing by setting up a cluster with GPU instances?

pkg/cloud/volume_limits.go

ElijahQuinones · 2024-08-13T01:07:48Z

> Should we do some manual testing by setting up a cluster with GPU instances?

I have manually tested the limit outside of Kubernetes entirely: I spun up a g4ad.xlarge instance and attached EBS volumes by hand until I hit the limit, after which every further volume stays stuck in the attaching state. The test showed that 24 volumes can be attached to a g4ad.xlarge, which matches the documentation.

The slot arithmetic gives the same result:

starts at 28
-1 ENI
-1 root volume
-1 GPU
-1 instance store
so we also end up with 24

Additionally, the unit tests for a g4ad.xlarge return the expected value of 24 slots.

However, if we feel it would cover more bases, I can run the test with a g4ad.xlarge cluster just to make sure there are no gotchas.

ConnorJC3

I've personally done enough manual testing to be confident it works.

Needs squash, otherwise lgtm.

github-actions · 2024-08-13T15:29:29Z

Code Coverage Diff

File	Old Coverage	New Coverage	Delta
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/cloud/volume_limits.go	30.2%	27.1%	-3.1

torredil

Thanks for including additional unit tests.
/lgtm

AndrewSirenko · 2024-08-13T17:11:20Z

/approve

k8s-ci-robot · 2024-08-13T17:11:26Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AndrewSirenko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [AndrewSirenko]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ConnorJC3 · 2024-08-13T17:15:20Z

/label tide/merge-method-squash

…les (kubernetes-sigs#2108) * Fix gpus not being considered when counting allocatables * Parallelize volume limits table generating scripts refactor volume limits unit tests add go doc comment to GetReservedSlotsForInstanceType

Fix gpus not being considered when counting allocatables

546c936

k8s-ci-robot requested review from ConnorJC3 and torredil August 12, 2024 15:17

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 12, 2024

k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 12, 2024

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 12, 2024

AndrewSirenko reviewed Aug 12, 2024

pkg/driver/node_test.go

AndrewSirenko reviewed Aug 12, 2024

hack/generate-gpu-count-table.sh Outdated

AndrewSirenko reviewed Aug 12, 2024

pkg/cloud/volume_limits.go

AndrewSirenko reviewed Aug 12, 2024

pkg/cloud/volume_limits.go

Parallelize volume limits table generating scripts refactor volume limits unit tests add go doc comment to GetReservedSlotsForInstanceType

d5f6891

ConnorJC3 approved these changes Aug 13, 2024

k8s-ci-robot assigned ConnorJC3 Aug 13, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 13, 2024

torredil approved these changes Aug 13, 2024

k8s-ci-robot assigned torredil Aug 13, 2024

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 13, 2024

k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Aug 13, 2024

k8s-ci-robot merged commit 911b991 into kubernetes-sigs:master Aug 13, 2024
19 checks passed

ElijahQuinones mentioned this pull request Aug 14, 2024

Incorrect allocatable volumes count in csinode for AWS vt1*/g4* instance types #2105

Closed

mpatlasov mentioned this pull request Aug 20, 2024

OCPBUGS-37088: UPSTREAM: 2108, 2115: Fix allocatable volumes count for vt1 and g4 openshift/aws-ebs-csi-driver#274

Merged

ElijahQuinones mentioned this pull request Sep 24, 2024

REQUEST: New membership for ElijahQuinones kubernetes/org#5177

Closed

11 tasks
