[k8s] GPU Feature discovery label formatter #3493

Merged: 27 commits, Jun 6, 2024

Conversation

asaiacai
Contributor

@asaiacai asaiacai commented Apr 27, 2024

Resolves #2460

This allows k8s to consume the node label nvidia.com/gpu.product created by GPU Feature Discovery (GFD), which is commonly deployed through the NVIDIA GPU Operator.
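As a rough illustration of the idea (not the code merged in this PR), mapping a nvidia.com/gpu.product label value to an accelerator name could look like the sketch below. The function name `accelerator_from_gfd_label`, the name list (taken from the GPU types in the test plan), the 80GB special case, and the sample label values are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical accelerator names, taken from the GPU types listed in the
# test plan of this PR. 'A100-80GB' is handled separately below.
ACCELERATOR_NAMES = ['A100', 'A10G', 'H100', 'V100', 'P100', 'T4', 'P4', 'L4']


def accelerator_from_gfd_label(value: str) -> Optional[str]:
    """Map a nvidia.com/gpu.product label value to an accelerator name.

    Illustrative only: returns None when no known name is found.
    """
    normalized = value.upper()  # compare case-insensitively
    # Assumed convention: an A100 whose label mentions 80GB is 'A100-80GB'.
    if 'A100' in normalized and '80GB' in normalized:
        return 'A100-80GB'
    for name in ACCELERATOR_NAMES:
        # Match the name as a whole token so that 'P4' does not match
        # 'P4000' and 'L4' does not match 'L40S'.
        pattern = rf'(?<![A-Z0-9]){re.escape(name)}(?![A-Z0-9])'
        if re.search(pattern, normalized):
            return name
    return None
```

For example, under these assumptions `accelerator_from_gfd_label('Tesla-T4')` gives `'T4'`, and a label value with no listed name (such as `'Quadro-P4000'`) gives `None`.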

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Manual test: test against GKE labels (tested against T4)
  • Manual test: test against skypilot labeler script labels on EKS deployed via eks_test_cluster.yaml
  • Manual tests: deploy k3s with gpu-operator using deploy_k3s.sh, modified to exclude the skypilot k8s labeler, and ensure the following commands run:
# check nvidia-smi and nvidia.com/gpu.product info
nvidia-smi --query-gpu=name --format=csv,noheader,nounits
kubectl describe node | grep nvidia.com/gpu.product
# test skypilot against gpu type
sky show-gpus --cloud kubernetes
sky launch --cloud kubernetes --gpus <GPU_TYPE>
  • A100-80GB
  • A100
  • H100
  • T4
  • V100
  • A10G
  • P100
  • P4
  • L4
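
The label check in the manual test above (kubectl describe node | grep nvidia.com/gpu.product) can also be scripted. A minimal sketch, assuming the input is the JSON printed by `kubectl get nodes -o json`; the function name `gpu_products_by_node` is hypothetical and not part of this PR.

```python
import json
from typing import Dict

GFD_LABEL_KEY = 'nvidia.com/gpu.product'


def gpu_products_by_node(node_list_json: str) -> Dict[str, str]:
    """Return {node name: GFD product label} for every labeled node.

    Expects the NodeList JSON produced by `kubectl get nodes -o json`.
    Nodes without the GFD label are skipped.
    """
    node_list = json.loads(node_list_json)
    products: Dict[str, str] = {}
    for node in node_list.get('items', []):
        metadata = node.get('metadata', {})
        labels = metadata.get('labels') or {}
        if GFD_LABEL_KEY in labels:
            products[metadata.get('name', '')] = labels[GFD_LABEL_KEY]
    return products
```

An empty result would suggest that GPU feature discovery has not labeled any node yet.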

@asaiacai asaiacai marked this pull request as ready for review May 7, 2024 00:24
Collaborator

@Michaelvll Michaelvll left a comment

This is awesome @asaiacai! It looks very reasonable to me. @romilbhardwaj for another look to make sure it does not break our other formatters : )

sky/provision/kubernetes/utils.py (resolved)
@Michaelvll Michaelvll requested a review from romilbhardwaj May 10, 2024 18:10
Co-authored-by: Zhanghao Wu <[email protected]>
Collaborator

@romilbhardwaj romilbhardwaj left a comment

Thanks @asaiacai!

sky/provision/kubernetes/utils.py (outdated, resolved)
sky/provision/kubernetes/utils.py (resolved)
sky/provision/kubernetes/utils.py (outdated, resolved)
tests/kubernetes/scripts/deploy_k3s.sh (resolved)
Collaborator

@romilbhardwaj romilbhardwaj left a comment

Thanks @asaiacai. Tested on A100 and H100 from Lambda. Left a comment about documenting that this label formatter cannot be used with autoscaling; otherwise lgtm!

sky/provision/kubernetes/utils.py (resolved)
@asaiacai
Contributor Author

asaiacai commented Jun 4, 2024

just added the docstring @romilbhardwaj. Thanks for the review! lmk if this needs anything else.

Collaborator

@romilbhardwaj romilbhardwaj left a comment

Thanks @asaiacai!

@romilbhardwaj romilbhardwaj merged commit cb858b5 into skypilot-org:master Jun 6, 2024
20 checks passed
@asaiacai asaiacai deleted the gfd_formatter branch June 6, 2024 07:39
Michaelvll added a commit that referenced this pull request Aug 23, 2024
* GFDLabel formatter for k8s

* update comment

* format

* substring match against k8s labels instead of strict matching

* cleanup

* use k8s label

* map k8s label value to accelerator instead of accelerator to label value

* remove unused get_gke_accelerator_name

* remove get acc from value func

* pattern match against A100'

* pattern match against A100'

* format

* fix typo

* format

* re.search

* compare strings

* add P4000

* format

* lower case for check

Co-authored-by: Zhanghao Wu <[email protected]>

* force upper case

* match skypilot labeler logic

* format.sh

* add docstring

* fix class docstring

* grammar fix

* format

---------

Co-authored-by: Zhanghao Wu <[email protected]>
Development

Successfully merging this pull request may close these issues.

[k8s] Support Nvidia GFD Labels for GPU type detection
3 participants