[k8s] Support Nvidia GFD Labels for GPU type detection #2460
Comments
This issue is stale because it has been open for 120 days with no activity. Remove the stale label or comment, or this will be closed in 10 days.
Based on the code that the GPU Feature Discovery maintainer pointed to in the related issue, the nvidia.com/gpu.product label value looks like the nvidia-smi product name with spaces replaced by hyphens:
$ kubectl describe node | grep nvidia.com/gpu.product
nvidia.com/gpu.product=NVIDIA-H100-80GB-HBM3
$ nvidia-smi --query-gpu=name --format=csv,noheader,nounits
NVIDIA H100 80GB HBM3
So it's probably enough to use logic similar to the labeler job's to compare the label against the canonical GPU names we have. I can draft a PR for this. It was getting hard to remember to rerun the labeler job every time I added a node to my k3s cluster, whereas this approach works directly with feature discovery. It also somewhat resolves #3432 for people running the NVIDIA GPU Operator.
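The comparison proposed above could look roughly like the following minimal sketch. It assumes, as the H100 output suggests, that GFD forms the label by joining the nvidia-smi name's words with hyphens. CANONICAL_GPU_NAMES is a hypothetical subset chosen for illustration, not SkyPilot's actual list, and both helper functions are illustrative names rather than existing SkyPilot code.

```python
from typing import Iterable, Optional

# Hypothetical subset of canonical accelerator names, for illustration only.
CANONICAL_GPU_NAMES = ['A10G', 'A100', 'A100-80GB', 'H100', 'L4', 'T4', 'V100']


def nvidia_smi_name_to_gfd_label(name: str) -> str:
    """Convert an nvidia-smi product name to the GFD label form.

    Assumes GFD joins whitespace-separated words with hyphens, as the
    H100 example in this thread suggests.
    """
    return '-'.join(name.split())


def match_canonical_gpu(
        label_value: str,
        canonical_names: Iterable[str] = CANONICAL_GPU_NAMES) -> Optional[str]:
    """Return the most specific canonical name whose tokens all appear in the label.

    Matching whole hyphen-separated tokens avoids false positives from plain
    substring checks (e.g. 'L4' inside 'L40S', or 'A10' inside 'A100').
    """
    label_tokens = set(label_value.upper().split('-'))
    best: Optional[str] = None
    best_len = 0
    for name in canonical_names:
        tokens = name.upper().split('-')
        # Prefer the candidate with the most matching tokens, so that
        # 'A100-80GB' wins over 'A100' for an 80GB A100 label.
        if all(t in label_tokens for t in tokens) and len(tokens) > best_len:
            best, best_len = name, len(tokens)
    return best
```

For example, match_canonical_gpu(nvidia_smi_name_to_gfd_label('NVIDIA H100 80GB HBM3')) returns 'H100'. An unrecognized product such as 'NVIDIA-L40S' returns None instead of being misread as an L4.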
Good point, if the
To detect the GPU type on the cluster, we currently support GKE labels and skypilot.co/accelerators labels created by our GPU labelling script (python -m sky.utils.kubernetes.gpu_labeler). It would be good to add a GPULabelFormatter for NVIDIA GPU Feature Discovery (GFD). To do so, we will need a list of the label values that NVIDIA GFD generates for SkyPilot-supported GPUs (e.g., nvidia.com/gpu.product: A100-SXM4-40GB).
Related issue: NVIDIA/k8s-device-plugin#739
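The explicit-list approach described above could be sketched as a lookup table wrapped in a small formatter. This is a hypothetical illustration. The class name GFDLabelFormatter, its method names, and the table are assumptions and are not taken from SkyPilot's GPULabelFormatter interface. The table contains only the two GFD values quoted in this issue, and mapping A100-SXM4-40GB to 'A100' assumes that SkyPilot's canonical name for the 40GB A100 is 'A100'.

```python
from typing import Dict, Optional

# Hypothetical lookup table from GFD nvidia.com/gpu.product values to
# SkyPilot accelerator names. Only the values quoted in this issue are
# listed; a real table would need the full list the issue asks for.
GFD_PRODUCT_TO_ACCELERATOR: Dict[str, str] = {
    'A100-SXM4-40GB': 'A100',
    'NVIDIA-H100-80GB-HBM3': 'H100',
}


class GFDLabelFormatter:
    """Illustrative formatter for NVIDIA GPU Feature Discovery labels.

    The method names are assumptions for this sketch, not SkyPilot's API.
    """

    LABEL_KEY = 'nvidia.com/gpu.product'

    @classmethod
    def get_label_key(cls) -> str:
        """Return the node label key that GFD uses for the GPU product."""
        return cls.LABEL_KEY

    @classmethod
    def get_accelerator_from_node_labels(
            cls, node_labels: Dict[str, str]) -> Optional[str]:
        """Return the accelerator name for a node, or None if it is unknown."""
        value = node_labels.get(cls.LABEL_KEY)
        if value is None:
            # The node has no GFD product label (e.g. GFD is not deployed).
            return None
        # Unlisted product strings also yield None rather than a guess.
        return GFD_PRODUCT_TO_ACCELERATOR.get(value)
```

Unlike token matching, an exact table never misclassifies a GPU. The trade-off is that every new product string GFD emits has to be added by hand, which is why the issue asks for the full list of label values.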