-
Notifications
You must be signed in to change notification settings - Fork 547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Docker] Add --gpus all
for docker when GPU is available
#3833
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this @Michaelvll ! Will this affect the clouds that are still using sky/skylet/providers/command_runner.py
? Those clouds are still using ray's docker configure code path iirc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we remove other --gpus all
in j2 files like {azure,gcp,paperspace}-ray.yml.j2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I just removed them. For those clouds still using node providers, they are not affected by the current PR and will still work as current master, which might be fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh actually, should we only remove --gpus all
in clouds that support the new provisioner? For other clouds like paperspace, we should keep this in the ray file so the ray docker command runner can use this option to expose GPUs to the docker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paperspace is using new provisioner, and it seems none of the clouds using the old provisioned support docker as a runtime? : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good point! Then maybe we could remove the old docker command runner now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Let's keep it for reference for now. We can probably remove it in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating this @Michaelvll ! LGTM.
There seems no harm to add
--gpus all
when GPUs exist, which is a newer version of the option--runtime nvidia
and seems docker's official doc is using this: https://docs.docker.com/engine/containers/resource_constraints/#access-an-nvidia-gpuTested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh