GPU 0 is always used in a multi-GPU setup #139
Comments
You may try passing |
Passing |
I cannot figure out what is causing the issue. I think you should set the PCI ID of the device you want to use directly. This method requires a bit of setup but should never fail. First, before creating anything with SAPIEN, run |
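For reference, a minimal sketch of one way to list each GPU's index together with its PCI bus ID from Python; the nvidia-smi query is standard, while the exact SAPIEN call that consumes the PCI address is cut off above and is not reproduced here:

```python
# List GPU indices and their PCI bus IDs via nvidia-smi, so the desired
# device can be identified by PCI address rather than by index.
import subprocess

out = subprocess.run(
    ["nvidia-smi", "--query-gpu=index,pci.bus_id", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout
for line in out.strip().splitlines():
    index, bus_id = (field.strip() for field in line.split(","))
    print(f"GPU {index}: {bus_id}")
```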
Thank you very much for your answer! Sadly the result is still exactly the same. GPU 0 always gets used, even when selecting another GPU via PCI address. |
Are you using |
I'm actually having the same issue. |
System:
Describe the bug
SAPIEN always uses GPU 0 in a multi-GPU setup, in addition to the GPU specified by `CUDA_VISIBLE_DEVICES`.
To Reproduce
Run the same simulation twice, once with `CUDA_VISIBLE_DEVICES=0` and once with `CUDA_VISIBLE_DEVICES=1`, and check the GPU usage in each case (a hypothetical minimal script is sketched below).
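The following is a hypothetical minimal repro, assuming SAPIEN 2.x, where the renderer class is `sapien.core.SapienRenderer` (older releases use `VulkanRenderer`); the actual script behind the screenshots is not part of this thread.

```python
# Hypothetical repro sketch: create a SAPIEN engine, renderer, and scene,
# then keep stepping so GPU usage can be inspected with nvidia-smi.
# Run it twice, e.g.:
#   CUDA_VISIBLE_DEVICES=0 python repro.py
#   CUDA_VISIBLE_DEVICES=1 python repro.py
import sapien.core as sapien

engine = sapien.Engine()
renderer = sapien.SapienRenderer()  # the Vulkan-based renderer is what allocates GPU resources
engine.set_renderer(renderer)

scene = engine.create_scene()
scene.set_timestep(1 / 240)
scene.add_ground(0)

for _ in range(1_000_000):  # keep the process alive while watching nvidia-smi
    scene.step()
```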
Expected behavior
Checking the GPU usage, only the selected GPU should be used. For `CUDA_VISIBLE_DEVICES=0`, that is the case. For `CUDA_VISIBLE_DEVICES=1`, both GPU 0 and GPU 1 get used.
Screenshots
`CUDA_VISIBLE_DEVICES=0`: (screenshot of GPU usage)
`CUDA_VISIBLE_DEVICES=1`: (screenshot of GPU usage)
Additional context
Even though GPU 0 only gets used a little when `CUDA_VISIBLE_DEVICES=1`, this usage quickly adds up when running many parallel simulations. I am using ManiSkill2 for reinforcement learning on an HPC node with 4 NVIDIA A100 GPUs, and this bug severely limits the number of parallel environments I can run. Additionally, running many parallel environments becomes slow, since GPU 0 is used by every single simulation environment instead of by just a quarter of them.
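For context, a sketch of the kind of multi-GPU setup described above (same SAPIEN 2.x assumption as the repro sketch earlier), in which each environment worker is pinned to one of the 4 GPUs through `CUDA_VISIBLE_DEVICES` before SAPIEN is imported in that process; the worker body and counts are illustrative, not the actual ManiSkill2 training code.

```python
# Illustrative sketch (not the actual training code): spread N simulation
# workers over 4 GPUs by setting CUDA_VISIBLE_DEVICES per process, so each
# GPU should end up hosting roughly a quarter of the environments.
import os
import multiprocessing as mp

NUM_GPUS = 4
NUM_WORKERS = 16  # illustrative value

def worker(rank: int) -> None:
    # Pin this process to one GPU before SAPIEN is imported in it.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(rank % NUM_GPUS)
    import sapien.core as sapien

    engine = sapien.Engine()
    renderer = sapien.SapienRenderer()
    engine.set_renderer(renderer)
    scene = engine.create_scene()
    scene.set_timestep(1 / 240)
    for _ in range(100_000):
        scene.step()

if __name__ == "__main__":
    # "spawn" gives each worker a clean interpreter, so SAPIEN is only
    # initialized after CUDA_VISIBLE_DEVICES has been set in that process.
    mp.set_start_method("spawn")
    procs = [mp.Process(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Under the bug described in this issue, every one of these workers would still touch GPU 0 rather than only the GPU it was assigned.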