
When I run train.py, only one GPU shows utilization #34

Open
csyuan opened this issue Mar 3, 2020 · 3 comments

csyuan commented Mar 3, 2020

I have 4 GPUs. When I run train.py with --num_samples 1 --gpu 4, only one GPU shows utilization.
Is it because the model does not support multiple GPUs?
But when I run search.py with --num_samples 16 --gpu 0.25, all GPUs show utilization.

arcelien (Owner) commented Mar 3, 2020

Although the code has only been tested with 1 GPU, you usually have to specify CUDA_VISIBLE_DEVICES to constrain the search.

Perhaps a newer version of Ray enforces this constraint instead?
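
For reference, a minimal sketch of that constraint from Python (PyTorch is assumed purely for illustration; the key point is that the variable must be set before any CUDA-using library is imported):

```python
import os

# Restrict this process to GPU 0 only. This must run before any
# CUDA context is created (i.e. before importing the framework),
# otherwise it has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch  # imported after setting the env var on purpose

print(torch.cuda.device_count())  # -> 1; only GPU 0 is visible
```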

csyuan (Author) commented Mar 3, 2020

I've added CUDA_VISIBLE_DEVICES=0,1,2,3 to the scripts.
search.py is fine: there, --gpu is a fractional value and --num_samples > 1, so Ray runs many trials concurrently and all GPUs show utilization.
But I mean train.py with --num_samples=1 --gpu=4, i.e. more than one GPU for a single trial, which translates to resources_per_trial={"gpu": 4}. Classification model training is very slow; it does not run on 4 GPUs, and only one GPU shows utilization.

So, when --num_samples=1 and --gpu > 1, the model cannot use multiple GPUs under Ray?
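
For anyone finding this later, a minimal sketch of the two resource configurations being compared, using Ray Tune's tune.run directly (the trainable below is a hypothetical stand-in, not this repo's code):

```python
import ray
from ray import tune

def trainable(config):
    # A single trial's training loop would run here. However many
    # GPUs the trial reserves, this is still one process; it only
    # uses more than one GPU if the code inside explicitly does.
    pass

ray.init()

# search.py-style: 16 trials, each reserving a quarter of a GPU.
# Ray packs 4 concurrent trials per GPU, so all 4 GPUs show load.
tune.run(trainable, num_samples=16, resources_per_trial={"gpu": 0.25})

# train.py-style: one trial reserving all 4 GPUs. The reservation
# only gates scheduling; it does not parallelize the model, so a
# single-GPU training loop still runs on just one device.
tune.run(trainable, num_samples=1, resources_per_trial={"gpu": 4})
```

In both cases resources_per_trial only controls how trials are scheduled onto devices; it never splits a single trial's computation across them.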

arcelien (Owner) commented Mar 4, 2020

I see; parallelizing a single model across multiple GPUs is unfortunately not supported.
