
When I run train.py, only one GPU shows utilization #34

Open
csyuan opened this issue Mar 3, 2020 · 3 comments

csyuan commented Mar 3, 2020

I have 4 GPUs. When I run train.py with --num_samples 1 --gpu 4, only one GPU shows utilization.
Is it because the model does not support multiple GPUs?
But when I run search.py with --num_samples 16 --gpu 0.25, all GPUs show utilization.

arcelien (Owner) commented Mar 3, 2020

Although the code has only been tested with 1 GPU, you usually have to specify CUDA_VISIBLE_DEVICES to constrain the search.

Perhaps a newer version of Ray enforces this constraint instead?
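
For reference, a minimal sketch of that constraint from Python (PyTorch is assumed purely for illustration; the key point is that the variable must be set before any CUDA-using library is imported):

```python
import os

# Restrict this process to GPU 0 only. This must run before any
# CUDA context is created (i.e. before importing the framework),
# otherwise it has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch  # imported after setting the env var on purpose

print(torch.cuda.device_count())  # -> 1; only GPU 0 is visible
```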

csyuan (Author) commented Mar 3, 2020

I've added CUDA_VISIBLE_DEVICES=0,1,2,3 to the scripts.
search.py is fine: there, --gpu is a fractional value and --num_samples > 1, so Ray runs many trials concurrently and all GPUs show utilization.
But I mean train.py with --num_samples=1 --gpu=4, i.e. more than one GPU for a single trial, which translates to resources_per_trial={"gpu": 4}. Classification model training is very slow; it does not run on 4 GPUs, and only one GPU shows utilization.

So, when --num_samples=1 and --gpu > 1, the model cannot use multiple GPUs under Ray?
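
For anyone finding this later, a minimal sketch of the two resource configurations being compared, using Ray Tune's tune.run directly (the trainable below is a hypothetical stand-in, not this repo's code):

```python
import ray
from ray import tune

def trainable(config):
    # A single trial's training loop would run here. However many
    # GPUs the trial reserves, this is still one process; it only
    # uses more than one GPU if the code inside explicitly does.
    pass

ray.init()

# search.py-style: 16 trials, each reserving a quarter of a GPU.
# Ray packs 4 concurrent trials per GPU, so all 4 GPUs show load.
tune.run(trainable, num_samples=16, resources_per_trial={"gpu": 0.25})

# train.py-style: one trial reserving all 4 GPUs. The reservation
# only gates scheduling; it does not parallelize the model, so a
# single-GPU training loop still runs on just one device.
tune.run(trainable, num_samples=1, resources_per_trial={"gpu": 4})
```

In both cases resources_per_trial only controls how trials are scheduled onto devices; it never splits a single trial's computation across them.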

arcelien (Owner) commented Mar 4, 2020

I see; parallelizing a single model across multiple GPUs is unfortunately not supported.
