Replies: 3 comments 1 reply
-
Ahh! If I set devices=4 explicitly, then this works. Setting it to -1 does not work.
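For anyone who wants to avoid hard-coding the 4, here is a minimal sketch that derives the count from `nvidia-smi -L`. It assumes `nvidia-smi` is on the PATH and that the trainer accepts an integer `devices` value, as the `devices=4` workaround suggests. `parse_gpu_listing` and `count_gpus` are hypothetical helper names, not part of any library.

```python
import shutil
import subprocess


def parse_gpu_listing(listing: str) -> int:
    """Count GPUs in the text printed by `nvidia-smi -L`.

    Each physical device is listed on its own line starting with "GPU <n>:".
    """
    return sum(1 for line in listing.splitlines() if line.strip().startswith("GPU "))


def count_gpus() -> int:
    """Return the number of GPUs nvidia-smi reports, or 0 if it cannot be run."""
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return 0
    return parse_gpu_listing(result.stdout)


if __name__ == "__main__":
    # Use this number as the explicit `devices` value instead of -1.
    print(f"devices: {count_gpus()}")
```

This only works around the symptom; it does not explain why -1 fails on this machine.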
-
Right, I saw that and hence the post here. However, on this machine, which is an ordinary GCP machine with 4 L4s, it did not work.
-
How did you launch the job?
-
I have a model training on a 4-GPU machine, but it is only using 1 GPU.
The output when starting it was:
This makes me believe that it's seeing all of the devices, but not all of them are being used. I have
devices: -1
in the config as well. Running nvidia-smi shows that all of the GPUs were initialized, since each has at least 2 MB of memory in use, but only the first one is doing any processing.
Any idea what's up? This seems straightforward...
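To make the nvidia-smi check repeatable while debugging, the per-GPU numbers can be pulled programmatically. This is a rough sketch, assuming `nvidia-smi` is on the PATH; `parse_usage`, `idle_gpus`, `read_usage`, and the 5% threshold are hypothetical choices for illustration only.

```python
import subprocess

# Query index, utilization (%) and used memory (MiB) as bare CSV rows.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_usage(csv_text):
    """Parse rows like "0, 97, 14532" into (index, util_percent, memory_mib) tuples."""
    rows = []
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue
        try:
            rows.append((int(parts[0]), int(parts[1]), int(parts[2])))
        except ValueError:
            continue  # skip rows with non-numeric fields such as "[N/A]"
    return rows


def idle_gpus(rows, max_util=5):
    """Return indices of GPUs whose utilization is at or below max_util percent."""
    return [index for index, util, _mem in rows if util <= max_util]


def read_usage():
    """Run nvidia-smi and return parsed rows, or an empty list if it is unavailable."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return []
    return parse_usage(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    rows = read_usage()
    print("idle GPUs:", idle_gpus(rows))
```

Run it a few times during training: if the same three indices stay idle, the job is only driving one device even though all four were initialized.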