Hello everyone! Recently, I ran into a problem when submitting DP-GEN jobs to a GPU node using the CH4 example. On our machine (which uses the LSF job scheduler), one node contains 4 GPUs and 28 cores. When I set "strategy": {"if_cuda_multi_devices": true} and "gpu_per_node": 4 in machine.json, the following lines
"DEEPMD INFO computing device: gpu:0
334 DEEPMD INFO CUDA_VISIBLE_DEVICES: 1
335 DEEPMD INFO Count of visible GPU: 1"
are shown in train.log.
If I instead set "strategy": {"if_cuda_multi_devices": false}, the following lines
"DEEPMD INFO computing device: gpu:0
334 DEEPMD INFO CUDA_VISIBLE_DEVICES: 0,1,2,3
335 DEEPMD INFO Count of visible GPU: 4"
are shown in train.log.
In both cases, all 4 models are trained on gpu:0.
I also used nvidia-smi to check the GPU status and found that all 4 jobs are running on GPU 0. How can I solve this problem?
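One thing that may help when reading the logs above: as far as I understand standard CUDA behavior, the "gpu:0" in "computing device" is a logical index counted within CUDA_VISIBLE_DEVICES, not necessarily physical GPU 0. The sketch below only illustrates that mapping. The helper names are hypothetical and not part of DP-GEN or DeePMD-kit, and it assumes the entries are plain device ids in the same order that nvidia-smi reports (this can differ depending on CUDA_DEVICE_ORDER).

```python
from typing import List


def visible_devices(cuda_visible_devices: str) -> List[str]:
    """Split a CUDA_VISIBLE_DEVICES value into its entries, keeping their order."""
    return [item.strip() for item in cuda_visible_devices.split(",") if item.strip()]


def physical_id(cuda_visible_devices: str, logical_index: int) -> str:
    """Return the entry that a process would address as gpu:<logical_index>.

    Simplified illustration: real CUDA parsing has extra rules
    (for example, handling of invalid entries) that are not modeled here.
    """
    devices = visible_devices(cuda_visible_devices)
    if not 0 <= logical_index < len(devices):
        raise IndexError("logical index is outside the visible device list")
    return devices[logical_index]


# The two CUDA_VISIBLE_DEVICES values reported in train.log above.
for value in ("1", "0,1,2,3"):
    print(f"CUDA_VISIBLE_DEVICES={value}: gpu:0 -> entry {physical_id(value, 0)}")
```

If this assumption holds, the first log ("CUDA_VISIBLE_DEVICES: 1") would mean that task addresses physical GPU 1 as gpu:0, while in the second case every task sees all four GPUs and gpu:0 is the first entry for each of them. This does not explain why nvidia-smi shows all 4 jobs on GPU 0 in both cases, so comparing the CUDA_VISIBLE_DEVICES line in each of the 4 train.log files seems like a reasonable next check.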