Relationship between OMP_NUM_THREADS and no. nodes/CPUs #3904
Unanswered
BramVanroy asked this question in Q&A
Replies: 1 comment
-
DeepSpeed typically launches one process per GPU, not one per node, and each process handles one GPU and its associated computation. So in your example, with nodes that have 4 GPUs and 32 CPU cores, you would set OMP_NUM_THREADS=8 (32/4), matching the number of CPU cores available per GPU. Each per-GPU process then gets 8 threads of its own, and the four processes together use exactly the node's 32 cores.
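If it helps, the arithmetic can be sketched in a few lines of Python. This is only an illustration of the 32/4 split under the one-process-per-GPU assumption: the helper name omp_threads_per_process is made up for this example and is not part of DeepSpeed, and the core and GPU counts are the ones from the question.

```python
import os


def omp_threads_per_process(cpu_cores_per_node: int, gpus_per_node: int) -> int:
    """Split a node's CPU cores evenly across its per-GPU processes.

    Hypothetical helper for illustration; assumes one process per GPU.
    """
    if cpu_cores_per_node < 1 or gpus_per_node < 1:
        raise ValueError("core and GPU counts must be positive")
    # Floor division, but never hand a process fewer than one thread.
    return max(1, cpu_cores_per_node // gpus_per_node)


# Numbers from the question: 32 CPU cores and 4 GPUs per node.
threads = omp_threads_per_process(32, 4)

# Build an environment that could be passed to the launched processes.
env = dict(os.environ, OMP_NUM_THREADS=str(threads))
print(f"OMP_NUM_THREADS={env['OMP_NUM_THREADS']}")  # prints OMP_NUM_THREADS=8
```

If some cores are reserved for other work, such as data loader workers, you may want to pass a smaller core count than the node total.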
-
Usually, I set OMP_NUM_THREADS to the number of CPUs I have available per GPU. So if I have a cluster with nodes with 4 GPUs and 32 CPU cores, I'd set OMP_NUM_THREADS=8 (32/4), on the assumption that every GPU is driven by a dedicated process, which can then make use of its 8 designated threads.

But does DeepSpeed work like this, too? Does DeepSpeed launch one process per GPU, or one process per node? In other words, in the example above, should OMP_NUM_THREADS be 8 (one process per GPU) or the full 32 (one process per node)?