Hi, I am training a Llama2-7b model with Megatron-LM on four H20 nodes (32 GPUs in total). The parallel strategy is TP=8 / PP=2 / DP=2. I would like to know how much data is communicated within each parallel group. Is there a parameter or setting that reports these values, and if not, where in the code can I measure them? Thank you.
This discussion was converted from issue #969 on September 04, 2024 18:34.
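As far as I know there is no ready-made argument that reports per-group byte counts, but one way to get rough numbers is to intercept the collectives in `torch.distributed` and tally how many bytes pass through each process group. Below is a minimal sketch of that idea; `comm_bytes`, `group_labels`, and `register_megatron_groups` are illustrative names I introduce here, not Megatron-LM APIs. Only `all_reduce` is instrumented: tensor-parallel traffic in Megatron also goes through all-gather / reduce-scatter variants, and pipeline traffic goes through point-to-point send/recv, which would need the same wrapping.

```python
# Sketch only: wrap torch.distributed.all_reduce and count bytes per process group.
# comm_bytes and group_labels are hypothetical bookkeeping structures for illustration.
import torch.distributed as dist

comm_bytes = {}      # label -> accumulated bytes moved through instrumented collectives
group_labels = {}    # ProcessGroup object -> human-readable label ("tp", "pp", "dp")

_orig_all_reduce = dist.all_reduce

def _all_reduce_with_accounting(tensor, op=dist.ReduceOp.SUM, group=None, async_op=False):
    # Attribute the tensor's size to whichever group this collective runs on.
    label = group_labels.get(group, "other")
    comm_bytes[label] = comm_bytes.get(label, 0) + tensor.numel() * tensor.element_size()
    return _orig_all_reduce(tensor, op=op, group=group, async_op=async_op)

dist.all_reduce = _all_reduce_with_accounting

def register_megatron_groups():
    # Call this after Megatron-LM has initialized model parallelism.
    # These accessors exist in megatron.core.parallel_state in recent versions,
    # but check your checkout if the import or the names differ.
    from megatron.core import parallel_state
    group_labels[parallel_state.get_tensor_model_parallel_group()] = "tp"
    group_labels[parallel_state.get_pipeline_model_parallel_group()] = "pp"
    group_labels[parallel_state.get_data_parallel_group()] = "dp"
```

After a few iterations you can print `comm_bytes` on each rank (or reduce it across ranks) to see how much each group communicated. Profiling tools such as Nsight Systems can also capture the NCCL collectives issued by each group without modifying the code.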