Question about Detailed Communication Cost and Memory Cost of Multi-Dimensional Parallelism #185
Dear developers, I have read the papers on 2D, 2.5D, and 3D parallelism, but I still have some questions about the communication cost and memory cost of Colossal-AI. Here is my question list: In 3D parallelism, I notice that Colossal-AI moves
Waiting for your responses, please help us~~
Replies: 3 comments 8 replies
Hi @ConnollyLeon, thank you for your question. @kurisusnowdeng is in charge of 3D parallelism. He will try to answer your question. :)
Hi, @ConnollyLeon. Thank you for your interest. I will answer your questions one by one, according to my understanding.
@kurisusnowdeng One more small question: why not use All-gather instead of Broadcast in 2D model parallelism? According to my analysis, the total communication time (specifically the bandwidth cost) would be smaller with All-gather.
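For anyone who wants to reproduce this kind of estimate, here is a rough sketch of the Broadcast versus All-gather comparison under the usual alpha-beta cost model, looking at the bandwidth term only. Everything in it is an assumption for illustration, not Colossal-AI's actual implementation or measured numbers: a q x q device grid, one block of `block_bytes` broadcast per step for q steps, a binomial-tree or scatter-plus-all-gather broadcast, a ring all-gather, and the hypothetical function names and `beta` (seconds per byte) parameter.

```python
import math


def broadcast_bandwidth_cost(p, n_bytes, beta, algorithm="tree"):
    """Bandwidth term of one broadcast of n_bytes among p ranks (textbook estimates)."""
    if algorithm == "tree":
        # Binomial tree: the full message is forwarded in ceil(log2(p)) rounds.
        return math.ceil(math.log2(p)) * n_bytes * beta
    if algorithm == "scatter_allgather":
        # Scatter followed by all-gather: about 2 * (p - 1) / p message volumes.
        return 2 * (p - 1) / p * n_bytes * beta
    raise ValueError(f"unknown algorithm: {algorithm}")


def ring_allgather_bandwidth_cost(p, n_bytes, beta):
    """Bandwidth term of a ring all-gather where each of p ranks contributes n_bytes."""
    return (p - 1) * n_bytes * beta


def compare_2d_collectives(q, block_bytes, beta):
    """Compare q per-step broadcasts against a single all-gather along one grid axis."""
    return {
        "q_tree_broadcasts": q * broadcast_bandwidth_cost(q, block_bytes, beta, "tree"),
        "q_scatter_allgather_broadcasts": q
        * broadcast_bandwidth_cost(q, block_bytes, beta, "scatter_allgather"),
        "one_ring_allgather": ring_allgather_bandwidth_cost(q, block_bytes, beta),
        # Receive-buffer size per rank: one block at a time vs. all q blocks at once.
        "buffer_bytes_broadcast": block_bytes,
        "buffer_bytes_allgather": q * block_bytes,
    }


if __name__ == "__main__":
    # Assumed values: 4 x 4 grid, 1 MiB blocks, 1 ns per byte.
    for name, value in compare_2d_collectives(4, 1 << 20, 1e-9).items():
        print(f"{name}: {value}")
```

Under these assumptions the single all-gather does have the smaller bandwidth term, but each rank has to buffer q blocks at once instead of one, so the comparison is a communication versus memory trade-off rather than a strict win for either side.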