Can Gemini's ZeRO approach initialize a GPT-3-scale model? #2479
Unanswered
yhcc asked this question in Community | Q&A
Replies: 3 comments 4 replies
-
How shard init works: ColoInitContext initializes the model one parameter at a time. For each parameter it first allocates the global tensor on every process, then splits it into N shards so each process keeps only 1/N of the data. Once initialization finishes, each process therefore holds only 1/N of the memory; a minimal sketch of the idea follows below.
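A minimal PyTorch sketch of that principle (illustrative only, not ColossalAI's actual implementation; the function and tensor names here are made up):

```python
import torch
import torch.distributed as dist

def shard_init_parameter(full_param: torch.Tensor, rank: int, world_size: int) -> torch.Tensor:
    """Keep only this rank's 1/N shard of a freshly initialized global tensor."""
    shards = torch.chunk(full_param, world_size, dim=0)
    local_shard = shards[rank].clone()  # copy so the full tensor's storage can be freed
    return local_shard

# Usage sketch (assumes torch.distributed has been initialized):
# rank, world_size = dist.get_rank(), dist.get_world_size()
# full = torch.nn.init.normal_(torch.empty(4096, 4096))  # global tensor allocated on every rank
# local = shard_init_parameter(full, rank, world_size)
# del full  # drop the global copy; per-rank memory falls to ~1/world_size
```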
-
Hi @yhcc @taishiciR, for shard init you can refer to here; a hedged sketch of what that looks like is included below.
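For reference, a sketch of how shard init could be enabled in that Gemini GPT demo around the time of this thread. This is an assumption about the API of that era: the exact imports and keyword arguments (ColoInitContext, ShardSpec, ProcessGroup) vary between ColossalAI versions, and model_builder is a placeholder.

```python
import torch
from colossalai.utils import get_current_device
from colossalai.utils.model.colo_init_context import ColoInitContext
from colossalai.tensor import ProcessGroup, ShardSpec

world_size = torch.distributed.get_world_size()

# Shard every parameter across all ranks at initialization time,
# so no single rank ever materializes the full GPT-3-scale model.
shard_pg = ProcessGroup(tp_degree=world_size)          # assumption: API as in the 0.2.x demo
default_dist_spec = ShardSpec([-1], [world_size])      # shard along the last dim

with ColoInitContext(device=get_current_device(),
                     default_dist_spec=default_dist_spec,
                     default_pg=shard_pg):
    model = model_builder()  # placeholder for the GPT model construction in the demo
```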
-
https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/gpt/gemini/train_gpt_demo.py provides a Gemini-based ZeRO setup. I tried to adapt it to directly support a GPT-3-scale model, but that leads to OOM. Looking at the GeminiDDP source, I don't see any code that shards parameters across devices (maybe I'm misreading it). Is there a recommended way to train a GPT-3-scale model with this Gemini setup?