[PROPOSAL] Project Elixir #3115
1SAA
started this conversation in
Development | Core
Replies: 1 comment
-
Hello, will this project be merged into the official Colossal-AI repository?
-
Elixir (also called Gemini2.0)
Elixir is a new feature intended to replace the current Gemini in ColossalAI. Elixir should not be viewed as an incremental update to Gemini. It is a newly designed feature that does the same job as Gemini but is based on different principles: Gemini allocates chunk footprints and profiles CUDA memory usage dynamically, while Elixir allocates and profiles statically before the model runs. Users can track the development of Elixir here.
cc @FrankLeeeee, @ver217, @YuliangLiu0306, @kurisusnowdeng.
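To make the static approach more concrete, here is a minimal sketch of planning chunk placement before a model runs. It is not Elixir's actual search algorithm: the helper names `plan_chunks` and `wasted_elements`, the greedy packing rule, and the parameter sizes are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def plan_chunks(param_sizes: Dict[str, int], chunk_size: int) -> List[List[str]]:
    """Greedily pack parameters into fixed-size chunks, in definition order.

    Hypothetical helper for illustration only; not part of ColossalAI or Elixir.
    Sizes are in elements. The plan is computed once, before any forward pass.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, size in param_sizes.items():
        if size > chunk_size:
            raise ValueError(f"{name} ({size}) does not fit in a chunk of {chunk_size}")
        # Close the current chunk when the next parameter would overflow it.
        if used + size > chunk_size:
            chunks.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        chunks.append(current)
    return chunks


def wasted_elements(param_sizes: Dict[str, int], chunk_size: int) -> int:
    """Padding left unused across all chunks of the plan."""
    plan = plan_chunks(param_sizes, chunk_size)
    return len(plan) * chunk_size - sum(param_sizes.values())


# Made-up parameter sizes for a toy model.
sizes = {"embed.weight": 600, "fc1.weight": 300, "fc1.bias": 50, "fc2.weight": 400}
plan = plan_chunks(sizes, chunk_size=1000)
waste = wasted_elements(sizes, chunk_size=1000)
```

Because the plan depends only on parameter sizes and the chosen chunk size, it can be computed and inspected without touching the GPU, which is the property the static design relies on.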
Motivation (Drawbacks of Gemini)
ColoParameter, a derived class from ColoTensor, which is used for "automatic" tensor parallelism. The poor compatibility of ColoTensor makes Gemini fail on a wide range of models. As we will deprecate ColoTensor and use DTensor instead for real automatic tensor parallelism, we can design a new parameter hook mechanism for all common models.
auto policy fails easily. Though the profiler detects enough memory space for future activations, there is not enough contiguous memory space, so an OOM error is raised instead.
What's New in Elixir
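The fragmentation failure described under Motivation can be reproduced with a toy free list. This is a sketch under simplifying assumptions, not a model of the real CUDA caching allocator: memory is a flat list of free blocks, and `total_free` and `can_allocate` are hypothetical helpers with made-up numbers.

```python
from typing import List, Tuple

# Each free block is (start_offset, length), in arbitrary units.
FreeList = List[Tuple[int, int]]


def total_free(free_blocks: FreeList) -> int:
    """Sum of all free space, regardless of where it sits."""
    return sum(length for _, length in free_blocks)


def can_allocate(free_blocks: FreeList, request: int) -> bool:
    """A contiguous request succeeds only if one single free block is large enough."""
    return any(length >= request for _, length in free_blocks)


# Three free gaps left between live tensors (made-up numbers).
free_blocks = [(0, 300), (500, 300), (1000, 300)]
request = 600

# What a profiler that only counts total free memory would conclude.
enough_in_total = total_free(free_blocks) >= request
# What the allocator actually needs: a single contiguous region.
fits_contiguously = can_allocate(free_blocks, request)
```

Here 900 units are free in total, yet no single gap can hold a 600-unit request, so the allocation fails even though the profiler's estimate looked safe.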
High Level Insights
Elixir consists of three main parts: SearchAlgorithm, ElixirModule and ElixirOptimizer. ElixirModule and ElixirOptimizer are designed to be coupled together. One ElixirOptimizer strictly corresponds to one specific ElixirModule.
forward and backward.
The high-level control flow is shown below:
Low Level Memory Management
The low-level memory management of chunks is based on this paper written by me.
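For readers unfamiliar with chunk-based management, the general idea can be sketched as follows: several flattened tensors share one contiguous buffer, and each tensor is located by an offset and a length. This is only an illustration of the concept, not the design from the paper or Elixir's real chunk type; the `Chunk` class and its methods are hypothetical, and a standard-library `array` stands in for device memory.

```python
from array import array
from typing import Dict, List, Tuple


class Chunk:
    """A fixed-capacity contiguous buffer holding several flattened tensors.

    Hypothetical class for illustration; Elixir's real chunk type may differ.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buffer = array("f", [0.0] * capacity)  # one contiguous allocation
        self.used = 0
        self.slots: Dict[str, Tuple[int, int]] = {}  # name -> (offset, length)

    def can_fit(self, length: int) -> bool:
        return self.used + length <= self.capacity

    def append(self, name: str, values: List[float]) -> None:
        """Copy a flattened tensor into the next free region of the buffer."""
        if not self.can_fit(len(values)):
            raise MemoryError(f"chunk is full: cannot append {name}")
        offset = self.used
        self.buffer[offset:offset + len(values)] = array("f", values)
        self.slots[name] = (offset, len(values))
        self.used += len(values)

    def read(self, name: str) -> List[float]:
        """Return a copy of the values stored for one tensor."""
        offset, length = self.slots[name]
        return list(self.buffer[offset:offset + length])


chunk = Chunk(capacity=8)
chunk.append("weight", [1.0, 2.0, 3.0, 4.0])
chunk.append("bias", [0.5, 0.25])
```

Moving or freeing the whole buffer at once, rather than each tensor separately, is what keeps allocations large and uniform in a chunk-based scheme.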