Query Regarding ZeRO-1 in ColossalAI Not Sharding Optimizer State #4328
yhna940 asked this question in Community | Q&A
I have recently been studying the ZeRO-1 strategy implemented by ColossalAI and have noticed something that seems unusual. As far as I understand, ColossalAI uses the LowLevelZeroOptimizer for its ZeRO-1 strategy.
According to the relevant literature, ZeRO-1 should shard the optimizer state, akin to what is done in fairscale's OSS or torch's ZeroRedundancyOptimizer. However, while reading through the inner workings of LowLevelZeroOptimizer, I could not find any place where the optimizer's state is sharded. I was able to confirm that it shards the gradients and parameters, but not the optimizer state.
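For context, by "sharding the optimizer state" I mean something along the lines of the rough sketch below, using torch's ZeroRedundancyOptimizer as a reference point (the model, dimensions, and hyperparameters are just placeholders): each rank builds the Adam moments only for the parameter shard it owns, so optimizer memory is partitioned across ranks instead of being replicated on every one.

```python
# Rough sketch of optimizer-state sharding via torch's ZeroRedundancyOptimizer.
# The model and hyperparameters are placeholders; run under torchrun with one
# process per GPU so that torch.distributed is initialized.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.optim import ZeroRedundancyOptimizer

dist.init_process_group(backend="nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

model = DDP(torch.nn.Linear(1024, 1024).cuda(), device_ids=[rank])

# Each rank keeps Adam state (exp_avg, exp_avg_sq) only for the parameter
# shard it owns, rather than holding the full optimizer state everywhere.
optimizer = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.Adam,
    lr=1e-3,
)

loss = model(torch.randn(8, 1024, device="cuda")).sum()
loss.backward()
optimizer.step()  # updates the local shard, then syncs parameters across ranks
```

My expectation was that LowLevelZeroOptimizer in stage 1 would partition its inner optimizer's state in a comparable way.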
I am seeking verification of my understanding here. Is it indeed the case that ColossalAI's ZeRO-1 does not shard the optimizer state, or am I missing something? I would appreciate any insights or clarifications you can provide.
Thank you!