[Bug Fix] Balance transformer blocks across shards #41

abourramouss · 2023-11-26T01:47:17Z

As we were discussing, the current implementation works like this:

It first gives an equal number of parameters to each shard.
If a transformer block would be split across different shards, prevent the split and keep the whole block in the current shard.

This way, we can guarantee that each shard/partition gets an equal number of transformer blocks.

But there is an edge case: if we specify 5 shards and the model has 6 transformer blocks, then:

Shards 1 to 3 get 2 transformer blocks each, shard 4 gets the final layers, and shard 5 gets nothing.
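
To make the edge case concrete, here is a minimal sketch that reproduces it on a toy model. It is not the actual partitioner code: `shard_by_params`, the unit names, and the parameter counts (100 per block, 10 for the final layers) are all assumptions chosen for illustration.

```python
def shard_by_params(units, n_shards):
    """Greedy split by parameter budget that never splits a unit.

    `units` is a list of (name, n_params) pairs; each pair stands for an
    indivisible piece of the model (a whole transformer block or the
    final layers). Hypothetical helper, for illustration only.
    """
    budget = sum(n_params for _, n_params in units) / n_shards
    shards = [[] for _ in range(n_shards)]
    idx, used = 0, 0
    for name, n_params in units:
        # Keep the whole unit in the current shard, even if it overshoots.
        shards[idx].append(name)
        used += n_params
        # Move on only after the current shard has reached its budget.
        if used >= budget and idx < n_shards - 1:
            idx += 1
            used = 0
    return shards


# Toy model: 6 equally sized blocks followed by small final layers.
units = [(f"transformer_h_{i}", 100) for i in range(6)] + [("final_layers", 10)]
shards = shard_by_params(units, n_shards=5)
for i, shard in enumerate(shards, start=1):
    print(f"shard {i}: {shard}")
```

With these numbers the per-shard budget is 122 parameters, so every shard needs two blocks to reach it, and the blocks run out after shard 3.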

To prevent this, if parameter balance is not really important, we could shard based on transformer blocks instead: with 5 shards specified, shard 1 would get 2 transformer blocks and shards 2 to 5 would get 1 transformer block each.
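
A rough sketch of that block-based split, assuming the blocks are simply numbered from 0 and that any leftover blocks go to the first shards. `shard_by_blocks` is a hypothetical name, not a function from this PR.

```python
def shard_by_blocks(n_blocks, n_shards):
    """Spread whole transformer blocks over shards as evenly as possible.

    Returns one list of block indices per shard. The first
    `n_blocks % n_shards` shards receive one extra block.
    """
    base, extra = divmod(n_blocks, n_shards)
    shards, start = [], 0
    for i in range(n_shards):
        size = base + 1 if i < extra else base
        shards.append(list(range(start, start + size)))
        start += size
    return shards


print(shard_by_blocks(6, 5))  # [[0, 1], [2], [3], [4], [5]]
```

No shard ends up empty as long as there are at least as many blocks as shards. Where the embedding and final layers go is left out of this sketch.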

It must use the transformer_h_X part of the name to identify the block, since everything after transformer_h_X changes from model to model.
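
One way to detect the block from a name is a prefix regex. This is only a sketch: it assumes names of the form `transformer_h_<index>_<rest>`, and the example names below are made up rather than taken from a real model.

```python
import re

# Match only the prefix; the suffix after the index is model-specific.
BLOCK_PATTERN = re.compile(r"^transformer_h_(\d+)(?:_|$)")


def block_index(name):
    """Return the transformer block index encoded in `name`, or None."""
    match = BLOCK_PATTERN.match(name)
    return int(match.group(1)) if match else None


# Hypothetical names, for illustration only.
names = [
    "transformer_word_embeddings",
    "transformer_h_0_input_layernorm",
    "transformer_h_0_mlp",
    "transformer_h_1_input_layernorm",
    "transformer_h_10_mlp",
    "transformer_ln_f",
]
for name in names:
    print(name, "->", block_index(name))
```

Capturing the digits with `\d+` keeps block 1 and block 10 apart, which a plain `startswith("transformer_h_1")` check would not.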

abourramouss added 4 commits November 25, 2023 23:24

detect start and end of block

7e5f3a1

semaphore-like implementation

5c0ddd7

It must use the transformers_h_X part to identify the block, since from transformer_h_X to end changes from model to model

remove log.txt file

c1c4548

better debugging

5dd03a7

xrsrke merged commit 05e1c45 into xrsrke:main Nov 26, 2023
