
[BUG] Transformer layers are dropped when the layer count is not evenly divisible by the pipeline-parallel size #1304

Open
Baibaifan opened this issue Nov 27, 2024 · 1 comment
Labels
stale No activity in 60 days on issue or PR

Comments


Baibaifan commented Nov 27, 2024

Describe the bug

Situation:

GPT-2 model

  • num-layers=30, pipeline-model-parallel-size=4
  • decoder-first-pipeline-num-layers and decoder-last-pipeline-num-layers are not set

Layer split results

stage1: 0,1,2,3,4,5,6
stage2: 7,8,9,10,11,12,13
stage3: 14,15,16,17,18,19,20
stage4: 21,22,23,24,25,26,27

Only 28 layers are assigned in total, not the expected 30: the last 2 layers are silently dropped (see the sketch below).
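
For reference, here is a minimal sketch (plain Python with assumed variable names, not the actual Megatron-LM code) that reproduces the floor-division split shown above and makes the dropped layers visible:

```python
# Floor division silently drops the remainder when the split is uneven.
num_layers = 30
pipeline_model_parallel_size = 4

layers_per_stage = num_layers // pipeline_model_parallel_size  # 30 // 4 == 7

for rank in range(pipeline_model_parallel_size):
    offset = rank * layers_per_stage
    stage_layers = list(range(offset, offset + layers_per_stage))
    print(f"stage{rank + 1}: {stage_layers}")

assigned = layers_per_stage * pipeline_model_parallel_size
print(f"assigned {assigned} of {num_layers} layers")  # 28 of 30 -> 2 layers dropped
```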

In the legacy code path there is a check on the total number of model layers:
[Screenshot: layer-count check in the legacy code path]

In the MCore version, the total number of model layers is only validated when num-layers-per-virtual-pipeline-stage is used:
[Screenshot: corresponding MCore code path]

If users are expected to split the layers themselves when the division is uneven, I think a validation check and an explicit warning should be added here; see the sketch below for the kind of check I mean.
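
A minimal sketch of the kind of check being requested. `validate_layer_split` is a hypothetical helper, not an existing Megatron-LM function; the flag names in the error message are the CLI arguments mentioned above.

```python
def validate_layer_split(num_layers: int, pipeline_model_parallel_size: int) -> None:
    """Fail fast instead of silently dropping transformer layers."""
    remainder = num_layers % pipeline_model_parallel_size
    if remainder != 0:
        raise ValueError(
            f"num-layers ({num_layers}) is not divisible by "
            f"pipeline-model-parallel-size ({pipeline_model_parallel_size}); "
            f"{remainder} layer(s) would be dropped. Use "
            "--decoder-first-pipeline-num-layers / --decoder-last-pipeline-num-layers "
            "to distribute the remaining layers explicitly."
        )

# Example: reproduces the reported configuration and raises immediately.
validate_layer_split(num_layers=30, pipeline_model_parallel_size=4)
```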

Environment (please complete the following information):

  • Megatron-LM commit ID: main branch

Marking as stale. No activity in 60 days.

@github-actions github-actions bot added the stale No activity in 60 days on issue or PR label Jan 26, 2025