Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/SpecForge/discussions/new/choose. Otherwise, it will be closed.
- 5. Please use English, otherwise it will be closed.
Describe the bug
Bug Description
When training with multiple TP ranks (e.g., --tp-size 4), the saved model weights are incomplete: the checkpoint contains only a single TP rank's shard of each weight instead of the full model.
Model config: hidden_size=2048, intermediate_size=12288, num_attention_heads=32
Expected vs Actual Behavior
Expected (full model weights):
midlayer.mlp.down_proj.weight: torch.Size([2048, 12288])
midlayer.mlp.gate_proj.weight: torch.Size([12288, 2048])
midlayer.mlp.up_proj.weight: torch.Size([12288, 2048])
midlayer.self_attn.k_proj.weight: torch.Size([512, 4096])
midlayer.self_attn.o_proj.weight: torch.Size([2048, 4096])
midlayer.self_attn.q_proj.weight: torch.Size([4096, 4096])
midlayer.self_attn.v_proj.weight: torch.Size([512, 4096])
Actual (incomplete/sharded weights):
midlayer.mlp.down_proj.weight: torch.Size([2048, 3072])
midlayer.mlp.gate_proj.weight: torch.Size([3072, 2048])
midlayer.mlp.up_proj.weight: torch.Size([3072, 2048])
midlayer.self_attn.k_proj.weight: torch.Size([128, 4096])
midlayer.self_attn.o_proj.weight: torch.Size([2048, 1024])
midlayer.self_attn.q_proj.weight: torch.Size([1024, 4096])
midlayer.self_attn.v_proj.weight: torch.Size([128, 4096])
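The two listings above are consistent with each weight being split four ways along a single dimension. The following is a minimal sketch of that arithmetic, not SpecForge code: it assumes a Megatron-style layout in which q/k/v/gate/up projections are split along dim 0 and o_proj/down_proj along dim 1. The names `SHARD_DIM` and `shard_shape` are hypothetical helpers introduced only for illustration.

```python
# Assumption: q/k/v/gate/up are column-parallel (split on dim 0) and
# o_proj/down_proj are row-parallel (split on dim 1). This mapping is
# inferred from the shapes in this issue, not taken from SpecForge.
SHARD_DIM = {
    "q_proj": 0, "k_proj": 0, "v_proj": 0,
    "gate_proj": 0, "up_proj": 0,
    "o_proj": 1, "down_proj": 1,
}

# Full-model shapes from the "Expected" listing.
FULL_SHAPES = {
    "midlayer.mlp.down_proj.weight": (2048, 12288),
    "midlayer.mlp.gate_proj.weight": (12288, 2048),
    "midlayer.mlp.up_proj.weight": (12288, 2048),
    "midlayer.self_attn.k_proj.weight": (512, 4096),
    "midlayer.self_attn.o_proj.weight": (2048, 4096),
    "midlayer.self_attn.q_proj.weight": (4096, 4096),
    "midlayer.self_attn.v_proj.weight": (512, 4096),
}

# Shapes from the "Actual" listing (what was saved with --tp-size 4).
REPORTED_SHAPES = {
    "midlayer.mlp.down_proj.weight": (2048, 3072),
    "midlayer.mlp.gate_proj.weight": (3072, 2048),
    "midlayer.mlp.up_proj.weight": (3072, 2048),
    "midlayer.self_attn.k_proj.weight": (128, 4096),
    "midlayer.self_attn.o_proj.weight": (2048, 1024),
    "midlayer.self_attn.q_proj.weight": (1024, 4096),
    "midlayer.self_attn.v_proj.weight": (128, 4096),
}


def shard_shape(name, full_shape, tp_size):
    """Return the per-rank shape of a weight under the assumed TP layout."""
    proj = name.split(".")[-2]  # e.g. "down_proj" from "...down_proj.weight"
    dim = SHARD_DIM[proj]
    if full_shape[dim] % tp_size != 0:
        raise ValueError(f"{name}: dim {dim} is not divisible by {tp_size}")
    shape = list(full_shape)
    shape[dim] //= tp_size
    return tuple(shape)


# Per-rank shapes predicted for tp_size=4 under the assumed layout.
tp4_shapes = {n: shard_shape(n, s, 4) for n, s in FULL_SHAPES.items()}
mismatched = [n for n in FULL_SHAPES if tp4_shapes[n] != REPORTED_SHAPES[n]]
print("mismatched weights:", mismatched)
```

If the assumption holds, a fix would presumably need to gather the shards from all TP ranks and concatenate them along these same dimensions before saving.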
Impact
SGLang may throw an error when loading the Eagle model weights, since the saved tensor shapes do not match the full-model shapes.

Reproduction
Qwen3-30B-A3B
Environment