The problem is caused by use_orig_params: true in the FSDP configuration (link). With this setting, the parameters on the model object are different from the parameters FSDP uses for training; thus, even if we add Medusa heads to the model, the FSDP-wrapped parameters are empty.
The workaround is to use the model loader in Trainer. I'll send a PR to fix this bug soon.
🐛 Describe the bug
Tensors saved in medusa_only_heads mode are empty.
Ref: https://github.com/linkedin/Liger-Kernel/blob/main/examples/medusa/train.py#L392
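One way to confirm the symptom before saving is to scan the state dict for head tensors with zero elements. The sketch below is only an illustration: the helper name `find_empty_tensors` and the default key substring `"medusa_head"` are assumptions, not taken from train.py, and should be adjusted to the actual parameter names.

```python
from typing import Any, Dict, List


def find_empty_tensors(
    state_dict: Dict[str, Any],
    key_substring: str = "medusa_head",  # assumed naming; adjust to the real keys
) -> List[str]:
    """Return the names of matching entries whose tensors hold zero elements."""
    empty = []
    for name, tensor in state_dict.items():
        # Skip entries that do not belong to the heads we care about.
        if key_substring not in name:
            continue
        # numel() is the total element count of a tensor; 0 means it is empty.
        if tensor.numel() == 0:
            empty.append(name)
    return empty
```

Calling this on the dictionary that is about to be written (or on the result of loading the saved file) should return an empty list when the heads were gathered correctly; any returned names point at the tensors affected by this bug.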
Reproduce
No response
Versions
N/A