Empty Medusa head tensors #309

Open
vkc1vk opened this issue Oct 14, 2024 · 2 comments

Comments


vkc1vk commented Oct 14, 2024

🐛 Describe the bug

In medusa_only_heads mode, the saved Medusa head tensors are empty.
Ref: https://github.com/linkedin/Liger-Kernel/blob/main/examples/medusa/train.py#L392

Reproduce

No response

Versions

N/A

vkc1vk commented Oct 15, 2024

cc: @jaszhu13

@chiwanpark (Contributor)

The problem is caused by use_orig_params: true in the FSDP configuration (link). With this setting, the model's variables are different from the variables used for training. Thus, even if we add the Medusa heads to the model's variables, the FSDP-wrapped variables are empty.

The workaround is to use the model loader in the Trainer. I'll send a PR to fix this bug soon.
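The mechanism described in the comment can be illustrated with a toy, pure-Python analogy. Everything in it (`Param`, `Module`, `ToyShardedWrapper`, `full_state_dict`) is hypothetical and deliberately simplified: it is not PyTorch or FSDP code and does not model use_orig_params exactly. It only shows why saving from a reference to the unwrapped Medusa heads can yield empty tensors, while gathering the shards through the wrapper recovers the values.

```python
# Hypothetical toy analogy of the failure mode; NOT PyTorch or FSDP code.
# A wrapper flattens a module's parameters into one buffer, shards it across
# ranks, and frees the original per-parameter storage. Saving from a
# reference to the unwrapped module then yields empty "tensors".
from dataclasses import dataclass, field


@dataclass
class Param:
    data: list  # stand-in for a tensor's values


@dataclass
class Module:
    params: dict = field(default_factory=dict)

    def state_dict(self) -> dict:
        # Reads whatever each parameter currently holds locally.
        return {name: list(p.data) for name, p in self.params.items()}


class ToyShardedWrapper:
    """Hypothetical sharding wrapper (illustrative only)."""

    def __init__(self, module: Module, world_size: int):
        self.module = module
        self.names = list(module.params)
        self.sizes = [len(module.params[n].data) for n in self.names]
        flat = [x for n in self.names for x in module.params[n].data]
        chunk = -(-len(flat) // world_size)  # ceiling division
        # A real system keeps only the local shard on each rank; the toy
        # keeps all shards so that gathering can be simulated in-process.
        self.shards = [flat[i * chunk:(i + 1) * chunk] for i in range(world_size)]
        # The original parameters no longer own any data.
        for n in self.names:
            module.params[n].data = []

    def full_state_dict(self) -> dict:
        # Simulated all-gather: concatenate shards, then unflatten by size.
        flat = [x for shard in self.shards for x in shard]
        out, offset = {}, 0
        for name, size in zip(self.names, self.sizes):
            out[name] = flat[offset:offset + size]
            offset += size
        return out


if __name__ == "__main__":
    heads = Module({
        "medusa_head.0.weight": Param([0.1, 0.2, 0.3]),
        "medusa_head.1.weight": Param([0.4, 0.5]),
    })
    wrapped = ToyShardedWrapper(heads, world_size=2)
    print(heads.state_dict())         # empty lists: what gets saved in the bug
    print(wrapped.full_state_dict())  # gathered values
```

In real PyTorch FSDP, one common way to obtain unsharded weights is to build a full state dict from the wrapped model inside `FullyShardedDataParallel.state_dict_type(model, StateDictType.FULL_STATE_DICT)` and then keep only the Medusa head keys. Whether that corresponds to the Trainer-based fix proposed in the comment is an assumption.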
