Knowledge distillation #1035
Hi team, can you please help us understand how to perform knowledge distillation to obtain this model, and how we can fine-tune the model based on the distilled version? Thanks a lot!

Comments
Hi @yangliuxin-nn, That model is missing a …
Thanks @kylesayrs. I have a few questions about examples/trl_mixin/ex_trl_distillation.py:

…

Many thanks for your time!
Hi @yangliuxin-nn,

The dataset mix we used to produce our pretrained 2:4 Sparse Llama model hasn't been fully open-sourced, so reproducing it on your side may be challenging, not to mention the significant GPU resources the process requires. For fine-tuning the sparse model with distillation on your target dataset, we used a custom fork of MosaicML's llm-foundry, but you're welcome to use any framework you're comfortable with. If you decide to use your own fine-tuning framework, you'll need to implement two key features: …

Could you share more details about your setup, such as the model, dataset type and size, number of GPUs, fine-tuning framework, etc.? That will help me provide a more tailored response.
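As a rough illustration only (the specific feature list above is not captured in this thread), frameworks for this kind of sparse fine-tuning typically combine a distillation loss against the dense teacher with logic that keeps pruned weights at exactly zero during training. The sketch below shows both ideas in plain PyTorch; the model paths, batch format, and hyperparameters are placeholders, and this is not the maintainers' actual implementation.

```python
# Hypothetical sketch: distillation fine-tuning of a pruned student while
# preserving its sparsity pattern. Paths and hyperparameters are placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

student = AutoModelForCausalLM.from_pretrained("path/to/sparse-student")  # pruned model
teacher = AutoModelForCausalLM.from_pretrained("path/to/dense-teacher")   # dense teacher
teacher.eval()

# Record the zero pattern of each weight matrix so it can be re-applied
# after every optimizer step (keeps pruned weights at zero).
masks = {
    name: (param != 0).float()
    for name, param in student.named_parameters()
    if param.dim() == 2
}

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
temperature = 2.0
alpha = 0.5  # blend between distillation loss and standard LM loss


def training_step(batch):
    # batch is expected to contain input_ids, attention_mask, and labels.
    outputs = student(**batch)
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits

    # Soft-target KL distillation loss plus the usual cross-entropy loss.
    kd_loss = F.kl_div(
        F.log_softmax(outputs.logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    loss = alpha * kd_loss + (1 - alpha) * outputs.loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Re-apply the masks so pruned weights stay exactly zero.
    with torch.no_grad():
        for name, param in student.named_parameters():
            if name in masks:
                param.mul_(masks[name].to(param.device))
    return loss.item()
```

The mask re-application step is what distinguishes sparse fine-tuning from ordinary distillation: without it, optimizer updates would gradually destroy the 2:4 pattern produced by pruning.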
Hi @yangliuxin-nn,
For your case, I recommend compressing your model in two steps. In the first step, use SparseGPT to prune your model (examples/llama3_8b_2of4.py). When saving your model, use … In the second step, load the model …
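For reference, here is a rough sketch of the first step, assuming an llmcompressor API along the lines of the referenced example. The import paths, modifier arguments, calibration dataset, and save call are approximations that may differ by version, so treat examples/llama3_8b_2of4.py in the repository as the authoritative source.

```python
# Rough sketch of step one: one-shot SparseGPT 2:4 pruning with llmcompressor.
# Import paths and arguments are approximate; check the example script.
from llmcompressor.modifiers.obcq import SparseGPTModifier
from llmcompressor.transformers import oneshot
from transformers import AutoModelForCausalLM

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Prune all Linear layers to a 2:4 pattern, leaving the LM head dense.
recipe = SparseGPTModifier(
    sparsity=0.5,
    mask_structure="2:4",
    targets=["Linear"],
    ignore=["re:.*lm_head"],
)

oneshot(
    model=model,
    dataset="open_platypus",       # placeholder calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

# Save the pruned checkpoint; check the example script for the recommended
# save options before loading it in step two.
model.save_pretrained("Llama-3-8B-2of4-sparse")
```

The second step would then load the saved sparse checkpoint and fine-tune it with distillation on your target dataset, along the lines of the earlier sketch.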