Knowledge distillation #1035

Open
yangliuxin-nn opened this issue Jan 5, 2025 · 4 comments
Labels
question Further information is requested

Comments

@yangliuxin-nn

Hi team, could you please explain how to perform knowledge distillation to obtain this model, and how we can fine-tune the model based on the distilled version? Thanks a lot!

@yangliuxin-nn added the bug (Something isn't working) label on Jan 5, 2025
@kylesayrs
Collaborator

Hi @yangliuxin-nn,

That model is missing a recipe.yaml file! Despite the missing file, it's likely that this model was compressed using a script similar to examples/trl_mixin/ex_trl_distillation.py. I'm currently reaching out to the research team that compressed this model to confirm this.

@yangliuxin-nn
Author

Thanks @kylesayrs. I have a few questions about examples/trl_mixin/ex_trl_distillation.py:

  1. I noticed 'test_stage' in the YAML configuration - what does this stage represent?
  2. Does it make sense to switch from ConstantPruningModifier to SparseGPT in examples/trl_mixin/ex_trl_distillation.py?
  3. After saving the model, what's the correct way to load it? And should we expect to match the published performance metrics after pruning and distillation?

Many thanks for your time!

@eldarkurtic
Collaborator

Hi @yangliuxin-nn,

We haven't fully open-sourced the dataset mix used to produce our pretrained 2:4 Sparse Llama model, so reproducing it on your side may be challenging, not to mention the significant GPU resources the process requires.

For fine-tuning the sparse model with distillation on your target dataset, we used a custom fork of MosaicML's llm-foundry, but you're welcome to use any framework you're comfortable with. If you decide to use your own fine-tuning framework, you'll need to implement two key features (a rough sketch of both follows the list below):

  1. Masking of sparse weights
  2. Knowledge distillation
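
In plain PyTorch, those two pieces look roughly like this. This is an illustrative sketch only, not our llm-foundry fork; all function names and hyperparameters below are placeholders:

```python
import torch
import torch.nn.functional as F

def collect_sparsity_masks(model):
    """Record which weights are already zero so the pruning pattern can be preserved."""
    return {
        name: (param != 0).detach()
        for name, param in model.named_parameters()
        if param.dim() > 1  # weight matrices only
    }

@torch.no_grad()
def apply_sparsity_masks(model, masks):
    """Call after every optimizer step so pruned weights stay exactly zero."""
    for name, param in model.named_parameters():
        if name in masks:
            param[~masks[name]] = 0.0

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with a KL term against the teacher's logits."""
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    return alpha * ce + (1.0 - alpha) * kl
```

The loop itself is the usual pattern: compute student and teacher logits on the same batch (teacher in no_grad/eval mode), take kd_loss, backprop, optimizer.step(), then apply_sparsity_masks(student, masks) so the sparsity pattern survives the update.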

Could you share more details about your setup, such as the model, dataset type and size, number of GPUs, fine-tuning framework, etc.? That will help me provide a more tailored response.

@kylesayrs
Collaborator

Hi @yangliuxin-nn,

  1. test_stage is just a cosmetic name for the stage. In this example it's a bit misleading; a better name would be compression_stage.
  2. ConstantPruningModifier is a modifier that maintains existing sparsity. In this example, it preserves the existing sparsity of the base model, "neuralmagic/Llama-2-7b-pruned50-retrained". For more information, see this explanation.
  3. We highly recommend using vLLM to load and run inference with the model.

For your case, I recommend compressing your model in two steps. In the first step, use SparseGPT to prune your model (see examples/llama3_8b_2of4.py). When saving the model, use save_pretrained(save_compressed=False).
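
A rough sketch of step one, loosely following that example. The base model, dataset name, calibration settings, and output path below are placeholders; check the example script for the exact arguments:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.obcq import SparseGPTModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# One-shot 2:4 structured pruning with SparseGPT
oneshot(
    model=model,
    dataset="open_platypus",          # any calibration dataset works
    recipe=SparseGPTModifier(sparsity=0.5, mask_structure="2:4"),
    max_seq_length=2048,
    num_calibration_samples=512,
)

# Save dense-format (uncompressed) weights so they can be loaded normally for finetuning
model.save_pretrained("Llama-3-8B-2of4-sparse", save_compressed=False)
tokenizer.save_pretrained("Llama-3-8B-2of4-sparse")
```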

In the second step, load the model with AutoModel.from_pretrained and perform KD finetuning, as described by @eldarkurtic or by using examples/trl_mixin/ex_trl_distillation.py. Save your model with save_pretrained(save_compressed=True), and then load it with vLLM.
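
Putting step two together, roughly (paths are placeholders; the save_compressed kwarg assumes llm-compressor has wrapped save_pretrained, as it does when you run its oneshot/training entrypoints):

```python
from transformers import AutoModelForCausalLM

# Reload the uncompressed sparse checkpoint produced in step one
model = AutoModelForCausalLM.from_pretrained("Llama-3-8B-2of4-sparse", torch_dtype="auto")

# ... run KD finetuning here, e.g. via examples/trl_mixin/ex_trl_distillation.py ...

# Save in compressed format for inference
# (save_compressed is available because llm-compressor wraps save_pretrained during training)
model.save_pretrained("Llama-3-8B-2of4-sparse-kd", save_compressed=True)

# Load and run with vLLM
from vllm import LLM, SamplingParams
llm = LLM(model="Llama-3-8B-2of4-sparse-kd")
out = llm.generate(["The benefits of 2:4 sparsity are"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```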

@kylesayrs added the question (Further information is requested) label and removed the bug (Something isn't working) label on Jan 17, 2025