
Conversation

@Dhiraj309

📝 Suggested PR Description

What does this PR add?

This PR introduces transformers_distillation, a lightweight library built on top of the 🤗 Transformers ecosystem to make knowledge distillation of language models simple, flexible, and reproducible.

Key features:

  • 📦 Drop-in usage with Hugging Face models — no extra setup needed.
  • 🔧 Built directly on top of transformers.Trainer, so researchers and practitioners can reuse the familiar API with minimal overhead (see the sketch after this list).
  • 🧑‍💻 Designed for small teams or even individual contributors to manage training, evaluation, and experimentation without extra complexity.
  • ✅ Includes working examples (examples/) and tests (tests/) for CausalLM, Seq2SeqLM, and MLM tasks.
  • ⚡ Full compatibility with existing Trainer features like callbacks, logging, evaluation, distributed training (multi-GPU, TPU), and Hugging Face Hub integration.
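
A rough sketch of the pattern behind the Trainer bullet above (not the library's exact implementation; the class name, constructor arguments, and loss weighting are assumptions) is to subclass transformers.Trainer and override compute_loss so the student's hard-label loss is blended with a temperature-scaled KL term computed against a frozen teacher:

```python
# Rough sketch only: subclass transformers.Trainer and blend the student's
# hard-label loss with a temperature-scaled KL term against a frozen teacher.
# The class name and constructor arguments are assumptions, not the PR's API.
import torch
import torch.nn.functional as F
from transformers import Trainer


class DistillTrainer(Trainer):
    def __init__(self, teacher_model=None, temperature=2.0, alpha=0.5, **kwargs):
        super().__init__(**kwargs)
        self.teacher = teacher_model.eval()  # teacher stays frozen
        self.temperature = temperature       # softens both distributions
        self.alpha = alpha                   # weight of the distillation term

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        outputs = model(**inputs)
        student_loss = outputs.loss  # standard CE against the hard labels

        with torch.no_grad():  # assumes the teacher sits on the training device
            teacher_logits = self.teacher(**inputs).logits

        T = self.temperature
        kd_loss = F.kl_div(
            F.log_softmax(outputs.logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)  # classic rescaling so gradient magnitudes stay comparable

        loss = self.alpha * kd_loss + (1.0 - self.alpha) * student_loss
        return (loss, outputs) if return_outputs else loss
```

Because everything else is inherited from Trainer, the usual training loop, checkpointing, and mixed-precision handling come for free.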

Example usage

I’ve demonstrated this in a public Kaggle notebook using a small dataset, showing how easy it is to run end-to-end distillation with just a few lines of code; a rough sketch of such a run follows the link below.
(Link: https://www.kaggle.com/code/dignity45/transformer-distill-trainer-knowledge-distillation)
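
For context, a hypothetical end-to-end run could look like the snippet below, reusing the DistillTrainer sketched earlier; the model names, dataset slice, and hyperparameters are placeholders for illustration, not values taken from the notebook or the library:

```python
# Hypothetical end-to-end run reusing the DistillTrainer sketched above.
# Model names, dataset slice, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

teacher = AutoModelForCausalLM.from_pretrained("gpt2-medium")
student = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    enc = tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)
    enc["labels"] = enc["input_ids"].copy()  # causal-LM targets
    return enc

train_ds = (
    load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
    .map(tokenize, batched=True, remove_columns=["text"])
)

trainer = DistillTrainer(
    model=student,
    teacher_model=teacher,  # move to the training device beforehand if needed
    temperature=2.0,
    alpha=0.5,
    args=TrainingArguments(
        output_dir="distilled-gpt2",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        logging_steps=50,
    ),
    train_dataset=train_ds,
)
trainer.train()
```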


Why this matters

  • Distillation is becoming increasingly important for efficient model deployment.

  • Current pipelines often require custom code, but this library leverages the Trainer directly, meaning practitioners can (see the sketch after this list):

    • Reuse existing callbacks, logging, and evaluation tools.
    • Integrate seamlessly with Hugging Face Hub.
    • Scale training easily with distributed GPU/TPU setups.
    • Avoid reinventing the wheel.
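
To make those points concrete, the sketch below wires standard Trainer features (evaluation, early stopping, a logging backend, Hub upload) into the hypothetical DistillTrainer from earlier; the variable names are assumed from the previous sketch and nothing here is specific to this PR:

```python
# Standard Trainer features plug in unchanged; nothing below is specific to
# this PR. 'student', 'teacher', 'train_ds', and 'eval_ds' are assumed to
# exist (e.g. as in the previous sketch, plus a validation split).
from transformers import EarlyStoppingCallback, TrainingArguments

args = TrainingArguments(
    output_dir="distilled-student",
    eval_strategy="steps",            # periodic evaluation, as with any Trainer
    eval_steps=200,
    save_strategy="steps",
    save_steps=200,                   # must line up with eval for best-model loading
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    report_to="tensorboard",          # any supported logging backend
    push_to_hub=True,                 # standard Hub integration (needs a Hub login)
)

trainer = DistillTrainer(
    model=student,
    teacher_model=teacher,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
# Distributed runs work the same way, e.g. `torchrun --nproc_per_node=4 train.py`
# or `accelerate launch train.py`; no distillation-specific changes are needed.
trainer.train()
```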

Next steps / Future directions

  • Potential discussion with the Hugging Face team about whether such functionality could evolve into official support inside transformers.
  • Broader support and benchmarking across distributed setups (multi-GPU, TPU, DeepSpeed, etc.) for larger-scale distillation experiments.

✅ Overall, this PR lowers the barrier for practitioners who want to experiment with knowledge distillation without needing large teams or custom infrastructure, while remaining fully compatible with the Hugging Face ecosystem.

…, MLM, Seq2SeqLM) Models From Scratch, Leveraging Knowledge Distillation
@Rocketknight1
Member

I think this makes more sense as a separate repository, not as part of Transformers itself!

@Dhiraj309
Author

> I think this makes more sense as a separate repository, not as part of Transformers itself!

Thanks for the feedback! 🙏 I agree it makes sense as a separate repo for now, but my longer-term goal is to align it with Trainer (much like Seq2SeqTrainer) so that Transformers could eventually offer a lightweight DistillTrainer.

@Dhiraj309
Author

@Rocketknight1, would you be able to offer any suggestions on this? It would be really helpful.

