
add Chinese distill-whisper fine-tuning results #1648

Merged 1 commit into k2-fsa:master on Jun 12, 2024

Conversation

yuekaizhang (Collaborator) commented:

In #1605, we fine-tuned whisper on about 14k hours of Chinese data. This PR adds decoding results for the distill-whisper fine-tuning experiment.

Instead of training with a distillation loss, we adopted the model structure and parameter-initialization method from the distill-whisper paper (https://arxiv.org/abs/2311.00430): only the first and last layers of the decoder are retained.
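As an illustration only, here is a minimal sketch of that initialization, assuming the openai-whisper package and a large-v2 checkpoint (both are assumptions, not necessarily what the recipe uses): the pretrained decoder's stack of blocks is replaced by just its first and last blocks before fine-tuning.

```python
# Minimal sketch (not the recipe code): build a 2-layer "distill" decoder by
# keeping only the first and last decoder blocks of a pretrained Whisper model.
# Assumes the openai-whisper package; the checkpoint name is illustrative.
import torch.nn as nn
import whisper

model = whisper.load_model("large-v2", device="cpu")

# Keep decoder blocks 0 and -1; the encoder and all other weights stay as-is.
model.decoder.blocks = nn.ModuleList(
    [model.decoder.blocks[0], model.decoder.blocks[-1]]
)
# Keep the recorded dimensions consistent with the new structure.
model.dims.n_text_layer = 2

# The resulting model is then fine-tuned like a normal Whisper model.
```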

Accuracy:
Distill-whisper is slightly less accurate than the normal whisper fine-tune.

| Model | CER (average over SPEECH_IO_TEST_SET 01-26) | Training set |
| --- | --- | --- |
| whisper-large-ft-v1 | 4.32% | multi-hans-zh (about 14k hours) |
| whisper-large-ft-v1-distill | 4.71% | multi-hans-zh (about 14k hours) |

Speed:
Each decoding step is about 4x faster than with the original decoder.
- Normal whisper: 32 decoder layers (profiling screenshot in the PR)
- Distill-whisper: 2 decoder layers (profiling screenshot in the PR)
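
For intuition only, a rough timing sketch along these lines (hypothetical, not the measurement used in this PR; the "base" checkpoint is used just to keep the demo light):

```python
# Rough timing sketch (hypothetical): compare one decoder forward step of the
# full decoder against a 2-layer variant built as in the sketch above.
import copy
import time

import torch
import torch.nn as nn
import whisper

model = whisper.load_model("base", device="cpu")  # small model, demo only

mel = torch.zeros(1, model.dims.n_mels, 3000)      # dummy 30-second mel input
audio_features = model.encoder(mel)
tokens = torch.tensor([[model.dims.n_vocab - 1]])  # dummy start token

def time_decoder_step(m, n_trials=10):
    """Average wall-clock time of a single decoder forward pass."""
    with torch.no_grad():
        start = time.time()
        for _ in range(n_trials):
            m.decoder(tokens, audio_features)
    return (time.time() - start) / n_trials

distill = copy.deepcopy(model)
distill.decoder.blocks = nn.ModuleList(
    [distill.decoder.blocks[0], distill.decoder.blocks[-1]]
)

print("full decoder step:   ", time_decoder_step(model))
print("2-layer decoder step:", time_decoder_step(distill))
```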

For a quick test: https://huggingface.co/yuekai/icefall_asr_multi-hans-zh_whisper/blob/main/test_model.py


JinZr (Collaborator) left a comment:

LGTM, thanks!

JinZr merged commit d5be739 into k2-fsa:master on Jun 12, 2024
253 checks passed
yfyeung pushed a commit to yfyeung/icefall that referenced this pull request on Aug 9, 2024