Finetune Whisper model on LibriSpeech #1571

Open · wants to merge 13 commits into master

Conversation

marcoyang1998 (Collaborator) commented on Mar 28, 2024

This recipe fine-tunes a Whisper model on LibriSpeech, following #1466.

  • Update the results
  • Compare full fine-tuning and partial (e.g. encoder-only or decoder-only) fine-tuning
  • Compare AdamW and ScaledAdam
  • Compare compressed fbank features (Lilcom) and uncompressed fbank features (hdf5) as input
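
For reference, the pretrained checkpoints compared in this thread can be loaded with the openai-whisper package. A minimal sketch (the recipe's own checkpoint handling may differ):

```python
import whisper

# Any of the checkpoints discussed in this thread: small, small.en, medium,
# medium.en, large-v2, large-v3.
model = whisper.load_model("small.en")

print(model.dims.n_mels)  # 80 mel bins (128 for large-v3)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```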

marcoyang1998 (Collaborator, Author) commented on Mar 29, 2024

A comparison of decoding the Whisper model with fbank features stored in different formats. LilcomChunkyWriter stores lossily compressed fbank features, which causes a slight mismatch when running inference with the Whisper model. NumpyHdf5Writer stores uncompressed fbank features but requires more storage.

In general, uncompressed features are slightly better than compressed features. The difference is minor, except for large-v2. All WERs are obtained with greedy search.

| Model | Feature type | WER (test-clean/test-other) |
|---|---|---|
| small | Lilcom | 4.59/10.46 |
| small | hdf5 | 4.57/10.11 |
| small.en | Lilcom | 4.83/11.06 |
| small.en | hdf5 | 4.82/11.04 |
| medium | Lilcom | 4.02/7.53 |
| medium | hdf5 | 4.04/7.53 |
| medium.en | Lilcom | 3.72/7.69 |
| medium.en | hdf5 | 3.72/7.65 |
| large-v2 | Lilcom | 4.37/8.03 |
| large-v2 | hdf5 | 4.25/7.68 |
| large-v3 | Lilcom | 3.73/6.1 |
| large-v3 | hdf5 | 3.73/6.1 |
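
The two backends differ only in how the extracted fbank matrices are written to disk. A minimal sketch of the choice using lhotse (the paths and the plain 80-bin Fbank extractor are placeholders; the recipe may use a Whisper-specific extractor):

```python
from lhotse import CutSet, Fbank, FbankConfig, LilcomChunkyWriter, NumpyHdf5Writer

# Placeholder manifest path for illustration.
cuts = CutSet.from_file("data/librispeech_cuts_train-clean-100.jsonl.gz")

# Lossily compressed storage: smaller on disk, slight mismatch at inference time.
cuts_lilcom = cuts.compute_and_store_features(
    extractor=Fbank(FbankConfig(num_mel_bins=80)),
    storage_path="data/fbank_lilcom",
    storage_type=LilcomChunkyWriter,
    num_jobs=4,
)

# Uncompressed storage: exact features, but noticeably more disk space.
cuts_hdf5 = cuts.compute_and_store_features(
    extractor=Fbank(FbankConfig(num_mel_bins=80)),
    storage_path="data/fbank_hdf5",
    storage_type=NumpyHdf5Writer,
    num_jobs=4,
)
```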

marcoyang1998 (Collaborator, Author) commented on Apr 7, 2024

**Effect of freezing different modules**

Setup: 10 epochs, Lilcom-compressed features, fine-tuned on train-clean-100 only.

Fine-tuning small.en, Adam optimizer, lr=1e-5 (without fine-tuning: 4.83/11.06, greedy):

| Frozen modules | Trainable params | test-clean/test-other |
|---|---|---|
| None | 241M | Greedy: 3.35/7.22, Beam search: 3.28/6.63 |
| encoder | 154M | Greedy: 3.67/7.81, Beam search: 3.51/7.17 |
| decoder | 87M | Greedy: 3.14/7.37, Beam search: 3.02/6.98 |

Fine-tuning medium, Adam optimizer, lr=1e-5, 10 epochs, Lilcom-compressed features (without fine-tuning: 4.02/7.53, greedy):

| Frozen modules | Trainable params | test-clean/test-other |
|---|---|---|
| None | 762M | Greedy: 2.82/5.88, Beam search: 2.74/5.56 |
| encoder | 457M | Greedy: 3.2/6.41, Beam search: 3.02/6.0 |
| decoder | 356M | Greedy: 2.81/7.38, Beam search: 2.64/5.85 |
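
A minimal sketch of how such freezing could be done on the openai-whisper model object, i.e. disabling gradients for a submodule before building the optimizer (the recipe's actual fine-tuning script may handle this differently, e.g. via a command-line option):

```python
import whisper

def freeze(module):
    """Disable gradients for every parameter in the module."""
    for p in module.parameters():
        p.requires_grad = False

model = whisper.load_model("small.en")

# Corresponds to the "encoder" row above; use model.decoder for the "decoder" row.
freeze(model.encoder)

num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_trainable / 1e6:.0f}M")
```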

marcoyang1998 (Collaborator, Author):

**Effect of different learning rates**

Model: small.en (without fine-tuning: 4.83/11.06)

| Learning rate | test-clean/test-other |
|---|---|
| 1e-4 | 4.77/10.48 |
| 5e-5 | 3.8/8.12 |
| 1e-5 | 3.35/7.22 |
| 5e-6 | 3.24/7.01 |

Model: medium (without fine-tuning: 4.02/7.53)

| Learning rate | test-clean/test-other |
|---|---|
| 5e-5 | 6.81/14.76 |
| 1e-5 | 2.82/5.88 |
| 5e-6 | 2.79/5.74 |
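
A minimal sketch of the optimizer setup implied by the sweeps above, assuming plain AdamW over whatever parameters were left trainable (the recipe may instead use ScaledAdam or add a learning-rate schedule):

```python
import torch
import whisper

model = whisper.load_model("medium")

# Only the parameters left trainable (e.g. after freezing a module, see the
# sketch above) are passed to the optimizer; lr is the value swept in the tables.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=5e-6)
```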
