How to prepare dataset for the fine-tuning #2136
Saeedmatt3r
started this conversation in
General
Replies: 1 comment 2 replies
-
@Saeedmatt3r Hello, I am developing a tool to create synthetic datasets to fine tune whisper. Whisper Temple Allows you to record, upload audios, edit the transcriptions or generating artificial datasets. Furthermore you can export this transcriptions as a HF 🤗 dataset (this feature is almost being added). |
Beta Was this translation helpful? Give feedback.
2 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
I'm currently fine-tuning Whisper for my own language, guided by the blog post linked here. I have a question regarding the injection of timestamps into the dataset. Specifically, I'm interested in how to insert timestamps in the middle of transcriptions during training.
Additionally, I have a question that might be more appropriate for the original authors(@jongwook): How did you integrate timestamps? Did you use an automated method, such as a word aligner? How did you determine the correct placement of timestamps after specific words?
Beta Was this translation helpful? Give feedback.
All reactions