How to prepare dataset for the fine-tuning #2136

Saeedmatt3r · 2024-04-16T21:30:50Z

Saeedmatt3r
Apr 16, 2024

I'm currently fine-tuning Whisper for my own language, guided by the blog post linked here. I have a question regarding the injection of timestamps into the dataset. Specifically, I'm interested in how to insert timestamps in the middle of transcriptions during training.

Additionally, I have a question that might be more appropriate for the original authors(@jongwook): How did you integrate timestamps? Did you use an automated method, such as a word aligner? How did you determine the correct placement of timestamps after specific words?

gongouveia · 2024-04-17T08:36:54Z

gongouveia
Apr 17, 2024

@Saeedmatt3r Hello, I am developing a tool to create synthetic datasets to fine tune whisper. Whisper Temple Allows you to record, upload audios, edit the transcriptions or generating artificial datasets. Furthermore you can export this transcriptions as a HF 🤗 dataset (this feature is almost being added).
Please follow my project if you thinks that this tools is useful for the open-source community and If you are interested in adding the timestamps feature to the dataset You can ope an issue in the project.

2 replies

Choi-YoungHyun Jul 1, 2024

I'm using windowsOS but your program is not working about theme...

gongouveia Jul 1, 2024

@Choi-YoungHyun I am aware of that bug, does the program work other than that?
Please stick to the white theme for now. Follow the project to stay updated to when I fix this bug.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to prepare dataset for the fine-tuning #2136

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment 2 replies

{{title}}

{{title}}

{{title}}

Select a reply

How to prepare dataset for the fine-tuning #2136

Saeedmatt3r Apr 16, 2024

Replies: 1 comment · 2 replies

gongouveia Apr 17, 2024

Choi-YoungHyun Jul 1, 2024

gongouveia Jul 1, 2024

Saeedmatt3r
Apr 16, 2024

Replies: 1 comment 2 replies

gongouveia
Apr 17, 2024