No punctuation for the first 75 minutes of the video. What could be the error / bug? #194
-
Hello. I am generating subtitles for this video of mine: https://www.youtube.com/watch?v=77iDUQd4x90 I passed the video file directly to Whisper with language en and model large. However, as can be seen from the screenshots below, there is no punctuation in the transcription for the first 75 minutes of the video. After that, the punctuation starts. I am using the latest version of Whisper on a Windows computer with Python 3.9.9.
Replies: 12 comments 30 replies
-
@jongwook @drdaxxy @cool-RR @gglanzani If you could take a look, I would appreciate it very much.
-
Being an autoregressive model, Whisper has a certain chance to get stuck into a "no-punctuation mode". It also seems to be correlated with the tendency to create either precise timestamps you see in the first screenshot or integer timestamps you see in the second. You could try giving
--initial_prompt "Hello, welcome to my lecture."
to nudge the model to weigh more on the "with-punctuation mode".
-
@jongwook I am 100% sure there must be a way to force it to stay in punctuation mode 100% of the time. Yesterday I ran various tests and finally made it stay. After converting the audio to MP3, the medium.en model with a hypothetical sentence processed this video of mine with 100% punctuation: https://www.youtube.com/watch?v=77iDUQd4x90 If only I knew Python, I am sure I could figure it out, but I am a C# guy. Here are 2 transcript files for you to compare:
1: MP3 input, large model, with a hypothetical sentence. Punctuation is on at the beginning, then turns off, then turns on again: https://docs.google.com/document/d/12lo_Utex7dpM1qLHnxYjsYTcLYzzYuCOLkbhP6KZH3U/edit?usp=sharing
2: MP3 input, medium.en model, with a hypothetical sentence. Punctuation is on 100% of the time: https://docs.google.com/document/d/1j1fTf_h-086mHHfCp74GbW2e6DIqo9GCqUlkKZAG0nQ/edit?usp=sharing
Both models got exactly the same input; one of them kept punctuation on and the other didn't. I really need help with this. Thank you very much.
-
I also noticed something very weird. At some points in the video, I have the computer read some text aloud. For example, at 11:42 I start the computer voice, which, as you can imagine, speaks perfect English. Just turn on the subtitles; they are generated by Whisper. https://youtu.be/77iDUQd4x90?t=702 However, the medium.en model didn't generate any text for that part. It generated almost perfect text for my own speech, but that part is missing. This behaviour repeats: sometimes the entire computer speech is missing, and sometimes only part of it. I used this command to generate the transcription:
-
Perhaps we could implement this in Whisper as an option? It could process Whisper's output and save the result as a separate output.
-
I suspect the reason it drops the punctuation is this: Lines 235 to 237 in 0b1ba3d. The model struggles with a segment, resets the prompt, and then decides to go without punctuation from there. I've made a pull request (#220) that I think might solve it.
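The reset described above can be pictured with a toy model. This is only a sketch of the idea, not the code at the referenced lines: the function names, the reset_threshold value of 0.5, and the sample tokens are all assumptions made for illustration.

```python
from typing import List, Tuple


def build_prompt(all_tokens: List[str], prompt_reset_since: int) -> List[str]:
    """Return the previous-text context that would be fed to the next window."""
    return all_tokens[prompt_reset_since:]


def decode_windows(
    windows: List[Tuple[List[str], float]],
    reset_threshold: float = 0.5,  # hypothetical value, for illustration only
) -> List[List[str]]:
    """Toy loop over (tokens, temperature_used) pairs.

    Assumed rule: if a window needed a temperature above reset_threshold,
    the context is dropped, so the next window no longer sees the earlier,
    punctuated text.
    """
    all_tokens: List[str] = []
    prompt_reset_since = 0
    prompts_seen: List[List[str]] = []
    for tokens, temperature in windows:
        # Record the context this window would have been conditioned on.
        prompts_seen.append(build_prompt(all_tokens, prompt_reset_since))
        all_tokens.extend(tokens)
        if temperature > reset_threshold:
            # Difficult segment: forget everything decoded so far.
            prompt_reset_since = len(all_tokens)
    return prompts_seen


windows = [
    (["Hello,", "everyone."], 0.0),
    (["uh", "so"], 0.8),  # difficult segment, decoded at a high temperature
    (["today", "we"], 0.0),
]
prompts = decode_windows(windows)
```

In this toy run the third window starts with an empty context, which is the situation where the model would be free to drift into unpunctuated output.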
-
Sorry to interrupt. But is it possible for I mean, if I downloaded the auto-generated subtitles from a YouTube video with
-
Strangely enough, if I pass an initial prompt of
-
This is a hack solution I came up with today: mayeaux/faster-whisper@dda1795 It could use some refining, but honestly it works well for me.
-
I started using GPT-4 to fix punctuation :D
-
This might work:
-
Hey all! I found that my "--initial_prompt" would work for a short time and then stop. Adjusting the prompt by adding one more sentence or word that has punctuation or a full stop managed to circumvent the repetitive failure loop of getting the same transcript. The result would be almost an entirely new transcript. Hope this helps!
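The retry idea above could be automated roughly like this. It is a minimal sketch, not part of Whisper: punctuation_ratio, next_prompt, and the 3% threshold are hypothetical helpers and values chosen only to illustrate checking a transcript and lengthening the prompt when punctuation has dropped out.

```python
SENTENCE_MARKS = ".,?!"


def punctuation_ratio(text: str) -> float:
    """Fraction of whitespace-separated words that end with a sentence mark."""
    words = text.split()
    if not words:
        return 0.0
    marked = sum(1 for word in words if word[-1] in SENTENCE_MARKS)
    return marked / len(words)


def next_prompt(prompt: str, transcript: str, extra: str,
                min_ratio: float = 0.03) -> str:
    """Keep the prompt if the transcript looks punctuated, else extend it.

    min_ratio is an arbitrary illustrative threshold, not a tuned value.
    """
    if punctuation_ratio(transcript) >= min_ratio:
        return prompt
    # Add one more punctuated sentence, as suggested in the comment above.
    return f"{prompt} {extra}"


retry_prompt = next_prompt(
    "Hello, welcome to my lecture.",
    "so today we will look at how to install it",
    "Let's begin.",
)
```

The extended prompt would then be passed to the next run in place of the original one.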
Being an autoregressive model, Whisper has a certain chance to get stuck into a "no-punctuation mode". It also seems to be correlated with the tendency to create either precise timestamps you see in the first screenshot or integer timestamps you see in the second. You could try giving
--initial_prompt "Hello, welcome to my lecture."
to nudge the model to weigh more on the "with-punctuation mode".
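For anyone scripting this, the suggestion could be wrapped as below. This is a minimal sketch: build_whisper_command is a hypothetical helper and lecture.mp3 is a placeholder file name, while the --model, --language, and --initial_prompt flags are the ones mentioned in this thread.

```python
import shlex
from typing import List


def build_whisper_command(
    audio_path: str,
    model: str = "large",
    language: str = "en",
    initial_prompt: str = "Hello, welcome to my lecture.",
) -> List[str]:
    """Assemble the argument list for the whisper command-line tool."""
    return [
        "whisper",
        audio_path,
        "--model", model,
        "--language", language,
        "--initial_prompt", initial_prompt,
    ]


# "lecture.mp3" is a placeholder; substitute your own audio or video file.
command = build_whisper_command("lecture.mp3")

# Print a shell-safe version of the command for inspection or copy-paste.
print(shlex.join(command))
```

To actually run it, the list can be handed to subprocess.run(command, check=True). Passing a list rather than a single string avoids quoting problems with the punctuated prompt.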