Speech to text on short audio file #169
I am using faster-whisper to recognize short commands in a rhasspy setup, so every audio file fed to faster-whisper is between 1 and 5 seconds long. Are there any recommended parameters for transcribing such short audio files that would speed up the process? Currently, transcription of each voice command runs on an Intel Core i5-8400T and takes around 18s with the large-v2 model, which is too slow for intent recognition. I must use the large-v2 model because it is the only one that correctly recognizes the intended commands. Thanks for any ideas.
Replies: 1 comment 3 replies
What faster-whisper options do you currently set, if any?
It’s possible your audio triggers the "temperature fallback", which makes the transcription much slower. That said, this is how Whisper tries to recover from bad transcriptions by default.
Here are things you can try:
- cpu_threads=6 when loading the model
- beam_size=1
- temperature=0 (might impact the transcription quality)
- without_timestamps=True
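Put together, the options above could be passed roughly like this. This is only a sketch: device="cpu", the helper names load_model and transcribe_command, and the file name command.wav are assumptions for illustration, not something from your setup.

```python
# Options from the list above. device="cpu" is an assumption based on the
# CPU-only machine described in the question.
MODEL_KWARGS = {"device": "cpu", "cpu_threads": 6}
TRANSCRIBE_KWARGS = {
    "beam_size": 1,              # greedy decoding instead of the default beam search
    "temperature": 0,            # a single temperature, so no temperature fallback
    "without_timestamps": True,  # skip timestamp tokens
}


def load_model(model_size="large-v2"):
    """Load the model once and reuse it for every command (hypothetical helper)."""
    # Imported here so this file can be imported without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(model_size, **MODEL_KWARGS)


def transcribe_command(model, audio_path):
    """Return the text of one short audio file (hypothetical helper)."""
    segments, _info = model.transcribe(audio_path, **TRANSCRIBE_KWARGS)
    # `segments` is a generator: the transcription actually runs while iterating.
    return " ".join(segment.text.strip() for segment in segments)


# Example usage (needs faster-whisper, the model files and an audio file):
#   model = load_model()
#   print(transcribe_command(model, "command.wav"))
```

Loading the model once and keeping it in memory between commands should also help, since reloading large-v2 for every request would add to the latency.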