Speaker Diarization Using Whisper Transcription and Nvidia NeMo #898
Replies: 6 comments 15 replies
-
Cool, that's a great contribution! FYI, there are approaches other than WhisperX for getting word timestamps that do not require an additional wav2vec model.
-
In the latest version of whisper-timestamped, there is no VRAM/RAM problem. I also benchmarked word timestamp accuracy on a French dataset for which I had accurately annotated timestamps, and the performance of WhisperX and whisper-timestamped is very close. I will soon implement an approach that uses VAD so that it does not depend on Whisper's timestamp prediction.
-
Hey, this worked pretty well! The diarization isn't the best, but it's pretty good and should be good enough for many use cases. Thanks for sharing!
-
www.lexicaps.com seamlessly adds diarization to Whisper's transcription. No third-party packages.
-
Hi, we are also building an ASR tool in public that uses Whisper for transcription and NVIDIA NeMo for diarization: https://github.com/Wordcab/wordcab-transcribe. Audio file transcription should no longer be a struggle, and we provide an API on top of the full process.
-
Hi.
-
Hello, I've built a pipeline here to enable speaker diarization using Whisper's transcriptions. It includes preprocessing that separates the vocals from other sounds, and post-processing that realigns the transcription according to punctuation (thanks to @mu4farooqi). It also uses WhisperX (by @m-bain) for timestamp correction.
From my trials, the results are better than the PyAnnote approach mentioned in #264.
The code was originally written to handle 1hr+ podcasts, so there is no need to split the audio in advance.
Feel free to try it and give me your feedback and suggestions.
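For readers wondering what "realigning according to punctuation" could look like, here is a minimal sketch of the general idea, not the pipeline's actual code. It assumes word-level timestamps (for example from WhisperX) and speaker turns (for example from NeMo) are already available as plain Python data. The function names, the dictionary keys, and the majority-vote rule are all hypothetical choices made for illustration.

```python
from collections import Counter


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker turn it overlaps the most.

    words: list of {"word": str, "start": float, "end": float} (assumed format)
    turns: list of (start, end, speaker) tuples (assumed format)
    A word that overlaps no turn falls back to the first turn in the list.
    """
    labeled = []
    for w in words:
        best = max(turns, key=lambda t: overlap(w["start"], w["end"], t[0], t[1]))
        labeled.append({**w, "speaker": best[2]})
    return labeled


def _relabel(sentence):
    """Give every word in a sentence the sentence's most common speaker."""
    if not sentence:
        return []
    majority = Counter(w["speaker"] for w in sentence).most_common(1)[0][0]
    return [{**w, "speaker": majority} for w in sentence]


def realign_by_punctuation(labeled, enders=".?!"):
    """Split words into sentences at ending punctuation, then relabel each one.

    This assumes a speaker change rarely happens in the middle of a sentence.
    """
    out, sentence = [], []
    for w in labeled:
        sentence.append(w)
        if w["word"].rstrip().endswith(tuple(enders)):
            out.extend(_relabel(sentence))
            sentence = []
    out.extend(_relabel(sentence))  # trailing words without final punctuation
    return out


# Toy data: "Hi," straddles the turn boundary at 1.2 s and is first
# attributed to speaker A, then corrected by the sentence-level vote.
words = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "there.", "start": 0.4, "end": 0.9},
    {"word": "Hi,", "start": 1.0, "end": 1.3},
    {"word": "how", "start": 1.3, "end": 1.5},
    {"word": "are", "start": 1.5, "end": 1.7},
    {"word": "you?", "start": 1.7, "end": 2.0},
]
turns = [(0.0, 1.2, "A"), (1.2, 2.0, "B")]

labeled = assign_speakers(words, turns)
realigned = realign_by_punctuation(labeled)
for w in realigned:
    print(w["speaker"], w["word"])
```

The sentence-level vote trades accuracy on genuine mid-sentence interruptions for robustness against small timestamp errors near turn boundaries, so it is only one of several reasonable designs.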