Speaker Diarization #322
-
I'm trying to adapt this for a bilingual video. Unfortunately, the --diarize option doesn't work with --sentence. So far I see two problems with what I get: 1) sentences have to be merged manually; 2) Whisper transcribes the audio of all speakers, which can cause errors when the language changes.
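The manual merge in point 1 could be scripted. Here is a minimal sketch, assuming the segments have already been read into memory with a start time, end time, speaker label and text. The `Segment` type, the `merge_by_speaker` helper, the `max_gap` threshold and the `SPEAKER_00`-style labels are hypothetical names for illustration, not part of Faster-Whisper-XXL.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # seconds
    end: float     # seconds
    speaker: str   # e.g. "SPEAKER_00" (label format is an assumption)
    text: str


def merge_by_speaker(segments, max_gap=1.0):
    """Merge consecutive segments that share a speaker and are separated
    by a pause of at most `max_gap` seconds. Input is left unmodified."""
    merged = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if prev and prev.speaker == seg.speaker and seg.start - prev.end <= max_gap:
            # Same speaker, short pause: extend the previous segment.
            prev.end = seg.end
            prev.text = f"{prev.text} {seg.text}"
        else:
            # New speaker or long pause: start a fresh (copied) segment.
            merged.append(Segment(seg.start, seg.end, seg.speaker, seg.text))
    return merged
```

Whether one second is a sensible pause threshold depends on the material; it is only a starting value.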
-
Hi, Purfview. Thanks for the latest release and the quick development! You are awesome!
-
Hi, the JSON output in r193.1 contained the speaker, but I can't find it in r194.1. Can you fix it? Thanks.
-
@Purfview I really like this option. Although you may not be crazy about this feature, in the world of video editing it's a game-changer that speeds up content production. In seconds, I can have all the dialogue from a specific "speaker", instead of spending hours searching for all the lines from that speaker.

The feature definitely needs improvement, because the diarization is not very accurate, but that is caused by the component that performs the diarization (pyannote_v3.0, pyannote_v3.1, reverb_v1 and reverb_v2) and not by Faster-Whisper-XXL.

Here's a typical example. In this short video of just 1 minute there are 5 speakers, and the diarization only detected 3. The two women in the video were both labeled [Speaker_00] even though they have different voices, and [Speaker_02] was assigned to two men with very different voices: the first is a man in his early 50s and the other is a teenager of about 19.

Here is the command line used:

faster-whisper-xxl.exe "Path/of/file/Diarize Test.mp4" --model small --device cpu --verbose true --max_line_width 40 --max_line_count 2 --diarize pyannote_v3.1 --task transcribe --output_format srt --output_dir source
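For the "all the dialogue from a specific speaker" workflow, here is a minimal sketch of pulling one speaker's cues out of the resulting SRT. It assumes each cue text starts with a bracketed label such as `[SPEAKER_00]`, optionally followed by a colon; check the prefix in your own output and adjust the pattern if it differs. The `lines_by_speaker` helper is a hypothetical name, not something the tool provides.

```python
import re

# Assumed cue-text prefix: "[SPEAKER_00]" with an optional trailing colon.
SPEAKER_PREFIX = re.compile(r"^\[(?P<speaker>[^\]]+)\]:?\s*")


def lines_by_speaker(srt_text, speaker):
    """Return (timing_line, text) pairs for every cue attributed to `speaker`.
    The label comparison is case-insensitive."""
    results = []
    # SRT cues are separated by blank lines: index, timing line, text line(s).
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # not a complete cue
        timing = lines[1]
        text = " ".join(lines[2:])  # cues may wrap over several lines
        match = SPEAKER_PREFIX.match(text)
        if match and match.group("speaker").lower() == speaker.lower():
            results.append((timing, text[match.end():]))
    return results
```

Calling `lines_by_speaker(open("Diarize Test.srt", encoding="utf-8").read(), "SPEAKER_00")` would then list that speaker's timings and lines. The file name is only an example. If the diarizer has put two people under one label, as described above, their lines will still come back together.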
-
Hi,
-
Great addition with the diarization! :-)
Keep up the great work! :-)
-
@Purfview
-
Speaker Diarization supported since r193.1.

--diarize choices:
  pyannote_v3.0 - Fastest for CPU
  pyannote_v3.1 - Same as v3.0 but should be faster with CUDA
  reverb_v1 - Allegedly better than pyannote v3
  reverb_v2 - The slowest, allegedly the best

Other diarization options:
  --num_speakers - Number of speakers, when known.
  --min_speakers - Minimum number of speakers. Has no effect when num_speakers is provided.
  --max_speakers - Maximum number of speakers. Has no effect when num_speakers is provided.
  --speaker - To replace 'SPEAKER' string with your own word.
  --diarize_device - "cuda" or "cpu". Automatic, no need to touch it
  --diarize_threads - Threads. Automatic, no need to touch it
  --diarize_dump - Dumps diarization output to a file.

Legal notice: Reverb models are only for personal non-profit use.
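The options above can be combined when the tool is called from a script. Here is a minimal sketch that only builds the argument list and does not run anything. The `build_diarize_command` helper and its defaults are hypothetical. The flags and backend names are the ones listed above, and the executable name is taken from the command line quoted earlier in this thread.

```python
# Backend names as listed for --diarize.
DIARIZE_CHOICES = ("pyannote_v3.0", "pyannote_v3.1", "reverb_v1", "reverb_v2")


def build_diarize_command(media_path, backend="pyannote_v3.1",
                          num_speakers=None, min_speakers=None,
                          max_speakers=None, exe="faster-whisper-xxl.exe"):
    """Build an argument list for a diarized transcription run.
    The result is suitable for subprocess.run(cmd)."""
    if backend not in DIARIZE_CHOICES:
        raise ValueError(f"unknown diarization backend: {backend!r}")
    cmd = [exe, media_path, "--diarize", backend]
    if num_speakers is not None:
        # min/max have no effect when num_speakers is given, so skip them.
        cmd += ["--num_speakers", str(num_speakers)]
    else:
        if min_speakers is not None:
            cmd += ["--min_speakers", str(min_speakers)]
        if max_speakers is not None:
            cmd += ["--max_speakers", str(max_speakers)]
    return cmd
```

When the speaker count is known, as in the 5-speaker clip described in this thread, passing `num_speakers=5` may help the diarizer, though it is not guaranteed to separate similar voices.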