Experiment with adding speaker diarization #45
Labels: good first issue · hacktoberfest-accepted · help wanted · high priority
Speaker diarization means annotating a transcript to show which words were spoken by which speaker.
There are Python tools that do this. It would be great to try them out and see whether any of them would work for our project; a quick sketch with one candidate follows.
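One widely used candidate is pyannote.audio. Here's a minimal sketch of what trying it out might look like, assuming the 3.x pipeline API and a Hugging Face access token; the exact model name and argument names vary by version, so treat all of them as placeholders:

```python
# Sketch only: evaluating pyannote.audio as a diarization candidate.
# The model name, token requirement, and file name are assumptions.
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",   # gated model; requires a HF token
    use_auth_token="YOUR_HUGGINGFACE_TOKEN",
)

diarization = pipeline("meeting.wav")

# Each turn is a time span attributed to one speaker label.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```

If this works well, the speaker turns could then be aligned against our transcript timestamps to tag each word.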
We may also end up implementing our own speaker diarization, either in this repo or in a separate repo that we depend on. I attended a talk last night about how News UK did this with their own dynamic clustering of vectorized speaker embeddings: they used the large Whisper model to transcribe their audio files and then implemented speaker diarization with their own algorithm. I vaguely recall they used https://github.com/NVIDIA/NeMo for the auto-clustering. A rough sketch of that transcribe-then-cluster pattern follows below.
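For illustration, here is a sketch of the general approach combining openai-whisper with scikit-learn's agglomerative clustering. The `embed_segment` helper is hypothetical; a real version would come from a speaker-embedding model (e.g. NVIDIA NeMo or speechbrain), and the file name and clustering threshold are arbitrary placeholders, not the News UK implementation:

```python
# Sketch only: transcribe with Whisper, then assign speakers by
# clustering one speaker embedding per transcript segment.
import numpy as np
import whisper
from sklearn.cluster import AgglomerativeClustering

def embed_segment(audio_path: str, start: float, end: float) -> np.ndarray:
    """Hypothetical: return a fixed-size speaker embedding for audio[start:end]."""
    raise NotImplementedError("plug in a real speaker-embedding model here")

model = whisper.load_model("large")        # transcription
result = model.transcribe("meeting.wav")   # includes timestamped segments

# One embedding per Whisper segment.
embeddings = np.stack([
    embed_segment("meeting.wav", seg["start"], seg["end"])
    for seg in result["segments"]
])

# n_clusters=None lets the (placeholder) distance threshold decide
# how many speakers there are.
labels = AgglomerativeClustering(
    n_clusters=None, distance_threshold=1.0
).fit_predict(embeddings)

for seg, label in zip(result["segments"], labels):
    print(f"[{seg['start']:.1f}-{seg['end']:.1f}] SPEAKER_{label}: {seg['text'].strip()}")
```

The appeal of this route is that we keep full control over the clustering step, at the cost of maintaining it ourselves.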
Contributions welcome from anyone who wants to play with this!