Experiment with adding speaker diarization #45

Open
audreyfeldroy opened this issue Oct 4, 2023 · 3 comments
Labels: good first issue, hacktoberfest-accepted, help wanted, high priority

Comments

@audreyfeldroy
Member

audreyfeldroy commented Oct 4, 2023

Speaker diarization is where you annotate a transcript by noting which words were spoken by which speakers.
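To make that concrete, here is a minimal sketch of the annotation step, assuming we already have timestamped transcript segments and a separate list of speaker turns. The `Segment` and `Turn` shapes, the `SPEAKER_00`-style names, and the "largest overlap wins" rule are all assumptions for illustration, not something this project or any particular library defines.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed span of audio (hypothetical shape, times in seconds)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A span of audio attributed to one speaker (hypothetical shape)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns, unknown="UNKNOWN"):
    """Pair each segment with the speaker whose turn overlaps it the most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, seg.text))
    return labeled


# Toy data standing in for real transcription and diarization output.
segments = [
    Segment(0.0, 2.5, "Hi, thanks for joining."),
    Segment(2.5, 5.0, "Happy to be here."),
    Segment(9.0, 10.0, "(music)"),
]
turns = [
    Turn(0.0, 2.4, "SPEAKER_00"),
    Turn(2.4, 5.2, "SPEAKER_01"),
]

for speaker, text in label_segments(segments, turns):
    print(f"{speaker}: {text}")
```

A segment that overlaps no turn at all falls back to `UNKNOWN`, which seems safer than guessing a speaker.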

There are tools in Python that do this. It would be great to try them out and see if any would work for our project:

We may also have to implement our own speaker diarization, either here or in a separate repo that we use as a dependency here. I attended a talk last night about how News UK did this with their own dynamic clustering of their vectorized embeddings. They used the large Whisper model to transcribe their audio files, and then they implemented speaker diarization using their own algorithm. I vaguely recall they used https://github.com/NVIDIA/NeMo for the auto-clustering.
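As a rough illustration of what clustering embeddings without a fixed speaker count could look like, here is a minimal sketch in plain Python. It is not News UK's algorithm and it does not use NeMo; it is a simple greedy scheme I'm assuming for the sake of example, where each embedding joins the most similar existing cluster or starts a new one. The 3-dimensional vectors and the `0.8` threshold are made up; real speaker embeddings have far more dimensions and the threshold would need tuning.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_embeddings(embeddings, threshold=0.8):
    """Greedy clustering: join the closest existing cluster or start a new one.

    Returns one integer cluster label per embedding, in input order.
    The threshold is an assumed value for this toy example.
    """
    centroids = []  # running mean vector for each cluster
    counts = []     # number of embeddings assigned to each cluster
    labels = []
    for emb in embeddings:
        best_label, best_sim = None, threshold
        for label, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim >= best_sim:
                best_label, best_sim = label, sim
        if best_label is None:
            # Nothing is similar enough: treat this as a new speaker.
            centroids.append(list(emb))
            counts.append(1)
            best_label = len(centroids) - 1
        else:
            # Fold the embedding into the cluster's running mean.
            counts[best_label] += 1
            n = counts[best_label]
            centroids[best_label] = [
                c + (x - c) / n for c, x in zip(centroids[best_label], emb)
            ]
        labels.append(best_label)
    return labels


# Toy embeddings, one per audio segment (made-up values).
toy_embeddings = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
    [0.2, 0.8, 0.0],
]
labels = cluster_embeddings(toy_embeddings)
print([f"SPEAKER_{label:02d}" for label in labels])
```

The result of a greedy pass like this depends on input order, which is one reason a real implementation would probably want something more robust.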

Contributions welcome from anyone who wants to play with this!

@audreyfeldroy added the help wanted, hacktoberfest-accepted, good first issue, and high priority labels on Oct 4, 2023
@heymanpreet
Contributor

@audreyfeldroy Happy to experiment with this ticket if no one else is working on it.
Thanks.

@audreyfeldroy
Member Author

This one is open for anyone looking for an issue to work on 🙂

@Subramaniam-dot
Contributor

Can I take up this issue?


3 participants