
identify persons in audio #306

Open
louis030195 opened this issue Sep 11, 2024 · 8 comments

Comments

@louis030195
Collaborator

No description provided.


linear bot commented Sep 11, 2024

@NicodemPL

I tried it myself with pyannote audio - this works pretty well. Fully local, but Python-based.
https://github.com/pyannote/pyannote-audio

Highly recommended - I tried different systems (including paid ones) and this one is really efficient and delivers good quality.

On top of this - once you have separate audio streams (microphone and display), this is even more promising for better-quality meeting notes.
I've developed a Python tool that combines Whisper transcription and pyannote diarization to create a comprehensive meeting transcript. This automated system transcribes audio, identifies speakers, and integrates the results, laying the groundwork for AI-assisted note generation.
I still have some issues on my side, but it's basically working and 100% local. So this is doable for sure.
And it beats Rewind.ai / Limitless for sure :) Locally.
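The integration step described above - matching Whisper's timestamped transcript segments against pyannote's speaker turns - could look roughly like this. This is an illustrative sketch, not code from the tool mentioned in the thread; the tuple shapes for segments and turns are assumptions:

```python
from typing import List, Tuple

def assign_speakers(
    transcript: List[Tuple[float, float, str]],  # (start, end, text) from Whisper
    turns: List[Tuple[float, float, str]],       # (start, end, speaker) from pyannote
) -> List[Tuple[str, str]]:
    """Label each transcript segment with the speaker whose
    diarization turn overlaps it the most in time."""
    labeled = []
    for t_start, t_end, text in transcript:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for s_start, s_end, speaker in turns:
            # Length of the intersection of the two time intervals
            overlap = min(t_end, s_end) - max(t_start, s_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, text))
    return labeled

segments = [(0.0, 2.5, "hi there"), (2.6, 5.0, "hello, how are you?")]
turns = [(0.0, 2.4, "SPEAKER_00"), (2.4, 5.2, "SPEAKER_01")]
print(assign_speakers(segments, turns))
# → [('SPEAKER_00', 'hi there'), ('SPEAKER_01', 'hello, how are you?')]
```

Maximal-overlap assignment is a simple heuristic; real pipelines often also split a transcript segment when a speaker change falls in the middle of it.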

@louis030195
Collaborator Author

/bounty 100

definition of done:

  • screenpipe-audio has some code that identifies speakers - i guess after the transcription?
  • this is sent to the screenpipe-server which would insert speakers into DB
  • which is then returned in db queries & api

rules:

  • use rust, local
  • do not use too much compute, screenpipe must still be usable on normal consumer hardware (eventually if it's possible to use GPU/NPU...)
  • ideally separate file in screenpipe-audio
  • works on all OSes


algora-pbc bot commented Sep 13, 2024

💎 $100 bounty • Screenpi.pe

Steps to solve:

  1. Start working: Comment /attempt #306 with your implementation plan
  2. Submit work: Create a pull request including /claim #306 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to mediar-ai/screenpipe!


@kernel-loophole

@louis030195 is this issue still open? Would love to work on this.

@louis030195
Collaborator Author

@EzraEllette is on it i believe

@EzraEllette
Contributor

Doing this now. Almost ready for a PR. Writing Tests.

@EzraEllette
Contributor

So it seems like all of the hallucinations match to the same speaker, which could be useful for determining whether part of a transcript is a hallucination... I will update my branch, but I need to make performance improvements, because right now I am segmenting speech and then performing STT on each segment. @louis030195
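The observation above suggests a simple heuristic: if hallucinated segments all cluster under one speaker label, that label can be guessed by looking for a speaker whose segments are dominated by repeated identical text (a common Whisper hallucination pattern like "thanks for watching"). This is a hypothetical sketch of that idea, not code from the branch being discussed:

```python
from collections import Counter, defaultdict
from typing import List, Optional, Tuple

def likely_hallucination_speaker(
    labeled: List[Tuple[str, str]],  # (speaker, text) pairs
    min_dup_ratio: float = 0.5,      # assumed threshold, would need tuning
) -> Optional[str]:
    """Return the speaker label whose segments are mostly repeated
    identical text, or None if no speaker exceeds the threshold."""
    by_speaker = defaultdict(list)
    for speaker, text in labeled:
        by_speaker[speaker].append(text.strip().lower())
    best = None
    for speaker, texts in by_speaker.items():
        if len(texts) < 2:
            continue  # too few segments to judge
        # Fraction of this speaker's segments that share the same text
        most_common_count = Counter(texts).most_common(1)[0][1]
        if most_common_count / len(texts) > min_dup_ratio:
            best = speaker
    return best

segments = [
    ("SPEAKER_00", "thanks for watching"),
    ("SPEAKER_00", "thanks for watching"),
    ("SPEAKER_00", "thanks for watching"),
    ("SPEAKER_01", "let's review the roadmap"),
    ("SPEAKER_01", "any blockers this week?"),
]
print(likely_hallucination_speaker(segments))  # → SPEAKER_00
```

Once such a label is identified, segments attributed to it could be filtered out or flagged before the transcript is stored.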

Projects
None yet
Development

No branches or pull requests

4 participants