I'm interested in this... #1

Open
francqz31 opened this issue Feb 19, 2024 · 5 comments

Comments

@francqz31

Hey Author, I really like the architecture and the technique used.
I was looking for something like this to diarize 1k+ hours of audio from different speakers for TTS, as accurately as possible.
I'd like to see results from nanodrz in real use, for example on this video: https://streamable.com/m5xvgf

I would like to contribute compute or knowledge to scale this up so it can become the new SOTA, or reach 99-100% accuracy on an unknown number of speakers.

Thanks in advance

@mogwai
Owner

mogwai commented Feb 20, 2024

Thanks for your interest. I'm doing one last clean of the data and rejigging the synthetic generation for a final run to see if I can improve the model; my notes are all in the README. My biggest issue is how slow the data processing is at the moment, and I'm getting slightly distracted by solving that problem :)

My new mega moonshot is to run all the audio through a denoiser before training. This can be seen as a kind of normalisation step and will hopefully mean that new data won't be so "out of domain".
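A minimal sketch of the denoise-first idea, assuming clips are plain lists of mono samples. The names `noise_gate` and `preprocess` are hypothetical, and the gate is only a stand-in for a real speech-enhancement model; nothing here is taken from the nanodrz code. The point is that the same `preprocess` call would wrap both the training data and any new audio at inference time, so both pass through the same normalisation.

```python
from typing import Callable, Iterable, Iterator, List

Waveform = List[float]  # mono samples, assumed to lie in [-1.0, 1.0]


def noise_gate(wave: Waveform, threshold: float = 0.02) -> Waveform:
    """Placeholder denoiser: zero out samples quieter than `threshold`.

    Stand-in only; a real run would call a learned enhancement model here.
    """
    return [s if abs(s) >= threshold else 0.0 for s in wave]


def preprocess(
    clips: Iterable[Waveform],
    denoise: Callable[[Waveform], Waveform] = noise_gate,
) -> Iterator[Waveform]:
    """Lazily apply the same denoiser to every clip."""
    for wave in clips:
        yield denoise(wave)


# Toy data: two short "clips" with low-level noise mixed in.
train_clips = [[0.5, 0.01, -0.3], [0.0, -0.015, 0.2]]
cleaned = list(preprocess(train_clips))
```

Any callable that maps a waveform to a waveform can be passed as `denoise`, so swapping the gate for a real model would not change the surrounding pipeline.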

I'll hopefully have some results from this at the end of the week.

Compute-wise, if you have an A100 I can SSH into, that would definitely speed things up :)

Feel free to DM me on Signal.

@francqz31
Author

1. Amazing. Well, I'm short of A100s right now (I used to have 9). I have an RTX 4090 and an RTX 3090; one is busy training and the other is free, so I don't know if that would help?
2. I can recommend some of the best or SOTA denoising/speech enhancement algorithms if you want.

@mogwai
Owner

mogwai commented Feb 21, 2024

  1. I've got two 4090s and am due to get some A100s / H100s from LAION.
  2. Yes please! I'm not too worried about this being perfect yet; I want to see its effects first.

@francqz31
Author

OK, wonderful. Also, once I have my 9 A100s back, I will still offer them if you need them for any interesting project. :)
For denoising and enhancement, the best thing so far is https://github.com/yxlu-0102/MP-SENet :). Try it if you want and see if it suits your usage; if not, I will recommend something else. In my use case it works the best.

@francqz31
Author

There is also HiFi-GAN v2 (https://daps.cs.princeton.edu/projects/Su2021HiFi2/), but no code is available for it. Later I might try implementing it, starting from https://github.com/rishikksh20/hifigan-denoiser (an unofficial implementation of v1) and adding to it.
