I'm interested in this... #1
Comments
Thanks for your interest. I'm just doing one last clean of the data and rejigging the synthetic generation for a final run to see if I can improve the model; my notes are all in the readme. My biggest issue is how slow the data processing is at the moment, and I'm getting slightly distracted by solving that problem :) My new mega moonshot is to run all the audio through a denoiser before training. This can be seen as a kind of normalisation step and will hopefully mean that new data won't be so "out of domain". I'll hopefully have some results from this at the end of the week. Compute-wise, if you have an A100 I can ssh into, that would definitely speed things up :) Feel free to DM me on Signal. |
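For readers following along, here is a rough sketch of what "denoise first, as a normalisation step" could look like in a preprocessing pipeline. It is not taken from the nanodrz code: `moving_average_denoiser` is a hypothetical stand-in (a plain smoothing filter, not a real speech denoiser), and `normalise_clip`, `preprocess`, and the `peak` parameter are illustrative names. The point is only that the same transform is applied to every clip, at training time and on new data, so both end up in a similar domain.

```python
from typing import Callable, Iterable, List

import numpy as np

# A denoiser maps a mono waveform to a waveform of the same length.
Denoiser = Callable[[np.ndarray], np.ndarray]


def moving_average_denoiser(wave: np.ndarray, width: int = 5) -> np.ndarray:
    """Hypothetical stand-in for a real denoiser: a simple smoothing filter."""
    kernel = np.ones(width) / width
    # mode="same" keeps the input length for clips of at least `width` samples.
    return np.convolve(wave, kernel, mode="same")


def normalise_clip(wave: np.ndarray, denoise: Denoiser, peak: float = 0.95) -> np.ndarray:
    """Denoise one clip, then rescale it to a fixed peak amplitude."""
    cleaned = denoise(wave.astype(np.float32))
    max_abs = float(np.max(np.abs(cleaned))) if cleaned.size else 0.0
    if max_abs == 0.0:
        return cleaned  # silent or empty clip: nothing to rescale
    return cleaned * (peak / max_abs)


def preprocess(clips: Iterable[np.ndarray], denoise: Denoiser) -> List[np.ndarray]:
    """Apply the same denoise-and-rescale step to every clip."""
    return [normalise_clip(clip, denoise) for clip in clips]
```

In practice the stand-in would be swapped for an actual denoising model, and the identical `preprocess` call would be used on both the training set and any new audio at inference time.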
Amazing. Well, I'm short of A100s right now (I used to have 9). I have an RTX 4090 and an RTX 3090; one is being used for training and one isn't, so I don't know if that would help? |
OK, wonderful. Also, once I have my 9 A100s back, I'll still offer them if you need them for any interesting project :) |
There is also HiFi-GAN v2 (https://daps.cs.princeton.edu/projects/Su2021HiFi2/), but no code is available for it. Later I might try implementing it, starting from https://github.com/rishikksh20/hifigan-denoiser (an unofficial implementation of v1) and adding something more to it. |
Hey Author, I really like the architecture and the technique used.
I was looking for something like this to diarize 1k+ hours of audio from different speakers for TTS, as accurately as possible.
I'd like to see results from nanodrz in real use, for example on this video: https://streamable.com/m5xvgf
I would like to contribute compute or knowledge to scale this up so that it becomes the new SOTA, or reaches 99-100% accuracy for an unknown number of speakers.
Thanks in advance