Hi devs. I am not a dev, but I'd like to suggest making this TTS system available for assistive technology on Windows. This could mean one of two things: the first works across programs, and the second targets a specific program:
Making a Windows SAPI5 (Speech Application Programming Interface v5.x) implementation, which would make it available to a variety of programs such as screen readers, text readers, etc.
Making an add-on for NVDA (NonVisual Desktop Access), the free, open-source screen reader for Windows.
The voice has to be responsive, meaning there should be no noticeable delay before speech starts or in the middle of it. In both cases, I suggest using the HiFi-GAN version, as it's probably the fastest one, and even with the lowest number of steps it still sounds awesome. If you can't achieve this with a neural network, maybe training an HTS voice on the LJSpeech dataset would be good enough. I hope you consider my suggestion, and let's discuss it. Have a good time.
Hi,
I made an NVDA add-on some time ago, but it uses HTTP to synthesize speech and relies on the full PyTorch models (ForwardTacotron and HiFi-GAN). I've been testing exporting the model to TorchScript, and the inference response time is fairly fast. I've been experimenting with an ONNX export too, but that seems to require some internal changes.
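For anyone who wants to compare backends on the responsiveness requirement discussed above, here is a rough sketch of how the delay before the first audio could be measured. It is not code from the add-on or from this repository: `split_sentences`, `synthesize_streaming`, `measure_latency` and `fake_synthesize` are hypothetical names, and `fake_synthesize` is only a stand-in for a real model call (TorchScript, ONNX, or the HTTP endpoint). The assumption is that synthesizing sentence by sentence lets playback start before the whole text has been processed.

```python
import re
import time
from typing import Callable, Iterator, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Split text after '.', '!' or '?' so each chunk can be synthesized separately."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def synthesize_streaming(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield audio chunk by chunk so a player could start on the first one."""
    for sentence in split_sentences(text):
        yield synthesize(sentence)


def measure_latency(text: str, synthesize: Callable[[str], bytes]) -> Tuple[float, float, int]:
    """Return (seconds until first chunk, total seconds, number of chunks)."""
    start = time.perf_counter()
    first = None
    count = 0
    for _chunk in synthesize_streaming(text, synthesize):
        count += 1
        if first is None:
            first = time.perf_counter() - start
    total = time.perf_counter() - start
    return (first if first is not None else total), total, count


def fake_synthesize(sentence: str) -> bytes:
    """Hypothetical stand-in for a real TTS call; returns silent placeholder bytes."""
    return b"\x00" * len(sentence)


if __name__ == "__main__":
    first, total, chunks = measure_latency(
        "Hello there. How are you? Fine!", fake_synthesize
    )
    print(f"first chunk after {first:.6f}s, {chunks} chunks in {total:.6f}s")
```

Replacing `fake_synthesize` with a function that calls the actual model would give a time-to-first-chunk number, which is probably the figure that matters most for a screen reader.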