Add support for text-to-speech (w/ SpeechT5) #345
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
My onnx_py.mp4 (running that code in Python w/ the same speaker embeddings). My hunch is that the problem is either:
Will continue investigating.
Due to a bug in transformers: huggingface/transformers#26547
@fxmarty The dropout patch fixed it! 🥳
fixed.mp4
And the volume difference can be fixed by multiplying the waveform by some constant factor. I don't see any post-processing in the Python
Python: py.mp4
JavaScript:
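For illustration, the constant-factor fix mentioned above could look roughly like the sketch below. Both `peakGain` and `scaleWaveform` are hypothetical helpers, not part of Transformers.js, and the code assumes the waveform is a `Float32Array` of samples in the range [-1, 1]. Estimating the factor from the ratio of peak amplitudes is only one possible choice; the thread does not say how the constant should be picked.

```javascript
// Hypothetical helpers (not part of Transformers.js).
// Assumes waveforms are Float32Array samples in [-1, 1].

// Largest absolute sample value in a waveform.
function peak(waveform) {
  let max = 0;
  for (let i = 0; i < waveform.length; ++i) {
    const v = Math.abs(waveform[i]);
    if (v > max) max = v;
  }
  return max;
}

// Estimate a constant gain that brings `candidate` to the same
// peak level as `reference`. Returns 1 for a silent candidate.
function peakGain(reference, candidate) {
  const candidatePeak = peak(candidate);
  return candidatePeak === 0 ? 1 : peak(reference) / candidatePeak;
}

// Multiply every sample by `gain`, clamping to [-1, 1] to avoid
// out-of-range values when the result is written to an audio file.
function scaleWaveform(waveform, gain) {
  const out = new Float32Array(waveform.length);
  for (let i = 0; i < waveform.length; ++i) {
    out[i] = Math.max(-1, Math.min(1, waveform[i] * gain));
  }
  return out;
}
```

With a reference waveform from the Python run and the JavaScript output, one would call `scaleWaveform(jsWaveform, peakGain(pyWaveform, jsWaveform))`; a fixed, hand-tuned constant works just as well if the difference is stable across inputs.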
I tried to use WebGPU in v3 to increase the generation speed, but got an error. Would it be difficult to add WebGPU support for TextToSpeech? |
This PR adds text-to-speech support to Transformers.js, with SpeechT5. We will add support for Bark and other models in future updates (and when Optimum supports those exports).
Closes #59, #279, #315
Example usage:
fixed.mp4
(converted to mp4 since GH doesn't allow wav)
Notes:
There are minor artifacts in the output, so we just need to check this (cc @fxmarty). E.g., here is the Python output: python.mp4
TODO: