
Include MeloTTS or Openvoice #16

Open
djdookie opened this issue Jun 20, 2024 · 7 comments
Labels: enhancement (New feature or request)

Comments

@djdookie

Is there a way to include and serve MeloTTS and/or OpenVoice?
They're state-of-the-art TTS (and voice cloning) models, and pretty fast even running CPU-only.

https://github.com/myshell-ai/MeloTTS
https://github.com/myshell-ai/OpenVoice
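
For reference, a minimal sketch of what direct MeloTTS inference looks like, following the usage shown in the MeloTTS README (speaker keys and exact API details may differ between versions):

```python
# Minimal MeloTTS synthesis sketch, following the myshell-ai/MeloTTS README.
# Speaker keys vary by language pack (e.g. 'EN-US', 'EN-BR', 'EN-Default').
from melo.api import TTS

text = "Hello! This is a quick MeloTTS test."
model = TTS(language='EN', device='cpu')  # runs acceptably on CPU
speaker_ids = model.hps.data.spk2id       # map of available speaker names
model.tts_to_file(text, speaker_ids['EN-US'], 'output.wav', speed=1.0)
```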

@matatonic (Owner)

Not yet, but I want to have the best options available, so I'll take a look at these when I get some more time.

@ground-creative

While we wait:

https://github.com/ground-creative/openvoice-api-python

matatonic added the enhancement (New feature or request) label on Aug 27, 2024
@bi1101

bi1101 commented Sep 19, 2024

In my opinion, Tortoise TTS currently offers the best balance between quality and speed. It can achieve up to 7x real-time generation, surpassing xtts, which is capped at 3x. In this video demonstration, the model generated a 20-second audio clip in just 3 seconds with optimization. It seems that performance improves even further with longer text inputs. In terms of audio quality, Tortoise TTS is on par with xtts. Additionally, the Tortoise repository is actively maintained and regularly updated, whereas Coqui has already shut down.

Another promising option is Parler TTS, which is backed by Hugging Face and has improvements planned for the future. One major advantage of Parler TTS is its support for batching, which lets it handle high traffic more efficiently than queuing requests and generating them one sample at a time.
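
To illustrate, here is a sketch of batched Parler generation adapted from the parler-tts examples; the checkpoint name and padding details are assumptions, and real serving code would also need to trim padding from each output:

```python
# Batched Parler TTS generation sketch, adapted from the parler-tts examples.
# Left padding lets several concurrent prompts share one forward pass.
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

repo = "parler-tts/parler-tts-mini-v1"  # assumed checkpoint name
model = ParlerTTSForConditionalGeneration.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo, padding_side="left")

prompts = ["Hey, how are you doing today?", "The weather is lovely."]
descriptions = ["A calm female voice, close-up recording."] * len(prompts)

desc = tokenizer(descriptions, return_tensors="pt", padding=True)
prompt = tokenizer(prompts, return_tensors="pt", padding=True)

# One batched call instead of generating each request sequentially.
audio = model.generate(
    input_ids=desc.input_ids,
    attention_mask=desc.attention_mask,
    prompt_input_ids=prompt.input_ids,
    prompt_attention_mask=prompt.attention_mask,
)
# audio[i] holds the waveform for prompts[i] (padding still attached).
```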

@matatonic (Owner)

An older version had Parler TTS support (the original Parler release), but I removed it because the voices just seemed random, which doesn't fit this project. The new Parler version with stable voice identities is back on my radar, but I haven't tested it yet for quality or speed.

Re: Tortoise, it's news to me that it's faster; it has always been slower. I'll give it another look.

@matatonic (Owner)

The OpenAI speech API doesn't support batching according to the API reference, so I don't plan to include batch support.

For cases outside API compatibility, especially batching, I recommend implementing inference with the model directly in your code rather than via a network API. It would be much more efficient.

@bi1101

bi1101 commented Sep 19, 2024

> The OpenAI speech API doesn't support batching according to the API reference, so I don't plan to include batch support.

I think there's been a misunderstanding. When I mentioned batching, I was referring to the server intelligently switching to batching mode when it receives concurrent requests, allowing it to process them in parallel. From reviewing your code, I can see that parallelism is implemented, but it isn't fully optimized using Parler's native batching code, which offers a significant performance boost in such cases.

@matatonic (Owner)

>> The OpenAI speech API doesn't support batching according to the API reference, so I don't plan to include batch support.
>
> I think there's been a misunderstanding. When I mentioned batching, I was referring to the server intelligently switching to batching mode when it receives concurrent requests, allowing it to process them in parallel. From reviewing your code, I can see that parallelism is implemented, but it isn't fully optimized using Parler's native batching code, which offers a significant performance boost in such cases.

I think I get you now: implement continuous batching to process parallel requests, rather than batch processing of a single batched request.

I hadn't considered that yet, but it is a much better solution to parallel processing than the current setup. Thanks for the suggestion.
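
For anyone following along, a hypothetical sketch of that micro-batching idea (illustration only, not this repo's code; `generate_batch` stands in for any model's batched inference call):

```python
# Hypothetical asyncio micro-batching loop: concurrent speech requests queue
# up and are flushed to the model as one batch instead of one at a time.
import asyncio

MAX_BATCH = 8    # cap on requests per forward pass
MAX_WAIT = 0.02  # seconds to linger for more requests before flushing

request_queue: asyncio.Queue = asyncio.Queue()

async def batch_worker(generate_batch):
    while True:
        batch = [await request_queue.get()]  # block until the first request
        deadline = asyncio.get_running_loop().time() + MAX_WAIT
        while len(batch) < MAX_BATCH:
            remaining = deadline - asyncio.get_running_loop().time()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(request_queue.get(), remaining))
            except asyncio.TimeoutError:
                break
        texts = [text for text, _ in batch]
        # One batched forward pass, off the event loop.
        audios = await asyncio.to_thread(generate_batch, texts)
        for (_, fut), audio in zip(batch, audios):
            fut.set_result(audio)

async def synthesize(text: str):
    # Each request handler awaits its slot in the next batch.
    fut = asyncio.get_running_loop().create_future()
    await request_queue.put((text, fut))
    return await fut
```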
