New Fish model #58
Comments
I am considering it. So far I've heard it's not as good as xtts, but I haven't tried it myself yet.
IMHO it's far superior to xtts: less robotic and more emotional.
I see a major reason to implement support for Fish: it seems to support quantization. I have an old GPU with 8 GB of RAM, so every byte matters to me, and I really struggled to find any good information on how to quantize XTTS. I concluded that it's not something that can be relied upon, so seeing this PR that adds quantization support for Fish Speech makes me very interested! PS: what's up with deepspeed for XTTS, by the way? I see that it takes a
That's a great point, thanks for that. Re: deepspeed, can you start a new issue or discussion? It's worth its own space. I know it would help low-VRAM folks a lot, but it's a bit complex, especially on Windows.
Hi, I took a quick look at fish audio again. I'm sharing this to make it easier to give it a try! Their reference is there https://speech.fish.audio/ but I ended up doing my thing:

    git clone https://github.com/fishaudio/fish-speech/
    cd fish-speech

Then create docker-compose.yml with content:

    services:
      fish-speech:
        image: fishaudio/fish-speech:latest-dev  # avoid building it
        volumes:
          - ./:/exp
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities: [gpu]
        network_mode: host  # to access their gradio

My takeaway is that it's of very high quality, and quite fast. It's hard to quantify, but I never saw it take more than 2.2 GB of VRAM, whereas xtts often took all of my 8 GB (which might actually be a bug, come to think of it). Fish on my old GPU seems to take 60 s to generate 30 s of audio, but I have done zero optimization. I don't really understand how to enable quantization. There seem to be some args to set up. I think that to go further I would need to build it from the repo and modify the entry point to use the other Python gradio scripts; some of them relate directly to quantization.
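To make the speed figure above easier to compare across GPUs and models, here is a minimal sketch that turns it into a real-time factor (generation time divided by audio duration). The function name `real_time_factor` is hypothetical and not part of fish-speech or xtts; the numbers are only the rough ones reported in this comment.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Return generation time divided by audio duration (lower is faster).

    A value below 1.0 means audio is produced faster than it plays back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


# Rough figures from the comment above: about 60 s of compute for 30 s of audio.
rtf = real_time_factor(60.0, 30.0)
print(f"real-time factor: {rtf:.1f}")  # prints "real-time factor: 2.0"
```

A factor of 2.0 means the unoptimized setup runs at about half of real-time speed on that GPU.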
Have you seen the new fish speech model https://github.com/fishaudio/fish-speech ?
Wonderful voice cloning and intonation performance.
Would you consider supporting it?