Did you manage to get xtts to stop hallucinating? #31

DrewThomasson · 2024-09-10T19:00:05Z

Or is that the thing in your readme that continuously re-generates lines until they have a high enough "detected quantity rating"?

I.e., not having the mumbled hallucinations at the end.
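For anyone reading along, the "re-generate until the rating is high enough" idea could be sketched roughly like this. This is only a sketch under assumptions: `synthesize` and `rate` are hypothetical callables standing in for the TTS call and whatever rating model is used, and the threshold and attempt cap are made-up values, not taken from the readme.

```python
from typing import Callable, Tuple


def generate_until_good(
    text: str,
    synthesize: Callable[[str], bytes],   # hypothetical: text -> audio bytes
    rate: Callable[[str, bytes], float],  # hypothetical: (text, audio) -> score
    threshold: float = 0.9,               # assumed cut-off, tune for your rater
    max_attempts: int = 5,                # assumed cap so the loop always ends
) -> Tuple[bytes, float, int]:
    """Re-generate `text` until the score reaches `threshold`.

    Returns (audio, score, attempts). If no attempt reaches the threshold,
    the best-scoring attempt is returned instead.
    """
    best_audio, best_score = b"", float("-inf")
    for attempt in range(1, max_attempts + 1):
        audio = synthesize(text)
        score = rate(text, audio)
        if score > best_score:
            best_audio, best_score = audio, score
        if score >= threshold:
            return best_audio, best_score, attempt
    return best_audio, best_score, max_attempts
```

Keeping the best attempt as a fallback means a stubborn line still produces audio instead of stalling the whole book.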

lukaszliniewicz · 2024-09-10T19:47:39Z

Maintainer

I don't have a lot of issues with those, but they do sometimes appear. I apply some fade-in and fade-out, which may be helping a little. I haven't played much with the evaluation model, to be honest, as my GPU doesn't handle it very well (I only have a 3050 at the moment). I think the hallucinations may be caused by some of the wav samples. Maybe try denoising them, or try putting a few samples of the same voice in a folder? Using WhisperX with its enhanced per-word alignment might be a solution too. Even the small or medium model would probably work well for this purpose, and if it runs on the CPU in a separate thread, it should not interfere much with GPU generation.
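A minimal sketch of the fade-in/fade-out step, in case it helps. It assumes the audio is a plain list of float samples and uses a linear ramp with a made-up 20 ms default; the actual fade shape and length used in the project are not stated here, so treat both as assumptions.

```python
from typing import List


def apply_fades(samples: List[float], sample_rate: int, fade_ms: float = 20.0) -> List[float]:
    """Return a copy of `samples` with a linear fade-in and fade-out applied.

    The fade length is capped at half the clip so the two ramps never overlap.
    """
    n = min(int(sample_rate * fade_ms / 1000), len(samples) // 2)
    out = list(samples)  # leave the caller's list untouched
    for i in range(n):
        gain = i / n          # ramps from 0.0 up towards 1.0
        out[i] *= gain        # fade-in at the start
        out[-1 - i] *= gain   # mirrored fade-out at the end
    return out
```

A fade only softens whatever sits at the clip edges, so it would mask short trailing artifacts rather than stop the model from producing them.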

5 replies

DrewThomasson Sep 10, 2024
Author

Hmm, I'll try playing around with those.

Thanks!

lukaszliniewicz Sep 10, 2024
Maintainer

PPS. Now that I think of it, in my experience hallucinations are more likely to occur with longer chunks, which is why I set the character limit to 160 by default.
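One way to enforce a per-chunk character limit like that is sketched below. This is an assumption about how such splitting could work, not the project's actual splitter: it packs whole sentences into a chunk while they fit, and falls back to word boundaries for a sentence that is too long on its own.

```python
import re
from typing import List


def _wrap_words(sentence: str, max_chars: int) -> List[str]:
    """Break one sentence into pieces of at most `max_chars`, on word boundaries.

    A single word longer than the limit is kept whole rather than cut.
    """
    pieces, current = [], ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def split_into_chunks(text: str, max_chars: int = 160) -> List[str]:
    """Greedily pack sentences into chunks no longer than `max_chars`."""
    chunks, current = [], ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for piece in _wrap_words(sentence, max_chars):
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Breaking at sentence ends where possible keeps each generation on a natural pause, which should matter for prosody as well as for chunk length.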

lukaszliniewicz Sep 17, 2024
Maintainer

Have you figured it out? :)

DrewThomasson Oct 9, 2024
Author

Yup, thanks for the info, it was very helpful.

Temperature seems to play the biggest role in reducing hallucinations.

The other settings also seem to affect it, but temperature seems to do the most.

After mapping those controls to ebook2audiobookxtts I was able to see firsthand what they do.

Lowering top_p and top_k also seems to increase generation speed.

Here are the docs I found on it as well.

https://docs.coqui.ai/en/latest/models/xtts.html

And this too

https://docs.coqui.ai/en/latest/_modules/TTS/tts/models/xtts.html

lol, just found out about num_beams too, so I could potentially have it explore multiple candidate outputs and keep the highest-scoring one via beam search.
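A rough sketch of how those controls could be bundled and passed along. The parameter names (`temperature`, `top_p`, `top_k`, `num_beams`) are the ones mentioned above; the `SamplingSettings` class, the `CONSERVATIVE` preset, and its values are hypothetical and are not the library's defaults. It also assumes that lower temperature is what helps, which the post implies but does not state.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class SamplingSettings:
    """Sampling knobs named in the post; values are supplied by the caller."""

    temperature: float
    top_p: float
    top_k: int
    num_beams: int = 1

    def __post_init__(self) -> None:
        # Basic sanity checks so a typo fails early instead of mid-generation.
        if self.temperature <= 0:
            raise ValueError("temperature must be > 0")
        if not 0 < self.top_p <= 1:
            raise ValueError("top_p must be in (0, 1]")
        if self.top_k < 1 or self.num_beams < 1:
            raise ValueError("top_k and num_beams must be >= 1")

    def as_kwargs(self) -> Dict[str, Any]:
        """Return the settings as keyword arguments for a generation call."""
        return asdict(self)


# Hypothetical preset with illustrative values only.
CONSERVATIVE = SamplingSettings(temperature=0.5, top_p=0.7, top_k=30)

# Assumed call shape, not executed here; check the linked docs for the
# exact signature before relying on it:
#   model.inference(text, language, gpt_cond_latent, speaker_embedding,
#                   **CONSERVATIVE.as_kwargs())
```

Keeping the settings in one object makes it easy to expose them as CLI flags or UI sliders and to compare presets side by side.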

Answer selected by DrewThomasson
