Did you manage to get xtts to stop hallucinating? #31
-
Or is that the thing in your README that keeps regenerating lines until they get a high enough "detected quantity rating", i.e. until they no longer have the mumbled hallucinations at the end?
Replies: 1 comment 5 replies
-
I don't have a lot of issues with those, but they do sometimes appear. I apply some fade-in and fade-out, which may be helping a little. I haven't played much with the evaluation model, to be honest, as my GPU doesn't handle it very well (I only have a 3050 at the moment). I think the hallucinations may be caused by some of the wav samples. Maybe try denoising them, or try putting a few samples of the same voice in a folder? Using WhisperX with its enhanced per-word alignment might be a solution too. Probably even the small or medium model would work well for this purpose, and if it runs on the CPU in a separate thread, it should not interfere much with GPU generation.
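To make the "regenerate until the transcript matches" idea concrete, here is a minimal sketch of how such a check could look. It is not the code from the README or from WhisperX: `synthesize` and `transcribe` are hypothetical callables you would supply yourself (for example an XTTS call and a WhisperX call), and the `0.9` threshold and `5` attempts are arbitrary assumptions.

```python
import difflib
import re
from typing import Any, Callable, Tuple


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so only the spoken words are compared."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def similarity(expected: str, heard: str) -> float:
    """Word-level similarity in [0, 1] between the target text and a transcript."""
    a = normalize(expected).split()
    b = normalize(heard).split()
    return difflib.SequenceMatcher(None, a, b).ratio()


def generate_until_clean(
    text: str,
    synthesize: Callable[[str], Any],   # hypothetical: text -> audio
    transcribe: Callable[[Any], str],   # hypothetical: audio -> transcript
    threshold: float = 0.9,             # assumed cutoff, tune for your data
    max_attempts: int = 5,              # assumed retry budget
) -> Tuple[Any, float]:
    """Regenerate a line until its transcript is close enough to the text.

    Returns the best-scoring audio and its score, even if no attempt
    reached the threshold.
    """
    best_audio, best_score = None, -1.0
    for _ in range(max_attempts):
        audio = synthesize(text)
        score = similarity(text, transcribe(audio))
        if score > best_score:
            best_audio, best_score = audio, score
        if score >= threshold:
            break
    return best_audio, best_score
```

Extra mumbled words at the end of a clip show up as additional words in the transcript, which pulls the ratio down, so a trailing hallucination should fail the check and trigger another attempt.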
Yup, thanks for the info, it was very helpful.
Temperature seems to play the biggest role in reducing hallucinations.
The other settings also seem to affect it, but changing temperature from its default seems to do the most.
After mapping those controls to ebook2audiobookxtts I was able to see first-hand what they do.
Here are the docs I found on it as well.
https://docs.coqui.ai/en/latest/models/xtts.html
And this too
https://docs.coqui.ai/en/latest/_modules/TTS/tts/models/xtts.html
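
As a rough intuition for why temperature matters so much, here is a toy sketch of temperature-scaled sampling probabilities. It assumes XTTS's temperature behaves like the usual softmax temperature in autoregressive sampling; the logits are made up and this is not code taken from the XTTS sources.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into sampling probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate tokens: likely, plausible, unlikely.
logits = [2.0, 1.0, 0.1]
for t in (1.0, 0.5, 0.2):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lowering the temperature concentrates probability on the most likely token and shrinks the chance of sampling an unlikely one, which would fit with fewer stray sounds at lower settings.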