Add Small Gaps Between Audio Chunks to Avoid Rushed Speech #53
Comments
Are you experiencing this with piper or xtts models? Both? (Any more details about which voice settings you're using would also help me test.)
I'm experiencing this with xtts (piper sounds rushed by default in my experience, so this is not an issue there). Roughly every two sentences, the audio rushes into the next sentence without a pause. The pace remains similar across different voices. And the curl
Pretty subtle, but yeah, I hear it. Have you considered using speed: 0.9? There is no built-in option to add silence in xtts, so the change would mean fiddling with the wav output in the stream. I probably won't do this myself unless it gets more support, but I'm open to a PR for it.
#55 This should mitigate the issue.
Funny, I figured this out just yesterday as well (silence was 0s).
Hi,
I noticed that the repository uses a really smart approach to account for text length by segmenting the text into smaller chunks and generating audio for each segment separately.
But there's a small issue. The part where the stitching happens is very noticeable. When the audio chunks are stitched together, there are no gaps between them, which makes the resulting speech sound a bit rushed and unnatural.
Could you add small gaps between the chunks in a future version? Making the gap length configurable via .env would also be nice.
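To make the request concrete, here is a minimal sketch of what I have in mind. It is not the repository's code. It assumes each chunk is raw 16-bit mono PCM at 24 kHz (headers already stripped), and the names `silence_bytes`, `join_with_gaps`, `gap_from_env`, and the `CHUNK_GAP_SECONDS` variable are all made up for illustration.

```python
import os


def silence_bytes(seconds, sample_rate=24000, sample_width=2, channels=1):
    """Return raw PCM silence (all-zero samples) of the given duration."""
    n_frames = int(round(seconds * sample_rate))
    return b"\x00" * (n_frames * sample_width * channels)


def join_with_gaps(chunks, gap_seconds, sample_rate=24000, sample_width=2, channels=1):
    """Concatenate raw PCM chunks, placing silence between neighbours only."""
    gap = silence_bytes(gap_seconds, sample_rate, sample_width, channels)
    return gap.join(chunks)


def gap_from_env(default=0.25):
    """Read the gap length from a hypothetical CHUNK_GAP_SECONDS variable."""
    raw = os.environ.get("CHUNK_GAP_SECONDS", "")
    try:
        value = float(raw)
    except ValueError:
        # Unset or unparsable: fall back to the default gap.
        return default
    return max(0.0, value)
```

With something like `join_with_gaps(chunks, gap_from_env())`, a gap of 0 would reproduce the current behaviour, so the option could be off by default. For a streamed response the same silence could be yielded between chunks instead of joining everything at the end.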