Add Small Gaps Between Audio Chunks to Avoid Rushed Speech #53

bi1101 · 2024-08-27T09:50:44Z

Hi,

I noticed that the repository uses a really smart approach to account for text length by segmenting the text into smaller chunks and generating audio for each segment separately.

But there's a small issue: the points where the chunks are stitched together are very noticeable. There are no gaps between the chunks, which makes the resulting speech sound a bit rushed and unnatural.

Could you add a short pause between chunks in a future version? Making its length configurable via .env would also be nice.
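A minimal sketch of how such a setting could be read, assuming the .env values end up in the process environment. The variable name `TTS_CHUNK_PAUSE_SECONDS` and the 0.25 s default are hypothetical, not existing options in the project.

```python
import math
import os

# Hypothetical variable name and default; the project does not define these.
PAUSE_ENV_VAR = "TTS_CHUNK_PAUSE_SECONDS"
DEFAULT_PAUSE_SECONDS = 0.25


def pause_seconds_from_env(env=None) -> float:
    """Return the pause to insert between audio chunks, in seconds."""
    env = os.environ if env is None else env
    raw = env.get(PAUSE_ENV_VAR, "")
    try:
        value = float(raw)
    except ValueError:
        # Unset or malformed: fall back to the default.
        return DEFAULT_PAUSE_SECONDS
    if not math.isfinite(value):
        # Reject "inf"/"nan", which float() happily parses.
        return DEFAULT_PAUSE_SECONDS
    # A negative pause makes no sense; treat it as "no gap".
    return max(0.0, value)
```

Passing a plain dict as `env` keeps the function easy to test without touching the real environment.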

matatonic · 2024-08-27T13:18:38Z

Are you experiencing this with piper or xtts models? Both? (Any more details about which voice settings you're using would also help me test.)

bi1101 · 2024-08-27T13:36:08Z

I'm experiencing this with xtts (piper is rushed by default in my experience, so this isn't an issue there). Every two sentences or so, the audio rushes into the next sentence without a pause. The pace remains similar across different voices.
Here is the audio file: response.webm

And the curl request body:

{
    "model": "tts-1-hd",
    "input": "Once upon a time in a small village nestled between rolling hills, there lived a young girl named Lila. She was known for her curiosity and love for the stars. Every night, she would climb up the hill behind her house to gaze at the twinkling lights in the sky. Her favorite star was the brightest one, which she named Lumina. One evening, as she was lying on the grass, Lila noticed Lumina flickering strangely. Concerned, she whispered, 'What's wrong, Lumina?' To her amazement, the star responded with a soft glow and a gentle voice, 'I am losing my light, Lila. I need your help.' Determined to save her beloved star, Lila asked, 'What can I do?' Lumina explained that a dark shadow from a distant galaxy was slowly dimming her light. To restore it, Lila would need to gather the light of the purest hearts in her village and send it to Lumina. The next day, Lila set out on a quest. She visited her neighbors, friends, and even the animals in the village, asking for their purest wishes and hopes. She collected them in a small glass jar that sparkled with a soft, golden light. With the jar full, Lila returned to the hill at sunset. Holding the jar high, she whispered a wish for Lumina to shine brightly again. The jar burst open, releasing a beam of golden light that shot up into the sky, enveloping Lumina. The star began to glow with renewed brilliance, brighter than ever before. Lumina twinkled joyfully and thanked Lila for her kindness and bravery. From that night on, Lumina became a guiding star for travelers and a symbol of hope for the village. Lila continued to visit Lumina every night, knowing that the light of a pure heart could brighten even the darkest of skies.",
    "voice": "alloy",
    "response_format": "mp3",
    "speed": "1"
}
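For reference, the same request can be built from Python with only the standard library. This is a sketch: the URL assumes a server running locally on port 8000 (adjust it to your deployment), and `speed` is sent as a JSON number, which is how the OpenAI speech API documents it, rather than the string "1" used above. `build_speech_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumed local endpoint; adjust host and port to your deployment.
URL = "http://localhost:8000/v1/audio/speech"


def build_speech_request(text: str, voice: str = "alloy", speed: float = 1.0) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the body above."""
    payload = {
        "model": "tts-1-hd",
        "input": text,
        "voice": voice,
        "response_format": "mp3",
        "speed": speed,  # serialized as a JSON number
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` and writing `resp.read()` to a file such as `response.mp3` would save the generated audio.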

matatonic · 2024-08-27T14:50:06Z

Pretty subtle, but yeah, I hear it. Have you considered using speed: 0.9?

There is no built-in option to add silence in xtts, so the change means manipulating the WAV output in the stream. This probably won't happen on my end unless it gets more support, but I'm open to a PR for it.
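One way to manipulate the stream, sketched under stated assumptions: each text segment yields one chunk of headerless 16-bit signed mono PCM at 24 kHz (the XTTS v2 output rate), so a run of zero bytes is digital silence. `silence` and `with_gaps` are hypothetical helpers, not code from the repository; for 8-bit unsigned PCM, silence would be `0x80` bytes instead.

```python
from typing import Iterable, Iterator


def silence(seconds: float, sample_rate: int = 24000, sample_width: int = 2, channels: int = 1) -> bytes:
    """Digital silence for signed PCM: every sample is zero."""
    n_frames = int(round(seconds * sample_rate))
    return b"\x00" * (n_frames * sample_width * channels)


def with_gaps(chunks: Iterable[bytes], gap_seconds: float, sample_rate: int = 24000) -> Iterator[bytes]:
    """Yield raw PCM chunks with a short silence between consecutive chunks.

    Assumes each chunk is headerless 16-bit signed mono PCM at `sample_rate`.
    No silence is added before the first chunk or after the last one.
    """
    gap = silence(gap_seconds, sample_rate)
    first = True
    for chunk in chunks:
        if not first and gap:
            yield gap
        yield chunk
        first = False
```

Inserting the gap only between chunks keeps the stream lazy and avoids a dangling pause at the end of the response.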

bi1101 · 2024-08-28T09:43:44Z

#55 should mitigate the issue.

matatonic · 2024-08-28T13:08:42Z

Funny, just yesterday I figured this out too (the silence was 0s).

matatonic added the enhancement New feature or request label Aug 27, 2024

bi1101 closed this as completed Aug 28, 2024
