Add Small Gaps Between Audio Chunks to Avoid Rushed Speech #53

Closed
bi1101 opened this issue Aug 27, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@bi1101

bi1101 commented Aug 27, 2024

Hi,

I noticed that the repository uses a really smart approach to account for text length by segmenting the text into smaller chunks and generating audio for each segment separately.

But there's a small issue. The part where the stitching happens is very noticeable. When the audio chunks are stitched together, there are no gaps between them, which makes the resulting speech sound a bit rushed and unnatural.

Could you add a small gap between chunks in a future version? Making the gap length configurable via .env would also be nice.
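As a rough illustration of the configurable part of this request, the sketch below reads a gap length from the environment. It assumes the .env values have already been loaded into the process environment (for example by Docker Compose or a dotenv loader). The variable name `CHUNK_GAP_SECONDS` and the helper `chunk_gap_seconds` are hypothetical and are not settings the project is known to provide.

```python
import math
import os


def chunk_gap_seconds(env=None, default=0.0):
    """Return the gap length in seconds from a hypothetical CHUNK_GAP_SECONDS setting.

    Falls back to `default` when the variable is unset, not a number,
    negative, or not finite.
    """
    env = os.environ if env is None else env
    raw = env.get("CHUNK_GAP_SECONDS", "")  # hypothetical variable name
    try:
        value = float(raw)
    except ValueError:
        return default
    if not math.isfinite(value) or value < 0:
        return default
    return value
```

Falling back to a default instead of raising keeps a mistyped .env entry from breaking speech generation. A stricter setup might prefer to fail loudly.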

@matatonic
Owner

matatonic commented Aug 27, 2024

Are you experiencing this with the piper or xtts models, or both? (Any more details about which voice settings you're using would also help me test.)

@bi1101
Author

bi1101 commented Aug 27, 2024

I'm experiencing this with xtts (piper is rushed by default in my experience, so this is not an issue there). Every two sentences or so, the audio rushes into the next sentence without a pause. The pace remains similar across different voices.
This is the audio file: response.webm

And the curl request body:

{
    "model": "tts-1-hd",
    "input": "Once upon a time in a small village nestled between rolling hills, there lived a young girl named Lila. She was known for her curiosity and love for the stars. Every night, she would climb up the hill behind her house to gaze at the twinkling lights in the sky. Her favorite star was the brightest one, which she named Lumina. One evening, as she was lying on the grass, Lila noticed Lumina flickering strangely. Concerned, she whispered, 'What's wrong, Lumina?' To her amazement, the star responded with a soft glow and a gentle voice, 'I am losing my light, Lila. I need your help.' Determined to save her beloved star, Lila asked, 'What can I do?' Lumina explained that a dark shadow from a distant galaxy was slowly dimming her light. To restore it, Lila would need to gather the light of the purest hearts in her village and send it to Lumina. The next day, Lila set out on a quest. She visited her neighbors, friends, and even the animals in the village, asking for their purest wishes and hopes. She collected them in a small glass jar that sparkled with a soft, golden light. With the jar full, Lila returned to the hill at sunset. Holding the jar high, she whispered a wish for Lumina to shine brightly again. The jar burst open, releasing a beam of golden light that shot up into the sky, enveloping Lumina. The star began to glow with renewed brilliance, brighter than ever before. Lumina twinkled joyfully and thanked Lila for her kindness and bravery. From that night on, Lumina became a guiding star for travelers and a symbol of hope for the village. Lila continued to visit Lumina every night, knowing that the light of a pure heart could brighten even the darkest of skies.",
    "voice": "alloy",
    "response_format": "mp3",
    "speed": "1"
}
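
To reproduce this from a script, the same request can be built with Python's standard library. This is only a sketch. The base URL and the `/audio/speech` path assume an OpenAI-compatible server listening locally on port 8000, and `build_speech_request` is a hypothetical helper name. Note that `speed` is sent as a number here, not as the string used in the body above.

```python
import json
import urllib.request


def build_speech_request(text, base_url="http://localhost:8000/v1", speed=1.0):
    """Build (but do not send) a POST request for the speech endpoint.

    base_url is an assumption; adjust it to wherever the server is running.
    """
    payload = {
        "model": "tts-1-hd",
        "input": text,
        "voice": "alloy",
        "response_format": "mp3",
        "speed": speed,
    }
    return urllib.request.Request(
        f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request, and the response body can then be written to a file for listening.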

@matatonic
Owner

matatonic commented Aug 27, 2024

Pretty subtle, but yeah, I hear it. Have you considered using speed: 0.9?

There is no built-in option to add silence in xtts, so the change would mean fiddling with the WAV output in the stream. I probably won't do this myself unless it gets more support, but I'm open to a PR for it.
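
One way to picture inserting silence into the stream is a generator that yields zero-valued samples between chunks. This is a minimal sketch, not the project's implementation. It assumes raw signed 16-bit mono PCM at 24 kHz, which should be checked against the actual model output, and the name `with_gaps` is hypothetical.

```python
from typing import Iterable, Iterator


def with_gaps(
    chunks: Iterable[bytes],
    gap_seconds: float,
    sample_rate: int = 24000,  # assumed output rate
    sample_width: int = 2,     # bytes per sample (16-bit PCM)
    channels: int = 1,
) -> Iterator[bytes]:
    """Yield PCM chunks with a block of silence between consecutive chunks."""
    n_frames = int(round(gap_seconds * sample_rate))
    # For signed PCM, silence is all-zero bytes.
    silence = bytes(n_frames * sample_width * channels)
    first = True
    for chunk in chunks:
        # No silence before the first chunk or after the last one.
        if not first and silence:
            yield silence
        first = False
        yield chunk
```

Because it is a generator, the silence is added as chunks arrive, so streaming responses would not need to be buffered in full. This only applies to uncompressed PCM. An encoded format such as mp3 would need the gap added before encoding.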

matatonic added the enhancement label Aug 27, 2024
@bi1101
Author

bi1101 commented Aug 28, 2024

#55 should mitigate the issue.

bi1101 closed this as completed Aug 28, 2024
@matatonic
Owner

Funny, I also figured this out just yesterday (the silence was 0s).


2 participants