Dialogue Tags No Examples #449

Open
DeceptiveMagic opened this issue Oct 29, 2024 · 1 comment
Labels: help wanted (Extra attention is needed), question (Further information is requested)

Comments

@DeceptiveMagic

Path: /speech-synthesis/prompting

The examples given for using dialogue tags make the process look simple: put quotes around the spoken portions and describe the delivery outside of them. However, this does not work for voice clip generation, because the voice reads the entire script, including the parts outside the quotation marks.

The documentation acknowledges this issue when it states: "You will also have to somehow remove the prompt as the AI will read exactly what you give it."

What does this mean? There is no example of how to do this, and removing the prompt also removes the context. While the feature would be extremely useful for context and inflection, the current implementation forces the program to rely entirely on the AI's inference of the intended emotion.

The AI does a decent job of picking up on and conveying emotion. However, it gets inflection and emotion wrong often enough to cause real problems.

If an example could be posted showing how to use dialogue tags correctly, rather than only how not to prompt the AI, that would really help clear up some of these issues.

@mpm

mpm commented Oct 30, 2024

I'm not affiliated with ElevenLabs, but I think it meant that you have to open the generated audio with an audio editor (like Audacity) and trim it, so select the part that contains the text from inside the quotes and delete everything else.

You might be able to automate this by using the text-to-speech-with-timestamps API endpoint.

With this you'll receive an array containing each character from your prompt, paired with a timestamp. You'd then look for the first and last occurrence of the quote character, read their timestamps, and use them to trim the audio sample.
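A minimal sketch of that trimming step, assuming the alignment data has already been pulled out of the response as three parallel lists (one character per entry, plus per-character start and end times in seconds), and assuming the audio has already been decoded into a flat sequence of PCM samples at a known sample rate. The function names `quoted_span_window` and `trim_samples` are hypothetical, and whether the endpoint's character list preserves your quote characters exactly is an assumption worth checking against a real response.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Straight and typographic double quotes; adjust if your prompt uses others.
QUOTES = ('"', "\u201c", "\u201d")


def quoted_span_window(
    characters: List[str],
    start_times: List[float],
    end_times: List[float],
    quotes: Tuple[str, ...] = QUOTES,
) -> Optional[Tuple[float, float]]:
    """Return (start_s, end_s) covering the text between the first and last quote.

    Returns None if there are fewer than two quote characters or the
    quotation is empty.
    """
    positions = [i for i, ch in enumerate(characters) if ch in quotes]
    if len(positions) < 2:
        return None
    first, last = positions[0], positions[-1]
    if last - first < 2:
        return None  # nothing between the quotes
    # Start when the first spoken character after the opening quote begins,
    # stop when the last character before the closing quote ends.
    return start_times[first + 1], end_times[last - 1]


def trim_samples(
    samples: Sequence[int], sample_rate: int, window: Tuple[float, float]
) -> List[int]:
    """Cut a decoded PCM sample sequence down to the given time window."""
    start_s, end_s = window
    start_idx = max(0, math.floor(start_s * sample_rate))
    end_idx = min(len(samples), math.ceil(end_s * sample_rate))
    return list(samples[start_idx:end_idx])
```

For a prompt like `He said "hi" loudly`, the window runs from the start of `h` to the end of `i`, so only the quoted word survives the trim. In practice you may want to pad the window by a few milliseconds on each side, since the character timestamps may not line up exactly with where the audible sound begins and ends.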

Of course I'd be delighted to learn about a more practical approach to achieve this ;-)

louisjoecodes added the help wanted and question labels on Oct 30, 2024