Difference between total character durations and buffer duration #599

Open
lbaliunas opened this issue Dec 17, 2024 · 2 comments
Labels
bug Something isn't working d-hard Issues that involve significant effort, such as major code refactors, backend changes, or new featur question Further information is requested

Comments

@lbaliunas

Hi,

I'm using the ElevenLabs websockets API with the ulaw_8000 output format, and I receive alignment times for each chunk. I've noticed a difference between the audio buffer duration, calculated in seconds by dividing the number of bytes by 8000 (ulaw_8000 is 8000 samples per second at one byte per sample), and the sum of the charDurationsMs values provided in the alignment.

Could I get clarification on why that is? Shouldn't they be the same?

Thanks!
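
For readers who want to reproduce the comparison described above, here is a minimal sketch. It assumes mono ulaw_8000 audio (one byte per sample, 8000 samples per second) and an alignment object that exposes a `charDurationsMs` list, as mentioned in the post. The function names are hypothetical and not part of any ElevenLabs SDK. If the audio chunk arrives base64-encoded, decode it before counting bytes.

```python
# Hypothetical helpers for comparing buffer duration with alignment duration.
# Assumption: mono mu-law audio at 8 kHz, i.e. 8000 bytes per second.
ULAW_8000_BYTES_PER_SECOND = 8000


def buffer_duration_ms(audio: bytes) -> float:
    """Duration of a raw ulaw_8000 buffer in milliseconds."""
    return len(audio) / ULAW_8000_BYTES_PER_SECOND * 1000.0


def alignment_duration_ms(alignment: dict) -> float:
    """Sum of the per-character durations reported for one chunk."""
    return float(sum(alignment["charDurationsMs"]))


def duration_delta_ms(audio: bytes, alignment: dict) -> float:
    """Positive when the alignment claims more time than the buffer holds."""
    return alignment_duration_ms(alignment) - buffer_duration_ms(audio)


if __name__ == "__main__":
    # Made-up values for illustration only, not real API output.
    chunk = bytes(4000)
    example_alignment = {"charDurationsMs": [100, 150, 200]}
    print(duration_delta_ms(chunk, example_alignment))
```

Logging this delta per chunk would show how large the mismatch is and whether it drifts over a long utterance.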

@louisjoecodes
Collaborator

Hey @lbaliunas, apologies for the delay here.

Is the delta between the alignment timings and the buffer duration significant? Does it impact the accuracy of the alignments at word level, for example?

Curious to know if/how you solved this.

@louisjoecodes louisjoecodes added bug Something isn't working question Further information is requested d-hard Issues that involve significant effort, such as major code refactors, backend changes, or new featur labels Dec 31, 2024
@lbaliunas
Author

Hi @louisjoecodes, it depends. The difference goes in both directions: the summed alignment durations are sometimes less and sometimes more than the actual duration of the buffer. The issue is that I'm trying to track "how much was said" when the audio is stopped, and it's difficult to use alignment timestamps when they don't add up to the actual buffer duration.

Any idea where these discrepancies come from? Maybe from the pauses/whitespace?
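
One possible workaround for the "how much was said" tracking, offered as a sketch rather than a confirmed fix: rescale the per-character durations so they sum to the measured buffer duration, then walk them until the playback position at the moment of stopping is reached. This assumes the mismatch is spread proportionally across characters, which nothing in this thread confirms. The function name and the `chars` list (one entry per `charDurationsMs` value) are assumptions for illustration.

```python
from typing import Sequence


def chars_spoken(
    chars: Sequence[str],
    durations_ms: Sequence[float],
    buffer_ms: float,
    played_ms: float,
) -> str:
    """Return the characters fully spoken after played_ms of a chunk.

    Assumption: alignment durations are rescaled proportionally so that
    their total matches the real buffer duration (buffer_ms).
    """
    total = sum(durations_ms)
    if total <= 0 or buffer_ms <= 0:
        return ""
    scale = buffer_ms / total
    elapsed = 0.0
    spoken = []
    for ch, duration in zip(chars, durations_ms):
        elapsed += duration * scale
        if elapsed > played_ms:
            # This character was still being spoken when playback stopped.
            break
        spoken.append(ch)
    return "".join(spoken)


if __name__ == "__main__":
    # Made-up values for illustration only, not real API output.
    print(chars_spoken(list("Hi!"), [100, 100, 200], buffer_ms=500.0, played_ms=300.0))
```

Rescaling keeps the walk from running past or falling short of the end of the buffer, but it cannot recover where inside the chunk the real error sits, for example in pauses.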
