lbaliunas opened this issue on Dec 17, 2024 · 2 comments
Labels: bug (Something isn't working), d-hard (Issues that involve significant effort, such as major code refactors, backend changes, or new featur), question (Further information is requested)
I'm using ElevenLabs websockets with the ulaw_8000 output format, and I receive the alignment times for each chunk. I've noticed a difference between the audio buffer duration, calculated by dividing the number of bytes by 8000 (ulaw_8000 is one byte per sample at 8,000 samples per second, so this gives seconds), and the sum of the charDurationsMs values provided in the alignment.
Could I get clarification on why that is? Shouldn't they be the same?
Thanks!
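For anyone who wants to reproduce the comparison, here is a minimal sketch of the arithmetic described above. The function names are hypothetical, and it assumes the chunk's audio is available as raw mono ulaw_8000 bytes (if it arrives base64-encoded, decode it first) and that the charDurationsMs values have been pulled out of the alignment as a plain list of numbers.

```python
# Sketch only: helper names are illustrative, not part of any SDK.
ULAW_8000_BYTES_PER_SECOND = 8000  # 8-bit mu-law, 8 kHz, mono: 1 byte per sample


def buffer_duration_ms(audio: bytes) -> float:
    """Duration implied by the byte count of a raw ulaw_8000 buffer."""
    return len(audio) * 1000 / ULAW_8000_BYTES_PER_SECOND


def alignment_duration_ms(char_durations_ms: list[float]) -> float:
    """Duration implied by summing the per-character durations."""
    return sum(char_durations_ms)


def duration_delta_ms(audio: bytes, char_durations_ms: list[float]) -> float:
    """Positive when the buffer is longer than the alignment claims."""
    return buffer_duration_ms(audio) - alignment_duration_ms(char_durations_ms)
```

Logging `duration_delta_ms` per chunk should show whether the difference is a constant offset, grows with chunk length, or changes sign, which is the kind of detail that would help narrow down where it comes from.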
Is the delta between the alignment timings and the buffer duration significant? Does it affect the accuracy of the alignments at the word level, for example?
Curious to know if/how you solved this.
louisjoecodes added the bug, question, and d-hard labels on Dec 31, 2024
Hi @louisjoecodes, it depends: the difference goes both ways (the alignment total is sometimes less and sometimes more than the actual duration of the buffer). The issue is that I'm trying to track "how much was said" when the audio is stopped, and it's difficult to use the alignment timestamps when they don't add up to the actual buffer duration.
Any ideas where these discrepancies come from? Maybe in the pauses/whitespace?
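One possible workaround for the "how much was said" problem, sketched below, is to rescale the character durations so that they sum to the measured buffer duration before looking up the stop position. This is only an assumption-laden sketch: `chars_spoken` is a hypothetical helper, the inputs are plain lists taken from one chunk's alignment, and the rescaling assumes the discrepancy is spread evenly across characters, which nothing in this thread confirms.

```python
from bisect import bisect_right
from itertools import accumulate


def chars_spoken(chars, char_durations_ms, elapsed_ms, buffer_ms=None):
    """Return the text whose characters finished playing by elapsed_ms.

    If buffer_ms is given, the durations are scaled so that they sum to it
    (assumption: the mismatch is distributed evenly across characters).
    """
    total = sum(char_durations_ms)
    if total <= 0 or elapsed_ms <= 0:
        return ""
    scale = (buffer_ms / total) if buffer_ms else 1.0
    # End time of each character on the (optionally rescaled) timeline.
    end_times = [t * scale for t in accumulate(char_durations_ms)]
    # Number of characters whose end time is at or before elapsed_ms.
    count = bisect_right(end_times, elapsed_ms)
    return "".join(chars[:count])
```

For example, with eight characters of 100 ms each and playback stopped at 350 ms, the unscaled lookup returns the first three characters; if the buffer actually lasts 1000 ms, the scaled lookup returns only the first two.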