Description
Hi,
I'm using the ElevenLabs websockets API with the ulaw_8000 output format, and I receive alignment timings for each chunk. I notice a difference between the audio buffer duration, calculated by dividing the number of bytes by 8000 (ulaw_8000 is one byte per sample at 8 kHz, so this gives seconds), and the sum of the charDurationsMs values provided in the alignment.
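For reference, here is a minimal sketch of the comparison described above. It assumes each websocket message is a JSON object with a base64-encoded `audio` field and an `alignment` object containing `charDurationsMs`. The helper names and the synthetic message are illustrative only and are not taken from any ElevenLabs SDK.

```python
import base64

# ulaw_8000: 8000 samples per second, 1 byte per sample, mono.
ULAW_8000_BYTES_PER_SECOND = 8000


def audio_duration_ms(audio_bytes: bytes) -> float:
    """Duration of a raw mu-law 8 kHz buffer, in milliseconds."""
    return len(audio_bytes) / ULAW_8000_BYTES_PER_SECOND * 1000.0


def alignment_duration_ms(alignment: dict) -> float:
    """Sum of the per-character durations reported in the alignment."""
    return float(sum(alignment["charDurationsMs"]))


def compare_chunk(message: dict) -> dict:
    """Compare both durations for one message (assumed message shape)."""
    # Count bytes after base64 decoding, not the length of the encoded string.
    audio_bytes = base64.b64decode(message["audio"])
    audio_ms = audio_duration_ms(audio_bytes)
    align_ms = alignment_duration_ms(message["alignment"])
    return {
        "audio_ms": audio_ms,
        "alignment_ms": align_ms,
        "diff_ms": audio_ms - align_ms,
    }


# Synthetic message for illustration only, not real API output.
example = {
    "audio": base64.b64encode(b"\xff" * 4000).decode("ascii"),
    "alignment": {"charDurationsMs": [100, 150, 200]},
}
result = compare_chunk(example)
print(result)
```

With real chunks, `diff_ms` is the discrepancy I am asking about.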
Could I get clarification on why that is? Shouldn't the two durations be the same?
Thanks!