v0.1.109
Optimizations:
- Refactors `StreamingConversation` as a pipeline of consumer-producer workers: transcription, agent response, and synthesis are now decoupled into their own async workers. Shoutout to @jnak for helping us out with the refactor. Upshots (see the toy sketch of the pattern below):
  - The LLM call no longer blocks the processing of new transcripts
  - Playing the output audio runs concurrently with both generating responses and synthesizing audio, so while each sentence is being played, the next response is being generated and synthesized. For synthesizers with latencies > 1s, there is no longer a delay between the sentences of a response.
  - Resource management: synthesizers no longer need a dedicated thread, so e.g. a single telephony server can now support double the number of concurrent phone calls
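A minimal, illustrative sketch of the consumer-producer pattern (not the actual `StreamingConversation` internals; the worker and queue names below are made up): each stage consumes from one `asyncio.Queue` and produces onto the next, so a slow stage never blocks the stages upstream of it.

```python
import asyncio


async def transcriber(audio_in: asyncio.Queue, transcripts: asyncio.Queue) -> None:
    # Consume raw audio chunks, produce transcripts.
    while True:
        chunk = await audio_in.get()
        await transcripts.put(f"transcript({chunk})")  # stand-in for a real STT call


async def agent(transcripts: asyncio.Queue, responses: asyncio.Queue) -> None:
    # Consume transcripts, produce responses. A slow LLM call here no longer
    # blocks the transcriber: new transcripts simply queue up behind it.
    while True:
        transcript = await transcripts.get()
        await asyncio.sleep(1.0)  # stand-in for LLM latency
        await responses.put(f"response({transcript})")


async def synthesizer(responses: asyncio.Queue, audio_out: asyncio.Queue) -> None:
    # Consume responses, produce synthesized audio, concurrently with playback.
    while True:
        response = await responses.get()
        await asyncio.sleep(1.0)  # stand-in for TTS latency
        await audio_out.put(f"audio({response})")


async def player(audio_out: asyncio.Queue) -> None:
    # While one utterance plays, the next is already being generated upstream.
    while True:
        print("playing", await audio_out.get())


async def main() -> None:
    audio_in, transcripts, responses, audio_out = (asyncio.Queue() for _ in range(4))
    workers = [
        asyncio.create_task(coro)
        for coro in (
            transcriber(audio_in, transcripts),
            agent(transcripts, responses),
            synthesizer(responses, audio_out),
            player(audio_out),
        )
    ]
    for i in range(3):
        await audio_in.put(f"chunk{i}")
    await asyncio.sleep(5)  # let the pipeline drain, for demo purposes
    for worker in workers:
        worker.cancel()


asyncio.run(main())
```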
Contribution / Code cleanliness:
- Simple tests that assert `StreamingConversation` works across all supported Python versions: run these locally with `make test`
- Typechecking with `mypy`: run this locally with `make typecheck`
Features:
- ElevenLabs `optimize_streaming_latency` parameter (config sketch at the end of this list)
- Adds the Twilio `to` and `from` numbers to the `CallConfig` in the `ConfigManager` (h/t @Nikhil-Kulkarni) (sketch below)
- AssemblyAI buffering (solves vocodedev/vocode-react-sdk#6) (h/t @m-ods)
- Option to record Twilio calls (h/t @shahafabileah) (sketch below)
- Adds `mute_during_speech` parameter to Transcribers as a solution to speaker feedback into the microphone: see the note in #16 (sketch below)
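Below are hedged sketches of the new knobs; exact field names and import paths should be checked against the code. First, the ElevenLabs `optimize_streaming_latency` parameter, which maps to ElevenLabs' streaming-latency setting (0 = off, up to 4 = maximum optimizations, at some quality cost). The sampling rate and encoding here are illustrative assumptions:

```python
from vocode.streaming.models.audio_encoding import AudioEncoding
from vocode.streaming.models.synthesizer import ElevenLabsSynthesizerConfig

synthesizer_config = ElevenLabsSynthesizerConfig(
    sampling_rate=16000,  # assumed output device settings
    audio_encoding=AudioEncoding.LINEAR16,
    optimize_streaming_latency=3,  # trade a little quality for lower latency
)
```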
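With the `to`/`from` numbers stored on the `CallConfig`, they can be read back through the `ConfigManager`; a sketch assuming the Redis-backed manager and `from_phone`/`to_phone` field names:

```python
from vocode.streaming.telephony.config_manager.redis_config_manager import (
    RedisConfigManager,
)

config_manager = RedisConfigManager()


async def log_call_numbers(conversation_id: str) -> None:
    # get_config / from_phone / to_phone are assumed names, for illustration
    call_config = await config_manager.get_config(conversation_id)
    if call_config is not None:
        print(f"{call_config.from_phone} -> {call_config.to_phone}")
```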
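A sketch of opting into call recording, assuming a boolean `record` flag on `TwilioConfig`:

```python
import os

from vocode.streaming.models.telephony import TwilioConfig

twilio_config = TwilioConfig(
    account_sid=os.environ["TWILIO_ACCOUNT_SID"],
    auth_token=os.environ["TWILIO_AUTH_TOKEN"],
    record=True,  # assumed flag name: asks Twilio to record the call
)
```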
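Finally, a sketch of `mute_during_speech`, assuming it is a flag on the base transcriber config (shown here on `DeepgramTranscriberConfig`): while the agent is speaking, input audio is ignored so the speaker output doesn't feed back into the microphone.

```python
from vocode.streaming.models.audio_encoding import AudioEncoding
from vocode.streaming.models.transcriber import DeepgramTranscriberConfig

transcriber_config = DeepgramTranscriberConfig(
    sampling_rate=16000,  # assumed microphone settings
    audio_encoding=AudioEncoding.LINEAR16,
    chunk_size=2048,
    mute_during_speech=True,  # drop input audio while the bot is speaking
)
```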