
v0.1.109

@ajar98 released this 19 May 23:19
· 337 commits to main since this release

Optimizations:

  • Refactors StreamingConversation into a pipeline of producer-consumer workers: transcription, agent response, and synthesis are now decoupled into their own async processes. Shoutout to @jnak for helping us out with the refactor. Upshots:
    • The LLM call no longer blocks the processing of new transcripts
    • Playing the output audio runs concurrently with both response generation and synthesis, so while each sentence is being played, the next response is already being generated and synthesized. For synthesizers with latencies > 1s, there is no longer a delay between the sentences of a response.
    • Resource management: synthesizers no longer need a dedicated thread, so e.g. a single telephony server can now support twice as many concurrent phone calls
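The pipeline shape described above can be sketched with asyncio queues. This is an illustrative stand-in, not vocode's actual implementation: the worker wiring and the transcribe/respond/synthesize stand-ins are hypothetical, but they show how a slow stage (like the LLM call) stops blocking the stages upstream of it.

```python
import asyncio

async def worker(process, in_q: asyncio.Queue, out_q: asyncio.Queue):
    # Each stage consumes from its input queue and produces to the next,
    # so a slow stage never blocks the stages feeding it.
    while True:
        item = await in_q.get()
        if item is None:            # sentinel: propagate shutdown downstream
            await out_q.put(None)
            break
        await out_q.put(await process(item))

async def main(audio_chunks):
    q_audio_in, q_transcripts, q_responses, q_audio_out = (
        asyncio.Queue() for _ in range(4)
    )

    async def transcribe(chunk):      # stand-in for the transcriber
        return f"transcript({chunk})"

    async def respond(transcript):    # stand-in for the (slow) agent/LLM call
        await asyncio.sleep(0.01)
        return f"response({transcript})"

    async def synthesize(text):       # stand-in for the synthesizer
        return f"audio({text})"

    workers = [
        asyncio.create_task(worker(transcribe, q_audio_in, q_transcripts)),
        asyncio.create_task(worker(respond, q_transcripts, q_responses)),
        asyncio.create_task(worker(synthesize, q_responses, q_audio_out)),
    ]
    for chunk in audio_chunks:
        await q_audio_in.put(chunk)
    await q_audio_in.put(None)

    played = []
    while (item := await q_audio_out.get()) is not None:
        played.append(item)          # playback stage: drain the final queue
    await asyncio.gather(*workers)
    return played

print(asyncio.run(main(["hi", "bye"])))
```

Because each worker runs as its own task, the synthesizer can be working on sentence N+1 while sentence N is still being played, which is where the per-sentence latency win comes from.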

Contribution / Code cleanliness:

  • Simple tests that assert StreamingConversation works across all supported Python versions: run this locally with make test
  • Typechecking with mypy: run this locally with make typecheck

Features:

  • Adds support for the ElevenLabs optimize_streaming_latency parameter
  • Adds the Twilio to and from numbers to the CallConfig in the ConfigManager (h/t @Nikhil-Kulkarni)
  • AssemblyAI buffering (solves vocodedev/vocode-react-sdk#6) (h/t @m-ods)
  • Option to record Twilio calls (h/t @shahafabileah)
  • Adds a mute_during_speech parameter to Transcribers to prevent speaker output from feeding back into the microphone: see note in #16
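The idea behind mute_during_speech, as described in the notes, is to drop microphone input while the bot's own audio is playing so the transcriber never hears it. A minimal sketch of that gating logic follows; the class and method names here are hypothetical, not vocode's API.

```python
class MutableTranscriber:
    """Toy transcriber that discards input while muted (hypothetical names)."""

    def __init__(self):
        self.muted = False
        self.transcribed = []

    def mute(self):
        # Called when output audio starts playing
        self.muted = True

    def unmute(self):
        # Called when output audio finishes
        self.muted = False

    def send_audio(self, chunk):
        # Discard input while muted, so the bot's own speech leaking from
        # the speaker into the microphone is never transcribed
        if not self.muted:
            self.transcribed.append(chunk)

t = MutableTranscriber()
t.send_audio("user: hello")
t.mute()                                    # bot starts speaking
t.send_audio("bot speech leaking into mic") # dropped
t.unmute()                                  # bot finished
t.send_audio("user: goodbye")
print(t.transcribed)
```

The trade-off, noted in the linked issue, is that anything the user says while the bot is speaking is also dropped, so this disables barge-in style interruptions.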