@jsun-m jsun-m commented Jul 26, 2025

Free-tier API usage imposes several limitations
(these could probably be optimized):

  1. Deepgram real-time audio has a massive delay of 10-20 seconds
  2. ElevenLabs does not support a real-time API, so I handle the chunking myself

@jsun-m jsun-m changed the title from "Benchmarking multiple audio providers for STT" to "Implement Conversational Fastloop Integration and Clients" on Jul 29, 2025
@@ -0,0 +1,316 @@
import asyncio
Author

I'm planning to build this client into fastloop as a helper. I'll make a similar one for Node.js.

asyncio.create_task(self._handle_websocket_event(websocket, request_id))
# llm_manager = LLMManager(self._fastloop, request_id)
# stt_task = self.executor.submit(stt_manager.on_voice_stream, websocket)
# tts_manager = TextToSpeechManager(self._fastloop, request_id)
Author

@jsun-m jsun-m Jul 29, 2025

I'm pretty sure I will either have tts_manager pipe directly into llm_manager or have it pass its output through an event. If we pipe through an event, we can possibly allow users to intercept it.
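
To make the event option concrete, here is a minimal sketch of how an event in the middle could let users intercept the payload. Everything in it is hypothetical: `EventBus`, `on`, `emit`, and the `"transcript"` event name are illustrative stand-ins, not fastloop's actual API. Handlers run in registration order, and each one may rewrite the payload or return `None` to stop it from reaching the next stage.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical handler type: takes the payload, returns a (possibly rewritten)
# payload, or None to stop propagation.
Handler = Callable[[str], Awaitable[Optional[str]]]


class EventBus:
    """Illustrative event bus; not part of fastloop."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = {}

    def on(self, event: str, handler: Handler) -> None:
        # Register a handler; handlers run in registration order.
        self._handlers.setdefault(event, []).append(handler)

    async def emit(self, event: str, payload: str) -> Optional[str]:
        # Pass the payload through each handler in turn.
        current: Optional[str] = payload
        for handler in self._handlers.get(event, []):
            current = await handler(current)
            if current is None:
                return None  # an interceptor dropped the payload
        return current


async def demo() -> list[str]:
    bus = EventBus()
    received: list[str] = []

    async def user_interceptor(text: str) -> Optional[str]:
        # User-supplied hook: trim whitespace and drop empty payloads.
        cleaned = text.strip()
        return cleaned or None

    async def downstream(text: str) -> Optional[str]:
        # Stand-in for the manager that consumes the payload.
        received.append(text)
        return text

    bus.on("transcript", user_interceptor)
    bus.on("transcript", downstream)

    await bus.emit("transcript", "  hello there  ")
    await bus.emit("transcript", "   ")  # dropped by the interceptor
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Compared with piping one manager directly into the other, the event adds a little indirection, but user hooks can be registered without touching either manager.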

# waiting for the transcription to be generated before sending the next chunk.


class ElevenLabsSpeechToTextManager(BaseSpeechToTextManager):
Author

ElevenLabs doesn't have live audio transcription yet, but this is a good example of how we can implement it with non-real-time API endpoints.
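
For readers who want the general shape of that approach, here is a rough sketch, not the code in this PR. It buffers incoming audio frames into fixed-size chunks and waits for each transcription before sending the next chunk. `transcribe_in_chunks` is a hypothetical helper, `transcribe` stands in for whatever request is made to a non-real-time STT endpoint, and the default chunk size assumes roughly one second of 16 kHz, 16-bit mono PCM.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def transcribe_in_chunks(
    audio_stream: AsyncIterator[bytes],
    transcribe: Callable[[bytes], Awaitable[str]],  # hypothetical batch STT call
    chunk_bytes: int = 32_000,  # assumed: ~1 s of 16 kHz, 16-bit mono PCM
) -> AsyncIterator[str]:
    """Yield one transcript per fixed-size chunk of buffered audio."""
    buffer = bytearray()
    async for frame in audio_stream:
        buffer.extend(frame)
        # Flush every full chunk; each transcription is awaited before the
        # next chunk is sent.
        while len(buffer) >= chunk_bytes:
            chunk = bytes(buffer[:chunk_bytes])
            del buffer[:chunk_bytes]
            yield await transcribe(chunk)
    # Send whatever audio is left once the stream ends.
    if buffer:
        yield await transcribe(bytes(buffer))


async def demo() -> list[str]:
    async def fake_stream() -> AsyncIterator[bytes]:
        # Five 10-byte frames standing in for websocket audio messages.
        for _ in range(5):
            yield b"\x00" * 10

    async def fake_transcribe(chunk: bytes) -> str:
        # Stand-in for the HTTP request to the STT provider.
        return f"{len(chunk)} bytes"

    stream = transcribe_in_chunks(fake_stream(), fake_transcribe, chunk_bytes=20)
    return [text async for text in stream]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The chunk size is the main trade-off: larger chunks give the provider more context per request but add latency before each transcript comes back.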
