An open-source AI voice assistant that uses:
- OpenAI Whisper for speech-to-text
- Web browser-based agent for AI response generation (using Claude 3.7 Sonnet)
- OpenAI TTS for text-to-speech
This project enables a fully conversational AI experience similar to Siri, but using powerful AI models, a web browser agent, and high-quality audio APIs.
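- API Keys: an OpenAI API key, a Portkey API key, and a Portkey virtual key for Anthropic (all three are set in the .env file during configuration)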
- Install PortAudio (required for audio recording):
  - Debian/Ubuntu:
    apt install portaudio19-dev
  - macOS:
    brew install portaudio sox
- For macOS users only: install MPV for audio streaming:
  brew install mpv
- Chrome Browser:
  - Google Chrome must be installed, because the web agent launches and controls Chrome.
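The command-line prerequisites above can be checked before the first run. The following is a minimal sketch, not part of the project: `missing_tools` is a hypothetical helper, and the tool list in the example is an assumption. It only covers executables on `PATH`, so it does not detect the PortAudio library itself, and on macOS it will not find Chrome, which is installed as an application bundle.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    touching the real system.
    """
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    # Assumed tool list for macOS; adjust it for your platform.
    absent = missing_tools(["uv", "mpv", "sox"])
    if absent:
        print("Missing tools: " + ", ".join(absent))
    else:
        print("All checked tools were found on PATH.")
```

Passing a custom `which` keeps the helper easy to test and makes it straightforward to add platform-specific lookups later.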
Install the Python dependencies with uv:
uv sync
- Copy the template file to create your own environment file:
  cp .envtemplate .env
- Edit the .env file and replace the placeholder values with your actual API keys:
  OPENAI_API_KEY=your_actual_openai_key_here
  PORTKEY_API_KEY=your_actual_portkey_key_here
  PORTKEY_VIRTUAL_KEY_ANTHROPIC=your_actual_portkey_virtual_key_here
  Optional TTS configuration:
  SYRI_TTS_VOICE=coral # Options: alloy, echo, fable, onyx, nova, shimmer
  SYRI_TTS_SPEED=1.2 # Speech speed multiplier
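To illustrate how these settings might be consumed, here is a minimal sketch that validates the three required keys and reads the two optional TTS values. It is an assumption about structure, not the project's actual code: `SyriConfig`, `load_config`, and the fallback values for voice and speed are hypothetical, and the sketch expects the variables to already be present in the process environment (it does not parse the .env file itself).

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "PORTKEY_API_KEY",
    "PORTKEY_VIRTUAL_KEY_ANTHROPIC",
)


@dataclass(frozen=True)
class SyriConfig:
    openai_api_key: str
    portkey_api_key: str
    portkey_virtual_key_anthropic: str
    tts_voice: str
    tts_speed: float


def load_config(env: Mapping[str, str] = os.environ) -> SyriConfig:
    """Build a config from environment-style key/value pairs."""
    # Fail early with one message that names every missing key.
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return SyriConfig(
        openai_api_key=env["OPENAI_API_KEY"],
        portkey_api_key=env["PORTKEY_API_KEY"],
        portkey_virtual_key_anthropic=env["PORTKEY_VIRTUAL_KEY_ANTHROPIC"],
        # Fallback values below are assumptions, not the project's defaults.
        tts_voice=env.get("SYRI_TTS_VOICE", "alloy"),
        tts_speed=float(env.get("SYRI_TTS_SPEED", "1.0")),
    )
```

Reporting all missing keys at once avoids a fix-one-rerun-fix-another loop during first-time setup.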
You can run the assistant using either of these methods:
uv run run.py
This script performs pre-checks and starts the assistant in an inactive listening state.
- To start listening, press Enter or run ./scripts/start_listening.sh (useful for automation)
- Describe your request
- Press Enter again or run ./scripts/stop_listening.sh when done
- The AI will transcribe your speech, process it through the web agent, and respond both in text (console) and through speech
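The start/stop interaction above behaves like a two-state toggle: the assistant begins inactive, the first trigger starts listening, and the second stops recording and hands off for processing. The sketch below models only that idea. `State`, `ListeningSession`, and the `on_stop` callback are hypothetical names, and how the real scripts signal the running process is not shown here.

```python
from enum import Enum
from typing import Callable


class State(Enum):
    INACTIVE = "inactive"
    LISTENING = "listening"


class ListeningSession:
    """Two-state toggle: each trigger flips between inactive and listening."""

    def __init__(self, on_stop: Callable[[], None]) -> None:
        self.state = State.INACTIVE  # the assistant starts inactive
        self._on_stop = on_stop

    def toggle(self) -> State:
        if self.state is State.INACTIVE:
            self.state = State.LISTENING
        else:
            self.state = State.INACTIVE
            # Recording has ended, so hand off for transcription and reply.
            self._on_stop()
        return self.state


if __name__ == "__main__":
    session = ListeningSession(on_stop=lambda: print("processing request..."))
    print(session.toggle().value)  # first trigger: start listening
    print(session.toggle().value)  # second trigger: stop and process
```

Because both the Enter key and the helper scripts would map to the same `toggle` call, keyboard and automated control can share one code path.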
This is a PoC sketch. A newer version uses OpenAI voice agents and has moved to a different repository.
When you speak to Syri:
- Your voice is recorded using PyAudio
- The recording is transcribed to text using OpenAI Whisper
- The transcribed text is sent to a web agent that runs Chrome browser automation
- The web agent uses Claude 3.7 Sonnet through Portkey to generate responses
- The response is converted to speech using OpenAI TTS
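The flow above can be sketched as a chain of four stages. This is an illustrative outline under stated assumptions, not the project's implementation: `VoicePipeline` and its stage callables are hypothetical, and in a real setup they would wrap PyAudio recording, the Whisper transcription call, the Chrome-driving web agent, and the TTS call respectively.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains the four stages of one conversational turn."""

    record: Callable[[], bytes]        # capture microphone audio
    transcribe: Callable[[bytes], str]  # speech-to-text
    run_agent: Callable[[str], str]     # web agent produces a reply
    speak: Callable[[str], None]        # text-to-speech playback

    def handle_turn(self) -> str:
        audio = self.record()
        text = self.transcribe(audio)
        reply = self.run_agent(text)
        print(reply)       # text response on the console
        self.speak(reply)  # spoken response
        return reply


if __name__ == "__main__":
    # Stub stages stand in for the real audio and API integrations.
    pipeline = VoicePipeline(
        record=lambda: b"fake-audio",
        transcribe=lambda audio: "what is the weather",
        run_agent=lambda text: "You asked: " + text,
        speak=lambda reply: None,
    )
    pipeline.handle_turn()
```

Keeping each stage behind a plain callable makes it possible to swap a provider, or to test the turn logic with stubs, without touching the orchestration.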
License: MIT