This Python project implements a voice assistant using OpenAI's new Realtime API. It features a client-side Voice Activity Detection (VAD) system to optimize token usage and reduce costs.
- Utilizes OpenAI's Realtime API for real-time conversation
- Implements client-side Voice Activity Detection (VAD)
- Supports both text and audio modalities
- Provides real-time audio input and output
- Calculates and displays token usage and associated costs
- Allows interrupting the assistant by speaking again (barge-in)
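The token-usage feature above could be sketched as follows. Note that the per-token rates here are placeholders for illustration only, not OpenAI's actual Realtime API pricing — check the official pricing page for current rates.

```python
# Sketch of a token/cost tracker. The rates below are PLACEHOLDERS,
# not OpenAI's real pricing -- substitute current published rates.
PLACEHOLDER_RATES = {  # USD per 1M tokens (illustrative values only)
    "text_in": 5.0,
    "text_out": 20.0,
    "audio_in": 100.0,
    "audio_out": 200.0,
}

def estimate_cost(usage: dict) -> float:
    """Sum the cost of a usage dict like {'text_in': 1200, 'audio_out': 900}."""
    return sum(PLACEHOLDER_RATES[kind] * count / 1_000_000
               for kind, count in usage.items())
```

A tracker like this can be updated from the usage fields the API reports per response and printed after each turn.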
Demo video: sample.mp4
This implementation uses Voice Activity Detection (VAD) on the client side, which offers several advantages:
- Cost Efficiency: By only sending audio to OpenAI when speech is detected, you significantly reduce the number of tokens processed, lowering your API usage costs.
- Reduced Latency: Client-side VAD allows for quicker response times, since the client does not wait on server-side processing to determine when speech has ended.
- Bandwidth Optimization: Only relevant audio data is transmitted, reducing bandwidth usage.
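A common way to implement client-side VAD is an energy (RMS) gate over raw PCM frames. The sketch below is a minimal illustration of that idea, assuming 16-bit mono PCM input (as PyAudio's `paInt16` produces); the project's actual implementation and threshold values may differ.

```python
import struct

SILENCE_THRESHOLD = 500   # RMS below this counts as silence (tune per mic)
MIN_SILENCE_FRAMES = 25   # consecutive silent frames that end an utterance

def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian PCM frame."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

class VoiceGate:
    """Tracks whether the user is currently speaking."""

    def __init__(self):
        self.speaking = False
        self.silent_frames = 0

    def feed(self, frame: bytes) -> bool:
        """Return True while this frame should be sent to the API."""
        if rms(frame) >= SILENCE_THRESHOLD:
            self.speaking = True      # speech (re)started
            self.silent_frames = 0
        elif self.speaking:
            self.silent_frames += 1
            if self.silent_frames >= MIN_SILENCE_FRAMES:
                self.speaking = False  # enough silence: utterance ended
        return self.speaking
```

Only frames for which `feed()` returns True are forwarded to OpenAI, which is where the token and bandwidth savings come from.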
- Clone this repository
- Install the required dependencies: `pip install -r requirements.txt`
- Set up your OpenAI API key in the `.env` file
- Run the application: `python start.py`
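One common way to load the key from `.env` is via the python-dotenv package; the helper below is a sketch of that pattern (the project's own loading code may differ), falling back to the process environment if the package is absent.

```python
import os

def load_api_key() -> str:
    """Return the OpenAI API key, preferring a local .env file."""
    try:
        from dotenv import load_dotenv  # optional: python-dotenv package
        load_dotenv()  # reads lines like OPENAI_API_KEY=sk-... from .env
    except ImportError:
        pass  # python-dotenv not installed; use the process environment
    key = os.getenv("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set -- add it to .env")
    return key
```

Failing fast with a clear message when the key is missing avoids a more confusing authentication error later at connection time.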
You can customize various settings in the `start.py` file, including:
- Silence threshold for VAD
- Minimum silence duration

You can also modify the `prompt.txt` file to change the assistant's prompt.
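The tunables above might look roughly like this in `start.py`; the constant names and values here are hypothetical, so check your copy of the file for the actual identifiers.

```python
# Hypothetical names for the settings mentioned above -- the real
# identifiers and defaults live in start.py.
SILENCE_THRESHOLD = 500     # RMS level below which audio counts as silence
MIN_SILENCE_DURATION = 0.8  # seconds of silence that end an utterance

def load_prompt(path: str = "prompt.txt") -> str:
    """Read the assistant's system prompt from prompt.txt."""
    with open(path, encoding="utf-8") as f:
        return f.read().strip()
```

Raising the silence threshold makes the gate less sensitive (useful in noisy rooms), while a longer minimum silence duration tolerates pauses mid-sentence at the cost of slower turn-taking.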
After starting the application, speak into your microphone. The system will detect your voice, process your speech, and provide both text and audio responses from the AI assistant.
This project is designed for educational and experimental purposes. Make sure you keep your API key private and do not expose it to the public.