Callia is a modular real-time voice assistant that is attuned to your every word! It listens to the microphone, detects when you're speaking, transcribes your speech, and synthesizes a spoken response. The pipeline, in order: microphone input → voice activity detection → transcription → response generation → speech synthesis.
Please proceed with the following steps:

- Clone the repository

  ```shell
  git clone https://github.com/XynoCrest/CalliaAI
  cd CalliaAI
  ```

- Create and activate a virtual environment (example using Anaconda)

  ```shell
  conda create -p venv python=3.10
  conda activate venv/
  ```

- Install dependencies

  ```shell
  pip install -r requirements.txt
  ```

- Install MPV
  - Download and install MPV.
  - Add the folder containing `mpv.exe` to your PATH environment variable.
  - MPV is required to play the streamed output audio.
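To confirm that MPV is actually reachable after editing PATH, a quick stdlib-only check can help (this helper script is illustrative, not part of the repository):

```python
import shutil

def mpv_available() -> bool:
    """Return True if an mpv executable is discoverable on PATH."""
    return shutil.which("mpv") is not None

if __name__ == "__main__":
    if mpv_available():
        print("mpv found on PATH")
    else:
        print("mpv not found -- add the folder containing mpv.exe to your PATH")
```

If the script reports that mpv is missing, restart your terminal after changing PATH so the new value is picked up.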
Please insert your API keys in `key_retriever.py`:

```python
def get_elevenlabs_key():
    # List of ElevenLabs keys -- add your key as a list item
    keys = []
```

At least one Groq and one ElevenLabs API key are required!

To stop git from tracking changes to your `key_retriever.py`, execute:

```shell
git update-index --assume-unchanged key_retriever.py
```
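A filled-in version of the key functions might look like the sketch below. The placeholder strings must be replaced with your real keys, and `get_groq_key()` is assumed to mirror the ElevenLabs function; check the actual `key_retriever.py` for its exact name and return behavior.

```python
import random

def get_elevenlabs_key():
    # List of ElevenLabs keys -- add your key(s) as list items
    keys = ["YOUR_ELEVENLABS_KEY"]
    # Picking at random lets you rotate across several keys;
    # with a single key this just returns it.
    return random.choice(keys)

def get_groq_key():
    # List of Groq keys (hypothetical companion function)
    keys = ["YOUR_GROQ_KEY"]
    return random.choice(keys)
```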
Once you have installed everything, simply run:

```shell
python main.py
```
You can finally start talking to Callia! Once input is detected, the pipeline will:
- Transcribe your speech
- Generate a spoken response
- Play it back to you! Pretty neat, huh?
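That per-utterance loop can be sketched as below. The callables stand in for `transcriber.py`, `inference.py`, and `synthesis.py`; their names and signatures here are illustrative, not Callia's actual API.

```python
def handle_utterance(audio, transcribe, generate_response, speak):
    """One pass through the pipeline for a detected utterance."""
    text = transcribe(audio)          # speech-to-text
    if not text:
        return None                   # nothing recognizable was said
    reply = generate_response(text)   # text-to-text LLM response
    speak(reply)                      # text-to-speech playback (via mpv)
    return reply

# Usage with trivial stubs in place of the real modules:
reply = handle_utterance(
    b"\x00\x01",
    transcribe=lambda audio: "hello",
    generate_response=lambda text: f"You said: {text}",
    speak=lambda reply: None,
)
print(reply)  # -> You said: hello
```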
Here's what each file in this repository is for:
```
Callia/
├── main.py           # Entry point
├── config.py         # Configuration
├── vad.py            # Voice Activity Detection
├── vad_utils.py      # VAD model utilities
├── vad_model.jit     # TorchScript-compiled VAD model
├── transcriber.py    # Speech-to-text handler
├── inference.py      # Generates text-to-text response
├── synthesis.py      # TTS using ElevenLabs
├── key_retriever.py  # Your API key(s) retriever
├── requirements.txt  # Dependencies
├── .gitignore        # Tells git what to ignore
└── README.md         # You're reading this
```