XynoCrest/CalliaAI

Callia - Real Time Voice Assistant

Silero Whisper Gemma LangChain LangGraph

Callia is a modular real-time voice assistant that is attuned to your every word! It listens on the microphone, detects when you're speaking, transcribes your speech, and synthesizes a spoken response. The following is an illustration of the pipeline -

🛠️ Installation

Please proceed with the following steps:

  1.  Clone the repository

    git clone https://github.com/XynoCrest/CalliaAI
    cd CalliaAI
  2.  Create and activate a virtual environment (example using Anaconda)

    conda create -p venv python=3.10
    conda activate venv/
  3.  Install Dependencies

    pip install -r requirements.txt
  4.  Install MPV

    • Download and Install MPV
    • Add the folder containing mpv.exe to your PATH environment variable.
    • MPV is required to play the stream output audio.
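
A quick way to confirm that step 4 worked is to check whether the player can be found on your PATH. The snippet below is a minimal sketch and not part of the repository; the helper name `find_player` is hypothetical, and it assumes the executable is called `mpv`.

```python
import shutil
from typing import Optional


def find_player(executable: str = "mpv") -> Optional[str]:
    """Return the full path of the player if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_player()
    if path is None:
        print("mpv was not found on PATH; audio playback will likely fail.")
    else:
        print(f"mpv found at: {path}")
```

If this reports that mpv was not found, recheck the PATH entry and open a new terminal so the change takes effect.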

🔑 API Keys

Please insert your API keys in key_retriever.py

ElevenLabs Groq

def get_elevenlabs_key():
    # List of ElevenLabs Keys
    keys = []  # <- Add your key as a list item

  • At least one Groq and one ElevenLabs API key is required!

  • To stop Git from tracking local changes to key_retriever.py, so that your keys are not committed by accident, execute -

     git update-index --assume-unchanged key_retriever.py
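
The README only shows the start of `get_elevenlabs_key`, so how the repository picks a key from the list is not known here. The following is a minimal sketch of one way such a retriever could behave: `pick_key` is a hypothetical helper, and choosing a key at random is an assumption, not the project's confirmed logic.

```python
import random
from typing import List


def pick_key(keys: List[str], provider: str) -> str:
    """Return one non-empty key from `keys`, or raise if none is configured."""
    usable = [k.strip() for k in keys if k and k.strip()]
    if not usable:
        raise RuntimeError(f"No {provider} API key configured in key_retriever.py")
    # Assumption: any configured key is acceptable, so pick one at random.
    return random.choice(usable)


def get_elevenlabs_key() -> str:
    # List of ElevenLabs Keys
    keys = []  # <- Add your key as a list item
    return pick_key(keys, "ElevenLabs")
```

Failing fast with a clear error when the list is empty makes a missing key obvious at startup rather than at the first API call.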

▶️ Running Callia

Once everything is installed and your API keys are in place, simply run -

python main.py

You can finally start talking to Callia! Once input is detected, the pipeline will:

  1. Transcribe your speech
  2. Generate a spoken response
  3. Play it back to you! Pretty neat huh?
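
The three steps above can be sketched as a single function that chains the stages. This is only an illustration of the flow: `run_turn` and its parameters are hypothetical names, the stages are passed in as plain callables, and none of it reflects the actual interfaces of transcriber.py, inference.py, or synthesis.py.

```python
from typing import Callable


def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Run one conversational turn: speech -> text -> reply -> playback."""
    text = transcribe(audio)
    if not text.strip():
        return ""  # nothing intelligible was said, so skip the turn
    reply = generate(text)
    speak(reply)
    return reply


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without a microphone or API keys.
    spoken = []
    reply = run_turn(
        b"\x00\x01",
        transcribe=lambda audio: "hello",
        generate=lambda text: f"You said: {text}",
        speak=spoken.append,
    )
    print(reply)
```

Passing the stages in as callables keeps each one replaceable, which matches the modular design the README describes.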

📁 Project Structure

Here's what each file in this repository is for:

Callia/
├── main.py                      # Entry point
├── config.py                    # Configuration
├── vad.py                       # Voice Activity Detection
├── vad_utils.py                 # VAD model utilities
├── vad_model.jit                # TorchScript compiled AI model
├── transcriber.py               # Speech-to-text handler
├── inference.py                 # Generates Text-to-Text Response
├── synthesis.py                 # TTS using ElevenLabs
├── key_retriever.py             # Your API key(s) retriever
├── requirements.txt             # Dependencies
├── .gitignore                   # Tells git what to ignore
└── README.md                    # You're reading this

About

The realtime AI Voice Agent Pipeline for Callia Innovations
