GANGLIA:
- General
- AI
- Nurturing
- Guided
- Linguistic
- Interface and
- Automation (working title)
GANGLIA is a highly modularized, generic personal assistant. Built partially by AI (supervised by software developers), GANGLIA allows users to have multi-modal interactions with AI through natural language conversations. Its flexible architecture allows each component to be replaced or mocked, enabling developers to tailor GANGLIA to their specific needs.
Module | Possible Values | Default |
---|---|---|
Speech Recognition | Static Google Cloud Speech-to-Text, Live Google Cloud Speech-to-Text | Live Google Cloud Speech-to-Text |
Text To Speech | Google, Natural Reader (Unavailable), Amazon Polly (Unavailable) | Google |
AI Backend | GPT-3.5 (Unavailable), GPT-4 | GPT-4 |
Response Visualizer | CLI, NaturalReaderUI (Unavailable) | CLI |
Note: This list is non-exhaustive and can be expanded as needed.
- Python 3.9 or higher
- FFmpeg installed and available in PATH
- DejaVu fonts installed (required for video captions)
- On Ubuntu/Debian:
sudo apt-get install fonts-dejavu
- On macOS:
brew install font-dejavu
- On Windows: Download and install from DejaVu Fonts
- Install Homebrew (if not already installed):
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Install brew dependencies:
brew install python pyenv zinit direnv openssl readline sqlite3 xz zlib portaudio ffmpeg opencv gh wget
- Install Python 3.9 or higher.
- Install the necessary libraries using:
pip install -r requirements.txt
- Google Cloud CLI
- pip install google-cloud-speech
- https://cloud.google.com/docs/authentication/provide-credentials-adc#how-to
- this page also explains how to set up the Google Cloud CLI if you haven't already
To start GANGLIA, run the following command in your terminal:
python GANGLIA.py [-d DEVICE_INDEX] [-t TTS_INTERFACE] [--static-response]
Here's a description of each command-line argument:
- -d DEVICE_INDEX or --device_index DEVICE_INDEX: Sets the index of the input device to use. The default value is 0.
- -t TTS_INTERFACE or --tts_interface TTS_INTERFACE: Sets the text-to-speech interface to use. Available options: 'google'. The default value is 'google'.
- --help or -h: Displays usage instructions and a list of available options.
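To choose a value for DEVICE_INDEX, it helps to list the audio devices that can actually capture input. The sketch below assumes PyAudio (the Python bindings for the PortAudio library installed via Homebrew above) is available; it is not taken from GANGLIA's code, and GANGLIA's own device handling may differ. The helper names `input_devices` and `list_input_devices` are hypothetical.

```python
def input_devices(device_infos):
    """Return (index, name) pairs for devices that report at least one input channel."""
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def list_input_devices():
    """Query PortAudio through PyAudio (assumed installed) for all input devices."""
    # Imported lazily so input_devices() can be used without PyAudio installed.
    import pyaudio

    pa = pyaudio.PyAudio()
    try:
        infos = [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()  # release PortAudio resources
    return input_devices(infos)
```

Printing `list_input_devices()` shows each usable index next to its device name; pass the index you want with `-d`.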
Once GANGLIA is running, it will listen for voice prompts. When you're ready to ask a question or make a request, simply speak into your microphone. Once you've finished speaking, GANGLIA will generate a response using the configured AI backend (GPT-4 by default) and speak it aloud using the selected text-to-speech interface (Google by default).
GANGLIA can be used without API keys for certain features. However, if you want to utilize features that require API keys, you'll need to set up your API keys for the respective services.
To set up the API keys, copy the .envrc.template file in the root directory of the project and rename it to .envrc. Then, update the values for the features you want to use.
Here's a table of features, their implementation names, and the corresponding environment variable names for the .envrc file:
Feature | Implementation Name | Environment Variable |
---|---|---|
AI Backend | OpenAI GPT-4 | OPENAI_API_KEY |
Music Generation | Suno MusicGen | FOXAI_SUNO_API_KEY, SUNO_API_URL |
YouTube Upload | YouTube API | YOUTUBE_CLIENT_ID, YOUTUBE_CLIENT_SECRET |
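Because features without keys can still be used, it can be handy to check which features are configured before starting. The following is a minimal sketch built only from the table above; `missing_env_vars` is a hypothetical helper, not part of GANGLIA.

```python
import os

# Environment variables required by each feature, copied from the table above.
FEATURE_ENV_VARS = {
    "AI Backend": ["OPENAI_API_KEY"],
    "Music Generation": ["FOXAI_SUNO_API_KEY", "SUNO_API_URL"],
    "YouTube Upload": ["YOUTUBE_CLIENT_ID", "YOUTUBE_CLIENT_SECRET"],
}


def missing_env_vars(environ=None):
    """Map each feature to the variables it still needs; an empty list means it is configured."""
    environ = os.environ if environ is None else environ
    return {
        feature: [name for name in names if not environ.get(name)]
        for feature, names in FEATURE_ENV_VARS.items()
    }
```

Calling `missing_env_vars()` with no arguments inspects the current environment, so running it after `direnv allow` reflects the values in your .envrc.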
- There are a few options for how to render the AI's text response as audio. One option is to use the Coqui API.
- --tts-interface google [DEFAULT]: the free, simple female Google TTS voice
- --tts-interface coqui
- Coqui is a voice synthesis service that offers a wide range of voice options
- When using Coqui as TTS, set up coqui_config.json in the project root (see section below)
- When using ChatGPT, the persona of the AI can be tweaked by modifying the config file config/chatgpt_session_config.json.
- See config/chatgpt_session_config.template for examples and explanations.
- If you want to store the session events in GCP, use the --store-logs command-line flag when running GANGLIA.
- Install the Google Cloud SDK if you haven't already, and authenticate using gcloud auth application-default login.
- Make sure that you have a Google Cloud Storage bucket where the logs will be stored. Take note of the bucket name and your project name.
- Update the .envrc file in your project root directory to include the following:
export GCP_BUCKET_NAME=<your_bucket_name>
export GCP_PROJECT_NAME=<your_project_name>
- GANGLIA can be configured to listen for a hotword (a word or phrase that triggers GANGLIA to generate a pre-determined response).
- See the example hotword config in config/hotword_config.json.template.
- Copy your input/output pairs to config/hotword_config.json and try saying them during a chat session. You should observe GANGLIA responding to the hotword with the output you've provided right away.
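The hotword mechanism amounts to a lookup from a recognized phrase to a canned response. This sketch assumes config/hotword_config.json is a flat JSON object mapping each hotword to its output; check the template for the real layout. `load_hotwords` and `match_hotword` are hypothetical names, not GANGLIA's functions.

```python
import json
from pathlib import Path


def load_hotwords(path="config/hotword_config.json"):
    """Load an assumed {hotword: response} mapping, lower-casing keys for matching."""
    with Path(path).open(encoding="utf-8") as f:
        return {key.lower(): value for key, value in json.load(f).items()}


def match_hotword(transcript, hotwords):
    """Return the response for the first hotword contained in the transcript, else None."""
    text = transcript.lower()
    for hotword, response in hotwords.items():
        if hotword in text:  # plain substring match; short hotwords may over-trigger
            return response
    return None
```

Matching before calling the AI backend is what lets a hotword answer arrive immediately, since no model round trip is needed.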
- This program should work on Windows. If you don't want the console output to look weird, download Windows Console and use that to run the program.
If you would like to contribute to GANGLIA, please fork the repository and submit a pull request with your changes.
GANGLIA is licensed under the MIT License. See the LICENSE file for more information.
GANGLIA was created by William R Martin.
If you have any questions or feedback about GANGLIA, please contact Will Martin at unique dot will dot martin at gmail.
When using the text-to-video feature, you can customize various aspects of the video generation through a configuration file. A template configuration file is provided at config/ttv_config.template.json.
Here's a comprehensive list of all available configuration options:
- style (string): The visual style to apply to generated images. Example: "digital art", "photorealistic", "anime"
- story (array of strings): The story to convert into a video, with each string representing one scene
- title (string): The title of the video, used in credits and file naming
- caption_style (string, default: "static"): Controls how captions are displayed in the video
  - "static": Traditional subtitles that appear at the bottom of the screen
  - "dynamic": Word-by-word captions that are synchronized with the audio and use dynamic positioning and sizing
- background_music (object): Configuration for the background music that plays during the main video. Can be either file-based or prompt-based:
  - File-based (use an existing audio file): { "file": "path/to/music.mp3", "prompt": null }
  - Prompt-based (generate music using this prompt): { "file": null, "prompt": "ambient piano music with a gentle mood" }
- closing_credits (object): Configuration for the closing credits section
  - music (object): Music to play during credits. Same format as background_music (file-based or prompt-based)
  - poster (object): Image to show during credits. Can be file-based or prompt-based:
    - File-based (use an existing image): { "file": "path/to/poster.png", "prompt": null }
    - Prompt-based (generate an image using this prompt): { "file": null, "prompt": "A beautiful sunset scene" }
See config/ttv_config.template.json for a complete example configuration. Here's a minimal example:
{
"style": "digital art",
"story": [
"A mysterious figure emerges from the shadows",
"They walk through a glowing portal"
],
"title": "The Portal",
"caption_style": "dynamic"
}
- All paths in the configuration file should be relative to the project root
- When using file-based resources (music/images), ensure the files exist before running
- When using prompt-based generation, ensure you have the necessary API access configured
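The notes above can be checked before a long video run. Below is a minimal validation sketch, assuming each media object should set exactly one of "file" or "prompt" (as in the examples) and that it is run from the project root so relative paths resolve. `validate_ttv_config` and its helpers are hypothetical; GANGLIA may validate differently or not at all.

```python
import json
from pathlib import Path

REQUIRED_KEYS = {"style": str, "story": list, "title": str}
CAPTION_STYLES = {"static", "dynamic"}


def check_media_source(name, source):
    """Assumed rule: exactly one of 'file' or 'prompt' is set; files must exist."""
    has_file = bool(source.get("file"))
    has_prompt = bool(source.get("prompt"))
    if has_file == has_prompt:
        return [f"{name}: set exactly one of 'file' or 'prompt'"]
    if has_file and not Path(source["file"]).exists():
        return [f"{name}: file not found: {source['file']}"]
    return []


def validate_ttv_config(config):
    """Return a list of human-readable problems; an empty list means the config looks usable."""
    errors = []
    for key, expected in REQUIRED_KEYS.items():
        if not isinstance(config.get(key), expected):
            errors.append(f"{key}: expected {expected.__name__}")
    story = config.get("story")
    if isinstance(story, list) and not all(isinstance(scene, str) for scene in story):
        errors.append("story: every scene must be a string")
    if config.get("caption_style", "static") not in CAPTION_STYLES:
        errors.append("caption_style: must be 'static' or 'dynamic'")
    if "background_music" in config:
        errors += check_media_source("background_music", config["background_music"])
    credits = config.get("closing_credits", {})
    for part in ("music", "poster"):
        if part in credits:
            errors += check_media_source(f"closing_credits.{part}", credits[part])
    return errors


def load_ttv_config(path):
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)
```

For example, `validate_ttv_config(load_ttv_config("config/ttv_config.template.json"))` should return an empty list if the template's file paths exist.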
GANGLIA uses direnv to manage environment variables. This ensures that environment variables are automatically loaded when you enter the project directory and unloaded when you leave.
- Install direnv:
# On macOS
brew install direnv
# On Linux
# Follow instructions at https://direnv.net/docs/installation.html
- Add the direnv hook to your shell:
# For zsh (add to ~/.zshrc)
eval "$(direnv hook zsh)"
# For bash (add to ~/.bashrc)
eval "$(direnv hook bash)"
- Create your local environment file:
cp .envrc.template .envrc
- Edit .envrc with your actual values:
  - OpenAI API key for GPT interactions and DALL-E image generation
  - Google Cloud configuration for speech and storage
  - MusicGen/AudioGen credentials
  - Optional: Custom temporary directory path via GANGLIA_TEMP_DIR
  - Other service-specific settings
- Allow direnv to load the environment:
direnv allow
The .envrc file contains all required environment variables, including:
- OPENAI_API_KEY: Your OpenAI API key for GPT and DALL-E
- GOOGLE_APPLICATION_CREDENTIALS: Path to Google Cloud credentials
- GCP_BUCKET_NAME: Google Cloud Storage bucket name
- GCP_PROJECT_NAME: Google Cloud project name
- FOXAI_SUNO_API_KEY: API key for MusicGen/AudioGen
- GANGLIA_TEMP_DIR: Override the default temporary directory location
  - If not set, uses the system temp directory (/tmp on Unix, %TEMP% on Windows)
  - GANGLIA will create a subdirectory named 'GANGLIA' within this location
- PLAYBACK_MEDIA_IN_TESTS: Enable/disable media playback during tests
When you enter the project directory, these variables will be automatically loaded, and when you leave, they'll be unloaded.
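The GANGLIA_TEMP_DIR rule can be expressed in a few lines. This is a sketch of the described behavior, not GANGLIA's implementation; it assumes the 'GANGLIA' subdirectory is created whether or not the override is set, and `ganglia_temp_dir` is a hypothetical name. Python's `tempfile.gettempdir()` supplies the system default (it consults TMPDIR, TEMP and TMP before falling back to platform locations such as /tmp).

```python
import os
import tempfile
from pathlib import Path


def ganglia_temp_dir(environ=None):
    """Resolve the working temp directory: GANGLIA_TEMP_DIR if set, else the system temp dir."""
    environ = os.environ if environ is None else environ
    base = environ.get("GANGLIA_TEMP_DIR") or tempfile.gettempdir()
    path = Path(base) / "GANGLIA"  # assumed: subdirectory is used in both cases
    path.mkdir(parents=True, exist_ok=True)
    return path
```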
GANGLIA uses Google Cloud services for speech-to-text and text-to-speech. The credentials are handled differently depending on the environment:
For local development, set the GOOGLE_APPLICATION_CREDENTIALS
environment variable to point to your credentials file:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/credentials.json"
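A quick sanity check can catch a wrong path before the speech services fail. The sketch below uses only the standard library and is not part of GANGLIA; `describe_adc_file` is a hypothetical helper. It reports the "type" field of the JSON file, which is "service_account" for service account keys and "authorized_user" for credentials created by gcloud auth application-default login.

```python
import json
import os
from pathlib import Path


def describe_adc_file(environ=None):
    """Summarize what GOOGLE_APPLICATION_CREDENTIALS points at."""
    environ = os.environ if environ is None else environ
    raw = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    path = Path(raw).expanduser()
    if not path.is_file():
        return f"credentials file not found: {path}"
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return f"credentials file is not valid JSON: {path}"
    kind = data.get("type", "unknown") if isinstance(data, dict) else "unknown"
    return f"ok ({kind}): {path}"
```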
For running tests in Docker, you'll need to have Docker installed and running on your system. The Docker environment is automatically handled by run_tests.sh; see the test README for more details.
If you need to run Docker commands manually:
docker build --build-arg GOOGLE_CREDENTIALS_PATH=/path/to/your/credentials.json -t ganglia:latest .
In CI, credentials are handled automatically through GitHub Secrets. No additional setup is required.
GANGLIA uses the YouTube API for uploading test videos. For CI environments, you'll need to set up the appropriate credentials:
- First, obtain your YouTube OAuth credentials:
  - Go to the Google Cloud Console
  - Create a new project or select an existing one
  - Enable the YouTube Data API v3
  - Create OAuth 2.0 credentials
  - Download the credentials JSON file
- Run the local setup once to generate a token:
# Set the credentials file path
export YOUTUBE_CREDENTIALS_FILE=/path/to/your/credentials.json
# Run any test that uses YouTube to trigger the OAuth flow
python -m pytest tests/third_party/test_youtube_live.py -v -s
This will open a browser window for authentication and save the token.
- Base64 encode the token for GitHub Actions:
base64 ~/.config/ganglia/youtube_token.json > youtube_token_base64.txt
- Add the base64-encoded token as a GitHub Actions secret:
  - Go to your repository settings
  - Navigate to Secrets and Variables > Actions
  - Create a new secret named YOUTUBE_TOKEN_BASE64
  - Paste the contents of youtube_token_base64.txt
This setup allows the CI environment to upload test videos to YouTube without requiring interactive authentication.
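The encode step and its CI counterpart can also be done from Python, which avoids platform differences in the base64 CLI (GNU base64 wraps output at 76 characters unless given -w 0). This is a sketch: the token path matches the command above, but `encode_token` and `restore_token` are hypothetical helpers, and how GANGLIA's CI actually restores the token is not shown here.

```python
import base64
from pathlib import Path

TOKEN_PATH = Path.home() / ".config" / "ganglia" / "youtube_token.json"


def encode_token(path=TOKEN_PATH):
    """Base64-encode the token file as one unwrapped line, suitable for a GitHub secret."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def restore_token(encoded, path=TOKEN_PATH):
    """Decode a YOUTUBE_TOKEN_BASE64 value back into a token file, e.g. inside CI."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(base64.b64decode(encoded))
    return path
```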
- Create and activate a Python 3.9 virtual environment:
python3.9 -m venv .venv
source .venv/bin/activate # On Unix/macOS
# or
.venv\Scripts\activate # On Windows
- Install core dependencies (includes test dependencies):
pip install -r requirements_core.txt
- Install additional dependencies if needed:
pip install -r requirements_large.txt # For ML/AI features
Always run tests from within the virtual environment to ensure you're using the correct dependencies:
# Activate virtual environment first
source .venv/bin/activate # On Unix/macOS
# or
.venv\Scripts\activate # On Windows
# Then run tests
python -m pytest # Run all tests
python -m pytest tests/unit # Run unit tests
python -m pytest tests/integration # Run integration tests
python -m pytest tests/smoke # Run smoke tests
Note: Always use python -m pytest instead of calling pytest directly to ensure you're using the version installed in your virtual environment.
Common pytest options:
- -v: Verbose output
- -s: Show print statements (don't capture stdout)
- -k "test_name": Run tests matching the given name
- --pdb: Drop into debugger on test failures
GANGLIA can automatically upload test results to YouTube. To enable this feature:
- Set up a Google Cloud Project and enable the YouTube Data API v3
- Create OAuth 2.0 credentials in the Google Cloud Console:
  - Go to APIs & Services > Credentials
  - Create OAuth 2.0 Client ID
  - Download the client configuration
- Configure the YouTube environment variables in your .envrc:
export YOUTUBE_CLIENT_ID="your-client-id"
export YOUTUBE_CLIENT_SECRET="your-client-secret"
export YOUTUBE_CREDENTIALS_FILE="$HOME/.config/ganglia/youtube_credentials.json"
- Configure test upload settings:
export UPLOAD_SMOKE_TESTS_TO_YOUTUBE="false" # Set to "true" to upload smoke test results
export UPLOAD_INTEGRATION_TESTS_TO_YOUTUBE="true" # Set to "true" to upload integration test results
The first time you run tests with YouTube upload enabled, you'll be prompted to authenticate through your browser. The credentials will be saved to the path specified in YOUTUBE_CREDENTIALS_FILE for future use.
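Flags such as UPLOAD_SMOKE_TESTS_TO_YOUTUBE and PLAYBACK_MEDIA_IN_TESTS are strings in the environment, so test code has to interpret them as booleans. The following is a minimal sketch of one way to do that; `env_flag` and the accepted spellings in `TRUE_VALUES` are assumptions, not GANGLIA's actual parsing rules.

```python
import os

# Assumed spellings treated as "enabled"; anything else counts as disabled.
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=None):
    """Read an environment variable as a boolean, falling back to default when unset."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in TRUE_VALUES
```

A test could then guard its upload step with `if env_flag("UPLOAD_INTEGRATION_TESTS_TO_YOUTUBE"):`, so uploads stay off unless explicitly enabled.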