


Chat to your Database GenAI Chatbot

A web chatbot interface for querying databases with natural language questions, supporting multiple interaction methods (RAG, TAG) and LLM providers, with comprehensive observability and tracking.

Features

  • Multiple interaction methods (RAG, TAG)
  • LLM provider selection (OpenAI, Claude)
  • Intent classification (Details in Classifier README)
  • Vector search with PGVector
  • Langfuse Analytics
  • Conversation memory (until browser refresh)
  • Docker-based deployment

Prerequisites

  • Docker and Docker Compose
  • Python 3.9+
  • OpenAI API key
  • Anthropic API key
  • Langfuse account (optional)

Installation

  1. Clone the repository:
git clone https://github.com/garyzava/chat-to-database-chatbot.git
cd chat-to-database-chatbot
  2. Configure environment variables: copy .env.example to .env and fill in your API keys and configuration values.
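A minimal sketch of what the .env file might contain; the exact variable names live in .env.example, so the keys below are illustrative assumptions rather than the project's definitive settings:

# Illustrative only -- check .env.example for the exact variable names
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# Optional, only needed if you use the Langfuse integration
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...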

  3. Build and start the Docker services (one-off):

make run

After the initial installation, simply run:

make up
  4. Or run the application in developer mode:
make dev

Developer mode installs the Streamlit app locally, but the databases still run in Docker.

  5. Shut down the application:
make down

Call the Modules Directly

When running in local mode (make dev), open a new terminal, activate the virtual environment, and go to the chat2dbchatbot directory:

cd chat2dbchatbot

Run the RAG utility

python -m tools.rag "what is the track with the most revenue" --llm OpenAI --temperature 0.1

Or run the TAG utility

python -m tools.tag "what is the track with the most revenue" --llm OpenAI --temperature 0.1
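Both utilities accept the same flags. Assuming the --llm flag takes the provider names offered in the UI (OpenAI or Claude), a Claude run would look like:

python -m tools.rag "what is the track with the most revenue" --llm Claude --temperature 0.1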

Chatbot Usage

  1. Go to http://localhost:8501 for the main chatbot interface
  2. Select your preferred interaction method (RAG, TAG)
  3. Choose an LLM provider (OpenAI or Claude)
  4. Start asking questions about your database
  5. Go to http://localhost:3000 for the Langfuse interface when not running in dev mode
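If either interface does not load, a quick sanity check (assuming the project's standard Docker Compose setup) is to confirm the containers are running:

docker compose ps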

Architecture

Paper References

Data Source Statement

This project uses the Chinook database, a media store database, for development and testing purposes. However, it can be easily adapted for any enterprise or domain-specific use case.
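For illustration, a question like "what is the track with the most revenue" would typically map to a query along these lines against the standard Chinook schema (a hand-written sketch, not the chatbot's actual generated SQL):

SELECT t.Name, SUM(il.UnitPrice * il.Quantity) AS revenue
FROM Track t
JOIN InvoiceLine il ON il.TrackId = t.TrackId
GROUP BY t.TrackId, t.Name
ORDER BY revenue DESC
LIMIT 1;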

The intent classifier component uses data from the following datasets; access to the data is subject to their respective terms:


Evaluation Framework