TinyLLM Web Based Chatbot and Document Manager


The TinyLLM Chatbot is a web-based Python Flask app that allows you to chat with an LLM using the OpenAI API.

The intent of this project is to build and interact with a locally hosted LLM using consumer-grade hardware. With the Chatbot, we explore stitching context through conversational threads, rendering responses via real-time token streaming from the LLM, and using external data to provide context for the LLM response (Retrieval Augmented Generation). With the Document Manager, we explore uploading documents to a vector database for use in retrieval augmented generation, allowing our Chatbot to produce answers grounded in knowledge that we provide.

Below are steps to get the Chatbot and Document Manager running.

Chatbot

The Chatbot can be launched as a Docker container or via command line.

Method 1: Docker Compose

A quickstart method is located in the litellm folder. This setup will launch the Chatbot together with LiteLLM and PostgreSQL. It works on Mac and Linux (or WSL) systems.

cd litellm

# Edit compose.yaml and config.yaml for your setup.
nano compose.yaml
nano config.yaml

# Launch
docker compose up -d

The containers will download and launch. The database will be set up in the ./db folder.
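
For reference, the stack has roughly the shape sketched below. This is an illustration of how the three services fit together, not the actual compose.yaml in this folder; the service names, key, and PostgreSQL settings are assumptions, so treat the real file as authoritative.

# Sketch only - see the actual compose.yaml in the litellm folder
services:
  chatbot:
    image: jasonacox/chatbot
    ports:
      - "5000:5000"
    environment:
      - LITELLM_PROXY=http://litellm:4000/v1
      - LITELLM_KEY=sk-mykey            # example key; match your LiteLLM setup
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    command: --config /app/config.yaml
    ports:
      - "4000:4000"
    volumes:
      - ./config.yaml:/app/config.yaml
  db:
    image: postgres                      # backing store used by LiteLLM
    environment:
      - POSTGRES_PASSWORD=postgres       # placeholder; set your own
    volumes:
      - ./db:/var/lib/postgresql/data    # matches the ./db folder noted above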

Method 2: Docker

# Create placeholder prompts.json
touch prompts.json

# Run Chatbot - see run.sh for additional settings
docker run \
    -d \
    -p 5000:5000 \
    -e PORT=5000 \
    -e OPENAI_API_BASE="http://localhost:8000/v1" \
    -e TZ="America/Los_Angeles" \
    -v $PWD/.tinyllm:/app/.tinyllm \
    --name chatbot \
    --restart unless-stopped \
    jasonacox/chatbot
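
Note that localhost inside the container refers to the container itself, so the OPENAI_API_BASE above assumes host networking or an LLM server reachable at that address; on Docker Desktop you may need http://host.docker.internal:8000/v1 instead. Before launching, you can sanity-check the endpoint from the host, assuming your server implements the standard OpenAI /v1/models route (most OpenAI-compatible servers, including vLLM and LiteLLM, do):

# Verify the OpenAI-compatible endpoint is answering
curl http://localhost:8000/v1/models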

LiteLLM Proxy Option

You can optionally set up LiteLLM to proxy multiple LLM backends (e.g. local vLLM, AWS Bedrock, OpenAI, Azure, Anthropic). See the LiteLLM documentation for more information.

First, define your LLM connections in the local config.yaml file (see LiteLLM options). Note: if you are using a cloud provider service like AWS Bedrock or Azure, make sure you have set up access first.

model_list:

  - model_name: local-pixtral
    litellm_params:
      model: openai/mistralai/Pixtral-12B-2409
      api_base: http://localhost:8000/v1
      api_key: myAPIkey

  - model_name: bedrock-titan
    litellm_params:
      model: bedrock/amazon.titan-text-premier-v1:0
      aws_access_key_id: os.environ/CUSTOM_AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/CUSTOM_AWS_SECRET_ACCESS_KEY
      aws_region_name: os.environ/CUSTOM_AWS_REGION_NAME

  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

Now, run the LiteLLM container. The os.environ/ references in config.yaml are read from the container's environment at runtime, so edit this command to include your AWS and/or OpenAI keys for the models you want.

# Run LiteLLM Proxy
docker run \
    -d \
    -v $(pwd)/config.yaml:/app/config.yaml \
    -e CUSTOM_AWS_ACCESS_KEY_ID=your_AWS_key_here \
    -e CUSTOM_AWS_SECRET_ACCESS_KEY=your_AWS_secret_here \
    -e CUSTOM_AWS_REGION_NAME=us-east-1 \
    -e OPENAI_API_KEY=your_OpenAI_key_option \
    -p 4000:4000 \
    --name litellm \
    --restart unless-stopped \
    ghcr.io/berriai/litellm:main-latest \
    --config /app/config.yaml 
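
Once the proxy is up, you can verify it is routing correctly before wiring in the chatbot. The model names returned are the model_name entries from config.yaml. These examples assume no master key is configured; if you set one, add an Authorization: Bearer header with your key.

# List the models the proxy is serving
curl http://localhost:4000/v1/models

# Send a test request through the proxy
curl http://localhost:4000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "local-pixtral", "messages": [{"role": "user", "content": "Hello"}]}'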

Finally, set up the chatbot to use LiteLLM:

# Run Chatbot - see run.sh for additional settings
docker run \
    -d \
    -p 5000:5000 \
    -e PORT=5000 \
    -e LITELLM_PROXY="http://localhost:4000/v1" \
    -e LITELLM_KEY="sk-mykey" \
    -e LLM_MODEL="local-pixtral" \
    -e TZ="America/Los_Angeles" \
    -v $PWD/.tinyllm:/app/.tinyllm \
    --name chatbot \
    --restart unless-stopped \
    jasonacox/chatbot

The Chatbot will try to use the specified model (LLM_MODEL), but if it is not available, it will select another available model. You can list and change models inside the chatbot using the /model command.

View the chatbot at http://localhost:5000

Method 3: Command Line

# Install required packages
pip install -r requirements.txt

# Run the chatbot web server - change the base URL to where you host your llmserver
OPENAI_API_BASE="http://localhost:8000/v1" python3 server.py
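
The chatbot renders responses by streaming tokens as they arrive from the backend. You can observe the same stream directly with curl, since OpenAI-compatible servers accept stream: true; the model name below is a placeholder for whatever your server reports:

# Watch the raw token stream (-N disables curl's output buffering)
curl -N http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "my-model", "messages": [{"role": "user", "content": "Say hello"}], "stream": true}'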

Chat Commands and Retrieval Augmented Generation (RAG)

Some RAG (Retrieval Augmented Generation) features include:

  • Summarizing external websites and PDFs (paste a URL in the chat window)
  • If a Weaviate host is specified, the chatbot can use the vector database information to respond. See rag for details on how to set up Weaviate.
  • Perform chain of thought (CoT) reasoning with the /think on command (see reasoning for more details).
  • Commands - information commands are available using the / prefix:
/reset                                  # Reset session
/version                                # Display chatbot version
/sessions                               # Display current sessions
/news                                   # List top 10 headlines from current news
/stock [company]                        # Display stock symbol and current price
/weather [location]                     # Provide current weather conditions
/rag on [library] [opt:number]          # Route all prompts through RAG using specified library
/rag off                                #   Disable
/think on                               # Perform Chain of Thought thinking on relevant prompts
/think off                              #   Disable
/think filter [on|off]                  # Have chatbot filter out <think></think> content
/model [LLM_name]                       # Display or select LLM model to use (dialogue popup)

See rag for more details about RAG.

Example Session

The examples below use a Llama 2 7B model served up with the OpenAI API compatible llmserver on an Intel i5 system with an Nvidia GeForce GTX 1060 GPU.

Chatbot

Open http://127.0.0.1:5000 in your browser to start a session.


Read URL

If a URL is pasted in the text box, the chatbot will read and summarize it.


Current News

The /news command will fetch the latest news and have the LLM summarize the top ten headlines. It will store the raw feed in the context prompt to allow follow-up questions.


Model Selection

The /model command will pop up a list of available models. Use the dropdown to select your model. Alternatively, specify the model with the command (e.g. /model mixtral) to select it immediately without the popup.


Document Manager (Weaviate)

The Document Manager allows you to manage the collections and documents in the Weaviate vector database. It provides an easy way to upload and ingest content from files or URLs, performs simple chunking (if requested), and its simple UI lets you navigate through the collections and documents.

Environment Variables

  • MAX_CHUNK_SIZE: Maximum size of a chunk in bytes (default 1024)
  • UPLOAD_FOLDER: Folder where uploaded files are stored (default uploads)
  • HOST: Weaviate host (default localhost)
  • COLLECTIONS: Comma separated list of collections allowed (default all)
  • PORT: Port for the web server (default 8000)
  • COLLECTIONS_ADMIN: Allow users to create and delete collections (default True)

Docker Setup

The Document Manager uses a vector database to store the uploaded content. Set up the Weaviate vector database using Docker Compose and the included docker-compose.yml file.

# Setup and run Weaviate vector database on port 8080

docker compose up -d
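
You can confirm Weaviate is ready using its standard readiness endpoint:

# Returns HTTP 200 once Weaviate is ready to serve requests
curl -i http://localhost:8080/v1/.well-known/ready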

To run the Document Manager, run the following command and adjust as needed. Once running, the Document Manager will be available at http://localhost:5001.

docker run \
    -d \
    -p 5001:5001 \
    -e PORT="5001" \
    -e WEAVIATE_HOST="localhost" \
    -e WEAVIATE_GRPC_HOST="localhost" \
    -e WEAVIATE_PORT="8080" \
    -e WEAVIATE_GRPC_PORT="50051" \
    -e MAX_CHUNK_SIZE="1024" \
    -e UPLOAD_FOLDER="uploads" \
    -e COLLECTIONS_ADMIN="true" \
    --name docman \
    --restart unless-stopped \
    jasonacox/docman
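
As with the chatbot container, WEAVIATE_HOST="localhost" refers to the container itself unless you use host networking. If Weaviate runs in another container, one option is a shared Docker network so docman can reach it by name (the container name weaviate below is an assumption; use whatever docker compose created):

# Put both containers on a shared network
docker network create tinyllm
docker network connect tinyllm weaviate
docker network connect tinyllm docman
# ...then set WEAVIATE_HOST="weaviate" in the docman run command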

Note - You can restrict collections by setting the COLLECTIONS environment variable to a comma-separated string of collection names.

Usage

You can now create collections (libraries of content) and upload files and URLs to be stored in the vector database for the Chatbot to reference.


The Chatbot can use this information when you prefix your prompt with the /rag command:

# Usage: /rag {library} {opt:number} {prompt}

# Examples:
/rag records How much did we donate to charity in 2022?
/rag blog 5 List some facts about solar energy.