This repo spins up a minimal Docker stack that uses Redis + RedisVL as a vector database for computing semantic similarity between "documents" (text strings) embedded via the Ollama inference engine.

The stack defined in `docker-compose.yaml` creates one instance each of Redis (for storing vector embeddings and jobs), Ollama (for creating embeddings), and FastAPI (as a simple "business logic" gateway), as well as an RQ worker service for embedding documents asynchronously.
NOTE: Using a job queue offloads long-running, computationally intensive work to other containers, keeping load on the API server to a minimum. Embedding jobs are processed in FIFO order, so an asynchronous job queue architecture leaves you in less danger of DDoSing your own RAG system.
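For illustration, the enqueue side of that pattern looks roughly like the sketch below. The endpoint body, queue name, and task function name are hypothetical stand-ins, not necessarily what the app in `/src` uses.

```python
# Hypothetical sketch of the enqueue pattern used by the gateway (FastAPI + RQ).
# The actual endpoint, queue name, and task function in /src may differ.
from fastapi import FastAPI
from redis import Redis
from rq import Queue

app = FastAPI()
redis_conn = Redis(host="redis-server", port=6379)
queue = Queue("embeddings", connection=redis_conn)  # jobs are popped in FIFO order

@app.post("/embed/")
def embed(payload: dict):
    # Enqueue the slow embedding work and return immediately with a job ID;
    # an RQ worker container picks the job up and calls Ollama.
    job = queue.enqueue("tasks.embed_documents", payload["payload"])
    return {"job_id": job.get_id()}
```

The worker container simply runs `rq worker` against the same Redis connection and executes the task function.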
First you'll need to build the FastAPI server application in the `/src` directory:
```bash
git clone https://github.com/TtheBC01/redis-vector-db.git
cd redis-vector-db
docker build -t vector-gateway ./src
```
Once you've successfully built the server application and tagged it as `vector-gateway`, bring the stack up:

```bash
docker compose up -d
```
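You can confirm that the containers started with:

```bash
docker compose ps
```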
You should have 4 services up: `fastapi`, `rq-worker`, `redis-server`, and `ollama-service`. If you visit http://localhost:8000, you should get:

```json
{"message":"Redis vector demo is up!"}
```
You'll need to download an embedding model in order to build a queryable vector store. Run the following command to pull Nomic's open-source embedding model:

```bash
curl -X GET "http://localhost:8000/load-model/?model=nomic-embed-text"
```
This model embeds text strings into a 768-dimensional vector space. There are other embedding models offered by Ollama too. Check the models you have cached by running:

```bash
curl -X GET http://localhost:8000/available-models/
```
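Other embedding models from the Ollama library can be pulled the same way, for example `mxbai-embed-large`:

```bash
curl -X GET "http://localhost:8000/load-model/?model=mxbai-embed-large"
```

Keep in mind that different models produce vectors of different dimensionality, so the vector index schema has to match whichever model you embed with.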
Try embedding some text and storing it in your Redis instance like this:

```bash
curl -X POST http://localhost:8000/embed/ -H "Content-Type: application/json" -d '{"payload": ["Paris is the capital of France.", "The dog ran after the cat.", "Mark Twain was not his real name."]}'
```
You can embed many "documents" at once, but if your text blob is longer than the context size of your embedding model, any text over the limit will be ignored by the model. If this is your situation, you'll need to "chunk" your documents appropriately. For reference, the `nomic-embed-text` model has a context size of 8192 tokens.
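How you chunk depends on your data, but as a minimal sketch (using a crude word-count heuristic in place of a real tokenizer, so the numbers here are illustrative), you could split long text into overlapping chunks before posting them to `/embed/`:

```python
# Naive word-based chunker: a rough sketch, not a substitute for a real
# tokenizer (word counts and token counts are not the same thing).
def chunk_text(text: str, max_words: int = 500, overlap: int = 50) -> list[str]:
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
    return chunks

# Each chunk then goes into the "payload" list as its own document.
```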
```mermaid
sequenceDiagram
    User->>+FastAPI: Document(s)
    FastAPI->>+RQ: Embedding Job
    FastAPI->>-User: Job ID
    RQ->>+Ollama: Embed w/ Parameters
    Ollama->>-RQ: Return Vectors
    RQ->>+RedisVL: Store Docs+Vectors(+tags)
```
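The worker step of this flow, embedding with Ollama and storing with RedisVL, looks roughly like the sketch below. The index schema, field names, and service hostnames are illustrative assumptions rather than the exact ones used in `/src`.

```python
# Illustrative sketch of an RQ worker task: embed with Ollama, store with RedisVL.
# Schema, field names, and hostnames are assumptions; see /src for the real code.
import numpy as np
import requests
from redisvl.index import SearchIndex

schema = {
    "index": {"name": "docs", "prefix": "doc"},
    "fields": [
        {"name": "text", "type": "text"},
        {
            "name": "embedding",
            "type": "vector",
            "attrs": {"dims": 768, "algorithm": "flat",
                      "distance_metric": "cosine", "datatype": "float32"},
        },
    ],
}

def embed_documents(documents: list[str]):
    index = SearchIndex.from_dict(schema, redis_url="redis://redis-server:6379")
    index.create(overwrite=False)  # skips creation if the index already exists
    records = []
    for doc in documents:
        # Ollama's embeddings endpoint returns {"embedding": [...768 floats...]}
        resp = requests.post(
            "http://ollama-service:11434/api/embeddings",
            json={"model": "nomic-embed-text", "prompt": doc},
        )
        vector = np.array(resp.json()["embedding"], dtype=np.float32)
        records.append({"text": doc, "embedding": vector.tobytes()})
    index.load(records)
```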
Now that your Redis instance has some vectors loaded, try querying it:

```bash
curl -X GET http://localhost:8000/search/ -H "Content-Type: application/json" -d '{"payload": "Where is Paris?"}'
```

You'll get the top 3 documents that match your query string in order of relevance, as well as their vector distance (computed using cosine similarity).
```mermaid
sequenceDiagram
    User->>+FastAPI: User Query String
    FastAPI->>+Ollama: Embed query
    Ollama->>-FastAPI: Query vector
    FastAPI->>+RedisVL: Submit query vector with filters
    RedisVL->>-FastAPI: k nearest neighbor docs (cosine similarity)
    FastAPI->>-User: Top results related to query
```
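Internally, the search step amounts to a k-nearest-neighbor query against the vector index. A rough sketch with RedisVL (the field names and `num_results` value are assumptions) might look like this:

```python
# Rough sketch of the KNN search step with RedisVL; field names are assumptions.
import numpy as np
from redisvl.index import SearchIndex
from redisvl.query import VectorQuery

def search(query_vector: np.ndarray, index: SearchIndex):
    query = VectorQuery(
        vector=query_vector.astype(np.float32).tolist(),
        vector_field_name="embedding",
        return_fields=["text", "vector_distance"],
        num_results=3,  # top 3 matches, as returned by the /search/ endpoint
    )
    # Results come back ordered by cosine distance (smaller = more similar).
    return index.query(query)
```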
You can greatly increase embedding speed by exposing a local GPU to the Ollama service defined in the `docker-compose.yaml` file.
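For example, assuming an NVIDIA GPU with the NVIDIA Container Toolkit installed on the host, the GPU reservation for the Ollama service would look roughly like this:

```yaml
# Hypothetical addition to the ollama-service definition in docker-compose.yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```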