This example deploys a basic RAG pipeline for chat Q&A and serves inference from an NVIDIA API Catalog endpoint. You do not need a GPU on your machine to run this example.
| Model | Embedding | Framework | Vector Database | File Types |
|---|---|---|---|---|
| meta/llama3-8b-instruct | nvidia/nv-embedqa-e5-v5 | LlamaIndex | Milvus | HTML, TXT, PDF, MD, DOCX, PPTX, XLSX |
1. Complete the common prerequisites.

2. Export your NVIDIA API key as an environment variable:

   ```console
   export NVIDIA_API_KEY="nvapi-<...>"
   ```
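   Before moving on, you can optionally confirm the variable is set in your current shell. This is a minimal sanity check, assuming only the `nvapi-` key prefix shown above:

   ```console
   # Optional check: verify the key is exported and has the expected prefix.
   if [ -z "$NVIDIA_API_KEY" ]; then
     echo "NVIDIA_API_KEY is not set" >&2
   elif [ "${NVIDIA_API_KEY#nvapi-}" = "$NVIDIA_API_KEY" ]; then
     echo "Warning: NVIDIA_API_KEY does not start with nvapi-" >&2
   else
     echo "NVIDIA_API_KEY is set"
   fi
   ```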
3. Start the containers:

   ```console
   cd RAG/examples/basic_rag/llamaindex/
   docker compose up -d --build
   ```

   Example Output

   ```output
   ✔ Network nvidia-rag            Created
   ✔ Container rag-playground      Started
   ✔ Container milvus-minio        Started
   ✔ Container chain-server        Started
   ✔ Container milvus-etcd         Started
   ✔ Container milvus-standalone   Started
   ```
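   If any container fails to start, you can follow its logs by container name (the names match the output above):

   ```console
   # Stream the startup logs for one container; press Ctrl+C to stop following.
   docker logs -f chain-server
   ```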
4. Confirm the containers are running:

   ```console
   docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
   ```

   Example Output

   ```output
   CONTAINER ID   NAMES               STATUS
   39a8524829da   rag-playground      Up 2 minutes
   bfbd0193dbd2   chain-server        Up 2 minutes
   ec02ff3cc58b   milvus-standalone   Up 3 minutes
   6969cf5b4342   milvus-minio        Up 3 minutes (healthy)
   57a068d62fbb   milvus-etcd         Up 3 minutes (healthy)
   ```
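   As a further optional check, you can probe the chain server over HTTP. The port (8081) and `/health` path below are assumptions, so confirm them against the `docker-compose.yaml` for this example:

   ```console
   # Hypothetical health probe: an HTTP 200 suggests the chain server is up.
   # Adjust the port and path to match the compose file if they differ.
   curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8081/health
   ```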
5. Open a web browser and access http://localhost:8090 to use the RAG Playground.

   Refer to Using the Sample Web Application for information about uploading documents and using the web interface.
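If you prefer to script queries instead of using the web interface, the chain server also exposes an HTTP API. The port, path, and request body below are assumptions that mirror common patterns in this repository; verify them against the server's OpenAPI page (typically `/docs`) before relying on them:

```console
# Hypothetical request against the chain server's generate endpoint.
curl -s http://localhost:8081/generate \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "What is retrieval-augmented generation?"}],
        "use_knowledge_base": false
      }'
```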
- Vector Database Customizations
- Stop the containers by running `docker compose down`.
- Use the RAG Application: Text QA Chatbot Helm chart to deploy this example in Kubernetes; a generic sketch of the install command follows this list.
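The exact chart reference and values come from the Helm chart's own documentation; the command below only shows the generic shape of such an install, with placeholder names:

```console
# Placeholder names: substitute the real release name, chart reference, and
# values file from the RAG Application: Text QA Chatbot Helm chart docs.
helm install text-qa-chatbot <chart-reference> -f values.yaml
```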