This example showcases multi-turn conversational AI in a RAG pipeline. The chain server stores the conversation history and knowledge base in a vector database and retrieves them at runtime to understand contextual queries.
The example supports ingestion of PDF and text files. The documents are ingested in a dedicated document vector store, multi_turn_rag. The prompt for the example is tuned to act as a document chat bot. To maintain the conversation history, the chain server stores the previously asked query and the model's generated answer as a text entry in a different and dedicated vector store for conversation history, conv_store. Both of these vector stores are part of a LangChain LCEL chain as LangChain Retrievers. When the chain is invoked with a query, the query passes through both the retrievers. The retriever retrieves context from the document vector store and the closest-matching conversation history from conversation history vector store. The document chunks retrieved from the document vector store are then passed through a reranker model to determine the most relevant top_k context. The context is then passed onto the LLM prompt for response generation. Afterward, the chunks are added into the LLM prompt as part of the chain.
Model | Embedding | Ranking (Optional) | Framework | Vector Database | File Types |
---|---|---|---|---|---|
meta/llama3-8b-instruct | nvidia/nv-embedqa-e5-v5 | nvidia/nv-rerankqa-mistral-4b-v3 | LangChain | Milvus | TXT, PDF, MD |
Complete the common prerequisites.
-
Export your NVIDIA API key as an environment variable:
export NVIDIA_API_KEY="nvapi-<...>"
-
Start the containers:
cd RAG/examples/advanced_rag/multi_turn_rag/ docker compose up -d --build
Example Output
✔ Network nvidia-rag Created ✔ Container milvus-etcd Running ✔ Container milvus-minio Running ✔ Container milvus-standalone Running ✔ Container chain-server Started ✔ Container rag-playground Started
-
Confirm the containers are running:
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
Example Output
CONTAINER ID NAMES STATUS dd4fc3da6c9c rag-playground Up About a minute ac1f039a1db8 chain-server Up About a minute cd0a57ee20e0 milvus-standalone Up 2 hours a36370e7ed75 milvus-minio Up 2 hours (healthy) a796a4e59b68 milvus-etcd Up 2 hours (healthy)
-
Open a web browser and access http://localhost:8090 to use the RAG Playground.
Refer to Using the Sample Web Application for information about uploading documents and using the web interface.
- Vector Database Customizations
- Stop the containers by running
docker compose down
. - Use the RAG Application: Multi Turn Agent Helm Chart to deploy this example in Kubernetes.