This repository serves as a starting point for generative AI developers looking to integrate with the NVIDIA software ecosystem to accelerate their generative AI systems. Whether you are building RAG pipelines, agentic workflows, or fine-tuning models, this repository will help you integrate NVIDIA seamlessly and natively with your development stack.
The example implements a GPU-accelerated pipeline for creating and querying knowledge graphs using RAG, leveraging NIM microservices and the RAPIDS ecosystem to process large-scale datasets efficiently.
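To make the knowledge-graph step concrete, here is a minimal sketch of triple extraction with a NIM-hosted LLM. The model name, prompt, and output format are assumptions for illustration, not the example's actual implementation, which also uses RAPIDS for downstream graph processing.

```python
# Minimal sketch: ask a NIM-hosted LLM to extract knowledge-graph triples
# from a text chunk. The model and prompt are illustrative assumptions;
# the actual example also uses RAPIDS for downstream graph processing.
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(model="meta/llama-3.1-70b-instruct")

chunk = "NVIDIA NIM microservices expose optimized inference endpoints for LLMs."
prompt = (
    "Extract (subject, relation, object) triples from the text below, "
    "one per line:\n\n" + chunk
)
print(llm.invoke(prompt).content)
```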
- Build an Agentic RAG Pipeline with Llama 3.1 and NVIDIA NeMo Retriever NIM microservices [Blog, notebook]
- Integrate NVIDIA Morpheus, NIM microservices, and RAG pipelines to create LLM-based agent pipelines
- Tips for Building a RAG Pipeline with NVIDIA AI LangChain AI Endpoints by Amit Bleiweiss. [Blog, notebook]
For more details, view the releases.
Experience NVIDIA RAG Pipelines with just a few steps!
1. Get your NVIDIA API key. Visit the NVIDIA API Catalog, select any model, then click **Get API Key**. Afterward, run:

   ```console
   export NVIDIA_API_KEY=nvapi-...
   ```

   A sanity-check script follows these steps.

2. Clone the repository, then build and run the basic RAG pipeline:

   ```console
   git clone https://github.com/nvidia/GenerativeAIExamples.git
   cd GenerativeAIExamples/RAG/examples/basic_rag/langchain/
   docker compose up -d --build
   ```

3. Open a browser to https://localhost:8090/ and submit queries to the sample RAG Playground.

4. When you are done, stop the containers:

   ```console
   docker compose down
   ```
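If you want to verify your API key before launching the containers, here is a minimal sketch that calls the API Catalog's OpenAI-compatible endpoint. The model name below is an assumption for illustration; any chat model from the catalog works the same way.

```python
# Minimal sanity check for NVIDIA_API_KEY against the API Catalog's
# OpenAI-compatible endpoint. The model name is illustrative; any chat
# model listed in the catalog should respond the same way.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)
completion = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # assumed model for this sketch
    messages=[{"role": "user", "content": "Reply with OK if you can hear me."}],
)
print(completion.choices[0].message.content)
```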
NVIDIA has first-class support for popular generative AI developer frameworks like LangChain, LlamaIndex, and Haystack. These notebooks show how to integrate NIM microservices using your preferred generative AI development framework.
Use the notebooks to learn about the LangChain and LlamaIndex connectors; a minimal connector sketch follows the list below.
- RAG
- Agents
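As a taste of what the notebooks cover, the sketch below wires up the LangChain connectors for a NIM-hosted chat model and embedding model. The model names are examples from the API Catalog; the notebooks may use different ones.

```python
# Sketch of the LangChain connectors for NIM endpoints: a chat model plus
# an embedding model. Requires `pip install langchain-nvidia-ai-endpoints`
# and NVIDIA_API_KEY in the environment. Model names are illustrative.
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

llm = ChatNVIDIA(model="meta/llama-3.1-8b-instruct")
embedder = NVIDIAEmbeddings(model="nvidia/nv-embedqa-e5-v5")

vector = embedder.embed_query("What is retrieval-augmented generation?")
answer = llm.invoke("Explain RAG in one sentence.")
print(f"embedding dim: {len(vector)}")
print(answer.content)
```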
By default, the examples use preview NIM endpoints on the NVIDIA API Catalog. Alternatively, you can run any of the examples on premises, as sketched below.
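For on-premises use, the same connector can point at a self-hosted NIM instead of the preview endpoints. This is a hedged sketch: it assumes a NIM container is already serving an OpenAI-compatible API on localhost port 8000, and the model name depends on which NIM you deployed.

```python
# Sketch: point the LangChain connector at a self-hosted NIM rather than
# the preview endpoints. Assumes a NIM container is serving on
# localhost:8000; the port and model name depend on your deployment.
from langchain_nvidia_ai_endpoints import ChatNVIDIA

local_llm = ChatNVIDIA(
    base_url="http://localhost:8000/v1",  # your on-prem NIM endpoint
    model="meta/llama-3.1-8b-instruct",   # whichever model the NIM serves
)
print(local_llm.invoke("Hello from an on-prem NIM.").content)
```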
- Change the inference or embedding model
- Customize the vector database
- Customize the chain server:
- Support multi-turn conversations
- Configure LLM parameters at runtime (see the sketch after this list)
- Speak queries and listen to responses with NVIDIA Riva.
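To illustrate the runtime-parameter item above, the sketch below sets sampling parameters on the connector. These are standard ChatNVIDIA constructor arguments; the exact way the chain server exposes them at runtime is covered in the customization docs linked above.

```python
# Sketch: tune LLM sampling parameters when constructing the connector.
# temperature, top_p, and max_tokens are standard ChatNVIDIA arguments;
# how the chain server exposes them at runtime is in the docs above.
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(
    model="meta/llama-3.1-8b-instruct",
    temperature=0.2,  # lower values give more deterministic answers
    top_p=0.7,
    max_tokens=512,
)
print(llm.invoke("Summarize what a RAG pipeline does.").content)
```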
Example tools and tutorials to enhance LLM development and productivity when using NVIDIA RAG pipelines.
We're posting these examples on GitHub to support the NVIDIA LLM community and facilitate feedback. We invite contributions; open a GitHub issue or pull request!
Check out the community examples and notebooks.
- NVIDIA Tokkio LLM-RAG: Use Tokkio to add avatar animation for RAG responses.
- Hybrid RAG Project on AI Workbench: Run an NVIDIA AI Workbench example project for RAG.