RAGViz (Retrieval Augmented Generation Visualization) is a tool that visualizes both document-level and token-level attention on the retrieved context fed to the LLM to ground answer generation.
- RAGViz provides an add/remove document functionality to compare the generated tokens when certain documents are not included in the context.
- Combining both functionalities lets users diagnose the effectiveness and influence of specific retrieved documents or sections of text on the LLM's answer generation.
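As a rough illustration of the token-level attention signal RAGViz visualizes, the sketch below collects attention scores over the context from a HuggingFace causal LM during generation. The prompt layout, model id, and layer/head averaging scheme are illustrative assumptions, not RAGViz's exact pipeline.

```python
# Hedged sketch: collect attention from generated tokens back onto the retrieved
# context using the HuggingFace transformers library.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # assumed backbone for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    attn_implementation="eager",  # ensures attention weights can be returned
)

context = "Retrieved document text goes here."
question = "What does the document say?"
prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=32,
    output_attentions=True,
    return_dict_in_generate=True,
)

# outputs.attentions holds, for each generation step, a tuple of per-layer
# attention tensors. Averaging over layers and heads gives one score per
# context token for each generated token.
num_context_tokens = inputs["input_ids"].shape[1]
first_step = outputs.attentions[0]                     # attentions for the first generated token
avg = torch.stack(first_step).mean(dim=(0, 1, 2))      # average over layers, batch, heads
context_scores = avg[-1, :num_context_tokens]          # attention from the new token to the context
```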
A basic demonstration of RAGViz is available here.
The following are the system configurations of our RAGViz demonstration:
- The Pile-CC English documents are used for retrieval
- Documents are partitioned into 4 DiskANN indexes on separate nodes, each with ~20 million documents
- Documents are embedded into feature vectors using AnchorDR. To use AnchorDR in RAGViz, you must follow the installation instructions on the repo here to ensure your Python environment is set up correctly; do this after running `pip install -r backend/requirements.txt`. A hedged sketch of the embedding step is included after this list.
- LLaMA-2 generation and attention output are done with vLLM and the HuggingFace transformers library
- The frontend UI is adapted from the Lepton search engine
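The sketch below shows one way the document-embedding step above could look. The model path, pooling strategy, and normalization choice are assumptions for illustration; follow the AnchorDR repo's installation instructions for the real setup.

```python
# Hedged sketch of encoding documents (or a query) into dense feature vectors
# with a generic HuggingFace encoder. The AnchorDR checkpoint path and mean
# pooling below are assumptions, not RAGViz's exact code.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "path/to/AnchorDR"  # hypothetical local path to the AnchorDR checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
encoder = AutoModel.from_pretrained(MODEL_PATH)

def embed(texts: list[str]) -> torch.Tensor:
    """Encode a batch of texts into normalized dense vectors."""
    batch = tokenizer(texts, padding=True, truncation=True, max_length=128,
                      return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state         # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)             # ignore padding tokens
    vectors = (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # mean pooling (assumed)
    return torch.nn.functional.normalize(vectors, dim=-1)

# Vectors produced this way are what get sharded across the DiskANN indexes
# and compared against the embedded query at retrieval time.
```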
You can modify the snippets used for context in RAG by adding a new file and class in `backend/snippet`, adding it to `backend/ragviz.py` and `frontend/src/app/components/search.tsx`. We currently offer the following snippets (a hypothetical sketch of a new snippet class follows the list):
- Naive First: represent a document with its first 128 tokens
- Sliding Window: compute the inner product similarity between the query and each 128-token window of the document, and use the window most similar to the query to represent the document
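The sketch below illustrates what a new snippet class implementing the sliding-window strategy might look like. The class name, constructor arguments, and method signature are assumptions; mirror an existing class in `backend/snippet` and register it in `backend/ragviz.py` and `frontend/src/app/components/search.tsx` as described above.

```python
# Hypothetical snippet class for backend/snippet/ (interface is assumed).
import torch

class SlidingWindowSnippet:
    """Pick the 128-token window of a document most similar to the query."""

    def __init__(self, tokenizer, encoder, window_size: int = 128, stride: int = 64):
        self.tokenizer = tokenizer
        self.encoder = encoder          # assumed callable: token ids -> 1-D feature vector
        self.window_size = window_size
        self.stride = stride

    def snippet(self, document: str, query_vector: torch.Tensor) -> str:
        token_ids = self.tokenizer(document, add_special_tokens=False)["input_ids"]
        best_window, best_score = token_ids[: self.window_size], float("-inf")
        # Slide a fixed-size window over the document and score it against the query.
        for start in range(0, max(1, len(token_ids) - self.window_size + 1), self.stride):
            window = token_ids[start : start + self.window_size]
            score = torch.dot(self.encoder(window), query_vector).item()  # inner product
            if score > best_score:
                best_window, best_score = window, score
        return self.tokenizer.decode(best_window)
```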
New datasets for retrieval can be added using a new file and class in `backend/search`, and modifying `backend/ragviz.py` accordingly (a hypothetical sketch of such a class follows the dataset list below).
We currently provide implementations for the following datasets:
- ClueWeb22-B English documents
- Pile-CC dataset
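The sketch below shows one possible shape for a new retrieval backend under `backend/search`. The interface (a `search()` method returning ranked documents) and the ANN-index API are assumptions; mirror the existing dataset classes and wire the new class into `backend/ragviz.py`.

```python
# Hypothetical retrieval class for backend/search/ (interface is assumed).
class MyDatasetSearch:
    """Retrieve the top-k documents for an embedded query from a custom corpus."""

    def __init__(self, index, documents: list[str]):
        self.index = index          # e.g. an ANN index built over the corpus vectors
        self.documents = documents  # raw document texts, aligned with the index ids

    def search(self, query_vector, k: int = 10) -> list[str]:
        ids, _scores = self.index.query(query_vector, k)   # assumed ANN-index API
        return [self.documents[i] for i in ids]
```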
Any model supported by the HuggingFace transformers library can be used as the LLM backbone.
To apply vLLM for fast inference, the LLM backbone also needs to be supported by vLLM. A list of vLLM-supported models is available here.
You can set the path of the model used for RAG inside `backend/.env.example`. We used `meta-llama/Llama-2-7b-chat-hf` for the demo.
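As a minimal sketch, the snippet below shows how a configured model path could be consumed with vLLM for fast generation. The environment-variable name is an assumption made for illustration; check `backend/.env.example` for the actual key used by RAGViz.

```python
# Hedged sketch: load the model path from an environment variable and run
# generation with vLLM.
import os
from vllm import LLM, SamplingParams

model_path = os.environ.get("MODEL_PATH", "meta-llama/Llama-2-7b-chat-hf")  # variable name assumed

llm = LLM(model=model_path)
params = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(["Answer the question given the context: ..."], params)
print(outputs[0].outputs[0].text)
```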