An agentic RAG engine with support for heterogeneous source data formats, query routing between local and external knowledge sources, and multiple topics.
Components:
- LangGraph ingestion workflow
- LangGraph query workflow
- FastAPI backend
- Admin CLI
- NextJS front end (in a separate repository)
demo.mp4
Table of Contents
- Description
- Setup
- Running the API
- Running the front end
- Running the CLI
- Development
- Architecture and development notes
git clone https://github.com/bard/rag-engine
cd rag-engine
poetry install
cp .env.example .env
Edit `.env` to specify API keys and database connection strings.
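A minimal `.env` might look like the following. The variable names here are illustrative assumptions, not the project's actual keys; check `.env.example` for the authoritative list:

```
# Hypothetical variable names -- see .env.example for the real ones
OPENAI_API_KEY=sk-...
DATABASE_URL=postgresql://user:password@localhost:5432/rag
```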
poetry run task start_api
git clone https://github.com/bard/rag-frontend
cd rag-frontend
pnpm install
pnpm dev
$ poetry shell
$ python src/cli.py initdb
Database initialized successfully
$ python src/cli.py create_topic --name Paris
Created topic 'Paris' with ID: 059a97ed-3d7d-4fc9-a2b6-9b12df52b414
$ python src/cli.py ingest https://en.wikivoyage.org/wiki/Paris
Data ingested successfully
$ python src/cli.py list_topics
Available topics:
059a97ed-3d7d-4fc9-a2b6-9b12df52b414: Paris
$ poetry run python src/cli.py query --topic_id 059a97ed-3d7d-4fc9-a2b6-9b12df52b414 'what are some nice things to see?'
Some nice things to see in Paris include the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. Additionally, the charming neighborhood of Montmartre and the historic district of Le Marais are also worth exploring.
Run tests in watch mode:
poetry run task test_watch
When adding a test for code that relies on LLM calls, run `poetry run task test_with_new_network_calls` (see "LLMs and testing" below).
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
__start__([<p>__start__</p>]):::first
fetch(fetch)
extract(extract)
ingest(ingest)
__end__([<p>__end__</p>]):::last
__start__ --> fetch;
extract --> ingest;
fetch --> extract;
ingest --> __end__;
classDef default fill:#f2f0ff,line-height:1.2
classDef first fill-opacity:0
classDef last fill:#bfb6fc
There are three data extractors, meant to provide a framework and examples within that framework rather than to exhaust the possibilities:
- specialized local parsing for well-known structured data (on the ateam branch)
- generic textual data
- generic LLM-driven parsing of tabular data
`extract` runs through the extractors in sequence until one succeeds. It's up to each extractor to bail out early if it recognizes that it cannot do anything useful with the received data.
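The extractor chain can be sketched as follows; the names (`Extractor`, `extract_structured`, `extract_text`, `run_extractors`) are hypothetical, not the repository's actual API:

```python
from typing import Callable, Optional

# An extractor returns extracted text, or None to signal that it
# cannot do anything useful with the received data.
Extractor = Callable[[bytes], Optional[str]]

def extract_structured(data: bytes) -> Optional[str]:
    # Specialized parser for well-known structured data:
    # bail out early on input it does not recognize.
    if not data.startswith(b"{"):
        return None
    return "parsed structured data"

def extract_text(data: bytes) -> Optional[str]:
    # Generic textual fallback: accept anything decodable as UTF-8.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

def run_extractors(data: bytes, extractors: list[Extractor]) -> str:
    # Try extractors in sequence until one succeeds.
    for extractor in extractors:
        result = extractor(data)
        if result is not None:
            return result
    raise ValueError("no extractor could handle the data")
```

Order matters: specialized extractors go first, generic fallbacks last.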
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
__start__([<p>__start__</p>]):::first
classify_query(classify_query)
retrieve_from_weather_service(retrieve_from_weather_service)
retrieve_from_knowledge_base(retrieve_from_knowledge_base)
rerank(rerank)
generate(generate)
__end__([<p>__end__</p>]):::last
__start__ --> classify_query;
generate --> __end__;
rerank --> generate;
retrieve_from_knowledge_base --> rerank;
retrieve_from_weather_service --> retrieve_from_knowledge_base;
classify_query -.-> retrieve_from_weather_service;
classify_query -.-> retrieve_from_knowledge_base;
classDef default fill:#f2f0ff,line-height:1.2
classDef first fill-opacity:0
classDef last fill:#bfb6fc
The conditional edge and the `retrieve_from_weather_service` node aren't necessarily the best design for sourcing external knowledge, and a case could be made for either:
- the `classify_query` node populating an `external_knowledge_sources` array in the agent's state with a list of sources it decided would be useful to query (`classify_query` already does this for the limited case of weather queries), then passing control to the `retrieve` node for retrieval from all knowledge sources, both local and external;
- defining external knowledge sources as LangChain tools and leaving it to the LLM to decide whether to call those tools.
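The first alternative can be sketched in plain Python (the state fields and node names are hypothetical, for illustration only):

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    query: str
    external_knowledge_sources: list[str] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)

def classify_query(state: AgentState) -> AgentState:
    # Decide which external sources are worth querying; weather is the
    # only case handled today, but more could be appended here.
    if "weather" in state.query.lower():
        state.external_knowledge_sources.append("weather_service")
    return state

def retrieve(state: AgentState) -> AgentState:
    # A single node retrieves from all sources, local and external.
    state.documents.append("local: knowledge base results")
    for source in state.external_knowledge_sources:
        state.documents.append(f"external: {source} results")
    return state

state = retrieve(classify_query(AgentState(query="What's the weather in Paris?")))
```

This keeps routing decisions in one place (`classify_query`) and retrieval mechanics in another, instead of one conditional edge per external source.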
Class abstractions for the agentic functionality are intentionally avoided, since configuration and state are already covered by LangGraph-native concepts (agent state and `RunnableConfig`).
All runnables (workflow nodes, but also API route handlers and CLI commands) instantiate their own dependencies (database connections, third-party API clients, ...) upon invocation, based on the configuration object, instead of expecting them from module scope. Together with the configuration object being strictly serializable, this allows extracting a runnable to a separate process (e.g. lambda) with minimal effort if the need arises.
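A minimal sketch of this pattern, with hypothetical names: the runnable receives a plain, serializable config object and builds its own dependencies from it at invocation time, rather than importing them from module scope.

```python
import sqlite3
from typing import Any

def ingest_node(state: dict, config: dict[str, Any]) -> dict:
    # Dependencies are instantiated here, on invocation, from the
    # serializable config object -- not expected from module scope.
    conn = sqlite3.connect(config["db_path"])
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS docs (body TEXT)")
        conn.execute("INSERT INTO docs (body) VALUES (?)", (state["text"],))
        count = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
    finally:
        conn.close()
    return {**state, "ingested_count": count}

# The config is a plain dict, so both it and the node can be shipped
# to a separate process (e.g. a lambda) with minimal effort.
result = ingest_node({"text": "hello"}, {"db_path": ":memory:"})
```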
vcr.py is used to keep tests realistic, cheap, and fast, and to protect them from the variability of LLM responses. When a test marked with `@pytest.mark.vcr` runs for the first time, requests go to the network and responses are recorded; in subsequent runs, recorded responses are replayed, avoiding latency and API billing and ensuring stable responses.
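The record/replay mechanism can be illustrated in miniature; this is a sketch of the idea, not vcr.py's actual implementation, and all names are invented:

```python
import json
import tempfile
from pathlib import Path

def cached_llm_call(prompt: str, cassette: Path, live_call) -> str:
    # Load previously recorded responses, if any.
    recorded = json.loads(cassette.read_text()) if cassette.exists() else {}
    if prompt in recorded:
        # Replay: no network, no latency, no billing, stable output.
        return recorded[prompt]
    # First run: hit the network and record the response.
    response = live_call(prompt)
    recorded[prompt] = response
    cassette.write_text(json.dumps(recorded))
    return response

# Demo with a fake "network" call that counts how often it is hit:
network_calls = []
def fake_llm(prompt: str) -> str:
    network_calls.append(prompt)
    return f"response to: {prompt}"

with tempfile.TemporaryDirectory() as tmp:
    cassette = Path(tmp) / "cassette.json"
    first = cached_llm_call("hello", cassette, fake_llm)   # recorded
    second = cached_llm_call("hello", cassette, fake_llm)  # replayed
```

The second call never reaches `fake_llm`: it is answered from the cassette, which is why replayed test runs are deterministic and free.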
The following is missing:
- database migrations
- post-retrieval reranking (only stubbed)
- protection against prompt injection
- monitoring
- support for vector stores other than ChromaDB (Pinecone is stubbed)
- multi-user
- per-task LLM configuration
Any SQL database supported by SQLAlchemy should work, but only SQLite and Postgres are tested.