An agentic RAG engine with support for heterogeneous source data formats, query routing between local and external knowledge sources, and multiple topics.
Components:
- LangGraph ingestion workflow
- LangGraph query workflow
- FastAPI backend
- Admin CLI
- NextJS front end (in a separate repository)
demo.mp4
Table of Contents
- Description
- Setup
- Running the API
- Running the front end
- Running the CLI
- Development
- Architecture and development notes
git clone https://github.com/bard/rag-engine
cd rag-engine
poetry install
cp .env.example .env
Edit `.env` to specify API keys and database connection strings.
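A minimal `.env` might look like the following. The variable names here are illustrative assumptions, not the project's actual keys; check `.env.example` for the authoritative list:

```
# Hypothetical variable names -- see .env.example for the real ones
OPENAI_API_KEY=sk-...
DATABASE_URL=postgresql://user:password@localhost:5432/rag
```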
poetry run task start_api
git clone https://github.com/bard/rag-frontend
cd rag-frontend
pnpm install
pnpm dev
$ poetry shell
$ python src/cli.py initdb
Database initialized successfully
$ python src/cli.py create_topic --name Paris
Created topic 'Paris' with ID: 059a97ed-3d7d-4fc9-a2b6-9b12df52b414
$ python src/cli.py ingest https://en.wikivoyage.org/wiki/Paris
Data ingested successfully
$ python src/cli.py list_topics
Available topics:
059a97ed-3d7d-4fc9-a2b6-9b12df52b414: Paris
$ poetry run python src/cli.py query --topic_id 059a97ed-3d7d-4fc9-a2b6-9b12df52b414 'what are some nice things to see?'
Some nice things to see in Paris include the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. Additionally, the charming neighborhood of Montmartre and the historic district of Le Marais are also worth exploring.
Run tests in watch mode:
poetry run task test_watch
When adding a test for code that relies on LLM calls, run `poetry run task test_with_new_network_calls` (see "LLMs and testing" below).
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
__start__([<p>__start__</p>]):::first
fetch(fetch)
extract(extract)
ingest(ingest)
__end__([<p>__end__</p>]):::last
__start__ --> fetch;
extract --> ingest;
fetch --> extract;
ingest --> __end__;
classDef default fill:#f2f0ff,line-height:1.2
classDef first fill-opacity:0
classDef last fill:#bfb6fc
There are three data extractors, meant to provide a framework and examples within that framework rather than to exhaust the possibilities:
- specialized local parsing for well-known structured data (on the ateam branch)
- generic textual data
- generic LLM-driven parsing of tabular data
`extract` runs through the extractors in sequence until one succeeds. It's up to each extractor to bail out early if it recognizes that it cannot do anything useful with the received data.
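The extractor chain can be sketched as follows; the names (`Extractor`, `extract_structured`, `extract_text`, `run_extractors`) are hypothetical, not the repository's actual API:

```python
from typing import Callable, Optional

# An extractor returns extracted text, or None to signal that it
# cannot do anything useful with the received data.
Extractor = Callable[[bytes], Optional[str]]

def extract_structured(data: bytes) -> Optional[str]:
    # Specialized parser for well-known structured data:
    # bail out early on input it does not recognize.
    if not data.startswith(b"{"):
        return None
    return "parsed structured data"

def extract_text(data: bytes) -> Optional[str]:
    # Generic textual fallback: accept anything decodable as UTF-8.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

def run_extractors(data: bytes, extractors: list[Extractor]) -> str:
    # Try extractors in sequence until one succeeds.
    for extractor in extractors:
        result = extractor(data)
        if result is not None:
            return result
    raise ValueError("no extractor could handle the data")
```

Order matters: specialized extractors go first, generic fallbacks last.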
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
__start__([<p>__start__</p>]):::first
classify_query(classify_query)
retrieve_from_weather_service(retrieve_from_weather_service)
retrieve_from_knowledge_base(retrieve_from_knowledge_base)
rerank(rerank)
generate(generate)
__end__([<p>__end__</p>]):::last
__start__ --> classify_query;
generate --> __end__;
rerank --> generate;
retrieve_from_knowledge_base --> rerank;
retrieve_from_weather_service --> retrieve_from_knowledge_base;
classify_query -.-> retrieve_from_weather_service;
classify_query -.-> retrieve_from_knowledge_base;
classDef default fill:#f2f0ff,line-height:1.2
classDef first fill-opacity:0
classDef last fill:#bfb6fc
The conditional edge and the `retrieve_from_weather_service` node aren't necessarily the best design for sourcing external knowledge, and a case could be made for either:
- the `classify_query` node populating an `external_knowledge_sources` array in the agent's state with a list of sources it decided would be useful to query (`classify_query` already does this for the limited case of weather queries), then passing control to the `retrieve` node for retrieval from all knowledge sources, both local and external;
- defining external knowledge sources as LangChain tools and leaving it to the LLM to decide whether to call those tools.
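The first alternative can be sketched in plain Python (the state fields and node names are hypothetical, for illustration only):

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    query: str
    external_knowledge_sources: list[str] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)

def classify_query(state: AgentState) -> AgentState:
    # Decide which external sources are worth querying; weather is the
    # only case handled today, but more could be appended here.
    if "weather" in state.query.lower():
        state.external_knowledge_sources.append("weather_service")
    return state

def retrieve(state: AgentState) -> AgentState:
    # A single node retrieves from all sources, local and external.
    state.documents.append("local: knowledge base results")
    for source in state.external_knowledge_sources:
        state.documents.append(f"external: {source} results")
    return state

state = retrieve(classify_query(AgentState(query="What's the weather in Paris?")))
```

This keeps routing decisions in one place (`classify_query`) and retrieval mechanics in another, instead of one conditional edge per external source.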
Class abstractions for the agentic functionality are intentionally avoided, since configuration and state are already covered by LangGraph-native concepts (agent state and `RunnableConfig`).
All runnables (workflow nodes, but also API route handlers and CLI commands) instantiate their own dependencies (database connections, third-party API clients, ...) upon invocation, based on the configuration object, instead of expecting them from module scope. Together with the configuration object being strictly serializable, this allows extracting a runnable to a separate process (e.g. lambda) with minimal effort if the need arises.
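A minimal sketch of this pattern, with hypothetical names: the runnable receives a plain, serializable config object and builds its own dependencies from it at invocation time, rather than importing them from module scope.

```python
import sqlite3
from typing import Any

def ingest_node(state: dict, config: dict[str, Any]) -> dict:
    # Dependencies are instantiated here, on invocation, from the
    # serializable config object -- not expected from module scope.
    conn = sqlite3.connect(config["db_path"])
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS docs (body TEXT)")
        conn.execute("INSERT INTO docs (body) VALUES (?)", (state["text"],))
        count = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
    finally:
        conn.close()
    return {**state, "ingested_count": count}

# The config is a plain dict, so both it and the node can be shipped
# to a separate process (e.g. a lambda) with minimal effort.
result = ingest_node({"text": "hello"}, {"db_path": ":memory:"})
```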
vcr.py is used to keep tests realistic, cheap, and fast, and to protect them from the variability of LLM responses. When a test marked with `@pytest.mark.vcr` runs for the first time, requests go to the network and responses are recorded; in subsequent runs, recorded responses are replayed, avoiding latency and API billing and ensuring stable responses.
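The record/replay mechanism can be illustrated in miniature; this is a sketch of the idea, not vcr.py's actual implementation, and all names are invented:

```python
import json
import tempfile
from pathlib import Path

def cached_llm_call(prompt: str, cassette: Path, live_call) -> str:
    # Load previously recorded responses, if any.
    recorded = json.loads(cassette.read_text()) if cassette.exists() else {}
    if prompt in recorded:
        # Replay: no network, no latency, no billing, stable output.
        return recorded[prompt]
    # First run: hit the network and record the response.
    response = live_call(prompt)
    recorded[prompt] = response
    cassette.write_text(json.dumps(recorded))
    return response

# Demo with a fake "network" call that counts how often it is hit:
network_calls = []
def fake_llm(prompt: str) -> str:
    network_calls.append(prompt)
    return f"response to: {prompt}"

with tempfile.TemporaryDirectory() as tmp:
    cassette = Path(tmp) / "cassette.json"
    first = cached_llm_call("hello", cassette, fake_llm)   # recorded
    second = cached_llm_call("hello", cassette, fake_llm)  # replayed
```

The second call never reaches `fake_llm`: it is answered from the cassette, which is why replayed test runs are deterministic and free.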
The following is missing:
- database migrations
- post-retrieval reranking (only stubbed)
- protection against prompt injection
- monitoring
- support for vector stores other than ChromaDB (Pinecone is stubbed)
- multi-user
- per-task LLM configuration
Any SQL database supported by SQLAlchemy should work, but only SQLite and Postgres are tested.