dev (#1269)
* Feature/encapsulate orchestration (#1265)

* fully encapsulate orchestration

* fully encapsulate orchestration

* complete encapsulation

* revert import cmt

* making default r2r lighter (#1268)

* making default r2r lighter

* fix bug in ingest files

* checkin

* workingupdate

* complete simple orch

* update docs

* up (#1273)

* up

* up

* merge (#1276)

* Postgres configuration settings (#1277)

* Improvements on Auth in JS, CLI (#1267)

* CLI Telemetry (#1266)

* check in

* working

* redundant

* JS auth improvements (#1263)

* Check in JS auth improvements

* Update login with toke

* Fix to allow disabling telemetry

* fix lock

* Try to avoid merge conflicts

* Clean up collection bugs

* remove comments

* Add Postgres configuration settings

* Image

* bad github conflict

* merge (#1278)

* port KG to postgres (#1272)

* create + cluster

* local search

* up

* clean

* format

* basics

* add collection_id and paginate

* rename

* change api

* up

* kg_creation_status

* up

* up

* up

* Feature/cleanup docker (#1279)

* merge

* up

* rm neo4j refs and cleanup docker cmds

* fixup

* Patch/cleanup kg migration (#1281)

* cleanup kg migration

* up

* Kg testing (#1280)

* up

* up

* up

* up

* slay neo4j

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>

* add back poetry lock

* Default Collections (#1282)

* Default collections

* Naughty naughty need to follow the SRP

* Testing (#1284)

* CICD

* actions

* poetry

* poetry

* Add env vars

* name

* increase timeout

* add user to collection

* Kg testing (#1283)

* up

* up

* cleanup kg migration

* up

* up

* up

* Kg testing (#1280)

* up

* up

* up

* up

* rename

* project name

* up

* add chunk order

* fragments => extractions

* bug squash

* up

* up

* up

* change postgres project name

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>

* Feature/fix logic bugs (#1285)

* fixing minor logic bugs in dev branch

* fixing minor logic bugs in dev branch

* merge

* Application docs

* add image (#1287)

* Add version to CLI telemetry (#1288)

* add image

* Add version to cli telemetry

* KG hatchet orchestration (#1286)

* up

* up

* cleanup kg migration

* up

* up

* up

* Kg testing (#1280)

* up

* up

* up

* up

* rename

* project name

* up

* add chunk order

* fragments => extractions

* bug squash

* up

* up

* up

* change postgres project name

* up

* up

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>

* Feature/update documentation rebased (#1289)

* up

* merge

* rebase

* fix ingestion issues (#1291)

* fix ingestion issues

* fix lock file

* fix embedding

* Fix SDK KG Serialization (#1292)

* add image

* serialization

* cleanup cli (#1294)

* CLI serialization (#1295)

* add image

* Fix more serialization around kg

* Nolan/schemacreation (#1296)

* add image

* Fix more serialization around kg

* add quotes to prevent reserved keywords from failing

* Prevent errors if config name is reserved name in postgres (#1297)

* Prevent reserved words (#1298)

* Move default collection id method to utils (#1299)

* Allow json fallback (#1301)

* hotfix: import

* Fix description error (#1302)

* up (#1303)

* rename to `full` (#1304)

* rename to `full`

* add html parser

* Remove postgres vecs variables (#1306)

* Feature/rename ingest files (#1307)

* rename to `full`

* add html parser

* Vec Removal (#1308)

* Remove postgres vecs variables

* up

* change kg settings parsing (#1309)

* offset + limit (#1305)

* offset + limit

* fix order

* update query

* change entity offset

* leiden seed

---------

Co-authored-by: Nolan Tremelling <[email protected]>
Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
3 people authored Oct 2, 2024
1 parent 8644a08 commit 3721fcb
Showing 257 changed files with 8,098 additions and 8,247 deletions.
6 changes: 0 additions & 6 deletions .env.example
@@ -12,9 +12,3 @@ export POSTGRES_HOST=your_host
export POSTGRES_PORT=your_port
export POSTGRES_DBNAME=your_db
export POSTGRES_PROJECT_NAME=your_project_name

# Environment variables for KG provider (currently only Neo4j)
# export NEO4J_USER=YOUR_NEO4J_USER
# export NEO4J_PASSWORD=YOUR_NEO4J_PASSWORD
# export NEO4J_URL=YOUR_NEO4J_URL
# export NEO4J_DATABASE=YOUR_NEO4J_DATABASE
169 changes: 93 additions & 76 deletions .github/workflows/integration-test-workflow-debian.yml
@@ -1,4 +1,4 @@
name: Debian R2R Docker Build and Integration Test (Debian GNU/Linux 12 (bookworm) amd64)
name: R2R CLI Integration Test (Debian GNU/Linux 12 (bookworm) amd64)

on:
push:
@@ -8,124 +8,141 @@ on:

jobs:
build-and-test:
runs-on: arm3
runs-on: ubuntu-latest
permissions:
packages: write
contents: read
id-token: write
actions: write
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
TELEMETRY_ENABLED: false
POSTGRES_USER: ${{ secrets.POSTGRES_USER }}
POSTGRES_PASSWORD: ${{ secrets.POSTGRES_PASSWORD }}
POSTGRES_DBNAME: ${{ secrets.POSTGRES_DBNAME }}
POSTGRES_HOST: ${{ secrets.POSTGRES_HOST }}
POSTGRES_PORT: ${{ secrets.POSTGRES_PORT }}
POSTGRES_PROJECT_NAME: ${{ secrets.POSTGRES_PROJECT_NAME }}

steps:
- uses: actions/checkout@v4

- name: Clean up disk space
uses: jlumbroso/free-disk-space@main
- name: Set up Python
uses: actions/setup-python@v4
with:
tool-cache: true
android: true
dotnet: true
haskell: true
large-packages: true
swap-storage: true

- name: Docker Auth
uses: docker/login-action@v3
with:
username: ${{ secrets.RAGTORICHES_DOCKER_UNAME }}
password: ${{ secrets.RAGTORICHES_DOCKER_TOKEN }}

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3

- name: Set up QEMU
uses: docker/setup-qemu-action@v3
python-version: '3.x'

- name: Set image name
id: image
- name: Install Poetry
run: |
echo "IMAGE_NAME=ragtoriches/dev" >> $GITHUB_OUTPUT
curl -sSL https://install.python-poetry.org | python3 -
- name: Build and Push Docker Image
uses: docker/build-push-action@v5
with:
context: ./py
file: ./py/Dockerfile
push: true
tags: ragtoriches/dev:latest
platforms: linux/amd64
no-cache: true
pull: true

- name: Run cloud LLM integration tests in Docker
- name: Install dependencies
working-directory: ./py
run: |
python3 -m venv venv
source venv/bin/activate
pip install -e .
echo "R2R Version"
r2r version
echo "R2R Serve --docker"
r2r serve --docker --exclude-neo4j=true --exclude-ollama=true --image=ragtoriches/dev:latest
poetry install -E core -E ingestion-bundle
- name: Start R2R server
working-directory: ./py
run: |
poetry run r2r serve &
echo "Waiting for services to start..."
sleep 30
- name: Run integration tests
working-directory: ./py
run: |
echo "R2R Version"
poetry run r2r version
- name: Walkthrough
working-directory: ./py
run: |
echo "Ingest Data"
r2r ingest-sample-files
poetry run r2r ingest-sample-files
echo "Get Documents Overview"
r2r documents-overview
poetry run r2r documents-overview
echo "Get Document Chunks"
r2r document-chunks --document-id=77f67c65-6406-5076-8176-3844f3ef3688
poetry run r2r document-chunks --document-id=9fbe403b-c11c-5aae-8ade-ef22980c3ad1
echo "Delete Documents"
r2r delete --filter="document_id:eq:f25fd516-5cac-5c09-b120-0fc841270c7e"
poetry run r2r delete --filter=document_id:eq:9fbe403b-c11c-5aae-8ade-ef22980c3ad1
echo "Update Document"
poetry run r2r update-files core/examples/data/aristotle_v2.txt --document-ids=9fbe403b-c11c-5aae-8ade-ef22980c3ad1
echo "Vector Search"
r2r search --query="What was Uber'\''s profit in 2020?"
poetry run r2r search --query="What was Uber's profit in 2020?"
echo "Hybrid Search"
r2r search --query="What is a fierce nerd?" --use-hybrid-search
r2r search --query="What was Uber's profit in 2020?" --use-hybrid-search
echo "Basic RAG"
r2r rag --query="What was Uber'\''s profit in 2020?"
poetry run r2r rag --query="What was Uber's profit in 2020?"
echo "RAG with Hybrid Search"
r2r rag --query="Who is John Snow?" --use-hybrid-search
poetry run r2r rag --query="Who is John Snow?" --use-hybrid-search
echo "Streaming RAG"
r2r rag --query="What was Lyft'\''s profit in 2020?" --stream
poetry run r2r rag --query="who was aristotle" --use-hybrid-search --stream
echo "User Registration"
curl -X POST http://localhost:7272/v2/register \
-H "Content-Type: application/json" \
-d '{
"email": "[email protected]",
"password": "password123"
}'
echo "User Login"
curl -X POST http://localhost:7272/v2/login \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "[email protected]&password=password123"
echo "Users Overview"
r2r users-overview
poetry run r2r users-overview
echo "Logging"
poetry run r2r logs
echo "Analytics"
r2r analytics --filters '{"search_latencies": "search_latency"}' --analysis-types '{"search_latencies": ["basic_statistics", "search_latency"]}'
poetry run r2r analytics --filters '{"search_latencies": "search_latency"}' --analysis-types '{"search_latencies": ["basic_statistics", "search_latency"]}'
echo "Logging"
r2r logs
- name: GraphRAG
working-directory: ./py
run: |
echo "Create Knowledge Graph"
poetry run r2r create-graph --document-ids=9fbe403b-c11c-5aae-8ade-ef22980c3ad1
echo "Docker Down"
r2r docker-down
echo "Inspect Knowledge Graph"
poetry run r2r inspect-knowledge-graph
cleanup:
needs: build-and-test
runs-on: arm3
if: always()
steps:
- name: Clean up Virtual Environment
echo "Graph Enrichment"
poetry run r2r enrich-graph
echo "Local Search"
r2r search --query="Who is Aristotle?" --use-kg-search --kg-search-type=local
echo "Global Search"
r2r search --query="What were Aristotles key contributions to philosophy?" --use-kg-search --kg-search-type=global --max-llm-queries-for-global-search=100
echo "RAG"
r2r rag --query="What are the key contributions of Aristotle to modern society?" --use-kg-search --kg-search-type=global --max-llm-queries-for-global-search=100
- name: Advanced RAG
working-directory: ./py
run: |
echo "HyDE"
poetry run r2r rag --query="who was aristotle" --use-hybrid-search --stream --search-strategy=hyde
echo "Rag-Fusion"
r2r rag --query="Explain the theory of relativity" --use-hybrid-search --stream --search-strategy=rag_fusion
- name: Stop R2R server
run: |
if [ -d "venv" ]; then
deactivate || true
rm -rf venv
fi
docker stop $(docker ps -a -q) || true
docker system prune -af --volumes
docker network prune --force
docker volume rm $(docker volume ls -qf dangling=true) || true
pkill -f "r2r serve"
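
The updated workflow starts the server in the background with `poetry run r2r serve &` and then waits a fixed `sleep 30` before running the CLI commands. A fixed sleep can be too short on a slow runner and wasteful on a fast one. Below is a minimal sketch of a polling alternative, not part of this commit: the helper name `wait_for_port` and its parameters are hypothetical, and the only detail taken from the workflow is the port `7272` used by the `curl` calls. It only checks that the port accepts TCP connections, which does not guarantee that the application has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True as soon as a connection is accepted, or False if the
    deadline passes first. This is a hypothetical helper for illustration;
    it is not part of the R2R CLI or of the workflow in this commit.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

In a workflow step, a call such as `wait_for_port("localhost", 7272, timeout=120)` could run right after the server is started, with the step failing when the function returns `False`.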
2 changes: 1 addition & 1 deletion docs/api-reference/openapi.json

Large diffs are not rendered by default.

24 changes: 17 additions & 7 deletions docs/cookbooks/application.mdx
@@ -6,8 +6,6 @@ icon: 'display'

R2R offers an [open-source React+Next.js application](https://github.com/SciPhi-AI/R2R-Application) designed to give developers an administrative portal for their R2R deployment, and users an application to communicate with out of the box.

In addition, R2R comes with an orchestration GUI powered by Hatchet, which you can learn about [here](/cookbooks/orchestration).

## Setup

### Install PNPM
@@ -36,7 +34,7 @@ After installation, you may need to add PNPM to your system's PATH.

### Installing and Running the R2R Dashboard

If you're running R2R with the Docker, you already have the R2R dashboard running! Just navigate to [http://localhost:3000](http://localhost:3000).
If you're running R2R with the Docker, you already have the R2R application running! Just navigate to [http://localhost:7273](http://localhost:7273).

If you're running R2R outside of Docker, run the following commands to install the R2R Dashboard.

@@ -74,15 +72,27 @@ By default, an R2R instance is hosted on port 7272. The login page will include

### Documents

The documents page provides an overview of uploaded documents and their metadata. You can upload new documents and update or delete existing ones.
The documents page provides an overview of uploaded documents and their metadata. You can upload new documents and update, download, or delete existing ones. Additionally, you can view information about each document, including the documents' chunks and previews of PDFs.

![Documents Page](/images/oss_dashboard_documents.png)

### Playground
### Collections

Collections allow users to create and share sets of documents. The collections page provides a place to manage your existing collections or create new collections.

![Collections Page](/images/oss_collections_page.png)

### Chat

In the chat page, you can stream RAG responses with different models and configurable settings. You can interact with both the RAG Agent and RAG endpoints here.

![Chat Interface](/images/chat.png)

### Users

The playground allows streaming RAG responses with different models and configurable settings.
Manage your users and gain insight into their interactions.

![Playground Interface](/images/playground.png)
![Users Page](/images/users.png)

### Logs

25 changes: 11 additions & 14 deletions docs/cookbooks/graphrag.mdx
@@ -30,25 +30,25 @@ r2r serve
<Accordion icon="gear" title="Configuration: r2r.toml">
``` toml
[kg]
provider = "neo4j"
provider = "postgres"
batch_size = 256

[kg.kg_creation_settings]
kg_extraction_prompt = "graphrag_triplet_extraction_zero_shot"
kg_triples_extraction_prompt = "graphrag_triples_extraction_few_shot"
entity_types = [] # if empty, all entities are extracted
relation_types = [] # if empty, all relations are extracted
max_knowledge_triples = 100
fragment_merge_count = 4 # number of fragments to merge into a single extraction
generation_config = { model = "gpt-4o-mini" } # and other params, model used for triplet extraction
generation_config = { model = "openai/gpt-4o-mini" } # and other params, model used for triplet extraction

[kg.kg_enrichment_settings]
max_description_input_length = 65536 # increase if you want more comprehensive descriptions
max_summary_input_length = 65536 # increase if you want more comprehensive summaries
generation_config = { model = "gpt-4o-mini" } # and other params, model used for node description and graph clustering
leiden_params = { max_levels = 10 } # more params here: https://neo4j.com/docs/graph-data-science/current/algorithms/leiden/
generation_config = { model = "openai/gpt-4o-mini" } # and other params, model used for node description and graph clustering
leiden_params = {}

[kg.kg_search_settings]
generation_config = { model = "gpt-4o-mini" }
generation_config = { model = "openai/gpt-4o-mini" }
```
</Accordion>
</Tab>
@@ -92,13 +92,13 @@ batch_size = 32
add_title_as_prefix = true

[parsing]
excluded_parsers = [ "gif", "jpeg", "jpg", "png", "svg", "mp3", "mp4" ]
excluded_parsers = [ "mp4" ]

[kg]
provider = "neo4j"
provider = "postgres"

[kg.kg_creation_settings]
kg_extraction_prompt = "graphrag_triplet_extraction_zero_shot"
kg_triples_extraction_prompt = "graphrag_triples_extraction_few_shot"
entity_types = [] # if empty, all entities are extracted
relation_types = [] # if empty, all relations are extracted
max_knowledge_triples = 100
@@ -109,7 +109,7 @@ provider = "neo4j"
max_description_input_length = 65536 # increase if you want more comprehensive descriptions
max_summary_input_length = 65536
generation_config = { model = "ollama/llama3.1" } # and other params, model used for node description and graph clustering
leiden_params = { max_levels = 10 } # more params here: https://neo4j.com/docs/graph-data-science/current/algorithms/leiden/
leiden_params = {}

[kg.kg_search_settings]
generation_config = { model = "ollama/llama3.1" }
@@ -175,10 +175,7 @@ r2r create-graph --document-ids=9fbe403b-c11c-5aae-8ade-ef22980c3ad1
[{'message': 'Graph creation task queued successfully.', 'task_id': 'd9dae1bb-5862-4a16-abaf-5297024df390'}]
```

This step will create a knowledge graph with nodes and relationships. You can visualize the graph in two ways:


1. Using the neo4j browser on `http://localhost:7474`. The username and password are `neo4j` and `ineedastrongerpassword`. To visualize the graph, run the following command in the neo4j browser:
This step will create a knowledge graph with nodes and relationships. Below is a visualization of the graph which we produced with Neo4j:

```
MATCH (a)
2 changes: 1 addition & 1 deletion docs/cookbooks/observability.mdx
@@ -254,7 +254,7 @@ queries = [
# Perform random searches
for _ in range(1000):
query = random.choice(queries)
app.rag(query, GenerationConfig(model="gpt-4o-mini"))
app.rag(query, GenerationConfig(model="openai/gpt-4o-mini"))

print("Preloading complete. You can now run analytics on this data.")
```