Neo4j GraphRAG with GNN+LLM

Knowledge graph retrieval to improve multi-hop Q&A performance, optimized with GNN + LLM models.

This repo contains experiments for combining Knowledge Graph Retrieval with GNN+LLM models to improve RAG. It currently leverages Neo4j, G-Retriever, and the STaRK-Prime dataset for benchmarking.

Architecture Overview

[Architecture diagram]

  • RAG on large knowledge graphs that require multi-hop retrieval and reasoning, beyond node classification and link prediction.
  • General, extensible 2-part architecture: KG Retrieval & GNN+LLM (see the sketch after this list).
  • Efficient, stable inference time and output for real-world use cases.
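
The following is a conceptual sketch of how the two stages fit together. All names in it (retrieve_subgraph, GNNLLMModel) are illustrative placeholders, not the actual APIs in this repository.

```python
# Conceptual sketch of the 2-part pipeline: KG retrieval followed by GNN+LLM answering.
# Everything here is a placeholder standing in for the retrieval and G-Retriever code.

def retrieve_subgraph(question: str) -> list[tuple[str, str, str]]:
    """Stage 1 (placeholder): fetch a question-specific subgraph from the KG,
    e.g. Cypher queries against Neo4j followed by subgraph pruning."""
    return [("aspirin", "treats", "headache")]  # toy triples

class GNNLLMModel:
    """Stage 2 (placeholder): a GNN encodes the subgraph and an LLM generates
    the answer conditioned on the graph encoding plus textualized triples."""
    def generate(self, question: str, subgraph: list[tuple[str, str, str]]) -> str:
        context = "; ".join(f"{h} {r} {t}" for h, r, t in subgraph)
        return f"[answer to '{question}' grounded in: {context}]"

question = "What does aspirin treat?"
print(GNNLLMModel().generate(question, retrieve_subgraph(question)))
```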

Installation

The database & dataset

Install the Neo4j database (and the relevant JDK) by following the official instructions. You'll also need the Neo4j GenAI plugin.
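
Once Neo4j is running, a quick way to confirm it is reachable is the official Neo4j Python driver. A minimal sketch, assuming a local instance on the default Bolt port and placeholder credentials:

```python
# Minimal connectivity check with the official Neo4j Python driver.
# URI, user, and password are placeholders; use your own instance's values.
from neo4j import GraphDatabase

uri = "bolt://localhost:7687"        # default Bolt port for a local install
auth = ("neo4j", "your-password")    # replace with your credentials

with GraphDatabase.driver(uri, auth=auth) as driver:
    driver.verify_connectivity()     # raises if the database is unreachable
    print("Connected to Neo4j")
```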

With the database installed and running, you can load the STaRK-Prime dataset by running the Python notebook at data-loading/stark_prime_neo4j_loading.ipynb. Alternatively, obtain a database dump from AWS S3 for database version 5.23.
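
After the load (or after restoring the dump), a simple sanity check is to count nodes and relationships. The sketch below uses generic Cypher and placeholder connection details; it does not depend on the STaRK-Prime schema:

```python
# Sanity check after loading: count nodes and relationships in the database.
from neo4j import GraphDatabase

with GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "your-password")) as driver:
    records, _, _ = driver.execute_query("MATCH (n) RETURN count(n) AS nodes")
    print("nodes:", records[0]["nodes"])
    records, _, _ = driver.execute_query("MATCH ()-[r]->() RETURN count(r) AS rels")
    print("relationships:", records[0]["rels"])
```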

Other requirements

Install all required libraries in requirements.txt. Additionally, make sure huggingface-cli authentication is set up so you can access the relevant gated models (Llama 2, Llama 3).
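
If you prefer to authenticate from Python rather than via huggingface-cli, the huggingface_hub login helper works as well; the token value below is a placeholder for your own access token:

```python
# Programmatic alternative to `huggingface-cli login`.
# Requires a Hugging Face access token with access to the gated Llama models.
from huggingface_hub import login

login(token="hf_xxx")  # placeholder token; never commit a real token
```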

Reproduce results

  1. To train a model with default configurations, run the following command: python train.py --checkpointing --llama_version llama3.1-8b --retrieval_config_version 0 --algo_config_version 0 --g_retriever_config_version 0 --eval_batch_size 4
  2. To get results for the pipeline, run eval_pcst_ordering.ipynb using the intermediate dataset and the trained G-Retriever model.
  3. To exactly reproduce the results in the table below, use the stanford-workshop-2024 branch. The main branch contains newer incremental changes and improvements.

[Results table]

Additional Neo4j GraphRAG Resources