Evaluating batch-wise subgraph-querying of large graphs in relational and graph databases for Graph Neural Networks

📘 Project Overview

This project evaluates the efficiency of relational (PostgreSQL, MySQL) and graph-based (Neo4j) databases for batch-wise subgraph querying in Graph Neural Network (GNN) training. It identifies optimal backends for different GNN depths and CRUD operations, aiming to enhance memory efficiency and reduce training overhead on large-scale graphs.

✨ Key Features

🔍 Subgraph Query Benchmarking
Performance analysis for 1st- to 3rd-order subgraph queries.
⚙️ Full CRUD Evaluation
Measures Create, Read, Update, and Delete times across databases.
🧪 Synthetic and Real Datasets
Includes scale-free and PPI graphs with controlled edge/node parameters.
📊 GNN-Compatible Output
Outputs PyTorch Geometric–ready feature matrices and COO edge indices.
🛠️ Schema-Aware DB Setup
Benchmarks both flat and list-column feature storage.
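To make "n-th order subgraph query" and "COO edge index" concrete, here is a minimal in-memory sketch in plain Python. It is not the repository's implementation: the function name `k_hop_subgraph` and the toy edge list are assumptions made for illustration, and the databases under test would answer the same question with SQL or Cypher queries instead.

```python
from collections import defaultdict


def k_hop_subgraph(seed_nodes, edges, k):
    """Return (nodes, edge_index) for the k-hop neighbourhood of seed_nodes.

    edges is a list of (source, target) pairs, treated as undirected when
    expanding the neighbourhood. edge_index is in COO form: two parallel
    lists [sources, targets] that use positions in the returned node list.
    """
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)

    visited = set(seed_nodes)
    frontier = set(seed_nodes)
    for _ in range(k):
        # Expand one hop and keep only nodes not seen before.
        frontier = {n for node in frontier for n in adjacency[node]} - visited
        visited |= frontier

    nodes = sorted(visited)
    local_id = {node: i for i, node in enumerate(nodes)}
    sources, targets = [], []
    for src, dst in edges:
        # Keep an edge only if both endpoints are inside the subgraph.
        if src in local_id and dst in local_id:
            sources.append(local_id[src])
            targets.append(local_id[dst])
    return nodes, [sources, targets]


# Toy path graph 0-1-2-3-4 (assumed data, for illustration only).
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
nodes, edge_index = k_hop_subgraph([0], edges, k=2)
print(nodes)       # [0, 1, 2]
print(edge_index)  # [[0, 1], [1, 2]]
```

The nested list has shape 2 x num_edges, so it could be passed to `torch.tensor(edge_index, dtype=torch.long)` to obtain the edge index layout that PyTorch Geometric expects.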

🚀 Getting Started

Clone the repository and set up your Python environment:

git clone https://github.com/danielwalke/DatabaseGraphEvaluation.git
cd DatabaseGraphEvaluation
# Optional: create virtual env
# python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

🛠️ Local Database Setup (Ubuntu)

This section explains how to install and configure PostgreSQL, Neo4j, and MySQL on Ubuntu with a unified credential scheme. Each service uses:

Username: postgres (PostgreSQL) / neo4j (Neo4j) / root (MySQL)
Password: password
Ports (default):
- PostgreSQL: 5432
- MySQL: 3306
- Neo4j: 7474 (HTTP), 7687 (Bolt)
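The credential scheme above could be kept in one place on the Python side, for example as a small settings dictionary. This is only a sketch: `DB_SETTINGS` and `endpoint` are hypothetical names, and the repository's own connector modules may organise their configuration differently.

```python
# Hypothetical settings table mirroring the usernames, password, and
# default ports listed above. Neo4j uses the Bolt port for driver access.
DB_SETTINGS = {
    "postgresql": {"scheme": "postgresql", "user": "postgres",
                   "password": "password", "host": "localhost", "port": 5432},
    "mysql": {"scheme": "mysql", "user": "root",
              "password": "password", "host": "localhost", "port": 3306},
    "neo4j": {"scheme": "bolt", "user": "neo4j",
              "password": "password", "host": "localhost", "port": 7687},
}


def endpoint(name):
    """Return (uri, (user, password)) for one of the configured databases."""
    cfg = DB_SETTINGS[name]
    uri = f'{cfg["scheme"]}://{cfg["host"]}:{cfg["port"]}'
    return uri, (cfg["user"], cfg["password"])


for name in DB_SETTINGS:
    uri, (user, _password) = endpoint(name)
    print(f"{name}: {uri} as {user}")
```

Returning the credentials separately from the URI keeps the sketch driver-agnostic, since database drivers differ in whether they accept credentials inside the URI or as separate arguments.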

1️⃣ PostgreSQL Setup

sudo apt update
sudo apt install postgresql postgresql-contrib -y

# Set password for the 'postgres' user
sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';"

# Restart PostgreSQL
sudo systemctl restart postgresql

✅ Access with:

psql -U postgres -h localhost

2️⃣ Neo4j Setup

# Add repository and key
wget -O - https://debian.neo4j.com/neotechnology.gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/neo4j.gpg
echo "deb [signed-by=/usr/share/keyrings/neo4j.gpg] https://debian.neo4j.com stable 5" | sudo tee /etc/apt/sources.list.d/neo4j.list

# Install and start
sudo apt update
sudo apt install neo4j -y
sudo systemctl enable neo4j
sudo systemctl start neo4j

✅ Set password:

Open http://localhost:7474
Login with:
- Username: neo4j
- Initial Password: neo4j
You will be prompted to change it → set to password

3️⃣ MySQL Setup (with Local File Uploads)

sudo apt update
sudo apt install mysql-server -y

# Secure install (optional)
sudo mysql_secure_installation

Set password and enable local_infile:

sudo mysql -u root

-- Inside MySQL shell:
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
SET GLOBAL local_infile = 1;
EXIT;

Enable local file import permanently:

echo "[mysqld]
local_infile=1" | sudo tee -a /etc/mysql/mysql.conf.d/mysqld.cnf

sudo systemctl restart mysql

✅ Test with:

mysql -u root -p --local-infile=1

All three databases are now configured with consistent credentials and ready for integration with your GNN benchmarking pipeline. 🚀
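Before starting a benchmark run, it can help to confirm that each service is actually listening on its default port. The following sketch uses only the Python standard library; the helper names `port_is_open` and `check_services` are assumptions for illustration and are not part of the repository. It checks TCP reachability only, not whether the credentials work.

```python
import socket

# Default ports from the setup section above (Bolt port for Neo4j).
DEFAULT_PORTS = {"PostgreSQL": 5432, "MySQL": 3306, "Neo4j (Bolt)": 7687}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost", ports=DEFAULT_PORTS):
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'not reachable'}")
```

A service reported as not reachable usually means it is not running or is bound to a different port; `sudo systemctl status postgresql` (or `mysql`, `neo4j`) is the next thing to check.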

Generating synthetic datasets

Graphs with a fixed number of input edges:

python SynthDataGeneration.py

Graphs with various numbers of input edges (scale-free graphs):

python SynthRealDistGraphs.py
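For readers unfamiliar with scale-free graphs, the sketch below grows one by preferential attachment, where new nodes are more likely to connect to nodes that already have many edges. It illustrates the general idea only and makes no claim about how SynthRealDistGraphs.py generates its graphs; the function name `scale_free_edges` and its parameters are hypothetical.

```python
import random


def scale_free_edges(num_nodes, edges_per_node, seed=0):
    """Grow a graph by preferential attachment and return its edge list.

    Every new node attaches to edges_per_node distinct existing nodes,
    chosen with probability proportional to their current degree.
    """
    if not 1 <= edges_per_node < num_nodes:
        raise ValueError("need 1 <= edges_per_node < num_nodes")
    rng = random.Random(seed)

    # The first new node connects to all of the initial nodes.
    targets = list(range(edges_per_node))
    # Each node appears here once per incident edge, so uniform sampling
    # from this list is sampling proportional to degree.
    degree_weighted = []
    edges = []
    for new_node in range(edges_per_node, num_nodes):
        edges.extend((new_node, target) for target in targets)
        degree_weighted.extend(targets)
        degree_weighted.extend([new_node] * edges_per_node)
        # Pick distinct targets for the next node.
        chosen = set()
        while len(chosen) < edges_per_node:
            chosen.add(rng.choice(degree_weighted))
        targets = list(chosen)
    return edges


edges = scale_free_edges(num_nodes=50, edges_per_node=2)
print(len(edges))  # 96, i.e. (50 - 2) * 2
```

Fixing the seed makes the generated graph reproducible, which matters when the same synthetic dataset has to be loaded into several databases for comparison.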

Documentation

main.py serves as the entry point.
It runs Evaluator, which performs the pre-specified number of iterations for the specified databases.
DBMSEvaluator is an executor that iterates over all datasets.

DBMSEvaluator executes CRUDEvaluator, which executes all CRUD operations.
The connectors (A) are the drivers used to connect to each database.
They are used by the per-database evaluators, which CRUDEvaluator executes (B).
The queries that the evaluators use for all CRUD operations are listed in Queries (C).
The data reader (E) reads all data from the input files.
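The control flow described above (iterations, then databases, then datasets, then the four CRUD operations) can be sketched as nested loops around a timer. Everything in this example is an assumption for illustration: `InMemoryConnector`, `evaluate_crud`, and `evaluate` are hypothetical stand-ins, not the repository's classes, and a dictionary takes the place of a real database.

```python
import time


class InMemoryConnector:
    """Hypothetical stand-in for a database connector."""

    def __init__(self):
        self.rows = {}

    def create(self, data):
        self.rows.update(data)

    def read(self, keys):
        return [self.rows[key] for key in keys]

    def update(self, data):
        self.rows.update(data)

    def delete(self, keys):
        for key in keys:
            self.rows.pop(key, None)


def evaluate_crud(connector, data):
    """Run create, read, update, delete once each; return seconds per step."""
    keys = list(data)
    operations = [
        ("create", lambda: connector.create(data)),
        ("read", lambda: connector.read(keys)),
        ("update", lambda: connector.update(data)),
        ("delete", lambda: connector.delete(keys)),
    ]
    timings = {}
    for name, operation in operations:
        start = time.perf_counter()
        operation()
        timings[name] = time.perf_counter() - start
    return timings


def evaluate(connectors, datasets, iterations):
    """Repeat the CRUD evaluation for every database and dataset."""
    results = []
    for iteration in range(iterations):
        for db_name, connector in connectors.items():
            for dataset_name, data in datasets.items():
                timings = evaluate_crud(connector, data)
                results.append({"iteration": iteration, "database": db_name,
                                "dataset": dataset_name, **timings})
    return results


results = evaluate({"memory": InMemoryConnector()},
                   {"tiny": {0: [0.1, 0.2], 1: [0.3, 0.4]}},
                   iterations=3)
print(len(results))  # 3
```

Ending each pass with the delete step leaves the store empty, so every iteration starts from the same state and the timings stay comparable across iterations.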

