Evaluating batch-wise subgraph-querying of large graphs in relational and graph databases for Graph Neural Networks
This project evaluates the efficiency of relational (PostgreSQL, MySQL) and graph-based (Neo4j) databases for batch-wise subgraph querying in Graph Neural Network (GNN) training. It identifies optimal backends for different GNN depths and CRUD operations, aiming to enhance memory efficiency and reduce training overhead on large-scale graphs.
- 🔍 Subgraph Query Benchmarking: Performance analysis for 1st- to 3rd-order subgraph queries.
- ⚙️ Full CRUD Evaluation: Measures Create, Read, Update, and Delete times across databases.
- 🧪 Synthetic and Real Datasets: Includes scale-free and PPI graphs with controlled edge/node parameters.
- 📊 GNN-Compatible Output: Outputs PyTorch Geometric–ready feature matrices and COO edge indices.
- 🛠️ Schema-Aware DB Setup: Benchmarks both flat and list-column feature storage.
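To make the idea of a batch-wise k-th-order subgraph query concrete, here is a minimal sketch. It is not the repository's implementation: it uses an in-memory SQLite database as a stand-in for PostgreSQL/MySQL, and the table name `edges`, its columns `src` and `dst`, and the function `k_hop_subgraph` are assumptions made for illustration. It expands a batch of seed nodes hop by hop along incoming edges and returns a relabelled COO edge index.

```python
import sqlite3


def k_hop_subgraph(conn, seeds, k):
    """Return (nodes, edge_index) for the k-hop in-neighbourhood of `seeds`.

    Assumes a hypothetical table edges(src, dst) with directed edges.
    edge_index is in COO form [sources, targets], relabelled to 0..len(nodes)-1.
    """
    visited = set(seeds)
    frontier = set(seeds)
    edges = set()
    for _ in range(k):
        if not frontier:
            break
        # One query per hop: fetch every edge that points into the frontier.
        placeholders = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT src, dst FROM edges WHERE dst IN ({placeholders})",
            sorted(frontier),
        ).fetchall()
        edges.update(rows)
        # Only nodes not seen before are expanded in the next hop.
        new_nodes = {src for src, _ in rows} - visited
        visited |= new_nodes
        frontier = new_nodes
    nodes = sorted(visited)
    local = {node: i for i, node in enumerate(nodes)}
    edge_list = sorted(edges)
    edge_index = [
        [local[s] for s, _ in edge_list],
        [local[d] for _, d in edge_list],
    ]
    return nodes, edge_index


def demo_connection():
    """Build a tiny path graph 4 -> 3 -> 2 -> 1 -> 0 for experimentation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?)", [(1, 0), (2, 1), (3, 2), (4, 3)]
    )
    return conn


if __name__ == "__main__":
    nodes, edge_index = k_hop_subgraph(demo_connection(), seeds=[0], k=2)
    print(nodes, edge_index)
```

The two inner lists of `edge_index` could be passed to `torch.tensor(...)` to obtain the `[2, num_edges]` layout that PyTorch Geometric expects. Issuing one query per hop is only one possible design; a recursive query or a graph-native traversal may behave very differently, which is the kind of trade-off this benchmark is meant to measure.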
Clone the repository and set up your Python environment:
git clone https://github.com/danielwalke/DatabaseGraphEvaluation.git
cd DatabaseGraphEvaluation
# Optional: create virtual env
# python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
This section explains how to install and configure PostgreSQL, Neo4j, and MySQL on Ubuntu with a unified credential scheme. Each service uses:
- Username: postgres / neo4j / root
- Password: password
- Ports (default):
  - PostgreSQL: 5432
  - MySQL: 3306
  - Neo4j: 7474 (HTTP), 7687 (Bolt)
PostgreSQL:
sudo apt update
sudo apt install postgresql postgresql-contrib -y
# Set password for the 'postgres' user
sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';"
# Restart PostgreSQL
sudo systemctl restart postgresql
✅ Access with:
psql -U postgres -h localhost
Neo4j:
# Add repository and key
wget -O - https://debian.neo4j.com/neotechnology.gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/neo4j.gpg
echo "deb [signed-by=/usr/share/keyrings/neo4j.gpg] https://debian.neo4j.com stable 5" | sudo tee /etc/apt/sources.list.d/neo4j.list
# Install and start
sudo apt update
sudo apt install neo4j -y
sudo systemctl enable neo4j
sudo systemctl start neo4j
✅ Set password:
- Open http://localhost:7474
- Log in with:
  - Username: neo4j
  - Initial password: neo4j
- You will be prompted to change it → set it to password
MySQL:
sudo apt update
sudo apt install mysql-server -y
# Secure install (optional)
sudo mysql_secure_installation
Set the password and enable local_infile:
sudo mysql -u root
-- Inside MySQL shell:
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
SET GLOBAL local_infile = 1;
EXIT;
Enable local file import permanently:
echo "[mysqld]
local_infile=1" | sudo tee -a /etc/mysql/mysql.conf.d/mysqld.cnf
sudo systemctl restart mysql
✅ Test with:
mysql -u root -p --local-infile=1
All three databases are now configured with consistent credentials and ready for integration with your GNN benchmarking pipeline. 🚀
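Before starting a benchmark run, it can help to confirm that each service is actually listening on its default port. The following is a small sketch using only the Python standard library; the names `DEFAULT_PORTS`, `port_open`, and `check_services` are illustrative and are not part of the repository. It only tests TCP reachability, not whether the credentials are accepted.

```python
import socket

# Default ports from the setup section above.
DEFAULT_PORTS = {
    "PostgreSQL": 5432,
    "MySQL": 3306,
    "Neo4j (HTTP)": 7474,
    "Neo4j (Bolt)": 7687,
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_services(host="localhost", ports=None):
    """Map each service name to whether its port accepts TCP connections."""
    ports = DEFAULT_PORTS if ports is None else ports
    return {name: port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, reachable in check_services().items():
        print(f"{name}: {'reachable' if reachable else 'NOT reachable'}")
```

A port reported as not reachable usually means the service is stopped or bound to a different address; `sudo systemctl status postgresql` (or `mysql`, `neo4j`) is a reasonable next step.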
- Graphs with a fixed number of input edges:
python SynthDataGeneration.py
- Graphs with various numbers of input edges (scale-free graphs):
python SynthRealDistGraphs.py
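For readers unfamiliar with scale-free graphs, the sketch below shows one common way to generate them: Barabási–Albert preferential attachment. This is an illustrative assumption, not a description of what `SynthRealDistGraphs.py` does internally; the function name `barabasi_albert_edges` and its parameters are hypothetical.

```python
import random


def barabasi_albert_edges(n, m, seed=0):
    """Generate an undirected scale-free graph by preferential attachment.

    Each new node attaches to m distinct existing nodes, chosen with
    probability proportional to their current degree.
    Returns a list of (new_node, existing_node) edges.
    """
    if not 1 <= m < n:
        raise ValueError("require 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    # The first new node connects to the m initial nodes 0..m-1.
    targets = list(range(m))
    # Each node appears here once per incident edge, so uniform sampling
    # from this list is degree-proportional sampling.
    repeated = []
    for new in range(m, n):
        edges.extend((new, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([new] * m)
        # Pick m distinct targets for the next node.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = sorted(chosen)
    return edges


if __name__ == "__main__":
    edge_list = barabasi_albert_edges(n=1000, m=3)
    print(len(edge_list), "edges")
```

With this scheme the edge count is fixed at `m * (n - m)`, which is one way to control the node and edge parameters of a synthetic dataset.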
- main.py serves as the entry point.
- Evaluator is executed first; it performs the pre-specified number of iterations for the specified databases.
- DBMSEvaluator is an executor that iterates over all datasets.
- DBMSEvaluator executes CRUDEvaluator, which executes all CRUD operations.
- The connectors (A) are the drivers used to connect to the databases.
- These are used in the per-database evaluators (B), which are executed by CRUDEvaluator.
- The queries used in the evaluators for all CRUD operations are listed in Queries (C).
- The data reader (E) reads all data from the input files.
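The call hierarchy described above can be sketched roughly as follows. The class names mirror those in the list, but every method, constructor argument, and the `InMemoryBackend` stand-in are assumptions made for illustration; the actual classes in the repository may look quite different.

```python
import time


class InMemoryBackend:
    """Hypothetical stand-in for a per-database evaluator; stores rows in a dict."""

    def __init__(self):
        self.rows = {}

    def create(self, data):
        self.rows.update(data)

    def read(self, data):
        return [self.rows[key] for key in data]

    def update(self, data):
        for key in data:
            self.rows[key] = self.rows[key] + 1

    def delete(self, data):
        for key in data:
            del self.rows[key]


class CRUDEvaluator:
    """Times each CRUD operation once on a single dataset."""

    OPERATIONS = ("create", "read", "update", "delete")

    def __init__(self, backend):
        self.backend = backend

    def run(self, data):
        timings = {}
        for op in self.OPERATIONS:
            start = time.perf_counter()
            getattr(self.backend, op)(data)
            timings[op] = time.perf_counter() - start
        return timings


class DBMSEvaluator:
    """Iterates over all datasets for one database."""

    def __init__(self, backend_factory, datasets):
        self.backend_factory = backend_factory
        self.datasets = datasets

    def run(self):
        return {
            name: CRUDEvaluator(self.backend_factory()).run(data)
            for name, data in self.datasets.items()
        }


class Evaluator:
    """Repeats the evaluation for every database over a fixed number of iterations."""

    def __init__(self, backend_factories, datasets, iterations):
        self.backend_factories = backend_factories
        self.datasets = datasets
        self.iterations = iterations

    def run(self):
        results = []
        for iteration in range(self.iterations):
            for db_name, factory in self.backend_factories.items():
                per_dataset = DBMSEvaluator(factory, self.datasets).run()
                for dataset, timings in per_dataset.items():
                    for op, seconds in timings.items():
                        results.append({
                            "iteration": iteration, "database": db_name,
                            "dataset": dataset, "operation": op, "seconds": seconds,
                        })
        return results


if __name__ == "__main__":
    datasets = {"tiny": {0: 1, 1: 2}, "small": {i: i for i in range(100)}}
    evaluator = Evaluator({"in-memory": InMemoryBackend}, datasets, iterations=2)
    for record in evaluator.run():
        print(record)
```

Creating a fresh backend per dataset keeps the runs independent of each other; whether the real pipeline resets state this way is not stated in the list above.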