Thesis-GNN-Rec-2025

🕵️ Introduction

This repository contains code and resources for my final thesis titled "Application of Graph Neural Networks to Music Recommender Systems." Recommender Systems (RSs) play a crucial role in filtering vast amounts of data to deliver personalized content. Music Recommender Systems (MuRSs) enhance user experience by predicting preferences, helping users navigate extensive music libraries. Recent advancements in Graph Neural Networks (GNNs) have set new standards in RSs, but their evaluation remains inconsistent across datasets and splitting strategies. This work applies traditional and GNN-based models to a new music industry dataset, using temporal data splitting for a realistic evaluation. To this end, the evaluation pipeline recently proposed by Malitesta et al. (2024) has been applied and extended to a broad set of models and beyond-accuracy metrics. Code and results are available in this repository.

💾 Dataset

MIDS ... Music Industry Dataset

  • # of customers (users): 58,747
  • # of records (items): 37,370

| dataset | # rows | (users, items) | sparsity | features |
| --- | --- | --- | --- | --- |
| MIDS (filtered) | 17,665,904 | (58,747, 37,370) | 99.1953 % | userID, itemID, timestamp |
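
The reported sparsity follows directly from the counts above; a quick check in Python:

```python
# Reproduce the sparsity reported for the filtered MIDS dataset from the
# user, item, and interaction counts given above.
n_users = 58_747
n_items = 37_370
n_interactions = 17_665_904

# Sparsity: fraction of the user-item matrix without an interaction.
sparsity = 1 - n_interactions / (n_users * n_items)
print(f"{sparsity:.4%}")  # 99.1953%
```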

⚙️ Methodology

The evaluation pipeline comprises the following steps:

  1. Create data splits
  2. Calculate dataset characteristics (classical & topological)
  3. Apply traditional and GNN-based models to each split (for RO & TO split)
  4. Apply explanatory model (linear regression)
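
Step 2 includes distributional characteristics such as the Gini coefficients of the user and item interaction counts ($Gini_U$, $Gini_I$). A minimal sketch of such a computation, assuming the standard Gini formula (the thesis may normalize differently):

```python
def gini(counts):
    """Gini coefficient of non-negative interaction counts:
    0 = perfectly even, values near 1 = highly concentrated."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula for sorted non-negative values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

gini([10, 10, 10, 10])  # even distribution -> 0.0
```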

Distribution of the number of interactions.

All tests have been conducted using RecBole and RecBole-GNN.

💎 Results

📈 Performance

After hyperparameter and epoch tuning, Top-10 recommendation was evaluated for all users on each dataset, using a random order (RO) split (70/10/20) and a temporal order (TO) split with leave-5-out (5/5) for the validation and test sets. The following tables present the mean performance across all datasets for each model, ranked in descending order by NDCG@10. The best values are bolded, while the second-highest values are underlined.
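
The temporal leave-5-out protocol can be sketched as follows; the function name and tuple layout are illustrative, not the repository's code:

```python
# Illustrative leave-5-out temporal split: per user, the 5 most recent
# interactions form the test set and the 5 before those the validation set.
# Assumes users have more than n_valid + n_test interactions; shorter
# histories land entirely in the newer splits.
from collections import defaultdict

def leave_n_out_split(interactions, n_valid=5, n_test=5):
    """interactions: iterable of (user, item, timestamp) tuples."""
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)

    train, valid, test = [], [], []
    for rows in by_user.values():
        rows.sort(key=lambda r: r[2])                 # oldest to newest
        test.extend(rows[-n_test:])                   # newest n_test
        valid.extend(rows[-(n_test + n_valid):-n_test])
        train.extend(rows[:-(n_test + n_valid)])      # everything older
    return train, valid, test
```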

RO (70/10/20)

| Algorithm | Pre | MRR | NDCG | IC | ARP | APLT |
| --- | --- | --- | --- | --- | --- | --- |
| ALS-MF | 0.152966 | 0.329359 | 0.198749 | 0.075893 | 57.758945 | 0.000234 |
| XSimGCL | 0.150149 | 0.329086 | 0.194740 | 0.120137 | 51.662866 | 0.010322 |
| AsymUserkNN | 0.145127 | 0.328943 | 0.190405 | 0.108999 | 74.511185 | 0.015634 |
| SGL | 0.147338 | 0.319677 | 0.190309 | 0.144644 | 44.765344 | 0.011939 |
| BPR | 0.138416 | 0.310411 | 0.178590 | 0.080508 | 73.938899 | 0.001084 |
| LightGCN | 0.134315 | 0.302034 | 0.173993 | 0.115027 | 60.163477 | 0.004301 |
| UltraGCN | 0.133322 | 0.292251 | 0.172359 | 0.084255 | 70.992891 | 0.001501 |
| AsymItemkNN | 0.124047 | 0.193475 | 0.137902 | 0.130814 | 46.564064 | 0.051082 |
| MostPop | 0.039420 | 0.096802 | 0.048425 | 0.001969 | 120.488178 | 0.000000 |
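
NDCG@10, by which the tables are ranked, is the usual binary-relevance form in top-k recommendation evaluation; a minimal sketch (not the repository's implementation):

```python
# Minimal binary-relevance NDCG@k: discounted gain of hits in the
# recommended ranking, normalized by the ideal ranking's gain.
import math

def ndcg_at_k(ranked_items, relevant_items, k=10):
    hits = [1 if item in relevant_items else 0 for item in ranked_items[:k]]
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```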

Boxplots of the performance of test runs with RO.
TO (5/5)

| Algorithm | Pre | MRR | NDCG | IC | ARP | APLT |
| --- | --- | --- | --- | --- | --- | --- |
| ALS-MF | 0.032978 | 0.079593 | 0.065587 | 0.089447 | 71.046930 | 0.000184 |
| UltraGCN | 0.031105 | 0.079218 | 0.061834 | 0.068403 | 96.395474 | 0.000218 |
| XSimGCL | 0.030875 | 0.077371 | 0.061724 | 0.098007 | 71.387480 | 0.006263 |
| AsymUserkNN | 0.030218 | 0.074655 | 0.190405 | 0.105359 | 96.254631 | 0.009973 |

Boxplots of the performance of test runs with TO.

💡 Influence Analysis

For all models, the influence $(\beta_c)$ of different dataset characteristics $(X_c)$ on selected target metrics $(y)$ has been investigated. To this end, a linear regression model was fitted as follows:

$$y=\beta_0 + \beta_c X_c + e$$

where:

  • $X_c \in \{SpaceSize, Shape, Density, Gini_U, Gini_I, AvgDeg_U, AvgDeg_I, AvgClustC_U, AvgClustC_I, Assort_U, Assort_I\}$
  • $y \in \{NDCG@10, IC@10, ARP@10\}$

The model was tested under the following null hypothesis:

$$H_0: \beta_c = 0, \quad H_1: \beta_c \neq 0$$

The values of $\beta_c$ are represented by the bar length, while the $p$-value is indicated by the color.
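
A minimal sketch of such a regression with coefficient $t$-tests, using NumPy and SciPy (the function name and library choice are assumptions; the thesis's estimation code is not shown in this README):

```python
# Ordinary least squares for y = b0 + b_c * X_c + e with a two-sided
# t-test of H0: b_c = 0 for each coefficient.
import numpy as np
from scipy import stats

def ols_with_pvalues(X, y):
    """X: (n, p) matrix of characteristics, y: (n,) target metric."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), np.asarray(X)])  # intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    dof = n - X1.shape[1]
    sigma2 = resid @ resid / dof                       # residual variance
    cov = sigma2 * np.linalg.inv(X1.T @ X1)            # est. covariance
    t_stats = beta / np.sqrt(np.diag(cov))
    p_values = 2 * stats.t.sf(np.abs(t_stats), dof)    # two-sided p-values
    return beta, p_values
```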

Influence of dataset characteristics on NDCG@10 for XSimGCL.

Furthermore, for each run, the number of interactions, average clustering coefficients, assortativity, and the average popularity of the interacted items have been recorded for the $10$ users who received the best recommendations and the $10$ who received the worst.

Characteristics of the $10$ users who received the best and worst recommendations for XSimGCL.

🔍 Project Structure

  • assets: Stores material for the README.md files.

  • data: Location for the recommendation dataset.

    • mids-100000: The first 100,000 rows of the MIDS dataset.
    • mids-raw: The raw dataset to be processed in 1-DataPreparation to generate splits.
    • mids-splits: Storage for data splits used in the evaluation pipeline (output of DataPreparation.ipynb).
  • src: Contains all steps performed as described in the thesis (see README.md)

    • DataPreparation: Creates dataset splits and calculates traditional & topological metrics.
    • HyperParameterTuning: Tunes and evaluates all model hyperparameters on the mids-100000-1 split.
    • EpochTuning: Determines the optimal number of epochs on 10 randomly drawn datasets.
    • TestRuns: Conducts tests using random order and temporal order splits.
    • Evaluation: Builds evaluation files, performs evaluation, and conducts significance tests.
    • AdditionalMaterial: Contains additional plots referenced in the thesis.
    • assets: Stores generated plots and statistics.
    • config: Stores config_files, constants such as Colors and Paths, and methods used in many other directories.
    • README.md: Further information about the source code itself.
  • test:

    • hello.py: says hello
  • .gitignore: Specifies files to be excluded from the repository.

  • .python-version: Defines the explicit Python version used (for uv).

  • pyproject.toml: Lists dependencies required to run this project (for uv).

  • requirements.txt: Lists dependencies required to run this project (for pip).

  • quick_start.py: Provides a quick-start interface to access RecBole and RecBole-GNN for running models.

  • quick_start.yaml: Configuration file for the quick-start setup in quick_start.py.

  • QuickEvaluation.ipynb: Provides a quick-start interface to access the results.

  • uv.lock: Contains locked versions of all dependencies in this project (for uv sync --frozen).

Quick Start

After a successful setup, quick_start.py and QuickEvaluation.ipynb provide a quick way to use RecBole and RecBole-GNN and to load the results of this work.

via notebook

The QuickEvaluation.ipynb notebook offers a quick view into the results of this work: it loads the final evaluation dataset, creates diverse tables and plots, and performs the statistical analysis.

via script

The quick_start.py script allows any model provided through RecBole and RecBole-GNN to be run on the datasets. All configurations, including filtering, train/test splitting, and other settings, can be adjusted in quick_start.yaml.
For more details, refer to the RecBole configuration introduction.

The best model hyperparameter settings are listed at the bottom of the quick_start.yaml file.
Additionally, specific configuration files can be accessed through the config directory.

In quick_start.py, you can modify the following lines to select the desired dataset and models:

```python
model = '<Model>'
config_files = str(CONFIG_DIRECTORY.joinpath('<config_file>.yaml'))
dataset = '<Dataset>'
config_dict = {
    'data_path': PROJECT_DIRECTORY.joinpath('<Path_to_Dataset>')
}
```

Possible values:

| Placeholder | Possible values |
| --- | --- |
| Model | 'AsymKNN', 'LightGCN', 'UltraGCN', 'ALS', 'BPR', 'SGL', 'XSimGCL', 'Pop' |
| config_file | 'quick_start', 'user_asym', 'item_asym', 'lightgcn', 'ultragcn', 'als', 'bpr', 'sgl', 'xsimgcl', 'mostpop' |
| Dataset | 'mids-100000', 'mids-raw', 'mids-splits-i' |
| Path_to_Dataset | '', 'mids-splits' |

For dataset="mids-splits-i", where $i \in \{1, \dots, 176\}$, the splits must be created, and the correct path to these datasets must be specified:

Path_to_Dataset == 'mids-splits'

For Model="AsymKNN", one of the following must be set in quick_start.yaml:

knn_method: ['item', 'user']

For config_file="quick_start", adjust the config_files path:

config_files = str(PROJECT_DIRECTORY.joinpath('quick_start.yaml'))

Setup

Setup with uv (recommended)

  1. Install uv (https://github.com/astral-sh/uv).

  2. Ensure the proper Python version (3.12.x) is active; if not:

     uv python install 3.12.5
     uv python pin 3.12.5
    
  3. Install Packages

    uv sync --frozen --extra build
    uv sync --frozen --extra build --extra compile
    

Setup with uv pip

  1. Install uv (https://github.com/astral-sh/uv)

  2. Ensure the proper Python version (3.12.x) is active; if not:

     uv python install 3.12.5
     uv python pin 3.12.5
    
  3. Create virtual environment:

     uv venv
    
  4. Activate virtual environment:

  • on Mac / Linux:

      source .venv/bin/activate
    
  • on Windows:

      .venv\Scripts\activate
    
  5. Install Packages

     uv pip install -r requirements.txt
    

Setup with pip

  1. Use python version 3.12.5

  2. Create virtual environment in project directory:

    python3 -m venv .venv
    
  3. Activate virtual environment:

  • on Mac / Linux:

      source .venv/bin/activate
    
  • on Windows:

      .venv\Scripts\activate
    
  4. Upgrade pip

    pip3 install --upgrade pip
    
  5. Install Packages

     pip install -r requirements.txt
    

Download and store the Dataset

Only necessary for creating the data splits for the evaluation.

The dataset can be found on Google Drive. Store the contained files as follows:

  • mids_RAW_ANONYMIZED.txt -> data/mids-raw
