This repository contains an implementation of a variational graph auto-encoder (VGAE) coupled with a feedforward neural network (FNN) for link prediction.
This work is inspired by the model presented in this paper. The first goal is to reproduce the results reported in the paper; the second is to study how well the model works on different types of datasets.
Documentation is currently very limited; more will hopefully be added soon.
This model is implemented in PyTorch with PyTorch Geometric.
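To give an idea of how the two parts fit together, here is a minimal dependency-free sketch of the standard VGAE building blocks (Kipf and Welling formulation): the reparameterization trick, the inner-product edge decoder and the KL term, plus a tiny FNN that scores a node pair. It is not the code in this repository. The per-node `mu`/`logvar` values would normally come from a GCN encoder, and feeding the FNN with the concatenation of the two node embeddings (`edge_features`) is an assumption made only for illustration; all function names here are hypothetical.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def reparameterize(mu, logvar, rng):
    # z = mu + sigma * eps, with sigma = exp(0.5 * logvar) and eps ~ N(0, 1)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def inner_product_decoder(z_i, z_j):
    # VGAE decoder: probability of edge (i, j) = sigmoid(z_i . z_j)
    return sigmoid(sum(a * b for a, b in zip(z_i, z_j)))


def kl_divergence(mu, logvar):
    # KL( N(mu, sigma^2) || N(0, 1) ) for one node, summed over latent dimensions
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def edge_features(z_i, z_j):
    # Hypothetical FNN input: concatenation of the two node embeddings
    return list(z_i) + list(z_j)


def fnn_forward(x, w1, b1, w2, b2):
    # Hypothetical FNN: one ReLU hidden layer, sigmoid output = link probability
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


if __name__ == "__main__":
    rng = random.Random(0)
    mu = {0: [0.5, -0.2], 1: [0.4, -0.1]}
    logvar = {0: [-1.0, -1.0], 1: [-1.0, -1.0]}
    z = {node: reparameterize(mu[node], logvar[node], rng) for node in mu}
    x = edge_features(z[0], z[1])
    w1 = [[0.1, 0.2, 0.3, 0.4], [-0.1, 0.0, 0.1, 0.2]]
    print("VGAE decoder:", inner_product_decoder(z[0], z[1]))
    print("FNN score:", fnn_forward(x, w1, [0.0, 0.0], [1.0, -1.0], 0.0))
```

In a real model the FNN weights and the encoder would be trained with PyTorch; the point here is only the data flow from node embeddings to a pairwise link score.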
From a terminal, move into an empty folder and clone this repository:
git clone https://github.com/TommyGiak/VGAE_FNN.git
Requirements:
- Python
- PyTorch
- PyTorch Geometric (torch_geometric)
- matplotlib (quite common)
- numpy (even more common)
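Before running the main file, it can help to check that the required packages are importable. The following sketch uses only the standard library (`importlib.util.find_spec`); the import names in `REQUIRED` are assumed to be the usual ones for the packages listed above, and `check_requirements` is a hypothetical helper, not part of this repository.

```python
import importlib.util

# Assumed import names of the packages listed above
REQUIRED = ["torch", "torch_geometric", "matplotlib", "numpy"]


def check_requirements(modules=REQUIRED):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:16s} {'found' if ok else 'MISSING'}")
```

`find_spec` only locates a module without importing it, so the check is fast even for heavy packages like PyTorch.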
Move into the cloned folder from the terminal and run the main file:
python main.py
The dataset can be chosen in the main file. The default numbers of training epochs for the VGAE and the FNN may not be enough; they can also be changed in the main file.
So far I have tried several datasets to understand how well this model generalizes to different systems.
The best results are obtained on the biological protein-protein interaction datasets. On the citation datasets and on the Twitch dataset the results are only decent: the link predictions are generally not too bad, but other models outperform these scripts.
I performed a 'long' training run in Google Colab with the GPU runtime on the HPRD (Human Protein Reference Database) dataset, which is also the largest one. For more details on this dataset, see the paper cited above.
The training involved 100k epochs for the VGAE (most of which are not really useful) and 300k epochs for the FNN, taking 495 s for the VGAE and 825 s for the FNN. With the parameters used in the paper, the results are:
- accuracy: 0.9731
- sensitivity: 0.9729
- specificity: 0.9741
- precision: 0.9918
- f-score: 0.9822
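For reference, these metrics follow from the confusion-matrix counts on the test edges (true/false positives and negatives). The sketch below uses the standard definitions; it assumes, without having checked the repository code, that "f-score" means the F1 score, i.e. the harmonic mean of precision and sensitivity. `binary_metrics` and `f_score` are hypothetical helper names, and the counts in the usage example are made up.

```python
def f_score(precision, sensitivity):
    # F1 score: harmonic mean of precision and sensitivity (recall)
    return 2 * precision * sensitivity / (precision + sensitivity)


def binary_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score(precision, sensitivity),
    }


if __name__ == "__main__":
    # Made-up counts, only to show the computation
    print(binary_metrics(tp=90, tn=80, fp=20, fn=10))
    # Harmonic mean of the precision and sensitivity reported above
    print(f_score(0.9918, 0.9729))
```

As a consistency check, the harmonic mean of the reported precision (0.9918) and sensitivity (0.9729) is about 0.982, which matches the reported f-score up to rounding.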