
Semi-supervised learning and inference for sparsely labelled graphs



northeastern-datalab/factorized-graphs


Factorized Graph Representations for Semi-supervised Learning from Sparse Labels

SIGMOD 2020 paper · Python 3.6 · Apache License 2.0

This library provides various Python modules and scripts to perform semi-supervised learning with heterophily (SSLH). It includes methods to perform label propagation with linearized belief propagation and to estimate class-to-class compatibilities from very sparsely labeled graphs, extending an earlier release of SSLH and prior ideas. Also included are code and experimental traces to reproduce the experiments from our SIGMOD 2020 paper: Factorized Graph Representations for Semi-supervised Learning from Sparse Labels
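As a rough illustration of what linearized belief propagation computes, the pure-Python sketch below iterates the update F ← X + ε·W·F·Hc. Here W is the adjacency matrix, X holds the centered beliefs of the labeled nodes, and Hc is the compatibility matrix H centered around 1/k.

It is a toy built on several assumptions:
- adjacency lists instead of sparse matrices;
- no echo cancellation;
- a fixed ε and a fixed iteration count;
- hypothetical function names (linbp, center).

It is not the interface of sslh/inference.py.

```python
# Hypothetical sketch of linearized belief propagation (LinBP); not sslh's API.
# Assumptions: undirected graph as adjacency lists, k classes, no echo
# cancellation, fixed scaling eps and fixed number of iterations.


def center(H):
    """Return H - 1/k, i.e. the compatibility matrix centered around uniform."""
    k = len(H)
    return [[h - 1.0 / k for h in row] for row in H]


def linbp(adj, seeds, H, k, eps=0.1, iterations=20):
    """Label all nodes by iterating F <- X + eps * W F Hc.

    adj:   dict mapping every node to the list of its neighbors
    seeds: dict mapping labeled nodes to their class index
    H:     k x k compatibility matrix (assumed: row i is the class
           distribution of the neighbors of a class-i node)
    """
    Hc = center(H)
    # Explicit beliefs X, centered: 1 - 1/k on the seed class, -1/k elsewhere,
    # and all zeros for unlabeled nodes.
    X = {v: [0.0] * k for v in adj}
    for v, c in seeds.items():
        X[v] = [(1.0 if j == c else 0.0) - 1.0 / k for j in range(k)]
    F = {v: row[:] for v, row in X.items()}
    for _ in range(iterations):
        new_F = {}
        for v in adj:
            # Row v of W F: sum of the current beliefs of v's neighbors.
            agg = [0.0] * k
            for u in adj[v]:
                for j in range(k):
                    agg[j] += F[u][j]
            # Row v of (W F) Hc, scaled by eps and added to the prior X.
            new_F[v] = [X[v][j] + eps * sum(agg[i] * Hc[i][j] for i in range(k))
                        for j in range(k)]
        F = new_F
    # Predicted class: index of the largest final belief.
    return {v: max(range(k), key=lambda j: F[v][j]) for v in adj}


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    heterophily = [[0.1, 0.9], [0.9, 0.1]]
    print(linbp(path, {0: 0}, heterophily, k=2))  # {0: 0, 1: 1, 2: 0, 3: 1}
```

Take the path 0-1-2-3 with a heterophilous H and only node 0 labeled: the predicted labels alternate along the path. With a homophilous H, every node would instead inherit class 0. This linear iteration converges if ε·ρ(W)·ρ(Hc) < 1, so ε has to be chosen small enough for the graph at hand.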

Overview of SSLH: Given a partially labeled graph and a class-to-class compatibility matrix, linearized belief propagation (LinBP) performs a generalized form of label propagation to label the remaining nodes. Distant compatibility estimation (DCE) removes the need to supply the compatibility matrix as input: it estimates the matrix from the sparse labels, after which the remaining nodes can be labeled in the same way. For a quick overview of the approach, please also see the video presented at SIGMOD 2020:

Watch the video
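Why look at distant neighbors? When labels are sparse, few labeled nodes have a labeled neighbor, but many labeled pairs are still connected by longer paths. The toy sketch below shows how such distant statistics can pin down the compatibility matrix.

It rests on these assumptions:
- k = 2 and a symmetric H = [[1-h, h], [h, 1-h]].
- It counts the classes at both ends of walks of length ℓ between labeled nodes, i.e. the matrix XᵀWˡX. WˡX is computed by repeatedly multiplying W into X instead of forming Wˡ.
- It counts plain walks, including walks that return to their start node.
- It picks h by grid search so that Hˡ matches the row-normalized statistics for ℓ = 1..3.

The function names, the geometric weighting, and the grid search are illustrative assumptions, not the implementation in sslh/estimation.py.

```python
# Hypothetical toy sketch of estimating compatibilities from distant paths;
# not the API of sslh/estimation.py. Assumes k = 2 and H = [[1-h, h], [h, 1-h]].


def path_statistics(adj, seeds, k, length):
    """Row-normalized X^T (W^length X): classes at both ends of labeled walks.

    W^length X is built by multiplying W into X `length` times, so the dense
    matrix W^length is never formed. Rows without any counts are None.
    """
    # Y starts as X: one-hot rows for labeled nodes, zero rows otherwise.
    Y = {v: [0.0] * k for v in adj}
    for v, c in seeds.items():
        Y[v][c] = 1.0
    for _ in range(length):
        # (W Y)[v] = sum of Y over v's neighbors.
        Y = {v: [sum(Y[u][j] for u in adj[v]) for j in range(k)] for v in adj}
    # counts[c][j]: number of walks from a class-c seed to a class-j seed.
    counts = [[0.0] * k for _ in range(k)]
    for v, c in seeds.items():
        for j in range(k):
            counts[c][j] += Y[v][j]
    stats = []
    for row in counts:
        total = sum(row)
        stats.append([x / total for x in row] if total > 0 else None)
    return stats


def estimate_h(adj, seeds, max_length=3, weight=0.5, grid=1000):
    """Grid-search h in [0, 1] so that H^length fits the path statistics."""
    stats = [path_statistics(adj, seeds, 2, n) for n in range(1, max_length + 1)]
    best_h, best_err = 0.0, float("inf")
    for step in range(grid + 1):
        h = step / grid
        err = 0.0
        for length, P in enumerate(stats, start=1):
            # Off-diagonal entry of H^length (eigenvalues of H: 1 and 1 - 2h).
            off = (1 - (1 - 2 * h) ** length) / 2
            predicted = [[1 - off, off], [off, 1 - off]]
            for i in range(2):
                if P[i] is None:  # no labeled walks start in class i
                    continue
                err += weight ** (length - 1) * sum(
                    (predicted[i][j] - P[i][j]) ** 2 for j in range(2))
        if err < best_err:
            best_h, best_err = h, err
    return best_h


if __name__ == "__main__":
    cycle = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
    seeds = {0: 0, 3: 1}  # the two labeled nodes are not adjacent
    print(path_statistics(cycle, seeds, 2, 1))  # [None, None]
    print(estimate_h(cycle, seeds))             # 1.0
```

In this example only nodes 0 and 3 of an 8-cycle are labeled, so no labeled node has a labeled neighbor and the length-1 statistics are empty. The length-2 and length-3 statistics alone still recover the strongly heterophilous h = 1.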

Dependencies

Dependencies are listed in requirements.txt and can be installed with pip install -r requirements.txt.

Project structure

  • experiments_sigmod20/ folder containing scripts and notebooks for recreating figures from the paper
    • datacache/ folder containing traces from experiments saved as CSV
    • figs/ folder in which code places figures from experiments
    • realData/ place real data sets into this folder before running experiments
    • ... further modules that run the individual experiments
    • Figures_realdata_sigmod20.ipynb Notebook that plots all figures for experiments on 8 real data sets
    • Figures_syntheticdata_sigmod20.ipynb Start here: Notebook that plots all other figures in the paper
  • sslh/ folder containing modules with main functions
    • estimation.py module containing main functions for parameter estimation
    • fileInteraction.py module containing functions for loading and saving experimental results
    • graphGenerator.py module containing synthetic graph generator with planted graph properties
    • inference.py module containing main propagation methods for linearized belief propagation
    • utils.py module containing various helper functions
    • visualize.py module containing helper functions to plot figures
  • test_sslh/ folder with unit tests for modules and functions in sslh/
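To give an idea of what a synthetic graph with a planted compatibility matrix can look like, here is a small hypothetical generator. The function planted_graph and its parameters are assumptions and do not mirror the interface of sslh/graphGenerator.py.

It works in two steps:
1. Each node draws a class from the distribution alpha.
2. Each edge starts at a uniformly random node u and ends at a random node of a class drawn from row H[labels[u]].

```python
import random


def planted_graph(n, m, H, alpha, seed=0, max_attempts_per_edge=100):
    """Hypothetical generator: return (labels, edges) with m distinct edges.

    labels[v] is drawn from the class distribution alpha. Each edge starts at a
    uniformly random node u and ends at a random node whose class is drawn from
    row H[labels[u]] of the compatibility matrix.
    """
    if m > n * (n - 1) // 2:
        raise ValueError("more edges requested than a simple graph can hold")
    rng = random.Random(seed)
    k = len(alpha)
    labels = rng.choices(range(k), weights=alpha, k=n)
    members = [[v for v in range(n) if labels[v] == c] for c in range(k)]
    edges = set()
    for _ in range(max_attempts_per_edge * m):
        if len(edges) == m:
            break
        u = rng.randrange(n)
        c = rng.choices(range(k), weights=H[labels[u]])[0]
        if not members[c]:
            continue  # no node of the requested class exists
        v = rng.choice(members[c])
        if u != v:
            edges.add((min(u, v), max(u, v)))  # undirected, no self-loops
    if len(edges) < m:
        raise RuntimeError("could not place all edges; check H and alpha")
    return labels, sorted(edges)
```

With H = [[0, 1], [1, 0]], every generated edge connects nodes of different classes. Such graphs are handy for sanity-checking propagation and estimation code. This sketch does not control the degree distribution.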

Real data sets

A copy of the 8 real data sets used in our experiments is available on Google Drive as 16 CSV files totaling 1.2 GB. To run the experiments, place these files into the folder experiments_sigmod20/realData/, then run the respective scripts in experiments_sigmod20/.

Usage

  1. For examples on the usage of the various methods, please see the test_sslh directory in the source tree.
  2. /reproducibility.md describes in detail how to reproduce the experimental results reported in the paper (as submitted to the ACM SIGMOD 2021 reproducibility evaluation).

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Citation

If you use this code in your work, please cite:

@inproceedings{DBLP:conf/sigmod/PLG20,
  author    = {Krishna Kumar P. and Paul Langton and Wolfgang Gatterbauer},
  title     = {Factorized Graph Representations for Semi-Supervised Learning from Sparse Data},
  booktitle = {International Conference on Management of Data (SIGMOD)},
  pages     = {1383--1398},
  publisher = {{ACM}},
  year      = {2020},
  url       = {https://doi.org/10.1145/3318464.3380577},
}

Contributors

For clarifications, comments, or suggestions on the main methods in sslh/, please create an issue or contact Wolfgang. For questions on the scripts in experiments_sigmod20/ and on the reproducibility of the experiments, please contact Paul and Krishna.