
junhaobearxiong/graph_independence_test


Graph Independence Testing

This repository contains the code for running the experiments in the manuscript: Xiong, Junhao, et al. “Graph Independence Testing.” arXiv preprint arXiv:1906.03661 (2019).

The manuscript is currently under major revision, and so is the code, so you may not find the exact code to reproduce the figures in the manuscript. For more up-to-date results, you may consult the slides here.

Files

The core functionality is in core.py, which contains functions and the necessary utilities to compute the test statistic, p-value and power of naive Pearson, gcorr (graph correlation) and gcorrDC (a DC-SBM version of gcorr). Note that gcorr is slightly modified from the test statistic in the manuscript so that it is an unbiased estimate of the actual correlation (rather than differing from it by a constant under an SBM).
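For readers unfamiliar with the baseline, the naive Pearson test can be sketched as the Pearson correlation between the edge weights of two graphs on a shared vertex set, with a p-value obtained by randomly relabelling the vertices of one graph. This is a minimal sketch assuming undirected graphs without self-loops; `naive_pearson` and `vertex_permutation_pvalue` are hypothetical names, and core.py may compute the p-value differently.

```python
import numpy as np


# Hypothetical helpers for illustration; not the functions defined in core.py.
def naive_pearson(A, B):
    """Pearson correlation between the upper-triangular (off-diagonal)
    entries of two undirected adjacency matrices on the same vertex set."""
    iu = np.triu_indices(A.shape[0], k=1)
    return np.corrcoef(A[iu], B[iu])[0, 1]


def vertex_permutation_pvalue(A, B, n_perm=500, seed=None):
    """One-sided p-value for positive dependence. Randomly relabelling the
    vertices of B breaks its alignment with A while keeping B's own structure.
    Only a sound null when vertices are exchangeable (e.g. Erdos-Renyi)."""
    rng = np.random.default_rng(seed)
    observed = naive_pearson(A, B)
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(A.shape[0])
        null[i] = naive_pearson(A, B[np.ix_(perm, perm)])
    # Add-one correction so the p-value is never exactly zero.
    return observed, (1 + np.sum(null >= observed)) / (1 + n_perm)


# Usage: B is A with a random 20% of vertex pairs flipped, so the two are dependent.
rng = np.random.default_rng(0)
n = 60
upper = np.triu(rng.random((n, n)) < 0.3, k=1)
A = (upper | upper.T).astype(float)
flip = np.triu(rng.random((n, n)) < 0.2, k=1)
flip = flip | flip.T
B = np.where(flip, 1 - A, A)
stat, pval = vertex_permutation_pvalue(A, B, seed=1)
```

Relabelling vertices preserves each graph's own degree and community structure, but when the two graphs share community structure (as in SBMs) the pooled correlation can be far from zero even for independent graphs, so treat this only as a baseline.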

Simulations

simulations.py contains functions to simulate $\rho$-correlated Bernoulli SBMs, $\rho$-correlated Bernoulli DC-SBMs and $\rho$-correlated Gaussian SBMs (based on the graspologic implementation, but more general).
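A $\rho$-correlated Bernoulli SBM pair can be sketched by sampling A edge by edge and then sampling each $B_{ij}$ conditionally on $A_{ij}$, with the two conditional probabilities chosen so that the marginals are the block probabilities and the edge-wise correlation is $\rho$, i.e. $P(A_{ij}=1, B_{ij}=1) = p_{ij} q_{ij} + \rho\sqrt{p_{ij}(1-p_{ij})q_{ij}(1-q_{ij})}$. This is a minimal sketch under stated assumptions (undirected graphs, no self-loops, block probabilities strictly between 0 and 1, one $\rho$ for all vertex pairs); `sample_rho_sbm` is a hypothetical name, not the function in simulations.py.

```python
import numpy as np


# Hypothetical sampler for illustration; not the implementation in simulations.py.
def sample_rho_sbm(sizes, P, Q, rho, seed=None):
    """Sample undirected rho-correlated Bernoulli SBMs A and B.

    sizes: community sizes; P, Q: K x K block probability matrices of A and B;
    rho: Pearson correlation between A_ij and B_ij for every vertex pair."""
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if np.any((P <= 0) | (P >= 1)) or np.any((Q <= 0) | (Q >= 1)):
        raise ValueError("block probabilities must lie strictly between 0 and 1")
    labels = np.repeat(np.arange(len(sizes)), sizes)
    p = P[labels][:, labels]  # marginal edge probability of each pair in A
    q = Q[labels][:, labels]  # marginal edge probability of each pair in B
    # P(B=1 | A=1) and P(B=1 | A=0) giving marginals (p, q) and correlation rho.
    q_given_1 = q + rho * np.sqrt(q * (1 - q) * (1 - p) / p)
    q_given_0 = q - rho * np.sqrt(p * q * (1 - q) / (1 - p))
    tol = 1e-12
    if (q_given_1.min() < -tol or q_given_1.max() > 1 + tol
            or q_given_0.min() < -tol or q_given_0.max() > 1 + tol):
        raise ValueError("rho is not attainable for these block probabilities")
    q_given_1 = np.clip(q_given_1, 0, 1)
    q_given_0 = np.clip(q_given_0, 0, 1)
    n = labels.size
    iu = np.triu_indices(n, k=1)
    a = rng.random(iu[0].size) < p[iu]
    b = rng.random(iu[0].size) < np.where(a, q_given_1[iu], q_given_0[iu])
    A = np.zeros((n, n))
    B = np.zeros((n, n))
    A[iu] = a
    B[iu] = b
    return A + A.T, B + B.T, labels
```

When P equals Q, the conditionals reduce to $p + \rho(1-p)$ and $p(1-\rho)$, which appears to match the equal-marginal construction used by graspologic's correlated SBM sampler. With unequal marginals not every $\rho$ is attainable, hence the `ValueError`.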

The following files correspond (roughly) to figures in the manuscript. Results can be viewed here.

  1. experiments/sim_teststats.py and plotting/plot_sim_test_statistic.py are used to generate Figure 1.
  2. experiments/sim_power.py and plotting/plot_sim_power.py are used to generate Figures 3 and 4.
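The power of a test is the fraction of Monte Carlo replicates in which it rejects at level $\alpha$. A minimal sketch of that loop, assuming $\rho$-correlated Erdős–Rényi graphs with a shared edge probability and the naive Pearson statistic with a Fisher z approximation for the p-value; this is a simplification, experiments/sim_power.py and core.py may use different models and null distributions, and all function names here are hypothetical.

```python
import math

import numpy as np


# Hypothetical helpers for illustration; not the code in experiments/sim_power.py.
def sample_corr_er(n, p, rho, rng):
    """Upper-triangular edge vectors of two undirected rho-correlated
    Erdos-Renyi graphs with the same edge probability p."""
    m = n * (n - 1) // 2
    a = rng.random(m) < p
    b = rng.random(m) < np.where(a, p + rho * (1 - p), p * (1 - rho))
    return a.astype(float), b.astype(float)


def fisher_z_pvalue(r, m):
    """Two-sided p-value for zero correlation from m pairs, using the
    Fisher z approximation (reasonable for many edges in homogeneous graphs)."""
    z = math.atanh(r) * math.sqrt(m - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def estimate_power(n, p, rho, alpha=0.05, reps=200, seed=0):
    """Fraction of simulated graph pairs for which the test rejects at alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        a, b = sample_corr_er(n, p, rho, rng)
        r = np.corrcoef(a, b)[0, 1]
        rejections += fisher_z_pvalue(r, a.size) < alpha
    return rejections / reps
```

Running it with $\rho = 0$ estimates the type I error rather than the power, which is a useful sanity check for any new test statistic.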

Real data experiments

This directory currently contains the code to run experiments on the following datasets:

  1. mouse: a dataset containing connectomes of 4 different species of mice. See some results here.
  2. timeseries: a dataset containing the connectome of a single subject recorded at many time points.
  3. cpac200: a dataset with connectomes from different subjects.
  4. enron: a dataset where each graph represents email correspondence between subjects in a network.

To run experiments on one of these datasets, the standard workflow is as follows:

  1. Preprocess the raw dataset into a numpy.array with the following format: [# graphs, # vertices, # vertices]. You may need to write some code for this, but it should be straightforward using the functions available in data_utils/.
  2. (optional) Apply a transformation to the graphs using experiments/real_transfrom_data.py
  3. (optional) Estimate community assignments of the graphs using experiments/real_community_estimation.py, if the test statistic and p-value method you are using require community assignments to be given.
  4. Run experiments/real_teststats_pval.py with the appropriate command-line arguments.
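Step 1 asks for an array of shape [# graphs, # vertices, # vertices]. A minimal sketch of that conversion from per-graph weighted edge lists, assuming 0-based vertex ids shared across all graphs, undirected edges and no self-loops; `edgelists_to_array` and `check_graph_array` are hypothetical helpers, not part of data_utils/.

```python
import numpy as np


# Hypothetical helpers for illustration; not functions from data_utils/.
def edgelists_to_array(edgelists, n_vertices):
    """Stack undirected weighted edge lists into an array of shape
    [# graphs, # vertices, # vertices]. Each edge list holds (u, v, weight)
    tuples with 0-based vertex ids; repeated edges have their weights summed."""
    graphs = np.zeros((len(edgelists), n_vertices, n_vertices))
    for g, edges in enumerate(edgelists):
        for u, v, w in edges:
            if u == v:
                continue  # assumption: self-loops are dropped
            graphs[g, u, v] += w
            graphs[g, v, u] += w
    return graphs


def check_graph_array(graphs, undirected=True):
    """Basic sanity checks on the expected input format."""
    if graphs.ndim != 3 or graphs.shape[1] != graphs.shape[2]:
        raise ValueError("expected shape [# graphs, # vertices, # vertices]")
    if undirected and not np.allclose(graphs, graphs.transpose(0, 2, 1)):
        raise ValueError("expected symmetric adjacency matrices")


# Usage: two tiny graphs on 3 vertices; the self-loop (2, 2) is dropped.
edgelists = [[(0, 1, 1.0), (1, 2, 2.0)], [(0, 2, 1.0), (2, 2, 5.0)]]
graphs = edgelists_to_array(edgelists, n_vertices=3)
check_graph_array(graphs)
```

The resulting array can be saved with `np.save`; check the command-line arguments of experiments/real_teststats_pval.py for how it expects to load its input.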

Current limitations

Currently, simulation results look good, but the main problem is that we seem to have a large type I error inflation on the real data: the test almost always rejects the null, with very low p-values across the board, even when we don’t think there should be actual dependence. One proposed fix is to use a DC-SBM-based test, which seems to work in simulation when the generating models are DC-SBMs, but on real data it still doesn’t seem to decrease the test statistic or produce more reasonable p-values.
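As an illustration of how type I error inflation can arise (not a diagnosis of the real-data results), two graphs drawn independently from the same SBM have a clearly positive pooled edge correlation, because their edge probabilities vary together across blocks; centering edges within each block pair removes that shared structure. The sketch assumes two communities and Bernoulli edges; the helper names are hypothetical, and the block-centered statistic is a simplified stand-in, not gcorr or gcorrDC.

```python
import numpy as np


# Hypothetical helpers for illustration only.
def sample_sbm(labels, P, rng):
    """Undirected Bernoulli SBM adjacency matrix without self-loops."""
    n = labels.size
    probs = np.asarray(P, dtype=float)[labels][:, labels]
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(float)


def pooled_and_blockwise_corr(A, B, labels):
    """Edge-wise Pearson correlation of two graphs, (1) pooled over all vertex
    pairs and (2) after centering each graph's edges within every block pair.
    The block-centered version is a simplified stand-in, not gcorr."""
    iu = np.triu_indices(labels.size, k=1)
    a, b = A[iu], B[iu]
    K = labels.max() + 1
    li, lj = labels[iu[0]], labels[iu[1]]
    blocks = np.minimum(li, lj) * K + np.maximum(li, lj)  # unordered block pair id
    a_c, b_c = a.copy(), b.copy()
    for k in np.unique(blocks):
        mask = blocks == k
        a_c[mask] -= a[mask].mean()
        b_c[mask] -= b[mask].mean()
    pooled = np.corrcoef(a, b)[0, 1]
    blockwise = (a_c @ b_c) / np.sqrt((a_c @ a_c) * (b_c @ b_c))
    return pooled, blockwise


# Usage: A and B are drawn independently but share the same block structure.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
P = [[0.7, 0.1], [0.1, 0.7]]
A = sample_sbm(labels, P, rng)
B = sample_sbm(labels, P, rng)
pooled, blockwise = pooled_and_blockwise_corr(A, B, labels)
```

Structure that the null model does not account for (for example, degree heterogeneity ignored by an SBM-based statistic) could plausibly play a similar role in real data, which is the reasoning behind trying a DC-SBM-based test.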

Also, the test statistics seem to reflect meaningful differences in some datasets (e.g. mouse), but not in others (e.g. timeseries, cpac200). It is unclear whether this is because there is no signal in those datasets, because the test is not powerful enough to detect it, or because of some preprocessing choices (e.g. choosing appropriate trimming values for DC-SBM).

Some attempts to address the aforementioned problems can be seen here.

Setup

To run code in this repository, first install Python 3.6. You can use pyenv to manage the Python versions on your machine.

Next, set up the local environment in the ./venv directory:

python -m venv ./venv

To activate the environment, type:

. venv/bin/activate

Then, install the requirements in the local environment:

pip install --upgrade pip
pip install -r requirements.txt

About

Independence testing between a pair of graphs.
