Supplementary code and data for the paper "Role Detection in Bicycle-Sharing Networks Using Multilayer Stochastic Block Models"

jcarlen/tdsbm_supplementary_material

Description of the contents of tdsbm_supplementary_material:

The material in this folder is hosted at: https://github.com/jcarlen/tdsbm_supplementary_material

  1. tdsbm_data_plots.R
  • This is the main replication script of the paper. It contains all code to clean the data as described in the paper, estimate model parameters, and create all plots in the paper. (It includes commented-out code that calls the TDMM-SBM estimation (in Python) directly from R; the same Python code can also be accessed directly in the mixed_model_implementation_python folder.)

  • At the beginning of the script, set your working directory to the path of your tdsbm_supplementary_material folder by modifying the line:

    setwd("~/Documents/tdsbm_supplementary_material")

  • One way to run this script in the terminal is to navigate to the tdsbm_supplementary_material folder and then (assuming you have R installed) enter

    Rscript tdsbm_data_plots.R

  • Output from the discrete (TDD-SBM) models run by this script is stored in the discrete_model_results folder.
    • Where models are very time-consuming to run (as for New York City), the code to run the model has been commented out and pre-run model results are loaded from the discrete_model_results folder instead.
  1. IMG
  • Destination folder for saved plots generated by tdsbm_data_plots.R. (This includes more plots than in the paper.)
  1. data
  • LA, SF, NY - city-specific trip histories and (for LA and SF) station data tables.

  • maps - Background maps for some plots. Code to generate the maps is in the replication script, but it is commented out because it requires an individual Google API key.

  • zoning - Shape files for LA and New York City zoning maps used as background for some figures.

  • cleaned - This is the folder where cleaned versions of the data sets are stored once they are created by the tdsbm_data_plots.R script. Cleaned data sets without weekend trips for LA, SF, and our Manhattan subnetwork of New York City are given as examples; additional data sets are populated by running the tdsbm_data_plots.R script. Note that the complete set of cleaned data sets is about 400 MB. Also note that the Python replication script for the TDMM-SBM models, tdmm_sbm_model_replication.py, uses the cleaned data (without weekends) as input, so it is recommended to run the tdsbm_data_plots.R script before tdmm_sbm_model_replication.py.
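The weekend filtering mentioned above can be illustrated with a minimal Python sketch. This is not the cleaning code from tdsbm_data_plots.R (which is written in R and follows the full procedure described in the paper). The function name `drop_weekend_trips`, the `start_time` field, the timestamp format, and the example records are all hypothetical and chosen only for illustration.

```python
from datetime import datetime


def drop_weekend_trips(trips, time_key="start_time", fmt="%Y-%m-%d %H:%M:%S"):
    """Return only the trips that start on a weekday (Monday through Friday).

    ASSUMPTION: each trip is a dict whose `time_key` entry is a timestamp
    string in the format `fmt`. The real trip histories may use other
    column names and formats.
    """
    weekday_trips = []
    for trip in trips:
        started = datetime.strptime(trip[time_key], fmt)
        # datetime.weekday(): Monday is 0, ..., Saturday is 5, Sunday is 6.
        if started.weekday() < 5:
            weekday_trips.append(trip)
    return weekday_trips


# Hypothetical records, not taken from the repository's data.
example_trips = [
    {"start_time": "2016-10-07 08:15:00", "start_station": "A", "end_station": "B"},  # Friday
    {"start_time": "2016-10-08 11:30:00", "start_station": "B", "end_station": "C"},  # Saturday
    {"start_time": "2016-10-10 17:45:00", "start_station": "C", "end_station": "A"},  # Monday
]

weekday_only = drop_weekend_trips(example_trips)
```

Here `weekday_only` keeps the Friday and Monday trips and drops the Saturday one.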

  1. mixed_model_implementation_python
  • tdmm_sbm_model_replication.py - This contains all the code to estimate parameters of the TDMM-SBM models referenced in the paper.

  • For convenience, the code in this file is also saved in a separate file for each city in the paper: LosAngeles.py, Manhattan.py, NewYork.py, and SanFrancisco.py.

    • Output from the mixed-membership (TDMM-SBM) models run in this script is stored in the mixed_model_results folder.
    • *.png files are produced for visualizing the models and are saved in city-specific folders of an "img" folder that is created in mixed_model_implementation_python.
  • The scripts are set up to be run from the mixed_model_implementation_python folder.

  • lib contains the Python package of functions to estimate parameters of the TDMM-SBM and calculate likelihood of a given model as described in the paper.
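The recommended run order (the R replication script first, then the Python replication script from inside mixed_model_implementation_python) could be scripted roughly as follows. This is a sketch under stated assumptions: `build_commands` and `run_all` are hypothetical helper names, `Rscript` is assumed to be on the PATH, and the `setwd(...)` line in tdsbm_data_plots.R still has to be edited by hand beforehand.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(repo_root):
    """Return (argv, working_directory) pairs in the recommended order."""
    root = Path(repo_root)
    return [
        # 1. Main R replication script: cleans the data, fits the TDD-SBMs, makes plots.
        (["Rscript", "tdsbm_data_plots.R"], root),
        # 2. TDMM-SBM replication, run from its own folder because it reads the cleaned data.
        ([sys.executable, "tdmm_sbm_model_replication.py"],
         root / "mixed_model_implementation_python"),
    ]


def run_all(repo_root):
    """Run each step in order and stop at the first failure."""
    for argv, cwd in build_commands(repo_root):
        # check=True raises CalledProcessError if a step exits with a non-zero status.
        subprocess.run(argv, cwd=cwd, check=True)


# Example path only; point this at your own copy of the folder.
commands = build_commands("~/Documents/tdsbm_supplementary_material")
```

Calling `run_all(Path("~/Documents/tdsbm_supplementary_material").expanduser())` would then execute both steps. Expect this to take a long time, since it fits every model.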

  1. discrete_model_results
  • Various output from TDD-SBM models, labeled by city (note that ny_hm refers to the Manhattan subnetwork of the New York City network) and number of blocks, which is the first number in each name. The second number in each name, 3, indicates that the type of degree correction applied was the type described in the paper. Results with names ending in _T indicate time-aggregated data (i.e., the results of a time-independent SBM). All objects in this folder are in R data format (.RDS), which can be loaded into R or Python.
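The naming convention above can be decoded programmatically. The sketch below relies only on the rules stated here (city label, first number is the number of blocks, second number is the degree-correction type, optional _T suffix). The exact separators used in the real file names are not specified in this README, so the example name `ny_hm_4_3_T.RDS` is hypothetical, as is the helper `parse_result_name`.

```python
import re
from pathlib import Path


def parse_result_name(filename):
    """Decode a discrete_model_results file name into its labeled parts.

    ASSUMPTION: the city label is everything before the first number, and
    separators are underscores, dots, or hyphens.
    """
    stem = Path(filename).stem  # drop the .RDS extension
    numbers = re.findall(r"\d+", stem)
    first_number = re.search(r"\d+", stem)
    city = stem[: first_number.start()].strip("_.-") if first_number else stem
    return {
        "city": city,
        "blocks": int(numbers[0]) if len(numbers) > 0 else None,
        "degree_correction": int(numbers[1]) if len(numbers) > 1 else None,
        # Names ending in _T hold time-aggregated (time-independent SBM) results.
        "time_aggregated": stem.endswith("_T"),
    }


info = parse_result_name("ny_hm_4_3_T.RDS")  # hypothetical file name
```

For this hypothetical name, `info` describes a four-block, time-aggregated model of the Manhattan subnetwork with degree-correction type 3.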
  1. mixed_model_results
  • Output from the TDMM-SBM models run by the tdmm_sbm_model_replication.py script. Each model output is listed by city (note that Manhattan refers to the Manhattan subnetwork of the New York City network) and number of blocks. Each model run stores both a "role" and "omega" object. The role object contains the estimated C parameters for the model, listed with row names equal to the corresponding station ID. The omega object is a (K*K) x T matrix of the time-dependent block-to-block activity parameters, where the K x K matrix of block-to-block parameters for an individual time step has been converted to a column (of length K*K) of the matrix.
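To work with the omega object one time step at a time, each column of length K*K can be folded back into a K x K matrix. The sketch below uses plain Python lists and a hypothetical helper, `omega_to_blocks`. The README does not state whether the K x K matrix was flattened row by row or column by column, so the ordering is exposed as a parameter and the row-major default is an assumption that should be checked against tdmm_sbm_model_replication.py.

```python
def omega_to_blocks(omega, K, row_major=True):
    """Split a (K*K) x T matrix into a list of T matrices, each K x K.

    ASSUMPTION: `omega` is a list of K*K rows, each holding T values.
    `row_major=True` assumes entry (r, c) of a time step sits at position
    r*K + c of its column; `row_major=False` assumes position c*K + r.
    """
    if len(omega) != K * K:
        raise ValueError("omega must have K*K rows")
    T = len(omega[0])
    blocks = []
    for t in range(T):
        column = [row[t] for row in omega]  # flattened K x K matrix for time step t
        if row_major:
            matrix = [column[r * K:(r + 1) * K] for r in range(K)]
        else:
            matrix = [[column[c * K + r] for c in range(K)] for r in range(K)]
        blocks.append(matrix)
    return blocks


# Toy omega with K = 2 blocks and T = 2 time steps (made-up values).
omega = [
    [1.0, 5.0],
    [2.0, 6.0],
    [3.0, 7.0],
    [4.0, 8.0],
]
blocks = omega_to_blocks(omega, K=2)
# blocks[0] is [[1.0, 2.0], [3.0, 4.0]] under the row-major assumption.
```

With `row_major=False`, the same first column would instead fold into `[[1.0, 3.0], [2.0, 4.0]]`, which is why the ordering matters when reading block-to-block activity off the matrix.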
  1. rstan_mixed_implementation
  • R code that calls the Stan language to perform an alternative (Hamiltonian MCMC) estimation of our TDMM-SBM models. This is meant as a check on the output of our gradient-descent mixed-model estimation implementation in Python.

  • The tdmm_sbm_stan.R script implements this estimation method for several two-block TDMM-SBM models. The script uses data objects created in the main replication script, tdsbm_data_plots.R.

  • The tdmm_sbm.stan script is called by tdmm_sbm_stan.R, and does not need to be accessed directly.

  • stan_results contains the estimated omega and C parameters for the two-block TDMM-SBMs of LA, SF, and the Manhattan subnetwork of New York City. (We did not complete estimation in R/Stan of the entire NYC network with an adequate sample size.)

  1. sim_study
  • The simulation_study.R script contains code to generate synthetic networks using our TDD-SBM and TDMM-SBM and to fit those networks with several SBM variants, including discrete- and mixed-membership models and non-degree-corrected models.

  • The sbmt_london_cycles.R script contains code to fit our TDD-SBM to the London cycles data set used in "A semiparametric extension of the stochastic block model for longitudinal networks" by Matias et al. (the paper introducing their PPSBM).

  • Output from the above two scripts is contained in the output folder. (Those scripts take a while to run.)
