Sparse Signaling Pathway Sampling

MCMC for signaling pathway inference

Code related to the manuscript Inferring signaling pathways with probabilistic programming (Merrell & Gitter, 2020), Bioinformatics, 36:Supplement_2, i822–i830.

This repository contains the following:

  • SSPS: A method that infers relationships between variables using time series data.
    • Modeling assumption: the time series data is generated by a Dynamic Bayesian Network (DBN).
    • Inference strategy: MCMC sampling over possible DBN structures.
    • Implementation: written in Julia, using the Gen probabilistic programming language.
  • Analysis code:
    • simulation studies;
    • convergence analyses;
    • evaluation on experimental data;
    • a Snakefile for managing all of the analyses.
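To give a feel for "MCMC sampling over possible DBN structures," here is a deliberately simplified Python sketch: a Metropolis sampler that toggles single edges of an adjacency matrix and accumulates posterior edge frequencies. The score function is a stand-in (prior support minus a sparsity penalty), NOT the real SSPS model, which is written in Julia with Gen and scores structures against the time series data.

```python
import math
import random

def toy_score(adj, prior):
    """Stand-in structure score: favor edges with high prior
    confidence, penalize the total edge count (sparsity)."""
    n = len(adj)
    support = sum(prior[i][j] for i in range(n) for j in range(n) if adj[i][j])
    return support - 0.5 * sum(map(sum, adj))

def sample_structures(prior, n_iter=2000, seed=0):
    """Metropolis MCMC over adjacency matrices.
    Returns estimated posterior edge frequencies."""
    rng = random.Random(seed)
    n = len(prior)
    adj = [[0] * n for _ in range(n)]       # start from the empty graph
    counts = [[0] * n for _ in range(n)]
    cur = toy_score(adj, prior)
    for _ in range(n_iter):
        i, j = rng.randrange(n), rng.randrange(n)
        adj[i][j] ^= 1                      # proposal: toggle one edge
        new = toy_score(adj, prior)
        if rng.random() < min(1.0, math.exp(new - cur)):
            cur = new                       # accept the move
        else:
            adj[i][j] ^= 1                  # reject: revert the toggle
        for a in range(n):                  # accumulate edge occupancy
            for b in range(n):
                counts[a][b] += adj[a][b]
    return [[c / n_iter for c in row] for row in counts]
```

SSPS itself uses more sophisticated proposals and a data-dependent DBN likelihood; this sketch only conveys the overall shape of structure sampling.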

Installation and basic setup

(If you plan to reproduce all of the analyses, then make sure you're on a host with access to plenty of CPUs. Ideally, you would have access to a cluster of some sort.)

  1. Clone this repository:
    $ git clone [email protected]:gitter-lab/ssps.git
  2. Install Julia 1.6 and instantiate the Julia dependencies:
    $ wget https://julialang-s3.julialang.org/bin/linux/x64/1.6/julia-1.6.7-linux-x86_64.tar.gz 
    $ tar -xvzf julia-1.6.7-linux-x86_64.tar.gz
    $ export PATH="$PWD/julia-1.6.7/bin:$PATH"
    
    $ cd ssps/SSPS
    $ julia --project=. 
                   _
       _       _ _(_)_     |  Documentation: https://docs.julialang.org
      (_)     | (_) (_)    |
       _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
      | | | | | | |/ _` |  |
      | | |_| | | | (_| |  |  Version 1.6.7 (2022-07-19)
     _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
    |__/                   |
    
    julia> using Pkg
    julia> Pkg.instantiate()
    julia> exit()
    

Reproducing the analyses

Reproducing the analyses requires some additional software.

  • We use Snakemake -- a Python package -- to manage the analysis workflow.
  • We use some other Python packages to postprocess the results, produce plots, etc.
  • Some of the baseline methods are implemented in R or MATLAB.

Hence, the analyses entail some extra setup:

  1. Install python dependencies (using conda)

    • For the purposes of these instructions, we assume you have Anaconda3 or Miniconda3 installed and have access to the conda environment manager.
      (We recommend Miniconda; see its installation documentation for details.)
    • We recommend setting up a dedicated virtual environment for this project. The following will create a new environment named ssps and install the required Python packages:
    $ conda create -n ssps -c conda-forge pandas matplotlib numpy bioconda::snakemake-minimal
    $ conda activate ssps
    (ssps) $
    
    • If you plan to reproduce the analyses on a cluster, also install cookiecutter and the complete version of Snakemake:
    (ssps) $ conda install -c conda-forge cookiecutter bioconda::snakemake
    

    Then find the appropriate Snakemake profile in this list: https://github.com/Snakemake-Profiles/doc and install it with cookiecutter:

    (ssps) $ cookiecutter https://github.com/Snakemake-Profiles/htcondor.git
    

    replacing the HTCondor example above with your desired profile.

  2. Install R packages

  3. Check whether MATLAB is installed.

After completing this additional setup, we are ready to run the analyses.

  1. Make any necessary modifications to the configuration file: analysis_config.yaml. This file controls the space of hyperparameters and datasets explored in the analyses.
  2. Run the analyses using snakemake:
    • If you're running the analyses on your local host, simply move to the directory containing Snakefile and call snakemake.
    (ssps) $ cd ssps
    (ssps) $ snakemake
    
    • Since Julia is a dynamically compiled language, some time will be devoted to compilation when you run SSPS for the first time. You may see some warnings in stdout -- this is normal.
    • If you're running the analyses on a cluster, call snakemake with the Snakemake profile you installed earlier:
    (ssps) $ cd ssps
    (ssps) $ snakemake --profile YOUR_PROFILE_NAME
    
    (You will probably need to edit the job submission parameters in the profile's config.yaml file.)
  3. Relax. It will take tens of thousands of CPU-hours to run all of the analyses.

Running SSPS on your data

Follow these steps to run SSPS on your dataset. You will need:

  • a CSV file (tab separated) containing your time series data;
  • a CSV file (comma separated) containing your prior edge confidences;
  • optional: a JSON file containing a list of variable names (i.e., node names).
  1. Install the python dependencies if you haven't already. Find detailed instructions above.
  2. cd to the run_ssps directory
  3. Configure the parameters in ssps_config.yaml as appropriate
  4. Run Snakemake: $ snakemake --cores 1. Replace 1 with the maximum number of CPU cores to use.
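For reference, here is a minimal Python sketch that writes files in the three formats listed above. The exact row/column layout SSPS expects is governed by ssps_config.yaml; the layout below (timepoints as rows in the time series, one row per source node in the prior matrix) is only an assumption for illustration -- only the separators are taken from the list above.

```python
import csv
import json

# Hypothetical 3-variable example.
nodes = ["A", "B", "C"]

# Tab-separated time series: one row per timepoint (assumed layout).
with open("timeseries.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerows([[0.1, 0.5, 0.9],
                      [0.2, 0.4, 0.8]])

# Comma-separated prior edge confidences: one row per source node (assumed).
with open("prior.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows([[0.0, 0.9, 0.1],
                      [0.9, 0.0, 0.5],
                      [0.1, 0.5, 0.0]])

# Optional JSON list of node names.
with open("nodes.json", "w") as f:
    json.dump(nodes, f)
```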

A note about parallelism

SSPS allows two levels of parallelism: (1) at the Markov chain level and (2) at the iteration level.

  • Chain-level parallelism is provided via Snakemake. For example, Snakemake can run 4 chains simultaneously if you specify --cores 4 at the command line: $ snakemake --cores 4. In essence, this just creates 4 instances of SSPS that run simultaneously.
  • Iteration-level parallelism is provided by Julia's multi-threading features. The number of threads available to an SSPS instance is specified by an environment variable: JULIA_NUM_THREADS.
  • The total number of CPUs used by your SSPS jobs is the product of Snakemake's --cores parameter and Julia's JULIA_NUM_THREADS environment variable. Concretely: if we run snakemake --cores 2 and have JULIA_NUM_THREADS=4, then up to 8 CPUs may be used at one time by the SSPS jobs.
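The product rule above can be sketched as a shell snippet (hypothetical settings: 2 simultaneous chains, 4 Julia threads each; the snakemake launch itself is shown as a comment since it requires the full setup):

```shell
# 4 Julia threads per SSPS instance:
export JULIA_NUM_THREADS=4
# 2 chains run simultaneously by Snakemake:
SNAKEMAKE_CORES=2

# Peak CPU usage is the product of the two settings:
TOTAL=$(( SNAKEMAKE_CORES * JULIA_NUM_THREADS ))
echo "$TOTAL"    # prints 8

# The launch would then be:
#   snakemake --cores "$SNAKEMAKE_CORES"
```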

Licenses

SSPS is available under the MIT License, Copyright © 2020 David Merrell.

The MATLAB code dynamic_network_inference.m has been modified from the original version, Copyright © 2012 Steven Hill and Sach Mukherjee.

The dream-challenge data is described in Hill et al., 2016 and is originally from Synapse.