Note: This pipeline is currently a work in progress.
This is the automated portion of the ENCODE single-cell/single-nucleus ATAC-Seq pipeline.
Information on the specific analysis steps can be found in the pipeline specification document.
- A Linux-based OS
- A conda-based Python 3 installation
- Snakemake v6.6.1+ (full installation)
- An ENCODE DCC account with access to the necessary datasets
Additional requirements for cloud execution:
- kubectl
- A cloud provider CLI for Kubernetes cluster creation
- A cloud provider CLI for remote storage (if different from above)
All other dependencies are handled by the pipeline itself.
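Before installing, it can help to confirm which of the required tools are already available. The following is a minimal sketch, not part of the pipeline; the helper name `have_tool` is hypothetical, and the check only tests whether each command is on the `PATH`, not whether its version is sufficient.

```shell
# Return success if the named command is on PATH.
# (have_tool is a hypothetical helper, not part of the pipeline.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# conda and snakemake are needed for every run;
# kubectl is only needed for cloud execution.
for tool in conda snakemake kubectl; do
    if have_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

If `snakemake` is found, `snakemake --version` prints the installed version, which can be compared against the v6.6.1 minimum listed above.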
- Install any necessary requirements above
- Download the pipeline:
git clone https://github.com/kundajelab/ENCODE_scatac
- Activate the snakemake conda environment:
conda activate snakemake
- Configure the pipeline in the /config directory. Detailed information can be found here.
- Run the pipeline:
snakemake -k --use-conda --cores $NCORES
Here, $NCORES is the number of cores to utilize.
Note: When run for the first time, the pipeline will take some time to install conda packages.
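As an illustration of the local run step, the sketch below sets `$NCORES` and previews the workflow with Snakemake's dry-run flag (`-n`) before anything is executed. It assumes the command is run from the cloned repository with the snakemake environment active. Defaulting to every available core via `nproc` is an assumption made here, not a recommendation from the pipeline authors.

```shell
# Use all available cores unless NCORES is already set.
# Falls back to 1 if nproc is unavailable.
NCORES="${NCORES:-$(nproc 2>/dev/null || echo 1)}"
echo "Using $NCORES core(s)"

if command -v snakemake >/dev/null 2>&1; then
    # -n performs a dry run: jobs are listed but not executed.
    snakemake -n -k --use-conda --cores "$NCORES" \
        || echo "Dry run reported a problem; check the configuration" >&2
else
    echo "snakemake not found; activate the conda environment first" >&2
fi
```

Dropping `-n` from the command gives the actual run command shown above.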
- Install and configure the pipeline as specified above
- Create a cloud cluster. Note that setup specifics may differ depending on the cloud provider. Example setup instructions are available for GCP and for Azure.
- Configure remote storage. Instructions for each provider can be found here. For our purposes, only the environment variables and command line configuration are needed.
- Run the pipeline:
snakemake -k --kubernetes --use-conda --default-remote-provider $REMOTE --default-remote-prefix $PREFIX --jobs $NJOBS --envvars $VARS
Here:
- $REMOTE is the cloud storage provider, and should be one of {S3,GS,FTP,SFTP,S3Mocked,gfal,gridftp,iRODS,AzBlob,XRootD}
- $PREFIX is the target bucket name or subfolder in storage
- $NJOBS is the maximum number of jobs to be run in parallel
- $VARS is a list of environment variables for accessing remote storage. The --envvars flag can be omitted if no variables are required.
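To make the variables concrete, here is a sketch that fills them in for an S3 bucket and prints the resulting command for review rather than running it. All values are hypothetical: the bucket name, job count, and the helper `is_valid_remote` are illustrative only. The two AWS variable names are the standard AWS credential variables and are assumed here to be the ones the storage setup requires.

```shell
# Return success if $1 is one of the remote providers listed above.
# (is_valid_remote is a hypothetical helper, not part of the pipeline.)
is_valid_remote() {
    case "$1" in
        S3|GS|FTP|SFTP|S3Mocked|gfal|gridftp|iRODS|AzBlob|XRootD) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical values for illustration only.
REMOTE="S3"
PREFIX="my-example-bucket/scatac"
NJOBS=8
VARS="AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY"

if is_valid_remote "$REMOTE"; then
    # Print the command so it can be reviewed before it is run.
    # $VARS is intentionally unquoted so each name becomes its own argument.
    echo snakemake -k --kubernetes --use-conda \
        --default-remote-provider "$REMOTE" \
        --default-remote-prefix "$PREFIX" \
        --jobs "$NJOBS" --envvars $VARS
else
    echo "Unsupported remote provider: $REMOTE" >&2
fi
```

Removing the leading `echo` runs the command. The named environment variables must be set in the calling shell so that Snakemake can pass them on to the cluster jobs.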
This pipeline has been tested locally and on the cloud via Kubernetes. However, Snakemake offers a number of additional execution modes.
Documentation on cluster execution
Documentation on cloud execution
- Austin Wang: Primary developer
- Surag Nair: Secondary developer and advisor
- Ben Parks: Secondary developer and advisor
- Laksshman Sundaram: Advisor
- Caleb Lareau: Advisor
- William Greenleaf: Supervisor
- Anshul Kundaje: Supervisor