Clone the repository
git clone [email protected]:Melbourne-COVID-Predict/jascap.git
Enter the jascap directory and activate renv within R, for reproducibility
module load R/4.2.1
Open R in the terminal and run:
install.packages("renv")
renv::activate()
renv::restore()
Install makeflow
# Load miniconda
module load miniconda3
# Run this command once
conda create -n cctools-env -y -c conda-forge --strict-channel-priority python ndcctools
# Run this command every time you want to use cctools
conda activate cctools-env
Create input and result directories
# Define the dataset directory, for example test_jacap
master_directory=test_jacap
mkdir $master_directory
result_directory=$master_directory/results
reports_directory=$master_directory/results/reports
mkdir $result_directory
input_directory=$master_directory/input
mkdir $input_directory
# The directory where you cloned jascap
code_directory=~/PostDoc/jascap
# This is the location of the metadata, which is matched through its `sample` column to the corresponding column in the Seurat objects. The metadata file must be in a different directory from the input files.
metadata_path=~/PostDoc/covid19pbmc/data/3_prime_batch_1/metadata.rds
# This is in a shared location
reference_azimuth_path=/stornext/Bioinf/data/bioinf-data/Papenfuss_lab/projects/reference_azimuth.rds
Note: If no `metadata.rds` is available, create a data frame with 2 columns (sample | batch), with the `sample` values matching the `SAMPLE_NAME` of the input files, as described below. Set `batch` to the sample name.
sample | batch |
---|---|
spleen | spleen |
liver | liver |
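Before moving on, it may help to confirm that the directories and files defined above actually exist. The following is a minimal sketch, not part of jascap: `check_paths` is a hypothetical helper that reports every missing path and returns a non-zero status if any is absent.

```shell
# Hypothetical helper (not part of jascap): report which of the given
# paths are missing. Returns 0 if every path exists, 1 otherwise.
check_paths() {
  local missing=0
  local path
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be called with the variables defined above, for example `check_paths "$result_directory" "$input_directory" "$code_directory" "$metadata_path" "$reference_azimuth_path"`.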
Add input files to the input directory.
- Each input file should be a Seurat object
- Each input file should include only raw counts in the `RNA` assay
- Each input file should include only one sample
- Each input file should be named `SAMPLE_NAME.rds`
- Each `SAMPLE_NAME` should be unique (if a sample has different time points, the `SAMPLE_NAME` should be made unique, for example `SAMPLE_NAME_TIME_POINT`)
Copy those files into your input directory, for example
cp myfiles/* $input_directory
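Since each file name doubles as the sample name, listing the names implied by the copied files is a quick way to compare them against the `sample` column of the metadata. This is a rough sketch under that assumption; `list_sample_names` is a hypothetical helper, not something jascap provides.

```shell
# Hypothetical helper (not part of jascap): print the sample name implied
# by each SAMPLE_NAME.rds file in a directory, one per line.
# Files without the .rds extension are ignored.
list_sample_names() {
  local dir="$1"
  local file
  for file in "$dir"/*.rds; do
    # Skip the literal pattern when the glob matches nothing
    [ -e "$file" ] || continue
    basename "$file" .rds
  done
}
```

For example, `list_sample_names "$input_directory"` prints one name per input file, which can then be checked by eye against the metadata.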
Execute pipeline using makeflow
The available modes are
- preprocessing (filtering and annotation without integration)
- fast (preprocessing + differential transcription, tissue composition, and cell communication analyses)
- complete (still not ready)
The available tissue annotations are
- pbmc (preprocessing based on Seurat Azimuth pbmc annotation)
- solid (preprocessing based on SingleR blueprint annotation)
- atypical (preprocessing based on unsupervised clustering)
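A mistyped tissue annotation may only surface once the pipeline script is running, so a small guard in the shell can catch it earlier. The sketch below assumes the three values listed above are the only accepted ones; `is_valid_tissue` is a hypothetical helper and not part of jascap.

```shell
# Hypothetical helper (not part of jascap): succeed only when the argument
# is one of the tissue annotations listed in this README.
is_valid_tissue() {
  case "$1" in
    pbmc|solid|atypical) return 0 ;;
    *) return 1 ;;
  esac
}
```

It could be used as `is_valid_tissue "$tissue" || echo "unknown tissue annotation: $tissue" >&2`, where `tissue` is a variable of your own holding the chosen value.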
# Create $result_directory/pipeline.makeflow
#
# The pipeline has 4 modes (first argument of create_pipeline_makefile.R)
# preprocessing, fast_pipeline, slow_pipeline, complete
Rscript $code_directory/R_scripts/create_pipeline_makefile.R complete pbmc unfiltered $result_directory $reports_directory $input_directory $code_directory $metadata_path $reference_azimuth_path
# Execute makeflow
conda activate cctools-env
makeflow -J 200 -T slurm $result_directory/pipeline.makeflow
Monitor progress
makeflow_monitor $result_directory/pipeline.makeflow.makeflowlog