Include dsb step #13

Draft: wants to merge 17 commits into base: main
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,9 @@
# Changelog

## [1.0.6] - 2024-11-XX

### Added
- include as default `dsb` ambient protein background correction for ADT counts, for experiments without hashing deconvolution

## [1.0.5] - 2024-10-09
- removed `SeqRunName` variable, as it is not needed by cellranger anymore
7 changes: 5 additions & 2 deletions README.md
@@ -47,12 +47,15 @@ Note: Snakemake needs to access the internet for this set up. With Snakemake 7.1

### Dependencies

Most of the software used in the default workflow can be installed in an automated fashion using Snakemake's [--use-conda](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#integrated-package-management) functionality when running the pipeline.
In case you would like to start from the raw sequencing data using cellranger processing, the following software needs to be installed manually.
Most of the software used in the default workflow can be installed in an automated fashion using Snakemake's [--use-conda](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#integrated-package-management) functionality when running the pipeline. Exceptions that must be downloaded or installed manually are listed below.


- [Cellranger](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger): Follow the instructions on the 10xGenomics installation support page to install cellranger and to include the cellranger binary to your path.
Webpage: [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/installation](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/installation)

- `dsb` CRAN package: `dsb` is not available on Conda. Therefore, a singularity image `dsb_r-base_4.2.3.sif` containing `dsb` and all other R packages necessary for this step is available for download [here](https://depot.nexus.ethz.ch/software/multi_tool_container_stack/dsb_r-base_4.2.3.sif). Please download the image and then provide its path in the config section "tools" -> "dsb_normalize_adt" -> "singularity".
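
Note (illustrative, not part of the diff): because `config/config.yaml` ships with the placeholder `/path/to/dsb_r-base_4.2.3.sif`, a small pre-flight check could catch a forgotten edit before the workflow starts. The sketch below is an assumption about how such a check might look. The helper name `resolve_dsb_image` is hypothetical; only the key names `tools`, `dsb_normalize_adt` and `singularity` are taken from this PR.

```python
from pathlib import Path


def resolve_dsb_image(config: dict) -> Path:
    """Return the configured dsb Singularity image path or raise a clear error.

    Hypothetical helper: only the config key names come from config/config.yaml.
    """
    try:
        raw = config["tools"]["dsb_normalize_adt"]["singularity"]
    except KeyError as err:
        raise KeyError(
            f"config is missing key {err} "
            "(expected tools -> dsb_normalize_adt -> singularity)"
        ) from None

    image = Path(raw).expanduser()
    # The shipped placeholder path does not exist, so it is rejected here.
    if not image.is_file():
        raise FileNotFoundError(
            f"dsb image not found at '{image}'; download it and update the config"
        )
    return image
```

Called once near the top of the Snakefile with the loaded `config`, a check like this would fail early with a readable message rather than at container start-up.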


## Before running the pipeline

Before the pipeline can be run make sure that
2 changes: 2 additions & 0 deletions config/config.yaml
@@ -62,6 +62,8 @@ tools:
    # The number of PCA dimensions cannot be larger than the number of ADTs in the experiment, otherwise the function (and the script) fails.
    number_pca_adt: 20

  dsb_normalize_adt:
    singularity: "/path/to/dsb_r-base_4.2.3.sif"
scampi:
  # scampi is a snakemake workflow that runs general scRNA processing steps
  # scampi is used as a snakemake module inside of gExcite & therefore does not need to be installed separately
Binary file modified images/gExcite_pipeline_rulegraph.png
464 changes: 217 additions & 247 deletions images/gExcite_pipeline_rulegraph.svg
Binary file modified images/gExcite_pipeline_rulegraph_no_hashing.png
274 changes: 140 additions & 134 deletions images/gExcite_pipeline_rulegraph_no_hashing.svg
11 changes: 10 additions & 1 deletion workflow/Snakefile_no_hashing.smk
@@ -23,6 +23,9 @@ module scampi:
snakefile:
github(
"ETH-NEXUS/scAmpi_single_cell_RNA",
# Working offline: Have the scAmpi pipeline `scAmpi_single_cell_RNA` checked out next to `gExcite_pipeline`
# gitfile(
# "file:///../../scAmpi_single_cell_RNA/",
path="workflow/snakefile_basic.smk",
tag="v2.0.7",
)
@@ -33,6 +36,8 @@ module scampi:
## Include the preprocessing rules
include: "rules/gex_cellranger_no_hashing.smk"
include: "rules/adt_cellranger_no_hashing.smk"
## Include rule for dsb normalization of ADT counts
include: "rules/adt_dsb_normalization.smk"
## Include scampi
include: "rules/scampi_module.smk"
## Include citseq rules
@@ -63,6 +68,10 @@ rule gExcite:
"results/cellranger_adt/{sample}.matrix.mtx",
sample=getSimpleSampleNames(),
),
expand(
"results/dsb_normalize_adt/{sample}.dsb_normalize_adt.RDS",
sample=getSimpleSampleNames(),
),
# List of final files from scampi
expand(
"results/counts_raw/{sample}.h5",
@@ -98,7 +107,7 @@ rule gExcite:
),
# List of final files from citeseq analysis
expand(
"results/citeseq_analysis/{sample}/{sample}.GEX_cellrangerADT_SCE.RDS",
"results/citeseq_analysis/{sample}/{sample}.GEX_cellrangerADT_SCE.dsb.RDS",
sample=getSimpleSampleNames(),
),
output:
45 changes: 44 additions & 1 deletion workflow/rules/adt_analyse_citeseq.smk
@@ -21,7 +21,8 @@ rule create_initial_threshold_file:


# Rule to Analyse ADT Data in combination with GEXdata
# Input: RDS and h5 file from single sample GEX run, threshold file as generated by rule create_initial_threshold_file, Cellranger adt file as generated from rule Rscript_demultiplex_count_matrix
# Input: RDS and h5 file from single sample GEX run, threshold file as generated by rule create_initial_threshold_file,
# Cellranger adt file as generated from rule Rscript_demultiplex_count_matrix
# Output: One folder under /results/citeseq_analysis/{sample} containing lots of diagnostic plots and an RDS file containing all the data.
rule Rscript_analyse_citeseq:
input:
@@ -62,3 +63,45 @@ rule Rscript_analyse_citeseq:
"--number_variable_genes {params.numberVariableGenes} "
"--number_pca_adt {params.number_pca_adt} "
"--output {params.outdir} &> {log} "


# Rule to Analyse ADT Data in combination with GEXdata
# Input: RDS and h5 file from single sample GEX run,
#        Cellranger adt matrix after normalization with dsb
# Output: One folder under /results/citeseq_analysis/{sample} containing lots of diagnostic plots and an RDS file containing all the data.
rule Rscript_analyse_citeseq_dsb:
    input:
        RDS="results/atypical_removed/{sample}.atypical_removed.RDS",
        cellrangerADT="results/dsb_normalize_adt/{sample}.dsb_normalize_adt.RDS",
        h5="results/counts_corrected/{sample}.corrected.variable_genes.h5",
    output:
        RDS="results/citeseq_analysis/{sample}/{sample}.GEX_cellrangerADT_SCE.dsb.RDS",
    conda:
        "../envs/adt_analyse_citeseq.yaml"
    params:
        colorConfig=config["scampi"]["resources"]["colour_config"],
        lookup=config["resources"]["adt_lookup"],
        numberVariableGenes=config["tools"]["analyse_citeseq"]["numberVariableGenes"],
        number_pca_adt=config["tools"]["analyse_citeseq"]["number_pca_adt"],
        outdir="results/citeseq_analysis/{sample}/",
        custom_script=workflow.source_path("../scripts/analyse_citeseq_dsb.R"),
    log:
        "logs/Rscript_analyse_citeseq_dsb/{sample}.log",
    benchmark:
        "results/citeseq_analysis/benchmark/{sample}.benchmark"
    resources:
        mem_mb=config["computingResources"]["mem_mb"]["medium"],
        runtime=config["computingResources"]["runtime"]["medium"],
    threads: config["computingResources"]["threads"]["medium"]
    shell:
        "Rscript {params.custom_script} "
        "--RDS {input.RDS} "
        "--cellrangerADT {input.cellrangerADT} "
        "--h5 {input.h5} "
        "--colorConfig {params.colorConfig} "
        "--lookup {params.lookup} "
        "--threads {threads} "
        "--sampleName {wildcards.sample} "
        "--number_variable_genes {params.numberVariableGenes} "
        "--number_pca_adt {params.number_pca_adt} "
        "--output {params.outdir} &> {log} "
6 changes: 3 additions & 3 deletions workflow/rules/adt_cellranger.smk
@@ -75,9 +75,9 @@ rule cellranger_count_adt:
"{params.variousParams} "
"{params.targetCells}) "
"&> {log} "
"&& gunzip -c {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/features.tsv.gz > {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/features.tsv ; "
"gunzip -c {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/barcodes.tsv.gz > {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/barcodes.tsv ; "
"gunzip -c {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/matrix.mtx.gz > {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/matrix.mtx "
"&& gzip -dk {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/features.tsv.gz ; "
"gzip -dk {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/barcodes.tsv.gz ; "
"gzip -dk {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/matrix.mtx.gz "
"&& ln -rs '{params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/features.tsv' '{output.features_file}' ; "
"ln -rs '{params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/matrix.mtx' '{output.matrix_file}' ; "
"ln -rs '{params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/barcodes.tsv' '{output.barcodes_file}' ; "
6 changes: 3 additions & 3 deletions workflow/rules/adt_cellranger_no_hashing.smk
@@ -72,9 +72,9 @@ rule cellranger_count_adt:
"{params.variousParams} "
"{params.targetCells}) "
"&> {log} "
"&& gunzip -c {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/features.tsv.gz > {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/features.tsv ; "
"gunzip -c {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/barcodes.tsv.gz > {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/barcodes.tsv ; "
"gunzip -c {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/matrix.mtx.gz > {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/matrix.mtx "
"&& gzip -dk {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/features.tsv.gz ; "
"gzip -dk {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/barcodes.tsv.gz ; "
"gzip -dk {params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/matrix.mtx.gz "
"&& ln -rs '{params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/features.tsv' '{output.features_file}' ; "
"ln -rs '{params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/matrix.mtx' '{output.matrix_file}' ; "
"ln -rs '{params.cr_out}{params.sample}/outs/filtered_feature_bc_matrix/barcodes.tsv' '{output.barcodes_file}' ; "
35 changes: 35 additions & 0 deletions workflow/rules/adt_dsb_normalization.smk
@@ -0,0 +1,35 @@
import os

WORKDIR = os.getcwd()


# Rule normalizing the ADT counts using R package "dsb"
# Input is cellranger ADT output
# Output is SCE object in RDS file
rule dsb_normalize_adt:
    input:
        CellrangerADT="results/cellranger_adt/{sample}.matrix.mtx",
    output:
        normalized="results/dsb_normalize_adt/{sample}.dsb_normalize_adt.RDS",
    singularity:
        config["tools"]["dsb_normalize_adt"]["singularity"]
    params:
        adt_raw_dir=WORKDIR
        + "/results/cellranger_adt/{sample}/outs/raw_feature_bc_matrix/",
        adt_filtered_dir=WORKDIR
        + "/results/cellranger_adt/{sample}/outs/filtered_feature_bc_matrix/",
        outdir="results/dsb_normalize_adt/",
    threads: config["computingResources"]["threads"]["high"]
    log:
        "logs/dsb_normalize_adt/{sample}.log",
    resources:
        mem_mb=config["computingResources"]["mem_mb"]["high"],
        runtime=config["computingResources"]["runtime"]["high"],
    benchmark:
        "results/dsb_normalize_adt/benchmark/{sample}.benchmark"
    shell:
        "Rscript workflow/scripts/dsb_normalize_adt.R "
        "--adt_raw_dir {params.adt_raw_dir} "
        "--adt_filtered_dir {params.adt_filtered_dir} "
        "--outdir {params.outdir} "
        "--sample {wildcards.sample} "
6 changes: 3 additions & 3 deletions workflow/rules/gex_cellranger.smk
@@ -47,9 +47,9 @@ rule cellranger_count_gex:
"--localcores={threads} "
"{params.variousParams}) "
"&> {log} ; "
"gunzip {params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/features.tsv.gz ; "
"gunzip {params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/barcodes.tsv.gz ; "
"gunzip {params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/matrix.mtx.gz ; "
"gzip -dk {params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/features.tsv.gz ; "
"gzip -dk {params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/barcodes.tsv.gz ; "
"gzip -dk {params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/matrix.mtx.gz ; "
"ln -frs '{params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/features.tsv' '{output.features_file}' ; "
"ln -frs '{params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/matrix.mtx' '{output.matrix_file}' ; "
"ln -frs '{params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/barcodes.tsv' '{output.barcodes_file}' ; "
6 changes: 3 additions & 3 deletions workflow/rules/gex_cellranger_no_hashing.smk
@@ -51,9 +51,9 @@ rule cellranger_count_gex:
"--localcores={threads} "
"{params.variousParams}) "
"&> {log} ; "
"gunzip {params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/features.tsv.gz ; "
"gunzip {params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/barcodes.tsv.gz ; "
"gunzip {params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/matrix.mtx.gz ; "
"gzip -dk {params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/features.tsv.gz ; "
"gzip -dk {params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/barcodes.tsv.gz ; "
"gzip -dk {params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/matrix.mtx.gz ; "
"ln -frs '{params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/features.tsv' '{output.features_file}' ; "
"ln -frs '{params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/matrix.mtx' '{output.matrix_file}' ; "
"ln -frs '{params.cr_out}{params.mySample}/outs/filtered_feature_bc_matrix/barcodes.tsv' '{output.barcodes_file}' ; "
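
Note (illustrative, not part of the diff): the four cellranger rule files above switch from `gunzip` to `gzip -dk`, which decompresses each file next to its archive and keeps the `.gz` original. The Python sketch below shows the same "decompress and keep" behaviour using only the standard library. The function name `decompress_keep` and its `overwrite` flag are hypothetical and only serve to make the semantics explicit.

```python
import gzip
import shutil
from pathlib import Path


def decompress_keep(gz_path: Path, overwrite: bool = False) -> Path:
    """Decompress gz_path next to itself and keep the .gz file, like `gzip -dk`.

    Hypothetical helper for illustration; the pipeline itself calls gzip.
    """
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path}")

    out_path = gz_path.with_suffix("")  # features.tsv.gz -> features.tsv
    # Refuse to replace an existing file unless explicitly asked to.
    if out_path.exists() and not overwrite:
        raise FileExistsError(out_path)

    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

One point that may be worth checking in review: `gzip` without `-f` does not overwrite an existing output file, while the previous `gunzip -c ... > file` form in the ADT rules did. If a rule is re-run after the decompressed files already exist, the new commands could therefore behave differently; adding `-f` would be one option, which has not been tested against this pipeline here.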