Merge branch 'dev' into nf-core-template-merge-3.0.2

nf-core · Oct 17, 2024 · 575e736 · 575e736
2 parents 2bfd1d7 + cc831f1
commit 575e736
Show file tree

Hide file tree

Showing 193 changed files with 37,820 additions and 349 deletions.
diff --git a/.editorconfig b/.editorconfig
@@ -25,6 +25,10 @@ insert_final_newline = unset
 trim_trailing_whitespace = unset
 indent_style = unset
 
+[/modules/local/report/create/app/dependencies/**]
+indent_size = unset
+trim_trailing_whitespace = unset
+
 [/assets/email*]
 indent_size = unset
 

diff --git a/.prettierignore b/.prettierignore
@@ -10,3 +10,4 @@ testing/
 testing*
 *.pyc
 bin/
+modules/local/report/create/app/dependencies/*
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,7 +3,7 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## v0.0.1dev - [date]
+## v0.0.1dev - [2024-10-17]
 
 Initial release of nf-core/tfactivity, created with the [nf-core](https://nf-co.re/) template.
 

diff --git a/CITATIONS.md b/CITATIONS.md
@@ -10,6 +10,22 @@
 
 ## Pipeline tools
 
+- [DeSeq2](https://doi.org/10.1186/s13059-014-0550-8)
+
+  > Love, M.I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550 (2014).
+
+- [STARE](https://doi.org/10.1093/bioinformatics/btad062)
+
+  > Dennis Hecker, Fatemeh Behjati Ardakani, Alexander Karollus, Julien Gagneur, Marcel H Schulz, The adapted Activity-By-Contact model for enhancer–gene assignment and its application to single-cell data, Bioinformatics, Volume 39, Issue 2, February 2023
+
+- [Bedtools](https://doi.org/10.1093%2Fbioinformatics%2Fbtq033)
+
+  > Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PMID: 20110278; PMCID: PMC2832824.
+
+- [GTFtools](https://doi.org/10.1093/bioinformatics/btac561)
+
+  > Hong-Dong Li, Cui-Xiang Lin, Jiantao Zheng, GTFtools: a software package for analyzing various features of gene models, Bioinformatics, Volume 38, Issue 20, 15 October 2022, Pages 4806–4808,
+
 ## Software packaging/containerisation tools
 
 - [Anaconda](https://anaconda.com)

diff --git a/README.md b/README.md
@@ -19,47 +19,83 @@
 
 ## Introduction
 
-**nf-core/tfactivity** is a bioinformatics pipeline that ...
+**nf-core/tfactivity** is a bioinformatics pipeline that can identify the most differentially active transcription factors (TFs) between multiple conditions. It takes a count matrix and open chromatin data (ATAC-seq, DNase-seq, HM-ChIP-seq) as input. It produces a ranking of transcription factors.
 
-<!-- TODO nf-core:
-   Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
-   major pipeline sections and the types of output it produces. You're giving an overview to someone new
-   to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
--->
+![Metro map](docs/images/metromap.png)
 
-<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
-     workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples.   -->
-<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
+1. Identify accessible regions (can perform footprinting between close ChIP-seq peaks or take ATAC-seq peaks)
+2. Calculate affinity scores for combinations of transcription factors and target genes (TGs) using [STARE](https://doi.org/10.1093/bioinformatics/btad062)
+3. Identify differentially expressed genes between conditions
+4. Utilize linear regression to identify the transcription factors that are most likely to be responsible for the differential gene expression
+5. Calculate the TF-TG score based on:
+   1. Differential expression of the target genes
+   2. Affinity of the transcription factors to the target genes
+   3. The regression coefficient of the transcription factors
+6. Perform a Mann-Whitney U test and create a ranking of the transcription factors
+
+A more biological visualization of the workflow can be found here:
+
+> [!NOTE]
+> The following image was created for the TF-Prioritizer publication. Parts of the workflow have been adapted for the nf-core pipeline, but the general idea is still valid.
+
+![TF-Prioritizer workflow](docs/images/tfprio.jpeg)
 
 ## Usage
 
 > [!NOTE]
 > If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
 
-<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
-     Explain what rows and columns represent. For instance (please edit as appropriate):
+The pipeline supports processing of previously called peaks from ATAC-seq, DNase-seq, or histone modification ChIP-seq data. The peaks can then either be used as-is or be subjected to footprinting analysis. Additionally, BAM files can be provided in a separate samplesheet, which will be used to predict enhancer regions.
 
-First, prepare a samplesheet with your input data that looks as follows:
+```csv title="samplesheet.csv"
+sample,condition,assay,peak_file
+condition1_H3K27ac_1,condition1,H3K27ac,condition1_H3K27ac_1.broadPeak
+condition1_H3K27ac_2,condition1,H3K27ac,condition1_H3K27ac_2.broadPeak
+condition1_H3K4me3,condition1,H3K4me3,condition1_H3K4me3.broadPeak
+condition2_H3K27ac,condition2,H3K27ac,condition2_H3K27ac.broadPeak
+condition3_H3K27ac,condition3,H3K27ac,condition3_H3K27ac.broadPeak
+condition3_H3K4me3,condition3,H3K4me3,condition3_H3K4me3.broadPeak
+```
 
-`samplesheet.csv`:
+Each row represents a peak file. The `sample` column should contain a unique identifier for each peak file. The `peak_file` column should contain the path to the peak file. Peak files need to be in a format that is compatible with the `bed` format. Only the first three columns of the `bed` format are used.
 
-```csv
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
+```csv title="samplesheet_bam.csv"
+sample,condition,assay,signal,control
+condition1_H3K27ac_1,condition1,H3K27ac,condition1_H3K27ac_1.bam,condition1_control.bam
+condition1_H3K27ac_2,condition1,H3K27ac,condition1_H3K27ac_2.bam,condition1_control.bam
+condition1_H3K4me3,condition1,H3K4me3,condition1_H3K4me3.bam,condition1_control.bam
+condition2_H3K27ac,condition2,H3K27ac,condition2_H3K27ac.bam,condition2_control.bam
+condition3_H3K27ac,condition3,H3K27ac,condition3_H3K27ac.bam,condition3_control.bam
+condition3_H3K4me3,condition3,H3K4me3,condition3_H3K4me3.bam,condition3_control.bam
 ```
 
-Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
+The first three columns are the same as in the peak file samplesheet. The `signal` column should contain the path to the signal BAM file. The `control` column should contain the path to the control BAM file.
 
--->
+Second, you need a raw count matrix (e.g. from [nf-core/rnaseq](https://nf-co.re/rnaseq)) with gene IDs as rows and samples as columns. You also need a design matrix that specifies the conditions of the samples in the count matrix. The design matrix should look as follows:
 
-Now, you can run the pipeline using:
+```csv title="design_matrix.csv"
+sample,condition
+sample1,condition1
+sample2,condition1
+sample3,condition2
+sample4,condition3
+```
 
-<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
+The `sample` column should match the columns in the expression matrix. The `condition` column is needs to match the `condition` column in the samplesheet. Additionally,batches can be added to the design matrix and will be considered in the differential expression analysis.
+
+:::tip
+There is an alternative way of providing expression values. Instead of providing a single count matrix for all samples, you can provide a gene list and one count file per sample. Details can be found in the [usage documentation](https://nf-co.re/tfactivity/usage).
+:::
+
+Now, you can run the pipeline using:
 
 ```bash
 nextflow run nf-core/tfactivity \
    -profile <docker/singularity/.../institute> \
    --input samplesheet.csv \
+   --genome GRCh38 \
+   --counts <EXPRESSION_MATRIX> \
+   --counts_design design_matrix.csv \
    --outdir <OUTDIR>
 ```
 
@@ -76,11 +112,12 @@ For more details about the output files and reports, please refer to the
 
 ## Credits
 
-nf-core/tfactivity was originally written by Nico Trummer.
+nf-core/tfactivity was originally written by [Nico Trummer](https://github.com/nictru).
 
 We thank the following people for their extensive assistance in the development of this pipeline:
 
-<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
+- [Markus Hoffmann](https://scholar.google.com/citations?user=_qXUS28AAAAJ) (project and scientific management)
+- [Leon Hafner](https://www.linkedin.com/in/leon-hafner/) (implementations)
 
 ## Contributions and Support
 
@@ -90,6 +127,12 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
 
 ## Citations
 
+> **TF-Prioritizer: a Java pipeline to prioritize condition-specific transcription factors**
+>
+> Markus Hoffmann, Nico Trummer, Leon Schwartz, Jakub Jankowski, Hye Kyung Lee, Lina-Liv Willruth, Olga Lazareva, Kevin Yuan, Nina Baumgarten, Florian Schmidt, Jan Baumbach, Marcel H Schulz, David B Blumenthal, Lothar Hennighausen & Markus List
+>
+> GigaScience, Volume 12, 2023, giad026, https://doi.org/10.1093/gigascience/giad026
+
 <!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
 <!-- If you use nf-core/tfactivity for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
-Original file line number
+Diff line change
@@ Expand Up / @@ -10,3 +10,4 @@ testing/ @@
     testing*
     *.pyc
     bin/
+    modules/local/report/create/app/dependencies/*