Skip to content

Commit

Permalink
Merge branch 'dev' into nf-core-template-merge-3.0.2
Browse files Browse the repository at this point in the history
  • Loading branch information
nictru committed Oct 17, 2024
2 parents 2bfd1d7 + cc831f1 commit 575e736
Show file tree
Hide file tree
Showing 193 changed files with 37,820 additions and 349 deletions.
4 changes: 4 additions & 0 deletions .editorconfig
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,10 @@ insert_final_newline = unset
trim_trailing_whitespace = unset
indent_style = unset

[/modules/local/report/create/app/dependencies/**]
indent_size = unset
trim_trailing_whitespace = unset

[/assets/email*]
indent_size = unset

Expand Down
1 change: 1 addition & 0 deletions .prettierignore
Original file line number Diff line number Diff line change
Expand Up @@ -10,3 +10,4 @@ testing/
testing*
*.pyc
bin/
modules/local/report/create/app/dependencies/*
2 changes: 1 addition & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v0.0.1dev - [date]
## v0.0.1dev - [2024-10-17]

Initial release of nf-core/tfactivity, created with the [nf-core](https://nf-co.re/) template.

Expand Down
16 changes: 16 additions & 0 deletions CITATIONS.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,22 @@
## Pipeline tools

- [DeSeq2](https://doi.org/10.1186/s13059-014-0550-8)

> Love, M.I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550 (2014).
- [STARE](https://doi.org/10.1093/bioinformatics/btad062)

> Dennis Hecker, Fatemeh Behjati Ardakani, Alexander Karollus, Julien Gagneur, Marcel H Schulz, The adapted Activity-By-Contact model for enhancer–gene assignment and its application to single-cell data, Bioinformatics, Volume 39, Issue 2, February 2023
- [Bedtools](https://doi.org/10.1093%2Fbioinformatics%2Fbtq033)

> Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PMID: 20110278; PMCID: PMC2832824.
- [GTFtools](https://doi.org/10.1093/bioinformatics/btac561)

> Hong-Dong Li, Cui-Xiang Lin, Jiantao Zheng, GTFtools: a software package for analyzing various features of gene models, Bioinformatics, Volume 38, Issue 20, 15 October 2022, Pages 4806–4808,
## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
Expand Down
87 changes: 65 additions & 22 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,47 +19,83 @@

## Introduction

**nf-core/tfactivity** is a bioinformatics pipeline that ...
**nf-core/tfactivity** is a bioinformatics pipeline that can identify the most differentially active transcription factors (TFs) between multiple conditions. It takes a count matrix and open chromatin data (ATAC-seq, DNase-seq, HM-ChIP-seq) as input. It produces a ranking of transcription factors.

<!-- TODO nf-core:
Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
major pipeline sections and the types of output it produces. You're giving an overview to someone new
to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
-->
![Metro map](docs/images/metromap.png)

<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
1. Identify accessible regions (can perform footprinting between close ChIP-seq peaks or take ATAC-seq peaks)
2. Calculate affinity scores for combinations of transcription factors and target genes (TGs) using [STARE](https://doi.org/10.1093/bioinformatics/btad062)
3. Identify differentially expressed genes between conditions
4. Utilize linear regression to identify the transcription factors that are most likely to be responsible for the differential gene expression
5. Calculate the TF-TG score based on:
1. Differential expression of the target genes
2. Affinity of the transcription factors to the target genes
3. The regression coefficient of the transcription factors
6. Perform a Mann-Whitney U test and create a ranking of the transcription factors

A more biological visualization of the workflow can be found here:

> [!NOTE]
> The following image was created for the TF-Prioritizer publication. Parts of the workflow have been adapted for the nf-core pipeline, but the general idea is still valid.
![TF-Prioritizer workflow](docs/images/tfprio.jpeg)

## Usage

> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
Explain what rows and columns represent. For instance (please edit as appropriate):
The pipeline supports processing of previously called peaks from ATAC-seq, DNase-seq, or histone modification ChIP-seq data. The peaks can then either be used as-is or be subjected to footprinting analysis. Additionally, BAM files can be provided in a separate samplesheet, which will be used to predict enhancer regions.

First, prepare a samplesheet with your input data that looks as follows:
```csv title="samplesheet.csv"
sample,condition,assay,peak_file
condition1_H3K27ac_1,condition1,H3K27ac,condition1_H3K27ac_1.broadPeak
condition1_H3K27ac_2,condition1,H3K27ac,condition1_H3K27ac_2.broadPeak
condition1_H3K4me3,condition1,H3K4me3,condition1_H3K4me3.broadPeak
condition2_H3K27ac,condition2,H3K27ac,condition2_H3K27ac.broadPeak
condition3_H3K27ac,condition3,H3K27ac,condition3_H3K27ac.broadPeak
condition3_H3K4me3,condition3,H3K4me3,condition3_H3K4me3.broadPeak
```

`samplesheet.csv`:
Each row represents a peak file. The `sample` column should contain a unique identifier for each peak file. The `peak_file` column should contain the path to the peak file. Peak files need to be in a format that is compatible with the `bed` format. Only the first three columns of the `bed` format are used.

```csv
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
```csv title="samplesheet_bam.csv"
sample,condition,assay,signal,control
condition1_H3K27ac_1,condition1,H3K27ac,condition1_H3K27ac_1.bam,condition1_control.bam
condition1_H3K27ac_2,condition1,H3K27ac,condition1_H3K27ac_2.bam,condition1_control.bam
condition1_H3K4me3,condition1,H3K4me3,condition1_H3K4me3.bam,condition1_control.bam
condition2_H3K27ac,condition2,H3K27ac,condition2_H3K27ac.bam,condition2_control.bam
condition3_H3K27ac,condition3,H3K27ac,condition3_H3K27ac.bam,condition3_control.bam
condition3_H3K4me3,condition3,H3K4me3,condition3_H3K4me3.bam,condition3_control.bam
```

Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
The first three columns are the same as in the peak file samplesheet. The `signal` column should contain the path to the signal BAM file. The `control` column should contain the path to the control BAM file.

-->
Second, you need a raw count matrix (e.g. from [nf-core/rnaseq](https://nf-co.re/rnaseq)) with gene IDs as rows and samples as columns. You also need a design matrix that specifies the conditions of the samples in the count matrix. The design matrix should look as follows:

Now, you can run the pipeline using:
```csv title="design_matrix.csv"
sample,condition
sample1,condition1
sample2,condition1
sample3,condition2
sample4,condition3
```

<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
The `sample` column should match the columns in the expression matrix. The `condition` column is needs to match the `condition` column in the samplesheet. Additionally,batches can be added to the design matrix and will be considered in the differential expression analysis.

:::tip
There is an alternative way of providing expression values. Instead of providing a single count matrix for all samples, you can provide a gene list and one count file per sample. Details can be found in the [usage documentation](https://nf-co.re/tfactivity/usage).
:::

Now, you can run the pipeline using:

```bash
nextflow run nf-core/tfactivity \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--genome GRCh38 \
--counts <EXPRESSION_MATRIX> \
--counts_design design_matrix.csv \
--outdir <OUTDIR>
```

Expand All @@ -76,11 +112,12 @@ For more details about the output files and reports, please refer to the

## Credits

nf-core/tfactivity was originally written by Nico Trummer.
nf-core/tfactivity was originally written by [Nico Trummer](https://github.com/nictru).

We thank the following people for their extensive assistance in the development of this pipeline:

<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
- [Markus Hoffmann](https://scholar.google.com/citations?user=_qXUS28AAAAJ) (project and scientific management)
- [Leon Hafner](https://www.linkedin.com/in/leon-hafner/) (implementations)

## Contributions and Support

Expand All @@ -90,6 +127,12 @@ For further information or help, don't hesitate to get in touch on the [Slack `#

## Citations

> **TF-Prioritizer: a Java pipeline to prioritize condition-specific transcription factors**
>
> Markus Hoffmann, Nico Trummer, Leon Schwartz, Jakub Jankowski, Hye Kyung Lee, Lina-Liv Willruth, Olga Lazareva, Kevin Yuan, Nina Baumgarten, Florian Schmidt, Jan Baumbach, Marcel H Schulz, David B Blumenthal, Lothar Hennighausen & Markus List
>
> GigaScience, Volume 12, 2023, giad026, https://doi.org/10.1093/gigascience/giad026
<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
<!-- If you use nf-core/tfactivity for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->

Expand Down
Loading

0 comments on commit 575e736

Please sign in to comment.