From 8b7bd9d414b5938c92e1cae09fcd11caeddb8f58 Mon Sep 17 00:00:00 2001
From: christopher-hakkaart
Date: Fri, 22 Apr 2022 10:12:24 +0200
Subject: [PATCH] Make pretty

---
 CHANGELOG.md | 134 ++++++++++++------------
 CITATIONS.md | 116 ++++++++++++---------
 README.md | 40 +++----
 assets/multiqc_config.yaml | 14 +--
 docs/output.md | 208 ++++++++++++++++++-------------------
 docs/usage.md | 28 ++---
 modules.json | 2 +-
 nextflow_schema.json | 5 +-
 8 files changed, 282 insertions(+), 265 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3fe6399d..69625230 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,36 +7,36 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Major enhancements
-* Add DNA variant calling functionality
-* Add RNA modification and fusion detection functionalities
-* Add `demux_fast5` module to output demultiplexed fast5 files when `--output_demultiplex_fast5` is set
-* Add `--trim_barcodes` in Guppy basecaller to trim the barcodes from output fastq
-* Port pipeline to the updated Nextflow DSL2 syntax adopted on nf-core/modules
- * Removed `--publish_dir_mode` as it is no longer required for the new syntax
-* Bump minimum Nextflow version from 21.04.0 -> 21.10.3
-* Update pipeline template to nf-core/tools `2.2`
-* Update `bambu` version from `1.0.2` to `2.0.0`
-* Update `multiqc` version from `1.10.1` to `1.11`
+- Add DNA variant calling functionality
+- Add RNA modification and fusion detection functionalities
+- Add `demux_fast5` module to output demultiplexed fast5 files when `--output_demultiplex_fast5` is set
+- Add `--trim_barcodes` in Guppy basecaller to trim the barcodes from output fastq
+- Port pipeline to the updated Nextflow DSL2 syntax adopted on nf-core/modules
+ - Removed `--publish_dir_mode` as it is no longer required for the new syntax
+- Bump minimum Nextflow version from 21.04.0 -> 21.10.3
+- Update pipeline template to nf-core/tools `2.2`
+- Update `bambu` version from `1.0.2` to `2.0.0`
+- Update `multiqc` version from `1.10.1` to `1.11`
 ### Parameters
-* Added `--output_demultiplex_fast5` to output demultiplexed fast5
-* Added `--trim_barcodes` in Guppy basecaller to trim the barcodes from output fastq
-* Added `--call_variants` to detect DNA variants
-* Added `--split_mnps` to split multi-nucleotide polymorphisms into single nucleotide polymorphisms
-* Added `--phase_vcf` to output a phased vcf
-* Added `--skip_medaka` to skip `medaka_variant`
-* Added `--skip_sniffles` to skip `sniffles`
-* Added `--skip_modification_analysis` to skip RNA modification detection
-* Added `--skip_xpore` to skip `xpore`
-* Added `--skip_m6anet` to skip `m6anet`
-* Added `--skip_fusion_analysis` to skip RNA fusion detection
-* Added `--jaffal_ref_dir` to indicate the reference directory path required by `JAFFAL`
+- Added `--output_demultiplex_fast5` to output demultiplexed fast5
+- Added `--trim_barcodes` in Guppy basecaller to trim the barcodes from output fastq
+- Added `--call_variants` to detect DNA variants
+- Added `--split_mnps` to split multi-nucleotide polymorphisms into single nucleotide polymorphisms
+- Added `--phase_vcf` to output a phased vcf
+- Added `--skip_medaka` to skip `medaka_variant`
+- Added `--skip_sniffles` to skip `sniffles`
+- Added `--skip_modification_analysis` to skip RNA modification detection
+- Added `--skip_xpore` to skip `xpore`
+- Added `--skip_m6anet` to skip `m6anet`
+- Added `--skip_fusion_analysis` to skip RNA fusion detection
+- Added `--jaffal_ref_dir` to indicate the reference directory path required by `JAFFAL`
 ### Software dependencies
 | Dependency | Old version | New version |
-|-------------------------|-------------|-------------|
+| ----------------------- | ----------- | ----------- |
 | `bioconductor-bambu` | 1.0.2 | 2.0.0 |
 | `bioconductor-bsgenome` | 1.58.0 | 1.62.0 |
 | `guppy` | 4.0.14 | 5.0.16 |
@@ -50,77 +50,77 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Bug fix
-* The `GET_TEST_DATA` process now uses checks for any file in the path.
+- The `GET_TEST_DATA` process now uses checks for any file in the path.
-> **NB:** Dependency has been __updated__ if both old and new version information is present.
-> **NB:** Dependency has been __added__ if just the new version information is present.
-> **NB:** Dependency has been __removed__ if version information isn't present.
+> **NB:** Dependency has been **updated** if both old and new version information is present.
+> **NB:** Dependency has been **added** if just the new version information is present.
+> **NB:** Dependency has been **removed** if version information isn't present.
 ## [2.0.1] - 2021-11-29
 ### Bug fix
-* The `UCSC_BEDGRAPHTOBIGWIG` process now uses the `ucsc-bedgraphtobigwig` container
-* The full-size and minimal AWS tests have successfully finished after changing to the `ucsc-bedgraphtobigwig` container
+- The `UCSC_BEDGRAPHTOBIGWIG` process now uses the `ucsc-bedgraphtobigwig` container
+- The full-size and minimal AWS tests have successfully finished after changing to the `ucsc-bedgraphtobigwig` container
 ## [2.0.0] - 2021-11-26
 ### Major enhancements
-* Pipeline has been re-implemented in [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html)
-* Software containers are now obtained from [Biocontainers](https://biocontainers.pro/#/registry)
-* Update pipeline template to nf-core/tools `2.1`
-* [#77](https://github.com/nf-core/nanoseq/issues/77) - Skipped alignment steps
-* [#97](https://github.com/nf-core/nanoseq/issues/97) - Add optional DNA cleaning option
+- Pipeline has been re-implemented in [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html)
+- Software containers are now obtained from [Biocontainers](https://biocontainers.pro/#/registry)
+- Update pipeline template to nf-core/tools `2.1`
+- [#77](https://github.com/nf-core/nanoseq/issues/77) - Skipped alignment steps
+- [#97](https://github.com/nf-core/nanoseq/issues/97) - Add optional DNA cleaning option
 ### Parameters
-* Added `--run_nanolyse` to run NanoLyse for DNA cleaning of FastQ files
-* Added `--nanolyse_fasta` to provide a fasta file for nanolyse to filter against
+- Added `--run_nanolyse` to run NanoLyse for DNA cleaning of FastQ files
+- Added `--nanolyse_fasta` to provide a fasta file for nanolyse to filter against
 ### Software dependencies
-| Dependency | Old version | New version |
-|-------------------------|-------------|-------------|
-| `bioconductor-bambu` | 1.0.0 | 1.0.2 |
-| `nanolyse` | | 1.2.0 |
-| `r-base` | 4.0.3 | 4.0.2 |
+| Dependency | Old version | New version |
+| -------------------- | ----------- | ----------- |
+| `bioconductor-bambu` | 1.0.0 | 1.0.2 |
+| `nanolyse` | | 1.2.0 |
+| `r-base` | 4.0.3 | 4.0.2 |
-> **NB:** Dependency has been __updated__ if both old and new version information is present.
-> **NB:** Dependency has been __added__ if just the new version information is present.
-> **NB:** Dependency has been __removed__ if version information isn't present.
+> **NB:** Dependency has been **updated** if both old and new version information is present.
+> **NB:** Dependency has been **added** if just the new version information is present.
+> **NB:** Dependency has been **removed** if version information isn't present.
 ## [1.1.0] - 2020-11-06
 ### Major enhancements
-* Transcript reconstruction and quantification ([`bambu`](https://bioconductor.org/packages/release/bioc/html/bambu.html) or [`StringTie2`](https://ccb.jhu.edu/software/stringtie/) and [`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/))
-* Differential expression analysis at the gene-level ([`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html)) and transcript-level ([`DEXSeq`](https://bioconductor.org/packages/release/bioc/html/DEXSeq.html))
-* Ability to provide BAM input to the pipeline
-* Change samplesheet format to be more flexible to BAM input files
-* Add pycoQC and featureCounts output to MultiQC report
-* Add AWS full-sized test data
-* Add parameter JSON schema for pipeline
-* Add citations file
-* Update pipeline template to nf-core/tools `1.11`
-* Collapsible sections for output files in `docs/output.md`
-* Replace `set` with `tuple` and `file` with `path` in `input` section of all processes
-* Capitalise process names
-* Added `--gpus all` to Docker `runOptions` when using GPU as mentioned [here](https://github.com/docker/compose/issues/6691#issuecomment-514429646)
-* Cannot invoke method `containsKey()` on null object when `--igenomes_ignore` is set [#76](https://github.com/nf-core/nanoseq/issues/76)
+- Transcript reconstruction and quantification ([`bambu`](https://bioconductor.org/packages/release/bioc/html/bambu.html) or [`StringTie2`](https://ccb.jhu.edu/software/stringtie/) and [`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/))
+- Differential expression analysis at the gene-level ([`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html)) and transcript-level ([`DEXSeq`](https://bioconductor.org/packages/release/bioc/html/DEXSeq.html))
+- Ability to provide BAM input to the pipeline
+- Change samplesheet format to be more flexible to BAM input files
+- Add pycoQC and featureCounts output to MultiQC report
+- Add AWS full-sized test data
+- Add parameter JSON schema for pipeline
+- Add citations file
+- Update pipeline template to nf-core/tools `1.11`
+- Collapsible sections for output files in `docs/output.md`
+- Replace `set` with `tuple` and `file` with `path` in `input` section of all processes
+- Capitalise process names
+- Added `--gpus all` to Docker `runOptions` when using GPU as mentioned [here](https://github.com/docker/compose/issues/6691#issuecomment-514429646)
+- Cannot invoke method `containsKey()` on null object when `--igenomes_ignore` is set [#76](https://github.com/nf-core/nanoseq/issues/76)
 ### Parameters
-* Added `--barcode_both_ends` requires barcode on both ends for Guppy basecaller
-* Added `--quantification_method` to specify the transcript quantification method to use
-* Added `--skip_quantification` to skip transcript quantification and differential analysis
-* Added `--skip_differential_analysis` to skip differential analysis with DESeq2 and DEXSeq
-* Added `--publish_dir_mode` to customise method of publishing results to output directory [nf-core/tools#585](https://github.com/nf-core/tools/issues/585)
+- Added `--barcode_both_ends` requires barcode on both ends for Guppy basecaller
+- Added `--quantification_method` to specify the transcript quantification method to use
+- Added `--skip_quantification` to skip transcript quantification and differential analysis
+- Added `--skip_differential_analysis` to skip differential analysis with DESeq2 and DEXSeq
+- Added `--publish_dir_mode` to customise method of publishing results to output directory [nf-core/tools#585](https://github.com/nf-core/tools/issues/585)
 ### Software dependencies
 | Dependency | Old version | New version |
-|-------------------------|-------------|-------------|
+| ----------------------- | ----------- | ----------- |
 | `Guppy` | 3.4.4 | 4.0.14 |
 | `markdown` | 3.1.1 | 3.3.3 |
 | `multiqc` | 1.8 | 1.9 |
@@ -143,9 +143,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 | `subread` | - | 2.0.1 |
 | `psutil` | - | - |
-> **NB:** Dependency has been __updated__ if both old and new version information is present.
-> **NB:** Dependency has been __added__ if just the new version information is present.
-> **NB:** Dependency has been __removed__ if version information isn't present.
+> **NB:** Dependency has been **updated** if both old and new version information is present.
+> **NB:** Dependency has been **added** if just the new version information is present.
+> **NB:** Dependency has been **removed** if version information isn't present.
 ## [1.0.0] - 2020-03-05
diff --git a/CITATIONS.md b/CITATIONS.md
index 9750d4f3..7e2b314b 100644
--- a/CITATIONS.md
+++ b/CITATIONS.md
@@ -10,81 +10,101 @@
 ## Pipeline tools
-* [BEDTools](https://www.ncbi.nlm.nih.gov/pubmed/20110278/)
- > Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PubMed PMID: 20110278; PubMed Central PMCID: PMC2832824.
+- [BEDTools](https://www.ncbi.nlm.nih.gov/pubmed/20110278/)
-* [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
+ > Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PubMed PMID: 20110278; PubMed Central PMCID: PMC2832824.
-* [featureCounts](https://www.ncbi.nlm.nih.gov/pubmed/24227677/)
- > Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014 Apr 1;30(7):923-30. doi: 10.1093/bioinformatics/btt656. Epub 2013 Nov 13. PubMed PMID: 24227677.
+- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
-* [GraphMap](https://pubmed.ncbi.nlm.nih.gov/27079541/)
- > Sović I, Šikić M, Wilm A, Fenlon SN, Chen S, Nagarajan N. Fast and sensitive mapping of nanopore sequencing reads with GraphMap. Nat Commun. 2016 Apr 15;7:11307. doi: 10.1038/ncomms11307. PMID: 27079541; PMCID: PMC4835549.
+- [featureCounts](https://www.ncbi.nlm.nih.gov/pubmed/24227677/)
-* [Guppy](https://nanoporetech.com/nanopore-sequencing-data-analysis)
+ > Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014 Apr 1;30(7):923-30. doi: 10.1093/bioinformatics/btt656. Epub 2013 Nov 13. PubMed PMID: 24227677.
-* [JAFFAL](https://doi.org/10.1186/s13059-021-02588-5)
- > Davidson NM, et al., JAFFAL: detecting fusion genes with long-read transcriptome sequencing. Genome Biology (2022)
+- [GraphMap](https://pubmed.ncbi.nlm.nih.gov/27079541/)
-* [m6anet](https://www.biorxiv.org/content/10.1101/2021.09.20.461055v1)
- > Hendra C, et al., Detection of m6A from direct RNA sequencing using a Multiple Instance Learning framework. bioRXiv (2021)
+ > Sović I, Šikić M, Wilm A, Fenlon SN, Chen S, Nagarajan N. Fast and sensitive mapping of nanopore sequencing reads with GraphMap. Nat Commun. 2016 Apr 15;7:11307. doi: 10.1038/ncomms11307. PMID: 27079541; PMCID: PMC4835549.
-* [Minimap2](https://pubmed.ncbi.nlm.nih.gov/29750242/)
- > Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 Sep 15;34(18):3094-3100. doi: 10.1093/bioinformatics/bty191. PMID: 29750242; PMCID: PMC6137996.
+- [Guppy](https://nanoporetech.com/nanopore-sequencing-data-analysis)
-* [Medaka](https://github.com/nanoporetech/medaka)
+- [JAFFAL](https://doi.org/10.1186/s13059-021-02588-5)
-* [MultiQC](https://www.ncbi.nlm.nih.gov/pubmed/27312411/)
- > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
+ > Davidson NM, et al., JAFFAL: detecting fusion genes with long-read transcriptome sequencing. Genome Biology (2022)
-* [NanoLyse](https://pubmed.ncbi.nlm.nih.gov/29547981/)
- > De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M., & Van Broeckhoven, C. (2018). NanoPack: visualizing and processing long-read sequencing data. Bioinformatics, 34(15), 2666-2669. PubMed PMID: 29547981; PubMed Central PMCID: PMC6061794.
+- [m6anet](https://www.biorxiv.org/content/10.1101/2021.09.20.461055v1)
-* [NanoPlot](https://pubmed.ncbi.nlm.nih.gov/29547981/)
- > De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018 Aug 1;34(15):2666-2669. doi: 10.1093/bioinformatics/bty149. PubMed PMID: 29547981; PubMed Central PMCID: PMC6061794.
+ > Hendra C, et al., Detection of m6A from direct RNA sequencing using a Multiple Instance Learning framework. bioRXiv (2021)
-* [pycoQC](https://doi.org/10.21105/joss.01236)
- > Leger A, Leonardi T, (2019). pycoQC, interactive quality control for Oxford Nanopore Sequencing. Journal of Open Source Software, 4(34), 1236.
+- [Minimap2](https://pubmed.ncbi.nlm.nih.gov/29750242/)
-* [qcat](https://github.com/nanoporetech/qcat)
+ > Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 Sep 15;34(18):3094-3100. doi: 10.1093/bioinformatics/bty191. PMID: 29750242; PMCID: PMC6137996.
-* [SAMtools](https://www.ncbi.nlm.nih.gov/pubmed/19505943/)
- > Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002.
+- [Medaka](https://github.com/nanoporetech/medaka)
-* [Sniffles](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5990442/)
- > Sedlazeck FJ, Rescheneder P, Smolka M, Fang H, Nattestad M, von Haeseler A, Schatz MC. Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods. 2018 Jun;15(6):461-468. doi: 10 1038/s41592-018-0001-7. Epub 2018 Apr 30. PMID: 29713083; PMCID: PMC5990442.
+- [MultiQC](https://www.ncbi.nlm.nih.gov/pubmed/27312411/)
-* [StringTie2](https://www.ncbi.nlm.nih.gov/pubmed/31842956/)
- > Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2 Genome Biol. 2019 Dec 16;20(1):278. doi: 10.1186/s13059-019-1910-1. PubMed PMID: 31842956; PubMed Central PMCID: PMC6912988.
+ > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
-* [UCSC tools](https://www.ncbi.nlm.nih.gov/pubmed/20639541/)
- > Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics. 2010 Sep 1;26(17):2204-7. doi: 10.1093/bioinformatics/btq351. Epub 2010 Jul 17. PubMed PMID: 20639541; PubMed Central PMCID: PMC2922891.
+- [NanoLyse](https://pubmed.ncbi.nlm.nih.gov/29547981/)
-* [xPore](https://doi.org/10.1038/s41587-021-00949-w)
- > Pratanwanich PN, et al.,Identification of differential RNA modifications from nanopore direct RNA sequencing with xPore. Nat Biotechnol (2021)
+ > De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M., & Van Broeckhoven, C. (2018). NanoPack: visualizing and processing long-read sequencing data. Bioinformatics, 34(15), 2666-2669. PubMed PMID: 29547981; PubMed Central PMCID: PMC6061794.
+
+- [NanoPlot](https://pubmed.ncbi.nlm.nih.gov/29547981/)
+
+ > De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018 Aug 1;34(15):2666-2669. doi: 10.1093/bioinformatics/bty149. PubMed PMID: 29547981; PubMed Central PMCID: PMC6061794.
+
+- [pycoQC](https://doi.org/10.21105/joss.01236)
+
+ > Leger A, Leonardi T, (2019). pycoQC, interactive quality control for Oxford Nanopore Sequencing. Journal of Open Source Software, 4(34), 1236.
+
+- [qcat](https://github.com/nanoporetech/qcat)
+
+- [SAMtools](https://www.ncbi.nlm.nih.gov/pubmed/19505943/)
+
+ > Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002.
+
+- [Sniffles](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5990442/)
+
+ > Sedlazeck FJ, Rescheneder P, Smolka M, Fang H, Nattestad M, von Haeseler A, Schatz MC. Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods. 2018 Jun;15(6):461-468. doi: 10 1038/s41592-018-0001-7. Epub 2018 Apr 30. PMID: 29713083; PMCID: PMC5990442.
+
+- [StringTie2](https://www.ncbi.nlm.nih.gov/pubmed/31842956/)
+
+ > Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2 Genome Biol. 2019 Dec 16;20(1):278. doi: 10.1186/s13059-019-1910-1. PubMed PMID: 31842956; PubMed Central PMCID: PMC6912988.
+
+- [UCSC tools](https://www.ncbi.nlm.nih.gov/pubmed/20639541/)
+
+ > Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics. 2010 Sep 1;26(17):2204-7. doi: 10.1093/bioinformatics/btq351. Epub 2010 Jul 17. PubMed PMID: 20639541; PubMed Central PMCID: PMC2922891.
+
+- [xPore](https://doi.org/10.1038/s41587-021-00949-w)
+ > Pratanwanich PN, et al.,Identification of differential RNA modifications from nanopore direct RNA sequencing with xPore. Nat Biotechnol (2021)
 ## R packages
-* [R](https://www.R-project.org/)
- > R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
+- [R](https://www.R-project.org/)
+
+ > R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
+
+- [bambu](https://bioconductor.org/packages/release/bioc/html/bambu.html)
+
+ > Chen Y, Goeke J, Wan YK (2020). bambu: Reference-guided isoform reconstruction and quantification for long read RNA-Seq data. R package version 1.0.0.
+
+- [BSgenome](https://bioconductor.org/packages/release/bioc/html/BSgenome.html)
+
+ > Pagès H (2020). BSgenome: Software infrastructure for efficient representation of full genomes and their SNPs. doi: 10.18129/B9.bioc.BSgenome.
+
+- [DESeq2](https://www.ncbi.nlm.nih.gov/pubmed/25516281/)
-* [bambu](https://bioconductor.org/packages/release/bioc/html/bambu.html)
- > Chen Y, Goeke J, Wan YK (2020). bambu: Reference-guided isoform reconstruction and quantification for long read RNA-Seq data. R package version 1.0.0.
+ > Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. PubMed PMID: 25516281; PubMed Central PMCID: PMC4302049.
-* [BSgenome](https://bioconductor.org/packages/release/bioc/html/BSgenome.html)
- > Pagès H (2020). BSgenome: Software infrastructure for efficient representation of full genomes and their SNPs. doi: 10.18129/B9.bioc.BSgenome.
+- [DEXSeq](https://pubmed.ncbi.nlm.nih.gov/22722343/)
-* [DESeq2](https://www.ncbi.nlm.nih.gov/pubmed/25516281/)
- > Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. PubMed PMID: 25516281; PubMed Central PMCID: PMC4302049.
+ > Anders S, Reyes A, Huber W. Detecting differential usage of exons from RNA-seq data. Genome Res. 2012 Oct;22(10):2008-17. doi: 10.1101/gr.133744.111. Epub 2012 Jun 21. PubMed PMID: 22722343; PubMed Central PMCID: PMC3460195.
-* [DEXSeq](https://pubmed.ncbi.nlm.nih.gov/22722343/)
- > Anders S, Reyes A, Huber W. Detecting differential usage of exons from RNA-seq data. Genome Res. 2012 Oct;22(10):2008-17. doi: 10.1101/gr.133744.111. Epub 2012 Jun 21. PubMed PMID: 22722343; PubMed Central PMCID: PMC3460195.
+- [DRIMSeq](https://pubmed.ncbi.nlm.nih.gov/28105305/)
-* [DRIMSeq](https://pubmed.ncbi.nlm.nih.gov/28105305/)
- > Nowicka M, Robinson MD. DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics. F1000Res. 2016 Jun 13;5:1356. doi: 10.12688/f1000research.8900.2. PubMed PMID: 28105305; PubMed Central PMCID: PMC5200948.
+ > Nowicka M, Robinson MD. DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics. F1000Res. 2016 Jun 13;5:1356. doi: 10.12688/f1000research.8900.2. PubMed PMID: 28105305; PubMed Central PMCID: PMC5200948.
-* [stageR](https://pubmed.ncbi.nlm.nih.gov/28784146/)
- > Van den Berge K, Soneson C, Robinson MD, Clement L. stageR: a general stage-wise method for controlling the gene-level false discovery rate in differential expression and differential transcript usage. Genome Biol. 2017 Aug 7;18(1):151. doi: 10.1186/s13059-017-1277-0. PubMed PMID: 28784146; PubMed Central PMCID: PMC5547545.
+- [stageR](https://pubmed.ncbi.nlm.nih.gov/28784146/)
+ > Van den Berge K, Soneson C, Robinson MD, Clement L. stageR: a general stage-wise method for controlling the gene-level false discovery rate in differential expression and differential transcript usage. Genome Biol. 2017 Aug 7;18(1):151. doi: 10.1186/s13059-017-1277-0. PubMed PMID: 28784146; PubMed Central PMCID: PMC5547545.
 ## Software packaging/containerisation tools
diff --git a/README.md b/README.md
index 4ae0867d..08115fd5 100644
--- a/README.md
+++ b/README.md
@@ -25,24 +25,24 @@ On release, automated continuous integration tests run the pipeline on a [full-s
 ## Pipeline Summary
-1. Basecalling and/or demultiplexing ([`Guppy`](https://nanoporetech.com/nanopore-sequencing-data-analysis), [`demux_fast5`](https://github.com/nanoporetech/ont_fast5_api#demux_fast5) or [`qcat`](https://github.com/nanoporetech/qcat); *optional*)
+1. Basecalling and/or demultiplexing ([`Guppy`](https://nanoporetech.com/nanopore-sequencing-data-analysis), [`demux_fast5`](https://github.com/nanoporetech/ont_fast5_api#demux_fast5) or [`qcat`](https://github.com/nanoporetech/qcat); _optional_)
 2. Sequencing QC ([`pycoQC`](https://github.com/a-slide/pycoQC), [`NanoPlot`](https://github.com/wdecoster/NanoPlot))
-3. Raw read DNA cleaning ([NanoLyse](https://github.com/wdecoster/nanolyse); *optional*)
+3. Raw read DNA cleaning ([NanoLyse](https://github.com/wdecoster/nanolyse); _optional_)
 4. Raw read QC ([`NanoPlot`](https://github.com/wdecoster/NanoPlot), [`FastQC`](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
 5. Alignment ([`GraphMap2`](https://github.com/lbcb-sci/graphmap2) or [`minimap2`](https://github.com/lh3/minimap2))
- * Both aligners are capable of performing unspliced and spliced alignment. Sensible defaults will be applied automatically based on a combination of the input data and user-specified parameters
- * Each sample can be mapped to its own reference genome if multiplexed in this way
- * Convert SAM to co-ordinate sorted BAM and obtain mapping metrics ([`SAMtools`](http://www.htslib.org/doc/samtools.html))
+ - Both aligners are capable of performing unspliced and spliced alignment. Sensible defaults will be applied automatically based on a combination of the input data and user-specified parameters
+ - Each sample can be mapped to its own reference genome if multiplexed in this way
+ - Convert SAM to co-ordinate sorted BAM and obtain mapping metrics ([`SAMtools`](http://www.htslib.org/doc/samtools.html))
 6. Create bigWig ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/)) and bigBed ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedToBigBed`](http://hgdownload.soe.ucsc.edu/admin/exe/)) coverage tracks for visualisation
 7. DNA-specific downstream analysis:
- * DNA variant calling ([`medaka`](https://github.com/nanoporetech/medaka) and/or [`sniffles`](https://github.com/fritzsedlazeck/Sniffles))
+ - DNA variant calling ([`medaka`](https://github.com/nanoporetech/medaka) and/or [`sniffles`](https://github.com/fritzsedlazeck/Sniffles))
 8. RNA-specific downstream analysis:
- * Transcript reconstruction and quantification ([`bambu`](https://bioconductor.org/packages/release/bioc/html/bambu.html) or [`StringTie2`](https://ccb.jhu.edu/software/stringtie/))
- * bambu performs both transcript reconstruction and quantification.
- * When StringTie2 is chosen, each sample can be processed individually and combined. After which, [`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/) will be used for both gene and transcript quantification.
- * Differential expression analysis ([`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) and/or [`DEXSeq`](https://bioconductor.org/packages/release/bioc/html/DEXSeq.html))
- * RNA modification detection ([`xpore`](https://github.com/GoekeLab/xpore) and/or [`m6anet`](https://github.com/GoekeLab/m6anet))
- * RNA fusion detection ([`JAFFAL`](https://github.com/Oshlack/JAFFA))
+ - Transcript reconstruction and quantification ([`bambu`](https://bioconductor.org/packages/release/bioc/html/bambu.html) or [`StringTie2`](https://ccb.jhu.edu/software/stringtie/))
+ - bambu performs both transcript reconstruction and quantification.
+ - When StringTie2 is chosen, each sample can be processed individually and combined. After which, [`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/) will be used for both gene and transcript quantification.
+ - Differential expression analysis ([`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) and/or [`DEXSeq`](https://bioconductor.org/packages/release/bioc/html/DEXSeq.html))
+ - RNA modification detection ([`xpore`](https://github.com/GoekeLab/xpore) and/or [`m6anet`](https://github.com/GoekeLab/m6anet))
+ - RNA fusion detection ([`JAFFAL`](https://github.com/Oshlack/JAFFA))
 9. Present QC for raw read and alignment results ([`MultiQC`](https://multiqc.info/docs/))
 ### Functionality Overview
@@ -61,16 +61,16 @@ A graphical overview of suggested routes through the pipeline depending on the d
 3. Download the pipeline and test it on a minimal dataset with a single command:
- ```console
- nextflow run nf-core/nanoseq -profile test,YOURPROFILE
- ```
+ ```console
+ nextflow run nf-core/nanoseq -profile test,YOURPROFILE
+ ```
- Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (`YOURPROFILE` in the example command above). You can chain multiple config profiles in a comma-separated string.
+ Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (`YOURPROFILE` in the example command above). You can chain multiple config profiles in a comma-separated string.
- > * The pipeline comes with config profiles called `docker`, `singularity`, `podman`, `shifter`, `charliecloud` and `conda` which instruct the pipeline to use the named tool for software management. For example, `-profile test,docker`.
- > * Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile ` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.
- > * If you are using `singularity` and are persistently observing issues downloading Singularity images directly due to timeout or network issues, then you can use the `--singularity_pull_docker_container` parameter to pull and convert the Docker image instead. Alternatively, you can use the [`nf-core download`](https://nf-co.re/tools/#downloading-pipelines-for-offline-use) command to download images first, before running the pipeline. Setting the [`NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir`](https://www.nextflow.io/docs/latest/singularity.html?#singularity-docker-hub) Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.
- > * If you are using `conda`, it is highly recommended to use the [`NXF_CONDA_CACHEDIR` or `conda.cacheDir`](https://www.nextflow.io/docs/latest/conda.html) settings to store the environments in a central location for future pipeline runs.
+ > - The pipeline comes with config profiles called `docker`, `singularity`, `podman`, `shifter`, `charliecloud` and `conda` which instruct the pipeline to use the named tool for software management. For example, `-profile test,docker`.
+ > - Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile ` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.
+ > - If you are using `singularity` and are persistently observing issues downloading Singularity images directly due to timeout or network issues, then you can use the `--singularity_pull_docker_container` parameter to pull and convert the Docker image instead. Alternatively, you can use the [`nf-core download`](https://nf-co.re/tools/#downloading-pipelines-for-offline-use) command to download images first, before running the pipeline. Setting the [`NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir`](https://www.nextflow.io/docs/latest/singularity.html?#singularity-docker-hub) Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.
+ > - If you are using `conda`, it is highly recommended to use the [`NXF_CONDA_CACHEDIR` or `conda.cacheDir`](https://www.nextflow.io/docs/latest/conda.html) settings to store the environments in a central location for future pipeline runs.
 4. Start running your own analysis!
diff --git a/assets/multiqc_config.yaml b/assets/multiqc_config.yaml
index 365f1651..df14117a 100644
--- a/assets/multiqc_config.yaml
+++ b/assets/multiqc_config.yaml
@@ -1,11 +1,11 @@
 report_comment: >
- This report has been generated by the nf-core/nanoseq
- analysis pipeline. For information about how to interpret these results, please see the
- documentation.
+ This report has been generated by the nf-core/nanoseq + analysis pipeline. For information about how to interpret these results, please see the + documentation. report_section_order: - software_versions: - order: -1000 - nf-core-nanoseq-summary: - order: -1001 + software_versions: + order: -1000 + nf-core-nanoseq-summary: + order: -1001 export_plots: true diff --git a/docs/output.md b/docs/output.md index 545684e8..61f12ad4 100644 --- a/docs/output.md +++ b/docs/output.md @@ -19,55 +19,55 @@ The directories listed below will be created in the output directory after the p
Output files
-* `guppy/fastq/`
-  Merged fastq output files for each barcode.
-* `guppy/basecalling//`
-  fastq output files for each barcode.
-* `guppy/basecalling/unclassified/`
-  fastq files with reads were unassigned to any given barcode.
-* `guppy/basecalling/sequencing_summary.txt`
-  Sequencing summary file generated by *Guppy*.
-* `guppy/basecalling/sequencing_telemetry.js`
-  Sequencing telemetry file generated by *Guppy*.
-* `guppy/basecalling/guppy_basecaller_log-.log`
-  Log file for *Guppy* execution.
-* `demux_fast5/demultiplexed_fast5//`
-  fast5 output files for each barcode.
-* `demux_fast5/demultiplexed_fast5/unclassified/`
-  fast5 files with reads were unassigned to any given barcode.
-* `qcat/fastq/.fastq.gz`
-  fastq output files for each barcode.
-* `qcat/fastq/none.fastq.gz`
-  fastq file with reads were unassigned to any given barcode.
+- `guppy/fastq/`
+  Merged fastq output files for each barcode.
+- `guppy/basecalling//`
+  fastq output files for each barcode.
+- `guppy/basecalling/unclassified/`
+  fastq files with reads that were unassigned to any given barcode.
+- `guppy/basecalling/sequencing_summary.txt`
+  Sequencing summary file generated by _Guppy_.
+- `guppy/basecalling/sequencing_telemetry.js`
+  Sequencing telemetry file generated by _Guppy_.
+- `guppy/basecalling/guppy_basecaller_log-.log`
+  Log file for _Guppy_ execution.
+- `demux_fast5/demultiplexed_fast5//`
+  fast5 output files for each barcode.
+- `demux_fast5/demultiplexed_fast5/unclassified/`
+  fast5 files with reads that were unassigned to any given barcode.
+- `qcat/fastq/.fastq.gz`
+  fastq output files for each barcode.
+- `qcat/fastq/none.fastq.gz`
+  fastq file with reads that were unassigned to any given barcode.
-*Documentation*: +_Documentation_: [Guppy](https://nanoporetech.com/nanopore-sequencing-data-analysis), [demux_fast5](https://github.com/nanoporetech/ont_fast5_api#demux_fast5), [qcat](https://github.com/nanoporetech/qcat) -*Description*: +_Description_: The pipeline has been written to deal with the various scenarios where you would like to include/exclude the basecalling and demultiplexing steps. This will be dependent on what type of input data you would like to provide the pipeline. Additionally, if you would like to align your samples to a reference genome there are various options for providing this information. Please see [`usage.md`](usage.md#--input) for more details about the format of the input samplesheet, associated commands and how to provide reference genome data. -*Guppy* will be used to basecall and demultiplex the data. Various options have been provided to customise specific parameters and to be able to run *Guppy* on GPUs. +_Guppy_ will be used to basecall and demultiplex the data. Various options have been provided to customise specific parameters and to be able to run _Guppy_ on GPUs. -*demux_fast5* will demultiplex the fast5 files, gives the *Guppy* summary file. +_demux_fast5_ will demultiplex the fast5 files, gives the _Guppy_ summary file. -If you have a pre-basecalled fastq file then *qcat* will be used to perform the demultiplexing if you provide the `--skip_basecalling` parameter. If you would like to skip both of these steps entirely then you can provide `--skip_basecalling --skip_demultiplexing` when running the pipeline. As a result, the structure of the output folder will depend on which steps you have chosen to run in the pipeline. +If you have a pre-basecalled fastq file then _qcat_ will be used to perform the demultiplexing if you provide the `--skip_basecalling` parameter. If you would like to skip both of these steps entirely then you can provide `--skip_basecalling --skip_demultiplexing` when running the pipeline. 
As a result, the structure of the output folder will depend on which steps you have chosen to run in the pipeline. ## Removal of DNA contaminants
Output files
-* `nanolyse/.clean.fastq.gz`
-  FastQ file after the removal of reads mapping to DNA contaminants.
+- `nanolyse/.clean.fastq.gz`
+  FastQ file after the removal of reads mapping to DNA contaminants.
-*Documentation*: +_Documentation_: [NanoLyse](https://github.com/wdecoster/nanolyse) -*Description*: +_Description_: If you would like to run NanoLyse on the raw FastQ files then you can provide `--run_nanolyse` when running the pipeline. By default, the pipeline will filter the raw reads relative to lambda phage but you can provide your own fasta file of "contaminants" with `--nanolyse_fasta`. The filtered FastQ files will contain raw reads without the provided reference sequences (default: lambda phage sequences). ## Sequencing QC @@ -75,18 +75,18 @@ If you would like to run NanoLyse on the raw FastQ files then you can provide `-
Output files
-* `pycoqc/pycoqc.html`
-  `*.html` file that includes a run summary and graphical representation of various QC metrics including distribution of read length, distribution of read quality scores, mean read quality per sequence length, output per channel over experiment time and percentage of reads per barcode.
-* `nanoplot/summary/`
-  `*.html` files for QC metrics and individual `*.png` image files for plots.
+- `pycoqc/pycoqc.html`
+  `*.html` file that includes a run summary and graphical representation of various QC metrics including distribution of read length, distribution of read quality scores, mean read quality per sequence length, output per channel over experiment time and percentage of reads per barcode.
+- `nanoplot/summary/`
+  `*.html` files for QC metrics and individual `*.png` image files for plots.
-*Documentation*: +_Documentation_: [PycoQC](https://github.com/a-slide/pycoQC), [NanoPlot](https://github.com/wdecoster/NanoPlot) -*Description*: -*PycoQC* and *NanoPlot* compute metrics and generate QC plots using the sequencing summary information generated by *Guppy* e.g. distribution of read length, read length over time, number of reads per barcode and other general stats. *NanoPlot* also generates QC metrics directly from fastq files as described in the next section. +_Description_: +_PycoQC_ and _NanoPlot_ compute metrics and generate QC plots using the sequencing summary information generated by _Guppy_ e.g. distribution of read length, read length over time, number of reads per barcode and other general stats. _NanoPlot_ also generates QC metrics directly from fastq files as described in the next section. ![PycoQC - Number of reads per barcode](images/pycoqc_readsperbarcode.png) @@ -95,44 +95,44 @@ If you would like to run NanoLyse on the raw FastQ files then you can provide `-
Output files
-* `nanoplot/fastq//`
-  Per-sample `*.html` files for QC metrics and individual `*.png` image files for plots.
-* `fastqc/`
-  *FastQC* `*.html` and `*.zip` files.
+- `nanoplot/fastq//`
+  Per-sample `*.html` files for QC metrics and individual `*.png` image files for plots.
+- `fastqc/`
+  _FastQC_ `*.html` and `*.zip` files.
-*Documentation*: +_Documentation_: [NanoPlot](https://github.com/wdecoster/NanoPlot), [FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/) -*Description*: -*NanoPlot* can also be used to produce general quality metrics from the per-barcode fastq files generated by *Guppy* e.g. quality score distribution, read lengths and other general stats. +_Description_: +_NanoPlot_ can also be used to produce general quality metrics from the per-barcode fastq files generated by _Guppy_ e.g. quality score distribution, read lengths and other general stats. ![Nanoplot - Read quality vs read length](images/nanoplot_readlengthquality.png) -*FastQC* gives general quality metrics about your reads. It provides information about the quality score distribution across your reads, the per base sequence content (%A/C/G/T). You get information about adapter contamination and other overrepresented sequences. +_FastQC_ gives general quality metrics about your reads. It provides information about the quality score distribution across your reads, the per base sequence content (%A/C/G/T). You get information about adapter contamination and other overrepresented sequences. ## Alignment
Output files
-* `/bam`
-  Per-sample coordinate sorted alignment files in [`*.bam`](https://samtools.github.io/hts-specs/SAMv1.pdf) format.
-* `/bam_index`
-  Per-sample coordinate sorted alignment index files in [`*.bai`](https://samtools.github.io/hts-specs/SAMv1.pdf) format.
-* `/samtools_stats/`
-  *SAMtools* `*.flagstat`, `*.idxstats` and `*.stats` files generated from the alignment files.
+- `/bam`
+  Per-sample coordinate sorted alignment files in [`*.bam`](https://samtools.github.io/hts-specs/SAMv1.pdf) format.
+- `/bam_index`
+  Per-sample coordinate sorted alignment index files in [`*.bai`](https://samtools.github.io/hts-specs/SAMv1.pdf) format.
+- `/samtools_stats/`
+  _SAMtools_ `*.flagstat`, `*.idxstats` and `*.stats` files generated from the alignment files.
-*Documentation*: +_Documentation_: [GraphMap2](https://github.com/lbcb-sci/graphmap2), [MiniMap2](https://github.com/lh3/minimap2), [SAMtools](http://samtools.sourceforge.net/) -*Description*: -Reads are mapped to a user-defined genome or transcriptome using either *GraphMap2* or *Minimap2*, and the resulting BAM files are sorted and indexed. If the same reference is specified multiple times in the input sample sheet then the aligner index will only be built once for re-use across all samples. You can skip the alignment and downstream processes by providing the `--skip_alignment` parameter. +_Description_: +Reads are mapped to a user-defined genome or transcriptome using either _GraphMap2_ or _Minimap2_, and the resulting BAM files are sorted and indexed. If the same reference is specified multiple times in the input sample sheet then the aligner index will only be built once for re-use across all samples. You can skip the alignment and downstream processes by providing the `--skip_alignment` parameter. -The initial SAM alignment files created by *GraphMap2* or *Minimap2* are not saved by default to be more efficient with storage space. You can override this behaviour with the use of the `--save_align_intermeds` parameter. +The initial SAM alignment files created by _GraphMap2_ or _Minimap2_ are not saved by default to be more efficient with storage space. You can override this behaviour with the use of the `--save_align_intermeds` parameter. ![MultiQC - SAMtools stats plot](images/mqc_samtools_stats_plot.png) @@ -141,17 +141,17 @@ The initial SAM alignment files created by *GraphMap2* or *Minimap2* are not sav
Output files
-* `/bigwig/`
-  Per-sample `*.bigWig` files.
-* `/bigbed/`
-  Per-sample `*.bigBed` files.
+- `/bigwig/`
+  Per-sample `*.bigWig` files.
+- `/bigbed/`
+  Per-sample `*.bigBed` files.
-*Documentation*: +_Documentation_: [BEDTools](https://bedtools.readthedocs.io/en/latest/), [bedGraphToBigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html#Ex3), [`bedToBigBed`](https://genome.ucsc.edu/goldenPath/help/bigBed.html#Ex2) -*Description*: +_Description_: The [bigWig](https://genome.ucsc.edu/goldenpath/help/bigWig.html) format is in an indexed binary format useful for displaying dense, continuous data in Genome Browsers such as the [UCSC](https://genome.ucsc.edu/cgi-bin/hgTracks) and [IGV](http://software.broadinstitute.org/software/igv/). This mitigates the need to load the much larger BAM files for data visualisation purposes which will be slower and result in memory issues. The bigWig format is also supported by various bioinformatics software for downstream processing such as meta-profile plotting. [bigBed](https://genome.ucsc.edu/goldenPath/help/bigBed.html) are more useful for displaying distribution of reads across exon intervals as is typically observed for RNA-seq data. Therefore, these files will only be generated if `--protocol directRNA` or `--protocol cDNA` are defined. @@ -163,18 +163,18 @@ The creation of these files can be bypassed by setting the parameters `--skip_bi
Output files
-* `minimap2/medaka//round_1.vcf`
-  VCF file with small variants for each sample.
-* `minimap2/sniffles/_sniffles.vcf`
-  VCF files with unflitered structural variants.
+- `minimap2/medaka//round_1.vcf`
+  VCF file with small variants for each sample.
+- `minimap2/sniffles/_sniffles.vcf`
+  VCF files with unfiltered structural variants.
-*Documentation*: +_Documentation_: [Medaka](https://github.com/nanoporetech/medaka), [Sniffles](https://github.com/fritzsedlazeck/Sniffles) -*Description*: -If the protocol is set to `--protocol DNA` and the *Minimap2* aligner was used, then the `--call_variants` parameter can be invoked to call small variants and structural variants using Medaka and Sniffles, respectively. These steps won't be run if you provide the `--skip_medaka` or `--skip_sniffles` parameters. +_Description_: +If the protocol is set to `--protocol DNA` and the _Minimap2_ aligner was used, then the `--call_variants` parameter can be invoked to call small variants and structural variants using Medaka and Sniffles, respectively. These steps won't be run if you provide the `--skip_medaka` or `--skip_sniffles` parameters. ## Transcript Reconstruction and Quantification @@ -183,31 +183,31 @@ If the protocol is set to `--protocol DNA` and the *Minimap2* aligner was used, If bambu is used: -* `bambu/` - * `extended_annotations.gtf` - a gtf file that contains both annotated and novel transcripts - * `counts_gene.txt` - gene expression estimates - * `counts_transcript.txt` - transcript expression estimates +- `bambu/` + - `extended_annotations.gtf` - a gtf file that contains both annotated and novel transcripts + - `counts_gene.txt` - gene expression estimates + - `counts_transcript.txt` - transcript expression estimates If StringTie2 is used: -* `stringtie2/` - * `*.bam` - Per-sample coordinate sorted alignment files in [`*.bam`](https://samtools.github.io/hts-specs/SAMv1.pdf) format. - * `*.stringtie.gtf` - Per-sample annotations for novel transcripts obtained in *StringTie2*. - * `stringtie.merged.gtf` - Extended annotation that combines provided gtf with gtf files from each sample via *StringTie2 Merge*. - * `counts_gene.txt` - gene expression estimates calculated by featureCounts. - * `counts_gene.txt.summary` - featureCounts gene level log file. 
- * `counts_transcript.txt` - transcript expression estimates calculated by featureCounts. - * `counts_transcript.txt.summary` - featureCounts transcript level log file. +- `stringtie2/` + - `*.bam` + Per-sample coordinate sorted alignment files in [`*.bam`](https://samtools.github.io/hts-specs/SAMv1.pdf) format. + - `*.stringtie.gtf` + Per-sample annotations for novel transcripts obtained in _StringTie2_. + - `stringtie.merged.gtf` + Extended annotation that combines provided gtf with gtf files from each sample via _StringTie2 Merge_. + - `counts_gene.txt` - gene expression estimates calculated by featureCounts. + - `counts_gene.txt.summary` - featureCounts gene level log file. + - `counts_transcript.txt` - transcript expression estimates calculated by featureCounts. + - `counts_transcript.txt.summary` - featureCounts transcript level log file. -*Documentation*: +_Documentation_: [bambu](https://bioconductor.org/packages/release/bioc/html/bambu.html), [StringTie2](https://ccb.jhu.edu/software/stringtie/), [featureCounts](http://bioinf.wehi.edu.au/featureCounts/) -*Description*: +_Description_: After genomic alignment, novel transcripts can be reconstructed using tools such as bambu and StringTie2. Quantification can then be performed on a more complete annotation based on the transcripts detected within a given set of samples. bambu performs both the reconstruction and quantification steps. An an alternative approach, we also provides an option to run StringTie2 to identify novel transcripts. However, when multiple samples are provided, quantification for multiple samples are not implemented explicitly in the software. Hence a second step is required to merge novel transcripts across multiple samples followed by quantification for both gene and transcripts using featureCounts. You can skip transcript reconstruction and quantification by providing the `--skip_quantification` parameter. 
## Differential expression analysis @@ -215,15 +215,15 @@ After genomic alignment, novel transcripts can be reconstructed using tools such
Output files
-* `/deseq2/deseq2.results.txt` - a `.txt` file that contains differential expression results for genes.
-* `/dexseq/dexseq.results.txt` - a `.txt` file that contains differential expression results for transcripts.
+- `/deseq2/deseq2.results.txt` - a `.txt` file that contains differential expression results for genes.
+- `/dexseq/dexseq.results.txt` - a `.txt` file that contains differential expression results for transcripts.
-*Documentation*: +_Documentation_: [DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html), [DEXSeq](https://bioconductor.org/packages/release/bioc/html/DEXSeq.html) -*Description*: +_Description_: If multiple conditions and multiple replicates are available then the pipeline is able to run differential analysis on gene and transcripts with DESeq2 and DEXSeq, respectively. These steps won't be run if you provide the `--skip_quantification` or `--skip_differential_analysis` parameters or if all of the samples in the samplesheet don't have the same fasta and GTF reference files. ## RNA modification analysis @@ -231,15 +231,15 @@ If multiple conditions and multiple replicates are available then the pipeline i
Output files
-* `rna_modifications/xpore/diffmod/diffmod_outputs/diffmod.table` - a `.csv` file that contains differentially modified sites.
-* `rna_modifications/m6anet/inference//data.result.csv.gz` - a `.csv` file that contains detected m6A sites.
+- `rna_modifications/xpore/diffmod/diffmod_outputs/diffmod.table` - a `.csv` file that contains differentially modified sites.
+- `rna_modifications/m6anet/inference//data.result.csv.gz` - a `.csv` file that contains detected m6A sites.
-*Documentation*: +_Documentation_: [xPore](https://xpore.readthedocs.io/en/latest/), [m6anet](https://m6anet.readthedocs.io/en/latest/) -*Description*: +_Description_: If multiple conditions are available then the pipeline is able to run differential modification analysis with xPore. These steps won't be run if you provide the `--skip_modification_analysis` or `--skip_xpore` or `--skip_m6anet` parameters. ## RNA fusion analysis @@ -247,15 +247,15 @@ If multiple conditions are available then the pipeline is able to run differenti
Output files
-* `jaffal/jaffa_results.csv` - a `.csv` file that contains detected RNA fusion results.
-* `jaffal/jaffa_results.fasta` - a `.fasta` file that contains the sequence of the detected RNA fusions.
+- `jaffal/jaffa_results.csv` - a `.csv` file that contains detected RNA fusion results.
+- `jaffal/jaffa_results.fasta` - a `.fasta` file that contains the sequence of the detected RNA fusions.
-*Documentation*: +_Documentation_: [jaffal](https://github.com/Oshlack/JAFFA/wiki) -*Description*: +_Description_: This step won't be run if you provide the `--skip_fusion_analysis` parameter. ## MultiQC @@ -270,15 +270,15 @@ This step won't be run if you provide the `--skip_fusion_analysis` parameter. -*Documentation*: +_Documentation_: [MultiQC](https://multiqc.info/docs/) -*Description*: -*MultiQC* is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available within the report data directory. +_Description_: +_MultiQC_ is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available within the report data directory. -Results generated by *MultiQC* for this pipeline collate QC from *FastQC*, *samtools flagstat*, *samtools idxstats* and *samtools stats*. +Results generated by _MultiQC_ for this pipeline collate QC from _FastQC_, _samtools flagstat_, _samtools idxstats_ and _samtools stats_. -The pipeline has special steps which also allow the software versions to be reported in the *MultiQC* output for future traceability. For more information about how to use *MultiQC* reports, see . +The pipeline has special steps which also allow the software versions to be reported in the _MultiQC_ output for future traceability. For more information about how to use _MultiQC_ reports, see . ## Pipeline information @@ -292,8 +292,8 @@ The pipeline has special steps which also allow the software versions to be repo -*Documentation*: +_Documentation_: [Nextflow](https://www.nextflow.io/docs/latest/tracing.html) -*Description*: -*Nextflow* provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. 
This will allow you to trouble-shoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage. +_Description_: +_Nextflow_ provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to trouble-shoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage. diff --git a/docs/usage.md b/docs/usage.md index 732277c7..ab3596c1 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -8,25 +8,25 @@ You will need to create a file with information about the samples in your experiment/run before executing the pipeline. Use the `--input` parameter to specify its location. It has to be a comma-separated file with 6 columns and a header row: -| Column | Description | -|-----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `group` | Group identifier for sample. This will be identical for replicate samples from the same experimental group. | -| `replicate` | Integer representing replicate number. Must start from `1..`. | -| `barcode` | Barcode identifier attributed to that sample during multiplexing. Must be an integer. | -| `input_file` | Full path to FastQ file if previously demultiplexed, BAM file if previously aligned, or a path to a directory with subdirectories containing fastq or fast5 files. FastQ file has to be zipped and have the extension ".fastq.gz" or ".fq.gz". BAM file has to have the extension ".bam" | -| `genome` | Genome fasta file for alignment. This can either be blank, a local path, or the appropriate key for a genome available in [iGenomes config file](../conf/igenomes.config). 
Must have the extension ".fasta", ".fasta.gz", ".fa" or ".fa.gz". | -| `transcriptome` | Transcriptome fasta/gtf file for alignment. This can either be blank or a local path. Must have the extension ".fasta", ".fasta.gz", ".fa", ".fa.gz", ".gtf" or ".gtf.gz". | +| Column | Description | +| --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `group` | Group identifier for sample. This will be identical for replicate samples from the same experimental group. | +| `replicate` | Integer representing replicate number. Must start from `1..`. | +| `barcode` | Barcode identifier attributed to that sample during multiplexing. Must be an integer. | +| `input_file` | Full path to FastQ file if previously demultiplexed, BAM file if previously aligned, or a path to a directory with subdirectories containing fastq or fast5 files. FastQ file has to be zipped and have the extension ".fastq.gz" or ".fq.gz". BAM file has to have the extension ".bam" | +| `genome` | Genome fasta file for alignment. This can either be blank, a local path, or the appropriate key for a genome available in [iGenomes config file](../conf/igenomes.config). Must have the extension ".fasta", ".fasta.gz", ".fa" or ".fa.gz". | +| `transcriptome` | Transcriptome fasta/gtf file for alignment. This can either be blank or a local path. Must have the extension ".fasta", ".fasta.gz", ".fa", ".fa.gz", ".gtf" or ".gtf.gz". | ### Specifying a reference genome/transcriptome Each sample in the sample sheet can be mapped to its own reference genome or transcriptome. 
Please see below for additional details required to fill in the `genome` and `transcriptome` columns appropriately:
-* If both `genome` and `transcriptome` are not specified then the mapping will be skipped for that sample.
-* If both `genome` and `transcriptome` are specified as local fasta files then the transcriptome will be preferentially used for mapping.
-* If `genome` is specified as a local fasta file and `transcriptome` is left blank then mapping will be performed relative to the genome.
-* If `genome` isnt specified and `transcriptome` is provided as a fasta file then mapping will be performed relative to the transcriptome.
-* If `genome` is specified as an AWS iGenomes key then the `transcriptome` column can be blank. The associated gtf file for the `transcriptome` will be automatically obtained in order to create a transcriptome fasta file. However, the reads will only be mapped to the transcriptome if `--protocol cDNA` or `--protocol directRNA`. If `--protocol DNA` then the reads will still be mapped to the genome essentially ignoring the gtf file.
-* If `genome` is specified as a local fasta file and `transcriptome` is a specified as a local gtf file then both of these will be used to create a transcriptome fasta file. However, the reads will only be mapped to the transcriptome if `--protocol cDNA` or `--protocol directRNA`. If `--protocol DNA` then the reads will still be mapped to the genome essentially ignoring the gtf file.
+- If both `genome` and `transcriptome` are not specified then the mapping will be skipped for that sample.
+- If both `genome` and `transcriptome` are specified as local fasta files then the transcriptome will be preferentially used for mapping.
+- If `genome` is specified as a local fasta file and `transcriptome` is left blank then mapping will be performed relative to the genome.
+- If `genome` isn't specified and `transcriptome` is provided as a fasta file then mapping will be performed relative to the transcriptome.
+- If `genome` is specified as an AWS iGenomes key then the `transcriptome` column can be blank. The associated gtf file for the `transcriptome` will be automatically obtained in order to create a transcriptome fasta file. However, the reads will only be mapped to the transcriptome if `--protocol cDNA` or `--protocol directRNA`. If `--protocol DNA` then the reads will still be mapped to the genome essentially ignoring the gtf file. +- If `genome` is specified as a local fasta file and `transcriptome` is specified as a local gtf file then both of these will be used to create a transcriptome fasta file. However, the reads will only be mapped to the transcriptome if `--protocol cDNA` or `--protocol directRNA`. If `--protocol DNA` then the reads will still be mapped to the genome essentially ignoring the gtf file. ### Skip basecalling/demultiplexing diff --git a/modules.json b/modules.json index d3292b1c..c57e9eac 100644 --- a/modules.json +++ b/modules.json @@ -38,4 +38,4 @@ } } } -} \ No newline at end of file +} diff --git a/nextflow_schema.json b/nextflow_schema.json index a98bd285..e2d1fe18 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -10,10 +10,7 @@ "type": "object", "fa_icon": "fas fa-terminal", "description": "Define where the pipeline should find input data and save output data.", - "required": [ - "input", - "protocol" - ], + "required": ["input", "protocol"], "properties": { "input": { "type": "string",