Merge branch 'variant_calling' into nf-core-template-merge-2.4

qbic-projects · May 18, 2022 · 4eb31af
2 parents 896c8d3 + 4a63781
Showing 123 changed files with 7,636 additions and 578 deletions.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -43,8 +43,68 @@ jobs:
           sudo mv nextflow /usr/local/bin/
 
       - name: Run pipeline with test data
-        # TODO nf-core: You can customise CI pipeline run tests as required
-        # For example: adding multiple test runs with different parameters
-        # Remember that you can parallelise this by using strategy.matrix
         run: |
+
           nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results
+
+  profile:
+    name: Run profile tests
+    if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/nanoseq') }}"
+    runs-on: ubuntu-latest
+    env:
+      NXF_VER: "21.10.3"
+      NXF_ANSI_LOG: false
+    strategy:
+      matrix:
+        profiles:
+          - "test_bc_nodx"
+          - "test_nobc_dx"
+          - "test_nobc_nodx_vc"
+          - "test_nobc_nodx_stringtie"
+          - "test_nobc_nodx_noaln"
+          - "test_nobc_nodx_rnamod"
+    steps:
+      - name: Check out pipeline code
+        uses: actions/checkout@v2
+
+      - name: Install Nextflow
+        env:
+          CAPSULE_LOG: none
+        run: |
+          wget -qO- get.nextflow.io | bash
+          sudo mv nextflow /usr/local/bin/
+
+      - name: Run pipeline with different profiles
+        run: |
+          nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.profiles }},docker --outdir ./results
+
+  parameters:
+    name: Run parameter tests
+    if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/nanoseq') }}"
+    runs-on: ubuntu-latest
+    env:
+      NXF_VER: "21.10.3"
+      NXF_ANSI_LOG: false
+    strategy:
+      matrix:
+        parameters:
+          - "--aligner graphmap2"
+          - "--skip_alignment"
+          - "--skip_qc"
+          - "--skip_quantification"
+    steps:
+      - name: Check out pipeline code
+        uses: actions/checkout@v2
+
+      - name: Install Nextflow
+        env:
+          CAPSULE_LOG: none
+        run: |
+          wget -qO- get.nextflow.io | bash
+          sudo mv nextflow /usr/local/bin/
+
+      - name: Run pipeline with different parameters
+        run: |
+          nextflow run ${GITHUB_WORKSPACE} -profile test,docker ${{ matrix.parameters }}
+
+#
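The two matrix jobs added to `ci.yml` above each fan out into one pipeline run per list entry. As a rough illustration, the following sketch expands the same lists into the commands the runners would execute. The helper name `expand_matrix` and its `workspace` argument are hypothetical and not part of the workflow; the profile and parameter values are copied from the diff.

```python
from typing import List

# Values copied from the `profile` and `parameters` jobs in ci.yml.
PROFILES = [
    "test_bc_nodx",
    "test_nobc_dx",
    "test_nobc_nodx_vc",
    "test_nobc_nodx_stringtie",
    "test_nobc_nodx_noaln",
    "test_nobc_nodx_rnamod",
]
PARAMETERS = [
    "--aligner graphmap2",
    "--skip_alignment",
    "--skip_qc",
    "--skip_quantification",
]


def expand_matrix(workspace: str = "${GITHUB_WORKSPACE}") -> List[str]:
    """Return one nextflow command per matrix entry (hypothetical helper)."""
    # The `profile` job swaps the test profile and keeps `docker`.
    profile_runs = [
        f"nextflow run {workspace} -profile {profile},docker --outdir ./results"
        for profile in PROFILES
    ]
    # The `parameters` job keeps `test,docker` and appends one extra argument.
    parameter_runs = [
        f"nextflow run {workspace} -profile test,docker {argument}"
        for argument in PARAMETERS
    ]
    return profile_runs + parameter_runs


if __name__ == "__main__":
    for command in expand_matrix():
        print(command)
```

With the lists above this yields ten commands: six profile runs and four parameter runs. Each one runs as a separate job, so they can execute in parallel.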
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,14 +3,154 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## v2.0.1 - [date]
+## [3.0.0] - 2022-05-10
 
-Initial release of nf-core/nanoseq, created with the [nf-core](https://nf-co.re/) template.
+### Major enhancements
 
-### `Added`
+- Add DNA variant calling functionality
+- Add RNA modification and fusion detection functionalities
+- Add `demux_fast5` module to output demultiplexed fast5 files when `--output_demultiplex_fast5` is set
+- Add `--trim_barcodes` in Guppy basecaller to trim the barcodes from output fastq
+- Port pipeline to the updated Nextflow DSL2 syntax adopted on nf-core/modules
+  - Removed `--publish_dir_mode` as it is no longer required for the new syntax
+- Bump minimum Nextflow version from 21.04.0 to 21.10.3
+- Update pipeline template to nf-core/tools `2.2`
+- Update `bambu` version from `1.0.2` to `2.0.0`
+- Update `multiqc` version from `1.10.1` to `1.11`
 
-### `Fixed`
+### Parameters
 
-### `Dependencies`
+- Added `--output_demultiplex_fast5` to output demultiplexed fast5
+- Added `--trim_barcodes` in Guppy basecaller to trim the barcodes from output fastq
+- Added `--call_variants` to detect DNA variants
+- Added `--split_mnps` to split multi-nucleotide polymorphisms into single nucleotide polymorphisms when using medaka
+- Added `--phase_vcf` to output a phased vcf when using medaka
+- Added `--skip_vc` to skip `variant_calling`
+- Added `--skip_sv` to skip `structural_variant_calling`
+- Added `--variant_caller` to specify variant caller
+- Added `--structural_variant_caller` to specify structural variant caller
+- Added `--skip_modification_analysis` to skip RNA modification detection
+- Added `--skip_xpore` to skip `xpore`
+- Added `--skip_m6anet` to skip `m6anet`
+- Added `--skip_fusion_analysis` to skip RNA fusion detection
+- Added `--jaffal_ref_dir` to indicate the reference directory path required by `JAFFAL`
 
-### `Deprecated`
+### Software dependencies
+
+| Dependency                  | Old version | New version |
+| --------------------------- | ----------- | ----------- |
+| `bioconductor-bambu`        | 1.0.2       | 2.0.0       |
+| `bioconductor-bsgenome`     | 1.58.0      | 1.62.0      |
+| `cutesv`                    |             | 1.0.12      |
+| `deepvariant`               |             | 1.0.3       |
+| `jaffa`                     |             | 2.0         |
+| `m6anet`                    |             | 1.0         |
+| `medaka`                    |             | 1.4.4       |
+| `multiqc`                   | 1.10.1      | 1.11        |
+| `ont_fast5_api`             |             | 4.0.0       |
+| `pepper_margin_deepvariant` |             | 0.8         |
+| `sniffles`                  |             | 1.0.12      |
+| `xpore`                     |             | 2.1         |
+
+### Bug fix
+
+- The `GET_TEST_DATA` process now checks for any file in the path.
+
+> **NB:** Dependency has been **updated** if both old and new version information is present.
+> **NB:** Dependency has been **added** if just the new version information is present.
+> **NB:** Dependency has been **removed** if version information isn't present.
+
+## [2.0.1] - 2021-11-29
+
+### Bug fix
+
+- The `UCSC_BEDGRAPHTOBIGWIG` process now uses the `ucsc-bedgraphtobigwig` container
+- The full-size and minimal AWS tests have successfully finished after changing to the `ucsc-bedgraphtobigwig` container
+
+## [2.0.0] - 2021-11-26
+
+### Major enhancements
+
+- Pipeline has been re-implemented in [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html)
+- Software containers are now obtained from [Biocontainers](https://biocontainers.pro/#/registry)
+- Update pipeline template to nf-core/tools `2.1`
+- [#77](https://github.com/nf-core/nanoseq/issues/77) - Skipped alignment steps
+- [#97](https://github.com/nf-core/nanoseq/issues/97) - Add optional DNA cleaning option
+
+### Parameters
+
+- Added `--run_nanolyse` to run NanoLyse for DNA cleaning of FastQ files
+- Added `--nanolyse_fasta` to provide a fasta file for nanolyse to filter against
+
+### Software dependencies
+
+| Dependency           | Old version | New version |
+| -------------------- | ----------- | ----------- |
+| `bioconductor-bambu` | 1.0.0       | 1.0.2       |
+| `nanolyse`           |             | 1.2.0       |
+| `r-base`             | 4.0.3       | 4.0.2       |
+
+> **NB:** Dependency has been **updated** if both old and new version information is present.
+> **NB:** Dependency has been **added** if just the new version information is present.
+> **NB:** Dependency has been **removed** if version information isn't present.
+
+## [1.1.0] - 2020-11-06
+
+### Major enhancements
+
+- Transcript reconstruction and quantification ([`bambu`](https://bioconductor.org/packages/release/bioc/html/bambu.html) or [`StringTie2`](https://ccb.jhu.edu/software/stringtie/) and [`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/))
+- Differential expression analysis at the gene-level ([`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html)) and transcript-level ([`DEXSeq`](https://bioconductor.org/packages/release/bioc/html/DEXSeq.html))
+- Ability to provide BAM input to the pipeline
+- Change samplesheet format to be more flexible to BAM input files
+- Add pycoQC and featureCounts output to MultiQC report
+- Add AWS full-sized test data
+- Add parameter JSON schema for pipeline
+- Add citations file
+- Update pipeline template to nf-core/tools `1.11`
+- Collapsible sections for output files in `docs/output.md`
+- Replace `set` with `tuple` and `file` with `path` in `input` section of all processes
+- Capitalise process names
+- Added `--gpus all` to Docker `runOptions` when using GPU as mentioned [here](https://github.com/docker/compose/issues/6691#issuecomment-514429646)
+- Fix `Cannot invoke method containsKey() on null object` error when `--igenomes_ignore` is set [#76](https://github.com/nf-core/nanoseq/issues/76)
+
+### Parameters
+
+- Added `--barcode_both_ends` to require barcodes on both ends for Guppy basecaller
+- Added `--quantification_method` to specify the transcript quantification method to use
+- Added `--skip_quantification` to skip transcript quantification and differential analysis
+- Added `--skip_differential_analysis` to skip differential analysis with DESeq2 and DEXSeq
+- Added `--publish_dir_mode` to customise method of publishing results to output directory [nf-core/tools#585](https://github.com/nf-core/tools/issues/585)
+
+### Software dependencies
+
+| Dependency              | Old version | New version |
+| ----------------------- | ----------- | ----------- |
+| `Guppy`                 | 3.4.4       | 4.0.14      |
+| `markdown`              | 3.1.1       | 3.3.3       |
+| `multiqc`               | 1.8         | 1.9         |
+| `nanoplot`              | 1.28.4      | 1.32.1      |
+| `pygments`              | 2.5.2       | 2.7.2       |
+| `pymdown-extensions`    | 6.0         | 8.0.1       |
+| `python`                | 3.7.3       | 3.8.6       |
+| `samtools`              | 1.9         | 1.11        |
+| `ucsc-bedgraphtobigwig` | 357         | 377         |
+| `ucsc-bedtobigbed`      | 357         | 377         |
+| `bioconductor-bambu`    | -           | 1.0.0       |
+| `bioconductor-bsgenome` | -           | 1.58.0      |
+| `bioconductor-deseq2`   | -           | 1.30.0      |
+| `bioconductor-dexseq`   | -           | 1.36.0      |
+| `bioconductor-drimseq`  | -           | 1.18.0      |
+| `bioconductor-stager`   | -           | 1.12.0      |
+| `r-base`                | -           | 4.0.3       |
+| `seaborn`               | -           | 0.10.1      |
+| `stringtie`             | -           | 2.1.4       |
+| `subread`               | -           | 2.0.1       |
+| `psutil`                | -           | -           |
+
+> **NB:** Dependency has been **updated** if both old and new version information is present.
+> **NB:** Dependency has been **added** if just the new version information is present.
+> **NB:** Dependency has been **removed** if version information isn't present.
+
+## [1.0.0] - 2020-03-05
+
+Initial release of nf-core/nanoseq, created with the [nf-core](http://nf-co.re/) template.
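The **NB** notes under the dependency tables in the changelog above describe a simple rule for reading each row: both versions present means updated, only a new version means added, and no version information means removed. A minimal sketch of that rule follows. The function name `classify` is hypothetical. Treating both an empty cell and `-` as "no version", and treating a row with only an old version as removed, are assumptions based on how the tables are filled in.

```python
def classify(old: str, new: str) -> str:
    """Classify a dependency table row as 'updated', 'added' or 'removed'."""
    # Assumption: an empty cell and "-" both mean "no version information".
    absent = {"", "-"}
    has_old = old.strip() not in absent
    has_new = new.strip() not in absent

    if has_old and has_new:
        return "updated"
    if has_new:
        return "added"
    # Assumption: a row with no new version counts as removed.
    return "removed"


if __name__ == "__main__":
    # Rows taken from the 3.0.0 and 1.1.0 tables in the changelog.
    rows = [
        ("bioconductor-bambu", "1.0.2", "2.0.0"),
        ("cutesv", "", "1.0.12"),
        ("psutil", "-", "-"),
    ]
    for name, old, new in rows:
        print(f"{name}: {classify(old, new)}")
```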
diff --git a/CITATIONS.md b/CITATIONS.md
@@ -1,20 +1,129 @@
 # nf-core/nanoseq: Citations
 
-## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)
+## [nf-core](https://www.ncbi.nlm.nih.gov/pubmed/32055031/)
 
 > Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.
 
-## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)
+## [Nextflow](https://www.ncbi.nlm.nih.gov/pubmed/28398311/)
 
 > Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
 
 ## Pipeline tools
 
+- [BEDTools](https://www.ncbi.nlm.nih.gov/pubmed/20110278/)
+
+  > Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PubMed PMID: 20110278; PubMed Central PMCID: PMC2832824.
+
+* [cuteSV](https://pubmed.ncbi.nlm.nih.gov/32746918/)
+
+  > Jiang T, Liu Y, Jiang Y, Li J, Gao Y, Cui Z, Liu Y, Liu B, Wang Y. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol. 2020 Aug 3;21(1):189. doi: 10.1186/s13059-020-02107-y. PMID: 32746918; PMCID: PMC7477834.
+
+* [DeepVariant](https://pubmed.ncbi.nlm.nih.gov/30247488/)
+
+  > Poplin R, Chang PC, Alexander D, Schwartz S, Colthurst T, Ku A, Newburger D, Dijamco J, Nguyen N, Afshar PT, Gross SS, Dorfman L, McLean CY, DePristo MA. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol. 2018 Nov;36(10):983-987. doi: 10.1038/nbt.4235. Epub 2018 Sep 24. PMID: 30247488.
+
 - [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
 
-- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
+- [featureCounts](https://www.ncbi.nlm.nih.gov/pubmed/24227677/)
+
+  > Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014 Apr 1;30(7):923-30. doi: 10.1093/bioinformatics/btt656. Epub 2013 Nov 13. PubMed PMID: 24227677.
+
+- [GraphMap](https://pubmed.ncbi.nlm.nih.gov/27079541/)
+
+  > Sović I, Šikić M, Wilm A, Fenlon SN, Chen S, Nagarajan N. Fast and sensitive mapping of nanopore sequencing reads with GraphMap. Nat Commun. 2016 Apr 15;7:11307. doi: 10.1038/ncomms11307. PMID: 27079541; PMCID: PMC4835549.
+
+- [Guppy](https://nanoporetech.com/nanopore-sequencing-data-analysis)
+
+- [JAFFAL](https://doi.org/10.1186/s13059-021-02588-5)
+
+  > Davidson NM, et al., JAFFAL: detecting fusion genes with long-read transcriptome sequencing. Genome Biology (2022)
+
+- [m6anet](https://www.biorxiv.org/content/10.1101/2021.09.20.461055v1)
+
+  > Hendra C, et al., Detection of m6A from direct RNA sequencing using a Multiple Instance Learning framework. bioRxiv (2021)
+
+* [PEPPER-Margin-DeepVariant](https://pubmed.ncbi.nlm.nih.gov/34725481/)
+
+  > Shafin K, Pesout T, Chang PC, Nattestad M, Kolesnikov A, Goel S, Baid G, Kolmogorov M, Eizenga JM, Miga KH, Carnevali P, Jain M, Carroll A, Paten B. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat Methods. 2021 Nov;18(11):1322-1332. doi: 10.1038/s41592-021-01299-w. Epub 2021 Nov 1. PMID: 34725481; PMCID: PMC8571015.
+
+* [pycoQC](https://doi.org/10.21105/joss.01236)
+
+  > Leger A, Leonardi T, (2019). pycoQC, interactive quality control for Oxford Nanopore Sequencing. Journal of Open Source Software, 4(34), 1236.
+
+- [Minimap2](https://pubmed.ncbi.nlm.nih.gov/29750242/)
+
+  > Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 Sep 15;34(18):3094-3100. doi: 10.1093/bioinformatics/bty191. PMID: 29750242; PMCID: PMC6137996.
+
+- [Medaka](https://github.com/nanoporetech/medaka)
+
+- [MultiQC](https://www.ncbi.nlm.nih.gov/pubmed/27312411/)
+
   > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
 
+- [NanoLyse](https://pubmed.ncbi.nlm.nih.gov/29547981/)
+
+  > De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018 Aug 1;34(15):2666-2669. doi: 10.1093/bioinformatics/bty149. PubMed PMID: 29547981; PubMed Central PMCID: PMC6061794.
+
+- [NanoPlot](https://pubmed.ncbi.nlm.nih.gov/29547981/)
+
+  > De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018 Aug 1;34(15):2666-2669. doi: 10.1093/bioinformatics/bty149. PubMed PMID: 29547981; PubMed Central PMCID: PMC6061794.
+
+- [pycoQC](https://doi.org/10.21105/joss.01236)
+
+  > Leger A, Leonardi T, (2019). pycoQC, interactive quality control for Oxford Nanopore Sequencing. Journal of Open Source Software, 4(34), 1236.
+
+- [qcat](https://github.com/nanoporetech/qcat)
+
+- [SAMtools](https://www.ncbi.nlm.nih.gov/pubmed/19505943/)
+
+  > Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002.
+
+- [Sniffles](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5990442/)
+
+  > Sedlazeck FJ, Rescheneder P, Smolka M, Fang H, Nattestad M, von Haeseler A, Schatz MC. Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods. 2018 Jun;15(6):461-468. doi: 10.1038/s41592-018-0001-7. Epub 2018 Apr 30. PMID: 29713083; PMCID: PMC5990442.
+
+- [StringTie2](https://www.ncbi.nlm.nih.gov/pubmed/31842956/)
+
+  > Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 2019 Dec 16;20(1):278. doi: 10.1186/s13059-019-1910-1. PubMed PMID: 31842956; PubMed Central PMCID: PMC6912988.
+
+- [UCSC tools](https://www.ncbi.nlm.nih.gov/pubmed/20639541/)
+
+  > Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics. 2010 Sep 1;26(17):2204-7. doi: 10.1093/bioinformatics/btq351. Epub 2010 Jul 17. PubMed PMID: 20639541; PubMed Central PMCID: PMC2922891.
+
+- [xPore](https://doi.org/10.1038/s41587-021-00949-w)
+  > Pratanwanich PN, et al., Identification of differential RNA modifications from nanopore direct RNA sequencing with xPore. Nat Biotechnol (2021)
+
+## R packages
+
+- [R](https://www.R-project.org/)
+
+  > R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
+
+- [bambu](https://bioconductor.org/packages/release/bioc/html/bambu.html)
+
+  > Chen Y, Goeke J, Wan YK (2020). bambu: Reference-guided isoform reconstruction and quantification for long read RNA-Seq data. R package version 1.0.0.
+
+- [BSgenome](https://bioconductor.org/packages/release/bioc/html/BSgenome.html)
+
+  > Pagès H (2020). BSgenome: Software infrastructure for efficient representation of full genomes and their SNPs. doi: 10.18129/B9.bioc.BSgenome.
+
+- [DESeq2](https://www.ncbi.nlm.nih.gov/pubmed/25516281/)
+
+  > Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. PubMed PMID: 25516281; PubMed Central PMCID: PMC4302049.
+
+- [DEXSeq](https://pubmed.ncbi.nlm.nih.gov/22722343/)
+
+  > Anders S, Reyes A, Huber W. Detecting differential usage of exons from RNA-seq data. Genome Res. 2012 Oct;22(10):2008-17. doi: 10.1101/gr.133744.111. Epub 2012 Jun 21. PubMed PMID: 22722343; PubMed Central PMCID: PMC3460195.
+
+- [DRIMSeq](https://pubmed.ncbi.nlm.nih.gov/28105305/)
+
+  > Nowicka M, Robinson MD. DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics. F1000Res. 2016 Jun 13;5:1356. doi: 10.12688/f1000research.8900.2. PubMed PMID: 28105305; PubMed Central PMCID: PMC5200948.
+
+- [stageR](https://pubmed.ncbi.nlm.nih.gov/28784146/)
+  > Van den Berge K, Soneson C, Robinson MD, Clement L. stageR: a general stage-wise method for controlling the gene-level false discovery rate in differential expression and differential transcript usage. Genome Biol. 2017 Aug 7;18(1):151. doi: 10.1186/s13059-017-1277-0. PubMed PMID: 28784146; PubMed Central PMCID: PMC5547545.
+
 ## Software packaging/containerisation tools
 
 - [Anaconda](https://anaconda.com)