Merge branch 'variant_calling' into nf-core-template-merge-2.4
yuukiiwa authored May 18, 2022
2 parents 896c8d3 + 4a63781 commit 4eb31af
Showing 123 changed files with 7,636 additions and 578 deletions.
66 changes: 63 additions & 3 deletions .github/workflows/ci.yml
@@ -43,8 +43,68 @@ jobs:
          sudo mv nextflow /usr/local/bin/
      - name: Run pipeline with test data
        # TODO nf-core: You can customise CI pipeline run tests as required
        # For example: adding multiple test runs with different parameters
        # Remember that you can parallelise this by using strategy.matrix
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results
  profile:
    name: Run profile tests
    if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/nanoseq') }}"
    runs-on: ubuntu-latest
    env:
      NXF_VER: "21.10.3"
      NXF_ANSI_LOG: false
    strategy:
      matrix:
        profiles:
          - "test_bc_nodx"
          - "test_nobc_dx"
          - "test_nobc_nodx_vc"
          - "test_nobc_nodx_stringtie"
          - "test_nobc_nodx_noaln"
          - "test_nobc_nodx_rnamod"
    steps:
      - name: Check out pipeline code
        uses: actions/checkout@v2

      - name: Install Nextflow
        env:
          CAPSULE_LOG: none
        run: |
          wget -qO- get.nextflow.io | bash
          sudo mv nextflow /usr/local/bin/
      - name: Run pipeline with different profiles
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.profiles }},docker --outdir ./results
  parameters:
    name: Run parameter tests
    if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/nanoseq') }}"
    runs-on: ubuntu-latest
    env:
      NXF_VER: "21.10.3"
      NXF_ANSI_LOG: false
    strategy:
      matrix:
        parameters:
          - "--aligner graphmap2"
          - "--skip_alignment"
          - "--skip_qc"
          - "--skip_quantification"
    steps:
      - name: Check out pipeline code
        uses: actions/checkout@v2

      - name: Install Nextflow
        env:
          CAPSULE_LOG: none
        run: |
          wget -qO- get.nextflow.io | bash
          sudo mv nextflow /usr/local/bin/
      - name: Run pipeline with different parameters
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test,docker ${{ matrix.parameters }}
#
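
Each of the two matrix jobs in this hunk expands into one `nextflow run` invocation per matrix entry. As a rough local equivalent, the sketch below prints the commands the `profile` and `parameters` jobs would run. It is a dry run only and is not part of the commit: `PIPELINE_DIR` (standing in for `${GITHUB_WORKSPACE}`) and the function name `print_matrix_commands` are assumptions introduced for illustration.

```shell
#!/bin/sh
# Dry-run sketch: print the commands that the CI matrix jobs would execute.
# PIPELINE_DIR is a hypothetical stand-in for ${GITHUB_WORKSPACE}.
PIPELINE_DIR="${PIPELINE_DIR:-.}"

# Matrix entries copied from the "profile" job.
PROFILES="test_bc_nodx
test_nobc_dx
test_nobc_nodx_vc
test_nobc_nodx_stringtie
test_nobc_nodx_noaln
test_nobc_nodx_rnamod"

# Matrix entries copied from the "parameters" job.
PARAMETERS="--aligner graphmap2
--skip_alignment
--skip_qc
--skip_quantification"

print_matrix_commands() {
  # One run per profile, mirroring the "profile" job.
  printf '%s\n' "$PROFILES" | while IFS= read -r profile; do
    echo "nextflow run ${PIPELINE_DIR} -profile ${profile},docker --outdir ./results"
  done
  # One run per extra parameter set, mirroring the "parameters" job.
  printf '%s\n' "$PARAMETERS" | while IFS= read -r params; do
    echo "nextflow run ${PIPELINE_DIR} -profile test,docker ${params}"
  done
}

print_matrix_commands
```

As committed, the `parameters` job does not pass `--outdir`, and the sketch mirrors that. Piping the output to `sh` would execute the runs sequentially, whereas the CI matrix runs them as separate jobs.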
152 changes: 146 additions & 6 deletions CHANGELOG.md
@@ -3,14 +3,154 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v2.0.1 - [date]
## [3.0.0] - 2022-05-10

Initial release of nf-core/nanoseq, created with the [nf-core](https://nf-co.re/) template.
### Major enhancements

### `Added`
- Add DNA variant calling functionality
- Add RNA modification and fusion detection functionalities
- Add `demux_fast5` module to output demultiplexed fast5 files when `--output_demultiplex_fast5` is set
- Add `--trim_barcodes` in Guppy basecaller to trim the barcodes from output fastq
- Port pipeline to the updated Nextflow DSL2 syntax adopted on nf-core/modules
- Removed `--publish_dir_mode` as it is no longer required for the new syntax
- Bump minimum Nextflow version from 21.04.0 -> 21.10.3
- Update pipeline template to nf-core/tools `2.2`
- Update `bambu` version from `1.0.2` to `2.0.0`
- Update `multiqc` version from `1.10.1` to `1.11`
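
The minimum-version bump listed above (21.04.0 to 21.10.3, also pinned as `NXF_VER` in the CI workflow) can be checked before launching a run. The following is a minimal sketch of such a check, not something the pipeline ships: the function name `version_ge` and the variable `MIN_NXF_VER` are assumptions, and the comparison only looks at the leading number of each dot-separated field, so suffixes such as `-edge` are ignored.

```shell
#!/bin/sh
# Minimum Nextflow version stated in the changelog entry above.
MIN_NXF_VER="21.10.3"

# version_ge A B: succeeds (exit status 0) when version A >= version B.
# Fields are compared numerically from left to right; missing fields count as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, ".")
    nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Compare the old and new minimum versions against the current requirement.
for v in 21.04.0 21.10.3; do
  if version_ge "$v" "$MIN_NXF_VER"; then
    echo "$v: meets the minimum"
  else
    echo "$v: too old"
  fi
done
```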

### `Fixed`
### Parameters

### `Dependencies`
- Added `--output_demultiplex_fast5` to output demultiplexed fast5
- Added `--trim_barcodes` in Guppy basecaller to trim the barcodes from output fastq
- Added `--call_variants` to detect DNA variants
- Added `--split_mnps` to split multi-nucleotide polymorphisms into single nucleotide polymorphisms when using medaka
- Added `--phase_vcf` to output a phased vcf when using medaka
- Added `--skip_vc` to skip `variant_calling`
- Added `--skip_sv` to skip `structural_variant_calling`
- Added `--variant_caller` to specify variant caller
- Added `--structural_variant_caller` to specify structural variant caller
- Added `--skip_modification_analysis` to skip RNA modification detection
- Added `--skip_xpore` to skip `xpore`
- Added `--skip_m6anet` to skip `m6anet`
- Added `--skip_fusion_analysis` to skip RNA fusion detection
- Added `--jaffal_ref_dir` to indicate the reference directory path required by `JAFFAL`
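
To show how the variant-calling flags listed above might combine, here is a dry-run sketch that only assembles and prints a command line. It rests on assumptions not confirmed by this diff: that `medaka` is an accepted value for `--variant_caller`, and that `test_nobc_nodx_vc` (taken from the CI matrix) is a suitable profile. The function name `build_vc_command` is hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: assemble a variant-calling invocation from the new flags.
# Assumption: "medaka" is an accepted value for --variant_caller.
build_vc_command() {
  caller="${1:-medaka}"
  cmd="nextflow run nf-core/nanoseq -profile test_nobc_nodx_vc,docker --outdir ./results"
  cmd="${cmd} --call_variants --variant_caller ${caller}"
  # The changelog describes --split_mnps and --phase_vcf as applying to medaka,
  # so they are only appended for that caller.
  if [ "${caller}" = "medaka" ]; then
    cmd="${cmd} --split_mnps --phase_vcf"
  fi
  echo "${cmd}"
}

build_vc_command medaka
```

Whether the test profile already sets some of these options is not stated in the diff, so the explicit flags may be redundant there.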

### `Deprecated`
### Software dependencies

| Dependency | Old version | New version |
| --------------------------- | ----------- | ----------- |
| `bioconductor-bambu` | 1.0.2 | 2.0.0 |
| `bioconductor-bsgenome` | 1.58.0 | 1.62.0 |
| `cutesv` | | 1.0.12 |
| `deepvariant` | | 1.0.3 |
| `jaffa` | | 2.0 |
| `m6anet` | | 1.0 |
| `medaka` | | 1.4.4 |
| `multiqc` | 1.10.1 | 1.11 |
| `ont_fast5_api` | | 4.0.0 |
| `pepper_margin_deepvariant` | | 0.8 |
| `sniffles` | | 1.0.12 |
| `xpore` | | 2.1 |

### Bug fix

- The `GET_TEST_DATA` process now checks for any file in the path.

> **NB:** Dependency has been **updated** if both old and new version information is present.
> **NB:** Dependency has been **added** if just the new version information is present.
> **NB:** Dependency has been **removed** if version information isn't present.
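
The three NB rules above amount to a small decision procedure over the "Old version" and "New version" columns. A minimal sketch follows, assuming that an empty cell and a `-` both mean "no version" (the tables in this changelog use both); the function name `classify_dependency` is hypothetical.

```shell
#!/bin/sh
# classify_dependency OLD NEW: print "updated", "added" or "removed"
# according to the NB rules. Empty strings and "-" are both treated as
# "no version information" (an assumption, since the tables use both).
classify_dependency() {
  old="$1"
  new="$2"
  if [ "$old" = "-" ]; then old=""; fi
  if [ "$new" = "-" ]; then new=""; fi
  if [ -n "$old" ] && [ -n "$new" ]; then
    echo "updated"
  elif [ -n "$new" ]; then
    echo "added"
  else
    echo "removed"
  fi
}

# Rows taken from the 3.0.0 dependency table.
classify_dependency "1.0.2" "2.0.0"   # bioconductor-bambu
classify_dependency "" "1.0.12"       # cutesv
```
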
## [2.0.1] - 2021-11-29

### Bug fix

- The `UCSC_BEDGRAPHTOBIGWIG` process now uses the `ucsc-bedgraphtobigwig` container
- The full-size and minimal AWS tests have successfully finished after changing to the `ucsc-bedgraphtobigwig` container

## [2.0.0] - 2021-11-26

### Major enhancements

- Pipeline has been re-implemented in [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html)
- Software containers are now obtained from [Biocontainers](https://biocontainers.pro/#/registry)
- Update pipeline template to nf-core/tools `2.1`
- [#77](https://github.com/nf-core/nanoseq/issues/77) - Skipped alignment steps
- [#97](https://github.com/nf-core/nanoseq/issues/97) - Add optional DNA cleaning option

### Parameters

- Added `--run_nanolyse` to run NanoLyse for DNA cleaning of FastQ files
- Added `--nanolyse_fasta` to provide a fasta file for nanolyse to filter against

### Software dependencies

| Dependency | Old version | New version |
| -------------------- | ----------- | ----------- |
| `bioconductor-bambu` | 1.0.0 | 1.0.2 |
| `nanolyse` | | 1.2.0 |
| `r-base` | 4.0.3 | 4.0.2 |

> **NB:** Dependency has been **updated** if both old and new version information is present.
> **NB:** Dependency has been **added** if just the new version information is present.
> **NB:** Dependency has been **removed** if version information isn't present.
## [1.1.0] - 2020-11-06

### Major enhancements

- Transcript reconstruction and quantification ([`bambu`](https://bioconductor.org/packages/release/bioc/html/bambu.html) or [`StringTie2`](https://ccb.jhu.edu/software/stringtie/) and [`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/))
- Differential expression analysis at the gene-level ([`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html)) and transcript-level ([`DEXSeq`](https://bioconductor.org/packages/release/bioc/html/DEXSeq.html))
- Ability to provide BAM input to the pipeline
- Change samplesheet format to be more flexible to BAM input files
- Add pycoQC and featureCounts output to MultiQC report
- Add AWS full-sized test data
- Add parameter JSON schema for pipeline
- Add citations file
- Update pipeline template to nf-core/tools `1.11`
- Collapsible sections for output files in `docs/output.md`
- Replace `set` with `tuple` and `file` with `path` in `input` section of all processes
- Capitalise process names
- Added `--gpus all` to Docker `runOptions` when using GPU as mentioned [here](https://github.com/docker/compose/issues/6691#issuecomment-514429646)
- Cannot invoke method `containsKey()` on null object when `--igenomes_ignore` is set [#76](https://github.com/nf-core/nanoseq/issues/76)

### Parameters

- Added `--barcode_both_ends` to require barcodes on both ends for Guppy basecaller
- Added `--quantification_method` to specify the transcript quantification method to use
- Added `--skip_quantification` to skip transcript quantification and differential analysis
- Added `--skip_differential_analysis` to skip differential analysis with DESeq2 and DEXSeq
- Added `--publish_dir_mode` to customise method of publishing results to output directory [nf-core/tools#585](https://github.com/nf-core/tools/issues/585)

### Software dependencies

| Dependency | Old version | New version |
| ----------------------- | ----------- | ----------- |
| `Guppy` | 3.4.4 | 4.0.14 |
| `markdown` | 3.1.1 | 3.3.3 |
| `multiqc` | 1.8 | 1.9 |
| `nanoplot` | 1.28.4 | 1.32.1 |
| `pygments` | 2.5.2 | 2.7.2 |
| `pymdown-extensions` | 6.0 | 8.0.1 |
| `python` | 3.7.3 | 3.8.6 |
| `samtools` | 1.9 | 1.11 |
| `ucsc-bedgraphtobigwig` | 357 | 377 |
| `ucsc-bedtobigbed` | 357 | 377 |
| `bioconductor-bambu` | - | 1.0.0 |
| `bioconductor-bsgenome` | - | 1.58.0 |
| `bioconductor-deseq2` | - | 1.30.0 |
| `bioconductor-dexseq` | - | 1.36.0 |
| `bioconductor-drimseq` | - | 1.18.0 |
| `bioconductor-stager` | - | 1.12.0 |
| `r-base` | - | 4.0.3 |
| `seaborn` | - | 0.10.1 |
| `stringtie` | - | 2.1.4 |
| `subread` | - | 2.0.1 |
| `psutil` | - | - |

> **NB:** Dependency has been **updated** if both old and new version information is present.
> **NB:** Dependency has been **added** if just the new version information is present.
> **NB:** Dependency has been **removed** if version information isn't present.
## [1.0.0] - 2020-03-05

Initial release of nf-core/nanoseq, created with the [nf-core](http://nf-co.re/) template.
115 changes: 112 additions & 3 deletions CITATIONS.md
@@ -1,20 +1,129 @@
# nf-core/nanoseq: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)
## [nf-core](https://www.ncbi.nlm.nih.gov/pubmed/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.
## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)
## [Nextflow](https://www.ncbi.nlm.nih.gov/pubmed/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
## Pipeline tools

- [BEDTools](https://www.ncbi.nlm.nih.gov/pubmed/20110278/)

* [cuteSV](https://pubmed.ncbi.nlm.nih.gov/32746918/)

> Jiang T, Liu Y, Jiang Y, Li J, Gao Y, Cui Z, Liu Y, Liu B, Wang Y. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol. 2020 Aug 3;21(1):189. doi: 10.1186/s13059-020-02107-y. PMID: 32746918; PMCID: PMC7477834.
* [DeepVariant](https://pubmed.ncbi.nlm.nih.gov/30247488/)

> Poplin R, Chang PC, Alexander D, Schwartz S, Colthurst T, Ku A, Newburger D, Dijamco J, Nguyen N, Afshar PT, Gross SS, Dorfman L, McLean CY, DePristo MA. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol. 2018 Nov;36(10):983-987. doi: 10.1038/nbt.4235. Epub 2018 Sep 24. PMID: 30247488.
* [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

> Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PubMed PMID: 20110278; PubMed Central PMCID: PMC2832824.
- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
- [featureCounts](https://www.ncbi.nlm.nih.gov/pubmed/24227677/)

> Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014 Apr 1;30(7):923-30. doi: 10.1093/bioinformatics/btt656. Epub 2013 Nov 13. PubMed PMID: 24227677.
- [GraphMap](https://pubmed.ncbi.nlm.nih.gov/27079541/)

> Sović I, Šikić M, Wilm A, Fenlon SN, Chen S, Nagarajan N. Fast and sensitive mapping of nanopore sequencing reads with GraphMap. Nat Commun. 2016 Apr 15;7:11307. doi: 10.1038/ncomms11307. PMID: 27079541; PMCID: PMC4835549.
- [Guppy](https://nanoporetech.com/nanopore-sequencing-data-analysis)

- [JAFFAL](https://doi.org/10.1186/s13059-021-02588-5)

> Davidson NM, et al., JAFFAL: detecting fusion genes with long-read transcriptome sequencing. Genome Biology (2022)
- [m6anet](https://www.biorxiv.org/content/10.1101/2021.09.20.461055v1)

> Hendra C, et al., Detection of m6A from direct RNA sequencing using a Multiple Instance Learning framework. bioRxiv (2021)
* [PEPPER-Margin-DeepVariant](https://pubmed.ncbi.nlm.nih.gov/34725481/)

> Shafin K, Pesout T, Chang PC, Nattestad M, Kolesnikov A, Goel S, Baid G, Kolmogorov M, Eizenga JM, Miga KH, Carnevali P, Jain M, Carroll A, Paten B. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat Methods. 2021 Nov;18(11):1322-1332. doi: 10.1038/s41592-021-01299-w. Epub 2021 Nov 1. PMID: 34725481; PMCID: PMC8571015.
* [pycoQC](https://doi.org/10.21105/joss.01236)

> Leger A, Leonardi T, (2019). pycoQC, interactive quality control for Oxford Nanopore Sequencing. Journal of Open Source Software, 4(34), 1236.
- [Minimap2](https://pubmed.ncbi.nlm.nih.gov/29750242/)

> Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 Sep 15;34(18):3094-3100. doi: 10.1093/bioinformatics/bty191. PMID: 29750242; PMCID: PMC6137996.
- [Medaka](https://github.com/nanoporetech/medaka)

- [MultiQC](https://www.ncbi.nlm.nih.gov/pubmed/27312411/)

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
- [NanoLyse](https://pubmed.ncbi.nlm.nih.gov/29547981/)

> De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M., & Van Broeckhoven, C. (2018). NanoPack: visualizing and processing long-read sequencing data. Bioinformatics, 34(15), 2666-2669. PubMed PMID: 29547981; PubMed Central PMCID: PMC6061794.
- [NanoPlot](https://pubmed.ncbi.nlm.nih.gov/29547981/)

> De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018 Aug 1;34(15):2666-2669. doi: 10.1093/bioinformatics/bty149. PubMed PMID: 29547981; PubMed Central PMCID: PMC6061794.
- [pycoQC](https://doi.org/10.21105/joss.01236)

> Leger A, Leonardi T, (2019). pycoQC, interactive quality control for Oxford Nanopore Sequencing. Journal of Open Source Software, 4(34), 1236.
- [qcat](https://github.com/nanoporetech/qcat)

- [SAMtools](https://www.ncbi.nlm.nih.gov/pubmed/19505943/)

> Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002.
- [Sniffles](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5990442/)

> Sedlazeck FJ, Rescheneder P, Smolka M, Fang H, Nattestad M, von Haeseler A, Schatz MC. Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods. 2018 Jun;15(6):461-468. doi: 10.1038/s41592-018-0001-7. Epub 2018 Apr 30. PMID: 29713083; PMCID: PMC5990442.
- [StringTie2](https://www.ncbi.nlm.nih.gov/pubmed/31842956/)

> Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 2019 Dec 16;20(1):278. doi: 10.1186/s13059-019-1910-1. PubMed PMID: 31842956; PubMed Central PMCID: PMC6912988.
- [UCSC tools](https://www.ncbi.nlm.nih.gov/pubmed/20639541/)

> Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics. 2010 Sep 1;26(17):2204-7. doi: 10.1093/bioinformatics/btq351. Epub 2010 Jul 17. PubMed PMID: 20639541; PubMed Central PMCID: PMC2922891.
- [xPore](https://doi.org/10.1038/s41587-021-00949-w)
> Pratanwanich PN, et al., Identification of differential RNA modifications from nanopore direct RNA sequencing with xPore. Nat Biotechnol (2021)
## R packages

- [R](https://www.R-project.org/)

> R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
- [bambu](https://bioconductor.org/packages/release/bioc/html/bambu.html)

> Chen Y, Goeke J, Wan YK (2020). bambu: Reference-guided isoform reconstruction and quantification for long read RNA-Seq data. R package version 1.0.0.
- [BSgenome](https://bioconductor.org/packages/release/bioc/html/BSgenome.html)

> Pagès H (2020). BSgenome: Software infrastructure for efficient representation of full genomes and their SNPs. doi: 10.18129/B9.bioc.BSgenome.
- [DESeq2](https://www.ncbi.nlm.nih.gov/pubmed/25516281/)

> Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. PubMed PMID: 25516281; PubMed Central PMCID: PMC4302049.
- [DEXSeq](https://pubmed.ncbi.nlm.nih.gov/22722343/)

> Anders S, Reyes A, Huber W. Detecting differential usage of exons from RNA-seq data. Genome Res. 2012 Oct;22(10):2008-17. doi: 10.1101/gr.133744.111. Epub 2012 Jun 21. PubMed PMID: 22722343; PubMed Central PMCID: PMC3460195.
- [DRIMSeq](https://pubmed.ncbi.nlm.nih.gov/28105305/)

> Nowicka M, Robinson MD. DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics. F1000Res. 2016 Jun 13;5:1356. doi: 10.12688/f1000research.8900.2. PubMed PMID: 28105305; PubMed Central PMCID: PMC5200948.
- [stageR](https://pubmed.ncbi.nlm.nih.gov/28784146/)
> Van den Berge K, Soneson C, Robinson MD, Clement L. stageR: a general stage-wise method for controlling the gene-level false discovery rate in differential expression and differential transcript usage. Genome Biol. 2017 Aug 7;18(1):151. doi: 10.1186/s13059-017-1277-0. PubMed PMID: 28784146; PubMed Central PMCID: PMC5547545.
## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)