MicrobialGenomics/viralrecon

Forked from nf-core/viralrecon

Pipeline additions

  - Adapter trimming with Trimmomatic
  - Codon frequency calling with CodFrq
  - Custom consensus with consensusSequence_v2.py

Introduction

nf-core/viralrecon is a bioinformatics analysis pipeline used to perform assembly and intra-host/low-frequency variant calling for viral samples. The pipeline supports short-read Illumina sequencing data from both shotgun (e.g. sequencing directly from clinical samples) and enrichment-based library preparation methods (e.g. amplicon-based: ARTIC SARS-CoV-2 enrichment protocol; or probe-capture-based).

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers making installation trivial and results highly reproducible. Furthermore, automated continuous integration tests that run the pipeline on a full-sized dataset using AWS cloud ensure that the code is stable.

Pipeline summary

1. Download samples via SRA, ENA or GEO IDs (ENA FTP, parallel-fastq-dump; if required)
2. Merge re-sequenced FastQ files (cat; if required)
3. Read QC (FastQC)
4. Adapter trimming (fastp or Trimmomatic)
5. Variant calling
   1. Read alignment (Bowtie 2)
   2. Sort and index alignments (SAMtools)
   3. Primer sequence removal (iVar; amplicon data only)
   4. Duplicate read marking (Picard; removal optional)
   5. Alignment-level QC (Picard, SAMtools)
   6. Genome-wide and amplicon coverage QC plots (mosdepth)
   7. Choice of multiple variant calling and consensus sequence generation routes (VarScan 2, BCFTools, BEDTools || iVar variants and consensus || BCFTools, BEDTools)
      - Variant annotation (SnpEff, SnpSift)
      - Consensus assessment report (QUAST)
   8. Intersect variants across callers (BCFTools)
   9. Custom consensus (consensusSequence_v2.py)
   10. Codon frequency calling (CodFrq)
6. De novo assembly
   1. Primer trimming (Cutadapt; amplicon data only)
   2. Removal of host reads (Kraken 2)
   3. Choice of multiple assembly tools (SPAdes || metaSPAdes || Unicycler || minia)
      - BLAST to reference genome (blastn)
      - Contiguate assembly (ABACAS)
      - Assembly report (PlasmidID)
      - Assembly assessment report (QUAST)
      - Call variants relative to reference (Minimap2, seqwish, vg, Bandage)
      - Variant annotation (SnpEff, SnpSift)
7. Present QC and visualisation for raw read, alignment, assembly and variant calling results (MultiQC)

NB: The pipeline has a number of options to allow you to run only specific aspects of the workflow if you so wish. For example, you can skip all of the assembly steps with the --skip_assembly parameter. See the usage docs for all of the available options when running the pipeline.
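
As a rough sketch of how such an option can be toggled, the snippet below assembles a run command and appends `--skip_assembly` only when requested, then prints the command instead of executing it. The `--input`, `--genome`, `--skip_assembly` and `-profile` parameters come from this README; the `SKIP_ASSEMBLY` variable and the choice of the `docker` profile are illustrative assumptions.

```shell
#!/bin/sh
# Illustrative toggle (hypothetical variable, not a pipeline option).
SKIP_ASSEMBLY=true

# Base command, built from parameters shown in this README.
CMD="nextflow run MicrobialGenomics/viralrecon --input samplesheet.csv --genome MN908947.3 -profile docker"

# Append the documented --skip_assembly flag only when requested.
if [ "$SKIP_ASSEMBLY" = "true" ]; then
    CMD="$CMD --skip_assembly"
fi

# Print the command for inspection rather than running it.
echo "$CMD"
```

Printing the command first makes it easy to check which steps will be skipped before committing compute time to a full run.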

Pipeline reporting

Numerous QC and reporting steps are included in the pipeline in order to collate a full summary of the analysis within a single MultiQC report. You can see an example MultiQC report here, generated using the parameters defined in this configuration file. The pipeline was run with these samples, prepared from the ncov-2019 ARTIC Network V1 amplicon set and sequenced on the Illumina MiSeq platform in 301bp paired-end format.

Quick Start

1. Install Nextflow
2. Install Docker for full pipeline reproducibility (please only use Conda as a last resort; see docs)
3. Download the pipeline and test it on a minimal dataset with a single command:

   ```
   nextflow run MicrobialGenomics/viralrecon -profile test,<docker/conda>
   ```

4. Start running your own analysis!

Typical command for shotgun analysis:

```
nextflow run MicrobialGenomics/viralrecon \
    --input samplesheet.csv \
    --genome 'MN908947.3' \
    -profile <docker/conda>
```

Typical command for amplicon analysis:

```
nextflow run MicrobialGenomics/viralrecon \
    --input samplesheet.csv \
    --genome 'MN908947.3' \
    --protocol amplicon \
    --amplicon_bed ./nCoV-2019.artic.V3.bed \
    --skip_assembly \
    -profile <docker/conda>
```

See the usage documentation for all of the available options when running the pipeline.
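
Both commands read their inputs from `samplesheet.csv`. The sketch below writes a minimal one and checks that every row has the same number of fields. The column names (`sample`, `fastq_1`, `fastq_2`) are an assumption based on the upstream nf-core/viralrecon 1.x layout, and the sample names and file paths are placeholders; check the usage documentation for the format this fork actually expects.

```shell
# Write a minimal samplesheet.
# ASSUMPTION: column names follow the upstream nf-core/viralrecon 1.x layout;
# sample names and FastQ paths are placeholders.
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2
SAMPLE_1,reads/SAMPLE_1_R1.fastq.gz,reads/SAMPLE_1_R2.fastq.gz
SAMPLE_2,reads/SAMPLE_2_R1.fastq.gz,reads/SAMPLE_2_R2.fastq.gz
EOF

# Sanity check: every row should have exactly three comma-separated fields.
awk -F',' 'NF != 3 { bad = 1 } END { exit bad }' samplesheet.csv && echo "samplesheet OK"
```

Catching a malformed row before launching is cheaper than having the pipeline fail during input validation.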

Documentation

Documentation for the MicrobialGenomics/viralrecon pipeline is in the docs/ directory.
