viral-pipelines

A set of scripts and tools for the analysis of viral NGS data.

Workflows are written in WDL, a portable workflow language that can be executed on a wide variety of platforms:

  • on individual machines (using miniWDL or Cromwell to execute)
  • on commercial cloud platforms like GCP, AWS, or Azure (using Cromwell or CromwellOnAzure)
  • on institutional HPC systems (using Cromwell)
  • on commercial platform-as-a-service vendors (like DNAnexus)
  • on academic cloud platforms (like Terra)

Obtaining the latest WDL workflows

Workflows from this repository are continuously deployed to Dockstore, a GA4GH Tool Registry Service (TRS). From there they can be imported into any bioinformatics compute platform that uses the TRS API and understands WDL (this includes Terra, DNAnexus, DNAstack, etc.).

Workflows are also available in the Terra featured workspace.

Workflows are continuously deployed to a DNAnexus CI project.

Basic execution

The easiest way to get started is on a single Python- and Docker-capable machine (your laptop, a shared workstation, or a virtual machine) using miniWDL. miniWDL can be installed via pip or conda (from conda-forge). After confirming that it works (miniwdl run_self_test), you can use miniwdl run to invoke WDL workflows from this repository.
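As a rough sketch of that setup, the snippet below reports which prerequisites are on your PATH and wraps the install and self-test in a function. The function names check_prereqs and install_and_verify are illustrative only (they are not part of this repository or of miniWDL), and the pip invocation assumes a python3 with pip available.

```shell
# Report which prerequisites are on PATH; prints one line per tool.
# (check_prereqs is a hypothetical helper name used for illustration.)
check_prereqs() {
  for tool in python3 docker miniwdl; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
    fi
  done
}

# Install miniWDL from PyPI, then run its built-in self-test.
# conda alternative: conda install -c conda-forge miniwdl
# (install_and_verify is a hypothetical helper name; it is defined
# here but not called, since it needs network access and Docker.)
install_and_verify() {
  python3 -m pip install miniwdl && miniwdl run_self_test
}

check_prereqs
```

If miniwdl is reported as missing, running install_and_verify in the same shell should install it and exercise the self-test.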

For example, to list the inputs for the assemble_refbased workflow:

miniwdl run https://raw.githubusercontent.com/broadinstitute/viral-pipelines/v2.1.8.0/pipes/WDL/workflows/assemble_refbased.wdl

This will emit:

missing required inputs for assemble_refbased: reads_unmapped_bams, reference_fasta

required inputs:
  Array[File]+ reads_unmapped_bams
  File reference_fasta

optional inputs:
  <really long list>

outputs:
  <really long list>

To then execute this workflow on your local machine, invoke it like this:

miniwdl run \
  https://raw.githubusercontent.com/broadinstitute/viral-pipelines/v2.1.8.0/pipes/WDL/workflows/assemble_refbased.wdl \
  reads_unmapped_bams=PatientA_library1.bam \
  reads_unmapped_bams=PatientA_library2.bam \
  reference_fasta=/refs/NC_045512.2.fasta \
  trim_coords_bed=/refs/NC_045512.2-artic_primers-3.bed \
  sample_name=PatientA

In the above example, reads from two sequencing runs are aligned and merged together before consensus calling. The optional BED file passed as trim_coords_bed turns on primer trimming at the given coordinates.
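When the input list grows, the same values can be kept in a JSON file and passed with miniwdl run's -i option instead of key=value pairs. The sketch below writes such a file for the example above. The file name assemble_refbased.inputs.json is an arbitrary choice, and the keys assume the Cromwell-style convention of prefixing each input with the workflow name.

```shell
# Write the inputs from the example above to a JSON file.
# Array[File] inputs become JSON arrays; the file name is arbitrary.
cat > assemble_refbased.inputs.json <<'EOF'
{
  "assemble_refbased.reads_unmapped_bams": [
    "PatientA_library1.bam",
    "PatientA_library2.bam"
  ],
  "assemble_refbased.reference_fasta": "/refs/NC_045512.2.fasta",
  "assemble_refbased.trim_coords_bed": "/refs/NC_045512.2-artic_primers-3.bed",
  "assemble_refbased.sample_name": "PatientA"
}
EOF

# Then pass the file with -i in place of the key=value arguments:
#   miniwdl run <workflow URL> -i assemble_refbased.inputs.json
```

Keeping inputs in a file like this also makes it easier to reuse them on other WDL runners that accept a JSON inputs file.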

Available workflows

The workflows provided here are more fully documented at our ReadTheDocs page.