- Description
- Prerequisites
- Features
- WGSGermlineSNPsIndels
- WESGermlineSNPsIndels
- Funding and Acknowledgement
Next-generation sequencing (NGS) data analysis comprises a series of computational tasks, frequently based on command-line tools. These analyses are defined as workflows that group all the necessary tasks, improving data-processing performance and the interpretation of results. Domain-specific languages (DSLs) such as WDL and Nextflow have recently been created to define complex pipelines and to improve their parallelization, scalability and reusability. We have developed complete pipelines, written in WDL both as scripts and with Rabix Composer, that follow the Broad Institute's best practices and use the Genome Analysis Toolkit (GATK4) to analyze whole-genome sequencing (WGS) and whole-exome sequencing (WES) data.
For benchmarking, we follow the guidelines of the precisionFDA Truth and Consistency challenges, using genome data released by the Genome in a Bottle (GIAB) Consortium. A full pipeline is currently running on TeideHPC to analyze, for research purposes, germline WGS and WES data produced on an Illumina HiSeq 4000 sequencing platform.
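The precisionFDA Truth challenge evaluated submissions by precision, recall and F-score against the GIAB truth sets. The sketch below shows how those three metrics follow from variant comparison counts. It is a simplified illustration under stated assumptions: comparison tools such as hap.py count true positives separately on the truth and query sides, whereas this sketch assumes a single TP count, and the `BenchmarkCounts` type and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkCounts:
    """Hypothetical counts from comparing a call set with a GIAB truth set."""
    tp: int  # calls that match a truth-set variant
    fp: int  # calls with no matching truth-set variant
    fn: int  # truth-set variants the pipeline did not call


def summarize(counts: BenchmarkCounts) -> dict:
    """Return precision, recall and F-score (harmonic mean of the two)."""
    called = counts.tp + counts.fp
    truth = counts.tp + counts.fn
    precision = counts.tp / called if called else 0.0
    recall = counts.tp / truth if truth else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score}


if __name__ == "__main__":
    # Illustrative counts only; these are not results from this pipeline.
    print(summarize(BenchmarkCounts(tp=9_500, fp=200, fn=300)))
```

In practice these metrics are usually reported separately for SNPs and indels, so the function would be applied once per variant type.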
We have developed two GATK4-based workflows using WDL and Cromwell; both can run in local mode, on an HPC infrastructure or in a dockerized cluster.
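As an illustration of the local and HPC run modes, the sketch below builds Cromwell command lines in Python. It assumes Cromwell's single-workflow `run` mode (`java -jar cromwell.jar run <workflow.wdl> --inputs <inputs.json>`) and passes a backend configuration through the `-Dconfig.file` JVM property. The workflow, inputs and `slurm.conf` file names are hypothetical, and `slurm.conf` would have to define a SLURM backend; without a configuration file, Cromwell uses its default Local backend.

```python
import shlex
from typing import List, Optional


def cromwell_command(wdl: str, inputs: str, backend_conf: Optional[str] = None,
                     cromwell_jar: str = "cromwell.jar") -> List[str]:
    """Build a Cromwell single-workflow ('run' mode) command line.

    With backend_conf=None, Cromwell runs with its default Local backend.
    Otherwise the (hypothetical) configuration file, e.g. one defining a
    SLURM backend, is passed via the -Dconfig.file JVM system property.
    """
    cmd = ["java"]
    if backend_conf is not None:
        cmd.append(f"-Dconfig.file={backend_conf}")
    cmd += ["-jar", cromwell_jar, "run", wdl, "--inputs", inputs]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    local = cromwell_command("WGSGermlineSNPsIndels.wdl", "inputs.json")
    hpc = cromwell_command("WGSGermlineSNPsIndels.wdl", "inputs.json",
                           backend_conf="slurm.conf")
    print(shlex.join(local))
    print(shlex.join(hpc))
```

Keeping the command as a list avoids shell-quoting problems if it is later passed to `subprocess.run`; `shlex.join` is only used to print it.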
Basic software needed to run the pipeline:
Features:
- Can run on an HPC infrastructure by connecting the Cromwell engine to the SLURM scheduler.
- Starts from BCL data.
- Demultiplexing of samples pooled across the flowcell.
- Data processing both on a per-lane and a per-sample basis.
- Supports both the hg19 and hg38 reference genomes.
- Can be restarted from any step in case of failure.
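Per-lane and per-sample processing can be organized by parsing the FASTQ file names produced by demultiplexing. The sketch below assumes the default bcl2fastq naming scheme (`<Sample>_S<n>_L<lane>_R<read>_001.fastq.gz`) and paired-end reads; the function name and example files are hypothetical, and this is not the pipeline's own code. Each (R1, R2) pair would be a per-lane unit of work (e.g. aligned separately with a lane-specific read group), and the lanes of a sample would then be merged for per-sample steps.

```python
import re
from collections import defaultdict

# Default bcl2fastq file name: <Sample>_S<n>_L<lane>_R<read>_001.fastq.gz
FASTQ_RE = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def group_by_sample_and_lane(filenames):
    """Map each sample to {lane: (R1 file, R2 file)}, lanes in ascending order.

    A missing mate is reported as None.
    """
    reads = defaultdict(lambda: defaultdict(dict))
    for name in filenames:
        m = FASTQ_RE.match(name)
        # Skip files that do not follow the naming scheme and reads
        # that could not be assigned to any sample barcode.
        if m is None or m["sample"] == "Undetermined":
            continue
        reads[m["sample"]][int(m["lane"])][m["read"]] = name
    return {
        sample: {lane: (pair.get("1"), pair.get("2"))
                 for lane, pair in sorted(lanes.items())}
        for sample, lanes in reads.items()
    }


if __name__ == "__main__":
    files = [  # hypothetical demultiplexing output
        "NA12878_S1_L001_R1_001.fastq.gz",
        "NA12878_S1_L001_R2_001.fastq.gz",
        "NA12878_S1_L002_R1_001.fastq.gz",
        "NA12878_S1_L002_R2_001.fastq.gz",
        "Undetermined_S0_L001_R1_001.fastq.gz",
    ]
    for sample, lanes in group_by_sample_and_lane(files).items():
        for lane, (r1, r2) in lanes.items():
            print(sample, lane, r1, r2)  # one per-lane unit of work
```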
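WGSGermlineSNPsIndels: pipeline for germline SNP and indel calling from whole-genome sequencing (WGS) data.
WESGermlineSNPsIndels: pipeline for germline SNP and indel calling from whole-exome sequencing (WES) data.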
Funded by Ministerio de Ciencia, Innovación y Universidades (RTC-2017-6471-1; MINECO/AEI/FEDER, UE). This work has been supported by the CEDeI program (Centro de Excelencia de Desarrollo e Innovación, Cabildo de Tenerife). The authors also thankfully acknowledge the computer resources and technical support provided by the TARO Research Group of the University of La Laguna.
For more information, see the following poster.