
Pipeline to filter whole exome vcf files and generate a report document for clinical diagnostics.


GenomicsAotearoa/diagnostics_exome_reporting

VCF-DART (VCF Diagnostics Annotation and Reporting Tool)

Pipeline to annotate and filter Variant Call Format (VCF) files and generate a report document for clinical diagnostics. The variant annotation and filtering pipeline now uses a web server GUI implemented in R Shiny.

Published article:

Benton MC, Smith RA, Haupt LM, Sutherland HG, Dunn PJ, Albury CL, Maksemous N, Lea RA, and Griffiths LR. (2019) Variant Call Format (VCF)-Diagnostic Annotation and Reporting Tool (VCF-DART): A Customizable Analysis Pipeline for Identification of Clinically Relevant Genetic Variants in Next-Generation Sequencing Data. The Journal of Molecular Diagnostics.


IMPORTANT - Please Read

This repository contains the most stable version of VCF-DART, which accompanies the published article. For more recent stable and development builds, and to contribute, please visit the Genomics Aotearoa GitHub.

Disclaimer

Please note that this is a beta version of the VCF-DART platform which is still undergoing final testing before its official release. The platform, its software and all content found on it are provided on an “as is” and “as available” basis. VCF-DART does not give any warranties, whether express or implied, as to the suitability or usability of the website, server, its software or any of its content.

VCF-DART will not be liable for any loss, whether such loss is direct, indirect, special or consequential, suffered by any party as a result of their use of the VCF-DART platform, its software or content. Any downloading or uploading of material to the website/server is done at the user’s own risk and the user will be solely responsible for any damage to any computer system or loss of data that results from such activities.

Should you encounter any bugs, glitches, lack of functionality or other problems on the website, please let us know immediately so we can rectify these accordingly. Your help in this regard is greatly appreciated! The best way to do this is to log an issue in this GitHub repository, or if you feel inclined you are welcome to create a pull request.


Software Dependencies

The following programs need to be available/installed for correct operation:

R Package Dependencies

VCF-DART currently requires the following packages (and their dependencies) to be installed for correct operation:

# CRAN
install.packages('magrittr')
install.packages('shiny')
install.packages('shinyBS')
install.packages('rmarkdown')
install.packages('pander')

NOTE: Shiny Server will only install correctly if both the shiny and rmarkdown packages are already installed.
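Because the pipeline shells out to external programs, a quick pre-flight check can catch a missing tool before a long run starts. The following is a minimal bash sketch, not part of VCF-DART: the function name check_programs is hypothetical, and the program names in the usage line (vep and tabix, which the to-do list mentions, plus bgzip) are assumptions rather than the pipeline's confirmed requirements.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of VCF-DART): print each program
# that is not found on PATH to stderr and return non-zero if any are missing.
check_programs() {
  local prog missing=0
  for prog in "$@"; do
    # command -v is a POSIX shell builtin, so it behaves the same on most systems
    if ! command -v "$prog" >/dev/null 2>&1; then
      echo "missing: $prog" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_programs vep tabix bgzip || exit 1` near the top of the main script would stop the run and list every missing executable at once, rather than failing part-way through annotation. `command -v` is used instead of `which` because it is a shell builtin with consistent behaviour across systems.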

Current to-do list and fixes/features pending

  • fix bug in assess_variants.sh that breaks the Mutation Assessor links
    • hg19 is being replaced by GRCh37, creating dead links
  • look at moving this to-do list over to a roadmap in the wiki
  • add a tab to the UI that captures and displays the tail of the most recent log file
    • to do this, add the ability to send the shell (bash) pipeline call to background processing, freeing up Shiny Server reactivity
    • this has been implemented, but has meant the removal of the activity wheel (for now)
  • option to run without the coverage text file (mainly for research purposes)
  • look at integrating VCF-DART and VCF-DART Viewer into a shinydashboard (and within a docker/singularity container)
    • explore docker/singularity
  • explore having options for which databases to annotate against, e.g. not running VEP --everything could cut run time by 30+ mins
    • reducing the number of threads to 6 and removing the --merged VEP option reduce run times to 10-15 mins for vcf files 30-50K variants in size
  • implement selection of genome build (currently only hg19 is working)
    • this is a big feature as the current databases aren't all built for hg38
      • create a separate feature branch to develop this
  • add more extracted features to the vcfcompiler_diagnostics.sh script (e.g. CADD score)
    • make CADD score available (add extraction routine in vcfcompiler_diagnostics.sh)
    • added MutationTaster and MutationAssessor to viewable output as well
    • build script to scrape ClinVar and provide updated annotation
    • add more ClinVar information
      • add a script that pulls the most recent ClinVar release, processes it and saves it as an RDS for quick access (a version of this is included in the repo)
    • combine the above with results
  • look into adding a cancel/exit button to the Shiny App to kill a run
  • explore asking user for raw data dir in GUI or configuration file (currently hard-coded)
  • evaluate whether we need to continue to allow the user to define the 'home' dir
  • generate and send an email and/or text message upon run completion
  • look into developing an option for "off-line mode"
    • design a check for internet connection
    • would need a local copy of the repository available
  • check for and ignore .tbi files in the data directory
  • explore adding a check for label in the coverage text file as well
  • add a check for input variables and display a warning/error if any are missing
  • look at adding a tab for help/guide
    • added tooltips throughout the app; detailed help/documentation can be found on the GitHub wiki
  • add a tab with options to upload files (VCF and coverage text files)
  • check for existing gene_list dir and delete if present
  • removed the need for an external configuration file
    • config options are now at the start of the script (user defined)
  • GitHub repo requires ssh passphrase each use
  • add a Shiny GUI to the front end
  • update DART-view (other shiny app) to point to the correct directory for viewing results
  • issue with grep using gene lists (files) and vcf.gz
    • look into using tabix (MUCH faster)
    • extract positions for the listed genes from a bed file, e.g. grep -w -f gene_list.txt UCSC_gene_positions_hg19.bed > gene_regions_hg19.bed
    • use: tabix -R gene_regions_hg19.bed variant.vcf.gz
  • remove the xmessage checks (relies on having X11 environment installed, not ideal)
    • decide if we need to have user checks at these two locations
  • ensure the log files are being moved back into the correct location
  • overhaul Shiny script to allow hosting via Shiny Server
    • split into ui.R and server.R
    • add home directory variable to set location for data and scripts
    • test working when deployed remotely
  • added code to set working dir to main script location
  • create a configuration file to allow users to set paths to software and databases (temp)
  • remove all hard-coded paths (software, databases and directories)
    • remove from the main bash script (WESdiag_pipeline_dev.sh)
    • remove from wes_vcffiltering.sh
    • remove from assess_variants.sh
    • remove from vcfcompiler_diagnostics.sh
  • add user defined option for the 3rd tier gene list
    • create a feature branch for this to be implemented (user-defined-tiers)
    • update variable names of gene lists to be universal
    • use user uploaded gene lists (download into gene_list dir)
    • add integration with a self contained and user curated gene list repository
  • explore the presence of duplicate variants in the final tier (tier 3)
  • add ability to determine the variant caller used to generate the VCF file, to allow allele depth-specific filtering
    • testing an IF ELSE statement which looks for AD term (GATK format)
  • test whether bgzipping and creating tabix index for the vcf file improves VEP performance
  • add time taken at the end of the pipeline (in main bash script)
  • implement multiple row selection and copy to clipboard
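The tabix approach noted in the to-do list (grep a gene list against a positional BED file, then query the indexed vcf.gz) could look roughly like the bash sketch below. It is an illustration under assumptions, not the pipeline's code: the function names are hypothetical, the BED file is assumed to carry a gene symbol on each line, and the VCF must be bgzip-compressed. One detail worth checking: the tabix documentation states that `-R` only parses the regions file as BED (0-based starts) when its name ends in `.bed`; any other name is read as 1-based CHROM/POS/END columns, so the sketch writes its regions to a `.bed` file.

```bash
#!/usr/bin/env bash
# Sketch of the tabix-based gene filtering idea (hypothetical function names).
# Assumes a BED file of gene positions with a gene symbol on each line and a
# plain-text gene list with one symbol per line.

# Step 1: keep BED lines that contain any listed gene as a whole word,
# using the same grep -w -f approach as the to-do note.
make_gene_regions() {
  local gene_list=$1 bed=$2 out=$3
  grep -w -f "$gene_list" -- "$bed" > "$out"
}

# Step 2: extract only variants overlapping those regions. Needs htslib's
# tabix and a bgzip-compressed VCF; name the regions file *.bed so tabix
# treats it as BED.
filter_vcf_by_regions() {
  local regions=$1 vcf_gz=$2 out=$3
  # Build the .tbi index only if it does not already exist
  [[ -f "${vcf_gz}.tbi" ]] || tabix -p vcf "$vcf_gz" || return 1
  # -h keeps the VCF header lines so the output is still a valid VCF
  tabix -h -R "$regions" "$vcf_gz" > "$out"
}
```

Example: `make_gene_regions gene_list.txt UCSC_gene_positions_hg19.bed gene_regions_hg19.bed && filter_vcf_by_regions gene_regions_hg19.bed variant.vcf.gz variant.genes.vcf`. Because tabix uses the index to jump straight to each region, the whole vcf.gz never has to be decompressed and scanned with grep, which is where the speed-up mentioned in the to-do note would come from.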
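For the allele-depth item in the to-do list, one way to frame the IF ELSE test is to look for an AD key in the FORMAT column (column 9) of the first variant record. This bash sketch is an assumption-laden illustration: `has_ad_field` is a hypothetical name, it inspects only the first record, and finding AD is a heuristic for GATK-style output rather than proof of which caller produced the file, since other callers can also write AD.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the first variant record of a VCF
# lists an AD (allele depth) key in its FORMAT column, as GATK output does.
# Handles plain and gzip/bgzip-compressed files.
has_ad_field() {
  local vcf=$1
  if [[ $vcf == *.gz ]]; then
    gzip -dc -- "$vcf"
  else
    cat -- "$vcf"
  fi | awk -F'\t' '
    /^#/ { next }                        # skip meta-information and header lines
    {
      n = split($9, fmt, ":")            # FORMAT keys, e.g. GT:AD:DP:GQ:PL
      for (i = 1; i <= n; i++)
        if (fmt[i] == "AD") found = 1
      exit                               # only the first record is inspected
    }
    END { exit !found }                  # 0 if AD was seen, 1 otherwise
  '
}
```

In the filtering script this would slot into the IF ELSE described above: `if has_ad_field sample.vcf.gz; then` apply the AD-based filters, `else` fall back to filters that do not need allele depth, `fi`.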

License

Copyright (C) 2018  Miles Benton

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.


Languages

  • Shell 70.3%
  • R 29.7%