This work has now been published: AutoGVP: a dockerized workflow integrating ClinVar and InterVar germline sequence variant classification.
Kim J^, Naqvi AS^, Corbett RJ, Kaufman RS, Vaksman Z, Brown MA, Miller DP, Phul S, Geng Z, Storm PB, Resnick AC, Stewart DR, Rokita JL+, Diskin SJ+. AutoGVP: a dockerized workflow integrating ClinVar and InterVar germline sequence variant classification. Bioinformatics. 2024 Mar 4;40(3):btae114. doi: 10.1093/bioinformatics/btae114. PMID: 38426335; PMCID: PMC10955249.
^Equal first authorship +Equal senior authorship
For more detailed instructions, please visit the user guide on our wiki.
git clone git@github.com:diskin-lab-chop/AutoGVP.git
- Pull the docker image.
docker pull pgc-images.sbgenomics.com/diskin-lab/autogvp:v1.0.1
- Navigate to the AutoGVP root directory
cd AutoGVP
- Start a docker container from the image. Replace <CONTAINER_NAME> with a name of your choice and run the commands below:
docker run --platform linux/amd64 --name <CONTAINER_NAME> -d -v $PWD:/home/rstudio/AutoGVP pgc-images.sbgenomics.com/diskin-lab/autogvp:v1.0.1
docker exec -ti <CONTAINER_NAME> bash
- Navigate to the AutoGVP directory within the docker container
cd /home/rstudio/AutoGVP
- Run AutoGVP (see example commands below).
VEP (v104)
InterVar
ANNOVAR
AutoPVS1 (v2.0)
bcftools (v1.17)
AutoGVP Requirements (recommended to place all in the data/ folder):
- VEP-annotated VCF file (*VEP.vcf) or VEP- and ClinVar-annotated VCF file (CAVATICA workflow only). For CAVATICA workflow, AutoGVP will use ClinVar annotation from sample VCF when external ClinVar VCF file is not provided.
- ANNOVAR multianno file (*hg38_multianno.txt)
- InterVar file (*intervar.hg38_multianno.txt.intervar)
- AutoPVS1 file (*autopvs1.txt)
- Variant submissions file (ClinVar-selected-submissions.tsv generated by select-clinVar-submissions.R)
- ClinVar VCF (clinvar_yyyymmdd.vcf.gz optional user input, or clinvar.vcf.gz will be downloaded with download_db_files.sh). This is an optional input for CAVATICA workflow; if not provided, AutoGVP will expect ClinVar annotation in VEP-annotated sample VCF (see above).
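Before running the workflow, it can help to confirm that each expected input is present. The following is a minimal sketch and not part of AutoGVP: the function names has_match and check_autogvp_inputs are hypothetical, and the patterns are loosened versions of those listed above (for example, *autopvs1.* so that both .txt and .tsv files match).

```shell
# Return 0 if at least one file in directory $1 matches glob pattern $2.
has_match() {
  # If nothing matches, the pattern stays literal and the -e test fails.
  for f in "$1"/$2; do
    [ -e "$f" ] && return 0
  done
  return 1
}

# Print one "missing: <pattern>" line per absent input; return 1 if any is absent.
# The patterns are assumptions adapted from the requirements list.
check_autogvp_inputs() {
  dir="$1"
  rc=0
  for pattern in '*VEP.vcf' '*hg38_multianno.txt' '*hg38_multianno.txt.intervar' '*autopvs1.*'; do
    if ! has_match "$dir" "$pattern"; then
      echo "missing: $pattern"
      rc=1
    fi
  done
  return "$rc"
}
```

Calling check_autogvp_inputs data from the AutoGVP root directory would then list any input that still needs to be generated.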
- Prepare input files by running VEP, ANNOVAR, InterVar, and AutoPVS1.
- Download database files:
bash scripts/download_db_files.sh
- Run select-clinVar-submissions.R. To customize conflicting interpretation resolution, users can provide a ClinGen Concept ID list to filter submissions against (--conceptID_list). When a list is provided, users can also determine how unsettled conflicts are resolved with the --conflict_res argument ("latest" or "most_severe"). For more details, see the FAQ. Example command:
Rscript scripts/select-clinVar-submissions.R --variant_summary data/variant_summary.txt.gz --submission_summary data/submission_summary.txt.gz --outdir results --conceptID_list data/clinvar_cpg_concept_ids.txt --conflict_res "latest"
- Run AutoGVP. If the output of scripts/select-clinVar-submissions.R is not provided, the script will be run before pathogenicity assessment starts:
bash run_autogvp.sh --workflow="custom" \
--vcf=data/test_VEP.vcf \
--filter_criteria=<filter criteria> \
--clinvar=data/clinvar.vcf.gz \
--intervar=data/test_VEP.hg38_multianno.txt.intervar \
--multianno=data/test_VEP.vcf.hg38_multianno.txt \
--autopvs1=data/test_autopvs1.txt \
--outdir=results \
--out="test_custom" \
--selected_clinvar_submissions=results/ClinVar-selected-submissions.tsv \
--variant_summary=data/variant_summary.txt.gz \
--submission_summary=data/submission_summary.txt.gz \
--conceptIDs=data/clinvar_cpg_concept_ids.txt \
--conflict_res="latest"
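The two --conflict_res values used with select-clinVar-submissions.R can be pictured with a toy example. This sketch reads the option names literally and is not AutoGVP's implementation: the submissions are made-up data, the function names are hypothetical, and the severity ranking is an assumption. The FAQ remains the authoritative description.

```shell
# Toy submissions for one variant: date <tab> classification (made-up data).
toy_submissions() {
  printf '2019-05-01\tPathogenic\n'
  printf '2022-11-15\tLikely_benign\n'
  printf '2021-03-09\tUncertain_significance\n'
}

# "latest": keep the classification with the most recent date.
# ISO dates sort correctly as plain text, so a reverse sort puts the newest first.
resolve_latest() {
  sort -t "$(printf '\t')" -k1,1r | head -n 1 | cut -f2
}

# "most_severe": keep the classification with the highest rank.
# The ranking below is an assumption made for illustration only.
resolve_most_severe() {
  awk -F '\t' '
    BEGIN {
      rank["Benign"] = 1; rank["Likely_benign"] = 2
      rank["Uncertain_significance"] = 3
      rank["Likely_pathogenic"] = 4; rank["Pathogenic"] = 5
    }
    rank[$2] > best { best = rank[$2]; call = $2 }
    END { print call }'
}
```

With this toy input, toy_submissions | resolve_latest selects the 2022 submission, while toy_submissions | resolve_most_severe selects the pathogenic one, which shows why the choice of strategy can change the final call.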
- Download database files:
bash scripts/download_db_files.sh
- Run select-clinVar-submissions.R (see the select-clinVar-submissions.R step of the custom workflow for optional conflict resolution parameters). For more details, see the FAQ. Example command:
Rscript scripts/select-clinVar-submissions.R --variant_summary data/variant_summary.txt.gz --submission_summary data/submission_summary.txt.gz --outdir results --conceptID_list data/clinvar_cpg_concept_ids.txt --conflict_res "latest"
- Run AutoGVP. If the output of scripts/select-clinVar-submissions.R is not provided, the script will be run before pathogenicity assessment starts:
bash run_autogvp.sh --workflow="cavatica" \
--vcf=data/test_pbta.single.vqsr.filtered.vep_105.vcf \
--filter_criteria=<filter criteria> \
--intervar=data/test_pbta.hg38_multianno.txt.intervar \
--multianno=data/test_pbta.hg38_multianno.txt \
--autopvs1=data/test_pbta.autopvs1.tsv \
--outdir=results \
--out="test_pbta" \
--selected_clinvar_submissions=results/ClinVar-selected-submissions.tsv \
--variant_summary=data/variant_summary.txt.gz \
--submission_summary=data/submission_summary.txt.gz \
--conceptIDs=data/clinvar_cpg_concept_ids.txt \
--conflict_res="latest"
AutoGVP produces an abridged output file with minimal information needed to interpret variant pathogenicity, as well as a full output with >100 variant annotation columns.
chr | start | ref | alt | rs_id | gene_symbol_vep | variant_classification_vep | HGVSg | HGVSc | HGVSp | autogvp_call | autogvp_call_reason | clinvar_stars | clinvar_clinsig | intervar_evidence |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
chr1 | 1332490 | C | T | rs201607183 | TAS1R3 | missense_variant | chr1:g.1332490C>T | c.959C>T | p.Thr320Met | Uncertain_significance | ClinVar | 1 | Uncertain_significance | InterVar: Uncertain significance PVS1=0 PS=[0, 0, 0, 0, 0] PM=[1, 0, 0, 0, 0, 0, 0] PP=[0, 0, 1, 0, 0, 0] BA1=0 BS=[0, 0, 0, 0, 0] BP=[0, 0, 0, 0, 0, 0, 0, 0] |
chr1 | 1390349 | C | T | rs769726291 | CCNL2 | missense_variant | chr1:g.1390349C>T | c.887G>A | p.Gly296Asp | Uncertain_significance | InterVar | NA | NA | InterVar: Uncertain significance PVS1=0 PS=[0, 0, 0, 0, 0] PM=[1, 1, 0, 0, 0, 0, 0] PP=[0, 0, 0, 0, 0, 0] BA1=0 BS=[0, 0, 0, 0, 0] BP=[0, 0, 0, 0, 0, 0, 0, 0] |
*NOTE: gnomAD v.3.1.1 non-cancer AF popmax values (gnomad_3_1_1_AF_non_cancer) will also be included in abridged output when provided.
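A quick way to summarize an abridged output file is to tally the values in the autogvp_call column. The sketch below assumes the file is tab-delimited with a header row, which should be checked against the actual output. The function name count_by_column is hypothetical. The column is located by name, not by position, so the same function works if extra columns are present.

```shell
# Count rows per distinct value of a named column in a tab-delimited file.
# $1 = file path, $2 = column name as it appears in the header row.
count_by_column() {
  awk -F '\t' -v col="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    idx     { n[$idx]++ }
    END     { for (k in n) print k "\t" n[k] }' "$1" | sort
}
```

For example, count_by_column <abridged output file> autogvp_call prints one line per call with its count. If the column name is not found in the header, nothing is printed.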
See here for list of columns included in full output.
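The intervar_evidence strings in the example rows above encode each ACMG/AMP evidence family as an array of flags. The following sketch turns such a string into a short list of triggered criteria. It assumes that position i in an array corresponds to criterion i (so the first PM flag is PM1); this matches the example rows but should be confirmed against the InterVar documentation. The function name decode_intervar_evidence is hypothetical.

```shell
# Read intervar_evidence strings on stdin, one per line, and print the
# criteria whose flag is nonzero, for example "PM1 PP3".
decode_intervar_evidence() {
  awk '{
    line = $0; out = ""
    # Scalar flag for the very strong pathogenic criterion.
    if (line ~ /PVS1=1/) out = out " PVS1"
    # Array-valued families: find "NAME=[...]" and inspect each flag.
    n = split("PS PM PP BS BP", fam, " ")
    for (f = 1; f <= n; f++) {
      if (match(line, fam[f] "=\\[[^]]*\\]")) {
        body = substr(line, RSTART + length(fam[f]) + 2, RLENGTH - length(fam[f]) - 3)
        m = split(body, v, /, */)
        for (i = 1; i <= m; i++) if (v[i] + 0 > 0) out = out " " fam[f] i
      }
    }
    # Scalar flag for the stand-alone benign criterion.
    if (line ~ /BA1=1/) out = out " BA1"
    sub(/^ /, "", out)
    print out
  }'
}
```

Applied to the two example rows, this reports PM1 and PP3 for the TAS1R3 variant and PM1 and PM2 for the CCNL2 variant.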
Ammar S. Naqvi (@naqvia) and Ryan J. Corbett (@rjcorb)
For questions, please submit an issue or send an email to Ryan Corbett (@rjcorb): [email protected]