Add GATK ValidateVariants #390

Open
wants to merge 5 commits into
base: master
Changes from 2 commits
7 changes: 7 additions & 0 deletions bio/gatk/validatevariants/environment.yaml
@@ -0,0 +1,7 @@
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - gatk4 ==4.1.4.1
  - snakemake-wrapper-utils ==0.1.3
12 changes: 12 additions & 0 deletions bio/gatk/validatevariants/meta.yaml
@@ -0,0 +1,12 @@
name: GATK ValidateVariants
description: Validate a VCF file with a strict set of criteria
url: https://gatk.broadinstitute.org/hc/en-us/articles/360037057272-ValidateVariants
authors:
  - Graeme Ford
input:
  vcf: VCF file to be validated
output:
  - VCF output file
params:
  extra: any extra commands as a string
notes: Java options are set via ``get_java_opts`` from ``snakemake-wrapper-utils``.
13 changes: 13 additions & 0 deletions bio/gatk/validatevariants/test/Snakefile
@@ -0,0 +1,13 @@
rule vcf_spec_validation:
    input:
        vcf="sample.vcf",
    output:
        "results/sample_VALID.vcf",
Contributor

I think the output should just have a .txt suffix or so, since this tool will not create a vcf file, right?

    log:
        "results/sample_VALID.log",
    params:
        R="genome.fasta",
Contributor

I cannot find this to be used in the wrapper. On the other hand, extra is missing. Maybe you meant to write:

Suggested change
- R="genome.fasta",
+ extra="",  # optional extra arguments

    resources:
        mem_mb=1024,
    wrapper:
        "master/bio/gatk/validatevariants"
3 changes: 3 additions & 0 deletions bio/gatk/validatevariants/test/genome.dict
@@ -0,0 +1,3 @@
@HD VN:1.5
@SQ SN:ref LN:45 M5:7a66cae8ab14aef8d635bc80649e730b UR:file:/home/johannes/scms/snakemake-wrappers/bio/picard/createsequencedictionary/test/genome.fasta
@SQ SN:ref2 LN:40 M5:1636753510ec27476fdd109a6684680e UR:file:/home/johannes/scms/snakemake-wrappers/bio/picard/createsequencedictionary/test/genome.fasta
4 changes: 4 additions & 0 deletions bio/gatk/validatevariants/test/genome.fasta
@@ -0,0 +1,4 @@
>ref
AGCATGTTAGATAAGATAGCTGTGCTAGTAGGCAGTCAGCGCCAT
>ref2
aggttttataaaacaattaagtctacagagcaactacgcg
2 changes: 2 additions & 0 deletions bio/gatk/validatevariants/test/genome.fasta.fai
@@ -0,0 +1,2 @@
ref 45 5 45 46
ref2 40 57 40 41
18 changes: 18 additions & 0 deletions bio/gatk/validatevariants/test/sample.vcf
@@ -0,0 +1,18 @@
##fileformat=VCFv4.0
##fileDate=20170110
##source=pindel
##reference=hg38
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=HOMLEN,Number=1,Type=Integer,Description="Length of base pair identical micro-homology at event breakpoints">
##INFO=<ID=PF,Number=1,Type=Integer,Description="The number of samples carry the variant">
##INFO=<ID=HOMSEQ,Number=.,Type=String,Description="Sequence of base pair identical micro-homology at event breakpoints">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=NTLEN,Number=.,Type=Integer,Description="Number of bases inserted in place of deleted code">
##FORMAT=<ID=PL,Number=3,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Reference depth, how many reads support the reference">
##FORMAT=<ID=AD,Number=2,Type=Integer,Description="Allele depth, how many reads support this allele">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT a
ref 8 . A G . PASS END=140434574;HOMLEN=5;HOMSEQ=AAAAA;SVLEN=35;SVTYPE=INS GT:AD 0/0:317,6
ref2 10 . A AGTTA . PASS END=55238278;HOMLEN=7;HOMSEQ=CTGCCAC;SVLEN=-51;SVTYPE=DEL GT:AD 0/0:40789,1734
27 changes: 27 additions & 0 deletions bio/gatk/validatevariants/wrapper.py
@@ -0,0 +1,27 @@
__author__ = "Graeme Ford"
__copyright__ = "Copyright 2021, Graeme Ford"
__email__ = "[email protected]"
__license__ = "MIT"

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
preloader = snakemake.params.get("preloader", "")
java_opts = get_java_opts(snakemake)

# Send stderr to the log file; stdout is redirected to the output file below.
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "{preloader} "
    "gatk "
    "--java-options '{java_opts}' "
    "ValidateVariants "
    "-V {snakemake.input.vcf} "
    "{extra} "
    "> {snakemake.output[0]} "
    "{log}"
)
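
To check the command assembly and redirections outside Snakemake, a minimal sketch follows. `build_validate_command` and its parameters are hypothetical names used only for illustration; they are not part of this wrapper, of Snakemake, or of GATK. The sketch assumes that anything the tool prints to stdout should go to the first output file and that stderr should go to the log. The `.txt` output name and the `-Xmx1024M` value are likewise only examples.

```python
import shlex


def build_validate_command(vcf, output, log, extra="", java_opts="", preloader=""):
    """Assemble a `gatk ValidateVariants` shell command as a single string.

    Hypothetical helper for illustration only; the wrapper itself relies on
    snakemake.shell and snakemake.log_fmt_shell.
    """
    parts = []
    if preloader:
        parts.append(preloader)
    parts.append("gatk")
    if java_opts:
        # Quote so that several JVM options are passed as one argument.
        parts += ["--java-options", shlex.quote(java_opts)]
    parts += ["ValidateVariants", "-V", shlex.quote(vcf)]
    if extra:
        # Passed through verbatim, mirroring the wrapper's `extra` parameter.
        parts.append(extra)
    # Assumption: stdout goes to the output file, stderr to the log file.
    parts += [">", shlex.quote(output), "2>", shlex.quote(log)]
    return " ".join(parts)


cmd = build_validate_command(
    vcf="sample.vcf",
    output="results/sample_VALID.txt",
    log="results/sample_VALID.log",
    java_opts="-Xmx1024M",
)
print(cmd)
```

Keeping the two redirections separate means the log file never receives stdout, so the output file and the log cannot compete for the same stream.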