Fix multiallelic genotypes for biallelic CNVs (#748)
* Initial commit

* Modified dockstore to track cleanvcf5

* Minor change to trigger sync

* Removed redundant dockstore module

* Modified +setGT call to use accurate GT expression

* Modified to use sample-specific genotype filtering

* Documented bug in cleanvcfpart5 py script

* Removed dockstore sync
kjaisingh authored Nov 15, 2024
1 parent 1d78766 commit 488d7cb
Showing 3 changed files with 12 additions and 4 deletions.
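
The no-call check documented in the Python hunk below can be illustrated with a small standalone model. This is only a sketch under a stated assumption: that the VCF reader reports allele indices outside the record's allele list as `None`, so a genotype such as `2/2` on a biallelic record becomes indistinguishable from `./.`. The commit confirms only that the check skips such genotypes, not this mechanism. The helper names `decode_gt` and `is_skipped_as_no_call` are hypothetical and do not appear in the repository.

```python
from typing import Optional, Tuple

Genotype = Tuple[Optional[int], ...]


def decode_gt(gt_field: str, n_alleles: int) -> Genotype:
    """Hypothetical decoder for a GT string such as '0/1'.

    ASSUMPTION: allele indices that do not exist on the record
    (index >= n_alleles) are reported as None, like a missing allele.
    """
    decoded = []
    for token in gt_field.replace("|", "/").split("/"):
        if token == ".":
            decoded.append(None)
        else:
            index = int(token)
            decoded.append(index if 0 <= index < n_alleles else None)
    return tuple(decoded)


def is_skipped_as_no_call(gt: Genotype) -> bool:
    """Same comparison as the script's `sample_obj['GT'] == (None, None)`."""
    return gt == (None, None)


if __name__ == "__main__":
    n_alleles = 2  # biallelic record: REF plus one ALT
    for field in ("./.", "0/1", "2/2", "1/2"):
        gt = decode_gt(field, n_alleles)
        print(field, gt, "skipped" if is_skipped_as_no_call(gt) else "processed")
```

Under that assumption, a true no-call and a fully out-of-range genotype both hit the `continue`, which would explain why the fix was moved to a later `bcftools +setGT` step rather than into this loop.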
2 changes: 1 addition & 1 deletion .github/.dockstore.yml
@@ -214,4 +214,4 @@ workflows:
branches:
- main
tags:
- /.*/
- /.*/
@@ -136,7 +136,7 @@ def main():

if gt5kb_dup:
for sample_obj in record.samples.itervalues():
# Leave no-calls
# Leave no-calls - also causes bug that skips multiallelic genotypes for a biallelic variant
if sample_obj['GT'] == (None, None):
continue
if not sample_obj['GQ'] is None and \
12 changes: 10 additions & 2 deletions wdl/CleanVcf5.wdl
@@ -247,16 +247,24 @@ task Polish {
--exclude 'ID=@ids_to_remove.list' \
--output-type z -o polished.need_reheader.vcf.gz --threads ~{threads}
# replace multiallelic genotypes for biallelic CNVs (DEL/DUP with RD_CN > 3) with hom-alt (1/1)
bcftools +setGT polished.need_reheader.vcf.gz -- \
-t q \
-n c:'1/1' \
-i '(INFO/SVTYPE="DEL" | INFO/SVTYPE="DUP") & (FMT/GT~"[2-9]" | FMT/GT~"[1-9][0-9]+") & FMT/RD_CN>3' > polished.need_reheader.regenotyped.vcf
bgzip polished.need_reheader.regenotyped.vcf
# do the last bit of header cleanup
bcftools view -h polished.need_reheader.vcf.gz > original_header.vcf
bcftools view -h polished.need_reheader.regenotyped.vcf.gz > original_header.vcf
cat original_header.vcf | fgrep '##fileformat' > new_header.vcf
cat original_header.vcf \
| egrep -v "CIPOS|CIEND|RMSSTD|EVENT|INFO=<ID=UNRESOLVED,|source|varGQ|bcftools|ALT=<ID=UNR|INFO=<ID=MULTIALLELIC|GATKCommandLine|#CHROM|##contig|##fileformat" \
| sort >> new_header.vcf
# Don't sort contigs lexicographically, which would result in incorrect chr1, chr10, chr11, ... ordering
cat original_header.vcf | fgrep '##contig' >> new_header.vcf
cat original_header.vcf | fgrep '#CHROM' >> new_header.vcf
bcftools reheader polished.need_reheader.vcf.gz -h new_header.vcf -o ~{prefix}.vcf.gz
bcftools reheader polished.need_reheader.regenotyped.vcf.gz -h new_header.vcf -o ~{prefix}.vcf.gz
>>>
output {
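
The per-sample selection made by the `bcftools +setGT` call in the `Polish` task can be approximated in plain Python. This is a simplified sketch, not the way bcftools evaluates expressions: it treats the genotype as a string, applies the two regular expressions from the `-i` filter as substring searches, and treats a missing `RD_CN` as failing the `>3` test (an assumption). The function name `regenotype` is hypothetical.

```python
import re
from typing import Optional

# The two patterns from the filter: any digit 2-9, or a number with two or more digits.
# The second pattern catches allele indices such as 10, which contain no digit in 2-9.
MULTIALLELIC_GT = re.compile(r"[2-9]|[1-9][0-9]+")


def regenotype(svtype: str, gt: str, rd_cn: Optional[int]) -> str:
    """Return '1/1' when the sample matches the filter, else the original GT."""
    is_cnv = svtype in ("DEL", "DUP")
    has_multiallelic_index = MULTIALLELIC_GT.search(gt) is not None
    # ASSUMPTION: a missing RD_CN never satisfies RD_CN > 3.
    high_copy_number = rd_cn is not None and rd_cn > 3
    if is_cnv and has_multiallelic_index and high_copy_number:
        return "1/1"
    return gt


if __name__ == "__main__":
    samples = [
        ("DUP", "2/2", 4),
        ("DUP", "10/10", 4),
        ("DUP", "0/1", 5),
        ("DEL", "2/2", 3),
        ("INV", "2/2", 5),
    ]
    for svtype, gt, rd_cn in samples:
        print(svtype, gt, rd_cn, "->", regenotype(svtype, gt, rd_cn))
```

Only samples that satisfy all three conditions are rewritten; every other genotype, including no-calls, passes through unchanged, which matches the "sample-specific genotype filtering" described in the commit message.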
