HARVI (Haplotype, Ancestry & Risk Variant Integrator) is a Python script that integrates Beagle3 phased haplotypes, PCADMIX ancestry data, and deleterious-variant annotations from VEP, LOFTEE and CADD.
There are two ways to run HARVI:
To integrate Beagle3 and PCADMIX files, type:
./harvi.py 22 individuals.list PathToBeagleFiles PathToPCADMIXFiles
To integrate Beagle3, PCADMIX, VEP and CADD files, type:
./harvi.py 22 individuals.list PathToBeagleFiles PathToPCADMIXFiles PathToVEPFiles PathToCADDFiles
where 22 is the chromosome number and individuals.list is the list of individuals.
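The two invocation forms differ only in whether the VEP and CADD directories are supplied. The following is a minimal sketch of how the positional arguments could be mapped to named fields; `HarviArgs` and `parse_harvi_args` are hypothetical names used for illustration and are not taken from harvi.py itself.

```python
from typing import NamedTuple, Optional, Sequence


class HarviArgs(NamedTuple):
    """Hypothetical container for the positional arguments shown above."""
    chromosome: str
    individuals_list: str
    beagle_dir: str
    pcadmix_dir: str
    vep_dir: Optional[str] = None   # present only in the six-argument form
    cadd_dir: Optional[str] = None  # present only in the six-argument form


def parse_harvi_args(argv: Sequence[str]) -> HarviArgs:
    """Map the arguments (without the script name) to named fields."""
    if len(argv) not in (4, 6):
        raise ValueError(f"expected 4 or 6 arguments, got {len(argv)}")
    return HarviArgs(*argv)


basic = parse_harvi_args(
    ["22", "individuals.list", "PathToBeagleFiles", "PathToPCADMIXFiles"]
)
full = parse_harvi_args(
    ["22", "individuals.list", "PathToBeagleFiles", "PathToPCADMIXFiles",
     "PathToVEPFiles", "PathToCADDFiles"]
)
print(basic.vep_dir is None)  # True: no VEP/CADD integration requested
print(full.cadd_dir)          # PathToCADDFiles
```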
Test run using the example files:
./harvi.py 1 oneindividual.list IN IN IN IN
head OUT/GS000010321-ASM.haps.anc.vep.cadd
# GS000010321-ASM Posterior(PROXY_for_WEA, PROXY_for_EA)
#Chrom Pos Hap1 Hap2 Hap1Anc1 Hap1Anc2 Hap1Anc3 Hap2Anc1 Hap2Anc2 Hap2Anc3 Hap1Gene Hap1Feature Hap1Consequence Hap1Canonical Hap1_LoF Hap1_Phred Hap1_Source Hap2Gene Hap2Feature Hap2Consequence Hap2Canonical Hap2_LoF Hap2_Phred Hap2_Source
1 568256 T T 0.00507307 0.994927 - 0.00626922 0.993731 - ENSG00000237973 ENST00000414273 downstream_gene_variant YES - 10.02 VC ENSG00000237973 ENST00000414273 downstream_gene_variant YES - 10.02 VC
1 568256 T T 0.00507307 0.994927 - 0.00626922 0.993731 - ENSG00000198744 ENST00000416718 upstream_gene_variant YES LC 10.02 VC ENSG00000198744 ENST00000416718 stop_gained YES LC 10.02 VC
1 568361 C C 0.00507307 0.994927 - 0.00626922 0.993731 - ENSG00000237973 ENST00000414273 downstream_gene_variant YES - 10.78 VC ENSG00000237973 ENST00000414273 downstream_gene_variant YES - 10.78 VC
1 568361 C C 0.00507307 0.994927 - 0.00626922 0.993731 - ENSG00000229344 ENST00000427426 non_coding_transcript_exon YES - 10.78 VC ENSG00000229344 ENST00000427426 non_coding_transcript_exon YES - 10.78 VC
1 752721 A G - - - - - - - - - - - - - ENSG00000177757 ENST00000326734 upstream_gene_variant YES - 7.526 VC

head OUT/GS000010321-ASM.haps.anc
# GS000010321-ASM Posterior(PROXY_for_WEA, PROXY_for_EA)
#Chrom Pos Hap1 Hap2 Hap1Anc1 Hap1Anc2 Hap1Anc3 Hap2Anc1 Hap2Anc2 Hap2Anc3
1 567697 A A - - - - - -
1 568201 C C - - - - - -
1 568256 T T 0.00507307 0.994927 - 0.00626922 0.993731 -
1 568361 C C 0.00507307 0.994927 - 0.00626922 0.993731 -
1 752721 A G - - - - - -
1 755274 T C - - - - - -
1 756781 A G - - - - - -
1 757103 T C 0.00149201 0.998508 - 0.00219325 0.997807 -

File name format (Beagle3, VEP and CADD files):
individual_id.suffix.chromosome
Examples:
PathToBeagleFiles/GS000010321-ASM.bgl.1
PathToVEPFiles/GS000010321-ASM.vep.1
PathToCADDFiles/GS000010321-ASM.cadd.1

File name format (PCADMIX files):
individual_id.suffix.chromosome.haplotype
PathToPCADMIXFiles/individual_id.pcadmix.1.1
PathToPCADMIXFiles/individual_id.pcadmix.1.2
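The naming conventions can be sketched as two small path builders. This is an illustration only: the function names `annotation_file` and `pcadmix_files` are hypothetical, and the assumption that PCADMIX always has exactly two haplotype files (`.1` and `.2`) is inferred from the paths listed in this README rather than from the script.

```python
from typing import List


def annotation_file(directory: str, individual_id: str,
                    suffix: str, chromosome: str) -> str:
    """Build directory/individual_id.suffix.chromosome (Beagle3, VEP, CADD)."""
    return f"{directory}/{individual_id}.{suffix}.{chromosome}"


def pcadmix_files(directory: str, individual_id: str,
                  chromosome: str) -> List[str]:
    """Build the per-haplotype PCADMIX paths (assumed: haplotypes 1 and 2)."""
    return [
        f"{directory}/{individual_id}.pcadmix.{chromosome}.{haplotype}"
        for haplotype in (1, 2)
    ]


print(annotation_file("PathToBeagleFiles", "GS000010321-ASM", "bgl", "1"))
# PathToBeagleFiles/GS000010321-ASM.bgl.1
print(pcadmix_files("PathToPCADMIXFiles", "GS000010321-ASM", "1"))
# ['PathToPCADMIXFiles/GS000010321-ASM.pcadmix.1.1',
#  'PathToPCADMIXFiles/GS000010321-ASM.pcadmix.1.2']
```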
Examples:
PathToPCADMIXFiles/GS000010321-ASM.pcadmix.1.1
PathToPCADMIXFiles/GS000010321-ASM.pcadmix.1.2

Ural Yunusbaev
[email protected]