Python script to integrate Beagle3 phased haplotypes, PCADMIX ancestry data, VEP, LOFTEE & CADD deleterious variants

Ural-Yunusbaev/HARVI

Genomic Data Integrator HARVI

HARVI (Haplotype, Ancestry & Risk Variant Integrator) is a Python script that integrates Beagle3 phased haplotypes, PCADMIX ancestry data, and VEP, LOFTEE and CADD annotations of deleterious variants.

Usage

There are two ways to run HARVI:

To integrate Beagle3 and PCADMIX files, type:

./harvi.py 22 individuals.list PathToBeagleFiles PathToPCADMIXFiles

To integrate Beagle3, PCADMIX, VEP and CADD files, type:

./harvi.py 22 individuals.list PathToBeagleFiles PathToPCADMIXFiles PathToVEPFiles PathToCADDFiles

where 22 is the number of chromosomes and individuals.list is the file listing the individuals.
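The two invocation forms can be assembled programmatically, for example when launching HARVI from a pipeline. The following is a minimal sketch, not part of HARVI itself: the function name `build_harvi_command` and its parameter names are hypothetical, and it assumes the VEP and CADD directories are always given together, as in the two usage lines above.

```python
def build_harvi_command(chromosomes, individuals_list, beagle_dir, pcadmix_dir,
                        vep_dir=None, cadd_dir=None, script="./harvi.py"):
    """Return the argument list for one of the two documented invocations.

    Hypothetical helper: only the argument order is taken from the README.
    """
    # The README shows VEP and CADD paths only as a pair, so reject
    # a call that supplies one without the other.
    if (vep_dir is None) != (cadd_dir is None):
        raise ValueError("give both vep_dir and cadd_dir, or neither")

    argv = [script, str(chromosomes), individuals_list, beagle_dir, pcadmix_dir]
    if vep_dir is not None:
        argv += [vep_dir, cadd_dir]
    return argv
```

The returned list can be passed to `subprocess.run(argv, check=True)` from the standard library to start the script.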

Test run using example files

./harvi.py 1 oneindividual.list IN IN IN IN

Output file examples

head OUT/GS000010321-ASM.haps.anc.vep.cadd
#       GS000010321-ASM Posterior(PROXY_for_WEA, PROXY_for_EA)
#Chrom  Pos     Hap1  Hap2  Hap1Anc1     Hap1Anc2    Hap1Anc3  Hap2Anc1      Hap2Anc2    Hap2Anc3  Hap1Gene        Hap1Feature     Hap1Consequence          Hap1Canonical   Hap1_LoF  Hap1_Phred      Hap1_Source     Hap2Gene        Hap2Feature     Hap2Consequence Hap2Canonical  Hap2_LoF Hap2_Phred  Hap2_Source

1       568256  T     T     0.00507307   0.994927    -         0.00626922    0.993731    -         ENSG00000237973 ENST00000414273 downstream_gene_variant  YES             -         10.02   VC      ENSG00000237973 ENST00000414273 downstream_gene_variant         YES            -        10.02       VC
1       568256  T     T     0.00507307   0.994927    -         0.00626922    0.993731    -         ENSG00000198744 ENST00000416718 upstream_gene_variant    YES             LC         10.02   VC      ENSG00000198744 ENST00000416718 stop_gained                     YES            LC       10.02       VC
1       568361  C     C     0.00507307   0.994927    -         0.00626922    0.993731    -         ENSG00000237973 ENST00000414273 downstream_gene_variant  YES             -         10.78   VC      ENSG00000237973 ENST00000414273 downstream_gene_variant         YES            -        10.78       VC
1       568361  C     C     0.00507307   0.994927    -         0.00626922    0.993731    -         ENSG00000229344 ENST00000427426 non_coding_transcript_exon YES           -         10.78   VC      ENSG00000229344 ENST00000427426 non_coding_transcript_exon      YES            -        10.78       VC
1       752721  A     G     -            -           -         -             -           -         -               -               -                        -               -         -       -       ENSG00000177757 ENST00000326734  upstream_gene_variant          YES            -        7.526       VC

head OUT/GS000010321-ASM.haps.anc
#       GS000010321-ASM Posterior(PROXY_for_WEA, PROXY_for_EA)
#Chrom  Pos     Hap1    Hap2    Hap1Anc1        Hap1Anc2        Hap1Anc3        Hap2Anc1        Hap2Anc2        Hap2Anc3
1       567697  A       A       -       -       -       -       -       -
1       568201  C       C       -       -       -       -       -       -
1       568256  T       T       0.00507307      0.994927        -       0.00626922      0.993731        -
1       568361  C       C       0.00507307      0.994927        -       0.00626922      0.993731        -
1       752721  A       G       -       -       -       -       -       -
1       755274  T       C       -       -       -       -       -       -
1       756781  A       G       -       -       -       -       -       -
1       757103  T       C       0.00149201      0.998508        -       0.00219325      0.997807        -
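For downstream analysis, the `.haps.anc` output can be read into Python records. The sketch below is an illustration only: the function name `parse_haps_anc` is hypothetical, the column names are copied from the header shown above, and it assumes that fields are separated by whitespace, that lines starting with `#` are headers, and that `-` marks a missing value.

```python
# Column names taken from the "#Chrom ..." header line of the .haps.anc example.
COLUMNS = ["Chrom", "Pos", "Hap1", "Hap2",
           "Hap1Anc1", "Hap1Anc2", "Hap1Anc3",
           "Hap2Anc1", "Hap2Anc2", "Hap2Anc3"]
ANCESTRY_COLUMNS = COLUMNS[4:]


def parse_haps_anc(lines):
    """Parse .haps.anc lines into a list of dicts (assumed layout, see above)."""
    records = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the two '#' header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}: {line!r}")
        record = dict(zip(COLUMNS, fields))
        record["Pos"] = int(record["Pos"])
        # Ancestry posteriors become floats; '-' (no value) becomes None.
        for name in ANCESTRY_COLUMNS:
            record[name] = None if record[name] == "-" else float(record[name])
        records.append(record)
    return records
```

A typical call would be `parse_haps_anc(open("OUT/GS000010321-ASM.haps.anc"))`. The chromosome is kept as a string so that non-numeric chromosome labels, if any occur, would not break parsing.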

File name format for Beagle3, VEP and CADD files:
individual_id.suffix.chromosome
Examples:
PathToBeagleFiles/GS000010321-ASM.bgl.1
PathToVEPFiles/GS000010321-ASM.vep.1
PathToCADDFiles/GS000010321-ASM.cadd.1

File name format for PCADMIX files (one file per haplotype):
individual_id.suffix.chromosome.haplotype
PathToPCADMIXFiles/individual_id.pcadmix.1.1
PathToPCADMIXFiles/individual_id.pcadmix.1.2
Examples:
PathToPCADMIXFiles/GS000010321-ASM.pcadmix.1.1
PathToPCADMIXFiles/GS000010321-ASM.pcadmix.1.2
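The naming conventions above can be checked before a run by building the expected input paths for one individual and one chromosome. This is a minimal sketch under stated assumptions: the function name `expected_input_paths` is hypothetical, the suffixes `bgl`, `vep`, `cadd` and `pcadmix` are taken from the examples above, and haplotypes are assumed to be numbered 1 and 2 as in those examples.

```python
import os


def expected_input_paths(individual_id, chromosome, beagle_dir, pcadmix_dir,
                         vep_dir=None, cadd_dir=None):
    """Return the input file paths implied by the README naming scheme."""
    chrom = str(chromosome)
    paths = {
        # individual_id.suffix.chromosome
        "beagle": os.path.join(beagle_dir, f"{individual_id}.bgl.{chrom}"),
        # individual_id.suffix.chromosome.haplotype, one file per haplotype
        "pcadmix": [
            os.path.join(pcadmix_dir, f"{individual_id}.pcadmix.{chrom}.{hap}")
            for hap in (1, 2)
        ],
    }
    # VEP and CADD files are optional, mirroring the two usage modes.
    if vep_dir is not None:
        paths["vep"] = os.path.join(vep_dir, f"{individual_id}.vep.{chrom}")
    if cadd_dir is not None:
        paths["cadd"] = os.path.join(cadd_dir, f"{individual_id}.cadd.{chrom}")
    return paths
```

Passing each returned path to `os.path.exists` gives a quick way to spot a misnamed or missing file before starting the integration.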

Contact

Ural Yunusbaev
[email protected]
