Get a pandas DataFrame from a possibly gzipped VCF file.

vallenderlab/vcf_to_dataframe

VCF to DataFrame

Get a pandas DataFrame from a possibly gzipped VCF file. Keep only the variants, or also keep the genotypes of selected samples.
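
Selecting samples requires knowing which sample names the file contains. The sketch below is not part of this package: `list_vcf_samples` is a hypothetical helper, written with the standard library only, that reads the `#CHROM` header line of a plain or gzipped VCF and returns the sample column names. It assumes a standard VCF layout (eight fixed columns, then `FORMAT`, then one column per sample) and that gzipped files end in `.gz`.

```python
import gzip


def list_vcf_samples(path):
    """Return the sample names from the #CHROM header line of a VCF file.

    Hypothetical helper for illustration; not part of vcf_to_dataframe.
    """
    # Assumption: gzipped files are recognised by their ".gz" extension.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines come before the header
            if line.startswith("#CHROM"):
                columns = line.rstrip("\r\n").split("\t")
                # Eight fixed columns plus FORMAT precede the sample columns.
                return columns[9:]
            break  # reached a data line without seeing a header line
    return []
```

For a file with two sample columns, `list_vcf_samples('data.vcf.gz')` would return something like `['HG00096', 'HG00097']`. A file that holds only variants, with no genotype columns, yields an empty list.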

Usage:

from vcf_to_dataframe import vcf_to_dataframe

df = vcf_to_dataframe('data.vcf')
# => DataFrame with the variants in the data.vcf file

df = vcf_to_dataframe('data.vcf.gz')
# => DataFrame with the variants in the *gzipped* data.vcf.gz file

df = vcf_to_dataframe('data.vcf.gz', keep_samples='HG00096')
# => Same, but now it keeps the genotypes of the selected sample(s)

df = vcf_to_dataframe('data.vcf.gz', keep_samples=['HG00096', 'HG00097'],
                      keep_format_data=True)
# => Same, but now you have the genotypes and each call's metadata,
#    like AD, DP, GQ, and whatever else is in the VCF
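
To make the per-call metadata concrete: in the raw VCF, the `FORMAT` column lists colon-separated keys (such as `GT:AD:DP:GQ`) and each sample column holds the matching colon-separated values. The sketch below pairs the two for a single call. `parse_call` is a hypothetical name used for illustration; how this package arranges the values in the resulting DataFrame is not described here, so this only shows what the underlying data looks like.

```python
def parse_call(format_field, sample_field):
    """Pair the FORMAT keys with one sample's colon-separated values.

    Hypothetical helper for illustration; not part of vcf_to_dataframe.
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    # VCF allows trailing fields to be dropped; pad them with the
    # missing-value marker "." so every key gets a value.
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


call = parse_call("GT:AD:DP:GQ", "0/1:10,5:15:99")
print(call)
# {'GT': '0/1', 'AD': '10,5', 'DP': '15', 'GQ': '99'}
```

The values are left as strings on purpose: fields such as `AD` hold comma-separated lists, so converting them to numbers depends on the field definitions in the VCF header.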
