I suppose this package is the same as GeneticVariation.jl, but is there any chance it will support multithreaded reading? The standard pattern
reader = VCF.Reader(open("example.vcf", "r"))
for record in reader
    # do something
end
close(reader)
requires looping over every record. On large VCF files, just looping through all records can take a few hours. Essentially we need some way to query the reader at the ith position.
Yes, this is a feature that I would really like. There are other, more urgent changes needed though, so it might take some time before I get to it.
My idea would be to support index files (.tbi or .csi). Then you could create a Reader for e.g. a specific chromosome, and thus work in parallel on one file by handling different chromosomes on different threads. Would this be in line with what you need?
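To make the per-chromosome idea concrete, here is a minimal sketch using only Base Julia. It is not this package's API: `group_by_chrom` and `count_records_per_chrom` are hypothetical names, and the grouping step still scans the whole file once. With index support, that step would presumably be replaced by opening one reader per chromosome.

```julia
# Tiny in-memory VCF used only for illustration.
const EXAMPLE_VCF = """
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t100\t.\tA\tG\t.\t.\t.
chr1\t200\t.\tC\tT\t.\t.\t.
chr2\t150\t.\tG\tA\t.\t.\t.
"""

# Hypothetical helper: collect record lines per CHROM (first tab-separated field).
# This is a stand-in for index lookup and still reads every line once.
function group_by_chrom(io::IO)
    groups = Dict{String,Vector{String}}()
    for line in eachline(io)
        # Skip blank lines and header/meta lines.
        (isempty(line) || startswith(line, "#")) && continue
        chrom = String(first(split(line, '\t'; limit=2)))
        push!(get!(() -> String[], groups, chrom), line)
    end
    return groups
end

# Hypothetical helper: process each chromosome on its own task.
# The per-chromosome work here is just counting records.
function count_records_per_chrom(io::IO)
    groups = group_by_chrom(io)
    tasks = Dict{String,Task}()
    for (chrom, lines) in groups
        tasks[chrom] = Threads.@spawn length(lines)
    end
    return Dict(chrom => fetch(task) for (chrom, task) in tasks)
end

counts = count_records_per_chrom(IOBuffer(EXAMPLE_VCF))
```

Tasks only run on separate threads if Julia is started with more than one thread (for example `julia -t 4`); with a single thread the sketch still works, just serially.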
Since indexed files usually use BGZF compression (a block-gzip variant), it may be useful to look at how random access is done on such files. Aside from tabix, grabix, and bcftools, I also noticed a few Julia packages related to the BGZF format, most notably BGZFStreams; packages that handle BAM could also serve as inspiration.
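For context on how that access works: a BGZF position is addressed by a 64-bit virtual file offset, where the upper 48 bits give the byte offset of the compressed block in the file and the lower 16 bits give the offset inside that block once decompressed. A rough sketch of the arithmetic follows; the function names are made up for illustration and are not taken from BGZFStreams or any other package.

```julia
# Hypothetical helper: pack a compressed-block offset and a within-block offset
# into one BGZF virtual offset (coffset << 16 | uoffset).
function make_virtual_offset(coffset::Integer, uoffset::Integer)
    0 <= uoffset < 2^16 || throw(ArgumentError("uoffset must fit in 16 bits"))
    0 <= coffset < 2^48 || throw(ArgumentError("coffset must fit in 48 bits"))
    return (UInt64(coffset) << 16) | UInt64(uoffset)
end

# Hypothetical helper: split a virtual offset back into
# (byte offset of the block in the compressed file, offset within the decompressed block).
split_virtual_offset(v::UInt64) = (v >> 16, v & 0xffff)

v = make_virtual_offset(1234, 56)
coffset, uoffset = split_virtual_offset(v)
```

A reader would seek to `coffset` in the compressed file, decompress that one block, and then skip `uoffset` bytes into it, which is what makes jumping to a region possible without reading everything before it.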