Read from middle of VCF file? #3

biona001 · 2021-02-14T20:07:14Z

I suppose this package is essentially the same as GeneticVariation.jl, but is there any chance it will support multithreaded reading? The standard approach

using VariantCallFormat  # assumed: the package that provides VCF.Reader
reader = VCF.Reader(open("example.vcf", "r"))
for record in reader
    # do something with each record
end
close(reader)

requires iterating over every record, and on large VCF files a single pass through all records can take a few hours. Essentially, we need some way to start reading at the ith record without scanning everything before it.


rasmushenningsson · 2021-02-15T08:34:59Z

Yes, this is a feature that I would really like. There are other, more urgent changes needed, though, so it might take some time before I get to it.

My idea would be to support index files (.tbi or .csi). Then you could create a Reader for, e.g., a specific chromosome, and thus work in parallel on one file by processing different chromosomes on different threads. Would this be in line with what you need?
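A minimal sketch of the per-chromosome parallelism idea, assuming a hypothetical worker function. Nothing here uses the package's API: `map_chromosomes` is an illustrative name, and the index-backed reader restricted to one chromosome that `work` stands in for does not exist yet. Each chromosome gets its own task via `Threads.@spawn`, so the work runs in parallel when Julia is started with several threads.

```julia
# Hypothetical helper: run `work(chrom)` on its own task for each chromosome
# and collect the results in a Dict keyed by chromosome name.
# In a real implementation, `work` would open a reader limited to `chrom`
# using a .tbi/.csi index (not available in the package at this point).
function map_chromosomes(work, chroms::AbstractVector{<:AbstractString})
    tasks = [Threads.@spawn work(c) for c in chroms]
    return Dict(c => fetch(t) for (c, t) in zip(chroms, tasks))
end

# Usage with in-memory, VCF-like data lines standing in for indexed regions:
# count the records on each chromosome (first tab-separated column).
lines = ["chr1\t100\t.\tA\tG", "chr1\t200\t.\tC\tT", "chr2\t50\t.\tG\tA"]
counts = map_chromosomes(c -> count(l -> startswith(l, c * "\t"), lines), ["chr1", "chr2"])
```

Splitting by chromosome keeps tasks independent (no shared reader state), at the cost of uneven load when chromosomes differ greatly in size.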

biona001 · 2021-02-15T18:50:07Z

Yes, that sounds great! I'll try to look into index files too, and try to help out in some way if possible.

janxkoci · 2022-04-11T15:36:00Z

Since indexed files usually use BGZF compression (a block-gzip variant), it may be useful to look at how random access is done on such files. Aside from tabix, grabix, and bcftools, I also noticed a few Julia packages related to the BGZF format, most notably BGZFStreams, and packages handling BAM files could also serve as inspiration.
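For context on how that access works: per the SAM/BAM specification, which tabix (.tbi) and CSI indices follow, an index entry points into a BGZF file with a 64-bit virtual file offset. The upper 48 bits are the byte position where a compressed block starts in the file, and the lower 16 bits are a position inside that block once it is decompressed. A minimal Julia sketch of packing and unpacking such offsets (the function names are hypothetical, not taken from any package):

```julia
# Pack a BGZF virtual file offset: compressed block start in the high 48 bits,
# offset within the decompressed block (at most 64 KiB) in the low 16 bits.
function make_voffset(coffset::Integer, uoffset::Integer)
    0 <= uoffset < 65536 || throw(ArgumentError("uoffset must fit in 16 bits"))
    return (UInt64(coffset) << 16) | UInt64(uoffset)
end

# Byte position in the compressed file where the BGZF block begins.
compressed_offset(v::UInt64) = v >> 16

# Byte position inside that block's decompressed data.
uncompressed_offset(v::UInt64) = v & 0xffff

v = make_voffset(1234, 56)
```

To start reading at such a point, a reader would seek to `compressed_offset(v)` in the file, inflate the block found there, and skip `uncompressed_offset(v)` bytes of its output.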

rasmushenningsson added the enhancement (New feature or request) label on Feb 15, 2021.
