Read from middle of VCF file? #3

Open
biona001 opened this issue Feb 14, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@biona001

I suppose this package is the same as GeneticVariation.jl, but is there any chance it will support multithreaded reading? The standard

reader = VCF.Reader(open("example.vcf", "r"))
for record in reader
    # do something
end
close(reader)

requires looping over every record. On large VCF files, just looping through all records can take a few hours. Essentially, we need some way to move the reader directly to the i-th record.
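To make the request concrete, here is a minimal sketch of one way to start reading from the middle of a file. It assumes an uncompressed, plain-text VCF and uses only Julia's standard library, not this package's reader. The helper names `line_aligned_offsets`, `count_records`, and `parallel_count` are hypothetical. The idea is to pick evenly spaced byte offsets, move each one forward to the next line start, and let every thread scan its own byte range through its own file handle.

```julia
# Hypothetical helper (not part of any package): return byte offsets that
# split the file into at most `n` ranges, each starting at a line boundary.
function line_aligned_offsets(path::AbstractString, n::Integer)
    total = filesize(path)
    offsets = Int[0]
    open(path, "r") do io
        for k in 1:n-1
            guess = (total * k) ÷ n
            guess <= offsets[end] && continue
            seek(io, guess)
            readline(io)              # skip to the start of the next line
            pos = position(io)
            pos < total && pos > offsets[end] && push!(offsets, pos)
        end
    end
    push!(offsets, total)
    return offsets
end

# Count data lines (neither empty nor starting with '#') in [start, stop).
function count_records(path::AbstractString, start::Integer, stop::Integer)
    n = 0
    open(path, "r") do io
        seek(io, start)
        while position(io) < stop && !eof(io)
            line = readline(io)
            if !isempty(line) && !startswith(line, "#")
                n += 1
            end
        end
    end
    return n
end

# Each task opens its own handle, so no file position is shared between threads.
function parallel_count(path::AbstractString, nchunks::Integer = Threads.nthreads())
    offs = line_aligned_offsets(path, nchunks)
    counts = zeros(Int, length(offs) - 1)
    Threads.@threads for i in 1:length(counts)
        counts[i] = count_records(path, offs[i], offs[i + 1])
    end
    return sum(counts)
end
```

Counting lines stands in for real per-record work here. This only works on uncompressed text; gzip- or BGZF-compressed files cannot be entered at an arbitrary byte this way.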

@rasmushenningsson
Owner

Yes, this is a feature that I would really like. There are other, more urgent changes needed though, so it might take some time before I get to it.

My idea would be to support index files (.tbi or .csi). Then you could create a Reader for, e.g., a specific chromosome, and thus work in parallel on one file by processing different chromosomes on different threads. Would this be in line with what you need?
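As a rough illustration of the per-chromosome idea (not the planned API, and not a real .tbi or .csi reader), the sketch below builds a toy in-memory index that maps each chromosome to a byte range, then processes chromosomes on separate threads. It assumes an uncompressed VCF whose records are grouped by chromosome. The names `chromosome_ranges` and `count_per_chromosome` are hypothetical.

```julia
# Hypothetical toy index: chromosome name => (first byte, one past last byte).
# Assumes an uncompressed VCF with records grouped by chromosome.
function chromosome_ranges(path::AbstractString)
    names = String[]                       # chromosomes in file order
    ranges = Dict{String,Tuple{Int,Int}}()
    open(path, "r") do io
        while !eof(io)
            start = position(io)
            line = readline(io)
            (isempty(line) || startswith(line, "#")) && continue
            chrom = String(first(split(line, '\t'; limit = 2)))
            stop = position(io)
            if haskey(ranges, chrom)
                ranges[chrom] = (ranges[chrom][1], stop)   # extend the range
            else
                ranges[chrom] = (start, stop)
                push!(names, chrom)
            end
        end
    end
    return names, ranges
end

# Count records per chromosome, one chromosome per loop iteration.
# Every iteration opens its own handle and seeks straight to its range.
function count_per_chromosome(path::AbstractString)
    names, ranges = chromosome_ranges(path)
    counts = zeros(Int, length(names))
    Threads.@threads for i in eachindex(names)
        start, stop = ranges[names[i]]
        open(path, "r") do io
            seek(io, start)
            while position(io) < stop
                readline(io)
                counts[i] += 1
            end
        end
    end
    return Dict(zip(names, counts))
end
```

Building this toy index still needs one full pass over the file, which is exactly the cost a persisted index file avoids: it is written once and reused on later runs.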

@biona001
Author

Yes, that sounds great! I'll try to look into index files too, and try to help out in some way if possible.

@rasmushenningsson added the enhancement label on Feb 15, 2021
@janxkoci

Since indexed files usually use BGZF compression (a blocked gzip variant), it may be useful to look at how access is done on such files. Aside from tabix, grabix, and bcftools, I also noticed a few Julia packages related to the BGZF format, most notably BGZFStreams; packages that handle BAM could also serve as inspiration.
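One detail that makes random access into BGZF files possible is the virtual file offset, which the BGZF specification defines as a 64-bit value: the upper 48 bits hold the byte offset of a compressed block within the file, and the lower 16 bits hold the position inside that block's uncompressed data. The sketch below packs and unpacks such a value. The function names are hypothetical and do not come from BGZFStreams or any other package.

```julia
# Pack a BGZF virtual file offset from its two parts.
#   block_start  : byte offset of the BGZF block in the compressed file
#   within_block : offset into that block's uncompressed data (0 to 65535)
function virtual_offset(block_start::Integer, within_block::Integer)
    0 <= within_block < 65536 || throw(ArgumentError("within_block must fit in 16 bits"))
    0 <= block_start < (1 << 48) || throw(ArgumentError("block_start must fit in 48 bits"))
    return (UInt64(block_start) << 16) | UInt64(within_block)
end

# Unpack the two parts again.
block_start(v::UInt64) = v >> 16
within_block(v::UInt64) = v & 0xffff
```

A reader that is given such an offset would seek to `block_start(v)` in the compressed file, decompress that single block, and skip `within_block(v)` bytes of the result before parsing records.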
