Allow reading genotype data in compressed archives #237

Open
nevrome opened this issue Mar 15, 2023 · 1 comment
Labels
enhancement New feature or request for the future

Comments

@nevrome

nevrome commented Mar 15, 2023

Maybe this could be implemented with something like pipes-zlib. It would allow for even smaller file sizes, which in turn would simplify and speed up a lot of our operations.

Ideally poseidon-hs should recognize .[bed|bim|geno|snp].gz suffixes in file names and stream the respective files accordingly when reading a package.
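A minimal sketch of the suffix check, using only `base`. The names `Compression`, `genotypeExtensions` and `detectCompression` are hypothetical and not part of poseidon-hs; the extension list is taken from the pattern above.

```haskell
import Data.List (isSuffixOf)

-- | Hypothetical tag for how a genotype file is stored on disk.
data Compression = Plain | Gzipped
  deriving (Eq, Show)

-- | The uncompressed extensions named in this issue.
genotypeExtensions :: [String]
genotypeExtensions = [".bed", ".bim", ".geno", ".snp"]

-- | Classify a file by its name alone.
-- Returns Nothing if the name carries none of the known suffixes.
detectCompression :: FilePath -> Maybe Compression
detectCompression path
  | any (`isSuffixOf` path) gzExtensions       = Just Gzipped
  | any (`isSuffixOf` path) genotypeExtensions = Just Plain
  | otherwise                                  = Nothing
  where
    gzExtensions = map (++ ".gz") genotypeExtensions
```

For example, `detectCompression "pkg/data.geno.gz"` gives `Just Gzipped`, while `detectCompression "pkg/data.ind"` gives `Nothing`. The reader could then branch on the result when opening the file.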

I suggest we play around with this here to see if it's possible and feasible. Later we could consider adding it to the standard.

@stschiff

Yes. Note that the last time I tried pipes-zlib, it sadly suffered from this bug: k0001/pipes-zlib#16, which was actually a bug in another library upstream. I ended up decompressing directly from a lazy ByteString (https://hackage.haskell.org/package/zlib-0.6.3.0/docs/Codec-Compression-Zlib.html) and then piping the result through a suitable Pipes.Parser. So it is definitely possible, but it also definitely requires some playing around.
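A rough sketch of the lazy-ByteString route, assuming the `zlib` and `bytestring` packages. It uses `Codec.Compression.GZip` (from the same `zlib` package as the linked module) because `.gz` files are in gzip format. The helpers `decompressIf` and `readGenotypeLines` are hypothetical names, and the hand-off to a Pipes.Parser is left out.

```haskell
import qualified Codec.Compression.GZip as GZip
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC

-- | Decompress a lazy ByteString only if the flag says it is gzipped.
-- GZip.decompress is lazy, so the input is inflated chunk by chunk
-- as the result is consumed.
decompressIf :: Bool -> BL.ByteString -> BL.ByteString
decompressIf isGzipped
  | isGzipped = GZip.decompress
  | otherwise = id

-- | Read a (possibly gzipped) text file, such as a .snp or .geno file,
-- as a lazy list of lines.
readGenotypeLines :: Bool -> FilePath -> IO [BL.ByteString]
readGenotypeLines isGzipped path =
  BLC.lines . decompressIf isGzipped <$> BL.readFile path
```

The resulting lazy ByteString (or its lines) could then be fed into whatever parser the package reader already uses; whether that keeps memory use constant in practice would need testing.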
