A fun example to include would be the Viridian SARS-CoV-2 data, which has ~4M whole genomes.
The data is available as a tar archive of FASTAs here.
It would be simplest to write a script that just does the conversion sequentially, while noting that we could imagine writing a `fasta2zarr` program at some point. It would illustrate some nice points: we can store whole alignments, and the format is flexible enough to include "-" as a gap character for deletions (although these may have been removed from this dataset, annoyingly).
Preprint: https://www.biorxiv.org/content/10.1101/2024.04.29.591666v1.full.pdf
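A minimal sketch of the sequential conversion step, in Python (an assumption; the issue names no language). It assumes each regular file in the tar archive is plain FASTA text and that the sequences are already aligned to a common width. The helper names (`iter_fasta`, `iter_tar_fasta`, `batched_rows`) are hypothetical and not part of any existing `fasta2zarr` tool. Actually writing each batch into a Zarr array is deliberately left out rather than guessing at that API.

```python
import io
import tarfile
from typing import Iterator, List, Optional, TextIO, Tuple

Record = Tuple[str, str]  # (FASTA header without ">", sequence)


def iter_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header: Optional[str] = None
    parts: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(parts)
            header = line[1:].strip()
            parts = []
        else:
            # Sequence lines may be wrapped; join them back together.
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def iter_tar_fasta(path: str) -> Iterator[Record]:
    """Read each regular file in a tar archive as FASTA, one member at a time.

    Assumption: members are uncompressed FASTA text. Mode "r:*" lets tarfile
    detect gzip/bz2/xz compression of the archive itself.
    """
    with tarfile.open(path, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            raw = tar.extractfile(member)
            if raw is None:
                continue
            with io.TextIOWrapper(raw, encoding="ascii") as text:
                yield from iter_fasta(text)


def batched_rows(records, batch_size: int, width: Optional[int] = None):
    """Group sequences into fixed-width byte rows, batch_size rows at a time.

    Gap characters ("-") are kept as ordinary bytes. If width is None it is
    taken from the first record; a record of any other length raises
    ValueError, since a 2D (genomes x positions) array needs equal-width rows.
    """
    batch = []
    for header, seq in records:
        row = seq.upper().encode("ascii")
        if width is None:
            width = len(row)
        if len(row) != width:
            raise ValueError(f"{header}: length {len(row)} != alignment width {width}")
        batch.append((header, row))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each batch could then be written as a block of rows into a 2D character array (genomes × alignment positions), which is the shape a Zarr store would hold. Because "-" is just another byte, deletions survive the round trip unchanged, and streaming member by member keeps memory bounded to one batch.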
It's worth doing if it's a day or two's work, no more. I'll have a go at some point, as I'm quite well up on SARS2 data at the moment.