Add a case study on Viridian SARS-Cov-2 data? #172

jeromekelleher · 2024-10-08T19:57:57Z

A fun example to include would be the Viridian SARS-CoV-2 data, which has ~4M whole genomes.

The data is available as a tar archive of FASTAs here

It would be simplest to write a script that just does the conversion sequentially, and say that we could imagine writing a fasta2zarr program at some point. It would illustrate some nice points: that we can store whole alignments, and that the format is flexible enough to include "-" as a gap character for deletions (although these may be removed here, annoyingly).
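
A rough sketch of what the sequential pass might look like, using only the standard library. This assumes the FASTAs are already aligned, so every sequence has the same length. `read_fasta` and `encode_alignment` are hypothetical helper names, not part of any existing tool, and the actual Zarr write is left out: the packed byte buffer is just a stand-in for a samples-by-sites array.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (sample_id, sequence) pairs from a FASTA stream, one record at a time."""
    name = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the first whitespace-delimited header token is the sample ID.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def encode_alignment(handle: TextIO, num_sites: int) -> Tuple[List[str], bytes]:
    """Pack aligned sequences into one row-major byte buffer, one row per sample.

    Each site is stored as a single ASCII byte, so "-" gap characters are kept
    as ordinary values rather than being dropped.
    """
    sample_ids: List[str] = []
    buffer = bytearray()
    for name, seq in read_fasta(handle):
        if len(seq) != num_sites:
            raise ValueError(
                f"{name}: expected {num_sites} sites, found {len(seq)}"
            )
        sample_ids.append(name)
        buffer.extend(seq.upper().encode("ascii"))
    return sample_ids, bytes(buffer)


# Toy input standing in for one file from the archive (made-up sequences).
toy = io.StringIO(">sampleA some description\nACGT\nAC-T\n>sampleB\nACGTACTT\n")
ids, data = encode_alignment(toy, num_sites=8)
# Row i of the alignment is data[i * 8:(i + 1) * 8].
```

Because records are consumed one at a time, memory use during parsing is bounded by a single sequence. A real script would flush rows to a chunked array periodically rather than growing one buffer for all ~4M genomes.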

Preprint: https://www.biorxiv.org/content/10.1101/2024.04.29.591666v1.full.pdf

It's worth doing if it's a day or two's work, no more. I'll have a go at some point, as I'm quite well up on SARS2 data at the moment.


jeromekelleher mentioned this issue Oct 25, 2024

Sars-cov-2 alignment storage #175

Merged
