add gather implementation benchmarks to sourmash docs #3232

Open
ctb opened this issue Jun 30, 2024 · 0 comments

ctb commented Jun 30, 2024

hackmd: link

gather benchmarking - sourmash v4.8.10 / branchwater plugin v0.9.5

Source repo: sourmash-bio/2024-benchmark-gather

sample SRR1976948

This sample contains 177 genomes.

Benchmarking results with 64 threads (note pygather uses 1).

| prefix | s | max_rss |
| --- | --- | --- |
| fastmultigather_rocksdb | 102.103 | 515.24 |
| fastgather | 152.312 | 13071.1 |
| fastmultigather | 441.748 | 13029.6 |
| pygather | 2768.48 | 13755.2 |

Notes:

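  • Memory consumption is roughly the same for all non-rocksdb implementations (max_rss between about 13,000 and 13,800); fastmultigather_rocksdb uses about 25x less.
  • fastmultigather_rocksdb is the fastest (102 s), followed by fastgather (152 s); both are much faster than fastmultigather and pygather!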

These trends held across all four samples.
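
To make the comparison concrete, here is a minimal sketch that computes wall-clock speedup and relative peak memory against pygather, using only the numbers from the table above. The names `results` and `relative_to_pygather` are hypothetical, introduced for illustration. Because pygather ran on 1 thread and the others on 64, the speedups are wall-clock ratios, not per-thread efficiency. The `s` and `max_rss` column names look like Snakemake benchmark output, where `max_rss` is reported in MB, but that is an assumption.

```python
# Wall-clock seconds and max_rss copied from the SRR1976948 table.
# Assumption: max_rss is in MB (Snakemake benchmark convention).
results = {
    "fastmultigather_rocksdb": (102.103, 515.24),
    "fastgather": (152.312, 13071.1),
    "fastmultigather": (441.748, 13029.6),
    "pygather": (2768.48, 13755.2),
}


def relative_to_pygather(results):
    """Return {name: (speedup, memory_fraction)} relative to pygather."""
    base_s, base_rss = results["pygather"]
    return {
        name: (base_s / s, rss / base_rss)
        for name, (s, rss) in results.items()
    }


summary = relative_to_pygather(results)
for name, (speedup, mem_frac) in summary.items():
    print(f"{name:24s} {speedup:5.1f}x faster, {mem_frac:4.2f}x memory")
```

On these numbers, fastmultigather_rocksdb comes out roughly 27x faster than pygather at under 4% of its peak memory, while fastgather is roughly 18x faster at about the same memory.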

rocksdb indexing of GTDB rs214

Indexing GTDB rs214 (400k sequences) took 4h 47m (17255 s) and 14 GB. The rocksdb index is 7 GB.
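
As a quick check of the reported indexing time, the following sketch converts the raw seconds into hours, minutes, and seconds. The variable names are illustrative only.

```python
# Convert the reported indexing time from seconds to h/m/s.
total_s = 17255
hours, rem = divmod(total_s, 3600)
minutes, seconds = divmod(rem, 60)
print(f"{hours}h {minutes}m {seconds}s")
```

This agrees with the 4h 47m figure, with 35 seconds left over.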
