MRG: use updated rocksdb-integrated sourmash core #134
I derived
yes - i can now clone to pass in. Thanks!
Attempting #1292 in order to move forward sourmash-bio/sourmash_plugin_branchwater#134

Modifies `Signature` `Select` to downsample automatically.
- for scaled sketches, while checking ksize, we also retain only sketches that have the right scaled or can be downsampled (scaled <= selection.scaled())
- next, we iterate through the sketches and downsample any where scaled < selection.scaled()

Note that for `sourmash_plugin_branchwater` compatibility, we need:
- `byteorder` = "1.4.3"
- `wasm-bindgen` = "0.2.89"
- `once_cell` = "1.18.0"

---------

Co-authored-by: Luiz Irber <[email protected]>
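For reviewers, the two-step selection described in that commit message can be sketched roughly as follows. This is a minimal illustration, not the sourmash core implementation: the `Sketch` and `Selection` types, the `select` and `downsample` functions, and the integer-division `max_hash_for_scaled` are all hypothetical stand-ins, and `scaled` is assumed to be greater than zero.

```rust
// Hypothetical stand-in for a scaled MinHash sketch; not sourmash's real type.
#[derive(Debug, Clone, PartialEq)]
struct Sketch {
    ksize: u32,
    scaled: u64,      // assumed > 0
    hashes: Vec<u64>, // hash values retained by the sketch
}

// Hypothetical selection criteria.
struct Selection {
    ksize: u32,
    scaled: u64, // assumed > 0
}

// Largest hash value kept at a given scaled value.
// Simplified to integer division for this sketch.
fn max_hash_for_scaled(scaled: u64) -> u64 {
    u64::MAX / scaled
}

// Downsample by dropping every hash above the new maximum.
fn downsample(sketch: &Sketch, scaled: u64) -> Sketch {
    let max_hash = max_hash_for_scaled(scaled);
    Sketch {
        ksize: sketch.ksize,
        scaled,
        hashes: sketch
            .hashes
            .iter()
            .copied()
            .filter(|h| *h <= max_hash)
            .collect(),
    }
}

fn select(sketches: &[Sketch], selection: &Selection) -> Vec<Sketch> {
    sketches
        .iter()
        // Step 1: keep sketches with the right ksize whose scaled either
        // matches or can be downsampled (scaled <= selection.scaled).
        .filter(|s| s.ksize == selection.ksize && s.scaled <= selection.scaled)
        // Step 2: downsample any sketch where scaled < selection.scaled.
        .map(|s| {
            if s.scaled < selection.scaled {
                downsample(s, selection.scaled)
            } else {
                s.clone()
            }
        })
        .collect()
}
```

Under these assumptions, a sketch with a larger `scaled` than the selection is dropped rather than returned, since a coarser sketch cannot be converted back to a finer one.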
Issues solved over the course of this PR:
Dependency pin issue:
yay!! Question: does this add awesome new functionality (in speed, or convenience) or is it more of a developer-focused update to level-set the codebase? If the former, could you boast about the new functionality a bit more in the PR description? It would help guide review...
@luizirber @ctb ready for review!
Cargo.toml
sourmash = { git = "https://github.com/sourmash-bio/sourmash", rev = "94b88cc314f781342721addc5ed35c531732a9b6", features = ["branchwater"] }
#sourmash = { version = "0.12.0", features = ["branchwater"] }
Do we want to release `0.12.1` in sourmash instead of depending on `rev` here?
There are some things I want to add from my work over in #197. I was thinking a core release after that? But also both would be fine!
I would argue for releasing and then updating again, unless releasing is a lot of work. I like the idea of releasing more frequently...
Actually, while I know the new signature select is working (tested), downsampling doesn't yet seem to be working as I expected. Am tracking down and will submit a PR if need be, but maybe let's hold off on release till I make sure it's working the way we want
ok, actually I just needed an additional call to `select` -- will post details on how to use in #197 and we can discuss whether or not we want to modify core at all. Ok to release if/when you have time!
also adding some tests to core to show usage, they'll be in sourmash-bio/sourmash#2948 (sourmash-bio/sourmash@10c7b4c)
sourmash r0.12.1 was just released
Fantastic work!
Two strong requests before merging -
- update to depend on new sourmash-rs r0.12.1
- clean up `test_index.py` by removing commented out lines that are no longer relevant.
And I guess along those lines, maybe create an issue related to the TODO item "TODO: index:: do not write output if no signatures to write?"
But please go ahead and merge once that's done!
This is entirely a developer-focused update to level-set the codebase.
Since the addition of rocksdb-indexed gather to this plugin, we have been relying on a prior version of the sourmash core, without the final updates/refactoring that went into sourmash-bio/sourmash#2230. Sourmash core 0.12.0, our latest release, did not have automatic downsampling while selecting signatures, which broke some of our functionality. After sourmash-bio/sourmash#2931, we can now use the updated core code here.
Notes:
- sourmash core: `latest`, pinned to current commit. After releasing a new core, we should update to that.
- use `selection` and `Signature::Select` rather than template-based selection
- `index`: use new `Collection` to load zipfile: `Collection::from_zipfile(siglist)?`
- `gather`: updated to reflect adjusted names/structs from sourmash core as necessary

We can later use the updated `core` code to improve/standardize more functionality here, see #196.

Behavior changes:
- `index`: If there are no signatures to index, now produces empty index file instead of erroring out.