Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MRG: use updated rocksdb-integrated sourmash core #134

Merged
merged 23 commits into from
Feb 10, 2024
Merged

Conversation

bluegenes
Copy link
Contributor

@bluegenes bluegenes commented Sep 27, 2023

This is entirely a developer-focused update to level-set the codebase.

Since the addition of rocksdb-indexed gather to this plugin, we have been relying on a prior version of the sourmash core, without the final updates/ refactoring that went into sourmash-bio/sourmash#2230. Sourmash core 0.12.0, our latest release, did not have automatic downsampling while selecting signatures, which broke some of our functionality. After sourmash-bio/sourmash#2931, we can now use the updated core code here.

Notes:

  • currently uses sourmash core within sourmash latest, pinned to current commit. After releasing a new core, we should update to that.
  • Where code was working and did not rely on changed core functionality, I didn't change anything. Instead, I focused only on code that needed to be changed -- primarily any code using a rocksdb index.
  • Within there, main changes:
    • use selection and Signature::Select rather than template-based selection
    • in index, use new Collection to load zipfile Collection::from_zipfile(siglist)?
    • adapt code for calling gather, to reflect adjusted names/ structs from sourmash core as necessary

We can later use the updated core code to improve /standardize more functionality here, see #196.

Behavior changes:

  • index: If there are no signatures to index, now produces empty index file instead of erroring out.

Is this desired? This mirrors what happens within core, but we could add a check to see if the index is empty and throw an error from here.

  • to do this, we would need an easy check to see if the rocksdb index is empty. @luizirber -- does db.check() do that?

@bluegenes bluegenes changed the title EXP: use official mastiff sourmash core WIP: use official mastiff sourmash core Sep 28, 2023
@luizirber
Copy link
Member

* can't actually pass `selection` to mastiff `gather` here -- copy/clone isn't implemented and we can't move it. Need to look into this to see what I need to change.

I derived Clone for Selection in latest commit, does that work for you now?

@bluegenes
Copy link
Contributor Author

* can't actually pass `selection` to mastiff `gather` here -- copy/clone isn't implemented and we can't move it. Need to look into this to see what I need to change.

I derived Clone for Selection in latest commit, does that work for you now?

yes - i can now clone to pass in. Thanks!

bluegenes added a commit to sourmash-bio/sourmash that referenced this pull request Jan 23, 2024
Attempting #1292 in order to move forward
sourmash-bio/sourmash_plugin_branchwater#134

Modifies `Signature` `Select` to downsample automatically.

- for scaled sketches, while checking ksize, we also retain only
sketches that have the right scaled or can be downsampled (scaled <=
selection.scaled())
- next, we iterate through the sketches and downsample any where scaled
< selection.scaled()

Note that for `sourmash_plugin_branchwater` compatibility, we need:

- `byteorder` = "1.4.3"
- `wasm-bindgen` = "0.2.89"
- `once_cell` = "1.18.0"
---------

Co-authored-by: Luiz Irber <[email protected]>
@bluegenes
Copy link
Contributor Author

Issues solved over the course of this PR:

  • To fix the failing tests, we needed to (re-)enable automatic downsampling, now that we've replaced template with select / selection (implemented: MRG: in core, enable downsample within select sourmash#2931)
  • zip had byteorder and bytecount version conflicts with the versions required by sourmash. Thanks for solving, Luiz!
  • issue with chrono (needs to be pinned to 0.4.28 instead of 0.4.31 because of piz; see full err below)
  • couldn't actually pass selection to mastiff gather here -- Luiz implemented clone for selection!

Dependency pin issue:

~/pyo3_branchwater$ pip install -e .
Obtaining file:///home/ntpierce/pyo3_branchwater
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing editable metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [21 lines of output]
          Updating git repository `https://github.com/sourmash-bio/sourmash`
          Updating crates.io index
      error: failed to select a version for `chrono`.
          ... required by package `piz v0.5.0`
          ... which satisfies dependency `piz = "^0.5.0"` of package `sourmash v0.12.0 (https://github.com/sourmash-bio/sourmash?branch=lirber/mastiff#0799adc3)`
          ... which satisfies git dependency `sourmash` of package `pyo3-branchwater v0.8.1 (/home/ntpierce/pyo3_branchwater)`
      versions that meet the requirements `^0.4` (locked to 0.4.28) are: 0.4.28

      all possible versions conflict with previously selected packages.

        previously selected package `chrono v0.4.31`
          ... which satisfies dependency `chrono = "^0.4.31"` of package `sourmash v0.12.0 (https://github.com/sourmash-bio/sourmash?branch=lirber/mastiff#0799adc3)`
          ... which satisfies git dependency `sourmash` of package `pyo3-branchwater v0.8.1 (/home/ntpierce/pyo3_branchwater)`

      failed to select a version for `chrono` which could resolve this conflict
      💥 maturin failed
        Caused by: Cargo metadata failed. Does your crate compile with `cargo build`?
        Caused by: `cargo metadata` exited with an error:
      Error running maturin: Command '['maturin', 'pep517', 'write-dist-info', '--metadata-directory', '/scratch/ntpierce/pip-modern-metadata-igv0kafw', '--interpreter', '/home/ntpierce/.conda/envs/pybranch2/bin/python3.10']' returned non-zero exit status 1.
      Checking for Rust toolchain....
      Running `maturin pep517 write-dist-info --metadata-directory /scratch/ntpierce/pip-modern-metadata-igv0kafw --interpreter /home/ntpierce/.conda/envs/pybranch2/bin/python3.10`
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

@bluegenes bluegenes changed the title WIP: use official mastiff sourmash core MRG: use updated rocksdb-integrated sourmash core Jan 24, 2024
@ctb
Copy link
Collaborator

ctb commented Jan 24, 2024

yay!!

Question: does this add awesome new functionality (in speed, or convenience) or is it more of a developer-focused update to level-set the codebase? If the former, could you boast about the new functionality a bit more in the PR description? It would help guide review...

@bluegenes
Copy link
Contributor Author

@luizirber @ctb ready for review!

Cargo.toml Outdated
Comment on lines 15 to 16
sourmash = { git = "https://github.com/sourmash-bio/sourmash", rev= "94b88cc314f781342721addc5ed35c531732a9b6", features = ["branchwater"] }
#sourmash = { version = "0.12.0", features = ["branchwater"] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to release 0.12.1 in sourmash instead of depending on rev here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some things I want to add from my work over in #197. I was thinking a core release after that? But also both would be fine!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue for releasing and then updating again, unless releasing is a lot of work. I like the idea of releasing more frequently...

Copy link
Contributor Author

@bluegenes bluegenes Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, while I know the new signature select is working (tested), downsampling doesn't yet seem to be working as I expected. Am tracking down and will submit a PR if need be, but maybe let's hold off on release till I make sure it's working the way we want

Copy link
Contributor Author

@bluegenes bluegenes Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, actually I just needed an additional call to select -- will post details on how to use in #197 and we can discuss whether or not we want to modify core at all. Ok to release if/when you have time!

also adding some tests to core to show usage, they'll be in sourmash-bio/sourmash#2948 (sourmash-bio/sourmash@10c7b4c)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sourmash r0.12.1 was just released

Copy link
Collaborator

@ctb ctb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic work!

Two strong requests before merging -

  • update to depend on new sourmash-rs r0.12.1
  • clean up test_index.py by removing commented out lines that are no longer relevant.

And I guess along those lines, maybe create an issue related to the TODO item "TODO: index:: do not write output if no signatures to write?"

But please go ahead and merge once that's done!

@bluegenes bluegenes enabled auto-merge (squash) February 10, 2024 20:30
@bluegenes bluegenes merged commit 9da289e into main Feb 10, 2024
1 check passed
@bluegenes bluegenes deleted the upd-smash-core branch February 10, 2024 20:31
@ctb ctb mentioned this pull request Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants