How to use low level functions? #489

Closed
phiweger opened this issue Jun 6, 2018 · 10 comments

@phiweger

phiweger commented Jun 6, 2018

I would like to use the C++ implementation of two low level functions similar to

from sourmash._minhash import hash_murmur

These are

  1. the function that splits a string into k-mers
  2. the function that returns the canonical k-mer (the lexicographically smaller of a k-mer and its reverse complement)

Are they "exposed", i.e. can I import and use them somehow?

Thank you
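
For reference, the two operations asked about above can be sketched in pure Python. This is only an illustration under stated assumptions: the names `kmers`, `reverse_complement`, and `canonical_kmer` are hypothetical helpers, not part of sourmash, and the sketch assumes uppercase DNA made up of A, C, G, and T only.

```python
# Hypothetical helpers for illustration; these are NOT sourmash functions.
# Assumes uppercase DNA containing only A, C, G, T.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Complement every base, then reverse the string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def kmers(sequence: str, ksize: int):
    """Yield every overlapping substring of length ksize, left to right."""
    for start in range(len(sequence) - ksize + 1):
        yield sequence[start:start + ksize]


def canonical_kmer(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))
```

Something like `[canonical_kmer(k) for k in kmers(seq, 21)]` would then give the canonical k-mers of `seq`, though far more slowly than a compiled implementation.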

@luizirber
Member

They are not exposed because they would trigger a lot of data copying from C++ to Python, but they are indeed convenient to have for exploration.

@ctb
Contributor

ctb commented Jan 19, 2020

maybe this is easier with the oxidation (=> rust) that happened with 3.0?

@luizirber
Member

>   • the function that splits a string into k-mers
>   • the function that returns the canonical k-mer (the lexicographically smaller of a k-mer and its reverse complement)

> maybe this is easier with the oxidation (=> rust) that happened with 3.0?

The main issue is that none of these exist as separate functions anymore. They are all defined internally in add_sequence, and making a function just to expose it to Python opens the possibility of it getting out of sync with the add_sequence implementation.

@olgabot
Collaborator

olgabot commented Feb 17, 2020

Yeah, I ended up implementing k-merization on my own: https://github.com/czbiohub/kh-tools/blob/master/khtools/compare_kmer_content.py#L77 though it doesn't return the canonical k-mer.

@luizirber How would modularizing and exposing to Python make it out of sync with add_sequence?

@luizirber
Member

> @luizirber How would modularizing and exposing to Python make it out of sync with add_sequence?

It wouldn't make it out of sync if add_sequence also used them, but the performance improvements in #861 and #865 come from not having extra functions (and doing everything inside add_sequence). Note how k-merization is implemented as string slicing here, for example.

So unless anyone sees another approach, it's either

  1. modularized, usable in Python, but slower
  2. inline, not usable in Python, but faster

Maybe a way of avoiding the out-of-sync issue: add the k-merization functions in Rust, expose them to Python, and use hypothesis to generate data and run an oracle test against add_sequence (like this one from set_abundances)?
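
A rough sketch of that oracle-test idea, with several loudly stated assumptions: both implementations here are plain-Python stand-ins (not the Rust code and not add_sequence itself), the function names are hypothetical, and the standard-library `random` module is swapped in for hypothesis so the example stays self-contained. The "inline" version reverse-complements the whole sequence once and slices both strands, which is only one plausible way to avoid per-k-mer helper calls.

```python
import random

# Hypothetical stand-ins; neither function is sourmash's add_sequence.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers_modular(sequence, ksize):
    """Reference ("oracle") version: handle each k-mer independently."""
    out = []
    for i in range(len(sequence) - ksize + 1):
        kmer = sequence[i:i + ksize]
        rc = kmer.translate(_COMPLEMENT)[::-1]
        out.append(min(kmer, rc))
    return out


def canonical_kmers_inline(sequence, ksize):
    """Inline version: reverse-complement once, then slice both strands."""
    n = len(sequence)
    rc_seq = sequence.translate(_COMPLEMENT)[::-1]
    out = []
    for i in range(n - ksize + 1):
        kmer = sequence[i:i + ksize]
        # The reverse complement of sequence[i:i+ksize] sits at this offset in rc_seq.
        rc = rc_seq[n - ksize - i:n - i]
        out.append(min(kmer, rc))
    return out


def check_in_sync(trials=200, seed=0):
    """Compare both versions on random DNA; raise AssertionError on any mismatch."""
    rng = random.Random(seed)
    for _ in range(trials):
        ksize = rng.randint(1, 8)
        length = rng.randint(0, 40)
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        expected = canonical_kmers_modular(seq, ksize)
        assert canonical_kmers_inline(seq, ksize) == expected, (seq, ksize)
    return True
```

With hypothesis, the random loop would presumably become a `@given` test drawing the sequence and k-mer size from strategies, which also gives shrinking of failing cases.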

@ctb
Contributor

ctb commented Feb 23, 2020 via email

@olgabot
Collaborator

olgabot commented Feb 24, 2020 via email

@ctb
Contributor

ctb commented Sep 23, 2021

note that kmer to hashes functionality was added in #1653 and #1695!

@ctb
Contributor

ctb commented Aug 3, 2022

closing as obsolete.

@ctb ctb closed this as completed Aug 3, 2022