Implement find as an iterator in SBT to provide library users more options #2665
hi @morsecodist this sounds fantastic! thanks for engaging! note to @luizirber this is touching the Rust code / implementation of SBT! (I've approved your PR for GitHub actions, also.) A few hot takes -
thanks again!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##           latest    #2665   +/-   ##
=======================================
  Coverage   86.78%   86.78%
=======================================
  Files         136      136
  Lines       15524    15524
  Branches     2626     2626
=======================================
  Hits        13472    13472
  Misses       1751     1751
  Partials      301      301
@ctb Thanks for the quick response and for linking. My tree implementation makes use of the principle from this issue. I would love to discuss folding that in, though if there is a better place than this PR, let me know. My tree has a few key differences:
I would be happy to go into more detail about it. I was planning on adding some of these elements separately either to the core SBT tree or making a new implementation of it with these properties. This is all compatible with
hey, and look, all your tests passed 🎉 I also forgot to mention this hot take: we have an early-stage but (in my experience) quite functional plug-in system, see #2428 and #2438. It should be easy to add your own experimental I like your incrementalist approach!
Wholeheartedly agree with the In #2230 I... removed the SBT code. Because I figured no one was using it. Should it be kept around?
@luizirber for my tree I actually am not using any of the SBT code, and it is much more minimal than what you have for SBTs. From what it sounds like, if you are open to having this index type, there are two ways forward:
Would you be OK with either of these, and if so, which would you prefer? Also, it sounds like you are happy with a full refactor of
Does that make sense as a breakdown?
hi @morsecodist please see #2230 (comment) - @luizirber is splitting out the big PR into many small bits! In particular, for now, we suggest going this route,
and keeping it all in Rust - no need to expose to Python. You can keep your tree index separated and not in In re:
we are definitely on board with this, but @luizirber has some thoughts about how to make your life easier here by splitting off & merging the removal of In brief: please let us know if/when this PR is ready for full review! @luizirber let me know if I got that all right 😆
Hello! I work on Chan Zuckerberg Infectious Disease and we are working on using sourmash for removing redundant sequences. You can take a look at some of our efforts here, though we are still in the exploratory stages so there is not much there in the way of documentation. We are using an approach where if any sequence we already have is closer than the threshold to a given sequence we want to omit that sequence. I am using sourmash as a library for this. I have implemented my own slightly different SBT heavily based on what you already have here for our use case. I think that all or some of this may make a nice addition to the library.
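As a rough illustration of the redundancy filter described above, here is a minimal sketch. It assumes each sequence is reduced to a plain set of hashes and compared by Jaccard similarity; `jaccard` and `deduplicate` are hypothetical names for this example only and are not part of sourmash. A linear scan stands in for the tree search.

```rust
use std::collections::HashSet;

/// Jaccard similarity between two hash sets (a toy stand-in for comparing sketches).
fn jaccard(a: &HashSet<u64>, b: &HashSet<u64>) -> f64 {
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    let shared = a.intersection(b).count() as f64;
    let total = a.union(b).count() as f64;
    shared / total
}

/// Greedy filter: keep a candidate only if no already-kept sequence
/// reaches `threshold` similarity with it.
fn deduplicate(candidates: Vec<HashSet<u64>>, threshold: f64) -> Vec<HashSet<u64>> {
    let mut kept: Vec<HashSet<u64>> = Vec::new();
    for cand in candidates {
        // `any` short-circuits, so the scan stops at the first close match.
        let redundant = kept.iter().any(|k| jaccard(k, &cand) >= threshold);
        if !redundant {
            kept.push(cand);
        }
    }
    kept
}
```

The short-circuiting `any` call is the part an iterator-based find would replace: only the existence of one match matters, not the full list of matches.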
One simple piece of low hanging fruit that I feel would benefit all users of the library is implementing find as an iterator. If we find a similar sequence, we want to stop the search instead of continuing to search the whole tree. I was going to implement a method that either checks for the presence of a signature or returns the first one. However, I realized an iterator solves a lot of these use cases simultaneously. Users of the library will be able to stop the search after an arbitrary number of matches, as well as do things like filtering and mapping without adding filtered elements to the final vector.
This is a minimal addition. I didn't add an iterator version of find to the Index trait yet, but if you are OK with going in this direction we can merge in this change and I would be happy to implement more iterators for more Indexes. We can also use the default implementation, which already uses an iterator and converts it to a vector.
I didn't end up writing new tests because I modified the existing find method to depend on the iterator implementation. This way it is covered by the existing find test cases.
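To make the idea concrete, here is a minimal sketch of a lazy find over a toy tree. It is not the sourmash SBT code: `Node`, `FindIter`, `find_iter`, and `sample_tree` are hypothetical names, and a per-subtree maximum score stands in for the Bloom filter an SBT keeps at each internal node. The vector-returning `find` is written as a thin wrapper over the iterator, mirroring how the description says the existing tests cover the new code path.

```rust
/// Toy tree: internal nodes store an upper bound on the scores below them,
/// so whole subtrees can be skipped when the bound is under the threshold.
enum Node {
    Leaf { name: &'static str, score: u32 },
    Internal { max_score: u32, children: Vec<Node> },
}

/// Lazy depth-first search; no work happens until `next` is called.
struct FindIter<'a> {
    stack: Vec<&'a Node>,
    threshold: u32,
}

impl<'a> Iterator for FindIter<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<Self::Item> {
        while let Some(node) = self.stack.pop() {
            match node {
                Node::Leaf { name, score } => {
                    if *score >= self.threshold {
                        return Some(*name);
                    }
                }
                Node::Internal { max_score, children } => {
                    // Descend only if some leaf below could still match.
                    if *max_score >= self.threshold {
                        // Reverse so the leftmost child is popped first.
                        self.stack.extend(children.iter().rev());
                    }
                }
            }
        }
        None
    }
}

impl Node {
    /// Iterator version: callers decide when to stop.
    fn find_iter(&self, threshold: u32) -> FindIter<'_> {
        FindIter { stack: vec![self], threshold }
    }

    /// Vector version, implemented on top of the iterator.
    fn find(&self, threshold: u32) -> Vec<&str> {
        self.find_iter(threshold).collect()
    }
}

/// Small example tree used for illustration.
fn sample_tree() -> Node {
    Node::Internal {
        max_score: 9,
        children: vec![
            Node::Internal {
                max_score: 9,
                children: vec![
                    Node::Leaf { name: "a", score: 9 },
                    Node::Leaf { name: "b", score: 2 },
                ],
            },
            Node::Internal {
                max_score: 7,
                children: vec![
                    Node::Leaf { name: "c", score: 7 },
                    Node::Leaf { name: "d", score: 1 },
                ],
            },
        ],
    }
}
```

With this shape, `tree.find_iter(5).next()` stops at the first match without visiting the rest of the tree, and adapters such as `take(n)`, `filter`, and `map` compose without building an intermediate vector.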