Add probability of overlap and weighted containment for Multisearch matches #458
Merged
94 commits
a1d385f
Add probability of overlap and weighted containment to multisearch re…
olgabot cf1b2bd
Start writing prob_overlap
olgabot 2e1b338
Couldn't figure out how to get prob_overlap.rs to import .. putting i…
olgabot b621854
Trying to get prob overlap to at least import properly
olgabot 3639a1b
Start writing a merge_all_minhashes function
olgabot 1003afe
Write in commented code what needs to happen
olgabot 5fe707a
Remove mut from unused variables for now
olgabot ca54e0d
wrote function to merge all minhashes of a vector of signatures
olgabot e32f8e3
Added mege_all_minhashes to multisearch
olgabot e049420
Add crates for stable calculation of log values
olgabot af6190a
Add dependencies for stable calculation of log values in Cargo.lock
olgabot f7368f2
Add rust decimal with math feature
olgabot 2d81dab
Add function to get probability of overlap between specific intersect…
olgabot 149f67f
Call probability of overlap between queries and database
olgabot 95b9489
I'm getting too confused by rust_decimal .. let's go back to using th…
olgabot 8655ab6
Add adjusted prob_overlap to MultiSearchResult
olgabot 55aaf41
Getting prob_overlap to actually work
olgabot f398996
Add failing test for test_multisearch.py
olgabot 8626d6f
Fix n_comparisons to be float, remove commented out pseudocode
olgabot e301b6e
Remove unnecessary parens
olgabot 4afe614
Added prob_overlap, prob_overlap_adjusted, containment_adjusted, cont…
olgabot 77de30a
Add print statements
olgabot 3807891
Add containment_adjusted_log10
olgabot ab046cb
Fix compiler errors
olgabot 2030909
Fix rounding for prob_overlap, prob_overlap_adjusted, containment_adj…
olgabot 1076200
Move probability of overlap code into separate search_significance mo…
olgabot a774697
add tf_idf_score to test_multisearch.py
olgabot a64abc0
Add tf_idf_score to MultiSearchResult
olgabot 7bec363
Make separate "againsts" as Vec<Sig>
olgabot bd2eb94
Get TF-IDF running
olgabot 68c6b97
remove print statements and commented out code
olgabot 19d657d
Remove print statements, commented out code, add todos
olgabot 8221b49
Fix optional boolean types for prob_overlap and tf idf
olgabot 7ecd7b7
Add multisearch test of protein with abundance
olgabot e24a915
Remove part_001 from signature filename
olgabot 6bc166f
Delete old part_001 file
olgabot 9fdb78d
Remove too big sig from test data
olgabot 4ce8c82
Add test of probability of overlap with multisearch
olgabot ddf16b0
Add --prob argument
olgabot 3d87184
Precompute frequencies for queries and againsts, save as HashMaps for…
olgabot 61805fd
Use L2 norm for tf idf, add more print messages
olgabot 25128ca
Use par_iter whenever possible
olgabot 562a6a2
Remove logsumexp from files
olgabot 6020a10
Add failing test to make sure prob_overlap only gets computed when --…
olgabot ce9e303
Remove logsumexp from rust file
olgabot de68dd9
Try to make prob_overlap calculation optional
olgabot a7e55cc
Make prob_overlap an optional column
olgabot 99b5373
remove unused and commented out code
olgabot 6de6933
add comment for estimate_prob_overlap
olgabot b40b512
Remove `let` keyword to stop "shadowing" the variables
olgabot 61e0ff4
add par_bridge() after iter_mins() for parallel computation
olgabot 49d9bee
Remove `let` from creating precomupted HashMaps for search significan…
olgabot effda46
Remove checking for non-existence of prob_overlap when it really shou…
olgabot 4360733
remove unsed 'mut'
olgabot b7ab720
Add float_round function
olgabot 0ea6635
Fix missing bracket
olgabot ae0eae2
Rename unused hashval variable -> _hashval
olgabot 5b4f415
Update protein fasta paths in test_sketch.py ... but also run black f…
olgabot 5fa77dd
Add comment about minhash not being defined
olgabot e81e4ba
remove commented out code
olgabot 4f6966d
Add clarification about squaring 1
olgabot 42cbfa5
Apply `cargo fix --lib -p sourmash_plugin_branchwater`
olgabot be298e6
Remove unused import
olgabot 85789c5
Just kidding, that import was used
olgabot 9bb17e5
Merge with branchwater main
olgabot ebbdcff
Fix SmallSignature import
olgabot e514993
Fix weirdness for test_simple_ani and test_simple_prob_overlap caused…
olgabot 39a8acf
Run black and fix zip True/False in test_against_multisigfile
olgabot c3b9c65
Merge branch 'main' into olgabot/multisearch-evalue
olgabot 342347d
whitespace
olgabot 16728c5
formatting
olgabot 192f428
"syn" package appeared twice
olgabot 9d1a29b
Trailing whitespace
olgabot ef92c57
Add protein k5 signature
olgabot 1fd5687
Apply black formatting to everytthing
olgabot e42440e
Merge black-applied python test files
olgabot 1ae344a
Missed some merge markers
olgabot 8627397
Missed more merge markers...
olgabot 6e474e8
Merge branch 'main' into olgabot/multisearch-evalue
olgabot 0d7a446
Fix black in test_multisearch.py
olgabot 0525c1f
Remove commented out code
olgabot f0f1c3a
unwrap -> expect
olgabot a63a5c3
Modularize the probability of overlap computation into functions
olgabot bcbff23
set values for prob_overlap results in the if statement
olgabot 8129f00
Merge branch 'main' into olgabot/multisearch-evalue
olgabot 7d3064b
Add longer argument name and description
olgabot d6c5bf9
Cargo fmt
olgabot e23ee7b
Borrow 'selection'
olgabot 53c221b
Clone selection
olgabot 49ff137
Add longer argument name
olgabot 93d6085
Use `new_selection` to set scaled
olgabot de6287b
Add @pytest.mark.xfail(reason="should work, bug") to `test_fastgather…
olgabot fd0deaf
Revert test_against_multisigfile back to main
olgabot caf90c3
Remove .clone() from selection
olgabot
CTB: to check. This clone should maybe not be needed?
I was correct that the clone is not required. I like not having it, because passing `selection` by value consumes it, so you can't accidentally reuse `selection` below.
Yep, fixed! This saves a problem, because I was using `selection` accidentally below... so yay memory safety!
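A minimal sketch of the ownership point made in this thread. `Selection` and `set_scaled` are hypothetical stand-ins, not the plugin's actual types or functions. Passing `selection` by value moves it into the callee, so the compiler rejects any later use of the old binding. Calling `.clone()` would instead leave the original alive and silently allow the stale value to be reused.

```rust
// Hypothetical stand-in for the plugin's selection type.
#[derive(Clone, Debug)]
struct Selection {
    scaled: u32,
}

// Hypothetical helper: takes ownership of `selection` and returns an updated copy.
fn set_scaled(mut selection: Selection, scaled: u32) -> Selection {
    selection.scaled = scaled;
    selection
}

fn main() {
    let selection = Selection { scaled: 1000 };

    // Moving `selection` here means only `new_selection` is usable afterwards.
    let new_selection = set_scaled(selection, 10);

    // Uncommenting the next line fails to compile (E0382, use of moved value),
    // which is exactly the accidental reuse the reviewers wanted to prevent:
    // println!("{:?}", selection);

    // With `set_scaled(selection.clone(), 10)` the line above would compile,
    // and code below could keep reading the old `scaled` value by mistake.
    println!("{}", new_selection.scaled);
}
```

The design choice is to let the borrow checker enforce that only the updated selection is used, rather than relying on naming discipline.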