-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix/fix hnsw search termination check #14215
Bugfix/fix hnsw search termination check #14215
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a nasty performance bug, great catch @benwtrent
@iverase is the one who found it :D |
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java
Outdated
Show resolved
Hide resolved
@@ -52,7 +52,7 @@ public boolean collect(int docId, float similarity) { | |||
|
|||
@Override | |||
public float minCompetitiveSimilarity() { | |||
return queue.size() >= k() ? queue.topScore() : Float.NEGATIVE_INFINITY; | |||
return queue.size() >= k() ? Math.nextUp(queue.topScore()) : Float.NEGATIVE_INFINITY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be instead of adding Math.nextUp
in different collectors, some of which we may miss, it is better to update local minAcceptedSimilarity
in HnswGraphSearcher:searchLevel
?
minAcceptedSimilarity = Math.nextUp( results.minCompetitiveSimilarity());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with that as well. I suppose the only thing that actually uses this is the graph searcher :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was more about the high-level contract of minCompetitiveSimilarity. It's not a big deal, but it does place the responsibility on the various implementers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with that as well. I suppose the only thing that actually uses this is the graph searcher :/
we now possibly have multiple class for graph searchers (extensions of AbstractHnswGraphSearcher
), but much better than going over all of the KnnCollectors
only perhaps it'd be nice if we could add an |
previously related PR: #12770 While my original change to help move us towards a saner HNSW search behavior, it is will still actually explore a candidate if its score is `==` min accepted. This will devolve in the degenerate case where all vectors are the same. This change adjusts minimum required candidate score to match `Math.nextUp`, similar to TopScoreDocCollector related to (but doesn't fully solve): #14214
previously related PR: #12770
While my original change to help move us towards a saner HNSW search behavior, it is will still actually explore a candidate if its score is
==
min accepted. This will devolve in the degenerate case where all vectors are the same.Here are some test runs. One test indexes the same vector many times. The other indexes the same 16 vectors many times.
There isn't much difference with the "few unique vectors" case from what I can tell. However, the super degenerate case where all scores are exactly the same, this is magnitudes faster.
Logically, it makes sense to make the condition to skip a candidate the exact same for adding a candidate.
Also note that this degenerate case with uniform vector scores got WAY worse with the connected components change.
Archive.zip
related to (but doesn't fully solve): #14214