Bugfix/fix hnsw search termination check #14215

Merged

benwtrent
Member

previously related PR: #12770

While my original change helped move us towards saner HNSW search behavior, the search will still explore a candidate if its score is == the minimum accepted similarity. This devolves in the degenerate case where all vectors are the same.

Here are some test runs. One test indexes the same vector many times. The other indexes the same 16 vectors many times.

There isn't much difference in the "few unique vectors" case from what I can tell. However, in the super degenerate case where all scores are exactly the same, this is orders of magnitude faster.

Logically, it makes sense for the condition that skips a candidate to be exactly the same as the condition that decides whether to add one.
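To illustrate the degenerate case, here is a minimal, self-contained sketch. It is not the Lucene code: the ring-shaped graph, the `explore` and `threshold` helpers, and all parameter values are assumptions made up for the illustration. It only counts how many candidates a toy best-first search expands when every node has the same score, with and without `Math.nextUp` on the threshold.

```java
import java.util.ArrayDeque;
import java.util.PriorityQueue;

public class DegenerateSearchSketch {

  /** Threshold a new score must reach once k results have been collected (hypothetical helper). */
  static float threshold(PriorityQueue<Float> results, int k, boolean useNextUp) {
    if (results.size() < k) {
      return Float.NEGATIVE_INFINITY;
    }
    float worst = results.peek(); // smallest of the k best scores
    return useNextUp ? Math.nextUp(worst) : worst;
  }

  /** Counts expanded candidates in a toy ring graph where every node scores 1.0f. */
  static int explore(int numNodes, int degree, int k, boolean useNextUp) {
    final float score = 1.0f; // identical vectors, so every similarity ties
    boolean[] visited = new boolean[numNodes];
    PriorityQueue<Float> results = new PriorityQueue<>(); // min-heap of the k best scores
    // All scores tie, so FIFO order is a valid best-first order here.
    ArrayDeque<Integer> candidates = new ArrayDeque<>();

    visited[0] = true;
    candidates.add(0);
    results.add(score);
    float minAccepted = threshold(results, k, useNextUp);

    int explored = 0;
    while (!candidates.isEmpty()) {
      int node = candidates.poll();
      if (score < minAccepted) {
        break; // best remaining candidate cannot improve the results
      }
      explored++;
      for (int d = 1; d <= degree; d++) {
        int neighbor = (node + d) % numNodes;
        if (visited[neighbor]) {
          continue;
        }
        visited[neighbor] = true;
        if (score >= minAccepted) {
          candidates.add(neighbor);
          results.add(score);
          if (results.size() > k) {
            results.poll(); // keep only the k best
          }
          minAccepted = threshold(results, k, useNextUp);
        }
      }
    }
    return explored;
  }

  public static void main(String[] args) {
    System.out.println("plain threshold:  " + explore(1000, 16, 10, false));
    System.out.println("nextUp threshold: " + explore(1000, 16, 10, true));
  }
}
```

In this toy setup the plain threshold never rejects a tie, so the search expands every reachable node, while the `Math.nextUp` threshold stops as soon as k results are collected.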

Also note that this degenerate case with uniform vector scores got WAY worse with the connected components change.

Archive.zip

related to (but doesn't fully solve): #14214

@benwtrent benwtrent added this to the 10.1.1 milestone Feb 7, 2025
@benwtrent benwtrent requested review from tteofili and iverase February 7, 2025 18:32
@benwtrent benwtrent modified the milestones: 10.1.1, 10.2.0 Feb 7, 2025
Contributor

@jimczi jimczi left a comment

That's a nasty performance bug, great catch @benwtrent

@benwtrent
Member Author

@iverase is the one who found it :D

@@ -52,7 +52,7 @@ public boolean collect(int docId, float similarity) {

   @Override
   public float minCompetitiveSimilarity() {
-    return queue.size() >= k() ? queue.topScore() : Float.NEGATIVE_INFINITY;
+    return queue.size() >= k() ? Math.nextUp(queue.topScore()) : Float.NEGATIVE_INFINITY;
Contributor

@mayya-sharipova mayya-sharipova Feb 10, 2025

Maybe instead of adding Math.nextUp in the different collectors, some of which we may miss, it would be better to update the local minAcceptedSimilarity in HnswGraphSearcher:searchLevel?

minAcceptedSimilarity = Math.nextUp(results.minCompetitiveSimilarity());
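A minimal sketch of what applying the bump once on the searcher side could look like. The `ScoreCollector` interface and the `minAcceptedSimilarity` helper are hypothetical stand-ins, not Lucene types. One detail worth keeping in mind with this variant: `Math.nextUp(Float.NEGATIVE_INFINITY)` is `-Float.MAX_VALUE`, so the "not full yet" threshold is no longer negative infinity.

```java
public class NextUpThresholdSketch {

  /** Hypothetical stand-in for a collector that reports its k-th best score. */
  interface ScoreCollector {
    float minCompetitiveSimilarity();
  }

  /** Applies the strict bump in one place, so individual collectors do not have to. */
  static float minAcceptedSimilarity(ScoreCollector results) {
    return Math.nextUp(results.minCompetitiveSimilarity());
  }

  public static void main(String[] args) {
    ScoreCollector full = () -> 0.5f; // pretend k results are collected, worst is 0.5
    ScoreCollector notFull = () -> Float.NEGATIVE_INFINITY; // pretend fewer than k results

    float strict = minAcceptedSimilarity(full);
    // A tie with the current worst result is no longer accepted.
    System.out.println("tie accepted: " + (0.5f >= strict));

    // The open threshold becomes the most negative finite float.
    System.out.println("open threshold: " + minAcceptedSimilarity(notFull));
  }
}
```

The trade-off discussed in this thread still applies: this keeps `minCompetitiveSimilarity` implementations simple, but every searcher that consumes the value has to remember the bump.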

Member Author

I am fine with that as well. I suppose the only thing that actually uses this is the graph searcher :/

Contributor

It was more about the high-level contract of minCompetitiveSimilarity. It's not a big deal, but it does place the responsibility on the various implementers.

Contributor

> I am fine with that as well. I suppose the only thing that actually uses this is the graph searcher :/

We now possibly have multiple graph searcher classes (extensions of AbstractHnswGraphSearcher), but that is still much better than going over all of the KnnCollectors.

@tteofili
Contributor

tteofili commented Feb 11, 2025

Perhaps it'd be nice if we could add an @Monster / nightly test to make sure we don't run into this again in the future.

@benwtrent benwtrent merged commit a6a96cd into apache:main Feb 11, 2025
6 checks passed
@benwtrent benwtrent deleted the bugfix/fix-hnsw-search-termination-check branch February 11, 2025 19:05
benwtrent added a commit that referenced this pull request Feb 11, 2025
previously related PR: #12770

While my original change helped move us towards saner HNSW search behavior, the search will still explore a candidate if its score is `==` the minimum accepted similarity. This devolves in the degenerate case where all vectors are the same.

This change adjusts the minimum required candidate score using `Math.nextUp`, similar to TopScoreDocCollector.

related to (but doesn't fully solve): #14214