Refactor scoring to map leaf reader context with results #2271

VijayanB · 2024-11-14T20:13:52Z

Description

Today we map the order of results to the order of segments, and then rely on that order to build top docs. This PR refactors the method to use Map.Entry to associate each leaf reader context with the results from that leaf.
This is required when we want to split segments between approximate search and exact search, to avoid rescoring twice with exact search.
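
A minimal sketch of the difference described above, not the plugin's actual code. `Leaf` is a hypothetical stand-in for Lucene's `LeafReaderContext`, and `searchLeaf`, `searchByPosition`, `searchWithLeaf`, and `withoutLeaf` are illustrative names. It shows that positional pairing only holds while both lists keep the same length and order, whereas an entry keeps its leaf even after some segments are filtered out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LeafResultPairing {

    /** Hypothetical stand-in for Lucene's LeafReaderContext; only the ordinal is modeled. */
    public static final class Leaf {
        public final int ord;

        public Leaf(int ord) {
            this.ord = ord;
        }
    }

    /** Placeholder per-leaf search returning docId -> score; not the plugin's searchLeaf. */
    public static Map<Integer, Float> searchLeaf(Leaf leaf) {
        return Map.of(leaf.ord * 10, 1.0f / (leaf.ord + 1));
    }

    /** Positional style: result i is assumed to belong to leaves.get(i). */
    public static List<Map<Integer, Float>> searchByPosition(List<Leaf> leaves) {
        List<Map<Integer, Float>> results = new ArrayList<>();
        for (Leaf leaf : leaves) {
            results.add(searchLeaf(leaf));
        }
        return results;
    }

    /** Entry style: every result travels together with the leaf that produced it. */
    public static List<Map.Entry<Leaf, Map<Integer, Float>>> searchWithLeaf(List<Leaf> leaves) {
        List<Map.Entry<Leaf, Map<Integer, Float>>> results = new ArrayList<>();
        for (Leaf leaf : leaves) {
            results.add(Map.entry(leaf, searchLeaf(leaf)));
        }
        return results;
    }

    /** Drops the results of one leaf, e.g. a segment that is handled by a different path. */
    public static List<Map.Entry<Leaf, Map<Integer, Float>>> withoutLeaf(
        List<Map.Entry<Leaf, Map<Integer, Float>>> results,
        int ord
    ) {
        return results.stream().filter(e -> e.getKey().ord != ord).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Leaf> leaves = List.of(new Leaf(0), new Leaf(1), new Leaf(2));
        // After filtering, list positions no longer line up with `leaves`,
        // but each entry still names the leaf it came from.
        for (Map.Entry<Leaf, Map<Integer, Float>> e : withoutLeaf(searchWithLeaf(leaves), 1)) {
            System.out.println("leaf " + e.getKey().ord + " -> " + e.getValue());
        }
    }
}
```

With the positional list, removing the middle element would silently shift the third result onto the second leaf. The entry form avoids that at the cost of a more verbose type.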

Related Issues

Prerequisite for #2215

Check List

New functionality includes testing.
New functionality has been documented.
API changes companion pull request created.
Commits are signed per the DCO using --signoff.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

We map the order of results to the order of segments, and finally rely on that order to build top docs. Refactor the method to use Map.Entry to map each leaf reader context to the results from that leaf. This is required when we want to split segments between approximate search and exact search, to avoid rescoring twice with exact search.

Signed-off-by: Vijayan Balasubramanian <[email protected]>

heemin32 · 2024-11-14T21:26:51Z

Do you have follow-up code showing why this is needed? Can't we always get the relation between a leaf reader and its result from their index in the list?

VijayanB · 2024-11-14T21:40:11Z

> Do you have follow-up code showing why this is needed? Can't we always get the relation between a leaf reader and its result from their index in the list?

Sure: #2273. This is a draft PR. I want to test recall before marking it ready.

heemin32 · 2024-11-14T22:20:53Z

Thanks. I think this refactoring should be included in the final PR instead of being pushed separately.

VijayanB · 2024-11-14T22:47:50Z

> Thanks. I think this refactoring should be included in the final PR instead of being pushed separately.

Having multiple independent PRs that don't break existing features helps them get reviewed faster. If I combine the refactoring with the new implementation, it might be hard to find bugs.

heemin32 · 2024-11-14T22:52:06Z

The challenge is that it's difficult to justify this refactoring without the actual PR that requires it. There's always a possibility that the implementation in the subsequent PR may change, in which case this refactoring might end up providing no real benefit.

VijayanB · 2024-11-14T23:45:14Z

> The challenge is that it's difficult to justify this refactoring without the actual PR that requires it. There's always a possibility that the implementation in the subsequent PR may change, in which case this refactoring might end up providing no real benefit.

Fair enough. How about this? I will backport to 2.x only if both PRs are merged. Will that work?

heemin32 · 2024-11-15T01:06:02Z

I am against merging it into main, regardless of whether we backport it to 2.x. The change is not big, so I think it should be fine to include it in the following PR.

VijayanB · 2024-11-15T01:21:20Z

> I am against merging it into main, regardless of whether we backport it to 2.x. The change is not big, so I think it should be fine to include it in the following PR.

I don't see why this feature cannot be split into two PRs with predefined scopes, each doing a different thing. Can you explain why you are against splitting it into two PRs, by pointing out how it breaks either the build or an existing feature?

heemin32 · 2024-11-15T16:39:54Z

> I am against merging it into main, regardless of whether we backport it to 2.x. The change is not big, so I think it should be fine to include it in the following PR.

> I don't see why this feature cannot be split into two PRs with predefined scopes, each doing a different thing. Can you explain why you are against splitting it into two PRs, by pointing out how it breaks either the build or an existing feature?

Thank you for your effort in keeping the PR small to make the review process easier. I really appreciate that. My concern is that the value of this PR depends heavily on the approval of the subsequent PR, as it primarily supports those changes. If the following PR is modified, this one may also need to be adjusted accordingly. By the way, I didn't say it would break either the build or an existing feature.

shatejas · 2024-11-19T17:09:37Z

src/main/java/org/opensearch/knn/index/query/nativelib/NativeEngineKnnVectorQuery.java

        for (LeafReaderContext leafReaderContext : leafReaderContexts) {
            tasks.add(() -> searchLeaf(leafReaderContext, knnWeight, k));
        }
        return indexSearcher.getTaskExecutor().invokeAll(tasks);
    }

-    private List<Map<Integer, Float>> doRescore(
+    private List<Map.Entry<LeafReaderContext, Map<Integer, Float>>> doRescore(


This data structure is getting complex. Can we look into simplifying it for readability?
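
One possible direction, sketched under assumptions rather than taken from the PR: wrap the pair in a small named value class so signatures read as `List<PerLeafResult>` instead of `List<Map.Entry<LeafReaderContext, Map<Integer, Float>>>`. `PerLeafResult`, `Leaf` (a stand-in for `LeafReaderContext`), and `trimToTopK` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PerLeafResultSketch {

    /** Hypothetical stand-in for Lucene's LeafReaderContext; only the ordinal is modeled. */
    public static final class Leaf {
        public final int ord;

        public Leaf(int ord) {
            this.ord = ord;
        }
    }

    /** Hypothetical named holder replacing Map.Entry<LeafReaderContext, Map<Integer, Float>>. */
    public static final class PerLeafResult {
        private final Leaf leaf;
        private final Map<Integer, Float> scores; // docId -> score

        public PerLeafResult(Leaf leaf, Map<Integer, Float> scores) {
            this.leaf = leaf;
            this.scores = scores;
        }

        public Leaf getLeaf() {
            return leaf;
        }

        public Map<Integer, Float> getScores() {
            return scores;
        }
    }

    /** Keeps the k highest-scoring docs of each leaf; illustrative only, not the plugin's rescoring. */
    public static List<PerLeafResult> trimToTopK(List<PerLeafResult> perLeafResults, int k) {
        List<PerLeafResult> trimmed = new ArrayList<>();
        for (PerLeafResult result : perLeafResults) {
            Map<Integer, Float> topK = result.getScores()
                .entrySet()
                .stream()
                .sorted(Map.Entry.<Integer, Float>comparingByValue().reversed())
                .limit(k)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
            trimmed.add(new PerLeafResult(result.getLeaf(), topK));
        }
        return trimmed;
    }

    public static void main(String[] args) {
        PerLeafResult result = new PerLeafResult(new Leaf(0), Map.of(1, 0.9f, 2, 0.5f, 3, 0.7f));
        for (PerLeafResult r : trimToTopK(List.of(result), 2)) {
            System.out.println("leaf " + r.getLeaf().ord + " keeps docs " + r.getScores().keySet());
        }
    }
}
```

The behavior is the same as with `Map.Entry`. The only gain is that the type name and accessors say what the pair means.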

VijayanB force-pushed the refactor-scoring branch from 3504312 to a8b3c1d Compare November 14, 2024 20:19

VijayanB marked this pull request as ready for review November 14, 2024 20:49

VijayanB requested review from heemin32, navneet1v, vamshin, jmazanec15, naveentatikonda, junqiu-lei, martin-gaievski, ryanbogan, luyuncheng and shatejas as code owners November 14, 2024 20:49

shatejas reviewed Nov 19, 2024
