
Split segment by search type #2273

Draft · wants to merge 2 commits into main

Conversation

VijayanB
Member

Description

For exact search, it is not required to perform
quantization during rescore with oversampling.
However, to avoid score normalization between segments served by
approximate search and segments served by exact search, we first identify
the segments that need approximate search and perform oversampling on those;
at the end, after rescoring, we add the scores from the segments that
perform exact search.
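The flow described above could be sketched roughly as follows. This is an illustrative sketch only, not the plugin's actual API: the names `rescoreTopK` and `searchAllSegments` are hypothetical stand-ins.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical sketch of the per-segment dispatch described in the PR
// description; names are illustrative, not the k-NN plugin's API.
public class SplitSearchSketch {

    // Keep only the k highest-scoring docs from one segment's result map.
    static Map<Integer, Float> rescoreTopK(Map<Integer, Float> docScores, int k) {
        return docScores.entrySet().stream()
                .sorted(Map.Entry.<Integer, Float>comparingByValue().reversed())
                .limit(k)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    // Approx segments go through the oversampled first pass plus rescore;
    // exact segments contribute their full-precision scores unchanged at the end.
    static List<Map<Integer, Float>> searchAllSegments(
            List<Map<Integer, Float>> approxFirstPass,
            List<Map<Integer, Float>> exactResults,
            int finalK) {
        List<Map<Integer, Float>> merged = new ArrayList<>();
        for (Map<Integer, Float> firstPass : approxFirstPass) {
            merged.add(rescoreTopK(firstPass, finalK)); // rescore only approx segments
        }
        merged.addAll(exactResults);                    // exact scores appended after rescore
        return merged;
    }
}
```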

Related Issues

#2215

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

if (isShardLevelRescoringEnabled == true) {
ResultUtil.reduceToTopK(perLeafResults, firstPassK);
}

StopWatch stopWatch = new StopWatch().start();
- perLeafResults = doRescore(indexSearcher, leafReaderContexts, knnWeight, perLeafResults, finalK);
+ perLeafResults = doRescore(indexSearcher, knnWeight, perLeafResults, finalK);
Collaborator

Previously, we did exact search on finalK. After this change, we still do exact search on finalK. Could you tell me how this will improve the latency?

Member Author

@VijayanB VijayanB Nov 14, 2024


doSearch can call either approximate search or exact search, based on conditions such as whether engine files exist, or whether the number of docs after filtering is less than k. In those cases, we quantize the query vector and every vector from the segment, and then perform the distance computation using Hamming distance for firstPassK. With this approach, we only call doSearch for the segments that we know will always use approximate search; for the other segments we call exact search without quantization, with finalK. The optimization is at https://github.com/opensearch-project/k-NN/pull/2273/files#diff-9cfe412357ba56b3ef216427d491fc653535686a760e8ba19ea1aa00fc0e0338R68-R78
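To make the first-pass cost being avoided concrete: with binary quantization, the first pass compares packed bit codes via Hamming distance, roughly like the generic sketch below (not the plugin's actual implementation; the sign-threshold quantization and 64-dimension cap here are simplifying assumptions).

```java
// Generic sketch of a binary-quantized first pass: quantize vectors to bit
// codes, then compare with Hamming distance. Not the k-NN plugin's code.
public class HammingSketch {
    // Quantize a float vector to a packed bit code: bit i is set when v[i] > 0.
    // Simplified to at most 64 dimensions so one long holds the code.
    static long quantize(float[] v) {
        long code = 0L;
        for (int i = 0; i < v.length && i < 64; i++) {
            if (v[i] > 0f) {
                code |= 1L << i;
            }
        }
        return code;
    }

    // Hamming distance between two packed codes: count of differing bits.
    static int hamming(long a, long b) {
        return Long.bitCount(a ^ b);
    }
}
```

For exact-search segments, skipping this step means no per-vector quantization and no second (rescore) pass; the full-precision distance is computed once.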

Collaborator


Are you assuming that an exact search on full precision vectors will be faster than an exact search with quantized vectors due to the slower quantization process? It would be interesting to see the benchmark results for this.

If that’s the case, an alternative could be to retrieve quantized values directly from the Faiss file instead of performing on-the-fly quantization.

Member Author

@VijayanB VijayanB Nov 14, 2024


Yes, exact search on full precision for k is cheaper than exact search on quantized vectors for firstPassK plus rescoring the matched docs on full precision. The linked GitHub issue actually shows how performance degraded 10x when there are segments with no Faiss engine files. In my POC I saw improvements, but recall was poor because the order of results was used as the link to the leaf reader contexts. I am rerunning the experiments with my change to collect latency and recall metrics.

Collaborator


There is one more case where we run exact search: when the number of returned results is less than k. Are we going to handle that case as well?

Member Author


I believe this happens with a filter; if so, yes, it is already taken care of.

Member Author

@VijayanB VijayanB Nov 18, 2024


No. It can happen either when there are no engine files, or when the number of matched documents after filtering is less than k. We only call doSearch when we know that it will call the approximate search API; for the other segments we call exact search directly. This PR is about exactly that: https://github.com/opensearch-project/k-NN/pull/2273/files#diff-9cfe412357ba56b3ef216427d491fc653535686a760e8ba19ea1aa00fc0e0338R72-R75
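The two conditions described here could be captured in a predicate shaped roughly like this (hypothetical names; the real check lives in KNNWeight in the linked diff):

```java
// Sketch of the decision described above: a segment falls back to exact
// search when there are no native engine files, or when the number of docs
// surviving the filter is below k. Names are illustrative, not the plugin's.
public class ExactSearchDecisionSketch {
    static boolean isExactSearchPreferred(boolean hasEngineFiles, int filteredDocCount, int k) {
        if (!hasEngineFiles) {
            return true;             // no Faiss/native index to run approximate search against
        }
        return filteredDocCount < k; // filter leaves fewer candidates than k
    }
}
```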

@VijayanB VijayanB force-pushed the split-segment-by-search-type branch 2 times, most recently from 235460f to 229e788 on November 18, 2024 21:39
We map the order of results to the order of segments, and finally
rely on that order to build top docs. Refactor the method to use
Map.Entry to associate each LeafReaderContext with the results from
that leaf.
This is required when we want to split segments between
approximate search and exact search, to avoid rescoring
exact-search segments twice.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
For exact search, it is not required to perform
quantization during rescore with oversampling.
However, to avoid score normalization between segments served by
approximate search and segments served by exact search, we first identify
the segments that need approximate search and perform oversampling on those;
at the end, after rescoring, we add the scores from the segments that
perform exact search.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
@VijayanB VijayanB force-pushed the split-segment-by-search-type branch from 229e788 to b149309 on November 19, 2024 01:40
@jmazanec15
Member

Should we skip quantization on indexing if we are not using it here then?

@navneet1v
Collaborator

Should we skip quantization on indexing if we are not using it here then?

+1. I think it is a valid point; we should do that too. But @VijayanB, there were a couple more ideas that we were discussing on how to fix this issue. Did you put some thought into them, i.e. why we should split the segments by search type?

/**
* For given {@link LeafReaderContext}, this api will return will KNNWeight perform exact search or not
* always. This decision is based on two properties, 1) if there are no native engine files in segments,
* exact search will always be performed, 2) if number of docs after filter is less than 'k'
Collaborator


This Javadoc is incorrect. We have a bit more logic around this, so let's fix the Javadoc.

@@ -30,14 +31,15 @@ public final class ResultUtil {
* @param perLeafResults Results from the list
* @param k the number of results across all leaf results to return
*/
- public static void reduceToTopK(List<Map<Integer, Float>> perLeafResults, int k) {
+ public static void reduceToTopK(List<Map.Entry<LeafReaderContext, Map<Integer, Float>>> perLeafResults, int k) {
Collaborator


We should move to a List<FirstPassResults>, where

class FirstPassResults {
  private LeafReaderContext leafReaderContext;
  private Map<Integer, Float> docToScoreMap;
}

this will help in future to abstract more details related to search for a segment.

Feel free to have a better name for classes.

Collaborator

@heemin32 heemin32 Nov 20, 2024


Would PerLeafResult work as a class name? It feels more generic and versatile compared to including FirstPass. Perhaps firstPassResults could be used as a variable name instead?

Collaborator


+1

* @throws IOException
*/
public boolean isExactSearchPreferred(LeafReaderContext context) throws IOException {
final BitSet filterBitSet = getFilteredDocsBitSet(context);
Collaborator

@shatejas shatejas Nov 20, 2024


For each ANN leaf, getting the filter BitSet now happens twice: once for this check, and then again to get the actual filter bitset.

Considering the worst-case scenario, this goes through filterWeight.scorer twice and then creates a bitset, which involves a linear loop.

How confident are we that this won't impact latencies for filtering cases?

Can we avoid this duplication? One way is to do only the engine-files-empty check here, and then pass the bitset into searchLeaf and into the exact search context.

Let me know if I am missing something.
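The de-duplication suggested here could look roughly like computing the bitset once per leaf and threading it through both the preference check and the search path. A hypothetical shape (all names and the placeholder search bodies are illustrative, not the PR's actual code):

```java
import java.util.BitSet;

// Sketch of the suggestion above: build the filter bitset once per leaf and
// reuse it, instead of rebuilding it in both the exact-search preference
// check and the actual search. Names and bodies are illustrative only.
public class FilterBitSetReuseSketch {
    static int searchLeaf(BitSet filterBitSet, int k) {
        // Decide once, using the bitset we already have...
        boolean exact = filterBitSet.cardinality() < k;
        // ...then reuse the same bitset in whichever path runs.
        return exact ? exactSearch(filterBitSet) : approxSearch(filterBitSet);
    }

    // Placeholder search paths: each just reports how many docs passed the filter.
    static int exactSearch(BitSet bits) { return bits.cardinality(); }
    static int approxSearch(BitSet bits) { return bits.cardinality(); }
}
```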

});
}
return indexSearcher.getTaskExecutor().invokeAll(rescoreTasks);
}

private List<Map.Entry<LeafReaderContext, Map<Integer, Float>>> doExactSearch(
Collaborator


doRescore and this method are more or less similar; any possibility that we can reuse something here?

@heemin32
Collaborator

heemin32 commented Nov 21, 2024

Should we skip quantization on indexing if we are not using it here then?

+1. I think it is valid point. We should do that too. But @VijayanB there were couple of more ideas that we were discussing on how to fix this issue. Did you put some thoughts on that like why we should split the segments by search type?

Another option could be to read the quantized values directly from the native engine file, as discussed here: #2266. This approach would also address cases where the search results are fewer than k, and we fall back to an exact search.

Or, we could also store the quantized values in the Lucene segment file and read them from it directly, so that we can avoid quantization during exact search.
