For 1M 768-dimensional vectors, the fastest exact search achieved p50/p90/p99 latencies of 102 ms/119 ms/124 ms, using Lucene-backed storage with SIMD enabled.
There is a nearly 2x performance difference between the Lucene and plugin formats. When all vectors fit in memory, the p50/p90/p99 latency without SIMD is 323 ms/325 ms/344 ms for Lucene-backed storage and 568 ms/584 ms/594 ms for plugin-backed storage, so Lucene's vector format is almost 2x faster than the plugin's for script scoring. The cause appears to be that Lucene can map float vectors directly into the JVM via Panama, whereas the plugin must copy the bytes in and then deserialize them. This overlaps with "[Enhancement] Optimize the de-serialization of vector when reading from Doc Values" (#1050).
SIMD gave a roughly 3x improvement over non-SIMD. Without SIMD, the p50/p90/p99 latency for Lucene-backed storage is 323 ms/325 ms/344 ms; with SIMD, it is 102 ms/119 ms/124 ms.
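The copy-then-deserialize cost suspected above can be illustrated with a small analogy. This is a minimal Python sketch, not the plugin's or Lucene's code: `decode_by_copy` and `view_in_place` are hypothetical names, and `memoryview.cast` only stands in for the kind of direct mapping that Lucene appears to get through Panama.

```python
import struct
from array import array


def decode_by_copy(raw: bytes) -> list:
    """Copy the bytes out and deserialize every float (analogous to the plugin path)."""
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"{count}f", raw))


def view_in_place(raw: bytes) -> memoryview:
    """Reinterpret the same buffer as float32 values without copying it."""
    return memoryview(raw).cast("f")


# Hypothetical 4-dimensional vector stored as native-endian float32 bytes.
stored = array("f", [0.5, -1.25, 3.0, 8.0]).tobytes()
copied = decode_by_copy(stored)  # allocates a new list of floats
viewed = view_in_place(stored)   # reads floats straight from `stored`
```

Both paths yield the same values; the difference is that the first allocates and fills a new container per vector, which is the kind of per-document overhead the plugin path seems to pay during script scoring.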
Test Configuration
The following configurations were used to execute these tests:
| Config Key | Value |
| --- | --- |
| Base OS version | 2.14 |
| Instance type | r5.4xlarge |
| Data set | cohere |
| Num index vecs | 1M |
| Dimension | 768 |
| OSB Mem | 64 |
| OSB CPU | 4 |
| OS-metrics Mem | 1 GB |
| OS-metrics JVM | 512 MB |
| OS-metrics CPUs | 2 |
| OS-test Mem | 12 GB |
| OS-test JVM | 4 GB |
| OS-test CPUs | 8 |
| Primary shards | 8 |
| Replica shards | 0 |
| Segment count | 1 |
| k | 100 |
| Space type | innerproduct |
| Disk | gp3 |
| Disk size | 500 GB |
| Query clients | 1 |
| Index clients | 1 |
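For orientation, an exact search with the `k` and space type from the configuration above can be expressed as a k-NN script score query. The sketch below builds such a request body. The field name `target_field` and the helper `build_exact_query` are placeholders, and the body actually sent by the benchmark may differ.

```python
import json


def build_exact_query(query_vector, k=100, field="target_field"):
    """Build a script_score request body that scores every document exactly."""
    return {
        "size": k,  # return the top k hits (k = 100 in this test)
        "query": {
            "script_score": {
                "query": {"match_all": {}},  # no pre-filter: score all vectors
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": field,  # placeholder field name
                        "query_value": list(query_vector),
                        "space_type": "innerproduct",
                    },
                },
            }
        },
    }


body = build_exact_query([0.1, 0.2, 0.3])
print(json.dumps(body, indent=2))
```

A real query vector would have 768 dimensions to match the indexed data; the three-element vector here only keeps the printed body short.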
Results - Lucene w/o SIMD
| Run # | p50 latency (ms) | p90 latency (ms) | p99 latency (ms) | Recall |
| --- | --- | --- | --- | --- |
| 1 | 324 | 326 | 344 | 0.99998 |
| 2 | 324 | 325 | 328 | 0.99998 |
Results - Lucene w/ SIMD
| Run # | p50 latency (ms) | p90 latency (ms) | p99 latency (ms) | Recall |
| --- | --- | --- | --- | --- |
| 1 | 102 | 119 | 124 | 0.999999 |
| 2 | 103 | 119 | 125 | 0.999999 |
Results - Plugin w/o SIMD
| Run # | p50 latency (ms) | p90 latency (ms) | p99 latency (ms) | Recall |
| --- | --- | --- | --- | --- |
| 1 | 674 | 684 | 692 | 0.99998 |
| 2 | 674 | 684 | 692 | 0.99998 |
Results - Plugin w/o SIMD (with optimizations)
| Run # | p50 latency (ms) | p90 latency (ms) | p99 latency (ms) | Recall |
| --- | --- | --- | --- | --- |
| 1 | 568 | 584 | 594 | 0.99998 |
| 2 | 568 | 584 | 596 | 0.99998 |
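The ratios quoted in the summary can be rechecked from the run 1 p50 values in the four result sets above. This is a small sketch for the arithmetic only; the dictionary keys and the `speedup` helper are names chosen here for illustration.

```python
# Run 1 p50 latencies in milliseconds, copied from the results above.
P50_MS = {
    "lucene_no_simd": 324,
    "lucene_simd": 102,
    "plugin_no_simd": 674,
    "plugin_no_simd_optimized": 568,
}


def speedup(slow: str, fast: str) -> float:
    """Return how many times faster `fast` is than `slow` at p50."""
    return P50_MS[slow] / P50_MS[fast]


simd_gain = speedup("lucene_no_simd", "lucene_simd")
format_gap = speedup("plugin_no_simd_optimized", "lucene_no_simd")
optimization_gain = speedup("plugin_no_simd", "plugin_no_simd_optimized")
```

With these inputs, SIMD comes out at about 3.2x, the Lucene versus optimized-plugin gap at about 1.75x, and the plugin optimizations at about 1.19x.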
Description
Meta-issue for improving performance of exact search with script scoring.
AIs
Single Node Experiments
Overview
I ran several experiments showcasing the performance of exact scoring on a single node. In addition, I captured several profiling examples.
The testing code can be found in https://github.com/jmazanec15/opensearch-knn-rescore-experiments/. The code that was benchmarked can be found in https://github.com/jmazanec15/k-NN-1/tree/exact-scoring-exps. The cohere dataset was used, with 1M 768-dimensional vectors, 10k queries, and the innerproduct space type.
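As a reminder of what is being measured, exact search scores every indexed vector against the query and keeps the best `k`, and recall compares the returned ids with the true neighbors. The following is a minimal pure-Python sketch under simplifying assumptions: it ranks by the raw inner product, uses hypothetical helper names, and is not the code path that was benchmarked.

```python
import heapq


def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def exact_top_k(vectors, query, k):
    """Score every vector against the query and return the ids of the k best."""
    scored = ((inner_product(vec, query), doc_id) for doc_id, vec in enumerate(vectors))
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


def recall(returned_ids, true_ids):
    """Fraction of the true neighbors that appear in the returned ids."""
    return len(set(returned_ids) & set(true_ids)) / len(true_ids)


# Tiny made-up data set; the real test uses 1M vectors of dimension 768.
vectors = [[1.0, 0.0], [0.0, 3.0], [2.0, 2.0], [-1.0, -1.0]]
top = exact_top_k(vectors, query=[1.0, 1.0], k=2)
```

Because every vector is scored, the cost grows linearly with the number of vectors and the dimension, which is why the per-vector read and distance computation dominate the latencies reported here.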