[FEATURE]Remove double converting from byte[] to float[], and float[] to byte[] for script scoring with binary vector #1827
Comments
@heemin32 why are we not resolving this as part of 2.16 when it impacts latency?
It is not a trivial change, and the 2.16 code freeze is next Monday. Score script is already slow compared to ANN, and the latency impact of the double conversion could be negligible, though actual profiling would be needed to confirm.
Until we have benchmarks, we cannot say this, and leaving an optimization out of a release doesn't seem correct to me.
Optimization can be implemented incrementally. That is how it has been done till now.
@kasundra07 is working on this.
You can assign this to me.
Details
Here, in
So, for binary vectors, either
For both these classes, we have a doGetValue() method where we convert the underlying vector to float[] -
KNNByteVectorScriptDocValues -
KNNNativeVectorScriptDocValues -
For binary vectors, script scoring is supported for the hamming space type, and here we convert the float[] back to byte[] to calculate the hamming distance -
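The plugin code referenced above is not reproduced in this thread, so the following is only a standalone sketch of the round trip being described, not the plugin's implementation. The class name `DoubleConversionSketch` and the helpers `toFloats`, `toBytes`, and `hamming` are hypothetical names chosen for illustration. It shows that widening each byte to a float and narrowing it back yields the same bytes, so the two extra array allocations and copies add work without changing the hamming distance.

```java
import java.util.Arrays;

// Standalone illustration only; none of these names come from the k-NN plugin.
class DoubleConversionSketch {

    // First conversion: widen every stored byte to a float (hypothetical helper).
    static float[] toFloats(byte[] bytes) {
        float[] out = new float[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = bytes[i];
        }
        return out;
    }

    // Second conversion: narrow every float back to a byte (hypothetical helper).
    static byte[] toBytes(float[] floats) {
        byte[] out = new byte[floats.length];
        for (int i = 0; i < floats.length; i++) {
            out[i] = (byte) floats[i];
        }
        return out;
    }

    // Hamming distance: number of differing bits across all byte positions.
    static int hamming(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            // Mask to 8 bits so sign extension does not add spurious set bits.
            distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return distance;
    }

    public static void main(String[] args) {
        byte[] stored = {(byte) 0b10101010, (byte) 0b00001111};
        byte[] query = {(byte) 0b11110000, (byte) 0b00001111};

        // The round trip allocates two extra arrays and reproduces the input.
        byte[] roundTripped = toBytes(toFloats(stored));
        System.out.println("round trip lossless: " + Arrays.equals(stored, roundTripped));
        System.out.println("hamming via round trip: " + hamming(roundTripped, query));
        System.out.println("hamming on raw bytes:   " + hamming(stored, query));
    }
}
```

Whether the extra copies are measurable next to the rest of script scoring is exactly what the profiling discussed earlier in the thread would have to show.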
We want to avoid this double conversion, byte[] to float[] & float[] to byte[].
Solution
Currently
Here, processedQuery and scoringMethod are bound to float[] vectors -
Since we need byte[] alternatives of these for Hamming, we can instead implement the interface
And within the execute() method of this new scoring script class (KNNByteVectorType), we can call getByteValue() instead of getValue() on
So, both the classes mentioned earlier -
Results
Dataset used - http://corpus-texmex.irisa.fr/ (ANN_SIFT1M)
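As a rough sketch of the byte[]-based scoring path proposed under Solution, the example below keeps the query and the scoring function typed on byte[] and reads the stored vector through a byte accessor. Only the method names getValue(), getByteValue(), and execute() and the field names processedQuery and scoringMethod are taken from the comment; the types `VectorDocValues`, `ByteBackedDocValues`, and `ByteVectorScoreScript` are hypothetical stand-ins, not the plugin's classes or interfaces. The score here is the raw hamming distance, which is an assumption made for brevity; any transformation the plugin applies to the distance is not modeled.

```java
import java.util.function.BiFunction;

// Standalone illustration only; these types are stand-ins, not plugin classes.
class ByteScoringSketch {

    // Hypothetical doc-values view exposing both accessors named in the thread.
    interface VectorDocValues {
        float[] getValue();
        byte[] getByteValue();
    }

    static final class ByteBackedDocValues implements VectorDocValues {
        private final byte[] stored;
        int floatConversions = 0; // counts how often the float path is taken

        ByteBackedDocValues(byte[] stored) {
            this.stored = stored;
        }

        @Override
        public float[] getValue() {
            floatConversions++;
            float[] out = new float[stored.length];
            for (int i = 0; i < stored.length; i++) {
                out[i] = stored[i];
            }
            return out;
        }

        @Override
        public byte[] getByteValue() {
            return stored; // no conversion, no extra allocation
        }
    }

    // Scoring script whose query and scoring function are bound to byte[].
    static final class ByteVectorScoreScript {
        private final byte[] processedQuery;
        private final BiFunction<byte[], byte[], Float> scoringMethod;
        private final VectorDocValues docValues;

        ByteVectorScoreScript(byte[] processedQuery,
                              BiFunction<byte[], byte[], Float> scoringMethod,
                              VectorDocValues docValues) {
            this.processedQuery = processedQuery;
            this.scoringMethod = scoringMethod;
            this.docValues = docValues;
        }

        double execute() {
            // Reads bytes directly instead of going through getValue().
            return scoringMethod.apply(processedQuery, docValues.getByteValue());
        }
    }

    // Raw hamming distance used as the score (assumption, see lead-in).
    static float hammingDistance(byte[] a, byte[] b) {
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return distance;
    }

    public static void main(String[] args) {
        ByteBackedDocValues docValues =
            new ByteBackedDocValues(new byte[] {(byte) 0b10101010, (byte) 0b00001111});
        ByteVectorScoreScript script = new ByteVectorScoreScript(
            new byte[] {(byte) 0b11110000, (byte) 0b00001111},
            ByteScoringSketch::hammingDistance,
            docValues);

        System.out.println("score: " + script.execute());
        System.out.println("float conversions: " + docValues.floatConversions);
    }
}
```

The conversion counter is there only to make the point visible: the byte path never touches the float accessor.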
@VijayanB does this approach seem reasonable to you?
In #1826, which supports script scoring on knn binary vectors, we convert byte[] to float[] and then float[] back to byte[], which adds latency to queries that use script scoring on knn binary vectors. We want to avoid this unnecessary conversion.