Fix to Cosine Similarity To Probability Clipping #326
What does this PR do?
This PR fixes an issue in the `_convert_cosine_similarity_to_probability` function, where values outside the range `[-1, 1]` were not clipped. This led to overflow problems during cosine similarity conversion in the `retrieve_string_queries` method.

Summary of Changes:
- Added `np.clip` to ensure that values in `D` are restricted to the range `[-1, 1]` before further calculations.
- Indices that don't match during cosine similarity searches now end up with a score of `-1`, ensuring compatibility with the expected value range for subsequent operations.

Before Fix:
The function did not clip `D` values, causing potential overflow issues when `D = (D + 1) / 2` was computed.

After Fix:
The updated function now clips `D` to `[-1, 1]`, preventing overflow errors and ensuring accurate conversion to probabilities.
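A minimal standalone sketch of the clipped conversion, assuming `D` is a NumPy array of cosine-similarity scores and that the conversion is the `(D + 1) / 2` rescale described above. The name `convert_cosine_similarity_to_probability` is a hypothetical stand-in, not the method changed in this PR, and the input values are made up for illustration:

```python
import numpy as np


def convert_cosine_similarity_to_probability(D: np.ndarray) -> np.ndarray:
    """Map cosine similarities in [-1, 1] to probabilities in [0, 1].

    Hypothetical standalone sketch; the method in the PR may differ.
    """
    # Clamp first so out-of-range scores (e.g. sentinel values for
    # non-matching results) cannot push extreme values into later steps.
    D = np.clip(D, -1.0, 1.0)
    # Linear rescale: -1 -> 0, 0 -> 0.5, 1 -> 1.
    return (D + 1.0) / 2.0


if __name__ == "__main__":
    # Illustrative scores: one normal value, one slightly above 1, and the
    # most negative float32 (assumed stand-in for a "no match" sentinel).
    scores = np.array([[0.8, 1.5, np.finfo(np.float32).min]], dtype=np.float32)
    print(convert_cosine_similarity_to_probability(scores))
```

Because the clip runs before the rescale, any score outside `[-1, 1]` maps to `0` or `1` instead of carrying an extreme magnitude into later arithmetic.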
Breaking Changes:
None. The changes are backward-compatible and resolve overflow issues without altering other functionality.
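The backward-compatibility claim can be checked with a small sketch: for scores already inside `[-1, 1]`, `np.clip` is a no-op, so the clipped and unclipped conversions return identical arrays. Both helper names below are hypothetical, and the conversion is assumed to be the `(D + 1) / 2` rescale from the description:

```python
import numpy as np


def to_probability_unclipped(D: np.ndarray) -> np.ndarray:
    # Hypothetical "before" version: no range check.
    return (D + 1.0) / 2.0


def to_probability_clipped(D: np.ndarray) -> np.ndarray:
    # Hypothetical "after" version: clamp to [-1, 1] first.
    return (np.clip(D, -1.0, 1.0) + 1.0) / 2.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    in_range = rng.uniform(-1.0, 1.0, size=(4, 5)).astype(np.float32)
    # For in-range scores both versions perform the same arithmetic,
    # so existing callers see identical outputs.
    assert np.array_equal(
        to_probability_unclipped(in_range), to_probability_clipped(in_range)
    )
    print("in-range outputs identical")
```

Only out-of-range inputs, which previously caused the overflow, produce different results.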
Was this discussed/agreed via a GitHub issue?
Yes.
Did you read the contributor guideline?
Yes.
Did you make sure your PR does only one thing, instead of bundling different changes together?
Yes.
Did you make sure to update the documentation with your changes?
Not applicable.
Did you write any new necessary tests?
Not applicable.
Did you verify new and existing tests pass locally with your changes?
Yes.
Did you list all the breaking changes introduced by this pull request?
No breaking changes were introduced.
Additional Notes:
This fix makes cosine similarity searches more robust and prevents errors, especially when handling indices that do not match at all in a FAISS index. More details are available here: https://docs.google.com/document/d/1ILCbNgrD6ILjHDHZV1rh7eKa-QPIHQYujoDj7q3nQ7E/edit?tab=t.0
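When a query has fewer than `k` matches, FAISS pads the returned labels with `-1`. The sketch below simulates such a result with plain NumPy rather than calling FAISS; the padded score is assumed here to be the most negative float32, standing in for whatever sentinel the index actually returns. It shows the padded slot clipping down to a probability of `0` while the `-1` label still identifies it as padding:

```python
import numpy as np

# Simulated inner-product search with k=3 against an index holding only
# 2 vectors. Assumption: padded slots carry label -1 and the most negative
# float32 as their score (a stand-in for the real sentinel value).
SENTINEL = np.finfo(np.float32).min
D = np.array([[0.92, 0.35, SENTINEL]], dtype=np.float32)  # similarity scores
I = np.array([[1, 0, -1]], dtype=np.int64)                # -1 = no match

# Clamp to [-1, 1], then rescale to [0, 1]; the sentinel becomes 0.
probs = (np.clip(D, -1.0, 1.0) + 1.0) / 2.0
valid = I != -1  # boolean mask of real matches

for idx, p, ok in zip(I[0], probs[0], valid[0]):
    status = "match" if ok else "padding"
    print(f"index={idx:>2}  prob={p:.3f}  ({status})")
```

Callers that want to discard padded results entirely can still filter on `I != -1`; the clip only guarantees the scores stay in a safe range.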
Had fun solving this! 🙃