Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix to Cosine Similiarity To Probability Clipping #326

Conversation

Unobtainiumrock
Copy link
Collaborator

What does this PR do?

This PR fixes an issue in the _convert_cosine_similarity_to_probability function where values outside the range of [-1, 1] were not clipped, leading to overflow problems during cosine similarity conversion in the retrieve_string_queries method.

Summary of Changes:

  • Added np.clip to ensure that values in D are restricted to the range [-1, 1] before further calculations.
  • This change addresses issues caused by FAISS Index returning -1 for indices that don't match during cosine similarity searches, ensuring compatibility with the expected value range for subsequent operations.

Before Fix:

The function did not clip D values, causing potential overflow issues when D = (D + 1) / 2 was performed.

After Fix:

The updated function now clips D to [-1, 1], preventing overflow errors and ensuring accurate conversion to probabilities:

def _convert_cosine_similarity_to_probability(self, D: np.ndarray) -> np.ndarray:
    D = np.clip(D, -1, 1)
    D = (D + 1) / 2
    D = np.round(D, 3)
    return D

Error Log Before Fix:

# (Include the relevant error log from the "Error Encountered" section of the image here)

Logs After Fix:

# (Include the log output from the "Log After Error Resolved" section of the image here)

Breaking Changes:

None. The changes are backward-compatible and resolve overflow issues without altering other functionality.

  • Was this discussed/agreed via a GitHub issue?
    Yes.

  • Did you read the contributor guideline?
    Yes.

  • Did you make sure your PR does only one thing, instead of bundling different changes together?
    Yes.

  • Did you make sure to update the documentation with your changes?
    Not applicable.

  • Did you write any new necessary tests?
    Not applicable

  • Did you verify new and existing tests pass locally with your changes?
    Yes.

  • Did you list all the breaking changes introduced by this pull request?
    No breaking changes were introduced.

Additional Notes:

This fix ensures robustness and prevents errors during cosine similarity searches, especially when handling indices that do not match at all in FAISS Index. You can see more here https://docs.google.com/document/d/1ILCbNgrD6ILjHDHZV1rh7eKa-QPIHQYujoDj7q3nQ7E/edit?tab=t.0

Had fun solving this! 🙃

… cosine similarity scores into probability scores
Copy link
Member

@liyin2015 liyin2015 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code is fine, but need to set the formatting to be more consistent with the library.

@liyin2015 liyin2015 merged commit 45fa558 into SylphAI-Inc:main Jan 28, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants