Skip to content

fix(chroma): add missing Euclidean relevance-score function #31643

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mohiuddin-khan-shiam
Copy link

This patch restores relevance-scored retrieval for the Chroma vector store.

Problem
Chroma._select_relevance_score_fn() defaults to the L2 ("l2") metric, but the corresponding _euclidean_relevance_score_fn() was never implemented. Any call to similarity_search_with_relevance_scores() therefore raised AttributeError, breaking default Chroma searches and the related test test_chroma_with_relevance_score_custom_normalization_fn.

Solution
Introduced @staticmethod _euclidean_relevance_score_fn(distance: float) -> float using the normalization 1 / (1 + distance), ensuring:

  • distance = 0 → score = 1 (most relevant)
  • distance → ∞ → score → 0 (least relevant)

Impact
• Re-enables relevance-score queries for Chroma with L2 distance.
• Unblocks dependent retrievers and integration tests.
• Keeps API behavior consistent with other vector stores (e.g., Qdrant).

This patch restores relevance-scored retrieval for the [Chroma](cci:2://file:///d:/Github/langchain/libs/partners/chroma/langchain_chroma/vectorstores.py:143:0-1263:50) vector store.

Problem
`Chroma._select_relevance_score_fn()` defaults to the L2 (`"l2"`) metric, but the corresponding [_euclidean_relevance_score_fn()](cci:1://file:///d:/Github/langchain/libs/partners/chroma/langchain_chroma/vectorstores.py:755:4-763:44) was never implemented. Any call to [similarity_search_with_relevance_scores()](cci:1://file:///d:/Github/langchain/libs/core/langchain_core/vectorstores/base.py:533:4-580:36) therefore raised `AttributeError`, breaking default Chroma searches and the related test [test_chroma_with_relevance_score_custom_normalization_fn](cci:1://file:///d:/Github/langchain/libs/partners/chroma/tests/integration_tests/test_vectorstores.py:516:0-536:5).

Solution
Introduced `@staticmethod _euclidean_relevance_score_fn(distance: float) -> float` using the normalization `1 / (1 + distance)`, ensuring:
* distance = 0 → score = 1 (most relevant)
* distance → ∞ → score → 0 (least relevant)

Impact
• Re-enables relevance-score queries for Chroma with L2 distance.
• Unblocks dependent retrievers and integration tests.
• Keeps API behavior consistent with other vector stores (e.g., Qdrant).
Copy link

vercel bot commented Jun 17, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
langchain ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jun 17, 2025 7:00pm

@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. Ɑ: vector store Related to vector store module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Jun 17, 2025
Copy link

codspeed-hq bot commented Jun 17, 2025

CodSpeed Walltime Performance Report

Merging #31643 will not alter performance

Comparing mohiuddin-khan-shiam:master (4c8d802) with master (c1c3e13)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 13 untouched benchmarks

Copy link

codspeed-hq bot commented Jun 17, 2025

CodSpeed Instrumentation Performance Report

Merging #31643 will not alter performance

Comparing mohiuddin-khan-shiam:master (4c8d802) with master (c1c3e13)

Summary

✅ 13 untouched benchmarks

Copy link
Collaborator

@ccurme ccurme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. Could you clarify what's happening?

Chroma will use the base implementation of _euclidean_relevance_score_fn, it's not the case that an implementation is missing (though the implementation may be incompatible with Chroma, as it assumes distances are bound by sqrt(2)).

There is a comment on the test you mention that suggests it is broken, but it is passing.

Your change might be right but could you reflect the change in a test? The test should fail on the master branch and pass here.

@ccurme ccurme self-assigned this Jun 17, 2025
@ccurme ccurme added the needs test PR needs to be updated with tests label Jun 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature needs test PR needs to be updated with tests size:S This PR changes 10-29 lines, ignoring generated files. Ɑ: vector store Related to vector store module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants