Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support specifying a similarity threshold #21

Open
tomusher opened this issue Nov 27, 2023 · 1 comment
Open

Support specifying a similarity threshold #21

tomusher opened this issue Nov 27, 2023 · 1 comment

Comments

@tomusher
Copy link
Member

For each of the vector index methods, similar, search and query, support specifying a similarity threshold where anything below that value will be excluded from the results/the documents included in the query.

I'm not sure if this will need normalising across backends.

@tm-kn
Copy link
Member

tm-kn commented Nov 29, 2023

We'd expect backends to support this.

Otherwise it should have no effect. We use the limit and ordering so that should be enough to not return too many results.

brylie added a commit to brylie/wagtail-vector-index that referenced this issue Jul 30, 2024
Related to wagtail#21

Add support for specifying a similarity threshold in vector search methods.

* **VectorIndex Class**: Add `similarity_threshold` parameter to `query`, `find_similar`, and `search` methods in `src/wagtail_vector_index/storage/base.py`.
* **PgvectorIndexMixin Class**: Update `get_similar_documents` and `aget_similar_documents` methods to accept `similarity_threshold` parameter and filter results based on it in `src/wagtail_vector_index/storage/pgvector/provider.py`.
* **QdrantIndexMixin Class**: Update `get_similar_documents` method to accept `similarity_threshold` parameter and filter results based on it in `src/wagtail_vector_index/storage/qdrant/provider.py`.
* **WeaviateIndexMixin Class**: Update `get_similar_documents` method to accept `similarity_threshold` parameter and filter results based on it in `src/wagtail_vector_index/storage/weaviate/provider.py`.
* **NumpyIndexMixin Class**: Update `get_similar_documents` method to accept `similarity_threshold` parameter and filter results based on it in `src/wagtail_vector_index/storage/numpy/provider.py`.
* **Documentation**: Update `docs/vector-indexes.md` to include information on the new `similarity_threshold` parameter and provide examples of its usage.
* **Tests**: Add tests for the new `similarity_threshold` parameter in `query`, `find_similar`, and `search` methods in `tests/test_index.py`.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/wagtail/wagtail-vector-index/issues/21?shareId=XXXX-XXXX-XXXX-XXXX).
tomusher pushed a commit that referenced this issue Aug 12, 2024
* Support specifying a similarity threshold

Related to #21

Add support for specifying a similarity threshold in vector search methods.

* **VectorIndex Class**: Add `similarity_threshold` parameter to `query`, `find_similar`, and `search` methods in `src/wagtail_vector_index/storage/base.py`.
* **PgvectorIndexMixin Class**: Update `get_similar_documents` and `aget_similar_documents` methods to accept `similarity_threshold` parameter and filter results based on it in `src/wagtail_vector_index/storage/pgvector/provider.py`.
* **QdrantIndexMixin Class**: Update `get_similar_documents` method to accept `similarity_threshold` parameter and filter results based on it in `src/wagtail_vector_index/storage/qdrant/provider.py`.
* **WeaviateIndexMixin Class**: Update `get_similar_documents` method to accept `similarity_threshold` parameter and filter results based on it in `src/wagtail_vector_index/storage/weaviate/provider.py`.
* **NumpyIndexMixin Class**: Update `get_similar_documents` method to accept `similarity_threshold` parameter and filter results based on it in `src/wagtail_vector_index/storage/numpy/provider.py`.
* **Documentation**: Update `docs/vector-indexes.md` to include information on the new `similarity_threshold` parameter and provide examples of its usage.
* **Tests**: Add tests for the new `similarity_threshold` parameter in `query`, `find_similar`, and `search` methods in `tests/test_index.py`.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/wagtail/wagtail-vector-index/issues/21?shareId=XXXX-XXXX-XXXX-XXXX).

* Add similarity_score to mock get_similar_documents function

* Ignore .DS_Store

* Fix test_search_with_similarity_threshold

* Update similarity documentation

* Run pre-commit

* Simplify/clarify docs with Grammarly

* Fix Pyright error

* Use only one set of vectors

* Use Pytest style assertions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants