fix(chromadb): queries without embeddings result in program crash #11328
With chromadb, VectorStoreQuery queries with metadata only (without embeddings) result in error: "ValueError: You must provide one of query_embeddings, query_texts, query_images, or query_uris." The fix is to use chromadb get() method in such cases.
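The fallback described above can be sketched without a running chromadb instance. This is a minimal illustration, not the code from this PR: FakeCollection is a hypothetical in-memory stand-in that only mimics the query()/get() behaviour discussed here (including the ValueError), and find_ids is an assumed helper name.

```python
from typing import Any, Dict, List, Optional


class FakeCollection:
    """Hypothetical in-memory stand-in for a chromadb collection (not the real client)."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows

    def _match(self, where: Optional[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # Only simple equality filters are modelled in this sketch.
        where = where or {}
        return [
            row for row in self._rows
            if all(row["metadata"].get(key) == value for key, value in where.items())
        ]

    def query(
        self,
        query_embeddings: Optional[List[List[float]]] = None,
        n_results: int = 10,
        where: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        # Mirrors the failure reported in this PR: no embedding, no query.
        if not query_embeddings:
            raise ValueError(
                "You must provide one of query_embeddings, query_texts, "
                "query_images, or query_uris."
            )
        target = query_embeddings[0]
        ranked = sorted(
            self._match(where),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(target, row["embedding"])),
        )
        # One inner list per query embedding.
        return {"ids": [[row["id"] for row in ranked[:n_results]]]}

    def get(
        self,
        where: Optional[Dict[str, Any]] = None,
        limit: Optional[int] = None,
    ) -> Dict[str, Any]:
        rows = self._match(where)
        if limit is not None:
            rows = rows[:limit]
        return {"ids": [row["id"] for row in rows]}


def find_ids(
    collection: FakeCollection,
    embedding: Optional[List[float]],
    top_k: int,
    where: Dict[str, Any],
) -> List[str]:
    """Use get() for metadata-only lookups and query() when an embedding exists."""
    if not embedding:
        return collection.get(where=where, limit=top_k)["ids"]
    result = collection.query(query_embeddings=[embedding], n_results=top_k, where=where)
    return result["ids"][0]


ROWS = [
    {"id": "1", "embedding": [0.0, 1.0], "metadata": {"author": "ann"}},
    {"id": "2", "embedding": [1.0, 0.0], "metadata": {"author": "bob"}},
    {"id": "3", "embedding": [0.9, 0.1], "metadata": {"author": "ann"}},
]
collection = FakeCollection(ROWS)
metadata_only = find_ids(collection, None, 5, {"author": "ann"})
with_embedding = find_ids(collection, [1.0, 0.0], 2, {"author": "ann"})
```

In this sketch the metadata-only call never reaches query(), so the ValueError cannot be raised for that case, while calls that carry an embedding still go through similarity ranking.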
I think the idea is to provide an "empty vector" for the auto retriever to query with. But this works too.
@@ -291,13 +291,29 @@ def query(self, query: VectorStoreQuery, **kwargs: Any) -> VectorStoreQueryResul
        else:
            where = kwargs.pop("where", {})

        results = self._collection.query(
        if not query.query_embedding:
            return self._get(limit=query.similarity_top_k, where=where, **kwargs)
Should there still be a limit if it's purely a filter query?
I think so. From the end user's perspective, they don't know whether the query() or get() method gets used, so when they set similarity_top_k they expect at most that number of results.
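One way to keep that expectation true on both paths is to cap the ids in a single place. The helper below is a sketch with a hypothetical name (cap_ids). It assumes the result shapes as I understand them in chromadb: query() returns one inner list of ids per query embedding, while get() returns a flat list.

```python
from typing import Any, Dict, List


def cap_ids(result: Dict[str, Any], top_k: int) -> List[str]:
    """Return at most top_k ids from either a query()-style or get()-style result."""
    ids = result.get("ids") or []
    # query() nests ids per query embedding; get() returns them flat.
    if ids and isinstance(ids[0], list):
        ids = ids[0]
    return list(ids[:top_k])


from_query = cap_ids({"ids": [["a", "b", "c"]]}, top_k=2)
from_get = cap_ids({"ids": ["a", "b", "c"]}, top_k=2)
```

Both calls yield the same two ids, so the caller sees the same upper bound no matter which method served the request.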
        )

        print(f"QUERY RES: {results}")
        print(f"EMBEDDINGS: {query_embeddings}")
remove debug prints
As I understand it, the "empty vector" is used when the query string is empty and is more of a default search string in the form of an embedding. It doesn't allow for metadata-only searches.
Description
When using VectorStoreQuery to query chromadb by metadata only, without an embedding, the program crashes with the following error message:
ValueError: You must provide one of query_embeddings, query_texts, query_images, or query_uris.
Why is this a problem?
The VectorIndexAutoRetriever will sometimes suggest a metadata-only search, without an embedding, and this causes a program crash with the above error message.
The fix is to use the chromadb collection's get() method instead of query() for cases where a metadata-only search is required.
Type of Change
How Has This Been Tested?
Added a new test, test_add_to_chromadb_and_query_by_metafilters_only, to test_chromadb.py.
Run it with:
Suggested Checklist:
make format; make lint to appease the lint gods