
fix(chromadb): queries without embeddings result in program crash #11328

Merged

Conversation

@poxrud (Contributor) commented Feb 23, 2024

Description

When using `VectorStoreQuery` to query chromadb by metadata only, without an embedding, the program crashes with the following error:

```
ValueError: You must provide one of query_embeddings, query_texts, query_images, or query_uris.
```

Why is this a problem?

When using the `VectorIndexAutoRetriever`, it will sometimes suggest a metadata-only search, without an embedding, and this crashes the program with the above error message.

The fix is to use the chromadb collection's `get()` method instead of `query()` for cases where a metadata-only search is required.
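The change can be sketched as follows. This is a minimal illustration of the dispatch only, not the actual `ChromaVectorStore` code; `run_query` is a hypothetical helper invented for the sketch:

```python
# Minimal sketch of the fix's dispatch logic (illustrative only; the real
# change lives in ChromaVectorStore.query in llama-index-vector-stores-chroma).
def run_query(collection, query_embedding, where, top_k):
    if not query_embedding:
        # Metadata-only search: chromadb's get() accepts a metadata filter
        # without an embedding, avoiding the ValueError raised by query().
        return collection.get(where=where, limit=top_k)
    return collection.query(
        query_embeddings=[query_embedding], where=where, n_results=top_k
    )
```

With a dispatch like this, an auto-retriever request that carries only metadata filters falls through to `get()` instead of crashing.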

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Added a new test `test_add_to_chromadb_and_query_by_metafilters_only` to `test_chromadb.py`.
Run it with:

```
pytest llama-index-integrations/vector_stores/llama-index-vector-stores-chroma/tests/test_chromadb.py
```
  • Added new unit/integration tests
  • I stared at the code and made sure it makes sense
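The shape of such a test can be sketched with a stand-in store (a hedged illustration: the real test runs `ChromaVectorStore` against an actual chromadb collection, and `StubStore` here is invented for the sketch):

```python
# Illustrative shape of a metadata-only query test; StubStore is a
# stand-in and not part of llama-index or chromadb.
class StubStore:
    def __init__(self):
        self.nodes = []

    def add(self, nodes):
        self.nodes.extend(nodes)

    def query(self, filters, top_k):
        # Metadata-only path: no embedding is involved in the match.
        hits = [n for n in self.nodes
                if all(n["metadata"].get(k) == v for k, v in filters.items())]
        return hits[:top_k]


def test_query_by_metafilters_only():
    store = StubStore()
    store.add([
        {"id": "1", "metadata": {"author": "marie"}},
        {"id": "2", "metadata": {"author": "pierre"}},
    ])
    result = store.query(filters={"author": "marie"}, top_k=2)
    assert [n["id"] for n in result] == ["1"]
```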

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Feb 23, 2024
@logan-markewich (Collaborator) commented:

I think the idea is to provide an "empty vector" for the auto retriever to query with. But this works too.

```diff
@@ -291,13 +291,29 @@ def query(self, query: VectorStoreQuery, **kwargs: Any) -> VectorStoreQueryResult:
         else:
             where = kwargs.pop("where", {})

-        results = self._collection.query(
+        if not query.query_embedding:
+            return self._get(limit=query.similarity_top_k, where=where, **kwargs)
```
Review comment (Collaborator):

Should there still be a limit if it's purely a filter query?

@poxrud (Contributor, author) replied:

I think so. From the perspective of the end user, they don't know whether the `query()` or `get()` method gets used, so when they set `similarity_top_k` they expect at most that number of results.
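That expectation amounts to a small invariant: filter first, then cap at `similarity_top_k`. An illustrative sketch, where `get_by_metadata` is a made-up stand-in for the `get(where=..., limit=...)` behavior discussed here:

```python
# Stand-in for the get(where=..., limit=...) semantics discussed above:
# exact metadata match, then cap the result count at `limit`.
def get_by_metadata(records, where, limit):
    matched = [r for r in records
               if all(r["metadata"].get(k) == v for k, v in where.items())]
    return matched[:limit]

records = [{"id": str(i), "metadata": {"kind": "a" if i % 2 == 0 else "b"}}
           for i in range(6)]
# Three records match kind == "a" (ids 0, 2, 4); limit=2 caps the output.
assert [r["id"] for r in get_by_metadata(records, {"kind": "a"}, limit=2)] == ["0", "2"]
```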

Further in the diff:

```python
print(f"QUERY RES: {results}")
print(f"EMBEDDINGS: {query_embeddings}")
```

Review comment (Collaborator): remove debug prints

@poxrud (Contributor, author) commented Feb 27, 2024

> I think the idea is to provide an "empty vector" for the auto retriever to query with. But this works too.

As I understand it, the "empty vector" is used when the query string is empty and is more of a default search string in the form of an embedding. It doesn't allow for metadata-only searches.

@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Feb 27, 2024
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Feb 28, 2024
@logan-markewich logan-markewich merged commit 381afe6 into run-llama:main Feb 28, 2024
8 checks passed
Dominastorm pushed a commit to uptrain-ai/llama_index that referenced this pull request Feb 28, 2024
anoopshrma pushed a commit to anoopshrma/llama_index that referenced this pull request Mar 2, 2024
Izukimat pushed a commit to Izukimat/llama_index that referenced this pull request Mar 29, 2024