Enhance (Qdrant) hybrid/sparse search support #15483

enrico-stauss · 2024-08-19T11:01:41Z

Description

This PR gives more explicit control over sparse_embeddings to the developer of RAG applications. The motivation is provided in this issue and the PR directly adapts the QdrantVectorStore. Other VectorStores that support hybrid retrieval will need to be adapted.

This PR aims to improve the usability of the Qdrant hybrid search functionality by adding sparse embedding fields to both the VectorStoreQuery and QueryBundle and using the provided sparse embedding rather than recomputing it.

Additionally, the sparse_embedding field is added to the BaseNode class and, if set, is used by the QdrantVectorStore.add method instead of being recomputed.

Along with the described changes, a larger refactoring reduces code duplication for the QdrantVectorStore.{query, aquery} and enhances readability. The add method got a minor rework, too.

Use Case

When working with (a subclass of) the QueryFusionRetriever and multiple retrievers that use the same embedding models, the query embedding is calculated more than once per string to be embedded. Caching won't help, as the embeddings are generally calculated asynchronously. Passing precomputed embeddings to avoid this duplicate computation was already possible for dense embeddings; this PR makes it possible for sparse embeddings as well.
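The effect of sharing a precomputed sparse embedding across retrievers can be illustrated with a toy sketch. Everything here (`ToyQuery`, `ToyRetriever`, `CountingSparseEncoder`) is a hypothetical stand-in and not part of llama-index; it only counts how often the encoder runs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumption: a sparse embedding is represented as (indices, values).
SparseEmbedding = Tuple[List[int], List[float]]


@dataclass
class ToyQuery:
    """Hypothetical query object that may carry a precomputed sparse embedding."""

    query_str: str
    sparse_embedding: Optional[SparseEmbedding] = None


class CountingSparseEncoder:
    """Toy bag-of-words encoder that records how often it is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, text: str) -> SparseEmbedding:
        self.calls += 1
        counts: Dict[int, float] = {}
        for token in text.lower().split():
            # Deterministic toy "vocabulary index" for the token.
            index = sum(ord(ch) for ch in token) % 1000
            counts[index] = counts.get(index, 0.0) + 1.0
        indices = sorted(counts)
        return indices, [counts[i] for i in indices]


class ToyRetriever:
    """Hypothetical retriever: reuses a provided embedding, else encodes."""

    def __init__(self, encoder: CountingSparseEncoder) -> None:
        self.encoder = encoder

    def retrieve(self, query: ToyQuery) -> SparseEmbedding:
        if query.sparse_embedding is not None:
            return query.sparse_embedding
        return self.encoder(query.query_str)


encoder = CountingSparseEncoder()
retrievers = [ToyRetriever(encoder) for _ in range(3)]

# Without a shared embedding, every retriever encodes the same string again.
for retriever in retrievers:
    retriever.retrieve(ToyQuery("qdrant hybrid search"))
calls_without_sharing = encoder.calls

# With a shared embedding, the encoder runs once, before the fan-out.
encoder.calls = 0
shared_query = ToyQuery("qdrant hybrid search")
shared_query.sparse_embedding = encoder(shared_query.query_str)
for retriever in retrievers:
    retriever.retrieve(shared_query)
calls_with_sharing = encoder.calls
```

In this sketch the encoder is called once per retriever in the first loop and only once in total when the embedding travels with the query.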

Implementation

The commits within this PR should be self-explanatory and well-structured. The main changes are:

Introducing a helper function that either uses the provided sparse embedding or calculates it as before
Using self._{a}client.search_batch in the default case as well, so that the requests to be sent can be built step by step, which reduces the number of cases
Removing the async evaluation of await self.asparse_vector_name() by precomputing the value in __init__
Passing down hybrid_top_k from the VectorIndexRetriever kwargs to the QueryBundle (as done with _sparse_top_k)
Reducing the line count by roughly 160 LOC through removal of duplication
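
A minimal sketch of the first item, the "use the provided sparse embedding or compute it" helper, could look as follows. This is not the code from the PR: `ToyStoreQuery`, `get_sparse_embedding_query` and `toy_sparse_query_fn` are hypothetical names, and the sketch assumes the sparse encoder works on batches and returns a tuple of index lists and value lists, so the helper unpacks the single entry.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Assumptions: one sparse embedding is (indices, values); the encoder is
# batch-oriented and returns (list of index lists, list of value lists).
SparseEmbedding = Tuple[List[int], List[float]]
BatchSparseEncoding = Tuple[List[List[int]], List[List[float]]]
SparseEncoder = Callable[[List[str]], BatchSparseEncoding]


@dataclass
class ToyStoreQuery:
    """Hypothetical stand-in for a vector store query with an optional sparse field."""

    query_str: Optional[str] = None
    sparse_query_embedding: Optional[SparseEmbedding] = None


def get_sparse_embedding_query(
    query: ToyStoreQuery, sparse_query_fn: SparseEncoder
) -> SparseEmbedding:
    """Return the provided sparse embedding, or compute it from the query string."""
    if query.sparse_query_embedding is not None:
        # Reuse what the caller already computed; the encoder is not invoked.
        return query.sparse_query_embedding
    if query.query_str is None:
        raise ValueError("Need either a sparse embedding or a query string.")
    batch_indices, batch_values = sparse_query_fn([query.query_str])
    # The encoder works on batches, so unpack the single entry.
    return batch_indices[0], batch_values[0]


def toy_sparse_query_fn(texts: List[str]) -> BatchSparseEncoding:
    """Toy encoder: one index per text (its length) with weight 1.0."""
    return [[len(text)] for text in texts], [[1.0] for _ in texts]


def failing_sparse_query_fn(texts: List[str]) -> BatchSparseEncoding:
    """Encoder that must never run when an embedding is provided."""
    raise AssertionError("sparse encoder should not have been called")


computed = get_sparse_embedding_query(
    ToyStoreQuery(query_str="abc"), toy_sparse_query_fn
)
reused = get_sparse_embedding_query(
    ToyStoreQuery(query_str="abc", sparse_query_embedding=([7, 9], [0.5, 0.25])),
    failing_sparse_query_fn,
)
```

Keeping the decision in one helper means the sync and async query paths can share it instead of each repeating the check.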

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

Yes
No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

Yes
No
Bumping the versions of
llama-index-core
llama-index-vector-stores-qdrant
llama-index-node-parser-docling
llama-index-readers-docling
remains in TODO for now (see below).

Type of Change

In principle, this is a change to the QdrantVectorStore, but as the additional fields in QueryBundle and VectorStoreQuery are required, a parallel version bump of llama-index-core (and a corresponding specification in the requirements) will be necessary. The types of changes made are:

Refactoring
Possibly performance enhancement

Backwards compatibility could be achieved easily by checking if the provided QueryBundle has the necessary field but I am unsure if this is the best approach.
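
One way such a check could be written is with `getattr` and a default, as in the sketch below. The two bundle classes are hypothetical stand-ins for a QueryBundle from an older and a newer llama-index-core release; the field name `sparse_embedding` follows the description above.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

# Assumption: a sparse embedding is represented as (indices, values).
SparseEmbedding = Tuple[List[int], List[float]]


@dataclass
class OldStyleBundle:
    """Hypothetical bundle from an older core release: no sparse field."""

    query_str: str


@dataclass
class NewStyleBundle:
    """Hypothetical bundle that carries the new optional field."""

    query_str: str
    sparse_embedding: Optional[SparseEmbedding] = None


def read_sparse_embedding(bundle: Any) -> Optional[SparseEmbedding]:
    """Return the sparse embedding if the bundle has one, else None."""
    # getattr with a default tolerates bundles that lack the attribute.
    return getattr(bundle, "sparse_embedding", None)


from_old = read_sparse_embedding(OldStyleBundle("q"))
from_new = read_sparse_embedding(NewStyleBundle("q", ([1, 4], [0.5, 0.5])))
```

The trade-off is that a missing field silently falls back to recomputation, which hides a version mismatch rather than reporting it.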

How Has This Been Tested?

I added a test that directly ensures that the getter function _get_sparse_embedding_query reuses the provided sparse embedding.
Furthermore, I implemented unit tests for all 6 variants of {query, aquery} x {dense, sparse, hybrid} by setting the sparse and dense embeddings in a QdrantVectorStore with enable_hybrid=True and running queries with identically set (sparse_)embeddings.
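
The shape of that test matrix can be sketched without Qdrant. The `ToyStore` below is a hypothetical stand-in, not the QdrantVectorStore: its sync and async entry points delegate to one shared helper, and the sketch only enumerates the six combinations and checks that both entry points agree.

```python
import asyncio
from itertools import product
from typing import Dict, List, Tuple

METHODS = ("query", "aquery")
MODES = ("dense", "sparse", "hybrid")


class ToyStore:
    """Hypothetical store whose sync and async paths share one helper."""

    def _vectors_used(self, mode: str) -> List[str]:
        # Which kinds of embeddings a given mode would search with.
        used = {
            "dense": ["dense"],
            "sparse": ["sparse"],
            "hybrid": ["dense", "sparse"],
        }
        return used[mode]

    def query(self, mode: str) -> List[str]:
        return self._vectors_used(mode)

    async def aquery(self, mode: str) -> List[str]:
        return self._vectors_used(mode)


def run_case(store: ToyStore, method: str, mode: str) -> List[str]:
    """Run one combination of entry point and query mode."""
    if method == "aquery":
        return asyncio.run(store.aquery(mode))
    return store.query(mode)


# Enumerate all {query, aquery} x {dense, sparse, hybrid} combinations.
cases: List[Tuple[str, str]] = list(product(METHODS, MODES))
results: Dict[Tuple[str, str], List[str]] = {
    case: run_case(ToyStore(), *case) for case in cases
}
```

In a real test suite the same matrix would typically be expressed as parametrized test cases rather than a dictionary.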

Added new unit/integration tests
Added new notebook (that tests end-to-end)
I stared at the code and made sure it makes sense
Used my private test-case that I unfortunately cannot share

Suggested Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added Google Colab support for the newly added notebooks.
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I ran make format; make lint to appease the lint gods

enrico-stauss · 2024-08-29T18:07:46Z

I rebased onto main and noticed that a previous commit (78f9a11) introduced somewhat breaking changes to the qdrant vector_store, which I resolved in 7a7f684.

Why I consider the changes breaking:
Users that specified a fastembed_sparse_model would automatically run the vector_store in hybrid mode, as the enable_hybrid variable would be overwritten before being passed to the super init call. If existing code specified fastembed_sparse_model but not `enable_hybrid=True`, it will stop working. This is easy to resolve on the user side, but nonetheless...

Is there anything I can do to help integrate this PR within a foreseeable timeframe? Or are there specific reasons that keep you from doing so? Please let me know if that is the case. I'm happy to continue working on it if there is something missing!

enrico-stauss · 2024-09-02T06:28:05Z

Hi @logan-markewich, I think you reviewed my last PR on llama-index, could you maybe take a look at this and let me know what you think? I believe that, aside from the enhancement of code quality, the possibility to avoid duplicate computation really adds some value! If you have other concerns, please do share them!

Thanks and kind regards,
Enrico

enrico-stauss · 2024-09-19T12:32:16Z

I'm not sure why the tests here fail after rebasing. It seems that some dependencies are not available but the failing tests are actually not relevant to the changes in this PR.

enrico-stauss · 2024-10-18T15:47:51Z

Hi @logan-markewich, this PR keeps growing :D

I finally found some time to write unit tests. In the process I also had to fix unit tests for the docling reader and node_parser which failed due to the newly introduced sparse_embedding field in the BaseNode.

If you, for any reason, don't plan to merge this PR then please let me know, so that I can stop rebasing it over and over again :)

dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Aug 19, 2024

enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch 2 times, most recently from f205b0f to 5f87d89 Compare August 19, 2024 21:42

enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch from 5f87d89 to 3e83297 Compare August 29, 2024 17:53

enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch 2 times, most recently from 150e219 to 6355a32 Compare September 5, 2024 08:00

enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch from 6355a32 to d038e05 Compare September 19, 2024 09:25

enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch 4 times, most recently from 59dbadb to 895139f Compare October 1, 2024 07:47

enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch 3 times, most recently from 23c6f87 to 008732b Compare October 14, 2024 08:52

enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch from 008732b to b0173c9 Compare October 18, 2024 13:30

dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Oct 18, 2024

enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch from cd2bdb2 to f687386 Compare October 28, 2024 10:09

enrico-stauss changed the title ~~Enhance (Qdrant) hybrid search support~~ Enhance (Qdrant) hybrid/sparse search support Oct 28, 2024

enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch 2 times, most recently from 3a7eaca to 45943f2 Compare October 30, 2024 09:31

Enrico Stauss added 5 commits November 13, 2024 14:09

Move error checking logic to start of method

cc8fbf8

Replace construction of query_filter with more concise statement

007f043

Prepare an additional field sparse_query_embedding in the `VectorSt…

f7c8472

…oreQuery`

Use sparse_query_embedding if provided and refactor slightly

87e281c

Remove superfluous condition (if self.enable_hybrid, self._sparse_que…

ef5128d

…ry_fn will be set)

Enrico Stauss added 19 commits November 13, 2024 14:09

Refactor query and aquery

f06c672

fix: Resolve backwards-compatibility break introduced by 78f9a11

afcddf4

Evaluate sparse_vector_name ONCE in init

10a5921

Reduce code duplication by extracting shared logic from query and aqu…

0f23822

…ery into helper

Add sparse_embedding field to QueryBundle to enable passing spars…

5df2372

…e embeddings directly to the retriever

fix: Resolve incomplete duplication of a QueryBundle

75b54c1

fix: Unpack BatchSparseEncoding as computed by the _sparse_query_fn i…

b7cc4ac

…n the helper

fix: Avoid passing around self._sparse_vector_name when invoking cl…

7f262f0

…ass methods

fix: Ensure py38 backwards compatibility by using typing for typehints

ada18cd

refactor: Remove superfluous check

c822cc6

The `_sparse_doc_fn` and `_sparse_query_fn` are set in `__init__` IF `enable_hybrid==True`.

refactor: Simplify chained condition

24e7e46

refactor: Remove dead code and unnecessary condition

81ad488

Given that the `_sparse_doc_fn` does not return empty lists, we can safely remove the check. It might in fact be safer to let it crash here if the output of the `_sparse_doc_fn` is of an unexpected format.

feat: Add sparse_embedding field to the BaseNode

45f9239

feat: Add method to compute OR EXTRACT sparse embeddings for/from nodes

4d5b295

refactor: Rename method

ab70558

fix: Resolve bug due to an empty list of strings to be embedded

f8f6666

test: Add unit tests for dense/sparse/hybrid retrieval for {query, aq…

ec50cbe

…uery} # Conflicts: # llama-index-integrations/vector_stores/llama-index-vector-stores-qdrant/tests/test_vector_stores_qdrant.py

test: Fix docling reader/node_parser unit tests

f192d21

test: Add a unit test for the _get_sparse_embeddings_nodes getter

2162769

enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch from 45943f2 to 2162769 Compare November 13, 2024 13:58

Merge branch 'main' into enhance-qdrant-hybrid-search-support

b423701
