Enhance (Qdrant) hybrid/sparse search support #15483

Open
wants to merge 25 commits into
base: main

enrico-stauss
Contributor

@enrico-stauss enrico-stauss commented Aug 19, 2024

Description

This PR gives more explicit control over sparse_embeddings to the developer of RAG applications. The motivation is provided in this issue and the PR directly adapts the QdrantVectorStore. Other VectorStores that support hybrid retrieval will need to be adapted.

This PR aims to improve the usability of the Qdrant hybrid search functionality by adding sparse embedding fields to both the VectorStoreQuery and QueryBundle and using the provided sparse embedding rather than recomputing it.

Additionally, the sparse_embedding field is added to the BaseNode class and, if set, is used by the QdrantVectorStore.add method instead of being recomputed.

Along with the described changes, a larger refactoring reduces code duplication for the QdrantVectorStore.{query, aquery} and enhances readability. The add method got a minor rework, too.

Use Case

When working with (a subclass of) the QueryFusionRetriever and multiple retrievers (that use the same embedding models), the query embedding is calculated more than once per query string. Caching won't help, as the embeddings are generally calculated asynchronously. To avoid the resource overhead of duplicate computation, passing precomputed embeddings was already possible for dense embeddings.
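To illustrate why a cache alone does not prevent the duplicate work, here is a minimal sketch. `CachingEncoder`, `retrieve` and both `fan_out_*` functions are hypothetical stand-ins, not llama-index classes, and the cache is assumed to be a naive one that stores a result only after it has been computed.

```python
import asyncio
from typing import Dict, List, Optional


class CachingEncoder:
    """Hypothetical async encoder with a naive result cache."""

    def __init__(self) -> None:
        self.calls = 0
        self._cache: Dict[str, List[float]] = {}

    async def aencode(self, text: str) -> List[float]:
        if text in self._cache:
            return self._cache[text]
        self.calls += 1
        # Stand-in for model inference: yields control to other tasks.
        await asyncio.sleep(0)
        result = [float(len(text))]
        self._cache[text] = result
        return result


async def retrieve(
    encoder: CachingEncoder, text: str, embedding: Optional[List[float]] = None
) -> List[float]:
    """Toy retriever: embeds the query itself unless an embedding is passed in."""
    if embedding is None:
        embedding = await encoder.aencode(text)
    return embedding


async def fan_out_without_precompute(encoder: CachingEncoder, text: str) -> None:
    # Both retrievers start before either has filled the cache.
    await asyncio.gather(retrieve(encoder, text), retrieve(encoder, text))


async def fan_out_with_precompute(encoder: CachingEncoder, text: str) -> None:
    # Embed once, then hand the result to every retriever.
    embedding = await encoder.aencode(text)
    await asyncio.gather(
        retrieve(encoder, text, embedding), retrieve(encoder, text, embedding)
    )
```

In the first fan-out both coroutines miss the cache, so the encoder runs twice; in the second it runs once. A cache that stores pending futures could also avoid this, but passing the embedding explicitly keeps the control with the caller.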

Implementation

The commits within this PR should be self-explanatory and well-structured. The main changes are:

  • Introducing a helper function that either uses the provided sparse embedding or calculates the sparse embedding as before
  • Using self._{a}client.search_batch in the default case as well, so that the requests to be sent can be built step by step, which reduces the number of cases
  • Removing the async evaluation of await self.asparse_vector_name() by precomputing it in __init__
  • Passing down hybrid_top_k from the VectorIndexRetriever kwargs to the QueryBundle (as done with _sparse_top_k)
  • Reducing line count by roughly 160 LOC through removal of duplication
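The first bullet can be sketched roughly as follows. This is not the code from the PR: `Query` is a hypothetical stand-in for VectorStoreQuery, the function name is simplified, and the `(indices, values)` layout and the batch-style encoder signature are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# One sparse vector as (indices, values); this layout is an assumption.
SparseEmbedding = Tuple[List[int], List[float]]
# Assumed encoder signature: a batch of texts in, batched indices and values out.
SparseEncoder = Callable[[List[str]], Tuple[List[List[int]], List[List[float]]]]


@dataclass
class Query:
    """Hypothetical stand-in for VectorStoreQuery with the new field."""

    query_str: Optional[str] = None
    sparse_embedding: Optional[SparseEmbedding] = None


def get_sparse_embedding_query(
    query: Query, sparse_query_fn: SparseEncoder
) -> SparseEmbedding:
    """Reuse the provided sparse embedding, otherwise compute it as before."""
    if query.sparse_embedding is not None:
        return query.sparse_embedding
    if query.query_str is None:
        raise ValueError("Either sparse_embedding or query_str must be set.")
    indices, values = sparse_query_fn([query.query_str])
    return indices[0], values[0]
```

Keeping the decision in one helper means the sync and async query paths can share it, which is in line with the stated goal of reducing duplication.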

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No
    Bumping the versions of
  • llama-index-core
  • llama-index-vector-stores-qdrant
  • llama-index-node-parser-docling
  • llama-index-readers-docling
    remains in TODO for now (see below).

Type of Change

In principle, this is a change to the QdrantVectorStore, but as the additional fields in QueryBundle and VectorStoreQuery are required, a parallel version bump of llama-index-core (and specification in the requirements) will be necessary. The types of change made are:

  • Refactoring
  • Possibly performance enhancement

Backwards compatibility could be achieved easily by checking whether the provided QueryBundle has the necessary field, but I am unsure if this is the best approach.
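A minimal sketch of such a check, assuming the field is simply absent on bundles created by an older core version. `OldQueryBundle`, `NewQueryBundle` and `provided_sparse_embedding` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

SparseEmbedding = Tuple[List[int], List[float]]


@dataclass
class OldQueryBundle:
    """Hypothetical bundle from a core version without the new field."""

    query_str: str


@dataclass
class NewQueryBundle:
    """Hypothetical bundle that carries the optional sparse embedding."""

    query_str: str
    sparse_embedding: Optional[SparseEmbedding] = None


def provided_sparse_embedding(bundle: Any) -> Optional[SparseEmbedding]:
    """Return the sparse embedding if the bundle has one, else None."""
    # getattr with a default covers both a missing field and an unset one.
    return getattr(bundle, "sparse_embedding", None)
```

With this check, a `None` result tells the vector store to fall back to computing the sparse embedding itself, regardless of which core version produced the bundle.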

How Has This Been Tested?

I added a test that directly ensures that the getter function _get_sparse_embedding_query reuses the provided sparse embedding.
I furthermore implemented unit tests for all 6 variants of {query, aquery} x {dense, sparse, hybrid} by setting the sparse and dense embeddings in a QdrantVectorStore with enable_hybrid=True and running queries with identically set (sparse_)embeddings.

  • Added new unit/integration tests
  • Added new notebook (that tests end-to-end)
  • I stared at the code and made sure it makes sense
  • Used my private test-case that I unfortunately cannot share

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Aug 19, 2024
@enrico-stauss enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch 2 times, most recently from f205b0f to 5f87d89 Compare August 19, 2024 21:42
@enrico-stauss enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch from 5f87d89 to 3e83297 Compare August 29, 2024 17:53
@enrico-stauss
Contributor Author

I rebased onto main and noticed that a previous commit (78f9a11) introduced somewhat breaking changes with the qdrant vector_store which I resolved in 7a7f684.

Why I consider the changes breaking:
Users who specified a fastembed_sparse_model would automatically run the vector_store in hybrid mode, as the enable_hybrid variable would be overwritten before being passed to the super init call. If existing code specified fastembed_sparse_model but not `enable_hybrid=True`, it will stop working. Easy to resolve on the user side, but nonetheless...
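The difference between the two behaviours, and the user-side fix, can be sketched as below. Both functions are hypothetical reconstructions of the rules described in this comment, not code from either commit, and the model name is a placeholder.

```python
from typing import Optional


def hybrid_if_sparse_model(
    enable_hybrid: bool, fastembed_sparse_model: Optional[str]
) -> bool:
    """Implicit rule: naming a sparse model switches hybrid mode on."""
    return enable_hybrid or fastembed_sparse_model is not None


def hybrid_only_if_requested(
    enable_hybrid: bool, fastembed_sparse_model: Optional[str]
) -> bool:
    """Explicit rule: hybrid mode follows the enable_hybrid flag alone."""
    return enable_hybrid


# Placeholder model name, used only for illustration.
MODEL = "some-sparse-model"

# Code that sets only the model gets a different mode under each rule ...
implicit_mode = hybrid_if_sparse_model(False, MODEL)
explicit_mode = hybrid_only_if_requested(False, MODEL)

# ... while code that also passes enable_hybrid=True behaves the same under both.
fixed_implicit = hybrid_if_sparse_model(True, MODEL)
fixed_explicit = hybrid_only_if_requested(True, MODEL)
```

Passing `enable_hybrid=True` alongside the sparse model is therefore the safe choice for calling code, whichever rule the installed version applies.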

Is there anything I can do to help integrate this PR within a foreseeable timespan? Or are there specific reasons that keep you from doing so? Please let me know if that is the case. I'm happy to continue working on it if there is something missing!

@enrico-stauss
Contributor Author

Hi @logan-markewich, I think you reviewed my last PR on llama-index, could you maybe take a look at this and let me know what you think? I believe that, aside from the enhancement of code quality, the possibility to avoid duplicate computation really adds some value! If you have other concerns, please do share them!

Thanks and kind regards,
Enrico

@enrico-stauss enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch 2 times, most recently from 150e219 to 6355a32 Compare September 5, 2024 08:00
@enrico-stauss enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch from 6355a32 to d038e05 Compare September 19, 2024 09:25
@enrico-stauss
Contributor Author

I'm not sure why the tests here fail after rebasing. It seems that some dependencies are not available but the failing tests are actually not relevant to the changes in this PR.

@enrico-stauss enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch 4 times, most recently from 59dbadb to 895139f Compare October 1, 2024 07:47
@enrico-stauss enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch 3 times, most recently from 23c6f87 to 008732b Compare October 14, 2024 08:52
@enrico-stauss enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch from 008732b to b0173c9 Compare October 18, 2024 13:30
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Oct 18, 2024
@enrico-stauss
Contributor Author

Hi @logan-markewich, this PR keeps growing :D

I finally found some time to write unit tests. In the process I also had to fix unit tests for the docling reader and node_parser which failed due to the newly introduced sparse_embedding field in the BaseNode.

If you, for any reason, don't plan to merge this PR then please let me know, so that I can stop rebasing it over and over again :)

@enrico-stauss enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch from cd2bdb2 to f687386 Compare October 28, 2024 10:09
@enrico-stauss enrico-stauss changed the title Enhance (Qdrant) hybrid search support Enhance (Qdrant) hybrid/sparse search support Oct 28, 2024
@enrico-stauss enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch 2 times, most recently from 3a7eaca to 45943f2 Compare October 30, 2024 09:31
Enrico Stauss added 19 commits November 13, 2024 14:09
The `_sparse_doc_fn` and `_sparse_query_fn` are set in `__init__` IF `enable_hybrid==True`.
Given that the `_sparse_doc_fn` does not return empty lists, we can safely remove the check. It might in fact be safer to let it crash here if the output of the `_sparse_doc_fn` is of an unexpected format.
…uery}

# Conflicts:
#	llama-index-integrations/vector_stores/llama-index-vector-stores-qdrant/tests/test_vector_stores_qdrant.py
@enrico-stauss enrico-stauss force-pushed the enhance-qdrant-hybrid-search-support branch from 45943f2 to 2162769 Compare November 13, 2024 13:58
Labels
size:XL This PR changes 500-999 lines, ignoring generated files.
1 participant