Feature Description
Milvus 2.5 introduced full text search, which allows built-in analyzers to be used for sparse-embedding retrieval with the BM25 algorithm. From the Milvus documentation:
Full text search simplifies the process of text-based searching by eliminating the need for manual embedding. This feature operates through the following workflow:
Text input: You insert raw text documents or provide query text without needing to manually embed them.
Text analysis: Milvus uses an analyzer to tokenize the input text into individual, searchable terms. For more information on analyzers, refer to Analyzer Overview.
Function processing: The built-in function receives tokenized terms and converts them into sparse vector representations.
Collection store: Milvus stores these sparse embeddings in a collection for efficient retrieval.
BM25 scoring: During a search, Milvus applies the BM25 algorithm to calculate scores for the stored documents and ranks matched results based on their relevance to the query text.
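For illustration, here is a minimal query-side sketch of this workflow with pymilvus (assuming a collection already set up with the BM25 function described below; the collection name, field names, and query string are placeholders, not part of the integration):

from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Query with raw text: Milvus tokenizes the string with the collection's
# analyzer and ranks stored documents against the "sparse" field using BM25.
results = client.search(
    collection_name="llamaindex_docs",
    data=["what did the author do growing up?"],
    anns_field="sparse",
    limit=5,
    output_fields=["text"],
)

No embedding model is involved at query time; the raw string is analyzed and scored entirely by Milvus.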
This means that sparse embedding creation and retrieval can now be handled internally by Milvus, so there is no need to manually create the embeddings in LlamaIndex code (we could still support it as an option). The current manual sparse-embedding helpers live in llama_index/llama-index-integrations/vector_stores/llama-index-vector-stores-milvus/llama_index/vector_stores/milvus/utils.py (lines 176 and 210 at commit 2668cb7).

This change would require some changes when creating the collection's schema to include the raw text field (a sketch of the full schema is shown after the snippet below) and the text -> sparse embedding function:
bm25_function = Function(
    name="text_bm25_emb",  # Function name
    input_field_names=["text"],  # Name of the VARCHAR field containing raw text data
    output_field_names=["sparse"],  # Name of the SPARSE_FLOAT_VECTOR field reserved to store generated embeddings
    function_type=FunctionType.BM25,
)
schema.add_function(bm25_function)
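For context, a sketch of what the full collection setup could look like once the raw-text field is added (based on the Milvus 2.5 full-text-search examples; the field names, max_length, dense dimension, and index parameters are assumptions, not the final design):

from pymilvus import MilvusClient, DataType, Function, FunctionType

client = MilvusClient(uri="http://localhost:19530")
schema = client.create_schema()

# Primary key and the existing dense vector field stay as they are today.
schema.add_field(field_name="id", datatype=DataType.VARCHAR, max_length=65535, is_primary=True)
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=1536)

# New static field holding the raw node text; the analyzer tokenizes it for BM25.
schema.add_field(field_name="text", datatype=DataType.VARCHAR, max_length=65535, enable_analyzer=True)

# Output field that Milvus fills with the generated sparse embeddings.
schema.add_field(field_name="sparse", datatype=DataType.SPARSE_FLOAT_VECTOR)

# Wire text -> sparse with the BM25 function (same as the snippet above).
schema.add_function(Function(
    name="text_bm25_emb",
    input_field_names=["text"],
    output_field_names=["sparse"],
    function_type=FunctionType.BM25,
))

# The sparse field needs a BM25-metric index for full text search.
index_params = client.prepare_index_params()
index_params.add_index(field_name="sparse", index_type="SPARSE_INVERTED_INDEX", metric_type="BM25")

client.create_collection(collection_name="llamaindex_docs", schema=schema, index_params=index_params)

Relative to the current schema, the additions are the analyzer-enabled VARCHAR field, the SPARSE_FLOAT_VECTOR output field, and the BM25-metric index on it.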
Also, since the raw text would be defined in a new static field, some changes would be required to the ingestion pipeline and query engine (TextNode) so that the raw data is not duplicated. Right now it is stored as a JSON (dynamic) field in node_content (illustrated below).
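To make the duplication concern concrete, here is a purely illustrative comparison of the stored row today versus with a dedicated static text field (the field names and values are assumptions, not the actual wire format):

import json

# Hypothetical node payload (names/values purely illustrative).
node_json = {"id_": "node-1", "text": "raw document text", "metadata": {"source": "doc.pdf"}}

# Today: the raw text only lives inside the serialized node JSON stored in the
# dynamic node_content field, alongside the dense embedding.
current_row = {
    "id": "node-1",
    "embedding": [0.1, 0.2, 0.3],
    "_node_content": json.dumps(node_json),
}

# Proposed: the raw text moves to the static VARCHAR "text" field that feeds the
# BM25 function, and is stripped from the serialized JSON so it is not stored twice.
proposed_row = {
    "id": "node-1",
    "embedding": [0.1, 0.2, 0.3],
    "text": node_json["text"],
    "_node_content": json.dumps(dict(node_json, text="")),
}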
Reason
No response
Value of Feature
It moves the responsibility of creating sparse embeddings to Milvus, reducing code and logic on LlamaIndex's side. Furthermore, Milvus may be better optimized for this operation.