
[FEATURE] add support for more than one kNN query on nested vectors with multiple inner hits and filter #1768

Closed
konstadin opened this issue Jun 21, 2024 · 8 comments


@konstadin

konstadin commented Jun 21, 2024

Is your feature request related to a problem?
Yes. I want to store more than one nested vector in a single document (a nested field containing nested vectors), query the document with multiple k-NN queries, and get more than one inner_hit for each query when searching nested k-NN fields. This feature is available in Elasticsearch, and OpenSearch needs parity.

What solution would you like?

Expand support for kNN search with nested fields to allow for multiple knn queries.

This solution builds on the "Enhanced multi-vector support for OpenSearch k-NN search with nested fields" work.

Instead of a single k-NN search on a document's nested fields, the solution supports:

  • multiple k-NN searches
  • more than one inner_hit for each nested k-NN vector search (ordered by descending _score)
  • filtering on the top-level document

The response returns:

  • top documents ordered by descending _score, where a document's _score is calculated by aggregating the max_score of each k-NN search
  • multiple inner_hits, each with its _score, for each k-NN search
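The ranking described above can be pinned down with a brute-force reference in Python. This is a minimal sketch, not OpenSearch behaviour: the in-memory document layout, the hypothetical `rank` and `cosine` helpers, using raw cosine similarity as the per-line score, and computing up to k inner hits per document are all assumptions made for illustration.

```python
import math
from typing import Any, Dict, List, Tuple

# Hypothetical in-memory document: top-level fields plus a flat list of
# {"line_id", "vector"} entries standing in for paragraph.embeddings.
Doc = Dict[str, Any]
Hit = Tuple[float, str]  # (score, line_id)


def cosine(a: List[float], b: List[float]) -> float:
    """Raw cosine similarity, assumed here to be the per-line score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(docs: List[Doc], queries: Dict[str, List[float]], k: int,
         book_id: str, size: int) -> List[Tuple[float, str, Dict[str, List[Hit]]]]:
    results = []
    for doc in docs:
        # Pre-filter: documents from other books are never scored.
        if doc["book_id"] != book_id:
            continue
        inner_hits: Dict[str, List[Hit]] = {}
        for name, qvec in queries.items():
            scored = sorted(
                ((cosine(qvec, line["vector"]), line["line_id"]) for line in doc["lines"]),
                reverse=True,
            )
            inner_hits[name] = scored[:k]  # up to k inner hits, descending score
        # Document score: sum of each query's best inner-hit score (0 if no hits).
        score = sum(hits[0][0] if hits else 0.0 for hits in inner_hits.values())
        results.append((score, doc["chapter_id"], inner_hits))
    # Top documents by descending document score.
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:size]


if __name__ == "__main__":
    docs = [
        {"book_id": "1234", "chapter_id": "chapter10",
         "lines": [{"line_id": "line_3", "vector": [1.0, 0.0]},
                   {"line_id": "line_5", "vector": [0.0, 1.0]}]},
        {"book_id": "1234", "chapter_id": "chapter3",
         "lines": [{"line_id": "line_56", "vector": [1.0, 1.0]}]},
        {"book_id": "9999", "chapter_id": "chapter1",
         "lines": [{"line_id": "line_1", "vector": [1.0, 0.0]}]},
    ]
    queries = {"query_1": [1.0, 0.0], "query_2": [0.0, 1.0]}
    for score, chapter, hits in rank(docs, queries, k=2, book_id="1234", size=2):
        print(f"{chapter}: {score:.4f} {hits}")
```

Summing the per-query maxima means a chapter that matches every query well outranks one that matches a single query very well, which is the behaviour the request describes.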

What alternatives have you considered?
Storing the documents as nested vectors (instead of nested / nested vectors) and using a boolean query with multiple k-NN queries and an aggregation. However, the aggregation loses the mapping of which field matched which k-NN query, and it loses the inner hits. It is also unclear whether the _score ranking would be calculated the same way.

Do you have any additional context?

Consider the example of storing the lines of each paragraph of each chapter in a book. In the example mapping below, lines are stored as nested embeddings (each with a vector), and embeddings are nested in paragraph. Each document stores the paragraphs of one chapter of one book; that is, a document is the collection of paragraphs for a chapter.

  "mappings": {
    "properties": {
      "book_id": { "type": "keyword" },
      "chapter_id": { "type": "keyword" },
      "paragraph": {
        "type": "nested",
        "properties": {
          "paragraph_id": { "type": "keyword" },
          "embeddings": {
            "type": "nested",
            "properties": {
              "line_id": { "type": "keyword" },
              "vector": {
                "type": "dense_vector",
                "index": true,
                "dims": 384,
                "similarity": "cosine"
              }
            }
          }
        }
      }
    }
  }
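For reference, a possible OpenSearch counterpart of this Elasticsearch mapping, written as a Python dict so it can be sent with any client. It is a sketch under assumptions: `knn_vector` with `dimension` replaces `dense_vector` with `dims`, cosine similarity is expressed as the `cosinesimil` space type on the `lucene` engine with HNSW, and whether the k-NN plugin supports a vector under two levels of nesting is not confirmed here.

```python
import json

# Hypothetical OpenSearch equivalent of the Elasticsearch mapping above.
mapping = {
    # k-NN indexes must have the index.knn setting enabled.
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "book_id": {"type": "keyword"},
            "chapter_id": {"type": "keyword"},
            "paragraph": {
                "type": "nested",
                "properties": {
                    "paragraph_id": {"type": "keyword"},
                    "embeddings": {
                        "type": "nested",
                        "properties": {
                            "line_id": {"type": "keyword"},
                            "vector": {
                                # Assumed translation of dense_vector/dims/cosine.
                                "type": "knn_vector",
                                "dimension": 384,
                                "method": {
                                    "name": "hnsw",
                                    "space_type": "cosinesimil",
                                    "engine": "lucene",
                                },
                            },
                        },
                    },
                },
            },
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
```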

We want to find the chapters that have the closest matches to n lines of text, where each line of text is a separate k-NN search (query_1, query_2) targeting the nested paragraph.embeddings.vector field.

We should be able to filter on a specific book by book_id (1234 in this example). This filters out unrelated books and must be applied as a pre-filter in the k-NN search, not as a post-filter.
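A request body for this search might look like the following, built in Python. It is a hypothetical sketch of the desired request, not a confirmed working query: the `nested_knn` and `build_query` helpers are illustrative, placing the `book_id` term filter inside the knn clause assumes the filter can reference a top-level field from within the nested query, and returning more than one inner hit per query (`inner_hits.size`) is precisely the capability this issue requests.

```python
import json
from typing import Any, Dict, List


def nested_knn(name: str, vector: List[float], k: int, book_id: str) -> Dict[str, Any]:
    """One k-NN clause on paragraph.embeddings.vector, with inner hits labelled `name`."""
    knn_clause = {
        "knn": {
            "paragraph.embeddings.vector": {
                "vector": vector,
                "k": k,
                # Filter inside the knn clause so it restricts the search itself
                # (pre-filter) rather than trimming results afterwards.
                "filter": {"term": {"book_id": book_id}},
            }
        }
    }
    # Two nested levels to match the mapping: paragraph -> paragraph.embeddings.
    return {
        "nested": {
            "path": "paragraph",
            "query": {
                "nested": {
                    "path": "paragraph.embeddings",
                    "query": knn_clause,
                    # Up to k inner hits per query is the requested behaviour,
                    # not something assumed to work today.
                    "inner_hits": {"name": name, "size": k},
                }
            },
        }
    }


def build_query(queries: Dict[str, List[float]], k: int, book_id: str, size: int) -> Dict[str, Any]:
    """bool/should over one nested k-NN clause per query; matching clause scores add up."""
    return {
        "size": size,
        "_source": ["book_id", "chapter_id"],
        "query": {
            "bool": {
                "should": [nested_knn(n, v, k, book_id) for n, v in queries.items()]
            }
        },
    }


if __name__ == "__main__":
    # Three-element vectors for readability; the mapping expects 384 dimensions.
    body = build_query({"query_1": [0.1, 0.2, 0.3], "query_2": [0.3, 0.2, 0.1]},
                       k=2, book_id="1234", size=2)
    print(json.dumps(body, indent=2))
```

Each named inner_hits block would correspond to one of the query_1 / query_2 sections in the sample response.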

The sample response below returns the top 2 documents, with k=2 for each k-NN search.

"hits": {
  "max_score": 1.7332492,
  "hits": [
    {
      "_score": 1.7332492,
      "fields": {
        "book_id": [ "1234" ],
        "chapter_id": [ "chapter10" ]
      },
      "inner_hits": {
        "query_1": {
          "hits": {
            "max_score": 0.83575505,
            "hits": [
              {
                "_score": 0.83575505,
                "fields": {
                  "paragraph.embeddings": [{"paragraph_id": [ "p_1" ], "line_id": [ "line_3" ]}]
                }
              },
              {
                "_score": 0.0333445,
                "fields": {
                  "paragraph.embeddings": [{"paragraph_id": [ "p_1" ], "line_id": [ "line_6" ]}]
                }
              }
            ]
          }
        },
        "query_2": {
          "hits": {
            "max_score": 0.8974941,
            "hits": [
              {
                "_score": 0.8974941,
                "fields": {
                  "paragraph.embeddings": [{"paragraph_id": [ "p_3" ], "line_id": [ "line_5" ]}]
                }
              },
              {
                "_score": 0.55534545,
                "fields": {
                  "paragraph.embeddings": [{"paragraph_id": [ "p_3" ], "line_id": [ "line_8" ]}]
                }
              }
            ]
          }
        }
      }
    },
    {
      "_score": 0.8735112,
      "fields": {
        "book_id": [ "1234" ],
        "chapter_id": [ "chapter3" ]
      },
      "inner_hits": {
        "query_1": {
          "hits": {
            "max_score": null,
            "hits": []
          }
        },
        "query_2": {
          "hits": {
            "max_score": 0.8735112,
            "hits": [
              {
                "_score": 0.8735112,
                "fields": {
                  "paragraph.embeddings": [{"paragraph_id": [ "p_7" ], "line_id": [ "line_56" ]}]
                }
              },
              {
                "_score": 0.03553,
                "fields": {
                  "paragraph.embeddings": [{"paragraph_id": [ "p_7" ], "line_id": [ "line_88" ]}]
                }
              }
            ]
          }
        }
      }
    }
  ]
}
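The numbers in the sample response are consistent with summing: for each chapter, the per-query max_score values add up to the document _score, with a query whose inner hits are empty contributing nothing. That is also how a bool query combines the scores of its matching should clauses. A quick check in Python, with the values copied from the response (the `summed_score` helper is illustrative):

```python
from typing import List, Optional

# max_score per k-NN query and the document _score, copied from the sample response.
SAMPLE = {
    "chapter10": {"query_1": 0.83575505, "query_2": 0.8974941, "_score": 1.7332492},
    "chapter3": {"query_1": None, "query_2": 0.8735112, "_score": 0.8735112},
}


def summed_score(max_scores: List[Optional[float]]) -> float:
    """Sum of per-query max scores; a query with no inner hits (null) adds nothing."""
    return sum(s for s in max_scores if s is not None)


if __name__ == "__main__":
    for chapter, row in SAMPLE.items():
        total = summed_score([row["query_1"], row["query_2"]])
        # The reported _score is rounded, so compare with a small tolerance.
        print(chapter, total, row["_score"], abs(total - row["_score"]) < 1e-6)
```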
@konstadin konstadin changed the title [FEATURE] add support for more than one kNN nested vector knn query with inner hit on nested vector with filter [FEATURE] add support for more than one kNN query with inner hit and filter on nested vectors Jun 21, 2024
@konstadin konstadin changed the title [FEATURE] add support for more than one kNN query with inner hit and filter on nested vectors [FEATURE] add support for more than one kNN query on nested vectors with multiple inner hits and filter Jun 21, 2024
@konstadin
Author

@heemin32, would it be possible to triage this enhancement and identify a timeline for delivery? We are currently blocked without this work and need to understand if and when this feature will be available.

@heemin32
Collaborator

Hi @konstadin. Is this functionality provided for text fields? I don't think there is any way to run two queries and get inner hit results for each query, even for text fields. Could you also check whether hybrid search could be used for your use case? https://opensearch.org/docs/latest/search-plugins/hybrid-search/

@konstadin
Author

konstadin commented Jun 21, 2024

Hi @heemin32. I'm not aware of this functionality for text fields; however, it is available for multiple k-NN searches.

Search multiple knn fields:
Available in ES 8.12 -> https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html#_search_multiple_knn_fields
What are the plans for parity in OS v2.x?

Nested kNN Search with 1 Inner hits:
Available in ES 8.12 -> https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html#nested-knn-search-inner-hits
- This also works when there are multiple k-NN queries.
Will there be parity in OS 2.15? -> #1447

Nested kNN Search with multiple Inner hits:
Available in ES 8.13 -> elastic/elasticsearch#104006
What are the plans for parity in OS v2.x?

Filtered kNN search, applied as a pre-filter:
Available in ES 8.12 -> https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html#knn-search-filter-example
Will there be parity in OS 2.15? -> opensearch-project/OpenSearch#13903

The request is to provide parity for the above: search multiple k-NN fields on nested embeddings and return more than one inner hit, with a filter.

@navneet1v
Collaborator

@konstadin you can use a bool query with a should/must clause to search on multiple k-NN fields (nested or non-nested doesn't matter). A k-NN query clause in OpenSearch is just like any other OpenSearch query clause; it doesn't require any special treatment, unlike the way Elasticsearch handles k-NN. So you can search multiple k-NN fields the same way you would search multiple text fields:

POST <index-name>/_search
{
  "size": 10,
  "query": {
    "bool": {
      "should": [
        {
          "knn": {
            "my_vector2": {
              "vector": [
                2,
                3,
                5,
                6
              ],
              "k": 10
            }
          }
        },
        {
          "knn": {
            "my_vector1": {
              "vector": [
                2,
                3,
                5,
                6
              ],
              "k": 6
            }
          }
        }
      ]
    }
  }
}

Same goes for a nested field.

@konstadin
Author

Thanks @navneet1v @heemin32 will take a look.

What are the plans to provide feature to return multiple Inner hits?
Available in ES 8.13 -> elastic/elasticsearch#104006

@navneet1v
Collaborator

> Thanks @navneet1v @heemin32 will take a look.
>
> What are the plans to provide feature to return multiple Inner hits? Available in ES 8.13 -> elastic/elasticsearch#104006

@heemin32, was this feature added in the 2.15 release of OpenSearch?

@navneet1v navneet1v moved this from Backlog to Backlog (Hot) in Vector Search RoadMap Jun 27, 2024
@heemin32
Collaborator

It is not. This issue is somewhat related to #1743.

@heemin32
Collaborator

Closing in favor of #2113.

@github-project-automation github-project-automation bot moved this from Backlog (Hot) to ✅ Done in Vector Search RoadMap Oct 30, 2024
Projects
Status: Done
Development

No branches or pull requests

3 participants