[RFC] Introduce Template Query to OpenSearch #16823
Comments
From your example, it seems that `vector` is the embedding of the text.
Will this build on this feature: https://opensearch.org/docs/latest/api-reference/search-template/?
Right, this is a limitation of the knn query, which requires a list of vectors. For string input, we can use the neural query to pass a string as model input (even though we cannot parse different model output formats; that requires post-processing functions). Extending the use cases, for example, these strings can be sent as model inputs to generate vectors, but the results cannot be passed to the existing query builders.
Would you like to provide a few examples to showcase how to use your feature?
@mingshl Can you provide some examples?
@yuye-aws For example, a geo_shape query usually takes in a list of coordinates. When I don't know the GPS location, I can use an LLM to tell me the coordinates and then run the geo_shape query. I am using a Claude model in this case.
When I run this model in the Predict API, I get a list of coordinates; the response will be:
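For illustration, a Predict API response of this kind might look roughly like the sketch below; the exact shape depends on the model and connector configuration, and the coordinate values here are made up:

```json
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "coordinates": [[71.34, 41.12], [71.10, 40.97]]
          }
        }
      ]
    }
  ]
}
```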
Now I can index some documents.
Usually, when we use a geo_shape query, we need to remember the GPS location as an array:
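A conventional geo_shape query with hard-coded coordinates might look like the following; the field name `location` and the envelope coordinates are illustrative assumptions:

```json
{
  "query": {
    "geo_shape": {
      "location": {
        "shape": {
          "type": "envelope",
          "coordinates": [[71.34, 41.12], [71.10, 40.97]]
        },
        "relation": "intersects"
      }
    }
  }
}
```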
and it returns the document.
Then I create a search pipeline:
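A sketch of such a pipeline; the processor configuration below follows the general shape of the ml_inference search request processor, but the model id, mapping keys, and paths are all assumptions for illustration:

```json
PUT /_search/pipeline/geo_pipeline
{
  "request_processors": [
    {
      "ml_inference": {
        "model_id": "<claude-model-id>",
        "input_map": [
          { "prompt": "ext.ml_inference.question" }
        ],
        "output_map": [
          { "coordinates": "response.coordinates" }
        ]
      }
    }
  ]
}
```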
Now I can search with the search pipeline without needing to know the coordinates,
and then I will get the document matching the GPS coordinates.
Thanks for providing the use case! Can you please elaborate on the following questions:
In the prompt, the example gives instructions to the LLM to return coordinates; if I input a different location, I get different coordinates. The key point of this template query is to help rewrite queries using model output. Without the template query, there is no way to rewrite a query with different object types.
Why do we want to introduce a template query type, which essentially delays query validation? This seems more like a neural search use case. Or could we add a new field in the knn query, in addition to vector, to allow users to specify text to generate embeddings?
By the way, what is the impact on the OpenSearch client libraries if we introduce this template query?
Yes, this could be done by the neural query, or by adding a new field to the knn query, but the benefit of this design is that you can leverage the flexibility of search pipelines and processors to implement many use cases without introducing a new search clause like neural or modifying an existing query like knn.
@austintlee This is different from a search template. Search templates rely on Mustache, and that is their main disadvantage: many customers would not consider using Mustache as a scripting language, which might bring security concerns. The Template Query is designed as a query builder, similar to match and term queries. It receives a template of an inner query and allows search processors to enrich the query content. After processing, the inner query is executed during query rewrite. There is string substitution, but Mustache is not involved.

We are considering two options for implementing the Template Query, inspired by the search template functionality:

Option A: Template Query contains an object

In this approach, the Template Query contains a template object, and placeholders are wrapped as strings. Example knn query in a Template Query:
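A sketch of what an Option A request might look like. The `template` query name, the `text_embedding` field, and the `${text_embedding}` placeholder syntax are illustrative assumptions here, not a final API:

```json
{
  "query": {
    "template": {
      "knn": {
        "text_embedding": {
          "vector": "${text_embedding}",
          "k": 2
        }
      }
    }
  }
}
```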
After the ml_inference search request processor runs, the query will be processed as follows, and the request will be validated before query rewrite.
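Assuming the processor binds the placeholder to the model output, the rewritten request might look like this; the vector values are made up for illustration:

```json
{
  "query": {
    "knn": {
      "text_embedding": {
        "vector": [0.23, 0.67, 0.89, 0.12],
        "k": 2
      }
    }
  }
}
```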
Benefits of Option A:
Option B: Template Query contains an object or a string

Referring to the search template: when a query containing a placeholder does not compile (the "sad case" from the description, where the knn query requires an array in the vector field), the search template can accept a string instead. When the query expects an array, the search template accepts the whole template as a string; after the Mustache script performs the substitution, the query request is validated.
Similarly, in the template query we can implement a similar design: when the query with the placeholder compiles, we accept an object in the constructor; when it does not compile, we accept a single string containing the placeholder. For the same knn example, the template query will contain a string with the placeholder.
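Under Option B, the same knn example might carry the inner query as one escaped string, so the placeholder can sit where an array is expected without failing to parse. The field name `query_string_template` and the placeholder syntax are illustrative assumptions:

```json
{
  "query": {
    "template": {
      "query_string_template": "{\"knn\": {\"text_embedding\": {\"vector\": ${text_embedding}, \"k\": 2}}}"
    }
  }
}
```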
The flow is the same as in Option A: after the ml_inference search request processor runs, the query is validated before query rewrite. However, when the inner query does compile, for example a term query, the template query can also accept an object, for example:
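For an inner query that does compile, such as a term query whose placeholder is itself a valid string value, the template query could accept the object directly (an illustrative sketch; field and placeholder names are assumptions):

```json
{
  "query": {
    "template": {
      "term": {
        "message": "${llm_keyword}"
      }
    }
  }
}
```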
When the template query accepts an object, we can validate the query in the constructor, which brings earlier validation to the template query builders.

Benefits of Option B:
- Increased flexibility in handling different query types

Cons of Option B:

We're seeking feedback on these approaches to determine the most effective implementation for the Template Query feature. @zengyan-amazon @sean-zheng-amazon @ylwu-amzn @ohltyler
+1 to this.
@mingshl Could you please clarify what kind of flexibility you are referring to, and how it improves error handling? In addition, according to the RFC headline we are going to introduce the Template query to OpenSearch. Then why is the RFC in the ml-commons repo and not in OpenSearch core?
@mingshl -- I think this template query is a neat idea and we should consider adding it to OpenSearch core. I'm thinking of something like #14774, which built on the pre-existing terms lookup logic. Using a template query, @bowenlan-amzn's example from #14774:
Could have been implemented as something like:
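A hedged sketch of that idea: the template query holds a terms query with a placeholder, and a request processor fills it from a lookup index. The `template` wrapper, the `${looked_up_ids}` placeholder, and the `terms_lookup` processor name are all illustrative assumptions, not existing APIs:

```json
{
  "query": {
    "template": {
      "terms": {
        "product_id": "${looked_up_ids}"
      }
    }
  },
  "search_pipeline": {
    "request_processors": [
      {
        "terms_lookup": {
          "index": "lookup-index",
          "id": "doc-1",
          "path": "product_ids",
          "output_variable": "looked_up_ids"
        }
      }
    ]
  }
}
```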
The expansion could be handled by a generic request processor that knows how to handle it.
+1 on this.
+1 on this too. This whole use case seems to be about query building. Just thinking out loud here: why can't we build this logic in the opensearch-clients? Have we thought about that option? If yes, then please ignore the suggestion.
I agree on this. Clever expansion logic is what is needed, because currently we cannot tie back the fact that the values for a field were filled from an ML request; it could just as well be a query to another index that fetches documents whose data is used as a filter for the main query. I have heard use cases like this from different users. @msfroh I was wondering whether, in core (AbstractQueryBuilder), we could have logic that gets triggered for every query's xContent function and other relevant places. Not sure whether that's the best way, but it is something I was thinking might help.
@navneet1v Technically, we could build this function in the clients, but we have requirements to support OpenSearch Flow, which is built through OpenSearch Dashboards; here is the tutorial about OpenSearch Flow: https://github.com/opensearch-project/dashboards-flow-framework/blob/main/documentation/tutorial.md. That is the reason why we need it on the server side.
Per the suggestions above, we should move the template query to the OpenSearch repo and make the template query work with any search request processor that produces variables into the PipelineProcessingContext. The template query will substitute the variables during the query rewrite phase. @msfroh helped refactor QueryRewriteContext into an interface, so QueryRewriteContext can now carry the PipelineProcessingContext, which can be populated by any processor. I raised a PR in the OpenSearch repo to address the changes related to the template query and QueryRewriteContext. A separate PR will be raised in ml-commons for the ml inference search extensions, to emit ml inference outputs into the PipelineProcessingContext.
this is really great. :) |
Problem Statement:
When using search request processors, users need to send an initial search request with properly constructed query builders. However, if the initial request fails to meet the type constraints of the query builders, it will be rejected, and the search request cannot be processed by the search request processors.
The data flow is as follows:
Initial Search Request -> Search Request Processors -> New Search Request
However, when constructing the initial request, every query builder has type constraints. For example, for the knn query:
(Happy case) This is a valid query accepted by the knn query builder, and it can be passed to the search request processors:
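For illustration, a well-formed knn query of this kind might look like the following; the field name and vector values are assumptions:

```json
{
  "query": {
    "knn": {
      "text_embedding": {
        "vector": [0.1, 0.2, 0.3, 0.4],
        "k": 2
      }
    }
  }
}
```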
(Sad case) This query is not valid; an exception is thrown when constructing the knn query builder, so it cannot reach the search request processors:
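The failing shape could look like this sketch: a raw string where the builder expects a numeric array (field name is an assumption):

```json
{
  "query": {
    "knn": {
      "text_embedding": {
        "vector": "sunny",
        "k": 2
      }
    }
  }
}
```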
In the sad case, the "vector" field is provided with a string value ("sunny") instead of the required list of integers or floats, violating the type constraints of the knn query builder. As a result, an exception will be thrown during the construction of the query builder, preventing the search request from reaching the search request processors.
Scope:
Proposed Design:
To allow the initial search request to pass through the search request processors, we are introducing a template query type, which contains the query body.
For the same example above, here is the sample curl command using query extensions:
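A sketch of such a request; the `template` query name, the `ext.ml_inference` extension block, and the `${text_embedding}` placeholder syntax are illustrative assumptions rather than a confirmed API:

```json
POST /my-index/_search?search_pipeline=my_pipeline
{
  "query": {
    "template": {
      "knn": {
        "text_embedding": {
          "vector": "${text_embedding}",
          "k": 2
        }
      }
    }
  },
  "ext": {
    "ml_inference": {
      "text": "sunny"
    }
  }
}
```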
Combining this with an ml_inference search request processor and a query extension, this is the sample search pipeline config:
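A sketch of the pipeline config; the keys below follow the general shape of the ml_inference search request processor (`model_id`, `input_map`, `output_map`), but the specific values and paths are assumptions for illustration:

```json
PUT /_search/pipeline/my_pipeline
{
  "request_processors": [
    {
      "ml_inference": {
        "model_id": "<embedding-model-id>",
        "input_map": [
          { "text": "ext.ml_inference.text" }
        ],
        "output_map": [
          { "text_embedding": "embedding" }
        ]
      }
    }
  ]
}
```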
After the ml_inference search request processor runs, it will rewrite to the new search request as follows:
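The rewritten request would then carry the generated embedding in place of the placeholder; the vector values here are made up for illustration:

```json
{
  "query": {
    "knn": {
      "text_embedding": {
        "vector": [0.23, 0.67, 0.89, 0.12],
        "k": 2
      }
    }
  }
}
```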
By using the template query type, the initial search request can bypass the strict type checking and validation of the query builders during the initial processing by the search request processors. This allows the search request to flow through the search request processors, even if the query body contains invalid or incorrect data types.
After the search request processors have completed their processing, the query body inside the template query type can be validated and processed according to the type constraints of the respective query builders.
This approach separates the initial processing of the search request from the validation and construction of the query builders, allowing for more flexibility and error handling in the overall search request processing pipeline.
Limitations: