
[RFC] Introduce Template Query to OpenSearch #16823

Open
mingshl opened this issue Sep 23, 2024 · 20 comments
Labels
enhancement Enhancement or improvement to existing feature or request v2.19.0 Issues and PRs related to version 2.19.0

Comments

@mingshl
Contributor

mingshl commented Sep 23, 2024

Problem Statement:

When using search request processors, users need to send an initial search request with properly constructed query builders. However, if the initial request fails to meet the type constraints of the query builders, it will be rejected, and the search request cannot be processed by the search request processors.

The data flow is as follows:

Initial Search Request -> Search Request Processors -> New Search Request

However, when constructing the initial request, every query builder enforces type constraints. For example, for the knn query:

(Happy case) This is a valid query accepted by the knn query builder, and it can be passed to the search request processors:

GET my-knn-index-1/_search
{
  "query": {
    "knn": {
      "my_vector": {
        "vector": [2, 3, 5, 6], // vector field requires a list of int/float 
        "k": 2
      }
    }
  }
}

(Sad case) This query is invalid and throws an exception while the knn query builder is constructed. It never reaches the search request processors.

GET my-knn-index-1/_search
{
  "query": {
    "knn": {
      "my_vector": {
        "vector": "sunny", // vector field requires a list of int/float 
        "k": 2
      }
    }
  }
}

In the sad case, the "vector" field is provided with a string value ("sunny") instead of the required list of integers or floats, violating the type constraints of the knn query builder. As a result, an exception will be thrown during the construction of the query builder, preventing the search request from reaching the search request processors.

Scope:

  1. The initial processing of the search request by the search request processors is decoupled from the validation and construction of the query builders.
  2. The query body inside the template query type can be validated against the type constraints of the respective query builders at a later stage in the processing pipeline.

Proposed Design:

To allow the initial search request to pass the search request processors, we are introducing a template query type, which contains the query body.

  1. Instead of directly constructing the query (e.g., knn query) in the initial search request, the query body is wrapped inside a template query type.
  2. The template query type acts as a container for the actual query body, allowing the search request processors to accept the initial search request without performing strict type checking or validation on the query body.
  3. The search request processors will process the initial search request as usual, but they will not attempt to construct or validate the query builders based on the query body inside the template query type.
  4. After the search request processors have finished their processing, the query body inside the template query type can be extracted and validated against the type constraints of the respective query builders (e.g., knn query builder).
  5. If the query body inside the template query type is valid and meets the type constraints of the query builders, it can be used to construct the actual query and execute the search.
  6. If the query body inside the template query type is invalid or violates the type constraints of the query builders, appropriate error handling or fallback mechanisms can be implemented.

For the same example above, here is the sample request using query extensions:

GET my-knn-index-1/_search
{
  "query": {
    "template": {
      "knn": {
        "my_vector": {
          "vector": "${vector}", // this is the field generated from ml_inference search request processor 
          "k": 2
        }
      }
    }
  },
  "ext": {
    "ml_inference": {
      "params": {
        "text": "sunny"
      }
    }
  }
}

Combining with an ml_inference search request processor and a query extension:

This is the sample search pipeline config:

PUT /_search/pipeline/my_pipeline
{
  "request_processors": [
    {
      "ml_inference": {
        "tag": "ml_inference",
        "description": "This processor is going to run ml inference during search request",
        "model_id": "<model_id>",
        "input_map": [
          {
            "inputs": "ext.ml_inference.params.text"
          }
        ],
        "output_map": [
          {
            "ext.ml_inference.params.vector": "response"
          }
        ],
        "ignore_missing":false,
        "ignore_failure": false
        
      }
    }
  ]
}

After the ml_inference search request processor runs, it rewrites to the new search request as follows:

GET my-knn-index-1/_search
{
  "query": {
    "template": {
      "knn": {
        "my_vector": {
          "vector": [1,2,3], // this is the result substituted by from ml_inference search request processor 
          "k": 2
        }
      }
    }
  },
  "ext": {
    "ml_inference": {
      "params": {
        "text": "sunny",
        "vector: [1,2,3]
      }
    }
  }
}

By using the template query type, the initial search request can bypass the strict type checking and validation of the query builders during the initial processing by the search request processors. This allows the search request to flow through the search request processors, even if the query body contains invalid or incorrect data types.

After the search request processors have completed their processing, the query body inside the template query type can be validated and processed according to the type constraints of the respective query builders.

This approach separates the initial processing of the search request from the validation and construction of the query builders, allowing for more flexibility and error handling in the overall search request processing pipeline.

Limitations:

  1. A template query cannot be executed (doToQuery) without a search request processor that rewrites the query string, for example the ml_inference search request processor (see the example below).
  2. If the new search request is invalid, the effort spent on the search request rewrite is wasted.
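
To illustrate the first limitation, here is a hypothetical standalone request (assumed behavior based on the limitation above, not a tested example). With no search pipeline attached, nothing substitutes ${vector}, so the template cannot be rewritten into a concrete knn query and the request is expected to fail:

GET my-knn-index-1/_search
{
  "query": {
    "template": {
      "knn": {
        "my_vector": {
          "vector": "${vector}", // never substituted without a search request processor
          "k": 2
        }
      }
    }
  }
}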
@mingshl mingshl added enhancement Enhancement or improvement to existing feature or request untriaged labels Sep 23, 2024
@yuye-aws
Member

From your example, it seems that vector is the embedding of text sunny. A better option is to directly use the neural query. Can you provide more examples?

@mingshl mingshl changed the title [FEATURE] Introduce Template Query to OpenSearch [RFC] Introduce Template Query to OpenSearch Sep 24, 2024
@austintlee
Contributor

Will this build on this feature - https://opensearch.org/docs/latest/api-reference/search-template/?

@mingshl
Contributor Author

mingshl commented Sep 25, 2024

From your example, it seems that vector is the embedding of text sunny. A better option is to directly use the neural query. Can you provide more examples?

Right, this is a limitation of the knn query, which requires a list of vectors. For string inputs, we can use the neural query to pass a string to the model input (even though we cannot parse different model output formats without post-processing functions).

Extending the use cases, for example, when the user input is:

  • array
  • image bytes
  • map
    ...

these can be sent to model inputs to generate vectors, but they cannot be passed to the existing query builders.
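
For illustration, here is a hypothetical sketch of an image-input case (the index name, field names, and the idea of passing image bytes through the ml_inference params are made up here to show the shape of such a request, not a tested example):

GET my-image-index/_search?search_pipeline=my_pipeline
{
  "query": {
    "template": {
      "knn": {
        "image_vector": {
          "vector": "${ext.ml_inference.params.vector}", // to be substituted with the embedding produced by the processor
          "k": 2
        }
      }
    }
  },
  "ext": {
    "ml_inference": {
      "params": {
        "image": "<base64-encoded image bytes>"
      }
    }
  }
}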

@yuye-aws
Member

Extending the use cases, for example, when the user input is:

  • array
  • image bytes
  • map
    ...

Would you like to provide a few examples to showcase how to use your feature with ml_inference?

@dblock dblock removed the untriaged label Oct 14, 2024
@dblock
Member

dblock commented Oct 14, 2024

[Catch All Triage - 1, 2, 3, 4]

@yuye-aws
Member

@mingshl Can you provide some examples?

@mingshl
Contributor Author

mingshl commented Nov 13, 2024

@yuye-aws For example, the geo_shape query usually takes in a list of coordinates. If I don't know the GPS location, I can use an LLM to tell me the coordinates and then run the geo_shape query:

I am using a Claude model in this case:

POST /_plugins/_ml/connectors/_create
{
  "name": "Amazon Bedrock Connector: Claude Instant V1",
  "version": "1",
  "description": "The connector to bedrock Claude model",
  "protocol": "aws_sigv4",
  "parameters": {
    "max_tokens_to_sample": "8000",
    "service_name": "bedrock",
    "temperature": "1.0E-4",
    "response_filter": "$.completion",
    "region": "us-west-2",
    "anthropic_version": "bedrock-2023-05-31",
    "inputs":"please summerize the documents"
  },
  "credential": {
    "access_key": " ",
        "secret_key": " ",
        "session_token": " " },
  "actions": [
    {
      "action_type": "PREDICT",
      "method": "POST",
      "url": "https://bedrock-runtime.us-west-2.amazonaws.com/model/anthropic.claude-instant-v1/invoke",
      "headers": {
        "x-amz-content-sha256": "required",
        "content-type": "application/json"
      },
      "request_body":  "{\"prompt\":\"${parameters.prompt}\",\"max_tokens_to_sample\":300,\"temperature\":0.5,\"top_k\":250,\"top_p\":1,\"stop_sequences\":[\"\\n\\nHuman:\"]}"
    }
  ]
}

POST /_plugins/_ml/models/_register
{
  "name": "Bedrock agent model with prompt",
  "function_name": "remote",
  "description": "test model",
  "connector_id": "Duy_I5MBMgMm6WMwSs1M"
}

The response:

{
  "task_id": "Q9Kf3ZEBMhDukNDaCyfV",
  "status": "CREATED",
  "model_id": "RNKf3ZEBMhDukNDaCyfx"
}

POST /_plugins/_ml/models/Eey_I5MBMgMm6WMwdc13/_deploy


When I run this model with the Predict API, I get a list of coordinates:

POST /_plugins/_ml/models/Eey_I5MBMgMm6WMwdc13/_predict  
{
  "parameters": {
    "prompt":"\n\nHuman: You are a professional geo location specilist. You will always tell me an array of geo location coordinates, for example, when I ask Brooklyn Bridge Park, you will answer [[-74.0011, 40.7024], [-73.9958, 40.6997]] . If you don't know the answer, just return empty array. \n\n Human: please tell me the coordinate of the ${parameters.context.toString()} in an array \n\n Assistant:",
    "context":"Brooklyn Bridge Park."
  }
}

The response will be:

{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "response": [
              [
                -74.0011,
                40.7024
              ],
              [
                -73.9958,
                40.6997
              ]
            ]
          }
        }
      ],
      "status_code": 200
    }
  ]
}

Now I can index some documents:

PUT areas_of_interest
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "location": { "type": "geo_shape" },
      "category": { "type": "keyword" }
    }
  }
}

POST areas_of_interest/_doc/1
{
  "name": "Central Park",
  "location": {
    "type": "envelope",
    "coordinates": [[-73.9812, 40.7682], [-73.9495, 40.7642]]
  },
  "category": "Park"
}

POST areas_of_interest/_doc/2
{
  "name": "Times Square",
  "location": {
    "type": "envelope",
    "coordinates": [[-73.9879, 40.7589], [-73.9842, 40.7577]]
  },
  "category": "Tourist Attraction"
}

POST areas_of_interest/_doc/3
{
  "name": "Brooklyn Bridge Park",
  "location": {
    "type": "envelope",
    "coordinates": [[-74.0011, 40.7024], [-73.9958, 40.6997]]
  },
  "category": "Park"
}

Usually, when we use the geo_shape query, we need to know the GPS location as an array:

POST areas_of_interest/_search
{
  "query": {
    "geo_shape": {
      "location": {
        "shape": {
          "type": "envelope",
          "coordinates": [[-74.0, 40.75], [-73.95, 40.70]]
        },
        "relation": "intersects"
      },
      "ignore_unmapped": false,
      "boost": 42.0
    }
  }
}

which returns the document:

{
  "took": 15,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 42,
    "hits": [
      {
        "_index": "areas_of_interest",
        "_id": "3",
        "_score": 42,
        "_source": {
          "name": "Brooklyn Bridge Park",
          "location": {
            "type": "envelope",
            "coordinates": [
              [
                -74.0011,
                40.7024
              ],
              [
                -73.9958,
                40.6997
              ]
            ]
          },
          "category": "Park"
        }
      }
    ]
  }
}

Create a search pipeline:

PUT /_search/pipeline/my_pipeline
{
  "request_processors": [
    {
      "ml_inference": {
        "tag": "ml_inference",
        "description": "This processor is going to run ml inference during search request",
        "model_id": "Eey_I5MBMgMm6WMwdc13",
        "input_map": [
          {
            "context": "query.template.geo_shape.location.shape.coordinates"
          }
        ],
        "output_map": [
          {
            "query.template.geo_shape.location.shape.coordinates": "response"
          }
        ],
        "model_config":{"prompt":"\n\nHuman: You are a professional geo location specilist. You will always tell me an array of geo location coordinates, for example, when I ask Brooklyn Bridge Park, you will answer [[-74.0011, 40.7024], [-73.9958, 40.6997]] . If you don't know the answer, just return empty array. \n\n Human: please tell me the coordinate of the ${parameters.context.toString()} in an array \n\n Assistant:"},
        "ignore_missing":false,
        "ignore_failure": false
        
      }
    }
  ]
}

Now I can search with the search pipeline and I don't need to know the coordinates:

POST areas_of_interest/_search?search_pipeline=my_pipeline
{
  "query": {
    "template": {
      "geo_shape": {
        "location": {
          "shape": {
            "type": "envelope",
            "coordinates": "Brooklyn Bridge Park"
          },
          "relation": "intersects"
        },
        "ignore_unmapped": false,
        "boost": 42
      }
    }
  }
}

and then I get the document matching the GPS coordinates:

{
  "took": 518,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 42,
    "hits": [
      {
        "_index": "areas_of_interest",
        "_id": "3",
        "_score": 42,
        "_source": {
          "name": "Brooklyn Bridge Park",
          "location": {
            "type": "envelope",
            "coordinates": [
              [
                -74.0011,
                40.7024
              ],
              [
                -73.9958,
                40.6997
              ]
            ]
          },
          "category": "Park"
        }
      }
    ]
  }
}

@yuye-aws
Member

Thanks for providing the use case! Can you please elaborate on the following questions:

  1. I'm still unclear about one thing. In your use case, you have already defined a list of coordinates in the prompt: \n\nHuman: You are a professional geo location specialist. You will always tell me an array of geo location coordinates, for example, when I ask Brooklyn Bridge Park, you will answer [[-74.0011, 40.7024], [-73.9958, 40.6997]]. If you don't know the answer, just return empty array. \n\n Human: please tell me the coordinate of the ${parameters.context.toString()} in an array \n\n Assistant:. It seems that the user already knows the geo location coordinates, so why do you claim "Now I can search with the search pipeline and I don't need to know the coordinates"?
  2. Have you ever tried a workaround, i.e. a flow framework with an ml model tool chained with the search index tool? The ml model tool does the LLM part in your example. After that, the search index tool takes the output coordinates from the ml model tool and searches the index just like what you have defined in the search pipeline.

@mingshl
Contributor Author

mingshl commented Nov 14, 2024

Thanks for providing the use case! Can you please elaborate on the following questions:

  1. I'm still unclear about one thing. In your use case, you have already defined a list of coordinates in the prompt: \n\nHuman: You are a professional geo location specialist. You will always tell me an array of geo location coordinates, for example, when I ask Brooklyn Bridge Park, you will answer [[-74.0011, 40.7024], [-73.9958, 40.6997]]. If you don't know the answer, just return empty array. \n\n Human: please tell me the coordinate of the ${parameters.context.toString()} in an array \n\n Assistant:. It seems that the user already knows the geo location coordinates, so why do you claim "Now I can search with the search pipeline and I don't need to know the coordinates"?
  2. Have you ever tried a workaround, i.e. a flow framework with an ml model tool chained with the search index tool? The ml model tool does the LLM part in your example. After that, the search index tool takes the output coordinates from the ml model tool and searches the index just like what you have defined in the search pipeline.

In the prompt, that was an example to instruct the LLM to return coordinates. If I input a location other than Brooklyn Bridge Park, it will give me the coordinates for that other location as well.

The key of this template query is to help rewrite the query using model output; without the template query, there is no way to rewrite a query with different object types.

@zengyan-amazon
Member

zengyan-amazon commented Nov 26, 2024

Why do we want to introduce a template query type, which essentially delays the query validation? This seems more like a neural search use case. Or how about adding a new field to the knn query, in addition to vector, to allow the user to specify text to generate the embedding?

@zengyan-amazon
Member

btw, what is the impact on the OpenSearch client libraries if we want to introduce this template query?

@sean-zheng-amazon

Why do we want to introduce a template query type, which essentially delays the query validation? This seems more like a neural search use case. Or how about adding a new field to the knn query, in addition to vector, to allow the user to specify text to generate the embedding?

Yes, this could be done with the neural query or by adding a new field to the knn query, but the benefit of this design is that you can leverage the flexibility of search pipelines and processors to implement many use cases without having to introduce a new search clause like neural or modify an existing query like knn.

@mingshl
Contributor Author

mingshl commented Nov 27, 2024

@austintlee This is different from the search template: the search template relies on mustache, and that is its main disadvantage; many customers would not consider using mustache as a scripting language because it might bring security concerns.

The Template Query is designed as a query builder, similar to the match and term queries. It receives a template of an inner query and allows search processors to enrich the query content. After processing, the inner query is executed during query rewrite. There is string substitution, but mustache is not involved (see the sketch below).
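
To make that concrete, here is a minimal, self-contained sketch of the kind of ${...} substitution intended (illustrative only; the class and method names are invented and this is not the actual implementation):

import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: replace ${...} placeholders in a serialized inner query
// with values produced by search request processors. No mustache engine involved.
public final class TemplateSubstitution {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    public static String substitute(String innerQuery, Map<String, Object> variables) {
        Matcher matcher = PLACEHOLDER.matcher(innerQuery);
        StringBuilder resolved = new StringBuilder();
        while (matcher.find()) {
            Object value = variables.get(matcher.group(1));
            if (value == null) {
                throw new IllegalArgumentException("unresolved placeholder: " + matcher.group(1));
            }
            // A real implementation would serialize the value as JSON so that
            // arrays and maps (e.g. a vector) are spliced in with the right type.
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(String.valueOf(value)));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    }
}

For example, applying substitute(...) with Map.of("vector", "[0, 1, 0, 0]") to an inner query containing ${vector} splices the array text into the query string before the inner query is parsed and validated.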

We are considering two options for implementing the Template Query, inspired by the search template functionality:

Option A: Template Query contains an object

In this approach, the Template Query contains a template object. Placeholders are wrapped as strings, e.g., "${ext.ml_inference.params.vector}" for a KNN query's vector field.

Example KNN query in a Template Query:

GET my-knn-index-1/_search
{
  "query": {
    "template": {
      "knn": {
        "my_vector": {
          "vector": "${ext.ml_inference.params.vector}", // this is the field generated from ml_inference search request processor 
          "k": 2
        }
      }
    }
  },
  "ext": {
    "ml_inference": {
      "params": {
        "text": "sunny"
      }
    }
  },
  "search_pipeline": {
    "request_processors": [
        {
            "ml_inference": {
                "model_id": "<model_id>",
                "input_map": [
                {
                    "inputs": "ext.ml_inference.params.text"
                }
                ],
                "output_map": [
                {
                    "ext.ml_inference.params.vector": "response"
                }
                ],
                "ignore_missing":false,
                "ignore_failure": false
            }
        }
    ]
  }
}

After the ml_inference search request processor runs, the query will be processed as follows, and the request will be validated before query rewrite.

GET my-knn-index-1/_search
{
  "query": {
    "template": {
      "knn": {
        "my_vector": {
          "vector": [0,1,0,0...], // this vector is generated from model inference and substituted from ml_inference extension
          "k": 2
        }
      }
    }
  },
  "ext": {
    "ml_inference": {
      "params": {
        "text": "sunny",
        "vector": [0,1,0,0...], // this vector is generated from model inference 
      }
    }
  }
}

Benefits of Option A:

  • Increased flexibility in handling different query types
  • Easier to understand: a template query contains an object, and the placeholder needs to be put within double quotes.

Option B: Template Query contains an object or a string

Referring to the search template: when the query with a placeholder doesn't compile, as in the "sad case" from the description where the knn query requires an array in the vector field, the search template can accept a string.

In the search template, when the query expects an array, the search template accepts a template string; after the mustache script replaces the string, it validates the query request:

GET _search/template
{
  "source": "{\"query\":{\"bool\":{\"must\":[{\"terms\": {\"text_entries\": {{#toJson}}text_entries{{/toJson}} }}] }}}",
  "params": {
    "text_entries": [
        { "term": { "text_entry" : "love" } },
        { "term": { "text_entry" : "soldier" } }
    ]
  }
}

Similarly, in the template query we can implement a similar design: when the query with a placeholder compiles, we can accept an object in the constructors; when it does not compile, we can accept one big string containing the placeholder.

For the same knn query example, the template query will contain a string, and the placeholder ${ext.ml_inference.params.vector} doesn't need to be wrapped as a string:

GET my-knn-index-1/_search
{
  "query": {
    "template": 
   """ {
      "knn": {
        "my_vector": {
          "vector": ${ext.ml_inference.params.vector}, // this is the field generated from ml_inference search request processor 
          "k": 2
        }
      }
    }
  }
  """,
  "ext": {
    "ml_inference": {
      "params": {
        "text": "sunny"
      }
    }
  },
  "search_pipeline": {
    "request_processors": [
        {
            "ml_inference": {
                "model_id": "<model_id>",
                "input_map": [
                {
                    "inputs": "ext.ml_inference.params.text"
                }
                ],
                "output_map": [
                {
                    "ext.ml_inference.params.vector": "response"
                }
                ],
                "ignore_missing":false,
                "ignore_failure": false
            }
        }
    ]
  }
}

The same process as Option A: after the ml_inference search request processor runs, the query will be validated before query rewrite.

However, when the inner query compiles, for example a term query, the template query can also accept an object.

For example:

GET my-keyword/_search
{
  "query": {
    "template": {
      "term": {
        "text": {
          "value": "${ext.ml_inference.params.inference_result}", // this is the field generated from ml_inference search request processor 
        }
      }
    }
  },
  "ext": {
    "ml_inference": {
      "params": {
        "text": "sunny"
      }
    }
  },
  "search_pipeline": {
    "request_processors": [
        {
            "ml_inference": {
                "model_id": "<model_id>",
                "input_map": [
                {
                    "inputs": "ext.ml_inference.params.text"
                }
                ],
                "output_map": [
                {
                    "ext.ml_inference.params.inference_result": "response"
                }
                ],
                "ignore_missing":false,
                "ignore_failure": false
            }
        }
    ]
  }
}

When the template query accepts an object, we can validate the query in the constructors, which brings earlier validation to the template query builders.

Benefits of Option B:

  • Increased flexibility in handling different query types
  • Earlier validation for template query builders when using objects
  • Compatibility with existing search template patterns

Cons of Option B:

  • Writing a long string in a query might be troublesome; users might forget to escape characters in the string.

We're seeking feedback on these approaches to determine the most effective implementation for the Template Query feature. @zengyan-amazon @sean-zheng-amazon @ylwu-amzn @ohltyler

@dhrubo-os

Why do we want to introduce a template query type, which essentially delays the query validation? This seems more like a neural search use case. Or how about adding a new field to the knn query, in addition to vector, to allow the user to specify text to generate the embedding?

+1 to this.

This approach separates the initial processing of the search request from the validation and construction of the query builders, allowing for more flexibility and error handling in the overall search request processing pipeline.

@mingshl Could you please clarify what kind of flexibility you are referring to? And how does that improve error handling?

In addition, according to the RFC headline, we are going to introduce the Template query to OpenSearch. Then why is the RFC in the ml-commons repo? Why not in OpenSearch core?

@msfroh
Collaborator

msfroh commented Dec 2, 2024

@mingshl -- I think this template query is a neat idea and we should consider adding it to OpenSearch core.

I'm thinking of something like #14774, which built on the pre-existing terms lookup logic in terms query. While we can't remove that functionality now (since it's shipped), I think these template queries would be a better way to do a term-based lookup.

Using a template query, @bowenlan-amzn's example from #14774:

POST products/_search
{
  "query": {
    "terms": {
      "product_id": {
        "index": "customers",
        "id": "customer123",
        "path": "customer_filter",
		"store": true  <-- new parameter to do the lookup on the stored field, instead of _source
      },
      "value_type": "bitmap" <-- new parameter in terms query to specify the data type of the terms values input
    }
  }
}

Could have been implemented as something like:

POST products/_search
{
  "query": {
    "template": {
      "terms": {
        "product_id": {
          "subquery": {
            "index": "customers",
            "stored_fields": ["customer_filter"],
            "query": {
              "term": { "id": "customer123" }
            }
          }
        }
      }
    }
  }
}

The expansion could be handled by a generic request processor that knows to handle subquery clauses under a template query, versus (some of) the changes that needed to be made to TermsQueryBuilder. Right now, if we want to add clever expansion logic (like the terms lookup), we need to modify each and every query that wants it. This template query (and appropriate search request processors) would let us treat query modification as an orthogonal problem.
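
As a rough illustration of that idea, here is a self-contained sketch (hypothetical: the class name, the lookup callback, and the traversal are invented for this example and do not reflect any shipped OpenSearch API). A generic processor boils down to a tree walk over the parsed query body that finds subquery nodes and splices in the lookup results:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: recursively walk a query body (parsed as a Map), find
// "subquery" clauses, run the lookup, and replace each clause with its result.
public final class SubqueryExpander {

    // The lookup callback stands in for "run this subquery against another index
    // and return the resulting terms" -- e.g. the terms-lookup case above.
    public static Object expand(Object node, Function<Map<String, Object>, Object> lookup) {
        if (node instanceof Map<?, ?> map) {
            Map<String, Object> expanded = new HashMap<>();
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                String key = String.valueOf(entry.getKey());
                Object value = entry.getValue();
                if ("subquery".equals(key) && value instanceof Map<?, ?> sub) {
                    @SuppressWarnings("unchecked")
                    Map<String, Object> subquery = (Map<String, Object>) sub;
                    // Replace the node that holds the subquery with the fetched values,
                    // e.g. { "subquery": {...} } becomes the list of looked-up terms.
                    return lookup.apply(subquery);
                }
                expanded.put(key, expand(value, lookup));
            }
            return expanded;
        }
        if (node instanceof List<?> list) {
            return list.stream().map(child -> expand(child, lookup)).toList();
        }
        return node; // leaf value: keep as-is
    }
}

A request processor could run something like this over the template query's body before any query builder is constructed, which keeps the expansion logic in one place instead of inside each query type.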

@navneet1v
Contributor

navneet1v commented Dec 4, 2024

Why do we want to introduce a template query type, which essentially delays the query validation? This seems more like a neural search use case. Or how about adding a new field to the knn query, in addition to vector, to allow the user to specify text to generate the embedding?

+1 on this.

btw, what is the impact on the OpenSearch client libraries if we want to introduce this template query?

+1 on this too.

I feel this whole use case is about query building. Just thinking out loud here: why can't we build this logic in the opensearch-clients? Have we thought about that option? If yes, then please ignore the suggestion.

The expansion could be handled by a generic request processor that knows to handle subquery clauses under a template query, versus (some of) the changes that needed to be made to TermsQueryBuilder. Right now, if we want to add clever expansion logic (like the terms lookup), we need to modify each and every query that wants it. This template query (and appropriate search request processors) would let us treat query modification as an orthogonal problem.

I agree on this. A clever expansion logic is what is needed, because currently we cannot tie the filling of a field's values to an ML request only; it can very well be a query to another index that fetches some documents whose data gets used as a filter for the main query. I have heard use cases around this from different users.

@msfroh I was thinking, in core (AbstractQueryBuilder), can we have logic that gets triggered for every query's xContent function and other relevant places? Not sure if that's the best way, but it is something I was wondering whether it could help.

@mingshl
Contributor Author

mingshl commented Dec 5, 2024

@navneet1v Technically, we could build this function in the clients, but we have requirements to support OpenSearch Flow, which is going to be built through OpenSearch Dashboards; here is the tutorial about OpenSearch Flow: https://github.com/opensearch-project/dashboards-flow-framework/blob/main/documentation/tutorial.md. That is the reason why we need it on the server side.

@mingshl
Contributor Author

mingshl commented Dec 9, 2024

According to the above suggestions, we should move the template query to the OpenSearch repo and make the template query work with any search request processor that produces variables to the PipelineProcessingContext. The template query will substitute the variables during the query rewrite phase.

@msfroh helped refactor QueryRewriteContext into an interface, so now the QueryRewriteContext can carry over the PipelineProcessingContext, which can be populated by any processor.

Raised the PR in the OpenSearch repo to address the changes related to the template query and QueryRewriteContext. A separate PR will be raised in ml-commons for the ml inference search extensions, to emit ml inference outputs to the PipelineProcessingContext.

@navneet1v
Contributor

According to the above suggestions, we should move the template query to the OpenSearch repo and make the template query work with any search request processor that produces variables to the PipelineProcessingContext. The template query will substitute the variables during the query rewrite phase.

@msfroh helped refactor QueryRewriteContext into an interface, so now the QueryRewriteContext can carry over the PipelineProcessingContext, which can be populated by any processor.

Raised the PR in the OpenSearch repo to address the changes related to the template query and QueryRewriteContext. A separate PR will be raised in ml-commons for the ml inference search extensions, to emit ml inference outputs to the PipelineProcessingContext.

this is really great. :)

@getsaurabh02 getsaurabh02 transferred this issue from opensearch-project/ml-commons Dec 10, 2024
@mingshl mingshl added the v2.19.0 Issues and PRs related to version 2.19.0 label Dec 10, 2024
@dblock dblock removed the untriaged label Dec 16, 2024
@dblock
Member

dblock commented Dec 16, 2024

[Catch All Triage - 1, 2]
