Documentation for native MMR support #11024
Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged. Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer. When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.
| Processor name | Processor factory name | Execution stage | Trigger condition | Description |
|----------------|------------------------|---------------------|------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `mmr_rerank` | `mmr_rerank_factory` | `PRE_USER_DEFINED` | Triggered when a search request includes the mmr extension | Re-ranks the oversampled results using MMR and reduces them to the original query size. This processor runs before any user-defined search response processors. |
We might want to add an explanation of the execution stage here as well?
The execution stage determines whether a system-generated processor runs before or after user-defined processors of the same type.
ok
It is already fixed by the implementation. So `mmr_rerank` always runs before user-defined processors.
# Prerequisites

To use MMR, you must enable [system-generated search processor factories]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/system-generated-search-processors/). Set the `cluster.search.enabled_system_generated_factories` setting to either `*` or explicitly include the required factories:
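For illustration, here is a minimal Python sketch that builds a request body for this setting. The setting name and the two factory names come from the tables in this PR. The `persistent` scope, the list form of the value, and the `PUT _cluster/settings` endpoint mentioned in the comments are assumptions, and `build_settings_body` is a hypothetical helper, not part of any client library.

```python
import json

# Setting and factory names as they appear in this PR.
SETTING = "cluster.search.enabled_system_generated_factories"
MMR_FACTORIES = ["mmr_over_sample_factory", "mmr_rerank_factory"]


def build_settings_body(factories=None, persistent=True):
    """Return a cluster settings body that enables the given factories.

    Hypothetical helper. Passing None enables every factory through the
    "*" wildcard; sending the value as a list is an assumption.
    """
    value = ["*"] if factories is None else list(factories)
    scope = "persistent" if persistent else "transient"
    return {scope: {SETTING: value}}


if __name__ == "__main__":
    # Body intended for PUT _cluster/settings (assumed endpoint).
    print(json.dumps(build_settings_body(MMR_FACTORIES), indent=2))
```

Listing the two MMR factories explicitly keeps other system-generated processors disabled, which matters because the default value is an empty list.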
What is the default setting value? If it is already `*`, this shouldn't be a prerequisite, but we can add it as an extra note.
By default, it is an empty list.
Signed-off-by: Bo Zhang <[email protected]>
@bzhangam Have you addressed tech review comments (is the PR ready for doc team review)?
Tech review comments are addressed. Thanks.
# Limitation

## One processor per type and execution stage
@bzhangam @heemin32 Given that there are 2 processor types (request and response) and 2 execution stages (PRE_USER_DEFINED and POST_USER_DEFINED), does this mean that a single search request can have up to 4 system-generated processors (request processor at PRE_USER_DEFINED stage, request processor at POST_USER_DEFINED stage, response processor at PRE_USER_DEFINED stage, and response processor at POST_USER_DEFINED stage)? Or is the limit actually 2 total processors (one request + one response) regardless of execution stage?
The total number of system-generated processors that can run is 6.
Three types: SearchRequestProcessor, SearchPhaseResultProcessor, SearchResponseProcessor
Two stages: PRE_USER_DEFINED, POST_USER_DEFINED
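To make the count concrete, the following sketch enumerates the (type, stage) slots under the one-processor-per-slot rule described above. The type and stage names are the ones listed in this comment; the `slots` variable is purely illustrative.

```python
from itertools import product

# Processor types and execution stages as named in this thread.
PROCESSOR_TYPES = [
    "SearchRequestProcessor",
    "SearchPhaseResultProcessor",
    "SearchResponseProcessor",
]
EXECUTION_STAGES = ["PRE_USER_DEFINED", "POST_USER_DEFINED"]

# Each (type, stage) pair can hold at most one system-generated processor.
slots = list(product(PROCESSOR_TYPES, EXECUTION_STAGES))

if __name__ == "__main__":
    for processor_type, stage in slots:
        print(f"{processor_type} @ {stage}")
    print(f"{len(slots)} slots in total")
```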
| Processor name | Processor factory name | Execution stage | Trigger condition | Description |
|-------------------|---------------------------|---------------------|-------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `mmr_over_sample` | `mmr_over_sample_factory` | `POST_USER_DEFINED` | Triggered when a search request includes the mmr extension | Modifies the query size and `k` value of the k-NN query to oversample candidates for MMR re-ranking. This processor runs after any user-defined search request processors. |
Should this be `knn` and `neural` queries because this processor supports both?
Yes. That is correct. Thanks for catching it.
Signed-off-by: Fanit Kolchina <[email protected]>
| `diversity` | float | No | Controls the weight of diversity in the re-ranking process. Valid values range from `0` to `1`. A value of `1` prioritizes maximum diversity, and `0` disables diversity. Default is `0.5`. |
| `candidates` | integer | No | Specifies how many candidate documents to oversample before re-ranking. Default is `3 * query size`. |
| `vector_field_path` | string | Optional, but required for remote indices | Path to the vector field used for MMR re-ranking. If not provided, OpenSearch resolves it automatically from the search request. |
| `vector_field_data_type` | string | Optional, but required for remote indices | Data type of the vector field. Used to parse the field and calculate similarity. If not provided, OpenSearch resolves it from the index mapping. |
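As a rough illustration of how `diversity` and `candidates` interact, here is a minimal sketch of the textbook greedy MMR selection in Python. It is not the OpenSearch implementation: the scoring formula `(1 - diversity) * relevance - diversity * redundancy`, the use of cosine similarity, and the helper names `cosine`, `default_candidates`, and `mmr_rerank` are all assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def default_candidates(query_size):
    """Default oversampling count from the table above: 3 * query size."""
    return 3 * query_size


def mmr_rerank(query_vec, candidates, query_size, diversity=0.5):
    """Greedily pick up to query_size doc IDs from (doc_id, vector) pairs.

    Assumed scoring: (1 - diversity) * relevance - diversity * redundancy,
    where redundancy is the highest similarity to an already selected doc.
    """
    if not 0.0 <= diversity <= 1.0:
        raise ValueError("diversity must be between 0 and 1")
    remaining = list(candidates)
    selected = []

    def score(item):
        _, vec = item
        relevance = cosine(query_vec, vec)
        redundancy = max((cosine(vec, s) for _, s in selected), default=0.0)
        return (1.0 - diversity) * relevance - diversity * redundancy

    while remaining and len(selected) < query_size:
        best = max(remaining, key=score)
        remaining.remove(best)
        selected.append(best)
    return [doc_id for doc_id, _ in selected]


if __name__ == "__main__":
    query = [1.0, 1.0]
    docs = [("a", [1.0, 0.9]), ("b", [1.0, 0.8]), ("c", [0.0, 1.0])]
    print(mmr_rerank(query, docs, 2, diversity=0.0))
    print(mmr_rerank(query, docs, 2, diversity=0.5))
```

In this toy data set, `b` is a near duplicate of `a`. With `diversity=0.0` the selection is purely by relevance and returns `a` and `b`; with `diversity=0.5` the redundancy penalty pushes `b` below `c`, so the result is `a` and `c`.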
For data type and space type, should we add a link to https://docs.opensearch.org/latest/field-types/supported-field-types/knn-vector/#parameters to show the valid values? They should be the valid data type and space type values for the k-NN vector field.
Signed-off-by: Fanit Kolchina <[email protected]>
Editorial review
_search-plugins/search-pipelines/system-generated-search-processors.md
Signed-off-by: Nathan Bower <[email protected]>
LGTM
The following table lists the available system-generated search response processors.

| Processor name | Processor factory name | Execution stage | Trigger condition | Description |
We also need to add the `semantic-highlighter` processor.
| `semantic-highlighter` | `semantic-highlighter` | Runs after any user-defined response processors. | Triggered when a search request contains `semantic` type highlight with `options.batch_inference` set to `true`. See [Using semantic highlighting]({{site.url}}{{site.baseurl}}/tutorials/vector-search/semantic-highlighting-tutorial/). | batch inference processing semantic highlighting |
Reference PR: #11137
Description
Documentation for native MMR support
Issues Resolved
Closes #11022
Version
List the OpenSearch version to which this PR applies, e.g. 2.14, 2.12--2.14, or all.
3.3
Frontend features
If you're submitting documentation for an OpenSearch Dashboards feature, add a video that shows how a user will interact with the UI step by step. A voiceover is optional.
N/A
Checklist
For more information on following Developer Certificate of Origin and signing off your commits, please check here.