
bzhangam
Contributor

Description

Documentation for native MMR support

Issues Resolved

Closes #11022

Version

List the OpenSearch version to which this PR applies, e.g. 2.14, 2.12--2.14, or all.
3.3

Frontend features

If you're submitting documentation for an OpenSearch Dashboards feature, add a video that shows how a user will interact with the UI step by step. A voiceover is optional.
N/A

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developer Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.


Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

| Processor name | Processor factory name | Execution stage | Trigger condition | Description |
|----------------|------------------------|---------------------|------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `mmr_rerank` | `mmr_rerank_factory` | `PRE_USER_DEFINED` | Triggered when a search request includes the mmr extension | Re-ranks the oversampled results using MMR and reduces them to the original query size. This processor runs before any user-defined search response processors. |
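
To make the `mmr_rerank` step concrete, here is a minimal Python sketch of greedy MMR selection over oversampled candidates. It assumes cosine similarity between document vectors and the scoring rule `(1 - diversity) * relevance - diversity * max_similarity_to_selected`, which matches the `diversity` semantics of the `mmr` extension (`0` ignores diversity, `1` maximizes it). The function names are hypothetical, and this is not the plugin's implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical candidate shape: (doc_id, relevance_score, vector)
Candidate = Tuple[str, float, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_rerank(candidates: List[Candidate], size: int, diversity: float = 0.5) -> List[str]:
    """Greedily reduce oversampled candidates to `size` documents.

    Each step picks the remaining candidate that maximizes
    (1 - diversity) * relevance - diversity * (max similarity to already-selected docs).
    """
    if not 0.0 <= diversity <= 1.0:
        raise ValueError("diversity must be between 0 and 1")
    remaining = list(candidates)
    selected: List[Candidate] = []

    def mmr_score(cand: Candidate) -> float:
        _, relevance, vec = cand
        # Redundancy is 0 when nothing has been selected yet.
        redundancy = max((cosine(vec, s[2]) for s in selected), default=0.0)
        return (1.0 - diversity) * relevance - diversity * redundancy

    while remaining and len(selected) < size:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [doc_id for doc_id, _, _ in selected]
```

With `diversity=0.0`, this reduces to ordering by relevance; raising it makes documents whose vectors nearly duplicate an already-selected one lose out to less similar documents.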

Contributor

We might want to add an explanation of the execution stage here as well.

The execution stage determines whether a system-generated processor runs before or after user-defined processors of the same type.
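
The ordering rule in the suggested sentence can be sketched in Python. This assumes a simplified model in which each processor type has its own chain; `build_chain` and the user-defined processor names are hypothetical and do not correspond to OpenSearch APIs.

```python
from enum import Enum
from typing import List, Tuple


class ExecutionStage(Enum):
    PRE_USER_DEFINED = "PRE_USER_DEFINED"
    POST_USER_DEFINED = "POST_USER_DEFINED"


def build_chain(
    user_defined: List[str],
    system_generated: List[Tuple[str, ExecutionStage]],
) -> List[str]:
    """Order processors of one type: PRE system processors, then user processors, then POST system processors."""
    pre = [name for name, stage in system_generated if stage is ExecutionStage.PRE_USER_DEFINED]
    post = [name for name, stage in system_generated if stage is ExecutionStage.POST_USER_DEFINED]
    return pre + list(user_defined) + post
```

For example, a response chain with `mmr_rerank` at `PRE_USER_DEFINED` yields `["mmr_rerank", "my_response_processor"]`, while a request chain with `mmr_over_sample` at `POST_USER_DEFINED` places it after the user-defined request processors.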

Contributor Author

ok

Collaborator

@bzhangam @heemin32 So a given system-generated processor can run either before or after user-defined processors? Or does the system-generated processor type determine when the processor runs? So, for example, does mmr_rerank always run before the user-defined processors?

Contributor

It is fixed by the implementation, so `mmr_rerank` always runs before user-defined processors.


# Prerequisites

To use MMR, you must enable [system-generated search processor factories]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/system-generated-search-processors/). Set the `cluster.search.enabled_system_generated_factories` setting to either `*` or a list that explicitly includes the required factories:
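
The setting change above could be applied with a request such as the following minimal Python sketch. It assumes an unsecured cluster at `http://localhost:9200`, the `persistent` settings scope, and that the setting accepts a JSON list (`["*"]` to enable all factories); the helper names are hypothetical.

```python
import json
import urllib.request
from typing import List, Optional

SETTING = "cluster.search.enabled_system_generated_factories"


def factories_settings_body(factories: Optional[List[str]] = None) -> dict:
    """Build a _cluster/settings body; None enables every factory via "*"."""
    value = ["*"] if factories is None else list(factories)
    return {"persistent": {SETTING: value}}


def apply_settings(body: dict, host: str = "http://localhost:9200") -> dict:
    """PUT the body to the cluster settings API and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{host}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `apply_settings(factories_settings_body(["mmr_over_sample_factory", "mmr_rerank_factory"]))` would enable only the two MMR factories listed in the tables.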

Contributor

What is the default setting value? If it is already `*`, this shouldn't be a prerequisite, but we can add it as an extra note.

Contributor Author

By default, it is an empty list.

kolchfa-aws added the `Tech review`, `release-notes`, and `v3.3.0` labels on Sep 24, 2025.
kolchfa-aws
Collaborator

@bzhangam Have you addressed tech review comments (is the PR ready for doc team review)?

heemin32
Contributor

heemin32 commented Oct 1, 2025

> @bzhangam Have you addressed tech review comments (is the PR ready for doc team review)?

Tech review comments are addressed. Thanks.

kolchfa-aws added the `Doc review` label and removed the `Tech review` label.

# Limitation

## One processor per type and execution stage
Collaborator

@bzhangam @heemin32 Given that there are 2 processor types (request and response) and 2 execution stages (PRE_USER_DEFINED and POST_USER_DEFINED), does this mean that a single search request can have up to 4 system-generated processors (request processor at PRE_USER_DEFINED stage, request processor at POST_USER_DEFINED stage, response processor at PRE_USER_DEFINED stage, and response processor at POST_USER_DEFINED stage)? Or is the limit actually 2 total processors (one request + one response) regardless of execution stage?

Contributor

Up to six system-generated processors can run, one for each combination of type and stage:
- Three types: `SearchRequestProcessor`, `SearchPhaseResultsProcessor`, `SearchResponseProcessor`
- Two stages: `PRE_USER_DEFINED`, `POST_USER_DEFINED`

opensearch-project/OpenSearch#19062 (comment)
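
A hypothetical check makes the one-per-slot limitation concrete. It models each system-generated processor as a `(name, type, stage)` tuple and reports slots claimed more than once. The thread does not say how OpenSearch reacts to such a conflict, so this sketch only detects it; `conflicting_slots` is not an OpenSearch API.

```python
from collections import Counter
from itertools import product
from typing import Iterable, List, Tuple

PROCESSOR_TYPES = ("SearchRequestProcessor", "SearchPhaseResultsProcessor", "SearchResponseProcessor")
EXECUTION_STAGES = ("PRE_USER_DEFINED", "POST_USER_DEFINED")

# One slot per (type, stage) pair: 3 types x 2 stages = 6 slots.
SLOTS = list(product(PROCESSOR_TYPES, EXECUTION_STAGES))
MAX_SYSTEM_GENERATED = len(SLOTS)


def conflicting_slots(processors: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Return the (type, stage) slots claimed by more than one processor.

    Each processor is given as (name, processor_type, execution_stage).
    """
    counts = Counter((ptype, stage) for _, ptype, stage in processors)
    return sorted(slot for slot, n in counts.items() if n > 1)
```

The two MMR processors occupy different slots (request/`POST_USER_DEFINED` and response/`PRE_USER_DEFINED`), so they never conflict with each other.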


| Processor name | Processor factory name | Execution stage | Trigger condition | Description |
|-------------------|---------------------------| ------------------- |-------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `mmr_over_sample` | `mmr_over_sample_factory` | `POST_USER_DEFINED` | Triggered when a search request includes the mmr extension | Modifies the query size and `k` value of the k-NN query to oversample candidates for MMR re-ranking. This processor runs after any user-defined search request processors. |
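
A minimal Python sketch of the oversampling step on a search request body, for illustration only. It assumes that both `size` and each k-NN field's `k` are set to `candidates` (defaulting to `3 * size`, with `size` defaulting to 10), and it models only `knn` queries, not neural queries. `mmr_over_sample` here is a hypothetical helper, not the processor's code.

```python
import copy


def mmr_over_sample(request: dict) -> dict:
    """Return a copy of the request with size and k-NN k raised to the MMR candidate count."""
    mmr = request.get("ext", {}).get("mmr")
    if mmr is None:
        # Trigger condition not met: no mmr extension, so leave the request alone.
        return request
    out = copy.deepcopy(request)
    size = out.get("size", 10)
    candidates = mmr.get("candidates", 3 * size)
    out["size"] = candidates
    knn = out.get("query", {}).get("knn")
    if isinstance(knn, dict):
        # A knn query maps each vector field name to its parameters, including k.
        for field_params in knn.values():
            field_params["k"] = candidates
    return out
```

Deep-copying keeps the caller's original request unchanged, which makes it easy to compare the query before and after oversampling.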

Collaborator

Should this say "k-NN and neural queries," because this processor supports both?

Contributor

Yes. That is correct. Thanks for catching it.

| Parameter | Data type | Required | Description |
|---|---|---|---|
| `diversity` | float | No | Controls the weight of diversity in the re-ranking process. Valid values range from `0` to `1`. A value of `1` prioritizes maximum diversity, and `0` disables diversity. Default is `0.5`. |
| `candidates` | integer | No | Specifies how many candidate documents to oversample before re-ranking. Default is `3 * query size`. |
| `vector_field_path` | string | Optional, but required for remote indices | Path to the vector field used for MMR re-ranking. If not provided, OpenSearch resolves it automatically from the search request. |
| `vector_field_data_type` | string | Optional, but required for remote indices | Data type of the vector field. Used to parse the field and calculate similarity. If not provided, OpenSearch resolves it from the index mapping. |
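
The parameters above can be exercised with a small Python helper that builds a k-NN search body carrying the `mmr` extension. This sketch assumes the parameters sit under `ext.mmr` in the request body and that omitting a parameter lets OpenSearch apply the documented default; `build_mmr_search` is hypothetical.

```python
from typing import Optional, Sequence


def build_mmr_search(
    field: str,
    vector: Sequence[float],
    size: int = 10,
    diversity: Optional[float] = None,
    candidates: Optional[int] = None,
    vector_field_path: Optional[str] = None,
    vector_field_data_type: Optional[str] = None,
    remote_index: bool = False,
) -> dict:
    """Build a k-NN search body with the mmr extension.

    Omitted parameters are left out so that the server-side defaults apply
    (diversity 0.5, candidates 3 * size, field path and type resolved automatically).
    """
    mmr: dict = {}
    if diversity is not None:
        if not 0.0 <= diversity <= 1.0:
            raise ValueError("diversity must be between 0 and 1")
        mmr["diversity"] = diversity
    if candidates is not None:
        mmr["candidates"] = candidates
    # Both field hints are required when querying remote indices.
    if remote_index and (vector_field_path is None or vector_field_data_type is None):
        raise ValueError("vector_field_path and vector_field_data_type are required for remote indices")
    if vector_field_path is not None:
        mmr["vector_field_path"] = vector_field_path
    if vector_field_data_type is not None:
        mmr["vector_field_data_type"] = vector_field_data_type
    return {
        "size": size,
        "query": {"knn": {field: {"vector": list(vector), "k": size}}},
        "ext": {"mmr": mmr},
    }
```

For a remote index, pass `remote_index=True` together with `vector_field_path` and `vector_field_data_type`; otherwise the helper raises `ValueError`, mirroring the table's requirement.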

Contributor Author

For data type and space type, should we add a link to https://docs.opensearch.org/latest/field-types/supported-field-types/knn-vector/#parameters to show the valid values? The valid values are the same as the data type and space type values of the k-NN vector field.

kolchfa-aws added the `Editorial review` label and removed the `Doc review` label on Oct 6, 2025.
Collaborator

natebower left a comment

Editorial review

Collaborator

natebower left a comment

LGTM

natebower removed the `Editorial review` label on Oct 7, 2025.
natebower merged commit 145c513 into `opensearch-project:main` on Oct 7, 2025.
6 checks passed

The following table lists the available system-generated search response processors.

| Processor name | Processor factory name | Execution stage | Trigger condition | Description |

Member

junqiu-lei, Oct 7, 2025

We also need to add the `semantic-highlighter` processor:

| `semantic-highlighter` | `semantic-highlighter` | `POST_USER_DEFINED` | Triggered when a search request contains a `semantic` type highlight with `options.batch_inference` set to `true`. See [Using semantic highlighting]({{site.url}}{{site.baseurl}}/tutorials/vector-search/semantic-highlighting-tutorial/). | Performs batch inference for semantic highlighting. This processor runs after any user-defined search response processors. |

Reference PR: #11137
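
The suggested trigger condition can be expressed as a predicate over the request body. This sketch assumes that `options` sits inside the `highlight` object next to `fields`; `triggers_semantic_highlighter` is a hypothetical helper, not part of OpenSearch.

```python
def triggers_semantic_highlighter(request: dict) -> bool:
    """True if any highlight field uses type "semantic" and options.batch_inference is true."""
    highlight = request.get("highlight") or {}
    fields = highlight.get("fields") or {}
    has_semantic_field = any(
        isinstance(cfg, dict) and cfg.get("type") == "semantic" for cfg in fields.values()
    )
    # Assumed location: highlight.options.batch_inference
    batch_enabled = (highlight.get("options") or {}).get("batch_inference") is True
    return has_semantic_field and batch_enabled
```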

Labels: `release-notes` (PR: Include this PR in the automated release notes), `v3.3.0`
Linked issue: [DOC] Documentation for Native MMR Support
5 participants