Documentation for native MMR support #11024

bzhangam · 2025-09-23T22:47:04Z

Description

Documentation for native MMR support

Issues Resolved

Version

List the OpenSearch version to which this PR applies, e.g. 2.14, 2.12--2.14, or all.
3.3

Frontend features

If you're submitting documentation for an OpenSearch Dashboards feature, add a video that shows how a user will interact with the UI step by step. A voiceover is optional.
N/A

Checklist

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developer Certificate of Origin.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2025-09-23T22:47:12Z

Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

heemin32 · 2025-09-23T22:54:28Z

_search-plugins/search-pipelines/system-generated-search-processors.md

+| Processor name | Processor factory name | Execution stage     | Trigger condition                                          | Description                                                                                                                                                     |
+|----------------|------------------------|---------------------|------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `mmr_rerank`   | `mmr_rerank_factory`   | `PRE_USER_DEFINED`  | Triggered when a search request includes the mmr extension | Re-ranks the oversampled results using MMR and reduces them to the original query size. This processor runs before any user-defined search response processors. |
+


Should we add an explanation of the execution stage here as well?

The execution stage determines whether a system-generated processor runs before or after user-defined processors of the same type.

@bzhangam @heemin32 So a given system-generated processor can run either before or after user-defined processors? Or does the system-generated processor type determine when the processor runs? So, for example, does mmr_rerank always run before the user-defined processors?

The stage is fixed by the implementation. So mmr_rerank always runs before user-defined processors.
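To illustrate the ordering described in this thread, here is a minimal sketch of how a fixed execution stage could place a system-generated processor relative to user-defined ones. The `Processor` class, the `order_processors` helper, and the `my_response_processor` name are hypothetical and exist only for illustration; this is not the OpenSearch implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    name: str
    # Hypothetical stage marker: "PRE_USER_DEFINED", "POST_USER_DEFINED",
    # or "USER_DEFINED" for processors configured by the user.
    stage: str


def order_processors(system_generated, user_defined):
    """Place system-generated processors before or after user-defined ones."""
    pre = [p for p in system_generated if p.stage == "PRE_USER_DEFINED"]
    post = [p for p in system_generated if p.stage == "POST_USER_DEFINED"]
    return pre + list(user_defined) + post


# mmr_rerank has a fixed PRE_USER_DEFINED stage, per the thread above.
system = [Processor("mmr_rerank", "PRE_USER_DEFINED")]
# Placeholder for any user-defined search response processor.
user = [Processor("my_response_processor", "USER_DEFINED")]

ordered = [p.name for p in order_processors(system, user)]
print(ordered)
```

Because the stage is a property of the processor type rather than of the request, the same processor cannot appear on both sides of the user-defined processors.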

heemin32 · 2025-09-23T22:57:30Z

_vector-search/specialized-operations/vector-search-mmr.md

+
+# Prerequisites
+
+To use MMR, you must enable [system-generated search processor factories]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/system-generated-search-processors/). Set the `cluster.search.enabled_system_generated_factories` setting to either `*` or explicitly include the required factories:


What is the default value of this setting? If it is already *, this shouldn't be a prerequisite, but we can add it as an extra note.

By default it is an empty list.
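Since the default is an empty list, the factories have to be enabled explicitly. A minimal sketch of building the request body follows. The setting name and factory names are taken from this PR; sending the body to the cluster settings API under `persistent` and passing the factories as a JSON array are assumptions, and `build_settings_body` is a hypothetical helper.

```python
import json

# Setting name as quoted in the Prerequisites section of this PR.
SETTING = "cluster.search.enabled_system_generated_factories"


def build_settings_body(factories, persistent=True):
    """Build a cluster settings body that enables the given factories.

    Assumption: the setting accepts a list of factory names (or ["*"]).
    """
    scope = "persistent" if persistent else "transient"
    return {scope: {SETTING: list(factories)}}


# Factory names from the processor tables in this PR.
body = build_settings_body(["mmr_over_sample_factory", "mmr_rerank_factory"])

# Assumed target: the cluster settings endpoint.
print("PUT /_cluster/settings")
print(json.dumps(body, indent=2))
```

Passing `["*"]` instead would correspond to the "enable everything" option mentioned in the prerequisite text.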

Signed-off-by: Bo Zhang <[email protected]>

kolchfa-aws · 2025-10-01T15:13:07Z

@bzhangam Have you addressed tech review comments (is the PR ready for doc team review)?

heemin32 · 2025-10-01T15:19:04Z

@bzhangam Have you addressed tech review comments (is the PR ready for doc team review)?

Tech review comments are addressed. Thanks.

kolchfa-aws · 2025-10-02T17:04:26Z

_search-plugins/search-pipelines/system-generated-search-processors.md

+
+# Limitation
+
+## One processor per type and execution stage


@bzhangam @heemin32 Given that there are 2 processor types (request and response) and 2 execution stages (PRE_USER_DEFINED and POST_USER_DEFINED), does this mean that a single search request can have up to 4 system-generated processors (request processor at PRE_USER_DEFINED stage, request processor at POST_USER_DEFINED stage, response processor at PRE_USER_DEFINED stage, and response processor at POST_USER_DEFINED stage)? Or is the limit actually 2 total processors (one request + one response) regardless of execution stage?

The total number of system-generated processors that can run is 6.
Three types: SearchRequestProcessor, SearchPhaseResultProcessor, SearchResponseProcessor
Two stages: PRE_USER_DEFINED, POST_USER_DEFINED

opensearch-project/OpenSearch#19062 (comment)
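The count of 6 follows from pairing each processor type with each execution stage. A small sketch of that arithmetic, plus a hypothetical `check_one_per_slot` helper that models the "one processor per type and execution stage" limitation (the helper is illustrative only, not part of OpenSearch):

```python
from collections import Counter
from itertools import product

# Type and stage names as listed in the reply above.
TYPES = ("SearchRequestProcessor", "SearchPhaseResultProcessor", "SearchResponseProcessor")
STAGES = ("PRE_USER_DEFINED", "POST_USER_DEFINED")

# Every (type, stage) pair is one slot that can hold a single processor.
slots = list(product(TYPES, STAGES))
print(len(slots))


def check_one_per_slot(generated):
    """Return the slots holding more than one processor.

    `generated` is a list of (processor_name, type, stage) tuples.
    """
    counts = Counter((ptype, stage) for _, ptype, stage in generated)
    return [slot for slot, n in counts.items() if n > 1]


# The two MMR processors occupy different slots, so there is no conflict.
mmr = [
    ("mmr_over_sample", "SearchRequestProcessor", "POST_USER_DEFINED"),
    ("mmr_rerank", "SearchResponseProcessor", "PRE_USER_DEFINED"),
]
conflicts = check_one_per_slot(mmr)
print(conflicts)
```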

kolchfa-aws · 2025-10-02T18:40:15Z

_search-plugins/search-pipelines/system-generated-search-processors.md

+
+| Processor name    | Processor factory name    | Execution stage     | Trigger condition                                           | Description                                                                                                                                                                |
+|-------------------|---------------------------| ------------------- |-------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `mmr_over_sample` | `mmr_over_sample_factory` | `POST_USER_DEFINED` | Triggered when a search request includes the mmr extension  | Modifies the query size and `k` value of the k-NN query to oversample candidates for MMR re-ranking. This processor runs after any user-defined search request processors. |


Should this be knn and neural queries because this processor supports both?

Yes. That is correct. Thanks for catching it.
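As a rough illustration of the oversampling step, the sketch below rewrites a request so that both the query size and the `k` value equal the candidate count. The request shape (an `ext.mmr` object and a `knn` query keyed by field name), the `my_vector` field, and the `oversample` helper are assumptions for illustration. It handles only a `knn` query, not a neural query, and is not the processor's actual code.

```python
import copy


def oversample(request, default_factor=3):
    """Return (rewritten_request, original_size) for an MMR search request.

    Assumptions: MMR options live under request["ext"]["mmr"], and the
    default candidate count is 3 * query size, as stated in the parameter table.
    """
    rewritten = copy.deepcopy(request)
    original_size = rewritten.get("size", 10)  # 10 is the default search size
    mmr = rewritten.get("ext", {}).get("mmr")
    if mmr is None:
        # No mmr extension: the processor is not triggered.
        return rewritten, original_size
    candidates = mmr.get("candidates", default_factor * original_size)
    rewritten["size"] = candidates
    for field_body in rewritten["query"]["knn"].values():
        field_body["k"] = candidates
    return rewritten, original_size


request = {
    "size": 5,
    "query": {"knn": {"my_vector": {"vector": [0.1, 0.2], "k": 5}}},
    "ext": {"mmr": {"diversity": 0.5}},
}
rewritten, original_size = oversample(request)
print(rewritten["size"], rewritten["query"]["knn"]["my_vector"]["k"], original_size)
```

The original size is returned separately because the re-ranking step later needs it to trim the oversampled results back down.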

Signed-off-by: Fanit Kolchina <[email protected]>

bzhangam · 2025-10-06T17:32:33Z

_vector-search/specialized-operations/vector-search-mmr.md

+| `diversity`               | float     | No                                        | Controls the weight of diversity in the re-ranking process. Valid values range from `0` to `1`. A value of `1` prioritizes maximum diversity, and `0` disables diversity. Default is `0.5`. |
+| `candidates`              | integer   | No                                        | Specifies how many candidate documents to oversample before re-ranking. Default is `3 * query size`.                                                                                        |
+| `vector_field_path`       | string    | Optional, but required for remote indices | Path to the vector field used for MMR re-ranking. If not provided, OpenSearch resolves it automatically from the search request.                                                            |
+| `vector_field_data_type`  | string    | Optional, but required for remote indices | Data type of the vector field. Used to parse the field and calculate similarity. If not provided, OpenSearch resolves it from the index mapping.                                            |


For the data type and space type, should we add a link to https://docs.opensearch.org/latest/field-types/supported-field-types/knn-vector/#parameters to show the valid values? They should be the valid values of the data type and space type of the knn vector.
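For readers unfamiliar with MMR, the sketch below shows how a `diversity` weight can trade relevance against redundancy during greedy re-ranking. It uses the textbook MMR formulation with cosine similarity. The scoring formula, the similarity function, and the `mmr_rerank` helper are assumptions for illustration and may differ from what OpenSearch computes for a given space type.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rerank(candidates, size, diversity=0.5):
    """Greedily pick `size` documents from oversampled candidates.

    `candidates` is a list of (doc_id, relevance_score, vector) tuples.
    Assumed score: (1 - diversity) * relevance - diversity * max similarity
    to any already selected document.
    """
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < size:
        def mmr_score(candidate):
            _, relevance, vector = candidate
            redundancy = max((cosine(vector, s[2]) for s in selected), default=0.0)
            return (1 - diversity) * relevance - diversity * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [doc_id for doc_id, _, _ in selected]


docs = [
    ("a", 0.90, [1.0, 0.0]),
    ("b", 0.85, [1.0, 0.0]),  # near-duplicate of "a"
    ("c", 0.50, [0.0, 1.0]),  # less relevant but different
]
print(mmr_rerank(docs, size=2, diversity=0.0))
print(mmr_rerank(docs, size=2, diversity=0.5))
```

With `diversity` set to `0`, the ranking reduces to plain relevance order; with `0.5`, the near-duplicate "b" is passed over in favor of "c".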

Signed-off-by: Fanit Kolchina <[email protected]>

natebower

Editorial review

_search-plugins/search-pipelines/index.md

_search-plugins/search-pipelines/system-generated-search-processors.md

_vector-search/specialized-operations/vector-search-mmr.md

Signed-off-by: Nathan Bower <[email protected]>

natebower

LGTM

junqiu-lei · 2025-10-07T20:14:12Z

_search-plugins/search-pipelines/system-generated-search-processors.md

+
+The following table lists the available system-generated search response processors.
+
+| Processor name | Processor factory name | Execution stage    | Trigger condition                                          | Description                                                                                                                               |


We also need to add the semantic-highlighter processor.

| `semantic-highlighter` | `semantic-highlighter` | Runs after any user-defined response processors. | Triggered when a search request contains `semantic` type highlight with `options.batch_inference` set to `true`. See [Using semantic highlighting]({{site.url}}{{site.baseurl}}/tutorials/vector-search/semantic-highlighting-tutorial/). | batch inference processing semantic highlighting |

Reference PR: #11137

bzhangam requested review from AMoo-Miki, dlvenable, epugh, kolchfa-aws, natebower and sumobrian as code owners September 23, 2025 22:47

github-actions bot assigned kolchfa-aws Sep 23, 2025

heemin32 reviewed Sep 23, 2025

kolchfa-aws added the "Tech review" (PR: Tech review in progress), "release-notes" (PR: Include this PR in the automated release notes), and "v3.3.0" labels Sep 24, 2025

bzhangam force-pushed the main branch from e62e7ee to 292deae Compare September 24, 2025 17:06

Documentation for native MMR support

29682c6

Signed-off-by: Bo Zhang <[email protected]>

bzhangam force-pushed the main branch from 292deae to 29682c6 Compare September 24, 2025 17:19

heemin32 approved these changes Oct 1, 2025

kolchfa-aws added the "Doc review" (PR: Doc review in progress) label and removed the "Tech review" (PR: Tech review in progress) label Oct 1, 2025

kolchfa-aws reviewed Oct 2, 2025

Doc review

03c79e4

Signed-off-by: Fanit Kolchina <[email protected]>

bzhangam commented Oct 6, 2025

Address questions answered

344d7cf

Signed-off-by: Fanit Kolchina <[email protected]>

kolchfa-aws added the "Editorial review" (PR: Editorial review in progress) label and removed the "Doc review" (PR: Doc review in progress) label Oct 6, 2025

kolchfa-aws assigned natebower Oct 6, 2025

natebower reviewed Oct 7, 2025

Apply suggestions from code review

01a5b08

Signed-off-by: Nathan Bower <[email protected]>

natebower approved these changes Oct 7, 2025

natebower removed the "Editorial review" (PR: Editorial review in progress) label Oct 7, 2025

natebower merged commit 145c513 into opensearch-project:main Oct 7, 2025
6 checks passed

junqiu-lei reviewed Oct 7, 2025
