2 changes: 1 addition & 1 deletion _search-plugins/search-pipelines/collapse-processor.md
@@ -3,7 +3,7 @@ layout: default
title: Collapse
nav_order: 10
has_children: false
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
---

@@ -3,7 +3,7 @@ layout: default
title: Hybrid score explanation
nav_order: 15
has_children: false
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
---

@@ -3,7 +3,7 @@ layout: default
title: Filter query
nav_order: 20
has_children: false
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
---

25 changes: 13 additions & 12 deletions _search-plugins/search-pipelines/index.md
@@ -10,27 +10,22 @@

# Search pipelines

You can use _search pipelines_ to build new or reuse existing result rerankers, query rewriters, and other components that operate on queries or results. Search pipelines make it easier for you to process search queries and search results within OpenSearch. Moving some of your application functionality into an OpenSearch search pipeline reduces the overall complexity of your application. As part of a search pipeline, you specify a list of processors that perform modular tasks. You can then easily add or reorder these processors to customize search results for your application.
You can use _search pipelines_ to build new or reuse existing result rerankers, query rewriters, and other components that operate on queries or results. Search pipelines make it easier for you to process search queries and search results within OpenSearch. Moving some of your application functionality into an OpenSearch search pipeline reduces the overall complexity of your application. As part of a search pipeline, you specify a list of search processors that perform modular tasks. You can then easily add or reorder these processors to customize search results for your application.

## Terminology

The following is a list of search pipeline terminology:

* [_Search request processor_]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-processors#search-request-processors): A component that intercepts a search request (the query and the metadata passed in the request), performs an operation with or on the search request, and returns the search request.
* [_Search response processor_]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-processors#search-response-processors): A component that intercepts a search response and search request (the query, results, and metadata passed in the request), performs an operation with or on the search response, and returns the search response.
* [_Search phase results processor_]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-processors#search-phase-results-processors): A component that runs between search phases at the coordinating node level. A search phase results processor intercepts the results retrieved from one search phase and transforms them before passing them to the next search phase.
* [_Processor_]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-processors/): Either a search request processor or a search response processor.
* _Search pipeline_: An ordered list of processors that is integrated into OpenSearch. The pipeline intercepts a query, performs processing on the query, sends it to OpenSearch, intercepts the results, performs processing on the results, and returns them to the calling application, as shown in the following diagram.
When defined, a search pipeline is an ordered list of search processors that is integrated into OpenSearch. The pipeline shown in the following diagram intercepts a query, performs processing on the query, sends it to OpenSearch, intercepts the results, performs processing on the results, and returns them to the calling application.

![Search processor diagram]({{site.url}}{{site.baseurl}}/images/search-pipelines.png)

Both request and response processing for the pipeline are performed on the coordinator node, so there is no shard-level processing.
{: .note}

## Processors
## Search processors

To learn more about available search processors, see [Search processors]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-processors/).
Search processors can be classified by **execution phase** (when they run):

- [Search request processors]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-processors#search-request-processors): A _search request processor_ intercepts a search request (the query and the metadata passed in the request), performs an operation with or on the search request, and submits the search request to the index.
- [Search response processors]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-processors#search-response-processors): A _search response processor_ intercepts a search response and search request (the query, results, and metadata passed in the request), performs an operation with or on the search response, and returns the search response.
- [Search phase results processors]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-processors#search-phase-results-processors): A _search phase results processor_ runs between search phases at the coordinating node level. It intercepts the results retrieved from one search phase and transforms them before passing them to the next search phase.

## Example

@@ -77,6 +72,12 @@

To learn about retrieving details for an existing search pipeline, see [Retrieving search pipelines]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/retrieving-search-pipeline/).

## Manual and automatic processor creation

Search processors can be created manually or automatically:

- [User-defined processors]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-processors): Processors configured manually in search pipelines, like in the preceding [example](#example).

Check failure on line 79, column 28 in _search-plugins/search-pipelines/index.md (GitHub Actions / style-job, severity: ERROR)

[vale] reported by reviewdog: [OpenSearch.LinksEndSlash] Add a trailing slash to the link '({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-processors)'.

- [System-generated processors]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/system-generated-search-processors/): Processors automatically created by OpenSearch based on search request parameters.

## Search pipeline metrics

@@ -3,7 +3,7 @@ layout: default
title: ML inference (request)
nav_order: 30
has_children: false
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
---

@@ -3,7 +3,7 @@ layout: default
title: ML inference (response)
nav_order: 40
has_children: false
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
---

@@ -3,7 +3,7 @@ layout: default
title: Neural query enricher
nav_order: 50
has_children: false
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
---

@@ -2,7 +2,7 @@
layout: default
title: Neural sparse query two-phase
nav_order: 60
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
---

@@ -3,7 +3,7 @@ layout: default
title: Normalization
nav_order: 70
has_children: false
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
---

2 changes: 1 addition & 1 deletion _search-plugins/search-pipelines/oversample-processor.md
@@ -3,7 +3,7 @@ layout: default
title: Oversample
nav_order: 80
has_children: false
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
---

@@ -3,7 +3,7 @@ layout: default
title: Personalize search ranking
nav_order: 85
has_children: false
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
---

2 changes: 1 addition & 1 deletion _search-plugins/search-pipelines/rag-processor.md
@@ -3,7 +3,7 @@ layout: default
title: Retrieval-augmented generation
nav_order: 115
has_children: false
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
---

@@ -3,7 +3,7 @@ layout: default
title: Rename field
nav_order: 100
has_children: false
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
---

2 changes: 1 addition & 1 deletion _search-plugins/search-pipelines/rerank-processor.md
@@ -3,7 +3,7 @@ layout: default
title: Rerank
nav_order: 110
has_children: false
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
---

@@ -2,7 +2,7 @@
layout: default
title: Score ranker
has_children: false
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
nav_order: 117
---
2 changes: 1 addition & 1 deletion _search-plugins/search-pipelines/script-processor.md
@@ -3,7 +3,7 @@ layout: default
title: Script
nav_order: 120
has_children: false
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
---

@@ -1,7 +1,7 @@
---
layout: default
title: Search pipeline metrics
nav_order: 50
nav_order: 60
has_children: false
parent: Search pipelines
---
17 changes: 7 additions & 10 deletions _search-plugins/search-pipelines/search-processors.md
@@ -1,22 +1,20 @@
---
layout: default
title: Search processors
title: User-defined search processors
nav_order: 40
has_children: true
parent: Search pipelines
---

# Search processors
# User-defined search processors

Search processors can be of the following types:
**User-defined search processors** are processors that you manually configure in search pipelines to customize search behavior. You define these processors in your pipeline configuration and control their parameters, execution order, and conditions.

- [Search request processors](#search-request-processors)
- [Search response processors](#search-response-processors)
- [Search phase results processors](#search-phase-results-processors)
The following sections list all user-defined search processors available in OpenSearch. OpenSearch can also create processors automatically based on search request parameters. For more information, see [System-generated search processors]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/system-generated-search-processors/).

## Search request processors

A search request processor intercepts a search request (the query and the metadata passed in the request), performs an operation with or on the search request, and submits the search request to the index.
A _search request processor_ intercepts a search request (the query and the metadata passed in the request), performs an operation with or on the search request, and submits the search request to the index.

The following table lists all supported search request processors.

@@ -31,7 +29,7 @@ Processor | Description | Earliest available version

## Search response processors

A search response processor intercepts a search response and search request (the query, results, and metadata passed in the request), performs an operation with or on the search response, and returns the search response.
A _search response processor_ intercepts a search response and search request (the query, results, and metadata passed in the request), performs an operation with or on the search response, and returns the search response.

The following table lists all supported search response processors.

@@ -48,10 +46,9 @@ Processor | Description | Earliest available version
[`split`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/split-processor/)| Splits a string field into an array of substrings based on a specified delimiter. | 2.17
[`truncate_hits`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/)| Discards search hits after a specified target count is reached. Can undo the effect of the `oversample` request processor. | 2.12


## Search phase results processors

A search phase results processor runs between search phases at the coordinating node level. It intercepts the results retrieved from one search phase and transforms them before passing them to the next search phase.
A _search phase results processor_ runs between search phases at the coordinating node level. It intercepts the results retrieved from one search phase and transforms them before passing them to the next search phase.

The following table lists all supported search phase results processors.

2 changes: 1 addition & 1 deletion _search-plugins/search-pipelines/sort-processor.md
@@ -3,7 +3,7 @@ layout: default
title: Sort
nav_order: 130
has_children: false
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
---

2 changes: 1 addition & 1 deletion _search-plugins/search-pipelines/split-processor.md
@@ -3,7 +3,7 @@ layout: default
title: Split
nav_order: 140
has_children: false
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
---

@@ -0,0 +1,66 @@
---
layout: default
title: System-generated search processors
nav_order: 50
has_children: false
parent: Search pipelines
---

# System-generated search processors
**Introduced 3.3**
{: .label .label-purple }

System-generated search processors are processors that OpenSearch creates automatically based on the search request. Unlike [user-defined processors]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-processors/) that you manually configure in pipelines, system-generated processors are triggered automatically when certain features are used, eliminating the need for manual processor configuration.

## Enabling system-generated search processors

To enable system-generated search processor creation, set the `cluster.search.enabled_system_generated_factories` cluster setting to `*` (all factories) or explicitly list the factories you want to enable. The following example enables `mmr_over_sample_factory` and `mmr_rerank_factory`:

```json
PUT _cluster/settings
{
  "persistent": {
    "cluster.search.enabled_system_generated_factories": [
      "mmr_over_sample_factory",
      "mmr_rerank_factory"
    ]
  }
}
```
{% include copy-curl.html %}
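
As a rough illustration (not part of this PR), the same request body could be built programmatically. The helper name `build_factory_settings` is hypothetical, and passing `*` as a plain string is an assumption based on the description above; only the setting name and the two factory names come from the text.

```python
import json

# Setting name taken from the section above.
SETTING = "cluster.search.enabled_system_generated_factories"


def build_factory_settings(factories="*"):
    """Return a body for PUT _cluster/settings (hypothetical helper).

    `factories` is either the string "*" (all factories, an assumed
    encoding) or an iterable of factory names to enable explicitly.
    """
    if factories != "*":
        factories = list(factories)
    return {"persistent": {SETTING: factories}}


# Enable only the two MMR factories named in the example above.
body = build_factory_settings(["mmr_over_sample_factory", "mmr_rerank_factory"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the `PUT _cluster/settings` request shown above with any HTTP client.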

## Processor types

OpenSearch supports the following types of system-generated processors:

* [Search request processors](#system-generated-search-request-processors)
* [Search response processors](#system-generated-search-response-processors)

Each system-generated processor runs at a fixed execution stage, either before or after user-defined processors of the same type.
{: .note}
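
The ordering rule in the preceding note can be sketched as follows. This is a simplified model, not OpenSearch code: the function `order_processors` and the `"before"`/`"after"` stage labels are hypothetical, and the user-defined processor names are only placeholders. The stages used for `mmr_over_sample` and `mmr_rerank` follow the tables in the next sections.

```python
def order_processors(user_defined, system_generated):
    """Place system-generated processors around the user-defined ones.

    `system_generated` is a list of (name, stage) pairs, where stage is
    "before" or "after" (hypothetical labels for the fixed execution stage).
    """
    before = [name for name, stage in system_generated if stage == "before"]
    after = [name for name, stage in system_generated if stage == "after"]
    return before + list(user_defined) + after


# Request side: mmr_over_sample runs after user-defined request processors.
request_chain = order_processors(["filter_query"], [("mmr_over_sample", "after")])

# Response side: mmr_rerank runs before user-defined response processors.
response_chain = order_processors(["rename_field"], [("mmr_rerank", "before")])

print(request_chain)
print(response_chain)
```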

### System-generated search request processors

The following table lists the available system-generated search request processors.

| Processor name | Processor factory name | Execution stage | Trigger condition | Description |
| ----------------- | ------------------------- | ------------------- | ---------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `mmr_over_sample` | `mmr_over_sample_factory` | Runs after any user-defined request processors. | Triggered when a search request includes the `mmr` parameter in the `ext` object. See [Vector search with MMR reranking]({{site.url}}{{site.baseurl}}/vector-search/specialized-operations/vector-search-mmr/). | Adjusts the query size and `k` value of the `knn` or `neural` query to oversample candidates for maximal marginal relevance (MMR) reranking. |
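
The trigger condition in the table can be approximated on the client side with a small check. This is only a sketch: `triggers_mmr` is a hypothetical helper, and the contents of the `mmr` object are left empty because this page does not show its parameters.

```python
def triggers_mmr(search_body):
    """Return True if the request carries an `mmr` entry in its `ext` object."""
    ext = search_body.get("ext")
    return isinstance(ext, dict) and "mmr" in ext


# The query and the MMR options are omitted here; they are not shown on this page.
with_mmr = {"ext": {"mmr": {}}}
without_mmr = {"query": {"match_all": {}}}

print(triggers_mmr(with_mmr), triggers_mmr(without_mmr))
```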

### System-generated search response processors

The following table lists the available system-generated search response processors.

| Processor name | Processor factory name | Execution stage | Trigger condition | Description |

junqiu-lei (Member), Oct 7, 2025:

We also need to add the semantic-highlighter processor.

| `semantic-highlighter`   | `semantic-highlighter`   | Runs after any user-defined response processors. | Triggered when a search request contains `semantic` type highlight with `options.batch_inference` set to `true`. See [Using semantic highlighting]({{site.url}}{{site.baseurl}}/tutorials/vector-search/semantic-highlighting-tutorial/). | batch inference processing semantic highlighting  |

Reference PR: #11137

| -------------- | ---------------------- | ------------------ | ---------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| `mmr_rerank` | `mmr_rerank_factory` | Runs before any user-defined response processors. | Triggered when a search request includes the `mmr` parameter in the `ext` object. See [Vector search with MMR reranking]({{site.url}}{{site.baseurl}}/vector-search/specialized-operations/vector-search-mmr/). | Reranks the oversampled results using MMR and reduces them to the original query size. |

Contributor:

We might want to add an explanation of the execution stage here as well?

The execution stage determines whether a system-generated processor runs before or after user-defined processors of the same type.

Contributor Author:

ok

Collaborator:

@bzhangam @heemin32 So a given system-generated processor can run either before or after user-defined processors? Or does the system-generated processor type determine when the processor runs? So, for example, does mmr_rerank always run before the user-defined processors?

Contributor:

It is already fixed by the implementation. So `mmr_rerank` always runs before user-defined processors.

## Limitations

The following limitations apply to system-generated processors:

- OpenSearch supports only **one system-generated processor per processor type and execution stage** for a given search request. Since each processor type (request and response) can run at two execution stages (before or after user-defined processors), a single search request can include multiple system-generated processors, as long as they are of different types or run at different execution stages. This limitation ensures deterministic execution order and predictable behavior.
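
To make the limitation concrete, the following sketch counts how many system-generated processors claim each (processor type, execution stage) slot. It is a simplified model under stated assumptions, not the OpenSearch implementation: `find_conflicts`, the tuple layout, and the processor name `another_processor` are all hypothetical.

```python
from collections import Counter


def find_conflicts(processors):
    """Return (type, stage) slots claimed by more than one processor.

    `processors` is a list of (name, processor_type, stage) tuples.
    """
    slots = Counter((ptype, stage) for _, ptype, stage in processors)
    return sorted(slot for slot, count in slots.items() if count > 1)


# Different types and stages: allowed in the same search request.
ok = [
    ("mmr_over_sample", "request", "after"),
    ("mmr_rerank", "response", "before"),
]

# A second response processor at the "before" stage would violate the limit.
clash = ok + [("another_processor", "response", "before")]  # hypothetical name

print(find_conflicts(ok))
print(find_conflicts(clash))
```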

## Related pages

- [Vector search with MMR reranking]({{site.url}}{{site.baseurl}}/vector-search/specialized-operations/vector-search-mmr/)
@@ -3,7 +3,7 @@ layout: default
title: Truncate hits
nav_order: 150
has_children: false
parent: Search processors
parent: User-defined search processors
grand_parent: Search pipelines
---

3 changes: 3 additions & 0 deletions _vector-search/specialized-operations/index.md
@@ -13,6 +13,9 @@ cards:
- heading: "Radial search"
description: "Search all points in a vector space that reside within a specified maximum distance or minimum score threshold from a query point"
link: "/vector-search/specialized-operations/radial-search-knn/"
- heading: "Vector search with MMR reranking"
description: "Improve vector search results by automatically reranking for both relevance and diversity using maximal marginal relevance (MMR)"
link: "/vector-search/specialized-operations/vector-search-mmr/"
---

# Specialized vector search