Add documentation for pruning neural sparse vectors #8984

Open: wants to merge 12 commits into `main` (showing changes from 1 commit)
27 changes: 19 additions & 8 deletions _ingest-pipelines/processors/sparse-encoding.md
@@ -36,13 +36,30 @@ The following table lists the required and optional parameters for the `sparse_e
| Parameter | Data type | Required/Optional | Description |
|:---|:---|:---|:---|
`model_id` | String | Required | The ID of the model that will be used to generate the embeddings. The model must be deployed in OpenSearch before it can be used in neural search. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/) and [Neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/).
`prune_type` | String | Optional | The pruning strategy for sparse vectors. Valid values are `max_ratio`, `alpha_mass`, `top_k`, `abs_value`, and `none`. Default is `none`.
`prune_ratio` | Float | Optional | The ratio for the pruning strategy. Required when `prune_type` is specified.
`field_map` | Object | Required | Contains key-value pairs that specify the mapping of a text field to a `rank_features` field.
`field_map.<input_field>` | String | Required | The name of the field from which to obtain text for generating vector embeddings.
`field_map.<vector_field>` | String | Required | The name of the vector field in which to store the generated vector embeddings.
`description` | String | Optional | A brief description of the processor. |
`tag` | String | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. |
`batch_size` | Integer | Optional | Specifies the number of documents to be batched and processed each time. Default is `1`. |
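
The parameters above can be assembled into a processor definition as in the following minimal sketch. `build_sparse_encoding_processor` and `ratio_is_valid` are hypothetical helper names, not part of OpenSearch, and the range checks are a client-side reading of the prune options table in the "Sparse vectors prune" section; the processor's own validation may differ.

```python
import json
from typing import Dict, Optional

PRUNE_TYPES = {"max_ratio", "alpha_mass", "top_k", "abs_value", "none"}


def ratio_is_valid(prune_type: str, prune_ratio: float) -> bool:
    """Check prune_ratio against the ranges in the prune options table."""
    if prune_type in ("max_ratio", "alpha_mass"):
        return 0 <= prune_ratio < 1
    if prune_type == "abs_value":
        return prune_ratio > 0
    # top_k: a positive whole number
    return prune_ratio > 0 and float(prune_ratio).is_integer()


def build_sparse_encoding_processor(
    model_id: str,
    field_map: Dict[str, str],
    prune_type: str = "none",
    prune_ratio: Optional[float] = None,
) -> dict:
    """Build the `sparse_encoding` processor body (hypothetical helper)."""
    if prune_type not in PRUNE_TYPES:
        raise ValueError(f"unknown prune_type: {prune_type}")
    processor = {"model_id": model_id, "field_map": dict(field_map)}
    if prune_type != "none":
        # prune_ratio becomes required once a prune_type is provided.
        if prune_ratio is None:
            raise ValueError("prune_ratio is required when prune_type is set")
        if not ratio_is_valid(prune_type, prune_ratio):
            raise ValueError(f"prune_ratio {prune_ratio} is invalid for {prune_type}")
        processor["prune_type"] = prune_type
        processor["prune_ratio"] = prune_ratio
    return {"sparse_encoding": processor}


processor = build_sparse_encoding_processor(
    "aP2Q8ooBpBj3wT4HVS8a",
    {"passage_text": "passage_embedding"},
    prune_type="max_ratio",
    prune_ratio=0.1,
)
print(json.dumps(processor, indent=2))
```

The resulting object can be placed in the `processors` array of a pipeline definition.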

### Sparse vectors prune

The token weights in a sparse vector follow a long-tail distribution: tokens with low semantic importance occupy a large portion of the storage space. Pruning removes these less important tokens based on their weights, trading some search relevance for a much smaller index size.

To prune sparse vectors, configure the `prune_type` and `prune_ratio` parameters of the `sparse_encoding` processor. The following table lists the pruning options that the `sparse_encoding` processor supports.

| Prune type | Valid prune ratio | Description |
|:---|:---|:---|
`max_ratio` | Float [0, 1) | Prunes a sparse vector by keeping only elements whose values are within the `prune_ratio` of the maximum value in the vector, using the maximum value multiplied by `prune_ratio` as the threshold.
abs_value | Float in (0, +∞) | Prunes a sparse vector by removing elements with values below the prune_ratio.
Suggested change (Collaborator):
abs_value | Float in (0, +∞) | Prunes a sparse vector by removing elements with values below the prune_ratio.
`abs_value` | Float (0, +∞) | Prunes a sparse vector by removing elements with values below the `prune_ratio`.

alpha_mass | Float in [0, 1) | Prunes a sparse vector by keeping only elements whose cumulative sum of values is within the prune_ratio of the total sum.
Suggested change (Collaborator):
alpha_mass | Float in [0, 1) | Prunes a sparse vector by keeping only elements whose cumulative sum of values is within the prune_ratio of the total sum.
`alpha_mass` | Float [0, 1) | Prunes a sparse vector by keeping only elements whose cumulative sum of values is within the `prune_ratio` of the total sum.

`top_k` | Integer (0, +∞) | Prunes a sparse vector by keeping only the `prune_ratio` elements with the highest values.
`none` | N/A | Leaves the sparse vector unchanged.

Among all pruning options, `max_ratio` with a `prune_ratio` of `0.1` generalizes well across test datasets, reducing storage by approximately 40% at a cost of less than 1% in search relevance.
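
The prune options in the table can be sketched as follows. This is an illustration of the descriptions in the table, not the plugin's implementation: `prune_sparse_vector` is a hypothetical name, and the handling of values that fall exactly on a threshold (and of the element that crosses the `alpha_mass` budget) is an assumption.

```python
from typing import Dict


def prune_sparse_vector(
    vector: Dict[str, float], prune_type: str = "none", prune_ratio: float = 0.0
) -> Dict[str, float]:
    """Return a pruned copy of a token-to-weight map (illustrative only)."""
    if prune_type == "none" or not vector:
        return dict(vector)
    if prune_type == "max_ratio":
        # Threshold is the largest weight scaled by prune_ratio.
        threshold = max(vector.values()) * prune_ratio
        return {t: w for t, w in vector.items() if w >= threshold}
    if prune_type == "abs_value":
        # Remove elements whose weight is below prune_ratio.
        return {t: w for t, w in vector.items() if w >= prune_ratio}

    # The remaining strategies work on weights in descending order.
    ranked = sorted(vector.items(), key=lambda kv: kv[1], reverse=True)
    if prune_type == "top_k":
        return dict(ranked[: int(prune_ratio)])
    if prune_type == "alpha_mass":
        # Keep the heaviest elements while their cumulative sum stays
        # within prune_ratio of the total (boundary choice is an assumption).
        budget = sum(vector.values()) * prune_ratio
        kept, running = {}, 0.0
        for token, weight in ranked:
            if running + weight > budget:
                break
            running += weight
            kept[token] = weight
        return kept
    raise ValueError(f"unknown prune_type: {prune_type}")


# A few token weights taken from the example response in this document.
weights = {
    "hello": 6.696973,
    "world": 4.7300377,
    "?": 0.67785245,
    "tiny": 0.5462298,
    "so": 0.20279501,
}
pruned = prune_sparse_vector(weights, "max_ratio", 0.1)
print(list(pruned))  # ['hello', 'world', '?']
```

With `max_ratio` and `0.1`, the threshold is roughly 0.67 (one tenth of the weight of `hello`), so `tiny` and `so` are dropped while `?` is kept.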

## Using the processor

Follow these steps to use the processor in a pipeline. You must provide a model ID when creating the processor. For more information, see [Using custom models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/).
@@ -59,6 +76,8 @@ PUT /_ingest/pipeline/nlp-ingest-pipeline
{
"sparse_encoding": {
"model_id": "aP2Q8ooBpBj3wT4HVS8a",
"prune_type": "max_ratio",
"prune_ratio": 0.1,
"field_map": {
"passage_text": "passage_embedding"
}
@@ -111,23 +130,15 @@ The response confirms that in addition to the `passage_text` field, the processo
"worlds" : 2.7839446,
"yes" : 0.75845814,
"##world" : 2.5432441,
"born" : 0.2682308,
"nothing" : 0.8625516,
"goodbye" : 0.17146169,
"greeting" : 0.96817183,
"birth" : 1.2788506,
"come" : 0.1623208,
"global" : 0.4371151,
"it" : 0.42951578,
"life" : 1.5750692,
"thanks" : 0.26481047,
"world" : 4.7300377,
"tiny" : 0.5462298,
"earth" : 2.6555297,
"universe" : 2.0308156,
"worldwide" : 1.3903781,
"hello" : 6.696973,
"so" : 0.20279501,
"?" : 0.67785245
},
"passage_text" : "hello world"
2 changes: 2 additions & 0 deletions _search-plugins/neural-sparse-with-pipelines.md
@@ -229,6 +229,8 @@ PUT /_ingest/pipeline/nlp-ingest-pipeline-sparse
{
"sparse_encoding": {
"model_id": "<bi-encoder or doc-only model ID>",
"prune_type": "max_ratio",
"prune_ratio": 0.1,
"field_map": {
"passage_text": "passage_embedding"
}
@@ -23,7 +23,8 @@ Field | Data type | Description
:--- | :--- | :---
`enabled` | Boolean | Controls whether the two-phase processor is enabled. Default is `true`.
`two_phase_parameter` | Object | A map of key-value pairs representing the two-phase parameters and their associated values. You can specify the value of `prune_ratio`, `expansion_rate`, `max_window_size`, or any combination of these three parameters. Optional.
`two_phase_parameter.prune_ratio` | Float | A ratio that represents how to split the high-weight tokens and low-weight tokens. The threshold is the token's maximum score multiplied by its `prune_ratio`. Valid range is [0,1]. Default is `0.4`
`two_phase_parameter.prune_type` | String | The pruning strategy used to separate high-weight tokens from low-weight tokens. Default is `max_ratio`. For valid values, see [Sparse vectors prune]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/processors/sparse-encoding/#sparse-vectors-prune).
`two_phase_parameter.prune_ratio` | Float | A ratio that determines how high-weight tokens are separated from low-weight tokens. For the `max_ratio` prune type, the threshold is the maximum token score multiplied by `prune_ratio`, and the valid range is [0,1]. Default is `0.4`.
`two_phase_parameter.expansion_rate` | Float | The rate at which documents will be fine-tuned during the second phase. The second-phase document number equals the query size (default is 10) multiplied by its expansion rate. Valid range is greater than 1.0. Default is `5.0`
`two_phase_parameter.max_window_size` | Int | The maximum number of documents that can be processed using the two-phase processor. Valid range is greater than 50. Default is `10000`.
`tag` | String | The processor's identifier. Optional.
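
The `two_phase_parameter` fields described in this table can be illustrated with a minimal sketch for the default `max_ratio` prune type. The function names are hypothetical, the token weights are made up for illustration, and capping the second-phase document count at `max_window_size` is an assumption about how the two values interact rather than a statement of the processor's behavior.

```python
from typing import Dict, Tuple


def split_query_tokens(
    query_tokens: Dict[str, float], prune_ratio: float = 0.4
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split tokens into (high-weight, low-weight) groups using max_ratio."""
    if not query_tokens:
        return {}, {}
    # Threshold is the maximum token score multiplied by prune_ratio.
    threshold = max(query_tokens.values()) * prune_ratio
    high = {t: w for t, w in query_tokens.items() if w >= threshold}
    low = {t: w for t, w in query_tokens.items() if w < threshold}
    return high, low


def second_phase_doc_count(
    size: int = 10, expansion_rate: float = 5.0, max_window_size: int = 10000
) -> int:
    """Query size multiplied by expansion_rate, capped (assumed) by max_window_size."""
    return min(int(size * expansion_rate), max_window_size)


# Illustrative weights, not output from a real model.
high, low = split_query_tokens({"hello": 5.0, "world": 3.0, "tiny": 1.0})
print(high)  # {'hello': 5.0, 'world': 3.0}
print(low)   # {'tiny': 1.0}
print(second_phase_doc_count())  # 50
```

With the defaults, the threshold is 40% of the largest weight, and the second phase covers 10 × 5.0 = 50 documents.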