From 7d4ab4a97d0d029d741fc0cf10bd87f88fcb374a Mon Sep 17 00:00:00 2001
From: Dagney <110415401+dagneyb@users.noreply.github.com>
Date: Thu, 7 Dec 2023 09:16:48 -0800
Subject: [PATCH 01/16] Update
2023-12-05-improving-document-retrieval-with-spade-semantic-encoders.md
Signed-off-by: Dagney <110415401+dagneyb@users.noreply.github.com>
---
...ument-retrieval-with-spade-semantic-encoders.md | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/_posts/2023-12-05-improving-document-retrieval-with-spade-semantic-encoders.md b/_posts/2023-12-05-improving-document-retrieval-with-spade-semantic-encoders.md
index 13f5829085..eebb582491 100644
--- a/_posts/2023-12-05-improving-document-retrieval-with-spade-semantic-encoders.md
+++ b/_posts/2023-12-05-improving-document-retrieval-with-spade-semantic-encoders.md
@@ -10,16 +10,16 @@ date: 2023-12-05 01:00:00 -0700
categories:
- technical-posts
meta_keywords: search relevance, neural sparse search, semantic search, semantic search with sparse encoders
-meta_description: Learn how the neural sparse framework in OpenSearch 2.11 can help you improve search relevance and optimize semantic searches with spare encoders using just a few APIs.
+meta_description: Learn how the neural sparse framework in OpenSearch 2.11 can help you improve search relevance and optimize semantic searches with sparse encoders using just a few APIs.
has_science_table: true
---
-In our previous [blog post](https://opensearch.org/blog/semantic-science-benchmarks), one finding shared was that zero-shot semantic search based on dense encoders will have challenges when being applied to scenarios with unfamiliar corpus. This was highlighted with the [BEIR](https://github.com/beir-cellar/beir) benchmark, which consists of diverse retrieval tasks so that the “transferability” of a pretrained embedding model to unseen datasets can be evaluated.
+In our previous [blog post](https://opensearch.org/blog/semantic-science-benchmarks), one finding shared was that zero-shot semantic search based on dense encoders will have challenges when being applied to scenarios with an unfamiliar corpus. This was highlighted with the [BEIR](https://github.com/beir-cellar/beir) benchmark, which consists of diverse retrieval tasks, in order to evaluate the “transferability” of a pretrained embedding model to unseen datasets.
In this blog post, we will present Neural Sparse, our sparse semantic retrieval framework that is now the top-performing search method on the latest BEIR benchmark. You will learn about semantic search with sparse encoders as well as how to implement this method in OpenSearch with just a few API calls.
## Sparse Encoder is now a better choice
-When using transformer-based encoders (e.g. BERT) in traditional dense text embedding, the output of each position in the response layer is translated into a vector, projecting the text into a semantic vector space where distance correlates to similarity in meaning. Neural sparse conducts the process in a novel way that makes the encoder “vote” for the most representative BERT tokens. The vocabulary being adopted (WordPiece) contains most daily used words and also various suffixes, including tense suffixes (for example, ##ed, ##ing,) and common word roots (for example, ##ate, ##ion), where the symbol ## represents continuation. The vocabulary itself spans into a semantic space where all the documents can be regarded as sparse vectors.
+When using transformer-based encoders (e.g. BERT) in traditional dense text embedding, the output of each position in the response layer is translated into a vector, projecting the text into a semantic vector space where distance correlates to similarity in meaning. Neural sparse conducts the process in a novel way that makes the encoder “vote” for the most representative BERT tokens. The vocabulary being adopted (WordPiece) contains most daily used words and also various suffixes, including tense suffixes (for example, `-ed`, `-ing`,) and common word roots (for example, `-ate`, `-ion`). The vocabulary itself spans into a semantic space where all the documents can be regarded as sparse vectors.
@@ -37,16 +37,16 @@ When using transformer-based encoders (e.g. BERT) in traditional dense text embe
-Searching with dense embedding will present challenges when facing “unfamiliar” content. In this case, the encoder will produce unpredictable embeddings, leading to bad relevance. That is also why in some BEIR datasets that contain strong domain knowledge, BM25 is the still best performer. In these cases, sparse encoders will try to degenerate themselves into keyword-based matching, protecting the search result to be no worse than BM25. A relevance comparison is provided in **Table I**.
+Searching with dense embedding presents challenges when facing “unfamiliar” content. In this case, the encoder will produce unpredictable embeddings, leading to bad relevance. That is also why in some BEIR datasets that contain strong domain knowledge, BM25 performs best. In these cases, sparse encoders will try to degenerate themselves into keyword-based matching, protecting the search result to be no worse than BM25. A relevance comparison is provided in **Table I**.
In dense encoding, documents are usually represented as high-dimensional vectors; therefore, k-NN indexes need to be adopted in similarity search. On the contrary, the sparse encoding results are more similar to “term vectors” used by keyword-based matching; therefore, native Lucene indexes can be leveraged. Compared to k-NN indexes, sparse embeddings has the following advantages, leading to reduced costs: 1) Much smaller index size, 2) Reduced runtime RAM cost, and 3) Lower computation cost. The quantized comparison can be found in **Table II**.
### Try extreme efficiency with document-only encoders
-There are two modes supported by Neural Sparse: 1) with bi-encoders and 2) with document-only encoders. Bi-encoder mode is outlined above, while document-only mode, wherein the search queries are tokenized instead of being passed to deep encoders. In this mode, the document encoders are trained to learn more synonym association so as to increase the recall. And by eliminating the online inference phase, a few computational resources can be saved while the latency can also be reduced significantly. We can observe this in **Table II** by comparing “Neural Sparse Doc-only” with other solutions.
+There are two modes supported by Neural Sparse: 1) with bi-encoders and 2) with document-only encoders. Bi-encoder mode is outlined above, while in document-only mode search queries are tokenized instead of being passed to deep encoders. In this mode, the document encoders are trained to learn more synonym association in order to increase the recall. And by eliminating the online inference phase, computational resources can be saved and latency can also be reduced significantly. We can observe this in **Table II** by comparing “Neural Sparse Doc-only” with other solutions.
## Neural Sparse Search outperforms in Benchmarking
-We have conducted some benchmarking using a cluster containing 3 r5.8xlarge data nodes and 1 r5.12xlarge leader&ml node. First, all the evaluated methods are compared in terms of NCDG@10. Then we also compare the runtime speed of each method as well as the resource cost.
+We have conducted benchmarking using a cluster containing 3 r5.8xlarge data nodes and 1 r5.12xlarge leader&ml node. First, all the evaluated methods are compared in terms of NCDG@10. We also compare the runtime speed of each method as well as the resource cost.
Key takeaways:
@@ -443,4 +443,4 @@ Congratulations! Now you have your own semantic search engine based on sparse en
Here are two parameters:
- **“model_id” (string)**: The ID of the model that will be used to generate tokens and weights from the query text. The model must be indexed in OpenSearch before it can be used in neural search. A sparse encoding model will expand the tokens from query text, while the tokenizer model will only generate the token inside the query text.
-- **“max_token_score” (float)**: An extra parameter required for performance optimization. Just like the common procedure of OpenSearch match query, the neural_sparse query is transformed to a Lucene BooleanQuery combining disjunction of term-level sub-queries. The difference is we use FeatureQuery instead of TermQuery for term here. Lucene leverages the WAND (Weak AND) algorithm for dynamic pruning, which skips non-competitive tokens based on their score upper bounds. However, FeatureQuery uses FLOAT.MAX_VALUE as the score upper bound, which makes WAND optimization ineffective. The parameter resets the upper bound of each token in this query, and the default value is FLOAT.MAX_VALUE, which is consistent with the origin FeatureQuery. Setting the value to “3.5” for the bi-encoder model and “2” for the document-only model can accelerate search without precision loss. After OpenSearch is upgraded to Lucene version 9.8, this parameter will be deprecated.
+- **“max_token_score” (float)**: An extra parameter required for performance optimization. Just like the common OpenSearch match query, the neural_sparse query is transformed to a Lucene BooleanQuery combining disjunction of term-level sub-queries. The difference is we use FeatureQuery instead of TermQuery for term here. Lucene leverages the WAND (Weak AND) algorithm for dynamic pruning, which skips non-competitive tokens based on their score upper bounds. However, FeatureQuery uses FLOAT.MAX_VALUE as the score upper bound, which makes WAND optimization ineffective. The parameter resets the upper bound of each token in this query, and the default value is FLOAT.MAX_VALUE, which is consistent with the origin FeatureQuery. Setting the value to “3.5” for the bi-encoder model and “2” for the document-only model can accelerate search without precision loss. After OpenSearch is upgraded to Lucene version 9.8, this parameter will be deprecated.
From f201f17c06984ae9f940ecda185136d7ccc9385d Mon Sep 17 00:00:00 2001
From: Fanit Kolchina
Date: Thu, 7 Dec 2023 22:05:15 -0500
Subject: [PATCH 02/16] Doc rewrites
Signed-off-by: Fanit Kolchina
---
...-retrieval-with-spade-semantic-encoders.md | 446 ----------------
...retrieval-with-sparse-semantic-encoders.md | 483 ++++++++++++++++++
2 files changed, 483 insertions(+), 446 deletions(-)
delete mode 100644 _posts/2023-12-05-improving-document-retrieval-with-spade-semantic-encoders.md
create mode 100644 _posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
diff --git a/_posts/2023-12-05-improving-document-retrieval-with-spade-semantic-encoders.md b/_posts/2023-12-05-improving-document-retrieval-with-spade-semantic-encoders.md
deleted file mode 100644
index eebb582491..0000000000
--- a/_posts/2023-12-05-improving-document-retrieval-with-spade-semantic-encoders.md
+++ /dev/null
@@ -1,446 +0,0 @@
----
-layout: post
-title: Improving document retrieval with sparse semantic encoders
-authors:
- - zhichaog
- - xinyual
- - dagney
- - yych
-date: 2023-12-05 01:00:00 -0700
-categories:
- - technical-posts
-meta_keywords: search relevance, neural sparse search, semantic search, semantic search with sparse encoders
-meta_description: Learn how the neural sparse framework in OpenSearch 2.11 can help you improve search relevance and optimize semantic searches with sparse encoders using just a few APIs.
-has_science_table: true
----
-
-In our previous [blog post](https://opensearch.org/blog/semantic-science-benchmarks), one finding shared was that zero-shot semantic search based on dense encoders will have challenges when being applied to scenarios with an unfamiliar corpus. This was highlighted with the [BEIR](https://github.com/beir-cellar/beir) benchmark, which consists of diverse retrieval tasks, in order to evaluate the “transferability” of a pretrained embedding model to unseen datasets.
-
-In this blog post, we will present Neural Sparse, our sparse semantic retrieval framework that is now the top-performing search method on the latest BEIR benchmark. You will learn about semantic search with sparse encoders as well as how to implement this method in OpenSearch with just a few API calls.
-
-## Sparse Encoder is now a better choice
-When using transformer-based encoders (e.g. BERT) in traditional dense text embedding, the output of each position in the response layer is translated into a vector, projecting the text into a semantic vector space where distance correlates to similarity in meaning. Neural sparse conducts the process in a novel way that makes the encoder “vote” for the most representative BERT tokens. The vocabulary being adopted (WordPiece) contains most daily used words and also various suffixes, including tense suffixes (for example, `-ed`, `-ing`,) and common word roots (for example, `-ate`, `-ion`). The vocabulary itself spans into a semantic space where all the documents can be regarded as sparse vectors.
-
- _Figure 1: Left: words encoded in the dense vector space. Right: A typical result of sparse encoding._
-
-Searching with dense embedding presents challenges when facing “unfamiliar” content. In this case, the encoder will produce unpredictable embeddings, leading to bad relevance. That is also why in some BEIR datasets that contain strong domain knowledge, BM25 performs best. In these cases, sparse encoders will try to degenerate themselves into keyword-based matching, protecting the search result to be no worse than BM25. A relevance comparison is provided in **Table I**.
-
-In dense encoding, documents are usually represented as high-dimensional vectors; therefore, k-NN indexes need to be adopted in similarity search. On the contrary, the sparse encoding results are more similar to “term vectors” used by keyword-based matching; therefore, native Lucene indexes can be leveraged. Compared to k-NN indexes, sparse embeddings has the following advantages, leading to reduced costs: 1) Much smaller index size, 2) Reduced runtime RAM cost, and 3) Lower computation cost. The quantized comparison can be found in **Table II**.
-
-### Try extreme efficiency with document-only encoders
-There are two modes supported by Neural Sparse: 1) with bi-encoders and 2) with document-only encoders. Bi-encoder mode is outlined above, while in document-only mode search queries are tokenized instead of being passed to deep encoders. In this mode, the document encoders are trained to learn more synonym association in order to increase the recall. And by eliminating the online inference phase, computational resources can be saved and latency can also be reduced significantly. We can observe this in **Table II** by comparing “Neural Sparse Doc-only” with other solutions.
-
-## Neural Sparse Search outperforms in Benchmarking
-
-We have conducted benchmarking using a cluster containing 3 r5.8xlarge data nodes and 1 r5.12xlarge leader&ml node. First, all the evaluated methods are compared in terms of NCDG@10. We also compare the runtime speed of each method as well as the resource cost.
-
-Key takeaways:
-
-* Both bi-encoder and document-only mode generate the highest relevance on the BEIR benchmark, along with the Amazon ESCI dataset.
-* Without online inference, the search latency of document-only mode is comparable to BM25.
-* Neural sparse search have much smaller index size than dense encoding. A document-only encoder generates an index with 10.4% of dense encoding’s index size, while the number for a bi-encoder is 7.2%.
-* Dense encoding adopts k-NN retrieval and will have a 7.9% increase in RAM cost when search traffic received. Neural sparse search is based on native Lucene, and the RAM cost will not increase in runtime.
-
-
-The detailed results are presented in the following tables.
-
-Table I. Relevance comparison on BEIR* benchmark and Amazon ESCI, in the term of both NDCG@10 and the rank.
-
-| Dataset | BM25 NDCG | Rank | Dense(with TAS-B model) NDCG | Rank | Hybrid(Dense + BM25) NDCG | Rank | Neural Sparse Search bi-encoder NDCG | Rank | Neural Sparse Search doc-only NDCG | Rank |
-|---|---|---|---|---|---|---|---|---|---|---|
-| Trec Covid | 0.688 | 4 | 0.481 | 5 | 0.698 | 3 | 0.771 | 1 | 0.707 | 2 |
-| NFCorpus | 0.327 | 4 | 0.319 | 5 | 0.335 | 3 | 0.36 | 1 | 0.352 | 2 |
-| NQ | 0.326 | 5 | 0.463 | 3 | 0.418 | 4 | 0.553 | 1 | 0.521 | 2 |
-| HotpotQA | 0.602 | 4 | 0.579 | 5 | 0.636 | 3 | 0.697 | 1 | 0.677 | 2 |
-| FiQA | 0.254 | 5 | 0.3 | 4 | 0.322 | 3 | 0.376 | 1 | 0.344 | 2 |
-| ArguAna | 0.472 | 2 | 0.427 | 4 | 0.378 | 5 | 0.508 | 1 | 0.461 | 3 |
-| Touche | 0.347 | 1 | 0.162 | 5 | 0.313 | 2 | 0.278 | 4 | 0.294 | 3 |
-| DBPedia | 0.287 | 5 | 0.383 | 4 | 0.387 | 3 | 0.447 | 1 | 0.412 | 2 |
-| SCIDOCS | 0.165 | 2 | 0.149 | 5 | 0.174 | 1 | 0.164 | 3 | 0.154 | 4 |
-| FEVER | 0.649 | 5 | 0.697 | 4 | 0.77 | 2 | 0.821 | 1 | 0.743 | 3 |
-| Climate FEVER | 0.186 | 5 | 0.228 | 3 | 0.251 | 2 | 0.263 | 1 | 0.202 | 4 |
-| SciFact | 0.69 | 3 | 0.643 | 5 | 0.672 | 4 | 0.723 | 1 | 0.716 | 2 |
-| Quora | 0.789 | 4 | 0.835 | 3 | 0.864 | 1 | 0.856 | 2 | 0.788 | 5 |
-| Amazon ESCI | 0.081 | 3 | 0.071 | 5 | 0.086 | 2 | 0.077 | 4 | 0.095 | 1 |
-| Average | 0.419 | 3.71 | 0.41 | 4.29 | 0.45 | 2.71 | 0.492 | 1.64 | 0.462 | 2.64 |
-
-***BEIR** is short for Benchmarking Information Retrieval, check our its [Github](https://github.com/beir-cellar/beir) page.
-
-Table II.Speed Comparison, in the term of latency and throughput
-
-| | BM25 | Dense (with TAS-B model) | Neural Sparse Search bi-encoder | Neural Sparse Search doc-only |
-|---------------------------|---------------|---------------------------| ------------------------------- | ------------------------------ |
-| P50 latency (ms) | 8ms | 56.6ms |176.3ms | 10.2ms |
-| P90 latency (ms) | 12.4ms | 71.12ms |267.3ms | 15.2ms |
-| P99 Latency (ms) | 18.9ms | 86.8ms |383.5ms | 22ms |
-| Max throughput (op/s) | 2215.8op/s | 318.5op/s |107.4op/s | 1797.9op/s |
-| Mean throughput (op/s) | 2214.6op/s | 298.2op/s |106.3op/s | 1790.2op/s |
-
-
-*The latencies were tested on a subset of MSMARCO v2, with in total 1M documents. We used 20 clients to loop search requests to get the latency data.
-
-Table III.Capacity consumption comparison
-
-| |BM25 |Dense (with TAS-B model) |Neural Sparse Search Bi-encoder | Neural Sparse Search Doc-only |
-|-|-|-|-|-|
-|Index size |1 GB |65.4 GB |4.7 GB |6.8 GB |
-|RAM usage |480.74 GB |675.36 GB |480.64 GB |494.25 GB |
-|Runtime RAM delta |+0.01 GB |+53.34 GB |+0.06 GB |+0.03 GB |
-
-*We performed this experiment using the full dataset of MSMARCO v2, with 8.8M passages. We excluded all _source fields for all methods and force merged the index before measuring index size. We set the heap size of the OpenSearch JVM to half the node RAM, so an empty OpenSearch cluster also consumes close to 480 GB of memory.
-
-## Build your search engine in five steps
-
-Several pretrained encoder models are published in the OpenSearch model repository. As the state-of-the-art of BEIR benchmark, they are already available for out-of-the-box use, reducing fine-tuning effort. You can follow these three steps to build your search engine:
-
-1. **Prerequisites**: To run the following simple cases in the cluster, change the settings:
-
- ```
- PUT /_cluster/settings
- {
- "transient" : {
- "plugins.ml_commons.allow_registering_model_via_url" : true,
- "plugins.ml_commons.only_run_on_ml_node" : false,
- "plugins.ml_commons.native_memory_threshold" : 99
- }
- }
- ```
-
- **allow_registering_model_via_url** is required to be true because you need to register your pretrained model by URL. Set **only_run_on_ml_node** to false if you don’t have a machine learning (ML) node on your cluster.
-2. **Deploy encoders**: The ML Commons plugin supports deploying pretrained models via URL. Taking `opensearch-neural-sparse-encoding` as an example, you can deploy the encoder via this API:
-
- ```
- POST /_plugins/_ml/models/_register?deploy=true
- {
- "name": "opensearch-neural-sparse-encoding",
- "version": "1.0.0",
- "description": "opensearch-neural-sparse-encoding",
- "model_format": "TORCH_SCRIPT",
- "function_name": "SPARSE_ENCODING",
- "model_content_hash_value": "d1ebaa26615090bdb0195a62b180afd2a8524c68c5d406a11ad787267f515ea8",
- "url": "https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v1-1.0.1-torch_script.zip"
- }
- ```
-
- After that, you will get the task_id in your response:
-
- ```
- {
- "task_id": "",
- "status": "CREATED"
- }
- ```
-
- Use task_id to search register model task like:
-
- ```
- GET /_plugins/_ml/tasks/
- ```
-
- You can get register model task information. The state will change. After the state is completed, you can get the model_id like::
-
- ```
- {
- "model_id": "",
- "task_type": "REGISTER_MODEL",
- "function_name": "SPARSE_TOKENIZE",
- "state": "COMPLETED",
- "worker_node": [
- "wubXZX7xTIC7RW2z8nzhzw"
- ],
- "create_time": 1701390988405,
- "last_update_time": 1701390993724,
- "is_async": true
- }
- ```
-
-3. **Set up the ingestion process**: Each document should be encoded into sparse vectors before being indexed. In OpenSearch, this procedure is implemented by an ingestion processor. You can create the ingestion pipeline using this API:
-
- ```
- PUT /_ingest/pipeline/neural-sparse-pipeline
- {
- "description": "An example neural sparse encoding pipeline",
- "processors" : [
- {
- "sparse_encoding": {
- "model_id": "",
- "field_map": {
- "passage_text": "passage_embedding"
- }
- }
- }
- ]
- }
- ```
-
-4. **Set up index mapping**: Neural search leverages the `rank_features` field type for indexing, such that the token weights can be stored. The index will use the above ingestion processor to embed text. The index can be created as follows:
-
- ```
- PUT /my-neural-sparse-index
- {
- "settings": {
- "default_pipeline": "neural-sparse-pipeline"
- },
- "mappings": {
- "properties": {
- "passage_embedding": {
- "type": "rank_features"
- },
- "passage_text": {
- "type": "text"
- }
- }
- }
- }
- ```
-
-5. **Ingest documents with the ingestion processor**: After setting index, customer can put doc. Customer provide text field while processor will automatically transfer text content into embedding vector and put it into `rank_features` field according the `field_map` in the processor:
-
- ```
- PUT /my-neural-sparse-index/_doc/
- {
- "passage_text": "Hello world"
- }
- ```
-
-### Model selection
-
-Neural sparse has two working modes: bi-encoder and document-only. For bi-encoder mode, we recommend using the pretrained model named “opensearch-neural-sparse-encoding-v1”, while both online search and offline ingestion share the same model file. For document-only mode, we recommended using the pretrained model “opensearch-neural-sparse-encoding-doc-v1” for the ingestion processor and using the model “opensearch-neural-sparse-tokenizer-v1” to implement online query tokenization. Altough presented as a “ml-commons” model, “opensearch-neural-sparse-tokenizer-v1” only translates the query into tokens without any model inference. All the models are published [here](https://opensearch.org/docs/latest/ml-commons-plugin/pretrained-models/).
-
-### **Try your engine with a query clause**
-
-Congratulations! Now you have your own semantic search engine based on sparse encoders. To try a sample query, we can invoke the `_search` endpoint using the `neural_sparse` clause in query DSL:
-
-```
- GET /my-neural-sparse-index/_search/
- {
- "query": {
- "neural_sparse": {
- "passage_embedding": {
- "query_text": "Hello world a b",
- "model_id": "",
- "max_token_score": 2.0
- }
- }
- }
-}
-```
-
-Here are two parameters:
-- **“model_id” (string)**: The ID of the model that will be used to generate tokens and weights from the query text. The model must be indexed in OpenSearch before it can be used in neural search. A sparse encoding model will expand the tokens from query text, while the tokenizer model will only generate the token inside the query text.
-- **“max_token_score” (float)**: An extra parameter required for performance optimization. Just like the common OpenSearch match query, the neural_sparse query is transformed to a Lucene BooleanQuery combining disjunction of term-level sub-queries. The difference is we use FeatureQuery instead of TermQuery for term here. Lucene leverages the WAND (Weak AND) algorithm for dynamic pruning, which skips non-competitive tokens based on their score upper bounds. However, FeatureQuery uses FLOAT.MAX_VALUE as the score upper bound, which makes WAND optimization ineffective. The parameter resets the upper bound of each token in this query, and the default value is FLOAT.MAX_VALUE, which is consistent with the origin FeatureQuery. Setting the value to “3.5” for the bi-encoder model and “2” for the document-only model can accelerate search without precision loss. After OpenSearch is upgraded to Lucene version 9.8, this parameter will be deprecated.
diff --git a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
new file mode 100644
index 0000000000..b54e4975e6
--- /dev/null
+++ b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
@@ -0,0 +1,483 @@
+---
+layout: post
+title: Improving document retrieval with sparse semantic encoders
+authors:
+ - zhichaog
+ - xinyual
+ - dagney
+ - yych
+ - kolchfa
+date: 2023-12-05 01:00:00 -0700
+categories:
+ - technical-posts
+meta_keywords: search relevance, neural sparse search, semantic search, semantic search with sparse encoders
+meta_description: Learn how the neural sparse framework in OpenSearch 2.11 can help you improve search relevance and optimize semantic searches with sparse encoders using just a few APIs.
+has_science_table: true
+---
+
+OpenSearch 2.11 introduced neural sparse search---a new, efficient way of semantic retrieval. In this blog post, you'll learn about using sparse encoders for semantic search. You'll find that neural sparse search reduces costs, performs faster, and improves search relevance. We're excited to share benchmarking results that show why neural sparse search is now the top-performing search method. You can even try it out by building your own search engine in just five steps. For a TLDR on benchmarking learnings, see [Key takeaways](#here-are-the-key-takeaways).
+
+## What are dense and sparse vector embeddings?
+
+When you use a transformer-based encoder, such as BERT, to generate traditional dense vector embeddings, the encoder translates each word into a vector. Collectively, these vectors make up a semantic vector space. In this space, the closer the vectors are, the more similar the words are in meaning.
+
+In sparse encoding, the encoder takes the text and creates a list of tokens that have similar semantic meaning. The model vocabulary ([WordPiece](https://huggingface.co/learn/nlp-course/chapter6/6?fw=pt)) contains most commonly used words along with various tense endings (for example, `-ed` and `-ing`) and suffixes (for example, `-ate` and `-ion`). You can think of the vocabulary as a semantic space where each document is a sparse vector.
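+
+To make that concrete, here is a minimal sketch of what a sparse encoding can look like for a short passage. The token weights are invented for illustration (they are not real model output); this token-to-weight map is the shape of data that is later stored in a `rank_features` field:
+
+```json
+{
+  "passage_embedding": {
+    "hello": 2.31,
+    "world": 1.87,
+    "greeting": 0.94,
+    "planet": 0.61
+  }
+}
+```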
+
+The following images show example results of dense and sparse encoding.
+
+_**Left**: Dense vector semantic space. **Right**: Sparse vector semantic space._
+
+## Sparse encoders use more efficient data structures
+
+In dense encoding, documents are represented as high-dimensional vectors. To search these documents, you need to use a k-NN index as an underlying data structure. On the other hand, sparse search can use a native Lucene index because sparse encodings are similar to term vectors used by keyword-based matching.
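+
+As a rough sketch of the difference at the mapping level (the index and field names here are placeholders, and the dense example assumes the k-NN plugin with a 768-dimensional model), a dense field is declared as a `knn_vector`, while a sparse field is simply a `rank_features` field backed by a native Lucene index:
+
+```json
+PUT /dense-index
+{
+  "settings": { "index.knn": true },
+  "mappings": {
+    "properties": {
+      "passage_dense": { "type": "knn_vector", "dimension": 768 }
+    }
+  }
+}
+
+PUT /sparse-index
+{
+  "mappings": {
+    "properties": {
+      "passage_sparse": { "type": "rank_features" }
+    }
+  }
+}
+```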
+
+Compared to k-NN indexes, **sparse embeddings have the following cost-reducing advantages**:
+
+1. Much smaller index size.
+1. Reduced runtime RAM cost.
+1. Lower computational cost.
+
+For a detailed comparison, see [Table II](#table-ii-speed-comparison-in-terms-of-latency-and-throughput).
+
+## Sparse encoders perform better on unfamiliar datasets
+
+In our previous [blog post](https://opensearch.org/blog/semantic-science-benchmarks), we mentioned that searching with dense embeddings presents challenges when encoders are facing unfamiliar content. When an encoder trained on one dataset is used on a different dataset, the encoder often produces unpredictable embeddings, resulting in poor search result relevance.
+
+Often, BM25 performs better than dense encoders on BEIR datasets that incorporate strong domain knowledge. In these cases, sparse encoders fall back on keyword-based matching, ensuring that their search results are no worse than BM25 ones. For a comparison of search result relevance benchmarks, see [Table I](#benchmarking-results).
+
+## Out of sparse encoders, document-only encoders are the most efficient
+
+You can run a neural sparse search in two modes: **bi-encoder** and **document-only**.
+
+In bi-encoder mode, both documents and search queries are passed through deep encoders. In document-only mode, documents are still passed through deep encoders, but search queries are instead tokenized. In this mode, document encoders are trained to learn more synonym association in order to increase recall. By eliminating the online inference phase, you can **save computational resources** and **significantly reduce latency**. For benchmarks, compare the `Neural sparse doc-only` column with other columns in [Table II](#table-ii-speed-comparison-in-terms-of-latency-and-throughput).
+
+## Neural sparse search outperforms other search methods in benchmarking tests
+
+For benchmarking, we used a cluster containing 3 `r5.8xlarge` data nodes and 1 `r5.12xlarge` leader/ML node. We evaluated search relevance for all evaluated search methods in terms of NCDG@10. Additionally, we compared the runtime speed and the resource cost of each method.
+
+### Here are the key takeaways:
+
+* Both bi-encoder and document-only modes provide the highest relevance on the BEIR and Amazon ESCI datasets.
+* Without online inference, the search latency of document-only mode is comparable to BM25.
+* Sparse encoding results in a much smaller index size than dense encoding. The size of an index a document-only sparse encoder generates is **10.4%** of a dense encoding index size. For a bi-encoder, the index size is **7.2%** of a dense encoding index size.
+* Dense encoding uses k-NN retrieval and incurs a 7.9% increase in RAM cost at search time. Neural sparse search uses a native Lucene index, so the RAM cost does not increase at search time.
+
+## Benchmarking results
+
+The benchmarking results are presented in the following tables.
+
+### Table I. Relevance comparison on BEIR* benchmark and Amazon ESCI, in terms of NDCG@10 and rank.
+
+| Dataset | BM25 NDCG | Rank | Dense(with TAS-B model) NDCG | Rank | Hybrid(Dense + BM25) NDCG | Rank | Neural sparse search bi-encoder NDCG | Rank | Neural sparse search doc-only NDCG | Rank |
+|---|---|---|---|---|---|---|---|---|---|---|
+| Trec Covid | 0.688 | 4 | 0.481 | 5 | 0.698 | 3 | 0.771 | 1 | 0.707 | 2 |
+| NFCorpus | 0.327 | 4 | 0.319 | 5 | 0.335 | 3 | 0.36 | 1 | 0.352 | 2 |
+| NQ | 0.326 | 5 | 0.463 | 3 | 0.418 | 4 | 0.553 | 1 | 0.521 | 2 |
+| HotpotQA | 0.602 | 4 | 0.579 | 5 | 0.636 | 3 | 0.697 | 1 | 0.677 | 2 |
+| FiQA | 0.254 | 5 | 0.3 | 4 | 0.322 | 3 | 0.376 | 1 | 0.344 | 2 |
+| ArguAna | 0.472 | 2 | 0.427 | 4 | 0.378 | 5 | 0.508 | 1 | 0.461 | 3 |
+| Touche | 0.347 | 1 | 0.162 | 5 | 0.313 | 2 | 0.278 | 4 | 0.294 | 3 |
+| DBPedia | 0.287 | 5 | 0.383 | 4 | 0.387 | 3 | 0.447 | 1 | 0.412 | 2 |
+| SCIDOCS | 0.165 | 2 | 0.149 | 5 | 0.174 | 1 | 0.164 | 3 | 0.154 | 4 |
+| FEVER | 0.649 | 5 | 0.697 | 4 | 0.77 | 2 | 0.821 | 1 | 0.743 | 3 |
+| Climate FEVER | 0.186 | 5 | 0.228 | 3 | 0.251 | 2 | 0.263 | 1 | 0.202 | 4 |
+| SciFact | 0.69 | 3 | 0.643 | 5 | 0.672 | 4 | 0.723 | 1 | 0.716 | 2 |
+| Quora | 0.789 | 4 | 0.835 | 3 | 0.864 | 1 | 0.856 | 2 | 0.788 | 5 |
+| Amazon ESCI | 0.081 | 3 | 0.071 | 5 | 0.086 | 2 | 0.077 | 4 | 0.095 | 1 |
+| Average | 0.419 | 3.71 | 0.41 | 4.29 | 0.45 | 2.71 | 0.492 | 1.64 | 0.462 | 2.64 |
+
+* BEIR is short for Benchmarking Information Retrieval. For more information, see [the BEIR GitHub page](https://github.com/beir-cellar/beir).
+
+### Table II. Speed comparison, in terms of latency and throughput.
+
+| | BM25 | Dense (with TAS-B model) | Neural sparse search bi-encoder | Neural sparse search doc-only |
+|---------------------------|---------------|---------------------------| ------------------------------- | ------------------------------ |
+| P50 latency (ms) | 8 ms | 56.6 ms |176.3 ms | 10.2ms |
+| P90 latency (ms) | 12.4 ms | 71.12 ms |267.3 ms | 15.2ms |
+| P99 Latency (ms) | 18.9 ms | 86.8 ms |383.5 ms | 22ms |
+| Max throughput (op/s) | 2215.8 op/s | 318.5 op/s |107.4 op/s | 1797.9 op/s |
+| Mean throughput (op/s) | 2214.6 op/s | 298.2 op/s |106.3 op/s | 1790.2 op/s |
+
+
+* We tested latency on a subset of MSMARCO v2, with 1M documents in total. To obtain latency data, we used 20 clients to loop search requests.
+
+### Table III. Capacity consumption comparison
+
+| |BM25 |Dense (with TAS-B model) |Neural sparse search bi-encoder | Neural sparse search doc-only |
+|-|-|-|-|-|
+|Index size |1 GB |65.4 GB |4.7 GB |6.8 GB |
+|RAM usage |480.74 GB |675.36 GB |480.64 GB |494.25 GB |
+|Runtime RAM delta |+0.01 GB |+53.34 GB |+0.06 GB |+0.03 GB |
+
+* We performed this experiment using the full MSMARCO v2 dataset, with 8.8M passages. For all methods, we excluded the `_source` fields and force merged the index before measuring index size. We set the heap size of the OpenSearch JVM to half of the node RAM, so an empty OpenSearch cluster still consumed close to 480 GB of memory.
+
+## Build your search engine in five steps
+
+OpenSearch provides several pretrained encoder models that you can use out-of-the-box without fine-tuning. For more information about sparse encoding models provided by OpenSearch, see [Sparse encoding models](https://opensearch.org/docs/latest/ml-commons-plugin/pretrained-models/#sparse-encoding-models).
+
+Follow these steps to build your search engine:
+
+1. **Prerequisites**: For this simple setup, update the following cluster settings:
+
+ ```json
+ PUT /_cluster/settings
+ {
+ "transient": {
+ "plugins.ml_commons.allow_registering_model_via_url": true,
+ "plugins.ml_commons.only_run_on_ml_node": false,
+ "plugins.ml_commons.native_memory_threshold": 99
+ }
+ }
+ ```
+
+ For more information about ML-related cluster settings, see [ML Commons cluster settings](https://opensearch.org/docs/latest/ml-commons-plugin/cluster-settings/).
+2. **Deploy encoders**: The ML Commons plugin supports deploying pretrained models using a URL. For this example, you'll deploy the `opensearch-neural-sparse-encoding` encoder:
+
+ ```json
+ POST /_plugins/_ml/models/_register?deploy=true
+ {
+ "name": "opensearch-neural-sparse-encoding",
+ "version": "1.0.0",
+ "description": "opensearch-neural-sparse-encoding",
+ "model_format": "TORCH_SCRIPT",
+ "function_name": "SPARSE_ENCODING",
+ "model_content_hash_value": "d1ebaa26615090bdb0195a62b180afd2a8524c68c5d406a11ad787267f515ea8",
+ "url": "https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v1-1.0.1-torch_script.zip"
+ }
+ ```
+
+ OpenSearch responds with a `task_id`:
+
+ ```json
+ {
+ "task_id": "",
+ "status": "CREATED"
+ }
+ ```
+
+ Use the `task_id` to check the status of the task:
+
+ ```json
+ GET /_plugins/_ml/tasks/
+ ```
+
+ Once the task is complete, the task state changes `COMPLETED` and OpenSearch returns the `model_id` for the deployed model:
+
+ ```json
+ {
+ "model_id": "",
+ "task_type": "REGISTER_MODEL",
+ "function_name": "SPARSE_TOKENIZE",
+ "state": "COMPLETED",
+ "worker_node": [
+ "wubXZX7xTIC7RW2z8nzhzw"
+ ],
+ "create_time": 1701390988405,
+ "last_update_time": 1701390993724,
+ "is_async": true
+ }
+ ```
+
+3. **Set up ingestion**: In OpenSearch, a `sparse_encoding` ingest processor encodes documents into sparse vectors before indexing them. Create an ingest pipeline as follows:
+
+ ```json
+ PUT /_ingest/pipeline/neural-sparse-pipeline
+ {
+ "description": "An example neural sparse encoding pipeline",
+ "processors" : [
+ {
+ "sparse_encoding": {
+ "model_id": "",
+ "field_map": {
+ "passage_text": "passage_embedding"
+ }
+ }
+ }
+ ]
+ }
+ ```
+
+4. **Set up index mapping**: Neural search uses the `rank_features` field type to store token weights when documents are indexed. The index will use the ingest pipeline you created to generate text embeddings. Create the index as follows:
+
+ ```json
+ PUT /my-neural-sparse-index
+ {
+ "settings": {
+ "default_pipeline": "neural-sparse-pipeline"
+ },
+ "mappings": {
+ "properties": {
+ "passage_embedding": {
+ "type": "rank_features"
+ },
+ "passage_text": {
+ "type": "text"
+ }
+ }
+ }
+ }
+ ```
+
+5. **Ingest documents using the ingest pipeline**: After creating the index, you can ingest documents into it. When you index a text field, the ingest processor converts text into a vector embedding and stores it in the `passage_embedding` field specified in the processor:
+
+ ```json
+ PUT /my-neural-sparse-index/_doc/
+ {
+ "passage_text": "Hello world"
+ }
+ ```
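+
+   Optionally, you can sanity-check the pipeline from step 3 before searching by calling the ingest simulate API (this extra check is our suggestion rather than part of the original walkthrough); the response should show the generated token weights in `passage_embedding`:
+
+   ```json
+   POST /_ingest/pipeline/neural-sparse-pipeline/_simulate
+   {
+     "docs": [
+       {
+         "_source": {
+           "passage_text": "Hello world"
+         }
+       }
+     ]
+   }
+   ```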
+
+### Try your engine with a query clause
+
+Congratulations! You've now created your own semantic search engine based on sparse encoders. To try a sample query, invoke the `_search` endpoint using the `neural_sparse` query:
+
+```json
+ GET /my-neural-sparse-index/_search/
+ {
+ "query": {
+ "neural_sparse": {
+ "passage_embedding": {
+ "query_text": "Hello world a b",
+ "model_id": "",
+ "max_token_score": 2.0
+ }
+ }
+ }
+}
+```
+
+### The neural sparse query parameters
+
+The `neural_sparse` query supports two parameters:
+
+- `model_id` (String): The ID of the model that is used to generate tokens and weights from the query text. A sparse encoding model will expand the tokens from query text, while the tokenizer model will only tokenize the query text itself.
+- `max_token_score` (Float): An extra parameter required for performance optimization. Just like the OpenSearch `match` query, the `neural_sparse` query is transformed to a Lucene BooleanQuery, combining term-level subqueries using disjunction. The difference is that for the neural sparse query, we use FeatureQuery instead of TermQuery to match the terms. Lucene employs the WAND (Weak AND) algorithm for dynamic pruning, which skips non-competitive tokens based on their score upper bounds. However, FeatureQuery uses `FLOAT.MAX_VALUE` as the score upper bound, which makes the WAND optimization ineffective. The `max_token_score` parameter resets the score upper bound for each token in a query, which is consistent with the original FeatureQuery. Thus, setting the value to 3.5 for the bi-encoder model and 2 for the document-only model can accelerate search without precision loss. After OpenSearch is upgraded to Lucene version 9.8, this parameter will be deprecated.
+
+## Selecting a model
+
+Use the following recommendations to help you select a sparse encoder model:
+
+- For **bi-encoder** mode, we recommend using the `opensearch-neural-sparse-encoding-v1` pretrained model. For this model, both online search and offline ingestion share the same model file.
+
+- For **document-only** mode, we recommend using the `opensearch-neural-sparse-encoding-doc-v1` pretrained model for ingestion and the `opensearch-neural-sparse-tokenizer-v1` model at search time to implement online query tokenization. This model does not employ model inference and only translates the query into tokens.
+
+For more information, see [Sparse encoding models](https://opensearch.org/docs/latest/ml-commons-plugin/pretrained-models/#sparse-encoding-models).
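+
+For example, in document-only mode you might register and deploy the ingestion encoder and the search-time tokenizer as two separate models using the pretrained-model registration API. The following sketch is illustrative only; the model names and versions shown are assumptions, so check the pretrained models page for the exact values:
+
+```json
+POST /_plugins/_ml/models/_register?deploy=true
+{
+  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1",
+  "version": "1.0.1",
+  "model_format": "TORCH_SCRIPT"
+}
+
+POST /_plugins/_ml/models/_register?deploy=true
+{
+  "name": "amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1",
+  "version": "1.0.1",
+  "model_format": "TORCH_SCRIPT"
+}
+```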
+
+## Next steps
+
+- For more information about neural sparse search, see [Neural sparse search](https://opensearch.org/docs/latest/search-plugins/neural-sparse-search/).
+- For an OpenSearch end-to-end neural search tutorial, see [Neural search tutorial](https://opensearch.org/docs/latest/search-plugins/neural-search-tutorial/).
+- For a list of all search methods OpenSearch supports, see [Search methods](https://opensearch.org/docs/latest/search-plugins/index/#search-methods).
+- Give us your feedback on the [OpenSearch Forum](https://forum.opensearch.org/).
\ No newline at end of file
From 7e3815b725198d69ae57be2b444f3b2967dd13f3 Mon Sep 17 00:00:00 2001
From: Fanit Kolchina
Date: Thu, 7 Dec 2023 23:12:17 -0500
Subject: [PATCH 03/16] Minor rewording
Signed-off-by: Fanit Kolchina
---
...oving-document-retrieval-with-sparse-semantic-encoders.md | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
index b54e4975e6..98a75eb82b 100644
--- a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
+++ b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
@@ -325,8 +325,6 @@ The benchmarking results are presented in the following tables.
## Build your search engine in five steps
-OpenSearch provides several pretrained encoder models that you can use out-of-the-box without fine-tuning. For more information about sparse encoding models provided by OpenSearch, see [Sparse encoding models](https://opensearch.org/docs/latest/ml-commons-plugin/pretrained-models/#sparse-encoding-models).
-
Follow these steps to build your search engine:
1. **Prerequisites**: For this simple setup, update the following cluster settings:
@@ -467,13 +465,14 @@ The `neural_sparse` query supports two parameters:
## Selecting a model
+OpenSearch provides several pretrained encoder models that you can use out-of-the-box without fine-tuning. For a list of sparse encoding models provided by OpenSearch, see [Sparse encoding models](https://opensearch.org/docs/latest/ml-commons-plugin/pretrained-models/#sparse-encoding-models).
+
Use the following recommendations to help you select a sparse encoder model:
- For **bi-encoder** mode, we recommend using the `opensearch-neural-sparse-encoding-v1` pretrained model. For this model, both online search and offline ingestion share the same model file.
- For **document-only** mode, we recommend using the `opensearch-neural-sparse-encoding-doc-v1` pretrained model for ingestion and the `opensearch-neural-sparse-tokenizer-v1` model at search time to implement online query tokenization. This model does not employ model inference and only translates the query into tokens.
-For more information, see [Sparse encoding models](https://opensearch.org/docs/latest/ml-commons-plugin/pretrained-models/#sparse-encoding-models).
## Next steps
From e50effcc403d04f85f3a637f828ea58f63a84e95 Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Fri, 8 Dec 2023 08:44:26 -0500
Subject: [PATCH 04/16] Apply suggestions from code review
Co-authored-by: Nathan Bower
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
...retrieval-with-sparse-semantic-encoders.md | 36 +++++++++----------
1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
index 98a75eb82b..5d6cb396ef 100644
--- a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
+++ b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
@@ -15,7 +15,7 @@ meta_description: Learn how the neural sparse framework in OpenSearch 2.11 can h
has_science_table: true
---
-OpenSearch 2.11 introduced neural sparse search---a new, efficient way of semantic retrieval. In this blog post, you'll learn about using sparse encoders for semantic search. You'll find that neural sparse search reduces costs, performs faster, and improves search relevance. We're excited to share benchmarking results that show why neural sparse search is now the top-performing search method. You can even try it out by building your own search engine in just five steps. For a TLDR on benchmarking learnings, see [Key takeaways](#here-are-the-key-takeaways).
+OpenSearch 2.11 introduced neural sparse search---a new efficient method of semantic retrieval. In this blog post, you'll learn about using sparse encoders for semantic search. You'll find that neural sparse search reduces costs, performs faster, and improves search relevance. We're excited to share benchmarking results that show why neural sparse search is now the top-performing search method. You can even try it out by building your own search engine in just five steps. For a TLDR on benchmarking learnings, see [Key takeaways](#here-are-the-key-takeaways).
## What are dense and sparse vector embeddings?
@@ -68,9 +68,9 @@ For benchmarking, we used a cluster containing 3 `r5.8xlarge` data nodes and 1 `
### Here are the key takeaways:
-* Both bi-encoder and document-only modes provide the highest relevance on the BEIR and Amazon ESCI datasets.
+* Both modes provide the highest relevance on the BEIR and Amazon ESCI datasets.
* Without online inference, the search latency of document-only mode is comparable to BM25.
-* Sparse encoding results in a much smaller index size than dense encoding. The size of an index a document-only sparse encoder generates is **10.4%** of a dense encoding index size. For a bi-encoder, the index size is **7.2%** of a dense encoding index size.
+* Sparse encoding results in a much smaller index size than dense encoding. A document-only sparse encoder generates an index that is **10.4%** of the size of a dense encoding index. For a bi-encoder, the index size is **7.2%** of the size of a dense encoding index.
* Dense encoding uses k-NN retrieval and incurs a 7.9% increase in RAM cost at search time. Neural sparse search uses a native Lucene index, so the RAM cost does not increase at search time.
## Benchmarking results
@@ -83,10 +83,10 @@ The benchmarking results are presented in the following tables.
-| Dataset | BM25 NDCG | Rank | Dense(with TAS-B model) NDCG | Rank | Hybrid(Dense + BM25) NDCG | Rank | Neural sparse search bi-encoder NDCG | Rank | Neural sparse search doc-only NDCG | Rank |
+| Dataset | BM25 NDCG | Rank | Dense (with TAS-B model) NDCG | Rank | Hybrid (Dense + BM25) NDCG | Rank | Neural sparse search bi-encoder NDCG | Rank | Neural sparse search document-only NDCG | Rank |
@@ -298,11 +298,11 @@ The benchmarking results are presented in the following tables.
-* BEIR is short for Benchmarking Information Retrieval. For more information, see [the BEIR GitHub page](https://github.com/beir-cellar/beir).
+* BEIR stands for Benchmarking Information Retrieval. For more information, see [the BEIR GitHub page](https://github.com/beir-cellar/beir).
-### Table II. Speed comparison, in terms of latency and throughput.
+### Table II. Speed comparison in terms of latency and throughput
-| | BM25 | Dense (with TAS-B model) | Neural sparse search bi-encoder | Neural sparse search doc-only |
+| | BM25 | Dense (with TAS-B model) | Neural sparse search bi-encoder | Neural sparse search document-only |
|---------------------------|---------------|---------------------------| ------------------------------- | ------------------------------ |
| P50 latency (ms) | 8 ms | 56.6 ms |176.3 ms | 10.2ms |
| P90 latency (ms) | 12.4 ms | 71.12 ms |267.3 ms | 15.2ms |
@@ -311,17 +311,17 @@ The benchmarking results are presented in the following tables.
| Mean throughput (op/s) | 2214.6 op/s | 298.2 op/s |106.3 op/s | 1790.2 op/s |
-* We tested latency on a subset of MSMARCO v2, with 1M documents in total. To obtain latency data, we used 20 clients to loop search requests.
+* We tested latency on a subset of MS MARCO v2 containing 1M documents in total. To obtain latency data, we used 20 clients to loop search requests.
### Table III. Capacity consumption comparison
-| |BM25 |Dense (with TAS-B model) |Neural sparse search bi-encoder | Neural sparse search doc-only |
+| |BM25 |Dense (with TAS-B model) |Neural sparse search bi-encoder | Neural sparse search document-only |
|-|-|-|-|-|
|Index size |1 GB |65.4 GB |4.7 GB |6.8 GB |
|RAM usage |480.74 GB |675.36 GB |480.64 GB |494.25 GB |
|Runtime RAM delta |+0.01 GB |+53.34 GB |+0.06 GB |+0.03 GB |
-* We performed this experiment using the full MSMARCO v2 dataset, with 8.8M passages. For all methods, we excluded the `_source` fields and force merged the index before measuring index size. We set the heap size of the OpenSearch JVM to half of the node RAM, so an empty OpenSearch cluster still consumed close to 480 GB of memory.
+* We performed this experiment using the full MS MARCO v2 dataset, containing 8.8M passages. For all methods, we excluded the `_source` fields and force merged the index before measuring index size. We set the heap size of the OpenSearch JVM to half of the node RAM, so an empty OpenSearch cluster still consumed close to 480 GB of memory.
## Build your search engine in five steps
@@ -371,7 +371,7 @@ Follow these steps to build your search engine:
GET /_plugins/_ml/tasks/
```
- Once the task is complete, the task state changes `COMPLETED` and OpenSearch returns the `model_id` for the deployed model:
+ Once the task is complete, the task state changes to `COMPLETED` and OpenSearch returns the `model_id` for the deployed model:
```json
{
@@ -456,7 +456,7 @@ Congratulations! You've now created your own semantic search engine based on spa
}
```
-### The neural sparse query parameters
+### Neural sparse query parameters
The `neural_sparse` query supports two parameters:
@@ -465,9 +465,9 @@ The `neural_sparse` query supports two parameters:
## Selecting a model
-OpenSearch provides several pretrained encoder models that you can use out-of-the-box without fine-tuning. For a list of sparse encoding models provided by OpenSearch, see [Sparse encoding models](https://opensearch.org/docs/latest/ml-commons-plugin/pretrained-models/#sparse-encoding-models).
+OpenSearch provides several pretrained encoder models that you can use out of the box without fine-tuning. For a list of sparse encoding models provided by OpenSearch, see [Sparse encoding models](https://opensearch.org/docs/latest/ml-commons-plugin/pretrained-models/#sparse-encoding-models).
-Use the following recommendations to help you select a sparse encoder model:
+Use the following recommendations to select a sparse encoder model:
- For **bi-encoder** mode, we recommend using the `opensearch-neural-sparse-encoding-v1` pretrained model. For this model, both online search and offline ingestion share the same model file.
@@ -477,6 +477,6 @@ Use the following recommendations to help you select a sparse encoder model:
## Next steps
- For more information about neural sparse search, see [Neural sparse search](https://opensearch.org/docs/latest/search-plugins/neural-sparse-search/).
-- For an OpenSearch end-to-end neural search tutorial, see [Neural search tutorial](https://opensearch.org/docs/latest/search-plugins/neural-search-tutorial/).
+- For an end-to-end neural search tutorial, see [Neural search tutorial](https://opensearch.org/docs/latest/search-plugins/neural-search-tutorial/).
- For a list of all search methods OpenSearch supports, see [Search methods](https://opensearch.org/docs/latest/search-plugins/index/#search-methods).
-- Give us your feedback on the [OpenSearch Forum](https://forum.opensearch.org/).
\ No newline at end of file
+- Provide your feedback on the [OpenSearch Forum](https://forum.opensearch.org/).
\ No newline at end of file
From 96e0cb25aca45e3bcaf71b8ec4895129a599bc99 Mon Sep 17 00:00:00 2001
From: Fanit Kolchina
Date: Fri, 8 Dec 2023 09:25:17 -0500
Subject: [PATCH 05/16] Implemented editorial comments
Signed-off-by: Fanit Kolchina
---
...-retrieval-with-sparse-semantic-encoders.md | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
index 5d6cb396ef..8adc63b235 100644
--- a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
+++ b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
@@ -15,13 +15,13 @@ meta_description: Learn how the neural sparse framework in OpenSearch 2.11 can h
has_science_table: true
---
-OpenSearch 2.11 introduced neural sparse search---a new efficient method of semantic retrieval. In this blog post, you'll learn about using sparse encoders for semantic search. You'll find that neural sparse search reduces costs, performs faster, and improves search relevance. We're excited to share benchmarking results that show why neural sparse search is now the top-performing search method. You can even try it out by building your own search engine in just five steps. For a TLDR on benchmarking learnings, see [Key takeaways](#here-are-the-key-takeaways).
+OpenSearch 2.11 introduced neural sparse search---a new efficient method of semantic retrieval. In this blog post, you'll learn about using sparse encoders for semantic search. You'll find that neural sparse search reduces costs, performs faster, and improves search relevance. We're excited to share benchmarking results and show how neural sparse search outperforms other search methods. You can even try it out by building your own search engine in just five steps.
## What are dense and sparse vector embeddings?
When you use a transformer-based encoder, such as BERT, to generate traditional dense vector embeddings, the encoder translates each word into a vector. Collectively, these vectors make up a semantic vector space. In this space, the closer the vectors are, the more similar the words are in meaning.
-In sparse encoding, the encoder takes the text and creates a list of tokens that have similar semantic meaning. The model vocabulary ([WordPiece](https://huggingface.co/learn/nlp-course/chapter6/6?fw=pt)) contains most commonly used words along with various tense endings (for example, `-ed` and `-ing`) and suffixes (for example, `-ate` and `-ion`). You can think of the vocabulary as a semantic space where each document is a sparse vector.
+In sparse encoding, the encoder uses the text to create a list of tokens that have similar semantic meaning. The model vocabulary ([WordPiece](https://huggingface.co/learn/nlp-course/chapter6/6?fw=pt)) contains most commonly used words along with various tense endings (for example, `-ed` and `-ing`) and suffixes (for example, `-ate` and `-ion`). You can think of the vocabulary as a semantic space where each document is a sparse vector.
The following images show example results of dense and sparse encoding.
@@ -52,21 +52,21 @@ For a detailed comparison, see [Table II](#table-ii-speed-comparison-in-terms-of
## Sparse encoders perform better on unfamiliar datasets
-In our previous [blog post](https://opensearch.org/blog/semantic-science-benchmarks), we mentioned that searching with dense embeddings presents challenges when encoders are facing unfamiliar content. When an encoder trained on one dataset is used on a different dataset, the encoder often produces unpredictable embeddings, resulting in poor search result relevance.
+In our previous [blog post](https://opensearch.org/blog/semantic-science-benchmarks), we mentioned that searching with dense embeddings presents challenges when encoders encounter unfamiliar content. When an encoder trained on one dataset is used on a different dataset, the encoder often produces unpredictable embeddings, resulting in poor search result relevance.
-Often, BM25 performs better than dense encoders on BEIR datasets that incorporate strong domain knowledge. In these cases, sparse encoders fall back on keyword-based matching, ensuring that their search results are no worse than BM25 ones. For a comparison of search result relevance benchmarks, see [Table I](#benchmarking-results).
+Often, BM25 performs better than dense encoders on BEIR datasets that incorporate strong domain knowledge. In these cases, sparse encoders can fall back on keyword-based matching, ensuring that their search results are no worse than those produced by BM25. For a comparison of search result relevance benchmarks, see [Table I](#benchmarking-results).
-## Out of sparse encoders, document-only encoders are the most efficient
+## Among sparse encoders, document-only encoders are the most efficient
You can run a neural sparse search in two modes: **bi-encoder** and **document-only**.
-In bi-encoder mode, both documents and search queries are passed through deep encoders. In document-only mode, documents are still passed through deep encoders, but search queries are instead tokenized. In this mode, document encoders are trained to learn more synonym association in order to increase recall. By eliminating the online inference phase, you can **save computational resources** and **significantly reduce latency**. For benchmarks, compare the `Neural sparse doc-only` column with other columns in [Table II](#table-ii-speed-comparison-in-terms-of-latency-and-throughput).
+In bi-encoder mode, both documents and search queries are passed through deep encoders. In document-only mode, documents are still passed through deep encoders, but search queries are instead tokenized. In this mode, document encoders are trained to learn more synonym associations in order to increase recall. By eliminating the online inference phase, you can **save computational resources** and **significantly reduce latency**. For benchmarks, compare the `Neural sparse document-only` column with the other columns in [Table II](#table-ii-speed-comparison-in-terms-of-latency-and-throughput).
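To sketch what this looks like in practice, both modes encode documents at ingestion time through an ingest pipeline; the only difference is which model the `model_id` in the search query points to (a deep encoder for bi-encoder mode, a tokenizer for document-only mode). The pipeline, field, and model names below are placeholders, and the example assumes the `sparse_encoding` ingest processor that ships with neural sparse search.

```json
PUT /_ingest/pipeline/sparse-encoding-pipeline
{
  "description": "Encode documents into sparse token-weight vectors at ingestion time",
  "processors": [
    {
      "sparse_encoding": {
        "model_id": "<document_encoder_model_id>",
        "field_map": {
          "passage_text": "passage_embedding"
        }
      }
    }
  ]
}
```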
## Neural sparse search outperforms other search methods in benchmarking tests
-For benchmarking, we used a cluster containing 3 `r5.8xlarge` data nodes and 1 `r5.12xlarge` leader/ML node. We evaluated search relevance for all evaluated search methods in terms of NCDG@10. Additionally, we compared the runtime speed and the resource cost of each method.
+For benchmarking, we used a cluster containing 3 `r5.8xlarge` data nodes and 1 `r5.12xlarge` leader/machine learning (ML) node. We measured search relevance for all evaluated search methods in terms of NDCG@10. Additionally, we compared the runtime speed and the resource cost of each method.
-### Here are the key takeaways:
+Here are the key takeaways:
* Both modes provide the highest relevance on the BEIR and Amazon ESCI datasets.
* Without online inference, the search latency of document-only mode is comparable to BM25.
@@ -461,7 +461,7 @@ Congratulations! You've now created your own semantic search engine based on spa
The `neural_sparse` query supports two parameters:
- `model_id` (String): The ID of the model that is used to generate tokens and weights from the query text. A sparse encoding model will expand the tokens from the query text, while the tokenizer model will only tokenize the query text itself.
-- `max_token_score` (Float): An extra parameter required for performance optimization. Just like the OpenSearch `match` query, the `neural_sparse` query is transformed to a Lucene BooleanQuery, combining term-level subqueries using disjunction. The difference is that for the neural sparse query, we use FeatureQuery instead of TermQuery to match the terms. Lucene employs the WAND (Weak AND) algorithm for dynamic pruning, which skips non-competitive tokens based on their score upper bounds. However, FeatureQuery uses `FLOAT.MAX_VALUE` as the score upper bound, which makes the WAND optimization ineffective. The `max_token_score` parameter resets the score upper bound for each token in a query, which is consistent with the original FeatureQuery. Thus, setting the value to 3.5 for the bi-encoder model and 2 for the document-only model can accelerate search without precision loss. After OpenSearch is upgraded to Lucene version 9.8, this parameter will be deprecated.
+- `max_token_score` (Float): An extra parameter required for performance optimization. Just like the OpenSearch `match` query, the `neural_sparse` query is transformed to a Lucene BooleanQuery, combining term-level subqueries using disjunction. The difference is that neural sparse query uses FeatureQuery instead of TermQuery to match the terms. Lucene employs the Weak AND (WAND) algorithm for dynamic pruning, which skips non-competitive tokens based on their score upper bounds. However, FeatureQuery uses `FLOAT.MAX_VALUE` as the score upper bound, which makes the WAND optimization ineffective. The `max_token_score` parameter resets the score upper bound for each token in a query, which is consistent with the original FeatureQuery. Thus, setting the value to 3.5 for the bi-encoder model and to 2 for the document-only model can accelerate search without precision loss. After OpenSearch is upgraded to Lucene version 9.8, this parameter will be deprecated.
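For reference, here is a minimal sketch of a `neural_sparse` query that uses both parameters (the index name, field name, and query text are placeholders; `max_token_score` is set to 2 on the assumption that a document-only model is used):

```json
GET /my-sparse-index/_search
{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_text": "what is neural sparse search",
        "model_id": "<model_id>",
        "max_token_score": 2
      }
    }
  }
}
```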
## Selecting a model
From b3564f0a45b5c818bb7d613df5c5a21409a8a7be Mon Sep 17 00:00:00 2001
From: Fanit Kolchina
Date: Fri, 8 Dec 2023 09:29:07 -0500
Subject: [PATCH 06/16] Corrected capitalization of datasets
Signed-off-by: Fanit Kolchina
---
...ving-document-retrieval-with-sparse-semantic-encoders.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
index 8adc63b235..9caddb6908 100644
--- a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
+++ b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
@@ -77,7 +77,7 @@ Here are the key takeaways:
The benchmarking results are presented in the following tables.
-### Table I. Relevance comparison on BEIR* benchmark and Amazon ESCI, in terms of NDCG@10 and rank.
+### Table I. Relevance comparison on BEIR* benchmark and Amazon ESCI, in terms of NDCG@10 and rank
@@ -102,7 +102,7 @@ The benchmarking results are presented in the following tables.
Rank |
- Trec Covid |
+ Trec-Covid |
0.688 |
4 |
0.481 |
@@ -206,7 +206,7 @@ The benchmarking results are presented in the following tables.
2 |
- SCIDOCS |
+ SciDocs |
0.165 |
2 |
0.149 |
From 3dc0577fc5413a01ed0f5afb664655c3e441beba Mon Sep 17 00:00:00 2001
From: Fanit Kolchina
Date: Fri, 8 Dec 2023 09:31:48 -0500
Subject: [PATCH 07/16] Extra space
Signed-off-by: Fanit Kolchina
---
...mproving-document-retrieval-with-sparse-semantic-encoders.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
index 9caddb6908..aa15d5d1de 100644
--- a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
+++ b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
@@ -77,7 +77,7 @@ Here are the key takeaways:
The benchmarking results are presented in the following tables.
-### Table I. Relevance comparison on BEIR* benchmark and Amazon ESCI, in terms of NDCG@10 and rank
+### Table I. Relevance comparison on BEIR* benchmark and Amazon ESCI, in terms of NDCG@10 and rank
From e48f1bafc240483d5decdfea0551fe7c54c11782 Mon Sep 17 00:00:00 2001
From: Fanit Kolchina
Date: Fri, 8 Dec 2023 09:37:28 -0500
Subject: [PATCH 08/16] More editorial feedback
Signed-off-by: Fanit Kolchina
---
...ng-document-retrieval-with-sparse-semantic-encoders.md | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
index aa15d5d1de..105187f744 100644
--- a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
+++ b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
@@ -40,13 +40,13 @@ _**Left**: Dense vector semantic space. **Right**: Sparse vector semantic space.
## Sparse encoders use more efficient data structures
-In dense encoding, documents are represented as high-dimensional vectors. To search these documents, you need to use a k-NN index as an underlying data structure. On the other hand, sparse search can use a native Lucene index because sparse encodings are similar to term vectors used by keyword-based matching.
+In dense encoding, documents are represented as high-dimensional vectors. To search these documents, you need to use a k-NN index as an underlying data structure. In contrast, sparse search can use a native Lucene index because sparse encodings are similar to term vectors used by keyword-based matching.
Compared to k-NN indexes, **sparse embeddings have the following cost-reducing advantages**:
-1. Much smaller index size.
-1. Reduced runtime RAM cost.
-1. Lower computational cost.
+1. Much smaller index size
+1. Reduced runtime RAM cost
+1. Lower computational cost
For a detailed comparison, see [Table II](#table-ii-speed-comparison-in-terms-of-latency-and-throughput).
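As a rough illustration, a sparse embedding field can sit alongside ordinary text fields in a standard Lucene-backed index. The mapping sketch below is hypothetical (index and field names are placeholders) and assumes the `rank_features` field type, which stores token-weight pairs in the inverted index rather than in a separate k-NN structure:

```json
PUT /my-sparse-index
{
  "mappings": {
    "properties": {
      "passage_text": { "type": "text" },
      "passage_embedding": { "type": "rank_features" }
    }
  }
}
```

Because there is no k-NN graph to build or keep in memory, this is where the index size and RAM savings listed above come from.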
From ff7ea3792f55cd0668c870568aa18dfae5ac7230 Mon Sep 17 00:00:00 2001
From: Fanit Kolchina
Date: Fri, 8 Dec 2023 10:00:41 -0500
Subject: [PATCH 09/16] More editorial comments
Signed-off-by: Fanit Kolchina
---
...cument-retrieval-with-sparse-semantic-encoders.md | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
index 105187f744..7d145c7a66 100644
--- a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
+++ b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
@@ -15,7 +15,7 @@ meta_description: Learn how the neural sparse framework in OpenSearch 2.11 can h
has_science_table: true
---
-OpenSearch 2.11 introduced neural sparse search---a new efficient method of semantic retrieval. In this blog post, you'll learn about using sparse encoders for semantic search. You'll find that neural sparse search reduces costs, performs faster, and improves search relevance. We're excited to share benchmarking results and show how neural sparse search outperforms other search methods. You can even try it out by building your own search engine in just five steps.
+OpenSearch 2.11 introduced neural sparse search---a new efficient method of semantic retrieval. In this blog post, you'll learn about using sparse encoders for semantic search. You'll find that neural sparse search reduces costs, performs faster, and improves search relevance. We're excited to share benchmarking results and show how neural sparse search outperforms other search methods. You can even try it out by building your own search engine in just five steps. To skip straight to the results, see [Benchmarking results](#benchmarking-results).
## What are dense and sparse vector embeddings?
@@ -54,7 +54,7 @@ For a detailed comparison, see [Table II](#table-ii-speed-comparison-in-terms-of
In our previous [blog post](https://opensearch.org/blog/semantic-science-benchmarks), we mentioned that searching with dense embeddings presents challenges when encoders encounter unfamiliar content. When an encoder trained on one dataset is used on a different dataset, the encoder often produces unpredictable embeddings, resulting in poor search result relevance.
-Often, BM25 performs better than dense encoders on BEIR datasets that incorporate strong domain knowledge. In these cases, sparse encoders can fall back on keyword-based matching, ensuring that their search results are no worse than those produced by BM25. For a comparison of search result relevance benchmarks, see [Table I](#benchmarking-results).
+Often, BM25 performs better than dense encoders on BEIR datasets that incorporate strong domain knowledge. In these cases, sparse encoders can fall back on keyword-based matching, ensuring that their search results are no worse than those produced by BM25. For a comparison of search result relevance benchmarks, see [Table I](#table-i-relevance-comparison-on-beir-benchmark-and-amazon-esci-in-terms-of-ndcg10-and-rank).
## Among sparse encoders, document-only encoders are the most efficient
@@ -77,7 +77,7 @@ Here are the key takeaways:
The benchmarking results are presented in the following tables.
-### Table I. Relevance comparison on BEIR* benchmark and Amazon ESCI, in terms of NDCG@10 and rank
+### Table I. Relevance comparison on BEIR benchmark and Amazon ESCI in terms of NDCG@10 and rank
@@ -298,7 +298,7 @@ The benchmarking results are presented in the following tables.
-* BEIR stands for Benchmarking Information Retrieval. For more information, see [the BEIR GitHub page](https://github.com/beir-cellar/beir).
+For more information about Benchmarking Information Retrieval (BEIR), see [the BEIR GitHub page](https://github.com/beir-cellar/beir).
### Table II. Speed comparison in terms of latency and throughput
@@ -461,13 +461,13 @@ Congratulations! You've now created your own semantic search engine based on spa
The `neural_sparse` query supports two parameters:
- `model_id` (String): The ID of the model that is used to generate tokens and weights from the query text. A sparse encoding model will expand the tokens from the query text, while the tokenizer model will only tokenize the query text itself.
-- `max_token_score` (Float): An extra parameter required for performance optimization. Just like the OpenSearch `match` query, the `neural_sparse` query is transformed to a Lucene BooleanQuery, combining term-level subqueries using disjunction. The difference is that neural sparse query uses FeatureQuery instead of TermQuery to match the terms. Lucene employs the Weak AND (WAND) algorithm for dynamic pruning, which skips non-competitive tokens based on their score upper bounds. However, FeatureQuery uses `FLOAT.MAX_VALUE` as the score upper bound, which makes the WAND optimization ineffective. The `max_token_score` parameter resets the score upper bound for each token in a query, which is consistent with the original FeatureQuery. Thus, setting the value to 3.5 for the bi-encoder model and to 2 for the document-only model can accelerate search without precision loss. After OpenSearch is upgraded to Lucene version 9.8, this parameter will be deprecated.
+- `max_token_score` (Float): An extra parameter required for performance optimization. Just like a `match` query, a `neural_sparse` query is transformed to a Lucene BooleanQuery, combining term-level subqueries using disjunction. The difference is that a `neural_sparse` query uses FeatureQuery instead of TermQuery to match the terms. Lucene employs the Weak AND (WAND) algorithm for dynamic pruning, which skips non-competitive tokens based on their score upper bounds. However, FeatureQuery uses `FLOAT.MAX_VALUE` as the score upper bound, which makes the WAND optimization ineffective. The `max_token_score` parameter resets the score upper bound for each token in a query, which is consistent with the original FeatureQuery. Thus, setting the value to 3.5 for the bi-encoder model and to 2 for the document-only model can accelerate search without precision loss. After OpenSearch is upgraded to Lucene version 9.8, this parameter will be deprecated.
## Selecting a model
OpenSearch provides several pretrained encoder models that you can use out of the box without fine-tuning. For a list of sparse encoding models provided by OpenSearch, see [Sparse encoding models](https://opensearch.org/docs/latest/ml-commons-plugin/pretrained-models/#sparse-encoding-models).
-Use the following recommendations to select a sparse encoder model:
+Use the following recommendations to select a sparse encoding model:
- For **bi-encoder** mode, we recommend using the `opensearch-neural-sparse-encoding-v1` pretrained model. For this model, both online search and offline ingestion share the same model file.
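As an example, the pretrained model can be registered and then deployed using the ML Commons model APIs. The sketch below is not prescriptive: the model name and version should be checked against the pretrained models documentation linked above, and `<model_id>` stands for the ID returned by the register call.

```json
POST /_plugins/_ml/models/_register
{
  "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v1",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}

POST /_plugins/_ml/models/<model_id>/_deploy
```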
From e0973f47a8570f062f967f71f87c95b43c28e7d0 Mon Sep 17 00:00:00 2001
From: Fanit Kolchina
Date: Fri, 8 Dec 2023 10:05:06 -0500
Subject: [PATCH 10/16] Add asterisk
Signed-off-by: Fanit Kolchina
---
...mproving-document-retrieval-with-sparse-semantic-encoders.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
index 7d145c7a66..427e83a7b2 100644
--- a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
+++ b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
@@ -298,7 +298,7 @@ The benchmarking results are presented in the following tables.
-For more information about Benchmarking Information Retrieval (BEIR), see [the BEIR GitHub page](https://github.com/beir-cellar/beir).
+* For more information about Benchmarking Information Retrieval (BEIR), see [the BEIR GitHub page](https://github.com/beir-cellar/beir).
### Table II. Speed comparison in terms of latency and throughput
From 8c3ef2e2a505b6244e0884da75786ff7c72898c6 Mon Sep 17 00:00:00 2001
From: Fanit Kolchina
Date: Fri, 8 Dec 2023 10:13:53 -0500
Subject: [PATCH 11/16] Change capacity to resource
Signed-off-by: Fanit Kolchina
---
...mproving-document-retrieval-with-sparse-semantic-encoders.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
index 427e83a7b2..d7afe795da 100644
--- a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
+++ b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
@@ -313,7 +313,7 @@ The benchmarking results are presented in the following tables.
* We tested latency on a subset of MS MARCO v2 containing 1M documents in total. To obtain latency data, we used 20 clients sending search requests in a loop.
-### Table III. Capacity consumption comparison
+### Table III. Resource consumption comparison
| |BM25 |Dense (with TAS-B model) |Neural sparse search bi-encoder | Neural sparse search document-only |
|-|-|-|-|-|
From 217d246b0c1e19b66452c0b1ae830df3c9e29f4c Mon Sep 17 00:00:00 2001
From: Fanit Kolchina
Date: Fri, 8 Dec 2023 10:43:05 -0500
Subject: [PATCH 12/16] Make images same height and edit bios
Signed-off-by: Fanit Kolchina
---
_authors/xinyual.markdown | 2 +-
_authors/yych.markdown | 2 +-
_authors/zhichaog.markdown | 2 +-
...roving-document-retrieval-with-sparse-semantic-encoders.md | 4 ++--
4 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/_authors/xinyual.markdown b/_authors/xinyual.markdown
index ae87da0450..a749e6389c 100644
--- a/_authors/xinyual.markdown
+++ b/_authors/xinyual.markdown
@@ -5,4 +5,4 @@ photo: '/assets/media/authors/xinyual.jpg'
github: xinyual
---
-**Xinyuan Lu** is a machine learning engineer with the OpenSearch project. He is working on large language model(LLM) related applications and search relevance.
\ No newline at end of file
+**Xinyuan Lu** is a machine learning engineer with the OpenSearch Project. He is working on large language model (LLM)-related applications and search relevance.
\ No newline at end of file
diff --git a/_authors/yych.markdown b/_authors/yych.markdown
index b13832fefe..964e3643d7 100644
--- a/_authors/yych.markdown
+++ b/_authors/yych.markdown
@@ -5,4 +5,4 @@ photo: '/assets/media/authors/yych.png'
github: model-collapse
---
-**Charlie Yang** is an AWS engineering manager working on the OpenSearch Project. He is focusing on machine learning, search relevance and performance optimization.
\ No newline at end of file
+**Charlie Yang** is an AWS engineering manager working on the OpenSearch Project. He is focusing on machine learning, search relevance, and performance optimization.
\ No newline at end of file
diff --git a/_authors/zhichaog.markdown b/_authors/zhichaog.markdown
index d0e1c2e5dc..b5357cb89b 100644
--- a/_authors/zhichaog.markdown
+++ b/_authors/zhichaog.markdown
@@ -5,4 +5,4 @@ photo: '/assets/media/authors/zhichaog.png'
github: zhichao-aws
---
-**Zhichao Geng** is a machine learning engineer working on OpenSearch. His interest is improving search relevance using machine learning.
\ No newline at end of file
+**Zhichao Geng** is a machine learning engineer working on the OpenSearch Project. His interests include improving search relevance using machine learning.
\ No newline at end of file
diff --git a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
index d7afe795da..2aab238340 100644
--- a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
+++ b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
@@ -28,10 +28,10 @@ The following images show example results of dense and sparse encoding.
From 8f499f360734abfde74cdb06cb1aeb83095ad9ed Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Fri, 8 Dec 2023 10:47:19 -0500
Subject: [PATCH 13/16] Update _authors/xinyual.markdown
Co-authored-by: Nathan Bower
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
_authors/xinyual.markdown | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/_authors/xinyual.markdown b/_authors/xinyual.markdown
index a749e6389c..93c855a044 100644
--- a/_authors/xinyual.markdown
+++ b/_authors/xinyual.markdown
@@ -5,4 +5,4 @@ photo: '/assets/media/authors/xinyual.jpg'
github: xinyual
---
-**Xinyuan Lu** is a machine learning engineer with the OpenSearch Project. He is working on large language model (LLM)-related applications and search relevance.
\ No newline at end of file
+**Xinyuan Lu** is a machine learning engineer with the OpenSearch Project. He works on large language model (LLM)-related applications and search relevance.
\ No newline at end of file
From e32981fc3ea689bd612093b1de5a113dda3b6fad Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Fri, 8 Dec 2023 10:47:34 -0500
Subject: [PATCH 14/16] Update _authors/yych.markdown
Co-authored-by: Nathan Bower
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
_authors/yych.markdown | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/_authors/yych.markdown b/_authors/yych.markdown
index 964e3643d7..88734bdaf7 100644
--- a/_authors/yych.markdown
+++ b/_authors/yych.markdown
@@ -5,4 +5,4 @@ photo: '/assets/media/authors/yych.png'
github: model-collapse
---
-**Charlie Yang** is an AWS engineering manager working on the OpenSearch Project. He is focusing on machine learning, search relevance, and performance optimization.
\ No newline at end of file
+**Charlie Yang** is an AWS engineering manager with the OpenSearch Project. He focuses on machine learning, search relevance, and performance optimization.
\ No newline at end of file
From 1af2c0ea09d38e38e5e8a1200d5b81fc030952ff Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Fri, 8 Dec 2023 10:47:45 -0500
Subject: [PATCH 15/16] Update _authors/zhichaog.markdown
Co-authored-by: Nathan Bower
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
_authors/zhichaog.markdown | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/_authors/zhichaog.markdown b/_authors/zhichaog.markdown
index b5357cb89b..d113861a4b 100644
--- a/_authors/zhichaog.markdown
+++ b/_authors/zhichaog.markdown
@@ -5,4 +5,4 @@ photo: '/assets/media/authors/zhichaog.png'
github: zhichao-aws
---
-**Zhichao Geng** is a machine learning engineer working on the OpenSearch Project. His interests include improving search relevance using machine learning.
\ No newline at end of file
+**Zhichao Geng** is a machine learning engineer with the OpenSearch Project. His interests include improving search relevance using machine learning.
\ No newline at end of file
From ac4f8b180b7259311e00ecf30fcbedaefa6d6590 Mon Sep 17 00:00:00 2001
From: Fanit Kolchina
Date: Fri, 8 Dec 2023 11:49:59 -0500
Subject: [PATCH 16/16] Format change
Signed-off-by: Fanit Kolchina
---
...mproving-document-retrieval-with-sparse-semantic-encoders.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
index 2aab238340..419ff26c12 100644
--- a/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
+++ b/_posts/2023-12-05-improving-document-retrieval-with-sparse-semantic-encoders.md
@@ -66,7 +66,7 @@ In bi-encoder mode, both documents and search queries are passed through deep en
For benchmarking, we used a cluster containing 3 `r5.8xlarge` data nodes and 1 `r5.12xlarge` leader/machine learning (ML) node. We measured search relevance for all evaluated search methods in terms of NDCG@10. Additionally, we compared the runtime speed and the resource cost of each method.
-Here are the key takeaways:
+**Here are the key takeaways:**
* Both modes provide the highest relevance on the BEIR and Amazon ESCI datasets.
* Without online inference, the search latency of document-only mode is comparable to BM25.