Doc review
Signed-off-by: Fanit Kolchina <[email protected]>
kolchfa-aws committed Jan 3, 2025
1 parent 8ae9f5b commit 7252117
Showing 2 changed files with 111 additions and 32 deletions.
80 changes: 67 additions & 13 deletions _analyzers/character-filters/html-character-filter.md
---
layout: default
title: HTML strip
parent: Character filters
nav_order: 100
---
# HTML strip character filter

The `html_strip` character filter removes HTML tags, such as `<div>`, `<p>`, and `<a>`, from the input text. It also decodes HTML entities, such as `&amp;`, into their corresponding characters.

## Example: HTML analyzer

The following request applies an `html_strip` character filter to the provided text:

```json
GET /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [ "html_strip" ],
  "text": "<p>Commonly used calculus symbols include &alpha;, &beta; and &theta; </p>"
}
```
{% include copy-curl.html %}

The response contains a single token in which the HTML entities have been converted to their decoded characters:

```json
{
  "tokens": [
    {
      "token": """
Commonly used calculus symbols include α, β and θ
""",
      "start_offset": 0,
      "end_offset": 74,
      "type": "word",
      "position": 0
    }
  ]
}
```

## Parameters

The `html_strip` character filter can be configured with the following parameter.

| Parameter | Required/Optional | Data type | Description |
|:---|:---|:---|:---|
| `escaped_tags` | Optional | Array of strings | An array of HTML element names, specified without the enclosing angle brackets (`< >`). The filter does not remove elements in this list when stripping HTML from the text. For example, setting the array to `["b", "i"]` will prevent the `<b>` and `<i>` elements from being stripped.|
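
You can also test `escaped_tags` directly in an `_analyze` request. The following minimal sketch (the sample text is illustrative) strips all HTML except the `<b>` tags:

```json
GET /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "html_strip",
      "escaped_tags": ["b"]
    }
  ],
  "text": "<p>This is <b>bold</b> text</p>"
}
```
{% include copy-curl.html %}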

## Example: Custom analyzer with lowercase filter

The following example request creates a custom analyzer that strips HTML tags and converts the plain text to lowercase by using the `html_strip` character filter and the `lowercase` token filter:

```json
PUT /html_strip_and_lowercase_analyzer
{
  "settings": {
    "analysis": {
      "analyzer": {
        "html_strip_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

Use the following request to examine the tokens generated using the analyzer:

```json
GET /html_strip_and_lowercase_analyzer/_analyze
{
  "analyzer": "html_strip_analyzer",
  "text": "<h1>Welcome to <strong>OpenSearch</strong>!</h1>"
}
```
{% include copy-curl.html %}

In the response, the HTML tags have been removed and the plain text has been converted to lowercase:

```json
{
  "tokens": [
    {
      "token": "welcome",
      "start_offset": 4,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "to",
      "start_offset": 12,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "opensearch",
      "start_offset": 23,
      "end_offset": 42,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
```

## Example: Custom analyzer that preserves HTML tags
The following example request creates a custom analyzer that strips HTML tags while preserving the `<b>` and `<i>` tags:

```json
PUT /html_strip_preserve_analyzer
{
  "settings": {
    "analysis": {
      "char_filter": {
        "html_filter": {
          "type": "html_strip",
          "escaped_tags": ["b", "i"]
        }
      },
      "analyzer": {
        "html_strip_analyzer": {
          "type": "custom",
          "char_filter": ["html_filter"],
          "tokenizer": "keyword"
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

Use the following request to examine the tokens generated using the analyzer:

```json
GET /html_strip_preserve_analyzer/_analyze
{
  "analyzer": "html_strip_analyzer",
  "text": "<p>This is a <b>bold</b> and <i>italic</i> text.</p>"
}
```
{% include copy-curl.html %}

In the response, the `<b>` and `<i>` tags have been retained, as specified in the custom analyzer request:

```json
{
  "tokens": [
    {
      "token": """
This is a <b>bold</b> and <i>italic</i> text.
""",
      "start_offset": 0,
      "end_offset": 52,
      "type": "word",
      "position": 0
    }
  ]
}
```
63 changes: 44 additions & 19 deletions _analyzers/character-filters/mapping-character-filter.md
---
layout: default
title: Mapping
parent: Character filters
nav_order: 120
---

# Mapping character filter

The `mapping` character filter accepts a map of key-value pairs for character replacement. Whenever the filter encounters a string of characters matching a key, it replaces them with the corresponding value. Replacement values can be empty strings.

The filter applies greedy matching: if multiple keys match at the same position in the input, the longest matching key is replaced.
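
For example, the following `_analyze` request (a minimal sketch; the mappings and sample text are illustrative) defines keys for both `I` and `II` and maps the hyphen to an empty string. Because matching is greedy, the input `II-I` is rewritten to `21`: the longer key `II` takes precedence over `I`, and the hyphen is removed:

```json
GET /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        "I => 1",
        "II => 2",
        "- => "
      ]
    }
  ],
  "text": "II-I"
}
```
{% include copy-curl.html %}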

The `mapping` character filter is useful when specific text replacements are required before tokenization.

## Example

The following request configures a `mapping` character filter that converts Roman numerals (such as I, II, or III) into their corresponding Arabic numerals (1, 2, and 3):

```json
GET /_analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        "I => 1",
        "II => 2",
        "III => 3",
        "IV => 4",
        "V => 5"
      ]
    }
  ],
  "text": "I have III apples and IV oranges"
}
```
{% include copy-curl.html %}

The response contains a token in which the Roman numerals have been replaced with Arabic numerals:

```json
{
  "tokens": [
    {
      "token": "1 have 3 apples and 4 oranges",
      "start_offset": 0,
      "end_offset": 32,
      "type": "word",
      "position": 0
    }
  ]
}
```

## Parameters

You can use either of the following parameters to configure the key-value map.

| Parameter | Required/Optional | Data type | Description |
|:---|:---|:---|:---|
| `mappings` | Optional | Array | An array of key-value pairs in the format `key => value`. Each key found in the input text will be replaced with its corresponding value. |
| `mappings_path` | Optional | String | The path to a UTF-8 encoded file containing key-value mappings. Each mapping should appear on a new line in the format `key => value`. The path can be absolute or relative to the OpenSearch configuration directory. |
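
For example, a filter that loads its mappings from a file might look like the following sketch. The index name, filter name, and file name are illustrative; the file is assumed to exist at `analysis/abbreviations.txt` under the OpenSearch config directory, with one `key => value` mapping per line:

```json
PUT /file-mapping-index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "file_mapping_filter": {
          "type": "mapping",
          "mappings_path": "analysis/abbreviations.txt"
        }
      }
    }
  }
}
```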

### Using a custom mapping character filter

You can create a custom mapping character filter by defining your own set of mappings. The following request creates a custom character filter that replaces common abbreviations in text:

```json
PUT /test-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_abbreviation_analyzer": {
          "tokenizer": "keyword",
          "char_filter": ["custom_abbreviation_filter"]
        }
      },
      "char_filter": {
        "custom_abbreviation_filter": {
          "type": "mapping",
          "mappings": [
            "BTW => By the way",
            "IDK => I don't know",
            "FYI => For your information"
          ]
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

Use the following request to examine the tokens generated using the analyzer:

```json
GET /test-index/_analyze
{
  "analyzer": "custom_abbreviation_analyzer",
  "text": "FYI, updates to the workout schedule are posted. IDK when it takes effect, but we have some details. BTW, the finalized schedule will be released Monday."
}
```
{% include copy-curl.html %}

The response shows that the abbreviations have been replaced:

```json
{
  "tokens": [
    {
      "token": "For your information, updates to the workout schedule are posted. I don't know when it takes effect, but we have some details. By the way, the finalized schedule will be released Monday.",
      "start_offset": 0,
      "end_offset": 153,
      "type": "word",
      "position": 0
    }
  ]
}
```
