From 43bb34e9edb3ae0e2de3095e252c98f123212487 Mon Sep 17 00:00:00 2001 From: Lynette Miles Date: Mon, 21 Oct 2024 08:35:53 -0700 Subject: [PATCH 1/4] pipelines: docs: bringing elasticsearch up to standards Signed-off-by: Lynette Miles --- pipeline/outputs/elasticsearch.md | 135 ++++++++++++++++-------------- 1 file changed, 73 insertions(+), 62 deletions(-) diff --git a/pipeline/outputs/elasticsearch.md b/pipeline/outputs/elasticsearch.md index 8e5288a44..eb38e4cde 100644 --- a/pipeline/outputs/elasticsearch.md +++ b/pipeline/outputs/elasticsearch.md @@ -4,78 +4,85 @@ description: Send logs to Elasticsearch (including Amazon OpenSearch Service) # Elasticsearch -The **es** output plugin, allows to ingest your records into an [Elasticsearch](http://www.elastic.co) database. The following instructions assumes that you have a fully operational Elasticsearch service running in your environment. +The **es** output plugin lets you ingest your records into an +[Elasticsearch](http://www.elastic.co) database. To use this plugin, you must have an +operational Elasticsearch service running in your environment. ## Configuration Parameters -| Key | Description | default | +| Key | Description | Default | | :--- | :--- | :--- | -| Host | IP address or hostname of the target Elasticsearch instance | 127.0.0.1 | -| Port | TCP port of the target Elasticsearch instance | 9200 | -| Path | Elasticsearch accepts new data on HTTP query path "/\_bulk". But it is also possible to serve Elasticsearch behind a reverse proxy on a subpath. This option defines such path on the fluent-bit side. It simply adds a path prefix in the indexing HTTP POST URI. | Empty string | -| compress | Set payload compression mechanism. Option available is 'gzip' | | -| Buffer\_Size | Specify the buffer size used to read the response from the Elasticsearch HTTP service. This option is useful for debugging purposes where is required to read full responses, note that response size grows depending of the number of records inserted. To set an _unlimited_ amount of memory set this value to **False**, otherwise the value must be according to the [Unit Size](../../administration/configuring-fluent-bit/unit-sizes.md) specification. | 512KB | -| Pipeline | Newer versions of Elasticsearch allows to setup filters called pipelines. This option allows to define which pipeline the database should use. For performance reasons is strongly suggested to do parsing and filtering on Fluent Bit side, avoid pipelines. | | -| AWS\_Auth | Enable AWS Sigv4 Authentication for Amazon OpenSearch Service | Off | -| AWS\_Region | Specify the AWS region for Amazon OpenSearch Service | | -| AWS\_STS\_Endpoint | Specify the custom sts endpoint to be used with STS API for Amazon OpenSearch Service | | -| AWS\_Role\_ARN | AWS IAM Role to assume to put records to your Amazon cluster | | -| AWS\_External\_ID | External ID for the AWS IAM Role specified with `aws_role_arn` | | -| AWS\_Service\_Name | Service name to be used in AWS Sigv4 signature. For integration with Amazon OpenSearch Serverless, set to `aoss`. See the [FAQ](opensearch.md#faq) section on Amazon OpenSearch Serverless for more information. | es | -| AWS\_Profile | AWS profile name | default | -| Cloud\_ID | If you are using Elastic's Elasticsearch Service you can specify the cloud\_id of the cluster running. The Cloud ID string has the format `:`. Once decoded, the `base64_info` string has the format `$$`. 
+| `Host` | IP address or hostname of the target Elasticsearch instance | 127.0.0.1 | +| `Port` | TCP port of the target Elasticsearch instance | 9200 | +| `Path` | Elasticsearch accepts new data on HTTP query path `/_bulk`. It's also possible to serve Elasticsearch behind a reverse proxy on a sub-path. Define the path by adding a path prefix in the indexing HTTP POST URI. | Empty string | +| `compress` | Set payload compression mechanism. Option available is 'gzip' | | +| `Buffer_Size` | Specify the buffer size used to read the response from the Elasticsearch HTTP service. Useful for debugging purposes where it's required to read full responses. Response size grows depending of the number of records inserted. To set an _unlimited_ amount of memory set this value to **False**, otherwise the value must be according to the [Unit Size](../../administration/configuring-fluent-bit/unit-sizes.md) specification. | `512KB` | +| `Pipeline` | Newer versions of Elasticsearch allows to setup filters called pipelines. This option allows to define which pipeline the database should use. For performance reasons is strongly suggested to do parsing and filtering on Fluent Bit side, avoid pipelines. | | +| `AWS_Auth` | Enable AWS Sigv4 Authentication for Amazon OpenSearch Service | Off | +| `AWS_Region` | Specify the AWS region for Amazon OpenSearch Service | | +| `AWS_STS_Endpoint` | Specify the custom STS endpoint to be used with STS API for Amazon OpenSearch Service | | +| `AWS_Role_ARN` | AWS IAM Role to assume to put records to your Amazon cluster | | +| `AWS_External_ID` | External ID for the AWS IAM Role specified with `aws_role_arn` | | +| `AWS_Service_Name` | Service name to use in AWS Sigv4 signature. For integration with Amazon OpenSearch Serverless, set to `aoss`. See the [FAQ](opensearch.md#faq) section on Amazon OpenSearch Serverless for more information. | `es` | +| `AWS_Profile` | AWS profile name | default | +| `Cloud_ID` | If using Elastic's Elasticsearch Service you can specify the `cloud_id` of the cluster running. The string has the format `:`. Once decoded, the `base64_info` string has the format `$$`. | | -| Cloud\_Auth | Specify the credentials to use to connect to Elastic's Elasticsearch Service running on Elastic Cloud | | -| HTTP\_User | Optional username credential for Elastic X-Pack access | | -| HTTP\_Passwd | Password for user defined in HTTP\_User | | -| Index | Index name | fluent-bit | -| Type | Type name | \_doc | -| Logstash\_Format | Enable Logstash format compatibility. This option takes a boolean value: True/False, On/Off | Off | -| Logstash\_Prefix | When Logstash\_Format is enabled, the Index name is composed using a prefix and the date, e.g: If Logstash\_Prefix is equals to 'mydata' your index will become 'mydata-YYYY.MM.DD'. The last string appended belongs to the date when the data is being generated. | logstash | -| Logstash\_Prefix\_Key | When included: the value of the key in the record will be evaluated as key reference and overrides Logstash\_Prefix for index generation. If the key/value is not found in the record then the Logstash\_Prefix option will act as a fallback. The parameter is expected to be a [record accessor](../../administration/configuring-fluent-bit/classic-mode/record-accessor.md). | | -| Logstash\_Prefix\_Separator | Set a separator between logstash_prefix and date.| - | -| Logstash\_DateFormat | Time format \(based on [strftime](http://man7.org/linux/man-pages/man3/strftime.3.html)\) to generate the second part of the Index name. 
| %Y.%m.%d | -| Time\_Key | When Logstash\_Format is enabled, each record will get a new timestamp field. The Time\_Key property defines the name of that field. | @timestamp | -| Time\_Key\_Format | When Logstash\_Format is enabled, this property defines the format of the timestamp. | %Y-%m-%dT%H:%M:%S | -| Time\_Key\_Nanos | When Logstash\_Format is enabled, enabling this property sends nanosecond precision timestamps. | Off | -| Include\_Tag\_Key | When enabled, it append the Tag name to the record. | Off | -| Tag\_Key | When Include\_Tag\_Key is enabled, this property defines the key name for the tag. | \_flb-key | -| Generate\_ID | When enabled, generate `_id` for outgoing records. This prevents duplicate records when retrying ES. | Off | -| Id\_Key | If set, `_id` will be the value of the key from incoming record and `Generate_ID` option is ignored. | | -| Write\_Operation | The write\_operation can be any of: create (default), index, update, upsert. | create | -| Replace\_Dots | When enabled, replace field name dots with underscore, required by Elasticsearch 2.0-2.3. | Off | -| Trace\_Output | Print all elasticsearch API request payloads to stdout \(for diag only\) | Off | -| Trace\_Error | If elasticsearch return an error, print the elasticsearch API request and response \(for diag only\) | Off | -| Current\_Time\_Index | Use current time for index generation instead of message record | Off | -| Suppress\_Type\_Name | When enabled, mapping types is removed and `Type` option is ignored. If using Elasticsearch 8.0.0 or higher - it [no longer supports mapping types](https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html), so it shall be set to On. | Off | -| Workers | The number of [workers](../../administration/multithreading.md#outputs) to perform flush operations for this output. | `2` | - -> The parameters _index_ and _type_ can be confusing if you are new to Elastic, if you have used a common relational database before, they can be compared to the _database_ and _table_ concepts. Also see [the FAQ below](elasticsearch.md#faq) +| `Cloud_Auth` | Specify the credentials to use to connect to Elastic's Elasticsearch Service running on Elastic Cloud | | +| `HTTP_User` | Optional username credential for Elastic X-Pack access | | +| `HTTP_Passwd` | Password for user defined in `HTTP_User` | | +| `Index` | Index name | fluent-bit | +| `Type` | Type name | `_doc` | +| `Logstash_Format` | Enable Logstash format compatibility. This option takes a Boolean value: `True/False`, `On/Off` | `Off` | +| `Logstash_Prefix` | When Logstash\_Format is enabled, the Index name is composed using a prefix and the date, e.g: If `Logstash_Prefix` is equal to `mydata` your index will become `mydata-YYYY.MM.DD`. The last string appended belongs to the date when the data is being generated. | `logstash` | +| `Logstash_Prefix_Key` | When included: the value of the key in the record will be evaluated as key reference and overrides `Logstash_Prefix` for index generation. If the key/value isn't found in the record then the `Logstash_Prefix` option will act as a fallback. The parameter is expected to be a [record accessor](../../administration/configuring-fluent-bit/classic-mode/record-accessor.md). | | +| `Logstash_Prefix_Separator` | Set a separator between `Logstash_Prefix` and date.| - | +| `Logstash_DateFormat` | Time format based on [strftime](http://man7.org/linux/man-pages/man3/strftime.3.html) to generate the second part of the Index name. 
| `%Y.%m.%d` |
+| `Time_Key` | When `Logstash_Format` is enabled, each record will get a new timestamp field. The `Time_Key` property defines the name of that field. | `@timestamp` |
+| `Time_Key_Format` | When `Logstash_Format` is enabled, this property defines the format of the timestamp. | `%Y-%m-%dT%H:%M:%S` |
+| `Time_Key_Nanos` | When `Logstash_Format` is enabled, enabling this property sends nanosecond precision timestamps. | `Off` |
+| `Include_Tag_Key` | When enabled, it appends the Tag name to the record. | `Off` |
+| `Tag_Key` | When `Include_Tag_Key` is enabled, this property defines the key name for the tag. | `_flb-key` |
+| `Generate_ID` | When enabled, generate `_id` for outgoing records. This prevents duplicate records when retrying ES. | `Off` |
+| `Id_Key` | If set, `_id` will be the value of the key from incoming record and `Generate_ID` option is ignored. | |
+| `Write_Operation` | `Write_operation` can be any of: `create`, `index`, `update`, `upsert`. | `create` |
+| `Replace_Dots` | When enabled, replace field name dots with underscore, required by Elasticsearch 2.0-2.3. | `Off` |
+| `Trace_Output` | Print all ElasticSearch API request payloads to `stdout` for diagnostics | `Off` |
+| `Trace_Error` | If ElasticSearch returns an error, print the ElasticSearch API request and response for diagnostics | `Off` |
+| `Current_Time_Index` | Use current time for index generation instead of message record | `Off` |
+| `Suppress_Type_Name` | When enabled, mapping types is removed and `Type` option is
+ignored. Elasticsearch 8.0.0 or higher [no longer supports mapping types](https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html), and is set to `On`. | `Off` |
+| `Workers` | The number of [workers](../../administration/multithreading.md#outputs) to perform flush operations for this output. | `2` |
+
+The parameters `index` and `type` can be confusing if you are new to Elastic, if you
+have used a common relational database before, they can be compared to the `database`
+and `table` concepts. Also see [the FAQ](elasticsearch.md#faq)
 
 ### TLS / SSL
 
-Elasticsearch output plugin supports TLS/SSL, for more details about the properties available and general configuration, please refer to the [TLS/SSL](../../administration/transport-security.md) section.
+Elasticsearch output plugin supports TLS/SSL. For more details about the properties
+available and general configuration, refer to [TLS/SSL](../../administration/transport-security.md).
 
-### write\_operation
+### `write_operation`
 
-The write\_operation can be any of:
+The `write_operation` can be any of:
 
 | Operation | Description |
 | ------------- | ----------- |
-| create (default) | adds new data - if the data already exists (based on its id), the op is skipped.|
-| index | new data is added while existing data (based on its id) is replaced (reindexed).|
-| update | updates existing data (based on its id). If no data is found, the op is skipped.|
-| upsert | known as merge or insert if the data does not exist, updates if the data exists (based on its id).|
+| `create` | Adds new data. If the data already exists (based on its id), the op is skipped.|
+| `index` | New data is added while existing data (based on its id) is replaced (reindexed).|
+| `update` | Updates existing data (based on its id). 
If no data is found, the op is skipped.| +| `upsert` | Known as merge or insert if the data does not exist, updates if the data exists (based on its id).| **Please note, `Id_Key` or `Generate_ID` is required in update, and upsert scenario.** -## Getting Started +## Get started -In order to insert records into a Elasticsearch service, you can run the plugin from the command line or through the configuration file: +To insert records into an Elasticsearch service, you run the plugin from the +command line or through the configuration file: ### Command Line -The **es** plugin, can read the parameters from the command line in two ways, through the **-p** argument \(property\) or setting them directly through the service URI. The URI format is the following: +The **es** plugin can read the parameters from the command line in two ways, through the **-p** argument (property) or setting them directly through the service URI. The URI format is the following: ```text es://host:port/index/type @@ -83,15 +90,15 @@ es://host:port/index/type Using the format specified, you could start Fluent Bit through: -```text -$ fluent-bit -i cpu -t cpu -o es://192.168.2.3:9200/my_index/my_type \ +```shell copy +fluent-bit -i cpu -t cpu -o es://192.168.2.3:9200/my_index/my_type \ -o stdout -m '*' ``` which is similar to do: -```text -$ fluent-bit -i cpu -t cpu -o es -p Host=192.168.2.3 -p Port=9200 \ +```shell copy +fluent-bit -i cpu -t cpu -o es -p Host=192.168.2.3 -p Port=9200 \ -p Index=my_index -p Type=my_type -o stdout -m '*' ``` @@ -113,11 +120,11 @@ In your main configuration file append the following _Input_ & _Output_ sections Type my_type ``` -![example configuration visualization from calyptia](../../.gitbook/assets/image%20%282%29.png) +![example configuration visualization from Calyptia](../../.gitbook/assets/image%20%282%29.png) ## About Elasticsearch field names -Some input plugins may generate messages where the field names contains dots, since Elasticsearch 2.0 this is not longer allowed, so the current **es** plugin replaces them with an underscore, e.g: +Some input plugins can generate messages where the field names contains dots, since Elasticsearch 2.0 this is not longer allowed, so the current **es** plugin replaces them with an underscore, e.g: ```text {"cpu0.p_cpu"=>17.000000} @@ -133,7 +140,8 @@ becomes ### Elasticsearch rejects requests saying "the final mapping would have more than 1 type" -Since Elasticsearch 6.0, you cannot create multiple types in a single index. This means that you cannot set up your configuration as below anymore. +Elasticsearch 6.0 can't create multiple types in a single index. This +means that you can't set up your configuration like the following:. ```text [OUTPUT] @@ -149,11 +157,14 @@ Since Elasticsearch 6.0, you cannot create multiple types in a single index. Thi Type type2 ``` -If you see an error message like below, you'll need to fix your configuration to use a single type on each index. +An error message like the following indicats you need to update your configuration to +use a single type on each index. -> Rejecting mapping update to \[search\] as the final mapping would have more than 1 type +```text + Rejecting mapping update to [search] as the final mapping would have more than 1 type +``` -For details, please read [the official blog post on that issue](https://www.elastic.co/guide/en/elasticsearch/reference/6.7/removal-of-types.html). 
+For details, read [the official blog post on that issue](https://www.elastic.co/guide/en/elasticsearch/reference/6.7/removal-of-types.html).
 
 ### Elasticsearch rejects requests saying "Document mapping type name can't start with '\_'"
 
From 325075c74505eb1492c3790a8ea25a80395d6480 Mon Sep 17 00:00:00 2001
From: Lynette Miles
Date: Mon, 21 Oct 2024 14:21:42 -0700
Subject: [PATCH 2/4] fluent: docs: elastic search plugin style updates

Signed-off-by: Lynette Miles
---
 pipeline/outputs/elasticsearch.md             | 233 ++++++++++--------
 vale-styles/FluentBit/Spelling-exceptions.txt |   1 +
 2 files changed, 136 insertions(+), 98 deletions(-)

diff --git a/pipeline/outputs/elasticsearch.md b/pipeline/outputs/elasticsearch.md
index eb38e4cde..3c8e1919c 100644
--- a/pipeline/outputs/elasticsearch.md
+++ b/pipeline/outputs/elasticsearch.md
@@ -12,28 +12,27 @@ operational Elasticsearch service running in your environment.
 
 | Key | Description | Default |
 | :--- | :--- | :--- |
-| `Host` | IP address or hostname of the target Elasticsearch instance | 127.0.0.1 |
+| `Host` | IP address or hostname of the target Elasticsearch instance | `127.0.0.1` |
 | `Port` | TCP port of the target Elasticsearch instance | 9200 |
-| `Path` | Elasticsearch accepts new data on HTTP query path `/_bulk`. It's also possible to serve Elasticsearch behind a reverse proxy on a sub-path. Define the path by adding a path prefix in the indexing HTTP POST URI. | Empty string |
-| `compress` | Set payload compression mechanism. Option available is 'gzip' | |
-| `Buffer_Size` | Specify the buffer size used to read the response from the Elasticsearch HTTP service. Useful for debugging purposes where it's required to read full responses. Response size grows depending of the number of records inserted. To set an _unlimited_ amount of memory set this value to **False**, otherwise the value must be according to the [Unit Size](../../administration/configuring-fluent-bit/unit-sizes.md) specification. | `512KB` |
-| `Pipeline` | Newer versions of Elasticsearch allows to setup filters called pipelines. This option allows to define which pipeline the database should use. For performance reasons is strongly suggested to do parsing and filtering on Fluent Bit side, avoid pipelines. | |
-| `AWS_Auth` | Enable AWS Sigv4 Authentication for Amazon OpenSearch Service | Off |
-| `AWS_Region` | Specify the AWS region for Amazon OpenSearch Service | |
+| `Path` | Elasticsearch accepts new data on HTTP query path `/_bulk`. You can also serve Elasticsearch behind a reverse proxy on a sub-path. Define the path by adding a path prefix in the indexing HTTP POST URI. | Empty string |
+| `compress` | Set payload compression mechanism. Option available is `gzip`. | |
+| `Buffer_Size` | Specify the buffer size used to read the response from the Elasticsearch HTTP service. Useful for debugging when you need to read full responses. Response size grows depending on the number of records inserted. To use an unlimited amount of memory, set this value to `False`. Otherwise set the value according to the [Unit Size](../../administration/configuring-fluent-bit/unit-sizes.md). | `512KB` |
+| `Pipeline` | Define which pipeline the database should use. For performance reasons, it's strongly suggested to do parsing and filtering on the Fluent Bit side, and avoid pipelines. | |
+| `AWS_Auth` | Enable AWS Sigv4 Authentication for Amazon OpenSearch Service. | `Off` |
+| `AWS_Region` | Specify the AWS region for Amazon OpenSearch Service. | |
 | `AWS_STS_Endpoint` | Specify the custom STS endpoint to be used with STS API for Amazon OpenSearch Service | |
 | `AWS_Role_ARN` | AWS IAM Role to assume to put records to your Amazon cluster | |
 | `AWS_External_ID` | External ID for the AWS IAM Role specified with `aws_role_arn` | |
 | `AWS_Service_Name` | Service name to use in AWS Sigv4 signature. For integration with Amazon OpenSearch Serverless, set to `aoss`. See the [FAQ](opensearch.md#faq) section on Amazon OpenSearch Serverless for more information. | `es` |
 | `AWS_Profile` | AWS profile name | default |
-| `Cloud_ID` | If using Elastic's Elasticsearch Service you can specify the `cloud_id` of the cluster running. The string has the format `:`. Once decoded, the `base64_info` string has the format `$$`.
- | |
+| `Cloud_ID` | If using Elastic's Elasticsearch Service you can specify the `cloud_id` of the cluster running. The string has the format `:`. Once decoded, the `base64_info` string has the format `$$`. | |
 | `Cloud_Auth` | Specify the credentials to use to connect to Elastic's Elasticsearch Service running on Elastic Cloud | |
 | `HTTP_User` | Optional username credential for Elastic X-Pack access | |
 | `HTTP_Passwd` | Password for user defined in `HTTP_User` | |
-| `Index` | Index name | fluent-bit |
+| `Index` | Index name | `fluent-bit` |
 | `Type` | Type name | `_doc` |
 | `Logstash_Format` | Enable Logstash format compatibility. This option takes a Boolean value: `True/False`, `On/Off` | `Off` |
-| `Logstash_Prefix` | When Logstash\_Format is enabled, the Index name is composed using a prefix and the date, e.g: If `Logstash_Prefix` is equal to `mydata` your index will become `mydata-YYYY.MM.DD`. The last string appended belongs to the date when the data is being generated. | `logstash` |
+| `Logstash_Prefix` | When `Logstash_Format` is enabled, the Index name is composed using a prefix and the date. For example, if `Logstash_Prefix` is equal to `mydata`, your index becomes `mydata-YYYY.MM.DD`, where the appended date is the date the data was generated. | `logstash` |
 | `Logstash_Prefix_Key` | When included: the value of the key in the record will be evaluated as key reference and overrides `Logstash_Prefix` for index generation. If the key/value isn't found in the record then the `Logstash_Prefix` option will act as a fallback. The parameter is expected to be a [record accessor](../../administration/configuring-fluent-bit/classic-mode/record-accessor.md). | |
 | `Logstash_Prefix_Separator` | Set a separator between `Logstash_Prefix` and date.| - |
 | `Logstash_DateFormat` | Time format based on [strftime](http://man7.org/linux/man-pages/man3/strftime.3.html) to generate the second part of the Index name. | `%Y.%m.%d` |
@@ -45,17 +44,16 @@ operational Elasticsearch service running in your environment.
 | `Generate_ID` | When enabled, generate `_id` for outgoing records. This prevents duplicate records when retrying ES. | `Off` |
 | `Id_Key` | If set, `_id` will be the value of the key from incoming record and `Generate_ID` option is ignored. | |
 | `Write_Operation` | `Write_operation` can be any of: `create`, `index`, `update`, `upsert`. | `create` |
-| `Replace_Dots` | When enabled, replace field name dots with underscore, required by Elasticsearch 2.0-2.3. | `Off` |
-| `Trace_Output` | Print all ElasticSearch API request payloads to `stdout` for diagnostics | `Off` |
-| `Trace_Error` | If ElasticSearch returns an error, print the ElasticSearch API request and response for diagnostics | `Off` |
-| `Current_Time_Index` | Use current time for index generation instead of message record | `Off` |
-| `Suppress_Type_Name` | When enabled, mapping types is removed and `Type` option is
-ignored. Elasticsearch 8.0.0 or higher [no longer supports mapping types](https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html), and is set to `On`. | `Off` |
+| `Replace_Dots` | When enabled, replace field name dots with underscore. Required by Elasticsearch 2.0-2.3. | `Off` |
+| `Trace_Output` | Print all Elasticsearch API request payloads to `stdout` for diagnostics. | `Off` |
+| `Trace_Error` | If Elasticsearch returns an error, print the Elasticsearch API request and response for diagnostics. | `Off` |
+| `Current_Time_Index` | Use current time for index generation instead of message record. | `Off` |
+| `Suppress_Type_Name` | When enabled, mapping types are removed and the `Type` option is ignored. Elasticsearch 8.0.0 or higher [no longer supports mapping types](https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html), so for those versions this option must be set to `On`. | `Off` |
 | `Workers` | The number of [workers](../../administration/multithreading.md#outputs) to perform flush operations for this output. | `2` |
 
-The parameters `index` and `type` can be confusing if you are new to Elastic, if you
-have used a common relational database before, they can be compared to the `database`
-and `table` concepts. Also see [the FAQ](elasticsearch.md#faq)
+If you have used a common relational database, the parameters `index` and `type` can
+be compared to the `database` and `table` concepts. Also see [the
+FAQ](elasticsearch.md#faq)
 
 ### TLS / SSL
 
@@ -66,14 +64,18 @@ available and general configuration, refer to [TLS/SSL](../../administration/tran
 
 The `write_operation` can be any of:
 
-| Operation | Description |
-| ------------- | ----------- |
-| `create` | Adds new data. If the data already exists (based on its id), the op is skipped.|
+| Operation | Description |
+| ----------- | ----------- |
+| `create` | Adds new data. If the data already exists (based on its id), the op is skipped.|
 | `index` | New data is added while existing data (based on its id) is replaced (reindexed).|
-| `update` | Updates existing data (based on its id). If no data is found, the op is skipped.|
-| `upsert` | Known as merge or insert if the data does not exist, updates if the data exists (based on its id).|
+| `update` | Updates existing data (based on its id). If no data is found, the op is skipped. |
+| `upsert` | Merges or inserts if the data doesn't exist; updates if the data exists (based on its id).|
 
-**Please note, `Id_Key` or `Generate_ID` is required in update, and upsert scenario.**
+{% hint style="info" %}
+
+`Id_Key` or `Generate_ID` is required for `update` and `upsert`.
+
+{% endhint %}
 
 ## Get started
 
@@ -82,7 +84,12 @@ command line or through the configuration file:
 
 ### Command Line
 
-The **es** plugin can read the parameters from the command line in two ways, through the **-p** argument (property) or setting them directly through the service URI. The URI format is the following:
+The **es** plugin can read the parameters from the command line in two ways:
+
+- Through the `-p` argument (property)
+- Setting them directly through the service URI. 
+ +The URI format is the following: ```text es://host:port/index/type @@ -104,7 +111,7 @@ fluent-bit -i cpu -t cpu -o es -p Host=192.168.2.3 -p Port=9200 \ ### Configuration File -In your main configuration file append the following _Input_ & _Output_ sections. You can visualize this configuration [here](https://link.calyptia.com/qhq) +In your main configuration file append the following `Input` and `Output` sections. You can visualize this configuration [here](https://link.calyptia.com/qhq) ```python [INPUT] @@ -124,7 +131,9 @@ In your main configuration file append the following _Input_ & _Output_ sections ## About Elasticsearch field names -Some input plugins can generate messages where the field names contains dots, since Elasticsearch 2.0 this is not longer allowed, so the current **es** plugin replaces them with an underscore, e.g: +Some input plugins can generate messages where the field names contains dots. For +Elasticsearch 2.0, this isn't allowed. The current **es** plugin replaces +them with an underscore: ```text {"cpu0.p_cpu"=>17.000000} @@ -136,62 +145,21 @@ becomes {"cpu0_p_cpu"=>17.000000} ``` -## FAQ - -### Elasticsearch rejects requests saying "the final mapping would have more than 1 type" - -Elasticsearch 6.0 can't create multiple types in a single index. This -means that you can't set up your configuration like the following:. - -```text -[OUTPUT] - Name es - Match foo.* - Index search - Type type1 - -[OUTPUT] - Name es - Match bar.* - Index search - Type type2 -``` - -An error message like the following indicats you need to update your configuration to -use a single type on each index. - -```text - Rejecting mapping update to [search] as the final mapping would have more than 1 type -``` - -For details, read [the official blog post on that issue](https://www.elastic.co/guide/en/elasticsearch/reference/6.7/removal-of-types.html). - -### Elasticsearch rejects requests saying "Document mapping type name can't start with '\_'" +## Use Fluent Bit ElasticSearch plugin with other services -Fluent Bit v1.5 changed the default mapping type from `flb_type` to `_doc`, which matches the recommendation from Elasticsearch from version 6.2 forwards \([see commit with rationale](https://github.com/fluent/fluent-bit/commit/04ed3d8104ca8a2f491453777ae6e38e5377817e#diff-c9ae115d3acaceac5efb949edbb21196)\). This doesn't work in Elasticsearch versions 5.6 through 6.1 \([see Elasticsearch discussion and fix](https://discuss.elastic.co/t/cant-use-doc-as-type-despite-it-being-declared-the-preferred-method/113837/9)\). Ensure you set an explicit map \(such as `doc` or `flb_type`\) in the configuration, as seen on the last line: - -```text -[OUTPUT] - Name es - Match * - Host vpc-test-domain-ke7thhzoo7jawsrhmm6mb7ite7y.us-west-2.es.amazonaws.com - Port 443 - Index my_index - AWS_Auth On - AWS_Region us-west-2 - tls On - Type doc -``` +Connect to Amazon OpenSearch or Elastic Cloud with the ElasticSearch plugin. -### Fluent Bit + Amazon OpenSearch Service +### Amazon OpenSearch Service -The Amazon OpenSearch Service adds an extra security layer where HTTP requests must be signed with AWS Sigv4. Fluent Bit v1.5 introduced full support for Amazon OpenSearch Service with IAM Authentication. +The Amazon OpenSearch Service adds an extra security layer where HTTP requests must +be signed with AWS Sigv4. Fluent Bit v1.5 introduced full support for Amazon +OpenSearch Service with IAM Authentication. 
-See [here](https://github.com/fluent/fluent-bit-docs/tree/43c4fe134611da471e706b0edb2f9acd7cdfdbc3/administration/aws-credentials.md) for details on how AWS credentials are fetched. +See [details](https://github.com/fluent/fluent-bit-docs/tree/43c4fe134611da471e706b0edb2f9acd7cdfdbc3/administration/aws-credentials.md) on how AWS credentials are fetched. Example configuration: -```text +```text copy [OUTPUT] Name es Match * @@ -204,16 +172,20 @@ Example configuration: tls On ``` -Notice that the `Port` is set to `443`, `tls` is enabled, and `AWS_Region` is set. +Be aware that the `Port` is set to `443`, `tls` is enabled, and `AWS_Region` is set. -### Fluent Bit + Elastic Cloud +### Use Fluent Bit with Elastic Cloud -Fluent Bit supports connecting to [Elastic Cloud](https://www.elastic.co/guide/en/cloud/current/ec-getting-started.html) providing just the `cloud_id` and the `cloud_auth` settings. -`cloud_auth` uses the `elastic` user and password provided when the cluster was created, for details refer to the [Cloud ID usage page](https://www.elastic.co/guide/en/cloud/current/ec-cloud-id.html). +Fluent Bit supports connecting to +[Elastic Cloud](https://www.elastic.co/guide/en/cloud/current/ec-getting-started.html) +by providing the `cloud_id` and the `cloud_auth` settings. `cloud_auth` uses the +`elastic` user and password provided when the cluster was created. For details refer +to the +[Cloud ID usage page](https://www.elastic.co/guide/en/cloud/current/ec-cloud-id.html). Example configuration: -```text +```text copy [OUTPUT] Name es Include_Tag_Key true @@ -225,35 +197,99 @@ Example configuration: cloud_auth elastic:2vxxxxxxxxYV ``` -### Validation Failed: 1: an id must be provided if version type or value are set +In Elastic Cloud version 8 and great, the type option must be removed by setting +`Suppress_Type_Name On`. + +Without this you will see errors like: + +```text +{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Action/metadata line [1] contains an unknown parameter [_type]"}],"type":"illegal_argument_exception","reason":"Action/metadata line [1] contains an unknown parameter [_type]"},"status":400} +``` -Since v1.8.2, Fluent Bit started using `create` method (instead of `index`) for data submission. -This makes Fluent Bit compatible with Datastream introduced in Elasticsearch 7.9. +## Troubleshooting -If you see `action_request_validation_exception` errors on your pipeline with Fluent Bit >= v1.8.2, you can fix it up by turning on `Generate_ID` as follows: +Use the following information to help resolve errors using the ElasticSearch plugin. + +### Using multiple types in a single index + +Elasticsearch 6.0 can't create multiple types in a single index. An error message +like the following indicates you need to update your configuration to use a single +type on each index. + +```text +Rejecting mapping update to [products] as the final mapping would have more than 1 type: +``` + +This means that you can't set up your configuration like the following:. ```text [OUTPUT] - Name es - Match * - Host 192.168.12.1 - Generate_ID on + Name es + Match foo.* + Index search + Type type1 + +[OUTPUT] + Name es + Match bar.* + Index search + Type type2 ``` -### Action/metadata contains an unknown parameter type +For details, read [the official blog post on that issue](https://www.elastic.co/guide/en/elasticsearch/reference/6.7/removal-of-types.html). 
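+
+One hypothetical way to restructure the previous example is to write each type to
+its own index (a sketch only; the `search-type1` and `search-type2` index names
+are illustrative):
+
+```text
+[OUTPUT]
+    Name es
+    Match foo.*
+    Index search-type1
+
+[OUTPUT]
+    Name es
+    Match bar.*
+    Index search-type2
+```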
+ +### Mapping type names can't start with underscores (`_`) -Elastic Cloud is now on version 8 so the type option must be removed by setting `Suppress_Type_Name On` as indicated above. +Fluent Bit v1.5 changed the default mapping type from `flb_type` to `_doc`, matching +the recommendation from Elasticsearch for version 6.2 and greater +([see commit with +rationale](https://github.com/fluent/fluent-bit/commit/04ed3d8104ca8a2f491453777ae6e38e5377817e#diff-c9ae115d3acaceac5efb949edbb21196)). -Without this you will see errors like: +This doesn't work in Elasticsearch versions 5.6 through 6.1 +([discussion and fix](https://discuss.elastic.co/t/cant-use-doc-as-type-despite-it-being-declared-the-preferred-method/113837/9)). + +Ensure you set an explicit map such as `doc` or `flb_type` in the configuration, +as seen on the last line: + +```text copy +[OUTPUT] + Name es + Match * + Host vpc-test-domain-ke7thhzoo7jawsrhmm6mb7ite7y.us-west-2.es.amazonaws.com + Port 443 + Index my_index + AWS_Auth On + AWS_Region us-west-2 + tls On + Type doc +``` + +### Validation failures + +In Fluent Bit v1.8.2 and greater, Fluent Bit started using `create` method (instead +of `index`) for data submission. This makes Fluent Bit compatible with Datastream, +introduced in Elasticsearch 7.9. You might see errors like: ```text -{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Action/metadata line [1] contains an unknown parameter [_type]"}],"type":"illegal_argument_exception","reason":"Action/metadata line [1] contains an unknown parameter [_type]"},"status":400} +Validation Failed: 1: an id must be provided if version type or value are set +``` + +If you see `action_request_validation_exception` errors on your pipeline with +Fluent Bit versions greater than v1.8.2, correct them by turning on `Generate_ID` +as follows: + +```text copy +[OUTPUT] + Name es + Match * + Host 192.168.12.1 + Generate_ID on ``` -### Logstash_Prefix_Key +### `Logstash_Prefix_Key` The following snippet demonstrates using the namespace name as extracted by the -`kubernetes` filter as logstash prefix: +`kubernetes` filter as `logstash` prefix: ```text [OUTPUT] @@ -265,4 +301,5 @@ The following snippet demonstrates using the namespace name as extracted by the # ... ``` -For records that do nor have the field `kubernetes.namespace_name`, the default prefix, `logstash` will be used. +For records that don't have the field `kubernetes.namespace_name`, the default prefix +`logstash` will be used. diff --git a/vale-styles/FluentBit/Spelling-exceptions.txt b/vale-styles/FluentBit/Spelling-exceptions.txt index 075e4284b..fa3157ce2 100644 --- a/vale-styles/FluentBit/Spelling-exceptions.txt +++ b/vale-styles/FluentBit/Spelling-exceptions.txt @@ -25,6 +25,7 @@ Datadog Datagen datapoint datapoints +Datastream declaratively deduplicate Deployer From 782512e7471fd97bff84a246dc83c411a7206886 Mon Sep 17 00:00:00 2001 From: Adam Locke Date: Tue, 22 Oct 2024 14:52:55 -0400 Subject: [PATCH 3/4] Adding none to default column, plus other minor changes Signed-off-by: Lynette Miles (lynetet.miles@chronosphere.io) --- pipeline/outputs/elasticsearch.md | 32 +++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/pipeline/outputs/elasticsearch.md b/pipeline/outputs/elasticsearch.md index 3c8e1919c..6747869e4 100644 --- a/pipeline/outputs/elasticsearch.md +++ b/pipeline/outputs/elasticsearch.md @@ -13,28 +13,28 @@ operational Elasticsearch service running in your environment. 
 | Key | Description | Default |
 | :--- | :--- | :--- |
 | `Host` | IP address or hostname of the target Elasticsearch instance | `127.0.0.1` |
-| `Port` | TCP port of the target Elasticsearch instance | 9200 |
+| `Port` | TCP port of the target Elasticsearch instance | `9200` |
 | `Path` | Elasticsearch accepts new data on HTTP query path `/_bulk`. You can also serve Elasticsearch behind a reverse proxy on a sub-path. Define the path by adding a path prefix in the indexing HTTP POST URI. | Empty string |
 | `compress` | Set payload compression mechanism. Option available is `gzip`. | |
 | `Buffer_Size` | Specify the buffer size used to read the response from the Elasticsearch HTTP service. Useful for debugging when you need to read full responses. Response size grows depending on the number of records inserted. To use an unlimited amount of memory, set this value to `False`. Otherwise set the value according to the [Unit Size](../../administration/configuring-fluent-bit/unit-sizes.md). | `512KB` |
-| `Pipeline` | Define which pipeline the database should use. For performance reasons, it's strongly suggested to do parsing and filtering on the Fluent Bit side, and avoid pipelines. | |
+| `Pipeline` | Define which pipeline the database should use. For performance reasons, it's strongly suggested to do parsing and filtering on the Fluent Bit side, and avoid pipelines. | _none_ |
 | `AWS_Auth` | Enable AWS Sigv4 Authentication for Amazon OpenSearch Service. | `Off` |
-| `AWS_Region` | Specify the AWS region for Amazon OpenSearch Service. | |
-| `AWS_STS_Endpoint` | Specify the custom STS endpoint to be used with STS API for Amazon OpenSearch Service | |
-| `AWS_Role_ARN` | AWS IAM Role to assume to put records to your Amazon cluster | |
-| `AWS_External_ID` | External ID for the AWS IAM Role specified with `aws_role_arn` | |
+| `AWS_Region` | Specify the AWS region for Amazon OpenSearch Service. | _none_ |
+| `AWS_STS_Endpoint` | Specify the custom STS endpoint to be used with STS API for Amazon OpenSearch Service | _none_ |
+| `AWS_Role_ARN` | AWS IAM Role to assume to put records to your Amazon cluster | _none_ |
+| `AWS_External_ID` | External ID for the AWS IAM Role specified with `aws_role_arn` | _none_ |
 | `AWS_Service_Name` | Service name to use in AWS Sigv4 signature. For integration with Amazon OpenSearch Serverless, set to `aoss`. See the [FAQ](opensearch.md#faq) section on Amazon OpenSearch Serverless for more information. | `es` |
-| `AWS_Profile` | AWS profile name | default |
-| `Cloud_ID` | If using Elastic's Elasticsearch Service you can specify the `cloud_id` of the cluster running. The string has the format `:`. Once decoded, the `base64_info` string has the format `$$`. | |
-| `Cloud_Auth` | Specify the credentials to use to connect to Elastic's Elasticsearch Service running on Elastic Cloud | |
-| `HTTP_User` | Optional username credential for Elastic X-Pack access | |
-| `HTTP_Passwd` | Password for user defined in `HTTP_User` | |
+| `AWS_Profile` | AWS profile name | `default` |
+| `Cloud_ID` | If using Elastic's Elasticsearch Service you can specify the `cloud_id` of the cluster running. The string has the format `:`. Once decoded, the `base64_info` string has the format `$$`. | _none_ |
+| `Cloud_Auth` | Specify the credentials to use to connect to Elastic's Elasticsearch Service running on Elastic Cloud | _none_ |
+| `HTTP_User` | Optional username credential for Elastic X-Pack access | _none_ |
+| `HTTP_Passwd` | Password for user defined in `HTTP_User` | _none_ |
 | `Index` | Index name | `fluent-bit` |
 | `Type` | Type name | `_doc` |
 | `Logstash_Format` | Enable Logstash format compatibility. This option takes a Boolean value: `True/False`, `On/Off` | `Off` |
 | `Logstash_Prefix` | When `Logstash_Format` is enabled, the Index name is composed using a prefix and the date. For example, if `Logstash_Prefix` is equal to `mydata`, your index becomes `mydata-YYYY.MM.DD`, where the appended date is the date the data was generated. | `logstash` |
-| `Logstash_Prefix_Key` | When included: the value of the key in the record will be evaluated as key reference and overrides `Logstash_Prefix` for index generation. If the key/value isn't found in the record then the `Logstash_Prefix` option will act as a fallback. The parameter is expected to be a [record accessor](../../administration/configuring-fluent-bit/classic-mode/record-accessor.md). | |
-| `Logstash_Prefix_Separator` | Set a separator between `Logstash_Prefix` and date.| - |
+| `Logstash_Prefix_Key` | When included: the value of the key in the record will be evaluated as key reference and overrides `Logstash_Prefix` for index generation. If the key/value isn't found in the record then the `Logstash_Prefix` option will act as a fallback. The parameter is expected to be a [record accessor](../../administration/configuring-fluent-bit/classic-mode/record-accessor.md). | _none_ |
+| `Logstash_Prefix_Separator` | Set a separator between `Logstash_Prefix` and date. | `-` |
 | `Logstash_DateFormat` | Time format based on [strftime](http://man7.org/linux/man-pages/man3/strftime.3.html) to generate the second part of the Index name. | `%Y.%m.%d` |
 | `Time_Key` | When `Logstash_Format` is enabled, each record will get a new timestamp field. The `Time_Key` property defines the name of that field. | `@timestamp` |
 | `Time_Key_Format` | When `Logstash_Format` is enabled, this property defines the format of the timestamp. | `%Y-%m-%dT%H:%M:%S` |
 | `Time_Key_Nanos` | When `Logstash_Format` is enabled, enabling this property sends nanosecond precision timestamps. | `Off` |
 | `Include_Tag_Key` | When enabled, it appends the Tag name to the record. | `Off` |
 | `Tag_Key` | When `Include_Tag_Key` is enabled, this property defines the key name for the tag. | `_flb-key` |
 | `Generate_ID` | When enabled, generate `_id` for outgoing records. This prevents duplicate records when retrying ES. | `Off` |
-| `Id_Key` | If set, `_id` will be the value of the key from incoming record and `Generate_ID` option is ignored. | |
+| `Id_Key` | If set, `_id` will be the value of the key from incoming record and `Generate_ID` option is ignored. | _none_ |
 | `Write_Operation` | `Write_operation` can be any of: `create`, `index`, `update`, `upsert`. | `create` |
 | `Replace_Dots` | When enabled, replace field name dots with underscore. Required by Elasticsearch 2.0-2.3. | `Off` |
 | `Trace_Output` | Print all Elasticsearch API request payloads to `stdout` for diagnostics. | `Off` |
@@ -86,7 +86,7 @@ command line or through the configuration file:
 
 The **es** plugin can read the parameters from the command line in two ways:
 
-- Through the `-p` argument (property)
+- Through the `-p` argument (property).
 - Setting them directly through the service URI. 
 
 The URI format is the following:
 
@@ -102,7 +102,7 @@ fluent-bit -i cpu -t cpu -o es://192.168.2.3:9200/my_index/my_type \
 -o stdout -m '*'
 ```
 
-which is similar to do:
+This is similar to the following command:
 
 ```shell copy
 fluent-bit -i cpu -t cpu -o es -p Host=192.168.2.3 -p Port=9200 \

From eeebac3074c14e69641adf632c8feeb5e6e8c104 Mon Sep 17 00:00:00 2001
From: Lynette Miles
Date: Tue, 22 Oct 2024 13:44:23 -0700
Subject: [PATCH 4/4] fluent: docs: fixing some errors that got missed

Signed-off-by: Lynette Miles
---
 pipeline/outputs/elasticsearch.md | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/pipeline/outputs/elasticsearch.md b/pipeline/outputs/elasticsearch.md
index 6747869e4..1bcb59529 100644
--- a/pipeline/outputs/elasticsearch.md
+++ b/pipeline/outputs/elasticsearch.md
@@ -15,7 +15,7 @@ operational Elasticsearch service running in your environment. 
| `Workers` | The number of [workers](../../administration/multithreading.md#outputs) to perform flush operations for this output. | `2` | If you have used a common relational database, the parameters `index` and `type` can -be compared to the `database` and `table` concepts. Also see [the -FAQ](elasticsearch.md#faq) +be compared to the `database` and `table` concepts. ### TLS / SSL @@ -111,7 +110,7 @@ fluent-bit -i cpu -t cpu -o es -p Host=192.168.2.3 -p Port=9200 \ ### Configuration File -In your main configuration file append the following `Input` and `Output` sections. You can visualize this configuration [here](https://link.calyptia.com/qhq) +In your main configuration file append the following `Input` and `Output` sections. ```python [INPUT]