94 changes: 94 additions & 0 deletions pipeline/inputs/tail.md
@@ -38,6 +38,7 @@
| `file_cache_advise` | Set the `posix_fadvise` in `POSIX_FADV_DONTNEED` mode. This reduces the usage of the kernel file cache. This option is ignored if not running on Linux. | `on` |
| `threaded` | Indicates whether to run this input in its own [thread](../../administration/multithreading.md#inputs). | `false` |
| `Unicode.Encoding` | Set the Unicode character encoding of the file data. This parameter requests two-byte aligned chunk and buffer sizes. If data is not aligned for two bytes, Fluent Bit will use two-byte alignment automatically to avoid character breakages on consuming boundaries. Supported values: `UTF-16LE`, `UTF-16BE`, and `auto`. | `none` |
| `Generic.Encoding` | Set the non-Unicode encoding of the file data. Supported values: `ShiftJIS`, `UHC`, `GBK`, `GB18030`, `Big5`, `Win866`, `Win874`, `Win1250`, `Win1251`, `Win1252`, `Win1253`, `Win1254`, `Win1255`, and `Win1256`. | `none` |

## Buffers and memory management

@@ -84,6 +85,13 @@
Additionally, the `auto` setting for `Unicode.Encoding` isn't supported in all cases, and can make mistakes when it tries to guess the correct encoding. For best results, use either the `UTF-16LE` or `UTF-16BE` setting if you know the encoding type of the target file.
{% endhint %}

{% hint style="info" %}
The `Unicode.Encoding` parameter is dependent on the `simdutf` library, which is itself dependent on C++ version 11 or later. In environments that use earlier versions of C++, the `Unicode.Encoding` parameter will fail.

Additionally, the `auto` setting for `Unicode.Encoding` isn't supported in all cases, and can make mistakes when it tries to guess the correct encoding. For best results, use either the `UTF-16LE` or `UTF-16BE` setting if you know the encoding type of the target file.
{% endhint %}


## Monitor a large number of files

To monitor a large number of files, you can increase the `inotify` settings in your Linux environment by modifying the following `sysctl` parameters:
@@ -464,3 +472,89 @@
- Final note: the `Path` patterns can't match the rotated files. Otherwise, the rotated file would be read again and lead to duplicate records.

{% endhint %}

## Character encoding conversion

This feature allows Fluent Bit to convert logs from various character encodings into the standard UTF-8 format.
This is crucial for processing logs from systems, especially Windows, that use legacy or non-UTF-8 encodings.
Proper conversion ensures that your log data is correctly parsed, indexed, and searchable.

### When to use this feature

Use this feature if your log files or messages aren't encoded in UTF-8 and you see garbled or incorrectly rendered characters.
This is common in environments that use:

- Modern Windows applications that log in UTF-16.

- Legacy Windows systems with applications that use traditional code pages (for example, `ShiftJIS`, `GBK`, or `Win1252`).

### Encoding parameters

To enable encoding conversion, set one of the following two parameters in an input plugin configuration.

1. `Unicode.Encoding`

   Use this parameter for high-performance conversion of UTF-16 encoded logs to UTF-8. This method uses single instruction, multiple data (SIMD) processor instructions to accelerate the conversion.

   - Use case: Ideal for logs from modern Windows environments that default to UTF-16.
   - Supported values:
     - `UTF-16LE` (little-endian)
     - `UTF-16BE` (big-endian)
     - `auto` (tries to guess the encoding, and isn't supported in all cases)

1. `Generic.Encoding`

   Use this parameter to convert from a wide variety of other character encodings, particularly legacy Windows code pages.

   - Use case: Essential for logs from older systems or applications configured for specific regions, which is common in East Asia and Eastern Europe.
   - Supported values: Any of the names or aliases in the following lists.

### East Asian encodings

- `ShiftJIS` (aliases: `SJIS`, `CP932`, `Windows-31J`)
- `GB18030`
- `GBK` (alias: `CP936`)
- `UHC`, Unified Hangul Code (aliases: `CP949`, `Windows-949`)
- `Big5` (alias: `CP950`)

### Windows (ANSI) encodings

- `Win1250`, Central European (alias: `CP1250`)
- `Win1251`, Cyrillic (alias: `CP1251`)
- `Win1252`, Western European / Latin (alias: `CP1252`)
- `Win1253`, Greek (alias: `CP1253`)
- `Win1254`, Turkish (alias: `CP1254`)
- `Win1255`, Hebrew (alias: `CP1255`)
- `Win1256`, Arabic (alias: `CP1256`)

### DOS (OEM) encodings

- `Win866`, Cyrillic - DOS (alias: `CP866`)
- `Win874`, Thai (alias: `CP874`)

### Configuration example

The following example uses `Generic.Encoding` with the Tail input plugin to read a log file encoded in ShiftJIS.

{% tabs %}
{% tab title="fluent-bit.yaml" %}

```yaml
pipeline:
  inputs:
    - name: tail
      path: 'C:\path\to\your\sjis.log'
      generic.encoding: ShiftJIS
```

{% endtab %}
{% tab title="fluent-bit.conf" %}

```text
[INPUT]
    Name              tail
    Path              C:\path\to\your\sjis.log
    Generic.Encoding  ShiftJIS
```

{% endtab %}
{% endtabs %}
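
A comparable sketch for UTF-16 input would set `Unicode.Encoding` instead. It assumes the YAML key can be written in lowercase as `unicode.encoding`, by analogy with `generic.encoding` in the preceding example, and the file path is hypothetical.

```yaml
pipeline:
  inputs:
    - name: tail
      # Hypothetical path to a log file encoded in UTF-16LE.
      path: 'C:\path\to\your\utf16le.log'
      # Assumed lowercase key form of Unicode.Encoding.
      # Documented values: UTF-16LE, UTF-16BE, and auto.
      unicode.encoding: UTF-16LE
```

Naming the byte order explicitly, rather than using `auto`, avoids relying on encoding detection, which can guess incorrectly.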