
Commit d9de41a

esmerel and cnorris-cs authored
Apply suggestions from code review
Co-authored-by: Craig Norris <[email protected]>
Signed-off-by: esmerel <[email protected]>
1 parent 69943d9 commit d9de41a

File tree

1 file changed: +31 −30 lines changed


pipeline/outputs/s3.md

Lines changed: 31 additions & 30 deletions
@@ -15,7 +15,7 @@ The plugin can upload data to S3 using the
 or [`PutObject`](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html).
 Multipart is the default and is recommended. Fluent Bit will stream data in a series
 of _parts_. This limits the amount of data buffered on disk at any point in time.
-By default, every time 5 MiB of data have been received, a new part will be uploaded.
+By default, every time 5&nbsp;MiB of data have been received, a new part will be uploaded.
 The plugin can create files up to gigabytes in size from many small chunks or parts
 using the multipart API. All aspects of the upload process are configurable.
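As a point of reference for the multipart behavior this hunk describes, a minimal output configuration might look like the sketch below (classic-mode syntax; the bucket name is a placeholder, and the values shown are simply the documented defaults, not recommendations):

```python
[OUTPUT]
    name               s3
    match              *
    bucket             my-example-bucket    # placeholder bucket name
    region             us-east-1
    # Multipart is the default: a new part is uploaded roughly every 5 MiB.
    upload_chunk_size  5M
    # Complete the object once 100 MiB has been uploaded for it...
    total_file_size    100M
    # ...or after 10 minutes, whichever comes first.
    upload_timeout     10m
```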

@@ -31,7 +31,7 @@ for details about fetching AWS credentials.
 
 {% hint style="info" %}
 The [Prometheus success/retry/error metrics values](administration/monitoring.md)
-output by Fluent Bit's built-in http server are meaningless for S3 output. S3 has
+output by the built-in http server in Fluent Bit are meaningless for S3 output. S3 has
 its own buffering and retry mechanisms. The Fluent Bit AWS S3 maintainers apologize
 for this feature gap; you can [track our progress fixing it on GitHub](https://github.com/fluent/fluent-bit/issues/6141).
 {% endhint %}
@@ -43,7 +43,7 @@ for this feature gap; you can [track our progress fixing it on GitHub](https://g
 | `region` | The AWS region of your S3 bucket. | `us-east-1` |
 | `bucket` | S3 Bucket name | _none_ |
 | `json_date_key` | Specify the time key name in the output record. To disable the time key, set the value to `false`. | `date` |
-| `json_date_format` | Specify the format of the date. Supported formats are `double`, `epoch`, `iso8601` ( 2018-05-30T09:39:52.000681Z) and `_java_sql_timestamp_` (2018-05-30 09:39:52.000681) | `iso8601` |
+| `json_date_format` | Specify the format of the date. Accepted values: `double`, `epoch`, `iso8601` (2018-05-30T09:39:52.000681Z), `_java_sql_timestamp_` (2018-05-30 09:39:52.000681). | `iso8601` |
 | `total_file_size` | Specify file size in S3. Minimum size is `1M`. With `use_put_object On` the maximum size is `1G`. With multipart uploads, the maximum size is `50G`. | `100M` |
 | `upload_chunk_size` | The size of each part for multipart uploads. Max: 50M | 5,242,880 bytes |
 | `upload_timeout` | When this amount of time elapses, Fluent Bit uploads and creates a new file in S3. Set to `60m` to upload a new file every hour. | `10m`|
@@ -55,7 +55,7 @@ for this feature gap; you can [track our progress fixing it on GitHub](https://g
 | `use_put_object` | Use the S3 `PutObject` API instead of the multipart upload API. When enabled, the key extension is only available when `$UUID` is specified in `s3_key_format`. If `$UUID` isn't included, a random string appends format string and the key extension can't be customized. | `false` |
 | `role_arn` | ARN of an IAM role to assume (for example, for cross account access.) | _none_ |
 | `endpoint` | Custom endpoint for the S3 API. Endpoints can contain scheme and port. | _none_ |
-| `sts_endpoint` | Custom endpoint for the STS API. | _none_ |
+| `sts_endpoint` | Custom endpoint for the STS API. | _none_ |
 | `profile` | Option to specify an AWS Profile for credentials. | `default` |
 | `canned_acl` | [Predefined Canned ACL policy](https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) for S3 objects. | _none_ |
 | `compression` | Compression type for S3 objects. `gzip` is currently the only supported value by default. If Apache Arrow support was enabled at compile time, you can use `arrow`. For gzip compression, the Content-Encoding HTTP Header will be set to `gzip`. Gzip compression can be enabled when `use_put_object` is `on` or `off` (`PutObject` and Multipart). Arrow compression can only be enabled with `use_put_object On`. | _none_ |
@@ -65,15 +65,15 @@ for this feature gap; you can [track our progress fixing it on GitHub](https://g
 | `log_key` | By default, the whole log record will be sent to S3. When specifing a key name with this option, only the value of that key sends to S3. For example, when using Docker you can specify `log_key log` and only the log message sends to S3. | _none_ |
 | `preserve_data_ordering` | When an upload request fails, the last received chunk might swap with a later chunk, resulting in data shuffling. This feature prevents shuffling by using a queue logic for uploads. | `true` |
 | `storage_class` | Specify the [storage class](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html#AmazonS3-PutObject-request-header-StorageClass) for S3 objects. If this option isn't specified, objects store with the default `STANDARD` storage class. | _none_ |
-| `retry_limit` | Integer value to set the maximum number of retries allowed. Requires versions 1.9.10 and 2.0.1 or higher. For previous version, the number of retries is 5 and isn;t configurable. | `1` |
+| `retry_limit` | Integer value to set the maximum number of retries allowed. Requires versions 1.9.10 and 2.0.1 or later. For previous version, the number of retries is `5` and isn't configurable. | `1` |
 | `external_id` | Specify an external ID for the STS API. Can be used with the `role_arn` parameter if your role requires an external ID. | _none_ |
 | `workers` | The number of [workers](../../administration/multithreading.md#outputs) to perform flush operations for this output. | `1` |
 
 ## TLS / SSL
 
 To skip TLS verification, set `tls.verify` as `false`. For more details about the
-properties available and general configuration, refer to
-[TLS/SSL](../../administration/transport-security.md).
+properties available and general configuration, refer to
+[TLS/SSL](../../administration/transport-security.md).
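Pulling together the `retry_limit` and `workers` rows above with the TLS note, a hedged sketch of a configuration that adjusts all three might look like this (placeholder bucket; skipping TLS verification is usually only appropriate against a trusted custom `endpoint`):

```python
[OUTPUT]
    name         s3
    match        *
    bucket       my-example-bucket   # placeholder bucket name
    region       us-east-1
    # Raise the retry ceiling; per the table, requires 1.9.10 / 2.0.1 or later.
    retry_limit  3
    # Run two flush workers for this output.
    workers      2
    # Skip TLS certificate verification, as described in the TLS / SSL section.
    tls.verify   false
```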
 
 ## Permissions
 
@@ -98,8 +98,8 @@ The S3 output plugin is used to upload large files to an Amazon S3 bucket, while
 most other outputs which send many requests to upload data in batches of a few
 megabytes or less.
 
-When Fluent Bit recieves logs it stores them in chunks, either in memory or the
-filesystem depending on your settings. Chunks are usually around 2 MB in size.
+When Fluent Bit receives logs, it stores them in chunks, either in memory or the
+filesystem depending on your settings. Chunks are usually around 2&nbsp;MB in size.
 Fluent Bit sends chunks, in order, to each output that matches their tag. Most outputs
 then send the chunk immediately to their destination. A chunk is sent to the output's
 `flush` callback function, which must return one of `FLB_OK`, `FLB_RETRY`, or
@@ -108,7 +108,7 @@ then send the chunk immediately to their destination. A chunk is sent to the out
 and success metrics available in Prometheus format through its monitoring interface.
 
 The S3 output plugin conforms to the Fluent Bit output plugin specification.
-Since S3's use case is to upload large files (over 2MB), its behavior is different.
+Since S3's use case is to upload large files (over 2&nbsp;MB), its behavior is different.
 S3's `flush` callback function buffers the incoming chunk to the filesystem, and
 returns an `FLB_OK`. This means Prometheus metrics available from the Fluent
 Bit HTTP server are meaningless for S3. In addition, the `storage.total_limit_size`
@@ -132,7 +132,8 @@ uploaded in the original order it was collected by Fluent Bit.
 [opened an issue with a design](https://github.com/fluent/fluent-bit/issues/6141)
 to allow S3 to manage its own output metrics.
 - You must use `store_dir_limit_size` to limit the space on disk used by S3 buffer files.
-- The original ordering of data inputted to Fluent Bit may not be preserved unless you enable `preserve_data_ordering On`.
+- The original ordering of data inputted to Fluent Bit may not be preserved unless you enable
+`preserve_data_ordering On`.
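A sketch of the two points in the list above, with placeholder values (how much disk to allow for `store_dir_limit_size` depends entirely on your environment):

```python
[OUTPUT]
    name                    s3
    match                   *
    bucket                  my-example-bucket   # placeholder bucket name
    region                  us-east-1
    # Cap the disk space used by S3 buffer files.
    store_dir_limit_size    512M
    # Keep chunks in their original order even when an upload is retried.
    preserve_data_ordering  on
```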
 
 ## S3 Key Format and Tag Delimiters
 
@@ -158,7 +159,7 @@ associated with the logs in question is `my_app_name-logs.prod`.
 s3_key_format_tag_delimiters .-
 ```
 
-With the delimiters as `.` and `-,` the tag splits into parts as follows:
+With the delimiters as `.` and `-`, the tag splits into parts as follows:
 
 - `$TAG[0]` = `my_app_name`
 - `$TAG[1]` = `logs`
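As a hedged illustration of how these delimiters are used, the key format below is an assumption reconstructed from the resulting key quoted in the next hunk header; it is not copied from the file:

```python
[OUTPUT]
    name                          s3
    match                         my_app_name-logs.prod
    bucket                        my-example-bucket   # placeholder bucket name
    region                        us-east-1
    # Split the tag my_app_name-logs.prod on '.' and '-', then reuse the parts.
    s3_key_format                 /$TAG[2]/$TAG[0]/%Y/%m/%d/%H/%M/%S/$UUID.gz
    s3_key_format_tag_delimiters  .-
```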
@@ -169,9 +170,9 @@ The key in S3 will be `/prod/my_app_name/2020/01/01/00/00/00/bgdHN1NM.gz`.
 ### Allowing a file extension in the S3 Key Format with $UUID
 
 The Fluent Bit S3 output was designed to ensure that previous uploads will never be
-over-written by a subsequent upload. The `s3_key_format` supports time formatters,
-`$UUID`, and `$INDEX`. `$INDEX` is special because it is saved in the `store_dir`; if
-you restart Fluent Bit with the same disk, then it can continue incrementing the
+overwritten by a subsequent upload. The `s3_key_format` supports time formatters,
+`$UUID`, and `$INDEX`. `$INDEX` is special because it is saved in the `store_dir`. If
+you restart Fluent Bit with the same disk, it can continue incrementing the
 index from its last value in the previous run.
 
 For files uploaded with the `PutObject` API, the S3 output requires that a unique
@@ -182,20 +183,20 @@ specify minute granularity timestamps in the S3 key, with a small upload size, i
 possible to have two uploads that have timestamps set in the same minute. This
 requirement can be disabled with `static_file_path On`.
 
-There are three cases where the `PutObject` API is used:
+The `PutObject` API is used in these cases:
 
-1. When you explicitly set `use_put_object On`.
-1. On startup when the S3 output finds old buffer files in the `store_dir` from
-a previous run and attempts to send all of them at once.
-1. On shutdown. To prevent data loss the S3 output attempts to send all currently
-buffered data at once.
+- When you explicitly set `use_put_object On`.
+- On startup when the S3 output finds old buffer files in the `store_dir` from
+a previous run and attempts to send all of them at once.
+- On shutdown. To prevent data loss the S3 output attempts to send all currently
+buffered data at once.
 
 You should always specify `$UUID` somewhere in your S3 key format. Otherwise, if the
-`PutObject` API is used, S3 appends a random 8 character UUID to the end of your
+`PutObject` API is used, S3 appends a random eight-character UUID to the end of your
 S3 key. This means that a file extension set at the end of an S3 key will have the
 random UUID appended to it. Disabled this with `static_file_path On`.
 
-For example, we attempt to set a `.gz` extension without specifying `$UUID`.
+For example, we attempt to set a `.gz` extension without specifying `$UUID`:
 
 ```python
 [OUTPUT]
@@ -219,7 +220,7 @@ key in the S3 bucket might be:
 The S3 output appended a random string to the file extension, since this upload
 on shutdown used the `PutObject` API.
 
-There are two ways of disabling this behavior.
+There are two ways of disabling this behavior:
 
 - Use `static_file_path`:
 
@@ -258,9 +259,9 @@ shuts down. If it can not send some data, on restart it will look in the `store_
 for existing data and try to send it.
 
 Multipart uploads are ideal for most use cases because they allow the plugin to
-upload data in small chunks over time. For example, 1 GB file can be created from 200
-5MB chunks. While the file size in S3 will be 1 GB, only 5 MB will be buffered on
-disk at any one point in time.
+upload data in small chunks over time. For example, 1&nbsp;GB file can be created
+from 200 5&nbsp;MB chunks. While the file size in S3 will be 1&nbsp;GB, only
+5&nbsp;MB will be buffered on disk at any one point in time.
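The 1 GB / 5 MB arithmetic above corresponds to settings along these lines (a sketch with a placeholder bucket; roughly 200 parts of 5 MB each complete one 1 GB object):

```python
[OUTPUT]
    name               s3
    match              *
    bucket             my-example-bucket   # placeholder bucket name
    region             us-east-1
    # Each multipart part is 5 MB, so only about 5 MB sits on disk at a time...
    upload_chunk_size  5M
    # ...while the finished object in S3 grows to 1 GB (about 200 parts).
    total_file_size    1G
```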
 
 One drawback to multipart uploads is that the file and data aren't visible in S3
 until the upload is completed with a
@@ -338,7 +339,7 @@ fallback to use the [`PutObject` API](https://docs.aws.amazon.com/AmazonS3/lates
 
 When you enable compression, S3 applies the compression algorithm at send time. The
 size settings trigger uploads based on the size of buffered data, not the
-final compressed size. It is possible that after compression, buffered data no longer
+final compressed size. It's possible that after compression, buffered data no longer
 meets the required minimum S3
 [UploadPart](https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPart.html)
 size. If this occurs, you will see a log message like:
@@ -350,7 +351,7 @@ compression, chunk is only 1063320 bytes, the chunk was too small, using PutObje
 
 If you encounter this frequently, use the numbers in the messages to guess your
 compression factor. In this example, the buffered data was reduced from
-5,630,650 bytes to 1,063,320 bytes. The compressed size is 1/5 the actual data size.
+5,630,650 bytes to 1,063,320 bytes. The compressed size is one-fifth the actual data size.
 Configuring `upload_chunk_size 30M` should ensure each part is large enough after
 compression to be over the minimum required part size of 5,242,880 bytes.
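Putting that guidance into a sketch (placeholder bucket; the 30M value assumes a roughly 5:1 compression ratio like the one in the log message above, and the right figure depends on your own data):

```python
[OUTPUT]
    name               s3
    match              *
    bucket             my-example-bucket   # placeholder bucket name
    region             us-east-1
    compression        gzip
    # Buffer about 30 MB of raw data per part so that, even after roughly 5:1
    # gzip compression, each part stays above the 5,242,880-byte S3 minimum.
    upload_chunk_size  30M
```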

@@ -513,7 +514,7 @@ cmake -DFLB_ARROW=On ..
 cmake --build .
 ```
 
-Once compiled, Fluent Bit can upload incoming data to S3 in Apache Arrow format.
+After being compiled, Fluent Bit can upload incoming data to S3 in Apache Arrow format.
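A minimal sketch of an Arrow-enabled output, assuming a build with `-DFLB_ARROW=On` as shown above (placeholder bucket; per the `compression` row in the parameter table, `arrow` can only be used together with `use_put_object On`). The fuller example referenced below is not shown in this hunk:

```python
[OUTPUT]
    name            s3
    match           *
    bucket          my-example-bucket   # placeholder bucket name
    region          us-east-1
    # Arrow output is only available through the PutObject API.
    use_put_object  on
    compression     arrow
```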
 
 For example:
 