Extends the compression options of fluent-plugin-s3 with the columnar formats and compressions provided by red-arrow.
- Apache Arrow GLib and Apache Parquet GLib
  - See the Apache Arrow installation documentation for details.
- red-arrow
- red-parquet
$ gem install fluent-plugin-s3-arrow
Add the following line to your Gemfile:
gem "fluent-plugin-s3-arrow"
And then execute:
$ bundle
Example of fluent-plugin-s3-arrow configuration.
<match pattern>
  @type s3
  # fluent-plugin-s3 configurations ...
  <format>
    @type json # This plugin currently supports only json formatter.
  </format>
  store_as arrow
  <arrow>
    format parquet
    compression gzip
    schema_from static
    <static>
      schema [
        {"name": "test_string", "type": "string"},
        {"name": "test_uint64", "type": "uint64"}
      ]
    </static>
  </arrow>
</match>
This plugin uses red-arrow to support multiple columnar formats and compressions. The valid combinations of format and compression are listed below.
format | compression |
---|---|
arrow | gzip, zstd |
feather | zstd |
parquet | gzip, snappy, zstd |
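
The table above can be checked before a configuration is deployed. The following is a minimal sketch in plain Ruby that copies the table into a hash. The constant `VALID_COMPRESSIONS` and the method `valid_arrow_setting?` are hypothetical names chosen for illustration and are not part of the plugin's API.

```ruby
# Hypothetical helper: mirrors the format/compression table above.
# It is not part of fluent-plugin-s3-arrow.
VALID_COMPRESSIONS = {
  "arrow"   => %w[gzip zstd],
  "feather" => %w[zstd],
  "parquet" => %w[gzip snappy zstd],
}.freeze

# Returns true when the pair appears in the table, false otherwise.
# Unknown formats are treated as having no valid compression.
def valid_arrow_setting?(format, compression)
  VALID_COMPRESSIONS.fetch(format.to_s, []).include?(compression.to_s)
end
```

For example, `valid_arrow_setting?("parquet", "snappy")` is true, while `valid_arrow_setting?("feather", "gzip")` is false because the table lists only zstd for feather.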
The schema of the columnar format is supplied through schema_from, which is either static or glue.
Set the schema statically.
schema_from static
<static>
  schema [
    {"name": "test_string", "type": "string"},
    {"name": "test_uint64", "type": "uint64"}
  ]
</static>
schema: An array containing the name and type of each field.
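
When the field list already exists in Ruby, the schema array can be generated instead of written by hand. This is a minimal sketch using only the standard `json` library. `build_schema` is a hypothetical helper and not something the plugin provides, and the field names are the ones from the example above.

```ruby
require "json"

# Hypothetical helper (not part of the plugin): turns a name => type
# mapping into the array-of-objects layout used by the schema parameter.
def build_schema(fields)
  fields.map { |name, type| { "name" => name.to_s, "type" => type.to_s } }
end

schema = build_schema({ "test_string" => "string", "test_uint64" => "uint64" })

# Print JSON that can be pasted after the schema keyword.
puts JSON.pretty_generate(schema)
```

The printed JSON has the same name/type structure as the static example, so it can be placed inside the `<static>` section.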
Retrieve the schema from the AWS Glue Data Catalog.
schema_from glue
<glue>
  catalog test_catalog
  database test_db
  table test_table
</glue>
catalog: The name of the data catalog from which to retrieve the definition. The default value is the same as for the AWS API CatalogId parameter.
database: The name of the database from which to retrieve the definition. The default value is "default".
table: The name of the table from which to retrieve the definition.
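
To see which columns a Glue table definition would supply, the three settings above can be passed to the Glue `GetTable` API. The following is a sketch under stated assumptions, not the plugin's implementation. `glue_columns` is a hypothetical helper, and `client` is assumed to behave like `Aws::Glue::Client` from the aws-sdk-glue gem. The returned types are Glue type names, and converting them to Arrow types is outside the scope of this sketch.

```ruby
# Hypothetical helper, not the plugin's implementation. `client` is
# expected to respond to get_table like Aws::Glue::Client does.
def glue_columns(client, database: "default", table:, catalog: nil)
  params = { database_name: database, name: table }
  # Leaving catalog_id out lets the Glue API apply its own default.
  params[:catalog_id] = catalog if catalog
  response = client.get_table(params)
  # Each column in the storage descriptor carries a name and a Glue type.
  response.table.storage_descriptor.columns.map do |column|
    { "name" => column.name, "type" => column.type }
  end
end

# Usage with a real client (needs the aws-sdk-glue gem and AWS credentials):
#   require "aws-sdk-glue"
#   glue_columns(Aws::Glue::Client.new, database: "test_db", table: "test_table")
```

Passing the client in as an argument keeps the sketch free of network calls, so it can be exercised with a stub object in place of a real AWS client.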
Apache License, Version 2.0