LTSV Decoder
Labeled Tab-separated Values (LTSV) is a simple and clever format for encoding a set of key/value pairs.
A key/value pair is represented as key:value.
The split happens at the first colon (:), so it is perfectly valid for a value to contain colons.
Key/value pairs are delimited by TAB characters. Therefore, keys can contain any character except : and TAB, whereas values can contain any character except TAB and the record delimiter (usually \n).
Keys and values must be valid UTF-8 strings.
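The splitting rules above can be sketched in a few lines of Python. This is only an illustration of the format, not Flowgger's own parser; the function name parse_ltsv_record is hypothetical, and rejecting pairs without a colon is an assumption made for this sketch.

```python
def parse_ltsv_record(line: str) -> dict[str, str]:
    """Split one LTSV record into a dict of string keys and string values."""
    fields = {}
    # Pairs are separated by TAB; the record delimiter is dropped first.
    for pair in line.rstrip("\n").split("\t"):
        # partition() splits at the first colon only, so values may contain colons.
        key, sep, value = pair.partition(":")
        if not sep:
            # Assumption of this sketch: a pair without a colon is an error.
            raise ValueError(f"missing ':' in pair {pair!r}")
        fields[key] = value
    return fields


record = parse_ltsv_record(
    "time:2016-07-31T20:21:48Z\thost:testhostname\turl:http://example.com:8080/\n"
)
print(record["time"])  # the colons inside the timestamp are preserved
print(record["url"])
```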
Generating LTSV records is extremely fast and easy. TAB characters inside values can usually just be replaced with spaces.
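As a rough sketch of how a producer might emit such records, the following Python function joins key/value pairs with TABs and replaces TAB characters in values with spaces. The name format_ltsv_record is hypothetical; replacing newlines as well, and rejecting newlines in keys, are assumptions made here so that the record delimiter never appears inside a record.

```python
def format_ltsv_record(fields: dict[str, str]) -> str:
    """Serialize key/value pairs as a single LTSV record (without the trailing newline)."""
    pairs = []
    for key, value in fields.items():
        # Keys cannot contain ':' or TAB; newlines are also rejected in this sketch.
        if any(ch in key for ch in (":", "\t", "\n")):
            raise ValueError(f"invalid character in key {key!r}")
        # Replace TABs (and, as an assumption, newlines) in values with spaces.
        clean = str(value).replace("\t", " ").replace("\n", " ")
        pairs.append(f"{key}:{clean}")
    return "\t".join(pairs)


line = format_ltsv_record(
    {"time": "1469996508", "host": "testhostname", "message": "disk\tfull"}
)
print(repr(line))
```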
Flowgger expects LTSV records to match some very basic constraints:
- A message MUST include a timestamp, under a key named time. The timestamp can be expressed in RFC 3339 format, as a Unix timestamp, or in English format.
- A message MUST include a source host name, under the key host.
- A message MAY include a description, under the key message.
- A message MAY include a severity level, as a number (between 0 and 7, matching syslog severity levels), under the key level.
- A message MAY include any number of additional key/value pairs.
Here is an example of a valid LTSV record (\t has to be replaced with actual TAB characters):
time:1469996508\thost:testhostname\tname1:value1\tname 2: value 2\tn3:v3
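A producer can check these constraints before sending records. The sketch below is a simplified pre-flight check in Python, not Flowgger's actual validation: the function check_constraints is hypothetical, it does not validate the timestamp format, and it assumes the level is written as a single digit.

```python
LEVELS = set("01234567")  # syslog severity levels, as single-digit strings


def check_constraints(fields: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the basic constraints hold."""
    problems = [
        f"missing required key '{key}'"
        for key in ("time", "host")
        if key not in fields
    ]
    level = fields.get("level")
    if level is not None and level not in LEVELS:
        problems.append(f"level must be a number between 0 and 7, got {level!r}")
    return problems


# The example record above, with \t standing for real TAB characters.
line = "time:1469996508\thost:testhostname\tname1:value1\tname 2: value 2\tn3:v3"
fields = dict(pair.split(":", 1) for pair in line.split("\t"))
print(check_constraints(fields))  # prints []
```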
In addition to being easy to generate, LTSV is also the fastest option in Flowgger.
The LTSV decoder can be enabled in the [input] section of Flowgger's configuration file:
[input]
format = "ltsv"
By design, and unlike JSON-based formats, values in LTSV records are not typed, and are assumed to be strings by default.
However, it may be desirable to enforce type constraints, and to retain the types when converting LTSV to typed formats such as JSON.
To do so, a schema can be defined for LTSV inputs, in an [input.ltsv_schema] section of the Flowgger configuration file:
[input.ltsv_schema]
counter = "u64"
amount = "f64"
Supported types are:
- string
- bool (boolean value)
- f64 (floating-point number)
- i64 (signed integer)
- u64 (unsigned integer)
Note that some of these values may not have an exact representation in the target format. For example, JavaScript, and hence JSON (and hence GELF), can only represent integers up to 2^53-1 without losing precision.
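To illustrate what a schema changes, here is a minimal Python sketch that converts string values according to a schema before emitting JSON. It is not Flowgger's implementation: apply_schema and the converter table are hypothetical, and the accepted boolean spellings ("true" and "false") are an assumption. The last lines show the precision limit mentioned above, using a 64-bit float as JavaScript does.

```python
import json


def parse_bool(text: str) -> bool:
    # Assumption of this sketch: only "true" and "false" are accepted.
    if text in ("true", "false"):
        return text == "true"
    raise ValueError(f"not a boolean: {text!r}")


def parse_int(text: str, lo: int, hi: int) -> int:
    number = int(text)
    if not lo <= number <= hi:
        raise ValueError(f"{number} is out of range")
    return number


CONVERTERS = {
    "string": str,
    "bool": parse_bool,
    "f64": float,
    "i64": lambda text: parse_int(text, -(2**63), 2**63 - 1),
    "u64": lambda text: parse_int(text, 0, 2**64 - 1),
}


def apply_schema(fields: dict[str, str], schema: dict[str, str]) -> dict:
    """Convert each value to its schema type; keys not in the schema stay strings."""
    return {
        key: CONVERTERS[schema.get(key, "string")](value)
        for key, value in fields.items()
    }


schema = {"counter": "u64", "amount": "f64"}
typed = apply_schema(
    {"counter": "42", "amount": "3.14", "host": "testhostname"}, schema
)
print(json.dumps(typed))

# Integers above 2^53 collapse once stored as 64-bit floats.
print(float(2**53) == float(2**53 + 1))  # prints True
```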
Suffixes can be automatically added to keys with non-string values, as defined by the schema.
For example, Flowgger can ensure that keys with i64 and u64 values are always suffixed with _long, that keys with f64 values are always suffixed with _double, and that keys with boolean values have a _bool suffix.
This can be enabled via the configuration file:
[input.ltsv_schema]
quantity = "u64"
amount = "f64"
done = "bool"
[input.ltsv_suffixes]
i64 = "_long"
u64 = "_long"
f64 = "_double"
bool = "_bool"
With the previous configuration, an incoming message with the following key/value pairs:
{
"quantity": 42,
"amount": 3.14,
"done": false
}
will be automatically transformed to:
{
"quantity_long": 42,
"amount_double": 3.14,
"done_bool": false
}
This can be especially useful with Elasticsearch, which expects each field in a given index to have a fixed type.
Property names will be transparently rewritten with the correct suffix for their value type, unless they are already properly suffixed.
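The renaming rule can be sketched as follows in Python. This only mirrors the behavior described above and is not Flowgger's code; the function add_suffixes is hypothetical, and the two dictionaries stand in for the [input.ltsv_schema] and [input.ltsv_suffixes] sections.

```python
def add_suffixes(typed: dict, schema: dict[str, str], suffixes: dict[str, str]) -> dict:
    """Append the suffix for each key's schema type, unless it is already there."""
    renamed = {}
    for key, value in typed.items():
        # Keys missing from the schema are strings, which have no suffix here.
        suffix = suffixes.get(schema.get(key, "string"))
        if suffix and not key.endswith(suffix):
            key = key + suffix
        renamed[key] = value
    return renamed


schema = {"quantity": "u64", "amount": "f64", "done": "bool", "total_long": "i64"}
suffixes = {"i64": "_long", "u64": "_long", "f64": "_double", "bool": "_bool"}

message = {"quantity": 42, "amount": 3.14, "done": False, "total_long": 7, "host": "h"}
print(add_suffixes(message, schema, suffixes))
```

In this sketch, total_long keeps its name because it already carries the suffix for its type, and host is left alone because it is a plain string.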