FUSAKLA/fluent-plugin-multiline-parser

fluent-plugin-multiline-parser

Component

ParserOutput

This is a Fluentd plugin to parse strings in log messages and re-emit them. This parser also supports multiline format.

ParserOutput

ParserOutput accepts the same 'format' and 'time_format' options as 'in_tail':

<match raw.apache.common.*>
  @type parser
  remove_prefix raw
  format /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)$/
  time_format %d/%b/%Y:%H:%M:%S %z
  key_name message
</match>
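
To see what the 'format' and 'time_format' pair above extracts, the following is a minimal Ruby sketch rather than the plugin's actual code. The helper name `parse_apache_common` and the sample log line are made up for illustration, and removing the 'time' capture from the record after parsing is an assumption about the parser's default behavior.

```ruby
require 'time'

# Same regular expression and time format as in the configuration above.
APACHE_COMMON = /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)$/
TIME_FORMAT = '%d/%b/%Y:%H:%M:%S %z'

# Hypothetical helper: returns [time, record], or nil when the line does not match.
def parse_apache_common(line)
  m = APACHE_COMMON.match(line)
  return nil unless m

  record = m.named_captures
  # Assumption: the 'time' capture becomes the event time and leaves the record.
  time = Time.strptime(record.delete('time'), TIME_FORMAT)
  [time, record]
end

# Made-up sample line in Apache common log format.
line = '192.168.0.1 - alice [28/Feb/2013:12:00:00 +0900] "GET /index.html HTTP/1.1" 200 777'
time, record = parse_apache_common(line)
puts time
puts record
```

Every value in the record is a string here, because regular expression captures carry no type information.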

You can also use the predefined formats 'apache' and 'syslog':

<match raw.apache.combined.*>
  @type parser
  remove_prefix raw
  format apache
  key_name message
</match>

To parse multiline logs:

<filter raw.java.*>
  @type parser
  format multiline
  format_firstline /\d{4}-\d{1,2}-\d{1,2}/
  format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)/
  key_name message
</filter>
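
The two regular expressions above play different roles: 'format_firstline' marks where a record starts, and 'format1' is matched against the whole record. The Ruby sketch below illustrates that idea only. It assumes the record text is matched with `Regexp::MULTILINE` (the `m` flag, so `.` also matches newlines), as Fluentd's built-in multiline parser does; the helper `group_records` and the sample lines are hypothetical, and how this plugin itself buffers lines is not shown here.

```ruby
# Same patterns as in the configuration above; the /m flag lets '.' span newlines.
FIRSTLINE = /\d{4}-\d{1,2}-\d{1,2}/
FORMAT1 = /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)/m

# Hypothetical helper: start a new record at every line matching FIRSTLINE
# and glue the continuation lines onto the record they belong to.
def group_records(lines)
  lines.slice_before { |line| FIRSTLINE.match?(line) }.map(&:join)
end

# Made-up Java log with a stack trace.
lines = [
  "2016-01-02 03:04:05 [main] ERROR Request failed\n",
  "java.lang.RuntimeException: boom\n",
  "\tat com.example.Foo.bar(Foo.java:42)\n",
  "2016-01-02 03:04:06 [worker-1] INFO Recovered\n"
]

records = group_records(lines).map { |text| FORMAT1.match(text)&.named_captures }
records.each { |r| puts r.inspect }
```

Note that the 'message' capture keeps the leading space and the stack trace lines, since nothing in 'format1' strips them.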

fluent-plugin-multiline-parser uses Fluentd's parser plugins (and any custom parser plugins of your own). See the documentation page for more details: http://docs.fluentd.org/articles/parser-plugin-overview

To keep the original attribute-data pairs in the re-emitted message, specify 'reserve_data':

<match raw.apache.*>
  @type parser
  tag apache
  format apache
  key_name message
  reserve_data yes
</match>

To suppress the 'pattern not match' log, add 'suppress_parse_error_log true' to the configuration. The default value is false.

<match in.hogelog>
  @type parser
  tag hogelog
  format /^col1=(?<col1>.+) col2=(?<col2>.+)$/
  key_name message
  suppress_parse_error_log true
</match>
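
For context, a 'pattern not match' situation arises when the value of 'key_name' does not fit the regular expression. The Ruby sketch below shows only that matching step with the pattern from the configuration above; `parse_or_nil` is a hypothetical helper and the sample messages are made up.

```ruby
# Same pattern as in the configuration above.
PATTERN = /^col1=(?<col1>.+) col2=(?<col2>.+)$/

# Hypothetical helper: a hash of captures on success, nil on a mismatch.
def parse_or_nil(message)
  m = PATTERN.match(message)
  m && m.named_captures
end

matched = parse_or_nil('col1=foo col2=bar')
unmatched = parse_or_nil('unexpected text')

puts matched.inspect
puts unmatched.inspect
```

The second call is the case that 'suppress_parse_error_log' keeps out of the log.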

To store parsed values under keys with a given prefix, use the inject_key_prefix option:

<match raw.sales.*>
  @type parser
  tag sales
  format json
  key_name sales
  reserve_data      yes
  inject_key_prefix sales.
</match>
# input string of 'sales': {"user":1,"num":2}
# output data: {"sales":"{\"user\":1,\"num\":2}","sales.user":1, "sales.num":2}
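
The input and output shown above can be reproduced with a few lines of Ruby. This is a sketch of the described behavior, not the plugin's implementation; the helper `reparse` and its keyword arguments are hypothetical names chosen to mirror the option names.

```ruby
require 'json'

# Hypothetical helper mirroring 'key_name', 'reserve_data' and 'inject_key_prefix'.
def reparse(record, key_name:, reserve_data: false, inject_key_prefix: nil)
  parsed = JSON.parse(record[key_name])
  # Prepend the prefix to every parsed key when one is configured.
  parsed = parsed.transform_keys { |k| inject_key_prefix + k } if inject_key_prefix
  # With reserve_data, the original fields stay next to the parsed ones.
  reserve_data ? record.merge(parsed) : parsed
end

record = { 'sales' => '{"user":1,"num":2}' }
out = reparse(record, key_name: 'sales', reserve_data: true, inject_key_prefix: 'sales.')
puts out.to_json
```

Without the prefix, a parsed key that has the same name as an original field would overwrite it in the merged record, which is one reason to combine the two options.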

To store parsed values as a hash in a single field, use the hash_value_field option:

<match raw.sales.*>
  @type parser
  tag sales
  format json
  key_name sales
  hash_value_field parsed
</match>
# input string of 'sales': {"user":1,"num":2}
# output data: {"parsed":{"user":1, "num":2}}

Other options (e.g. reserve_data, inject_key_prefix) can be combined with hash_value_field.

# output data: {"sales":"{\"user\":1,\"num\":2}", "parsed":{"sales.user":1, "sales.num":2}}
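
Both output lines above, with and without the extra options, can be reproduced by the following Ruby sketch. It illustrates the described result only; `reparse_into_field` and its keyword arguments are hypothetical and merely mirror the option names.

```ruby
require 'json'

# Hypothetical helper: nests the parsed values under 'hash_value_field'.
def reparse_into_field(record, key_name:, hash_value_field:,
                       reserve_data: false, inject_key_prefix: nil)
  parsed = JSON.parse(record[key_name])
  parsed = parsed.transform_keys { |k| inject_key_prefix + k } if inject_key_prefix
  # Start from the original record only when reserve_data is set.
  base = reserve_data ? record.dup : {}
  base.merge(hash_value_field => parsed)
end

record = { 'sales' => '{"user":1,"num":2}' }

nested = reparse_into_field(record, key_name: 'sales', hash_value_field: 'parsed')
combined = reparse_into_field(record, key_name: 'sales', hash_value_field: 'parsed',
                              reserve_data: true, inject_key_prefix: 'sales.')
puts nested.to_json
puts combined.to_json
```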

To skip time parsing (and keep a field such as 'time' in the record), specify time_parse no:

<match raw.sales.*>
  @type parser
  tag sales
  format json
  key_name sales
  hash_value_field parsed
  time_parse no
</match>
# input string of 'sales': {"user":1,"num":2,"time":"2013-10-31 12:48:33"}
# output data: {"parsed":{"user":1, "num":2,"time":"2013-10-31 12:48:33"}}
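
The difference 'time_parse' makes can be sketched in Ruby as follows. This is a simplified illustration under stated assumptions: the helper `split_time` is hypothetical, the use of `Time.parse` stands in for whatever conversion the configured parser and 'time_format' perform, and the removal of the 'time' field when parsing is enabled is assumed rather than taken from the plugin source.

```ruby
require 'json'
require 'time'

# Hypothetical helper: returns [event_time, record].
def split_time(parsed, time_parse: true)
  # With time parsing disabled, the record is passed through untouched.
  return [nil, parsed] unless time_parse

  rest = parsed.dup
  value = rest.delete('time')
  # Assumption: the field is consumed and converted into the event time.
  [value && Time.parse(value), rest]
end

parsed = JSON.parse('{"user":1,"num":2,"time":"2013-10-31 12:48:33"}')

_, kept = split_time(parsed, time_parse: false)
event_time, stripped = split_time(parsed, time_parse: true)
puts kept.to_json
puts stripped.to_json
```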

ParserFilter

This is the filter version of ParserOutput.

Note that the filter version of the parser plugin cannot modify tags.
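
For comparison, the ParserOutput examples earlier in this document set 'remove_prefix raw' or 'tag apache', and the filter has no counterpart for either. The Ruby sketch below shows the kind of tag rewriting involved. It assumes that 'remove_prefix' strips the prefix together with the following dot and that 'tag' replaces the tag entirely; `rewrite_tag` is a hypothetical helper, not part of the plugin.

```ruby
# Hypothetical helper illustrating the tag options of ParserOutput.
def rewrite_tag(tag, remove_prefix: nil, fixed_tag: nil)
  # Assumption: a fixed 'tag' option replaces the incoming tag as a whole.
  return fixed_tag if fixed_tag
  return tag unless remove_prefix

  # Assumption: the prefix is removed along with the dot that follows it.
  tag.delete_prefix(remove_prefix + '.')
end

puts rewrite_tag('raw.apache.common.host1', remove_prefix: 'raw')
puts rewrite_tag('raw.apache.access', fixed_tag: 'apache')
puts rewrite_tag('other.apache.access', remove_prefix: 'raw')
```

With the filter, events keep the tag they arrived with, so any later match sections must use the original tag pattern.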

ParserFilter accepts the same 'format' and 'time_format' options as 'in_tail':

<filter raw.apache.common.*>
  @type parser
  format /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)$/
  time_format %d/%b/%Y:%H:%M:%S %z
  key_name message
</filter>

You can also use the predefined formats 'apache' and 'syslog':

<filter raw.apache.combined.*>
  @type parser
  format apache
  key_name message
</filter>

fluent-plugin-multiline-parser uses Fluentd's parser plugins (and any custom parser plugins of your own). See the documentation page for more details: http://docs.fluentd.org/articles/parser-plugin-overview

To parse multiline logs:

<filter raw.java.*>
  @type parser
  format multiline
  format_firstline /\d{4}-\d{1,2}-\d{1,2}/
  format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)/
  key_name message
</filter>

To keep the original attribute-data pairs in the re-emitted message, specify 'reserve_data':

<filter raw.apache.*>
  @type parser
  format apache
  key_name message
  reserve_data yes
</filter>

To suppress the 'pattern not match' log, add 'suppress_parse_error_log true' to the configuration. The default value is false.

<filter in.hogelog>
  @type parser
  format /^col1=(?<col1>.+) col2=(?<col2>.+)$/
  key_name message
  suppress_parse_error_log true
</filter>

To store parsed values under keys with a given prefix, use the inject_key_prefix option:

<filter raw.sales.*>
  @type parser
  format json
  key_name sales
  reserve_data      yes
  inject_key_prefix sales.
</filter>
# input string of 'sales': {"user":1,"num":2}
# output data: {"sales":"{\"user\":1,\"num\":2}","sales.user":1, "sales.num":2}

To store parsed values as a hash in a single field, use the hash_value_field option:

<filter raw.sales.*>
  @type parser
  format json
  key_name sales
  hash_value_field parsed
</filter>
# input string of 'sales': {"user":1,"num":2}
# output data: {"parsed":{"user":1, "num":2}}

Other options (e.g. reserve_data, inject_key_prefix) can be combined with hash_value_field.

# output data: {"sales":"{\"user\":1,\"num\":2}", "parsed":{"sales.user":1, "sales.num":2}}

To skip time parsing (and keep a field such as 'time' in the record), specify time_parse no:

<filter raw.sales.*>
  @type parser
  format json
  key_name sales
  hash_value_field parsed
  time_parse no
</filter>
# input string of 'sales': {"user":1,"num":2,"time":"2013-10-31 12:48:33"}
# output data: {"parsed":{"user":1, "num":2,"time":"2013-10-31 12:48:33"}}

TODO

  • consider what to do next
  • patches welcome!

Copyright

  • Copyright
    • Copyright (c) 2016 Jerry Zhou
  • License
    • Apache License, Version 2.0
