Fabrice Bacchella edited this page Jan 11, 2024 · 57 revisions

For its configuration, loghub uses a DSL whose parser is generated with antlr. Its syntax is a custom mix of logstash configuration files, Java and a small taste of Groovy. The exact grammar can be found at https://github.com/fbacchella/LogHub/blob/master/src/main/antlr4/loghub/Route.g4.

The configuration components are strongly typed, so strings must be wrapped in "...", a lone 1 is seen as an integer, 1.0 as a double, and true or false as boolean values.

Sample configuration

input {
    loghub.receivers.ZMQ {
        listen: "tcp://localhost:2120",
        decoder: loghub.decoders.SerializedObject
    }
} | $main
input {
    loghub.receivers.Udp {
        port: 2121
        decoder: loghub.decoders.Msgpack
    }
} | $apache

output $main | { loghub.senders.ElasticSearch }

pipeline[apache] { loghub.processors.Geoip { datfilepath:"/user/local/share/GeoIP/GeoIP.dat", locationfield:"location", threads:4 } } | $main
pipeline[main] {
    loghub.processors.Log4JExtract { 
        if: [objectClass]=="org.apache.log4j.spi.LoggingEvent"
        source: "message",
        success: { [objectClass]- | [log4] = true },
    }
    | [logger_name] == "jrds.starter.Timer" || [info] > 4 ? loghub.processors.Drop : ( loghub.processors.ParseJson | log("${logger_name%s}", WARN) )
}
plugins: ["/usr/share/loghub/plugins", "/usr/share/loghub/scripts"]

This configuration defines two receivers: one that listens over 0MQ for log4j events, and another that listens for msgpack-encoded events on a UDP port, such as those generated by mod_log_net.

The events received over UDP are sent to a pipeline called apache, which resolves the visitor's location and then transfers all events to the "main" pipeline.

The log4j events are sent directly to the main pipeline, which does some processing on them. Pay attention to the test: it will be evaluated as a Groovy script.

A property called "plugins" is defined. It declares custom folders that will be used to resolve scripts and will be added to the class path.

In the configuration file, all the components are defined directly by their class name.

Syntax

Comments

Java-like comments are used: // for a full line and /*...*/ for sections.

Types

Types allowed in a configuration file are a literal, an object, an array or a map.

Literal

A literal is a raw value (String, number, etc.) written using the Java syntax; it is resolved using a String constructor for that type. Literals map to the String, Integer, Double, Character or Boolean Java types, using the same notation as Java:

  • "String" is a String
  • 'c' is a Character
  • 1 is an Integer
  • 1.0 is a Double
  • true or false are Boolean.

Pattern

Some operators expect a pattern as an argument, which should be written in either the form /pattern/ or """pattern""". This pattern follows the regular expression syntax used in Java. If the pattern involves line separators, they must be appropriately escaped.
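For example, a pattern can be used with the ==~ operator described in the Expression section below. This is only a sketch; the field name [message] and the assigned value are hypothetical:

   [message] ==~ /ERROR \d+/ ? [severity] = "high"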

Object

An object is written with the syntax:

class.name { attribute: value, ...}

The class name is a Java class name that is resolved using an internal class loader. An attribute is a bean of this class, so the method class.name.setBean must exist with the right type. A bean value can be of any type: another object, a literal, an array or a map.

Array

An array is written as:

[ value, ... ]

Values can be of any allowed type.

To avoid confusion with event variables, a single-element array can be written as [value,]

Expression

Expressions are constructs that are evaluated and return a value. The following operators are handled:

  • any literal
  • stringLiteral ( expressionsList )?, used to evaluate string formats
  • an event variable
  • .~ or !, prefix unary operators that both negate a boolean value
  • +, - as prefix numerical unary operators
  • **, *, /, %, +, -, <<, >>, >>>, <=> as numerical binary operators
  • .&, .^, .| as numerical binary operators that act like the more common bitwise & (and), ^ (xor), | (or)
  • <, <=, >, >=, in, !in, instanceof, !instanceof, ==, !=, ===, !==, &&, || as logical binary operators
  • =~ compares a value to a pattern and returns an array with all the groups given in the pattern.
  • ==~ returns true if the first argument matches the second argument, given as a pattern.
  • ( ) to group an expression for priority.
  • [x] as a postfix unary operator that returns the x-th element of an array or list; if negative, it counts from the end.
  • trim, capitalize, uncapitalize, isBlank, normalize, uppercase, lowercase are String functions.
  • join joins any iterable using the first argument, given as a String, and returns a String.
  • split splits the second argument using the first one, given as a regex pattern.
  • gsub does a string substitution on the first argument, using the second argument as a pattern and replacing matches with the third argument, given as a String.
  • now returns the current time as an Instant.
  • isEmpty as a generic function.
  • set(expressionsList), returns the given expressions as a set (a LinkedHashSet).
  • list(expressionsList), returns the given expressions as a list (an ArrayList).

Usually the operators or functions are fail-safe. If they can’t be applied, they do nothing.

The syntax, evaluation and precedence of operators are deeply inspired by Groovy and should generally return the same values.

The function isEmpty can be used on a wide range of objects, and checks for null values or for emptiness of collections. Pay attention that " " is not empty since its size is 1, but it is a blank string, so the expression isEmpty(" ") == isBlank(" ") is false. It can’t be used to check for the existence of an attribute.

The special construct [...] == * returns true if the event variable is defined. For example, [a b] = false | [a b] == * ? [c] = true will set the attribute c to true because the variable is defined, and [a b]- | [a b] != * ? [c] = true will set c to true for the opposite reason: it is missing.

Event variable

Any expression, and many attributes, can use event variables. They are written using the syntax [path to key]. As event values can be nested, this syntax defines the path to a sub-key. For example, if the JSON serialization of an event is

{
    ....
    "a": {"b": 'c'}
}

To reach the value "c", the syntax is [a b]. The allowed elements are Java identifiers. If an element is not a valid identifier, for example because it includes a - or a space, it can be wrapped in double quotes: [a b] returns the same value as [a "b"].

The special value [@timestamp] returns the timestamp of the event.

[@context a b] returns the event reception context, like the remote IP. Receivers might define custom context values. A common one is [@context principal], which contains the Java Principal used to connect on authenticated receivers. Another is [@context remoteAddress address], which contains the SocketAddress of the remote end of the connection; [@context remoteAddress address hostName] should not be used, as it would do a synchronous name resolution that would block the event.

The special value [@lastException] should only be used inside the exception pipeline of a processor. If the processor failed with a ProcessorException, it will contain the error message.

Pipe elements

A pipeline contains many types of elements, chained together using the symbol |.

Commands

A pipeline can contain some specific commands, like the keyword drop, which drops the event.

Event manipulation

Simple changes can be made to an event.

  • A field can be dropped, using the syntax [field]-.
  • A field can be renamed, using the syntax [newfield] < [oldfield].
  • A value can be appended to a field, detecting its type: [a] =+ Expression will append an element to an array, a list or a set. If a does not exist, it is created as a list containing one or zero values, depending on the result of the evaluation of the expression; i.e. [a] =+ null creates an empty list if a was not present in the event.
  • A field can be assigned a value, using a Groovy expression, with the syntax [field] = expression. This expression can reference other fields using the event variable ([name]) syntax. For example [sum] = [path v1] + [path v2] puts the sum of the two fields, referenced by their full paths, in the field sum.

If the destination field is @timestamp, the value will be stored in the event timestamp, not in a field called @timestamp.
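Putting these manipulations together, a pipeline fragment could look like the following sketch, where all field names are hypothetical:

   [temporary]- | [user] < [username] | [tags] =+ "archived" | [sum] = [a] + [b]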

Tests

A pipeline can contain tests, written as

 expression ? then : else

expression is a groovy logical expression that must return a true or false value. Event fields are written enclosed in square brackets: [field].

then and else are two processors or sub-pipelines to which the event will be sent, according to the evaluation of the expression.
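For example, the following sketch drops events matching a hypothetical [status] field and sends everything else to the pipeline main:

   [status] >= 500 ? loghub.processors.Drop : $main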

Sub pipeline

Mainly for use in tests, a pipeline can contain another pipeline, written as

( element | element... )

The event will go through it and will be sent back after being processed.

A named pipeline

Another pipeline can be used by referring to its name. It is a special case of sub-pipeline, written as

$pipename

If the symbol used is + instead of |, a copy of the event will be sent to the other pipeline, for example:

pipeline[main] { ... + $second | ...}
pipeline[second] { ... }

The pipeline second will receive a copy of all the events seen by the pipeline main that reach the calling step.

If the symbol used is >, the event will not be processed any further in the current pipeline but will be sent to the second pipeline. It is generally used in test conditions, for example:

pipeline[main] { ... sometest ? ( Object {...} > $second) }
pipeline[second] { ... }

The pipeline second will receive all the events seen by the pipeline main that reach the calling step.

For both + and >, if the destination is the only step, just prefix the destination with the appropriate symbol:

sometest ? ( > $second) | othertest ? ( + $second )

Processors

Processors are objects derived from the loghub.Processor class. They take events as input and process them. They can drop them, transform them, or take any other kind of action.

Commands

In a pipeline, some commands control the flow of events

Drop

The single keyword 'drop' drops the event.

Fire

The keyword 'fire' can be used to fire a new event. It will be sent at the beginning of the given pipeline.

The syntax is

fire { [fieldname]: fieldvalue; ... } > $piperef

fieldvalue is a groovy expression. It can extract values from the current event using the event variable syntax. For example, writing:

pipeline[main] {
    ...
    fire { [a] = 1 ; [b] = [count] * 3 } > $alert
}

Will fire an event and send it to the pipeline alert. The new event will have two fields set: a with the value 1, and b with a value computed from the field count of the current event.

Log

The keyword log can be used to send log messages to a dedicated log4j logger.

The syntax is

log ("message expression", LEVEL)

The message expression is a groovy expression; the variable event, containing the current event, is available to it. LEVEL is a log4j2 level. The message will be sent to the logger called loghub.eventlogger.<pipelinename>.
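Assuming the ${name%format} string-format syntax seen in the sample configuration above, and a hypothetical field host, a log step could look like:

   log("unexpected event from ${host%s}", WARN)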

Merge

LogHub can merge many events into one and then send the result. This command takes many arguments and is explained in Merging events.

Path

This assertion is used to create a sub-view of an event, to make variable paths shorter. For example:

   path [a] ( [b] = 1 | [c]= 2 )

gives the same result as

   [a b] = 1 | [a c] = 2

Top level elements

A configuration file contains any number of inputs, outputs, pipelines, properties and sources.

Pipeline

A pipeline is defined with:

pipeline[name] { pipelement | pipelement ... }

Empty pipelines are allowed. They can be used to join a receiver to a sender.
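For example, a receiver can be joined to a sender through an empty pipeline. This sketch reuses the classes from the sample configuration; the pipeline name passthrough is hypothetical:

input { loghub.receivers.Udp { port: 2121, decoder: loghub.decoders.Msgpack } } | $passthrough
pipeline[passthrough] {}
output $passthrough | { loghub.senders.ElasticSearch }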

Input

An input is written as

input { Object {decoder: ... } } | $pipeline

'input' is the exact string input, Object is a java object that inherits from the class loghub.Receiver. $pipeline is optional and is the name of the pipeline that will receive the events generated by this input; if missing, it defaults to $main. decoder is an object of a class derived from loghub.Decoder; it takes a byte[] and builds a new loghub.Event from its content.

Output

An output is written as

output $pipeline | { Object {encoder: ... } } 

'output' is the exact string output, Object is a java object that inherits from the class loghub.Sender. $pipeline is the name of the pipeline that will send its events to this output. encoder is an object of a class derived from loghub.Encoder; it takes a loghub.Event and serialises it to a byte[] that will be sent using this output.

Properties

The following properties can be used to control some Loghub components.

  • hprofDumpPath, the path to a hprof file to dump in case of critical failure.
  • http.port, the listening port for the internal http dashboard, defaults to -1 (inactive).
  • maxSteps, the maximum number of processing steps an event can go through before being dropped.
  • numWorkers, the number of processing threads.
  • includes, an array or a string giving folders where additional configuration files are located; a glob like path/*.conf can be given.
  • jmx.proto, the protocol used to listen for jmx, can be "rmi" or "jmxmp".
  • jmx.port, the listening port for jmx management.
  • jmx.listen, the IP address that jmx management binds to.
  • jwt.secret
  • jwt.alg
  • locale, the default locale that will be used for output, any string that Locale.forLanguageTag can accept.
  • log4j.configFile, the path or the URL to a log4j2 configuration file.
  • log4j.defaultlevel
  • plugins, an array of paths or jars that contain additional components.
  • queueDepth
  • queueWeight
  • ssl.ephemeralDHKeySize, defaults to 2048.
  • ssl.rejectClientInitiatedRenegotiation, defaults to true.
  • ssl.context, defaults to TLSv1.2.
  • ssl.issuers, the accepted issuers for client certificates.
  • ssl.providername
  • ssl.providerclass
  • ssl.keymanageralgorithm
  • ssl.trustmanageralgorithm
  • ssl.securerandom
  • ssl.trusts
  • timezone, the default timezone that will be used for output, any string that [TimeZone.getTimeZone](https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html#getTimeZone-java.lang.String-) can accept.
  • zmq.keystore
  • zmq.numSocket
  • zmq.linger

Every property can also be given as a system property, using the standard -D command line flag. System properties override values given in the configuration file.
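As a sketch, a few of these properties could be set at the top level of a configuration file; the values below are illustrative only:

numWorkers: 4
http.port: 8080
timezone: "Europe/Paris"
includes: ["/etc/loghub/conf.d/*.conf"]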

Sources

The mapping syntax can take a source to resolve values:

sources:
  asource: SourceObject {
    bean: value
    ...
  }

They can be used in pipelines with the mapping construct, which transforms the content of an attribute using a map:

map [...] %sourcename

The only currently supported source is FileMap, which can process CSV or JSON files:

loghub.sources.FileMap {
    mappingFile: "somefile.csv",
    keyName: "ElementID",
    valueName: "Name"
}

where keyName and valueName are the columns or JSON attributes to use. If a CSV is parsed, column numbers (starting from 0) can be given directly instead of column names; the first line will then not be used as a header. The file extension is used to identify the file type.
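Tying the pieces together, a source can be declared once and then applied in a pipeline. In this sketch, the source name hostnames, the pipeline name and the field [hostid] are all hypothetical:

sources:
  hostnames: loghub.sources.FileMap {
    mappingFile: "hosts.csv",
    keyName: "ElementID",
    valueName: "Name"
  }

pipeline[resolve] { map [hostid] %hostnames }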

Storing secrets

Someone might want to store secrets, like HTTP authentication passwords, outside of configuration files.

A private secret store can be used for that. Each secret is given an alias that is resolved to the actual value when the configuration file is parsed. A secret alias is identified by the symbol *.

For example:

secrets.source: "path/to/secret.jceks"

output $sender | {
    loghub.senders.ElasticSearch {
        login: "loghub",
        password: *espassword,
    }
}

To create the secret store:

 java -jar target/loghub-0.0.1-SNAPSHOT.jar secrets --create -s path/to/secrets

And to add a secret:

 echo "Secr3tP4ssw0rd" | java -jar target/loghub-0.0.1-SNAPSHOT.jar secrets --add -s path/to/secrets -a espassword -i

The secrets file is a jceks keystore file, so it can be inspected with a tool like KeyStore Explorer.

The path to the secret store file can be a URL, but then the command --create might easily fail.