# Configuration file
For configuration, LogHub uses a DSL generated with ANTLR. Its syntax is a custom mix of Logstash configuration files, Java, and a small taste of Groovy. The exact grammar can be found at https://github.com/fbacchella/LogHub/blob/master/src/main/antlr4/loghub/Route.g4.
The configuration components are strongly typed: a string must be wrapped in `"..."`, a lone `1` will be seen as an integer, `1.0` as a double, and `true` or `false` as boolean values.
```
input {
    loghub.receivers.ZMQ {
        listen: "tcp://localhost:2120",
        decoder: loghub.decoders.SerializedObject
    }
} | $main

input {
    loghub.receivers.Udp {
        port: 2121,
        decoder: loghub.decoders.Msgpack
    }
} | $apache

output $main | { loghub.senders.ElasticSearch }

pipeline[apache] {
    loghub.processors.Geoip {
        datfilepath: "/usr/local/share/GeoIP/GeoIP.dat",
        locationfield: "location",
        threads: 4
    }
} | $main

pipeline[main] {
    loghub.processors.Log4JExtract {
        if: [objectClass] == "org.apache.log4j.spi.LoggingEvent",
        source: "message",
        success: { [objectClass]- | [log4] = true },
    }
    | [logger_name] == "jrds.starter.Timer" || [info] > 4 ? loghub.processors.Drop : ( loghub.processors.ParseJson | log("${logger_name%s}", WARN) )
}

plugins: ["/usr/share/loghub/plugins", "/usr/share/loghub/scripts"]
```
This configuration defines two receivers: one listens using 0MQ for log4j events; the other listens for msgpack-encoded events on a UDP port, like those that can be generated by mod_log_net.
The events received over UDP are sent to a pipeline called `apache`. All the events are transferred to the `main` pipeline after the visitor's location has been resolved. The log4j events are sent directly to the `main` pipeline, which does some magic treatment on them. Pay attention to the test: it will be evaluated as a Groovy script. A property called `plugins` is also defined. It allows defining custom extension folders that will be used to resolve scripts and will be added to the class path.
In the configuration file, all the agents are defined directly using their class name. Java-like comments are used: `//` for a full line and `/*...*/` for sections.
Types allowed in a configuration file are either literals, objects, arrays or maps. A literal is a raw type (String, number, etc.) written using the Java syntax and resolved using a String constructor for that type. Literals map to the String, Integer, Double, Character or Boolean Java types, using the same notation as Java:
- `"String"` is a String
- `'c'` is a Character
- `1` is an Integer
- `1.0` is a Double
- `true` or `false` are Boolean
Some operators expect a pattern as an argument, which should be written either in the form `/pattern/` or `"""pattern"""`. This pattern follows the regular expression syntax used in Java. If the pattern involves line separators, they must be appropriately escaped.
An object is written with the syntax:

```
class.name { attribute: value, ... }
```

The class name is a Java class name that is resolved using an internal class loader. An attribute is a bean of this class, so the method `class.name.setBean` must exist with the right type. A bean value can be of any type: another object, a literal, an array or a map.
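For instance, the `Geoip` processor from the example above is an object whose attributes are resolved to bean setters (a sketch; the bean names are taken from that example):

```
loghub.processors.Geoip {
    // resolved to setDatfilepath(String)
    datfilepath: "/usr/local/share/GeoIP/GeoIP.dat",
    // resolved to setLocationfield(String)
    locationfield: "location",
    // resolved to setThreads(Integer)
    threads: 4
}
```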
An array is written as:

```
[ value, ... ]
```

A value can be of any allowed type. To avoid confusion with an event variable, a single-element array can be written as `[value,]`.
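For instance, the `plugins` property from the first example takes an array of strings; a single-element version needs the trailing comma so that it is not read as an event variable (a minimal sketch):

```
plugins: ["/usr/share/loghub/plugins",]
```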
Expressions are constructs that are evaluated and return a value. The following operators are handled:
- any literal
- `stringLiteral ( expressionsList )?`, used to evaluate string formats
- an event variable
- `.~` or `!`, prefix logical unary operators that both negate a boolean value
- `+`, `-`, prefix numerical unary operators
- `**`, `*`, `/`, `%`, `+`, `-`, `<<`, `>>`, `>>>`, `<=>`, numerical binary operators
- `.&`, `.^`, `.|`, numerical binary operators that act like the more common numerical `&` (and), `^` (xor), `|` (or)
- `<`, `<=`, `>`, `>=`, `in`, `!in`, `instanceof`, `!instanceof`, `==`, `!=`, `===`, `!==`, `&&`, `||`, logical binary operators
- `=~`, compares a value to a pattern and returns an array with all the groups given in the pattern
- `==~`, returns true if the first argument matches the second argument, given as a pattern
- `(` `)`, to group an expression for priority
- `[x]`, a postfix unary operator that returns the x-th element of an array or list; if negative, it counts from the end
- `trim`, `capitalize`, `uncapitalize`, `isBlank`, `normalize`, `uppercase`, `lowercase`, String functions
- `join`, joins any iterable element using the first argument, given as a `String`, and returns a `String`
- `split`, splits the second argument using the first one, given as a regex pattern
- `gsub`, does a string substitution on the first argument, using the second argument as a pattern and replacing with the third argument, given as a `String`
- `now`, returns the current time as an `Instant`
- `isEmpty`, a generic function
- `set(expressionsList)`, returns the given expressions as a set (a `LinkedHashSet`)
- `list(expressionsList)`, returns the given expressions as a list (an `ArrayList`)
Usually the operators and functions are fail-safe: if they can't be applied, they do nothing. The syntax, evaluation and precedence of operators are deeply inspired by Groovy and should generally return the same values.
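As an illustrative sketch (the field names `useragent` and `status` are hypothetical), a test combining a pattern match, a comparison and a logical operator could look like:

```
[useragent] ==~ /.*(?:bot|crawler).*/ && [status] >= 400 ? loghub.processors.Drop
```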
The function `isEmpty` can be used on a wide range of object types, and checks for a null value or the emptiness of a collection. Pay attention that `" "` is not empty, as its size is 1, but it is a blank string, so the expression `isEmpty(" ") == isBlank(" ")` is false. It can't be used to check for the existence of an attribute.
The special construct `[...] == *` returns true if the event variable is defined. For example, `[a b] = false | [a b] == * ? [c] = true` will set the attribute `c` to true because the variable is defined, and `[a b]- | [a b] != * ? [c] = true` will set `c` to true for the opposite reason: it's missing.
Any expression, and many attributes, can use event variables. They are written using the syntax `[path to key]`. As event values can contain paths, this syntax defines the path to a sub-key. For example, if the JSON serialization of an event is

```
{
    ...
    "a": {"b": "c"}
}
```

to reach the value `"c"`, the syntax is `[a b]`. The allowed elements are Java identifiers. If they are not valid, for example if they include a `-` or a space, they can be wrapped in double quotes: `[a b]` returns the same value as `[a "b"]`.
The special value `[@timestamp]` returns the timestamp of the event. `[@context a b]` returns the event reception context, like the remote IP. Receivers might define custom contexts. A common value is `[@context principal]`, which contains the Java `Principal` used to connect on authenticated receivers. Another one is `[@context remoteAddress address]`, which contains the SocketAddress of the remote end of the connection; `[@context remoteAddress address hostName]` should not be used, as it will do a synchronous name resolution that will hang the event.
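A hedged sketch that copies the principal into a hypothetical `user` field, using the `[...] == *` defined-check described earlier:

```
[@context principal] == * ? [user] = [@context principal]
```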
The special value `[@lastException]` should only be used inside an `exception` pipeline of a processor. If the processor failed with a `ProcessorException`, it will contain the error message.
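A hedged sketch, assuming the processor exposes such an `exception` pipeline as a bean (mirroring the `success` bean shown in the first example; the `parse_error` field is hypothetical):

```
loghub.processors.ParseJson {
    // store the error message if JSON parsing throws a ProcessorException
    exception: ( [parse_error] = [@lastException] )
}
```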
A pipeline contains many types of elements that are chained together using the symbol `|`. A pipeline can contain some specific commands, like the keyword `drop` that drops the event.
Simple changes can be made to an event (a combined sketch follows this list):

- A field can be dropped, using the syntax `[field]-`.
- A field can be renamed, using the syntax `[newfield] < [oldfield]`.
- A value can be appended to a field, detecting its type: `[a] =+ Expression` will append an element to an array, a list or a set. If `a` does not exist, it will be created as a list containing one or zero values, depending on the result of the evaluation of the expression, i.e. `[a] =+ null` will create an empty list if `a` was not present in the event.
- A field can be assigned a value using an expression, with the syntax `[field] = expression`. The expression can use other fields through the event variable (`[name]`) syntax. For example, `[sum] = [path v1] + [path v2]` will put the sum of the two fields, given by their full paths, into the field `sum`.

If the destination field is `@timestamp`, the value will be stored in the event timestamp, not in a field called `@timestamp`.
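A minimal sketch combining these operations, with hypothetical field names:

```
pipeline[cleanup] {
    [tmp]- |                              // drop a scratch field
    [client ip] < [clientip] |            // rename clientip to the sub-key [client ip]
    [tags] =+ "cleaned" |                 // append a value, creating the list if needed
    [total] = [bytes in] + [bytes out]    // assign a computed value
}
```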
A pipeline can contain tests, written as

```
expression ? then : else
```

`expression` is a Groovy logical expression that must return a true or false value. Event fields are written enclosed in square brackets: `[field]`. `then` and `else` are two processors or sub-pipelines where the event will be sent, according to the evaluation of the expression.
Mainly for use in tests, a pipeline can contain an anonymous pipeline, written as

```
( element | element... )
```

The event will go through it and be sent back after being processed.
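A hedged sketch of a test whose branches are an anonymous pipeline and a processor (the field names are hypothetical):

```
pipeline[triage] {
    [status] >= 500 ? ( [severity] = "high" | [tags] =+ "alert" ) : loghub.processors.Drop
}
```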
Another pipeline can be used by referring to its name; this is a sub-pipeline, written as `$pipename`. If the symbol used is `+` instead of `|`, a copy of the event will be sent to the new pipeline, for example:

```
pipeline[main] { ... + $second | ...}
pipeline[second] { ... }
```

The pipeline `second` will receive a copy of every event seen by the pipeline `main` that reaches the calling step.
If the symbol used is `>`, the event will not be processed any further in the current pipeline but will be sent to the `second` pipeline. It's generally used in a test condition, for example:

```
pipeline[main] { ... sometest ? ( Object {...} > $second) }
pipeline[second] { ... }
```

The pipeline `second` will receive all the events seen by the pipeline `main` that reach the calling step.
For both `+` and `>`, if the destination is the single step, just prefix the destination with the chosen symbol:

```
sometest ? ( > $second) | othertest ? ( + $second )
```
Processors are objects derived from the `loghub.Processor` class. They take an event as input and process it: they can drop it, transform it, or take any other kind of action. In a pipeline, some commands control the flow of events. The single keyword `drop` drops an event.
The keyword `fire` can be used to fire a new event; it will be sent to the beginning of the given pipeline. The syntax is

```
fire { [fieldname]: fieldvalue; ... } > $piperef
```

`fieldvalue` is a Groovy expression. It can extract values from the current event using the event variable syntax. For example, writing:

```
pipeline[main] {
    ...
    fire { [a] = 1 ; [b] = [count] * 3 } > $alert
}
```

will fire an event and send it to the pipeline `alert`. The new event will have two fields set: `a` with the value 1, and `b` with a value calculated from the field `count` of the current event.
The keyword `log` can be used to send log information to a dedicated log4j logger. The syntax is

```
log ("message expression", LEVEL)
```

The message expression is a Groovy expression that is given the variable `event`, containing the current event. `LEVEL` is a log4j2 level. The message will be sent to the logger called `loghub.eventlogger.<pipelinename>`.
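A hedged sketch of a conditional log step (the field names are hypothetical; the format syntax reuses the `${field%s}` form from the first example):

```
[retries] > 3 ? ( log("too many retries for ${host%s}", WARN) )
```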
LogHub can merge many events into one and then send the result. This command takes a lot of arguments and is explained in Merging events.
This assertion is used to create a sub-view of an event, to make variable paths shorter. For example,

```
path [a] ( [b] = 1 | [c] = 2 )
```

gives the same result as

```
[a b] = 1 | [a c] = 2
```
A configuration file contains any number of inputs, outputs, pipelines, properties and sources. A pipeline is defined with:

```
pipeline[name] { pipelement | pipelement ... }
```

Empty pipelines are allowed. They can be used to join a sender and a receiver, as sketched below.
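For example, a minimal sketch that joins a receiver to a sender through an empty pipeline, reusing classes from the first example (the pipeline name `relay` is arbitrary):

```
input { loghub.receivers.Udp { port: 2121, decoder: loghub.decoders.Msgpack } } | $relay
pipeline[relay] {}
output $relay | { loghub.senders.ElasticSearch }
```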
An input is written as

```
input { Object {decoder: ... } } | $pipeline
```

`input` is the exact string `input`, and Object is a Java object that inherits from the class `loghub.Receiver`. `$pipeline` is optional and is the name of the pipeline that will receive the events generated by this input; if missing, it defaults to `$main`. `decoder` is an object using a class derived from `loghub.Decoder`; it takes a `byte[]` and builds a new `loghub.Event` from its content.
An output is written as

```
output $pipeline | { Object {encoder: ... } }
```

`output` is the exact string `output`, and Object is a Java object that inherits from the class `loghub.Sender`. `$pipeline` is the name of the pipeline that will send the events it generates to this output. `encoder` is an object using a class derived from `loghub.Encoder`; it takes a `loghub.Event` and serializes it to a `byte[]` that will be sent using this output.
The following properties can be used to control some LogHub components.

- `hprofDumpPath`, the path to a hprof file to dump in case of critical failure.
- `http.port`, the listening port for the internal HTTP dashboard, defaults to -1 (inactive).
- `maxSteps`, the maximum number of processing steps an event can go through before being dropped.
- `numWorkers`, the number of processing threads.
- `includes`, an array or a string that gives the folders where additional configuration files are found; a glob like `path/*.conf` can be given.
- `jmx.proto`, the protocol to be used to listen on JMX, can be "rmi" or "jmxmp".
- `jmx.port`, the listening port for JMX management.
- `jmx.listen`, the IP that JMX management binds to.
- `jwt.secret`
- `jwt.alg`
- `locale`, the default locale that will be used for output; any string that `Locale.forLanguageTag` can take.
- `log4j.configFile`, the path or the URL to a log4j2 configuration file.
- `log4j.defaultlevel`
- `plugins`, an array of paths or jars that will contain additional components.
- `queueDepth`
- `queueWeight`
- `ssl.ephemeralDHKeySize`, defaults to 2048.
- `ssl.rejectClientInitiatedRenegotiation`, defaults to true.
- `ssl.context`, defaults to TLSv1.2.
- `ssl.issuers`, the accepted issuers for client certificates.
- `ssl.providername`
- `ssl.providerclass`
- `ssl.keymanageralgorithm`
- `ssl.trustmanageralgorithm`
- `ssl.securerandom`
- `ssl.trusts`
- `timezone`, the default timezone that will be used for output; any string that [TimeZone.getTimeZone](https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html#getTimeZone-java.lang.String-) can take.
- `zmq.keystore`
- `zmq.numSocket`
- `zmq.linger`
Every property can also be given as a system property, using the standard `-D` on the command line. System properties override values given in the configuration file.
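A hedged sketch of a few property settings at the top level of a configuration file (the values are arbitrary):

```
numWorkers: 4
http.port: 8080
includes: "/etc/loghub/conf.d/*.conf"
timezone: "Europe/Paris"
```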
The mapping syntax can take a source to resolve values:

```
sources:
    asource: SourceObject {
        bean: value
        ...
    }
```

Sources can be used in pipelines with the mapping construct, which transforms the content of an attribute using a map:

```
map [...] %sourcename
```
The only currently supported source is FileMap, which can process CSV or JSON files:

```
loghub.sources.FileMap {
    mappingFile: "somefile.csv",
    keyName: "ElementID",
    valueName: "Name"
}
```

where `keyName` and `valueName` are the columns or JSON attributes to use. If a CSV is parsed, column numbers can be given directly instead of column names (starting from 0), and the first line will then not be used as a header. The file extension is used to identify the file type.
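Putting it together, a hedged sketch (the source name `hostnames`, the file and the field names are hypothetical):

```
sources:
    hostnames: loghub.sources.FileMap {
        mappingFile: "hosts.csv",
        keyName: "ip",
        valueName: "hostname"
    }

pipeline[enrich] {
    // replace the content of [hostname] using the hostnames map
    map [hostname] %hostnames
}
```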
Someone might want to store some secrets outside of the configuration files, like an HTTP authentication password. A private secret store can be used for that. Each secret is given an alias that will be resolved to the actual value when the configuration file is parsed. A secret alias is identified by the symbol `*`. For example:

```
secrets.source: "path/to/secret.jceks"

output $sender | {
    loghub.senders.ElasticSearch {
        login: "loghub",
        password: *espassword,
    }
}
```
To create the secret store:

```
java -jar target/loghub-0.0.1-SNAPSHOT.jar secrets --create -s path/to/secrets
```

And to add a secret:

```
echo "Secr3tP4ssw0rd" | java -jar target/loghub-0.0.1-SNAPSHOT.jar secrets --add -s path/to/secrets -a espassword -i
```

The secrets file is a JCEKS keystore, so it can be inspected with a tool like KeyStore Explorer. The path to the secret store file can be a URL, but then the command `--create` might easily fail.