A pipeline is made up of operators. The pipeline defines how stanza inputs, processes, and outputs logs.
Many stanza pipelines are a linear sequence of operators. Logs flow from one operator to the next, according to the order in which they are defined.
For example, the following pipeline will read logs from a file, parse them as JSON, and print them to stdout:
```yaml
pipeline:
  - type: file_input
    include:
      - my-log.json
  - type: json_parser
  - type: stdout
```
Notice that every operator has a `type` field. An operator's `type` must always be specified.
Linear pipelines are sufficient for many use cases, but stanza is also capable of processing non-linear pipelines. In order to use non-linear pipelines, the `id` and `output` fields must be understood. Let's take a closer look at these.
Each operator in a pipeline has a unique `id`. By default, `id` takes the same value as `type`. Alternatively, you can specify an `id` for any operator. If your pipeline contains multiple operators of the same `type`, then the `id` field must be used.
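For instance, a pipeline that reads from two different files uses the `file_input` type twice, so each instance needs its own `id`. Here is a minimal sketch (the file names and `id` values are hypothetical, and the `output` field is described next):

```yaml
pipeline:
  - type: file_input
    id: app_log_input # unique id, since file_input appears twice
    include:
      - app.log # hypothetical file
    output: stdout # route around the second input
  - type: file_input
    id: audit_log_input # unique id for the second file_input
    include:
      - audit.log # hypothetical file
  - type: stdout
```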
All operators (except output operators) support an `output` field. By default, the `output` field takes the value of the next operator's `id`.
Let's look at how these default values work together by considering the linear pipeline shown above. The following pipeline would be exactly the same (although much more verbosely defined):
```yaml
pipeline:
  - type: file_input
    id: file_input
    include:
      - my-log.json
    output: json_parser
  - type: json_parser
    id: json_parser
    output: stdout
  - type: stdout
    id: stdout
```
Additionally, we could accomplish the same task using custom `id`s.
```yaml
pipeline:
  - type: file_input
    id: my_file
    include:
      - my-log.json
    output: my_parser
  - type: json_parser
    id: my_parser
    output: my_out
  - type: stdout
    id: my_out
```
We could even shuffle the order of operators, so long as we explicitly declare each output. This is a little counterintuitive, so it isn't recommended. However, it is shown here to highlight the fact that operators in a pipeline are ultimately connected via `output`s and `id`s.
```yaml
pipeline:
  - type: stdout # 3rd operator
    id: my_out
  - type: json_parser # 2nd operator
    id: my_parser
    output: my_out
  - type: file_input # 1st operator
    id: my_file
    include:
      - my-log.json
    output: my_parser
```
Finally, we could even remove some of the `id`s and `output`s and depend on the default values. This is even less readable, so again it would not be recommended. However, it is provided here to demonstrate that default values can be depended upon.
```yaml
pipeline:
  - type: json_parser # 2nd operator
  - type: stdout # 3rd operator
  - type: file_input # 1st operator
    include:
      - my-log.json
    output: json_parser
```
Now that we understand how `id` and `output` work together, we can configure stanza to run more complex pipelines. Technically, the structure of a stanza pipeline is limited only in that it must be a directed acyclic graph.
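The acyclic requirement means that no operator's output may eventually route back to itself. A hypothetical misconfiguration like the following contains a cycle, so it is not a valid pipeline:

```yaml
pipeline:
  - type: json_parser
    id: parser_a
    output: parser_b # parser_a sends to parser_b...
  - type: json_parser
    id: parser_b
    output: parser_a # ...and parser_b sends back, creating a cycle
```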
Let's consider a pipeline with two inputs and one output:
```yaml
pipeline:
  - type: file_input
    include:
      - my-log.json
    output: stdout # flow directly to stdout
  - type: windows_eventlog_input
    channel: security
    # implicitly flow to stdout
  - type: stdout
```
Here's another, where we read from two files that should be parsed differently:
```yaml
pipeline:
  # Read and parse a JSON file
  - type: file_input
    id: file_input_one
    include:
      - my-log.json
  - type: json_parser
    output: stdout # flow directly to stdout
  # Read and parse a text file
  - type: file_input
    id: file_input_two
    include:
      - my-other-log.txt
  - type: regex_parser
    regex: ... # regex appropriate to file format
    # implicitly flow to stdout
  # Print
  - type: stdout
```
Finally, in some cases, you might expect multiple log formats to come from a single input. This is a use case for the `router` operator. The `router` operator allows one to define multiple "routes", each of which has an `output`.
```yaml
pipeline:
  # Read log file
  - type: file_input
    include:
      - my-log.txt
  # Route based on log type
  - type: router
    routes:
      - expr: '$record startsWith "ERROR"'
        output: error_parser
      - expr: '$record startsWith "INFO"'
        output: info_parser
  # Parse logs with format one
  - type: regex_parser
    id: error_parser
    regex: ... # regex appropriate to parsing error logs
    output: stdout # flow directly to stdout
  # Parse logs with format two
  - type: regex_parser
    id: info_parser
    regex: ... # regex appropriate to parsing info logs
    output: stdout # flow directly to stdout
  # Print
  - type: stdout
```
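To make the router example concrete, here is a minimal sketch of what `error_parser` might look like for a hypothetical line such as `ERROR 2021-06-15T12:00:00Z connection refused`. The log format and field names are assumptions for illustration; the `regex: ...` placeholders above are intentionally left unspecified.

```yaml
# Hypothetical parser for lines like: ERROR 2021-06-15T12:00:00Z connection refused
- type: regex_parser
  id: error_parser
  # Assumed format: severity, timestamp, then a free-form message
  regex: '^(?P<severity>ERROR)\s+(?P<timestamp>\S+)\s+(?P<message>.*)$'
  output: stdout
```

Each named capture group becomes a field on the parsed record.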