
Pipeline

A pipeline is made up of operators. The pipeline defines how stanza should input, process, and output logs.

Linear Pipelines

Many stanza pipelines are a linear sequence of operators. Logs flow from one operator to the next, according to the order in which they are defined.

For example, the following pipeline will read logs from a file, parse them as JSON, and print them to stdout:

pipeline:
  - type: file_input
    include: 
      - my-log.json
  - type: json_parser
  - type: stdout

Notice that every operator has a type field. An operator's type must always be specified.

id and output

Linear pipelines are sufficient for many use cases, but stanza is also capable of processing non-linear pipelines. In order to use non-linear pipelines, the id and output fields must be understood. Let's take a closer look at these.

Each operator in a pipeline has a unique id. By default, id takes the same value as type. Alternatively, you can specify an id for any operator. If your pipeline contains multiple operators of the same type, then the id field must be used, as in the sketch below.
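
For example, this sketch (reusing the file names from elsewhere in this document, with illustrative id values) reads from two files using two file_input operators, which must be given distinct ids:

pipeline:
  - type: file_input
    id: json_logs
    include:
      - my-log.json
    output: stdout # flow directly to stdout
  - type: file_input
    id: text_logs
    include:
      - my-other-log.txt
  - type: stdout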

All operators (except output operators) support an output field. By default, the output field takes the value of the next operator's id.

Let's look at how these default values work together by considering the linear pipeline shown above. The following pipeline would be exactly the same (although much more verbosely defined):

pipeline:
  - type: file_input 
    id: file_input
    include: 
      - my-log.json
    output: json_parser
  - type: json_parser
    id: json_parser
    output: stdout
  - type: stdout
    id: stdout

Additionally, we could accomplish the same task using custom ids.

pipeline:
  - type: file_input
    id: my_file
    include: 
      - my-log.json
    output: my_parser
  - type: json_parser
    id: my_parser
    output: my_out
  - type: stdout
    id: my_out

We could even shuffle the order of operators, so long as we explicitly declare each output. This is a little counterintuitive, so it isn't recommended. However, it is shown here to highlight the fact that operators in a pipeline are ultimately connected via outputs and ids.

pipeline:
  - type: stdout      # 3rd operator
    id: my_out
  - type: json_parser # 2nd operator
    id: my_parser
    output: my_out
  - type: file_input  # 1st operator
    id: my_file
    include: 
      - my-log.json
    output: my_parser

Finally, we could even remove some of the ids and outputs and rely on the default values. This is even less readable, so again it isn't recommended. However, it is provided here to demonstrate that default values can be relied upon.

pipeline:
  - type: json_parser # 2nd operator
  - type: stdout      # 3rd operator
  - type: file_input  # 1st operator
    include: 
      - my-log.json
    output: json_parser

Non-Linear Pipelines

Now that we understand how id and output work together, we can configure stanza to run more complex pipelines. Technically, the structure of a stanza pipeline is limited only in that it must be a directed, acyclic graph.
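
This means, for example, that operators may not feed back into one another. The following contrived sketch (the ids are purely illustrative) forms a cycle, so it is not a valid pipeline:

pipeline:
  - type: json_parser
    id: parser_one
    output: parser_two
  - type: json_parser
    id: parser_two
    output: parser_one # cycle: parser_two feeds back into parser_one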

Let's consider a pipeline with two inputs and one output:

pipeline:
  - type: file_input
    include: 
      - my-log.json
    output: stdout # flow directly to stdout

  - type: windows_eventlog_input
    channel: security
    # implicitly flow to stdout

  - type: stdout

Here's another, where we read from two files that should be parsed differently:

pipeline:
  # Read and parse a JSON file
  - type: file_input
    id: file_input_one
    include: 
      - my-log.json
  - type: json_parser
    output: stdout # flow directly to stdout
  
  # Read and parse a text file
  - type: file_input
    id: file_input_two
    include: 
      - my-other-log.txt
  - type: regex_parser
    regex: ... # regex appropriate to file format
    # implicitly flow to stdout

  # Print
  - type: stdout

Finally, in some cases, you might expect multiple log formats to come from a single input. This is where the router operator comes in. The router operator allows you to define multiple "routes", each of which has an output.

pipeline:
  # Read log file
  - type: file_input
    include: 
      - my-log.txt

  # Route based on log type
  - type: router
    routes:
      - expr: '$record startsWith "ERROR"'
        output: error_parser
      - expr: '$record startsWith "INFO"'
        output: info_parser

  # Parse logs with format one
  - type: regex_parser
    id: error_parser
    regex: ... # regex appropriate to parsing error logs
    output: stdout # flow directly to stdout

  # Parse logs with format two
  - type: regex_parser
    id: info_parser
    regex: ... # regex appropriate to parsing info logs
    output: stdout # flow directly to stdout

  # Print
  - type: stdout
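
One caveat worth noting: in the example above, any log that begins with neither "ERROR" nor "INFO" matches no route. If a catch-all is desired, one option is a final route whose expression is always true. This is a hedged sketch, assuming routes are evaluated in order and an entry follows the first route that matches:

  - type: router
    routes:
      - expr: '$record startsWith "ERROR"'
        output: error_parser
      - expr: '$record startsWith "INFO"'
        output: info_parser
      # catch-all route: anything unmatched above flows straight to stdout
      - expr: 'true'
        output: stdout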