config from multiple sources (#507)
* remove schema and rule checker
* add defaults module to store all logprep defaults
* refactor get_versions_string and DEFAULT_LOCATION_CONFIG and move them to util module
* write tests for get_versions_string which was only tested implicitly
* remove rule validation
* reimplement logprep.util.configuration module
* reimplement logprep.runner module
* add reload method to configuration 
* add commandline option to print config as json or yaml
* add reload success and failure metrics to runner
* implement configuration equality -> equal version == equal configuration
* remove MultiprocessingPipeline
* update changelog
* update architecture visualizations
* move exception handling to configuration module

To handle exceptions where they occur, the exception handling was
moved to the util/configuration module.
In `run_logprep` only the `InvalidConfigurationError` has to be handled,
because the Configuration only raises the `InvalidConfigurationError` exception.

* add LogprepException and ensure InvalidConfigurationErrors adds errors only once
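
The ideas in the list above (several config sources, equality by version, and a single exception type that reports each error once) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this commit: `SketchConfiguration`, `from_sources`, the two option names, and the "later source overrides earlier" merge rule are all hypothetical, and the `InvalidConfigurationError` class here is a stand-in that merely borrows the name from the commit message.

```python
from dataclasses import dataclass, field


class InvalidConfigurationError(Exception):
    """Stand-in for the exception named in the commit message (not logprep's class)."""

    def __init__(self, errors):
        # Deduplicate while preserving order so each error is reported only once.
        self.errors = list(dict.fromkeys(str(error) for error in errors))
        super().__init__("; ".join(self.errors))


@dataclass(eq=False)
class SketchConfiguration:
    """Hypothetical configuration object; the field names are assumptions."""

    version: str = "unset"
    pipeline: list = field(default_factory=list)

    def __eq__(self, other):
        # Equal version == equal configuration, as stated in the commit message.
        if not isinstance(other, SketchConfiguration):
            return NotImplemented
        return self.version == other.version

    @classmethod
    def from_sources(cls, sources):
        """Merge already-parsed config dicts; later sources override earlier ones (assumed rule)."""
        merged, errors = {}, []
        for index, source in enumerate(sources):
            if not isinstance(source, dict):
                errors.append(f"source {index} is not a mapping")
                continue
            merged.update(source)
        # Collect every problem first, then raise exactly one exception type.
        unknown = sorted(set(merged) - {"version", "pipeline"})
        errors.extend(f"unknown option: {key}" for key in unknown)
        if errors:
            raise InvalidConfigurationError(errors)
        return cls(**merged)
```

With this shape, a caller only needs one `except InvalidConfigurationError` clause, and a reload check can compare the old and new objects with `==` to decide whether anything changed.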
---------

Co-authored-by: dtrai2 <[email protected]>
Co-authored-by: djkhl <[email protected]>
3 people authored Feb 6, 2024
1 parent 87e9e21 commit 3601d6d
Showing 85 changed files with 3,034 additions and 3,109 deletions.
10 changes: 8 additions & 2 deletions CHANGELOG.md
@@ -4,14 +4,16 @@
### Breaking

* reimplement the logprep CLI, see `logprep --help` for more information.
* remove feature to reload configuration by sending signal `SIGUSR1`
* remove feature to validate rules because it is already included in `logprep test config`

### Features


* add a `number_of_successful_writes` metric to the s3 connector, which counts how many events were successfully written to s3
* make the s3 connector work with the new `_write_backlog` method introduced by the `confluent_kafka` commit bugfix in v9.0.0
* add option to Opensearch Output Connector to use parallel bulk implementation (default is True)

* add feature to logprep to load config from multiple sources (files or uris)
* add feature to logprep to print the resulting configuration with `logprep print json|yaml <Path to config>` in json or yaml

### Improvements

@@ -20,6 +22,10 @@
* make the s3 connector blocking by removing threading
* revert the change from v9.0.0 to always check the existence of a field for negated key-value based lucene filter expressions
* make store_custom in s3, opensearch and elasticsearch connector not call `batch_finished_callback` to prevent data loss that could be caused by partially processed events
* remove the `schema_and_rule_checker` module
* rewrite Logprep Configuration object, see documentation for more details
* rewrite Runner
* delete MultiProcessingPipeline class to simplify multiprocessing

### Bugfix

31 changes: 3 additions & 28 deletions README.md
@@ -289,13 +289,13 @@ Depending on how you have installed Logprep you have different choices to run Lo
If you have installed it via PyPI or the Github Development release just run:
```
logprep $CONFIG
logprep run $CONFIG
```
If you have installed Logprep via cloning the repository then you should run it via:
```
PYTHONPATH="." python3 logprep/run_logprep.py $CONFIG
PYTHONPATH="." python3 logprep/run_logprep.py run $CONFIG
```
Where `$CONFIG` is the path or uri to a configuration file (see the documentation about the
@@ -307,37 +307,12 @@ The next sections all assume an installation via pip
The following command can be executed to verify the configuration file without having to run Logprep:
```
logprep --verify-config $CONFIG
logprep test config $CONFIG
```
Where `$CONFIG` is the path or uri to a configuration file (see the documentation about the
[configuration](https://logprep.readthedocs.io/en/latest/user_manual/configuration/index.html)).
### Validating Labeling-Schema and Rules
The following command can be executed to validate the schema and the rules:
```
logprep --validate-rules $CONFIG
```
Where `$CONFIG` is the path or uri to a configuration file (see the documentation about the
[configuration](https://logprep.readthedocs.io/en/latest/user_manual/configuration/index.html)).
Alternatively, the validation can be performed directly. Assuming you have cloned the repository
from git.
```
PYTHONPATH="." python3 logprep/util/schema_and_rule_checker.py --labeling-schema $LABELING_SCHEMA --labeling-rules $LABELING_RULES
```
Where `$LABELING_SCHEMA` is the path to a labeling-schema (JSON file) and `$LABELING_RULES` is
the path to a directory with rule files (JSON/YML files, see Rules.md, subdirectories
are permitted)
Analogously, `--normalization-rules` and `--pseudonymizer-rules` can be used.
Validation does also perform a verification of the pipeline section of the Logprep configuration.
### Reload the Configuration
252 changes: 128 additions & 124 deletions doc/source/development/architecture/diagramms/logprep_start.drawio

Large diffs are not rendered by default.


216 changes: 167 additions & 49 deletions doc/source/development/architecture/diagramms/multiprocessing.drawio

Large diffs are not rendered by default.


132 changes: 60 additions & 72 deletions doc/source/development/architecture/diagramms/pipelineManager.drawio
@@ -1,6 +1,6 @@
<mxfile host="Electron" modified="2023-12-18T09:19:53.975Z" agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/22.1.2 Chrome/114.0.5735.289 Electron/25.9.4 Safari/537.36" etag="nOcAJcWz8JEnXxHV2bY2" version="22.1.2" type="device">
<mxfile host="Electron" modified="2024-02-01T14:20:51.955Z" agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/22.1.16 Chrome/120.0.6099.109 Electron/28.1.0 Safari/537.36" etag="QJoxnwYhX1PIMikuXQZW" version="22.1.16" type="device">
<diagram id="SRfpee8Bwv2kgKTGE94v" name="Page-1">
<mxGraphModel dx="1036" dy="606" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" math="0" shadow="0">
<mxGraphModel dx="1728" dy="998" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" math="0" shadow="0">
<root>
<mxCell id="0" />
<mxCell id="1" parent="0" />
@@ -10,17 +10,10 @@
<mxCell id="10" value="PipelineManager" style="swimlane;html=1;startSize=20;horizontal=0;" parent="9" vertex="1">
<mxGeometry y="20" width="1560" height="350" as="geometry" />
</mxCell>
<mxCell id="15" value="" style="edgeStyle=none;html=1;" parent="10" source="13" target="14" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="13" value="_create_manager()" style="ellipse;whiteSpace=wrap;html=1;" parent="10" vertex="1">
<mxGeometry x="60" y="20" width="120" height="80" as="geometry" />
</mxCell>
<mxCell id="17" value="" style="edgeStyle=none;html=1;" parent="10" source="14" target="16" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="14" value="_set_configuartion()" style="whiteSpace=wrap;html=1;fillColor=#60a917;fontColor=#ffffff;strokeColor=#2D7600;" parent="10" vertex="1">
<mxGeometry x="260" y="30" width="120" height="60" as="geometry" />
<mxCell id="17" value="" style="edgeStyle=none;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;" parent="10" source="armna5WRkvfgfis_-d7w-70" target="16" edge="1">
<mxGeometry relative="1" as="geometry">
<mxPoint x="380" y="60" as="sourcePoint" />
</mxGeometry>
</mxCell>
<mxCell id="21" value="" style="edgeStyle=none;html=1;" parent="10" source="16" target="20" edge="1">
<mxGeometry relative="1" as="geometry" />
@@ -57,14 +50,29 @@
<mxCell id="66" value="stop()" style="ellipse;whiteSpace=wrap;html=1;" parent="10" vertex="1">
<mxGeometry x="1409" y="195" width="111" height="55" as="geometry" />
</mxCell>
<mxCell id="12" value="MultiprocessingPipeline" style="swimlane;html=1;startSize=20;horizontal=0;" parent="9" vertex="1">
<mxCell id="armna5WRkvfgfis_-d7w-70" value="restart()" style="whiteSpace=wrap;html=1;fillColor=#60a917;fontColor=#ffffff;strokeColor=#2D7600;" vertex="1" parent="10">
<mxGeometry x="59" y="30" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="armna5WRkvfgfis_-d7w-73" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" edge="1" parent="10" source="59" target="16">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="59" value="remove(failed_pipelines)" style="whiteSpace=wrap;html=1;rounded=0;fillColor=#60a917;fontColor=#ffffff;strokeColor=#2D7600;" parent="10" vertex="1">
<mxGeometry x="1199" y="30" width="150" height="60" as="geometry" />
</mxCell>
<mxCell id="57" value="restart_failed_pipeline()" style="whiteSpace=wrap;html=1;rounded=0;fillColor=#60a917;fontColor=#ffffff;strokeColor=#2D7600;" parent="10" vertex="1">
<mxGeometry x="1204" y="190" width="140" height="60" as="geometry" />
</mxCell>
<mxCell id="60" value="" style="edgeStyle=orthogonalEdgeStyle;jumpStyle=arc;html=1;" parent="10" source="57" target="59" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="12" value="multiprocessing.Process" style="swimlane;html=1;startSize=20;horizontal=0;" parent="9" vertex="1">
<mxGeometry y="370" width="1560" height="160" as="geometry" />
</mxCell>
<mxCell id="32" value="__init__()" style="whiteSpace=wrap;html=1;fillColor=#60a917;fontColor=#ffffff;strokeColor=#2D7600;" parent="12" vertex="1">
<mxCell id="32" value="start()" style="whiteSpace=wrap;html=1;fillColor=#60a917;fontColor=#ffffff;strokeColor=#2D7600;" parent="12" vertex="1">
<mxGeometry x="818" y="60" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="36" value="pipeline.start()" style="whiteSpace=wrap;html=1;fillColor=#60a917;fontColor=#ffffff;strokeColor=#2D7600;" parent="12" vertex="1">
<mxGeometry x="998" y="60" width="120" height="60" as="geometry" />
<mxCell id="26" value="join()" style="whiteSpace=wrap;html=1;rounded=0;fillColor=#60a917;fontColor=#ffffff;strokeColor=#2D7600;" parent="12" vertex="1">
<mxGeometry x="629" y="60" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="24" style="edgeStyle=none;html=1;" parent="9" source="22" target="25" edge="1">
<mxGeometry relative="1" as="geometry">
@@ -74,17 +82,12 @@
<mxCell id="33" value="" style="edgeStyle=none;html=1;" parent="9" source="30" target="32" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="35" value="" style="edgeStyle=none;html=1;" parent="9" source="32" target="34" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="37" value="" style="edgeStyle=orthogonalEdgeStyle;html=1;" parent="9" source="34" target="36" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="39" value="" style="edgeStyle=orthogonalEdgeStyle;html=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;exitX=1;exitY=0.5;exitDx=0;exitDy=0;" parent="9" source="36" target="38" edge="1">
<mxCell id="39" value="" style="edgeStyle=orthogonalEdgeStyle;html=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" parent="9" source="32" target="38" edge="1">
<mxGeometry relative="1" as="geometry">
<Array as="points">
<mxPoint x="1138" y="460" />
</Array>
<mxPoint x="1118" y="460" as="sourcePoint" />
</mxGeometry>
</mxCell>
<mxCell id="42" style="edgeStyle=orthogonalEdgeStyle;html=1;entryX=0;entryY=0.5;entryDx=0;entryDy=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;" parent="9" source="38" target="20" edge="1">
@@ -99,40 +102,12 @@
<mxCell id="11" value="Pipeline" style="swimlane;html=1;startSize=20;horizontal=0;" parent="9" vertex="1">
<mxGeometry y="530" width="1560" height="240" as="geometry" />
</mxCell>
<mxCell id="26" value="pipeline.join()" style="whiteSpace=wrap;html=1;rounded=0;fillColor=#60a917;fontColor=#ffffff;strokeColor=#2D7600;" parent="11" vertex="1">
<mxGeometry x="595" y="20" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="25" value="pipeline.stop()" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#60a917;fontColor=#ffffff;strokeColor=#2D7600;" parent="11" vertex="1">
<mxGeometry x="433" y="20" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="27" value="" style="edgeStyle=none;html=1;" parent="11" source="25" target="26" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="34" value="__init__()" style="whiteSpace=wrap;html=1;fillColor=#60a917;fontColor=#ffffff;strokeColor=#2D7600;" parent="11" vertex="1">
<mxGeometry x="818" y="20" width="120" height="60" as="geometry" />
<mxGeometry x="433" y="55" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="38" value="pipeline.run()" style="whiteSpace=wrap;html=1;fillColor=#60a917;fontColor=#ffffff;strokeColor=#2D7600;" parent="11" vertex="1">
<mxGeometry x="1078" y="20" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="60" value="" style="edgeStyle=orthogonalEdgeStyle;jumpStyle=arc;html=1;" parent="11" source="57" target="59" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="57" value="restart_failed_pipeline()" style="whiteSpace=wrap;html=1;rounded=0;fillColor=#60a917;fontColor=#ffffff;strokeColor=#2D7600;" parent="11" vertex="1">
<mxGeometry x="1255" y="124" width="140" height="60" as="geometry" />
</mxCell>
<mxCell id="59" value="remove(failed_pipelines)" style="whiteSpace=wrap;html=1;rounded=0;fillColor=#60a917;fontColor=#ffffff;strokeColor=#2D7600;" parent="11" vertex="1">
<mxGeometry x="1249" y="20" width="150" height="60" as="geometry" />
</mxCell>
<mxCell id="58" value="" style="edgeStyle=orthogonalEdgeStyle;jumpStyle=arc;html=1;strokeColor=#000000;" parent="9" source="55" target="57" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="61" style="edgeStyle=orthogonalEdgeStyle;jumpStyle=arc;html=1;entryX=1;entryY=0.5;entryDx=0;entryDy=0;fillColor=#60a917;strokeColor=#000000;exitX=0.5;exitY=0;exitDx=0;exitDy=0;" parent="9" source="59" target="16" edge="1">
<mxGeometry relative="1" as="geometry">
<Array as="points">
<mxPoint x="1324" y="80" />
</Array>
</mxGeometry>
</mxCell>
<mxCell id="49" style="edgeStyle=orthogonalEdgeStyle;html=1;jumpStyle=arc;entryX=0;entryY=0.5;entryDx=0;entryDy=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;" parent="9" source="26" target="53" edge="1">
<mxGeometry relative="1" as="geometry">
<mxPoint x="655.5263157894738" y="720" as="targetPoint" />
@@ -141,34 +116,47 @@
<mxCell id="43" value="Runner" style="swimlane;html=1;startSize=20;horizontal=0;" parent="9" vertex="1">
<mxGeometry y="770" width="1560" height="220" as="geometry" />
</mxCell>
<mxCell id="56" value="" style="edgeStyle=orthogonalEdgeStyle;jumpStyle=arc;html=1;" parent="43" source="53" target="55" edge="1">
<mxGeometry relative="1" as="geometry" />
<mxCell id="53" value="keep_&lt;br&gt;iterating()?" style="rhombus;whiteSpace=wrap;html=1;rounded=0;fillColor=#60a917;fontColor=#ffffff;strokeColor=#2D7600;" parent="43" vertex="1">
<mxGeometry x="780" y="40" width="80" height="80" as="geometry" />
</mxCell>
<mxCell id="67" value="yes" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" parent="56" vertex="1" connectable="0">
<mxGeometry x="-0.3295" relative="1" as="geometry">
<mxPoint as="offset" />
<mxCell id="armna5WRkvfgfis_-d7w-68" value="start" style="ellipse;whiteSpace=wrap;html=1;" vertex="1" parent="43">
<mxGeometry x="59" y="70" width="120" height="80" as="geometry" />
</mxCell>
<mxCell id="armna5WRkvfgfis_-d7w-69" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=1;entryDx=0;entryDy=0;" edge="1" parent="9" source="armna5WRkvfgfis_-d7w-68" target="armna5WRkvfgfis_-d7w-70">
<mxGeometry relative="1" as="geometry">
<mxPoint x="119" y="120" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="63" style="edgeStyle=orthogonalEdgeStyle;html=1;entryX=0;entryY=0.5;entryDx=0;entryDy=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;" parent="43" source="53" target="62" edge="1">
<mxGeometry relative="1" as="geometry" />
<mxCell id="27" value="" style="edgeStyle=orthogonalEdgeStyle;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;" parent="9" source="25" target="26" edge="1">
<mxGeometry relative="1" as="geometry">
<Array as="points">
<mxPoint x="589" y="615" />
<mxPoint x="589" y="460" />
</Array>
</mxGeometry>
</mxCell>
<mxCell id="63" style="edgeStyle=orthogonalEdgeStyle;html=1;entryX=0.5;entryY=1;entryDx=0;entryDy=0;exitX=0.5;exitY=1;exitDx=0;exitDy=0;" parent="9" source="53" target="66" edge="1">
<mxGeometry relative="1" as="geometry">
<mxPoint x="1035" y="945" as="targetPoint" />
<Array as="points">
<mxPoint x="820" y="945" />
<mxPoint x="1465" y="945" />
</Array>
</mxGeometry>
</mxCell>
<mxCell id="68" value="no" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" parent="63" vertex="1" connectable="0">
<mxGeometry x="0.0848" y="2" relative="1" as="geometry">
<mxPoint as="offset" />
<mxPoint x="-434" y="46" as="offset" />
</mxGeometry>
</mxCell>
<mxCell id="53" value="keep_&lt;br&gt;iterating()?" style="rhombus;whiteSpace=wrap;html=1;rounded=0;fillColor=#60a917;fontColor=#ffffff;strokeColor=#2D7600;" parent="43" vertex="1">
<mxGeometry x="780" y="40" width="80" height="80" as="geometry" />
</mxCell>
<mxCell id="55" value="_loop()" style="whiteSpace=wrap;html=1;rounded=0;fillColor=#60a917;fontColor=#ffffff;strokeColor=#2D7600;" parent="43" vertex="1">
<mxGeometry x="1035" y="50" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="62" value="stop()" style="whiteSpace=wrap;html=1;fillColor=#60a917;strokeColor=#2D7600;fontColor=#ffffff;rounded=0;" parent="43" vertex="1">
<mxGeometry x="1035" y="145" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="64" style="edgeStyle=orthogonalEdgeStyle;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0.5;entryY=1;entryDx=0;entryDy=0;" parent="9" source="62" target="66" edge="1">
<mxCell id="56" value="" style="edgeStyle=orthogonalEdgeStyle;jumpStyle=arc;html=1;" parent="9" source="53" target="57" edge="1">
<mxGeometry relative="1" as="geometry">
<mxPoint x="1509" y="265.00004882812505" as="targetPoint" />
<mxPoint x="1035" y="850" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="67" value="yes" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" parent="56" vertex="1" connectable="0">
<mxGeometry x="-0.3295" relative="1" as="geometry">
<mxPoint x="-183" as="offset" />
</mxGeometry>
</mxCell>
</root>

7 changes: 4 additions & 3 deletions doc/source/development/architecture/index.rst
@@ -93,10 +93,11 @@ The following diagrams illustrate the flow of a single event to make it more com
:file: ../../development/architecture/diagramms/event.drawio.html


Shared ressources within Multiprocessing
========================================
Multiprocessing
===============

This diagram shows what ressources are shared within the multiprocessing processes.
This diagram shows which resources are shared within the multiprocessing processes and how the
processes are started and stopped.

.. raw:: html
:file: ../../development/architecture/diagramms/multiprocessing.drawio.html
4 changes: 2 additions & 2 deletions doc/source/getting_started.rst
@@ -62,13 +62,13 @@ If you have installed it via PyPI or the Github Development release just run:

.. code-block:: bash
logprep $CONFIG
logprep run $CONFIG
If you have installed Logprep via cloning the repository then you should run it via:

.. code-block:: bash
PYTHONPATH="." python3 logprep/run_logprep.py $CONFIG
PYTHONPATH="." python3 logprep/run_logprep.py run $CONFIG
Where :code:`$CONFIG` is the path to a configuration file.
For more information see the :ref:`configuration` section.