Add Lucene compliant regex filter expression (#675)
* Rebase
* Adding lucine compliance unit test for development
* Adding lucene compliance for filter parsing of a rule.
* Adding logger with deprecation warning for regex_fields
* Add comment and documentation for lucene regex filter annotation
* Quickfix for lucene regex filter
* Adjusting Format
* Adjusting Format 2
* Adjusting Format 3
* Attempting to remove indeces for regex filter string
* Adding notebook for lucene regex filter development
* WIP notebook for lucene regex filter development
* Adding Notebook for lucene regex filter testing.
* Adding Notebook for lucene regex filter testing same results as unit test
* Adding first running version of lucene regex filter
* Improving notebook for lucene conform regex filter.
* Improving notebook for lucene conform regex filter 2.
* Slight improve
* Bug fix in regex notebook.
* Adding Deprecated Warning
* Removing temporary test
* Adding rule tests for lucene compliance
* Black formatting
* Black formatting
* Remove prototypey
* add changelog entry and some prototypey things that actually do nothing yet
* Adding lucine compliance unit test for development
* Adding lucene compliance for filter parsing of a rule.
* Quickfix for lucene regex filter
* Adjusting Format 2
* Adding Deprecated Warning
* Black formatting
* Add documentation
* Delete prototypeclass
* add notebook to documentation

---------

Co-authored-by: FabianMoessner <[email protected]>
Co-authored-by: MoessnerFabian(Group) <[email protected]>
1 parent 499ff55, commit 1a42a12
Showing 7 changed files with 348 additions and 17 deletions.
doc/source/development/notebooks/processor_examples/regex.ipynb
206 changes: 206 additions & 0 deletions
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,206 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Lucene regex filter\n", | ||
"This notebook shows an example of a filter that uses a Lucene-compliant regular expression. \n", | ||
"A concatenator, which merges several fields from an event, serves as the processor for demonstrating the filter. \n", | ||
"\n", | ||
"Until now, every key whose filter value contains a regular expression had to be listed under regex_fields. With the Lucene-compliant syntax, the regular expression is instead enclosed in forward slashes inside the filter itself. " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"document = {\n", | ||
" 'data_stream': {\n", | ||
" 'dataset': 'windows', \n", | ||
" 'namespace': 'devopslab', \n", | ||
" 'type': 'logs'\n", | ||
" }, \n", | ||
" '_op_type': 'create'\n", | ||
" }\n", | ||
"\n", | ||
"expected = {\n", | ||
" 'data_stream': {\n", | ||
" 'dataset': 'windows', \n", | ||
" 'namespace': 'devopslab', \n", | ||
" 'type': 'logs'\n", | ||
" }, \n", | ||
" '_op_type': 'create', \n", | ||
" '_index': 'logs-windows-devopslab'\n", | ||
" }" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Define the processor and helper function" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import sys\n", | ||
"sys.path.insert(0,\"../../../../../\")\n", | ||
"import tempfile\n", | ||
"from copy import deepcopy\n", | ||
"from pathlib import Path\n", | ||
"\n", | ||
"from unittest import mock\n", | ||
"from logprep.factory import Factory\n", | ||
"\n", | ||
"rule_path = Path(tempfile.gettempdir()) / \"concatenator\"\n", | ||
"rule_path.mkdir(exist_ok=True)\n", | ||
"rule_file = rule_path / \"data-stream.yml\"\n", | ||
"\n", | ||
"if rule_file.exists():\n", | ||
" rule_file.unlink()\n", | ||
"\n", | ||
"processor_config = {\n", | ||
" \"myconcatenator\":{ \n", | ||
" \"type\": \"concatenator\",\n", | ||
" \"specific_rules\": [str(rule_path)],\n", | ||
" \"generic_rules\": [\"/dev\"],\n", | ||
" }\n", | ||
" }\n", | ||
"\n", | ||
"def concat_with_rule(rule_yaml):\n", | ||
" mydocument = deepcopy(document)\n", | ||
" if rule_file.exists():\n", | ||
" rule_file.unlink()\n", | ||
" rule_file.write_text(rule_yaml)\n", | ||
" concatenator = Factory.create(processor_config)\n", | ||
" print(f\"before: {mydocument}\")\n", | ||
" concatenator.process(mydocument)\n", | ||
" print(f\"after: {mydocument}\")\n", | ||
" print(mydocument == expected)\n", | ||
" " | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### regex_fields version" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"[Deprecated]: regex_fields are no longer necessary. Use Lucene regex annotation.\n" | ||
] | ||
}, | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"\n", | ||
"\n", | ||
"[Deprecation warning]: regex_fields are no longer necessary. Use lucene regex annotation.\n", | ||
"before: {'data_stream': {'dataset': 'windows', 'namespace': 'devopslab', 'type': 'logs'}, '_op_type': 'create'}\n", | ||
"after: {'data_stream': {'dataset': 'windows', 'namespace': 'devopslab', 'type': 'logs'}, '_op_type': 'create', '_index': 'logs-windows-devopslab'}\n", | ||
"True\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"rule_yaml = \"\"\"---\n", | ||
"filter: 'data_stream.type: \".*lo.*\"' \n", | ||
"regex_fields:\n", | ||
" - \"data_stream.type\"\n", | ||
"concatenator:\n", | ||
" source_fields:\n", | ||
" - data_stream.type\n", | ||
" - data_stream.dataset\n", | ||
" - data_stream.namespace\n", | ||
" target_field: _index\n", | ||
" separator: \"-\"\n", | ||
" overwrite_target: false\n", | ||
" delete_source_fields: false\n", | ||
"\"\"\"\n", | ||
"\n", | ||
"concat_with_rule(rule_yaml)\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Lucene-compliant version without regex_fields" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"before: {'data_stream': {'dataset': 'windows', 'namespace': 'devopslab', 'type': 'logs'}, '_op_type': 'create'}\n", | ||
"after: {'data_stream': {'dataset': 'windows', 'namespace': 'devopslab', 'type': 'logs'}, '_op_type': 'create', '_index': 'logs-windows-devopslab'}\n", | ||
"True\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"rule_yaml = \"\"\"---\n", | ||
"filter: 'data_stream.type: \"/.*lo.*/\"' \n", | ||
"concatenator:\n", | ||
" source_fields:\n", | ||
" - data_stream.type\n", | ||
" - data_stream.dataset\n", | ||
" - data_stream.namespace\n", | ||
" target_field: _index\n", | ||
" separator: \"-\"\n", | ||
" overwrite_target: false\n", | ||
" delete_source_fields: false\n", | ||
"\"\"\"\n", | ||
"concat_with_rule(rule_yaml)\n" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.12.3" | ||
}, | ||
"vscode": { | ||
"interpreter": { | ||
"hash": "586280540a85d3e21edc698fe7b86af2848b9b02644e6c22463da25c40a3f1be" | ||
} | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |
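
The two rule variants in the notebook differ only in how the filter value marks a regular expression: the deprecated form lists the key under `regex_fields`, while the new form wraps the pattern in forward slashes. As a rough, standalone illustration of that idea, and explicitly not logprep's actual filter parser, the sketch below treats a filter value enclosed in `/.../` as a regular expression and anything else as a literal string. The helper names `split_filter`, `get_dotted`, and `matches` are hypothetical. Using `re.fullmatch` is an assumption that mirrors Lucene's anchored matching, and Python's `re` dialect is not identical to Lucene's regex syntax.

```python
import re


def split_filter(expression):
    """Split a simple 'key: "value"' filter into (key, value).

    Hypothetical helper: handles only a single key/value pair.
    """
    key, _, value = expression.partition(":")
    return key.strip(), value.strip().strip('"')


def get_dotted(event, dotted_key):
    """Resolve a dotted key such as 'data_stream.type' in a nested dict."""
    current = event
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def matches(event, expression):
    """Return True if the event satisfies the filter expression."""
    key, value = split_filter(expression)
    actual = get_dotted(event, key)
    if actual is None:
        return False
    # Assumption: a value wrapped in forward slashes is a regex that
    # must match the whole field value.
    if len(value) >= 2 and value.startswith("/") and value.endswith("/"):
        return re.fullmatch(value[1:-1], str(actual)) is not None
    # Otherwise compare the value literally.
    return str(actual) == value


document = {
    "data_stream": {"dataset": "windows", "namespace": "devopslab", "type": "logs"},
    "_op_type": "create",
}

print(matches(document, 'data_stream.type: "/.*lo.*/"'))  # True: regex match
print(matches(document, 'data_stream.type: ".*lo.*"'))    # False: literal comparison
print(matches(document, 'data_stream.type: "logs"'))      # True: literal match
```

In this sketch the unmarked value `.*lo.*` is compared literally, which is why the deprecated rule in the notebook needs the extra `regex_fields` entry to get a regex match.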