add ability to configure error output #668

ekneg54 · 2024-09-16T09:46:10Z

This PR changes the error handling implementation of logprep.
It has two main goals.

no error should prevent further processing

This is handled by raising a CriticalInputError or a CriticalOutputError instead of a FatalInputError or FatalOutputError.
Errors are no longer handled in the pipeline process; all error-causing events are written to the error output.
To achieve this, I had to change the batch_finished_callback mechanic. Every event fetched from the input via get_next is now committed to Kafka using the batch_finished_callback mechanic in pipeline.py. There are no connections between input and output connectors anymore.
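
As a rough illustration of this flow, here is a minimal sketch. It is not the actual logprep code: `FakeInput`, `process_next`, `run` and the plain lists used as output and error sink are hypothetical stand-ins, and only the names `CriticalInputError`, `get_next` and `batch_finished_callback` come from this PR.

```python
class CriticalInputError(Exception):
    """Stand-in exception: the event is bad, but the pipeline keeps running."""

    def __init__(self, message, raw_input):
        super().__init__(message)
        self.raw_input = raw_input


class FakeInput:
    """Hypothetical input connector that rejects events without a message."""

    def __init__(self, events):
        self._events = iter(events)
        self.committed = 0

    def get_next(self):
        event = next(self._events, None)
        if event is not None and "message" not in event:
            raise CriticalInputError("missing message field", event)
        return event

    def batch_finished_callback(self):
        # A Kafka input would commit its offsets here (assumption).
        self.committed += 1


def process_next(input_connector, outputs, errors):
    """Handle one event; return False once the input is exhausted."""
    try:
        event = input_connector.get_next()
    except CriticalInputError as error:
        # Do not stop the pipeline: hand the failed event to the error sink.
        errors.append({"error": str(error), "original": error.raw_input})
        input_connector.batch_finished_callback()
        return True
    if event is None:
        return False
    outputs.append(event)
    input_connector.batch_finished_callback()
    return True


def run(input_connector):
    """Process events until the input is empty and return both sinks."""
    outputs, errors = [], []
    while process_next(input_connector, outputs, errors):
        pass
    return outputs, errors
```

In this sketch the callback is invoked once per fetched event, whether it ended up in the output or in the error sink, so the input never depends on an output connector to acknowledge it.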

no event should get lost

To put it simply: this PR has to ensure that every event goes into the output, the error output, or gets logged to the console as a last resort.

Every error-raising event is serialized together with the error it raised into an error event and put into a multiprocessing.Queue (ThrottlingQueue). In the main thread these events are handled by a configured error output connector, which can be any output connector implemented in logprep.
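
The idea can be sketched roughly as follows. This is an illustration under assumptions, not the implementation in this PR: `to_error_event`, `ListOutput` and `forward_errors` are hypothetical names, and the field names of the error event are made up. The queue argument can be any object with a `get(timeout=...)` method that raises `queue.Empty`, which is the interface `multiprocessing.Queue` provides.

```python
import json
import logging
import queue

logger = logging.getLogger("error_output_sketch")


def to_error_event(event, error):
    """Serialize the failed event together with the error it raised."""
    return {
        "error": f"{type(error).__name__}: {error}",
        "original": json.dumps(event),
    }


class ListOutput:
    """Hypothetical output connector that only collects documents."""

    def __init__(self):
        self.stored = []

    def store(self, document):
        self.stored.append(document)


def forward_errors(error_queue, error_output, timeout=1.0):
    """Move error events into the error output until the queue stays empty.

    Returns the number of events taken from the queue.
    """
    handled = 0
    while True:
        try:
            error_event = error_queue.get(timeout=timeout)
        except queue.Empty:
            return handled
        try:
            error_output.store(error_event)
        except Exception as error:  # last resort: log to console
            logger.error("could not store %s: %s", error_event, error)
        handled += 1
```

A worker process would call something like `error_queue.put(to_error_event(event, error))`, while the listener side calls `forward_errors` with the configured error output.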

To achieve these goals I had to reimplement the opensearch output connector, which simplified things a lot.

Please have a look at my changes and let's discuss. Feel free to give feedback and to ask questions.
It is a very big PR. Sorry for that, but the cut was a fundamental one.

ekneg54 · 2024-10-08T08:36:52Z

remove store_failed

add a componentqueuelistener to handle errors from queue into output connector

fix pipeline_manager tests

add tests for componentqueuelistener add more tests

remove double property WIP

bump test coverage for pipeline_manager to 100 percent fix most acceptance tests by adding error_output add more tests start fixing pipeline.py tests

add basic tests for pipeline_result add tests for pipeline_result

dtrai2

Some more considerations:

- please check the rst configuration references for the documentation; currently the error_output is missing
- tests/unit/charts/test_output_config.py has an error_index inside the opensearch configuration, which should be removed
- consider closing the component queue listener queue before draining it (this prevents other processes from adding new events)
- please check the test coverage

CHANGELOG.md

charts/logprep/values.yaml

examples/exampledata/config/pipeline.yml

logprep/abc/component.py

tests/unit/framework/test_pipeline_manager.py

tests/unit/util/test_configuration.py

logprep/processor/base/exceptions.py

This commit refactors the ConfluentKafkaInput class to store offsets for the last message referenced by `_last_valid_record`. Previously, offsets were stored for each Kafka partition in `_last_valid_records`; now only the last valid record is stored. This change improves the efficiency of offset storage and reduces memory usage.

Code changes:
- Modified `ConfluentKafkaInput` class in `logprep/connector/confluent_kafka/input.py`
- Removed `_last_valid_records` dictionary and replaced it with `_last_valid_record` variable
- Updated `batch_finished_callback` method to store offsets for the last valid record
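
A minimal sketch of the described change, under assumptions: `FakeConsumer` and `KafkaInputSketch` are hypothetical stand-ins for the real consumer and for `ConfluentKafkaInput`, and `on_record` is an invented helper. Only `_last_valid_record` and `batch_finished_callback` are names taken from the commit message.

```python
class FakeConsumer:
    """Stand-in for a Kafka consumer; remembers what it was asked to store."""

    def __init__(self):
        self.stored = []

    def store_offsets(self, message):
        self.stored.append(message)


class KafkaInputSketch:
    """Keeps a single last valid record instead of one record per partition."""

    def __init__(self, consumer):
        self._consumer = consumer
        self._last_valid_record = None

    def on_record(self, record):
        # Overwrite instead of filling a per-partition dictionary.
        self._last_valid_record = record

    def batch_finished_callback(self):
        # Store the offset of the last valid record, if there is one.
        if self._last_valid_record is not None:
            self._consumer.store_offsets(message=self._last_valid_record)
```

Because the callback now runs for every fetched event, a single reference to the newest record is enough in this sketch.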

ekneg54 added the enhancement New feature or request label Sep 16, 2024

ekneg54 self-assigned this Sep 16, 2024

ekneg54 force-pushed the dev-implement-error-output branch 3 times, most recently from ae1d775 to 757ef5a September 20, 2024 12:42

ekneg54 force-pushed the dev-implement-error-output branch from 68d2cdb to 69446fc September 26, 2024 17:33

This was referenced Sep 26, 2024

add minor improvements #676

Merged

Fix exporter restart #677

Merged

ekneg54 force-pushed the dev-implement-error-output branch 2 times, most recently from 1ff01f5 to 2f1aab3 October 2, 2024 12:36

ekneg54 marked this pull request as ready for review October 2, 2024 13:27

ekneg54 requested review from ppcad and clumsy9 October 2, 2024 13:40

ekneg54 force-pushed the dev-implement-error-output branch 5 times, most recently from 1238998 to 895c4cd October 7, 2024 11:06

ekneg54 marked this pull request as draft October 8, 2024 08:40

ekneg54 added 10 commits October 9, 2024 11:40

add ability to configure error output

0b62cff

remove store_failed and error_topic

43dd315

remove store_failed

add error queue in pipeline_manager

d4504be

add a componentqueuelistener to handle errors from queue into output connector

refactor pipeline.py

28019da

fix pipeline_manager tests

add tests for ComponentQueueListener

1f5317c

add tests for componentqueuelistener add more tests

add wait for error output health

0f177fa

remove double log message

102d246

remove double property WIP

add test for infinite restart

638f256

bump test coverage for pipeline_manager to 100 percent fix most acceptance tests by adding error_output add more tests start fixing pipeline.py tests

adjust examples

915bf6e

add more result documentation

ae0574e

add basic tests for pipeline_result add tests for pipeline_result

dtrai2 requested changes Oct 23, 2024

dtrai2 and others added 25 commits October 24, 2024 10:59

Refactor test names and improve error logging

a4a1b23

- Renamed multiple test functions for clarity and consistency.
- Updated logging messages to provide better context for errors.
- Improved documentation links and descriptions in YAML and Markdown files.
- Fix method signatures

fix black

d42b3f0

refactor and clean up OutputQueueListener

6acb311

- Consolidate OutputQueueListener to use multiprocessing exclusively.
- Remove threading implementation and related configurations.
- Update tests and documentation to reflect these changes.

rename get_component_instance to get_output_instance

3ea7916

- Updated method name for improved clarity and consistency.
- Adjusted related tests and function calls accordingly.
- Enhanced documentation within the new method.

remove match cases for critical output error handling

74ff842

- Removed redundant case clauses for `CriticalOutputError` handling.
- Updated unit tests to cover new error handling logic.

remove unnecessary loop in test

ae1a130

remove unused `self.rule = rule` assignment

342156b

update configuration doc with new parameters

d79ca2f

remove error_index key from opensearch_config dict in charts/test_output_config.py

872bfef

add timeout to error output test

6e15030

- Ensure test does not hang if error output file is not created.
- Timeout set to 10 seconds to prevent indefinite waiting periods.
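
A bounded wait of this kind could look like the sketch below. `wait_for_file` and its parameters are hypothetical and not the helper used in the test suite; only the 10 second timeout comes from the commit message.

```python
import time
from pathlib import Path


def wait_for_file(path, timeout=10.0, poll_interval=0.1):
    """Return True as soon as `path` exists, False once `timeout` seconds passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if Path(path).exists():
            return True
        time.sleep(poll_interval)
    # One final check so a file created during the last sleep is not missed.
    return Path(path).exists()
```

A test can then assert on the return value and fail quickly instead of blocking the CI run.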

extend error output config test

68d7850

- Ensure volume mounts do not include error-output-config
- Check command string does not reference error-output-config.yaml

add assertion to verify event _index in message backlog event

af71620

add *args to LogprepException

0e945c0

- InvalidConfigurationError receives an unspecified amount of arguments that couldn't be successfully forwarded to the LogprepException
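
The problem and the fix can be sketched like this. The class bodies are assumptions for illustration; only the class names and the added `*args` come from this commit.

```python
class LogprepException(Exception):
    """Base exception that accepts a message plus any extra arguments."""

    def __init__(self, message: str, *args) -> None:
        self.message = message
        # Forward everything so the extra arguments end up in `self.args`.
        super().__init__(message, *args)


class InvalidConfigurationError(LogprepException):
    """May be raised with an unspecified number of arguments."""
```

Without `*args` in the base class signature, a call such as `InvalidConfigurationError("bad value", "pipeline.yml", 3)` would fail with a `TypeError` before the actual error could be reported.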

add error handling acceptance test

8504424

- Added a new test to ensure logging of errors when error output itself encounters an error.

prevent writing 1 to error output

59508ed

- ignore the 1 that is added to the error_queue for process synchronization reasons
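
A minimal sketch of such a filter, assuming a queue with the standard `get_nowait` interface. `SYNC_MARKER` and this `drain_queue` function are hypothetical and not the code from the commit; a plain list stands in for the error output.

```python
import queue

# Value that is put on the queue only to synchronize processes (assumption).
SYNC_MARKER = 1


def drain_queue(error_queue, error_output):
    """Move all queued error events to the output, skipping the sync marker."""
    while True:
        try:
            item = error_queue.get_nowait()
        except queue.Empty:
            break
        if item == SYNC_MARKER:
            # Not an error event, so it must never reach the error output.
            continue
        error_output.append(item)
```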

update error output check in acceptance test

8fda657

- adjust `wait_for_output` to exclude specific forbidden outputs
- add comment to clarify purpose of exclusion

remove redundant error log statement as it can't be reached

de06bd6

- and as it can't be reached it also can't be tested

add tests pipeline manager

355fcca

- Introduced tests for `listen` and `drain_queue` methods.
- Verified logging of unexpected exceptions during queue processing.
- Ensured specific items and sentinel values are ignored during queue operations.
- Increases test coverage

add test for reading out error from opensearch bulk call

d69495f

simplify test for ci pipeline

183fe1a

- test is working locally but runs forever in ci pipeline

change loglevel for bypassing rule tree message

29c0611

remove stuck test

d210506

fix last valid record handling in confluent kafka input tests

99b393b

add test to check if error output healthcheck is added to exporter

4db7e73

dtrai2 approved these changes Oct 29, 2024

ekneg54 marked this pull request as ready for review October 29, 2024 13:17

ekneg54 merged commit b593be6 into main Oct 29, 2024
13 checks passed

ekneg54 deleted the dev-implement-error-output branch October 29, 2024 13:21
