Improve handling of failed processing tasks #4727

Open
stroomdev66 opened this issue Jan 27, 2025 · 0 comments
Labels
enhancement A new feature or enhancement to an existing feature

@stroomdev66 Member

Normal data processing, even processing that fails with fatal parser exceptions etc., writes all errors to an error stream but still marks the processing task as complete. This is because the process itself completed normally, even if there were problems with the data or the pipeline. When an error stream is produced it is easy to see what went wrong, fix the problem, and then reprocess as required.

In some exceptional circumstances, processing tasks can be marked as failed because the process does not complete normally. This can be caused by unusual errors such as out-of-memory exceptions and disk/storage failures. Thread interruptions also result in failed tasks; these can be caused by a user manually killing tasks via the Server Tasks screen, or by stopping nodes without stopping processing beforehand.

Once a task is marked as failed there is no easy way to get the data to reprocess. We currently have a manual workaround: find the failed tasks via a dashboard, find the associated stream IDs and processor filters, create new processor filters, and delete the failed tasks manually from the database. This manual process has been tolerated for some time as we don't often get failed tasks.

Ideally we need a new job that periodically picks up failed tasks, creates an error stream for each failed process and then marks the failed task as complete. This would recover failed tasks in such a way that they end up being treated the same as any other stream processing job that encountered a fatal error, i.e. these errored stream processes could then be reprocessed in the same way as any other errored stream.
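As a rough illustration of what such a job might do, the flow could look something like the sketch below. All class and method names here (`FailedTaskRecoveryJob`, `ProcessorTaskService`, `ErrorStreamWriter`, the status strings) are hypothetical placeholders, not the existing Stroom API:

```java
import java.util.List;

/**
 * Sketch only: the interfaces and names below are hypothetical and are not
 * part of the existing Stroom codebase.
 */
public class FailedTaskRecoveryJob {

    /** Hypothetical lookup/update service for processor tasks. */
    interface ProcessorTaskService {
        List<Long> findTaskIdsByStatus(String status);
        Long getSourceMetaId(long taskId);
        void setStatus(long taskId, String status);
    }

    /** Hypothetical writer that records a fatal error against a source stream. */
    interface ErrorStreamWriter {
        void writeFatalError(long sourceMetaId, String message);
    }

    private final ProcessorTaskService taskService;
    private final ErrorStreamWriter errorStreamWriter;

    FailedTaskRecoveryJob(final ProcessorTaskService taskService,
                          final ErrorStreamWriter errorStreamWriter) {
        this.taskService = taskService;
        this.errorStreamWriter = errorStreamWriter;
    }

    /** Intended to run periodically, e.g. from the scheduled job framework. */
    public void exec() {
        for (final long taskId : taskService.findTaskIdsByStatus("FAILED")) {
            final Long sourceMetaId = taskService.getSourceMetaId(taskId);
            if (sourceMetaId != null) {
                // Produce an error stream so the failure becomes visible and
                // reprocessable in the same way as a normal fatal parser error.
                errorStreamWriter.writeFatalError(sourceMetaId,
                        "Processing task " + taskId + " did not complete normally");
            }
            // Mark the task complete so it no longer sits in the failed state.
            taskService.setStatus(taskId, "COMPLETE");
        }
    }
}
```

The key point of the design is that recovery produces a normal error stream rather than a special state, so existing reprocessing tooling applies unchanged.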

As part of the reprocessing we would need to ensure that the code that deletes superseded data from previous processing also deletes any locked streams that may have been created during the previous failed process.
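A sketch of that cleanup extension, again using purely hypothetical names rather than the real Stroom API, might look like this:

```java
import java.util.List;

/**
 * Sketch only; all names are hypothetical. Shows the superseded-data cleanup
 * also removing streams left LOCKED by a previous failed process.
 */
public class SupersededOutputCleaner {

    /** Hypothetical access to the output streams produced by earlier processing. */
    interface OutputStreamService {
        List<Long> findOutputMetaIds(long processorFilterId, long sourceMetaId);
        String getStatus(long metaId);
        void logicallyDelete(long metaId);
    }

    private final OutputStreamService outputService;

    SupersededOutputCleaner(final OutputStreamService outputService) {
        this.outputService = outputService;
    }

    /** Called during reprocessing to remove output from previous attempts. */
    public void deleteSuperseded(final long processorFilterId, final long sourceMetaId) {
        for (final long metaId : outputService.findOutputMetaIds(processorFilterId, sourceMetaId)) {
            final String status = outputService.getStatus(metaId);
            // Remove completed (superseded) output and also any partially
            // written stream still LOCKED from the failed attempt.
            if ("UNLOCKED".equals(status) || "LOCKED".equals(status)) {
                outputService.logicallyDelete(metaId);
            }
        }
    }
}
```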

stroomdev66 added the enhancement label Jan 27, 2025