Improve handling of failed processing tasks #4727

Open
stroomdev66 opened this issue Jan 27, 2025 · 0 comments
Labels
enhancement A new feature or enhancement to an existing feature

@stroomdev66 Member

Normal data processing, even processing that fails with fatal parser exceptions etc., writes all errors to an error stream but still marks the processing task as complete. This is because the process itself completed normally, even if there were problems with the data or the pipeline. When an error stream is produced it is easy to see what went wrong, fix the problem, and then reprocess as required.

In some exceptional circumstances, processing tasks can be marked as failed because the process does not complete normally. This can be caused by unusual errors such as out-of-memory exceptions and disk/storage failures. Thread interruptions also result in failed tasks; these can be caused by a user manually killing tasks via the Server Tasks screen, or by stopping nodes without stopping processing beforehand.

Once a task is marked as failed there is no easy way to get the data to reprocess. We currently have a manual workaround: find the failed tasks via a dashboard, find the associated stream IDs and processor filters, create new processor filters, and delete the failed tasks manually from the database. This manual process has been tolerated for some time as we don't often get failed tasks.

Ideally we need a new job that periodically picks up failed tasks, creates an error stream for each failed process and then marks the failed task as complete. This would recover failed tasks in such a way that they end up being treated the same as any other stream processing job that encountered a fatal error, i.e. these errored stream processes could then be reprocessed in the same way as any other errored stream.
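As a rough illustration of what such a job might do, the flow could look something like the sketch below. All class and method names here (`FailedTaskRecoveryJob`, `ProcessorTaskService`, `ErrorStreamWriter`, the status strings) are hypothetical placeholders, not the existing Stroom API:

```java
import java.util.List;

/**
 * Sketch only: the interfaces and names below are hypothetical and are not
 * part of the existing Stroom codebase.
 */
public class FailedTaskRecoveryJob {

    /** Hypothetical lookup/update service for processor tasks. */
    interface ProcessorTaskService {
        List<Long> findTaskIdsByStatus(String status);
        Long getSourceMetaId(long taskId);
        void setStatus(long taskId, String status);
    }

    /** Hypothetical writer that records a fatal error against a source stream. */
    interface ErrorStreamWriter {
        void writeFatalError(long sourceMetaId, String message);
    }

    private final ProcessorTaskService taskService;
    private final ErrorStreamWriter errorStreamWriter;

    FailedTaskRecoveryJob(final ProcessorTaskService taskService,
                          final ErrorStreamWriter errorStreamWriter) {
        this.taskService = taskService;
        this.errorStreamWriter = errorStreamWriter;
    }

    /** Intended to run periodically, e.g. from the scheduled job framework. */
    public void exec() {
        for (final long taskId : taskService.findTaskIdsByStatus("FAILED")) {
            final Long sourceMetaId = taskService.getSourceMetaId(taskId);
            if (sourceMetaId != null) {
                // Produce an error stream so the failure becomes visible and
                // reprocessable in the same way as a normal fatal parser error.
                errorStreamWriter.writeFatalError(sourceMetaId,
                        "Processing task " + taskId + " did not complete normally");
            }
            // Mark the task complete so it no longer sits in the failed state.
            taskService.setStatus(taskId, "COMPLETE");
        }
    }
}
```

The key point of the design is that recovery produces a normal error stream rather than a special state, so existing reprocessing tooling applies unchanged.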

As part of the reprocessing we would need to ensure that the code that deletes superseded data from previous processing also deletes any locked streams that may have been created during the previous failed process.
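A sketch of that cleanup extension, again using purely hypothetical names rather than the real Stroom API, might look like this:

```java
import java.util.List;

/**
 * Sketch only; all names are hypothetical. Shows the superseded-data cleanup
 * also removing streams left LOCKED by a previous failed process.
 */
public class SupersededOutputCleaner {

    /** Hypothetical access to the output streams produced by earlier processing. */
    interface OutputStreamService {
        List<Long> findOutputMetaIds(long processorFilterId, long sourceMetaId);
        String getStatus(long metaId);
        void logicallyDelete(long metaId);
    }

    private final OutputStreamService outputService;

    SupersededOutputCleaner(final OutputStreamService outputService) {
        this.outputService = outputService;
    }

    /** Called during reprocessing to remove output from previous attempts. */
    public void deleteSuperseded(final long processorFilterId, final long sourceMetaId) {
        for (final long metaId : outputService.findOutputMetaIds(processorFilterId, sourceMetaId)) {
            final String status = outputService.getStatus(metaId);
            // Remove completed (superseded) output and also any partially
            // written stream still LOCKED from the failed attempt.
            if ("UNLOCKED".equals(status) || "LOCKED".equals(status)) {
                outputService.logicallyDelete(metaId);
            }
        }
    }
}
```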

stroomdev66 added the enhancement label Jan 27, 2025