Normal data processing, even processing that fails with fatal parser exceptions etc., writes all errors to an error log but still marks the processing task as complete. This is because the process itself completed normally even if there are some problems with the data or the pipeline. When an error stream is produced it is easy to see what went wrong, fix the problem and then reprocess as required.
In some exceptional circumstances, processing tasks can be marked failed because the process does not complete normally. This can be caused by unusual errors such as out of memory exceptions and disk/storage failure. Thread interruptions also result in failed tasks. These can be caused by manual user intervention killing tasks via the Server Tasks screen, or by stopping nodes without stopping processing beforehand.
Once a task is marked as failed there is no easy way to get the data to reprocess. We currently have a manual workaround where we find failed tasks via a dashboard, find the associated stream ids and processor filters, then create new processor filters and delete the failed tasks manually from the database. This manual process has been tolerated for some time as we don't often get failed tasks.
We ideally need a new job that occasionally picks up failed tasks, creates an error stream for the failed process and then marks the failed task as complete. This would recover failed tasks in such a way that they would end up being treated the same way as any other stream processing job that encountered a fatal error, i.e. these errored stream processes could then be reprocessed the same way as any other errored stream.
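The proposed recovery job could be sketched roughly as follows. This is a minimal illustration only: the names `FailedTaskRecoveryJob`, `ProcessorTask`, `TaskStatus` and the `errorStream` field are all hypothetical and do not reflect Stroom's actual API; the real job would query the task store and write a proper error stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed recovery job. All names here are
// illustrative, not Stroom's real classes.
public class FailedTaskRecoveryJob {

    enum TaskStatus { COMPLETE, FAILED }

    static class ProcessorTask {
        final long streamId;
        TaskStatus status;
        String errorStream; // null until recovery writes one

        ProcessorTask(long streamId, TaskStatus status) {
            this.streamId = streamId;
            this.status = status;
        }
    }

    /**
     * Scan for failed tasks, record an error stream for each and mark the
     * task complete, so it can be reprocessed like any other errored stream.
     */
    static List<ProcessorTask> recover(List<ProcessorTask> tasks) {
        List<ProcessorTask> recovered = new ArrayList<>();
        for (ProcessorTask task : tasks) {
            if (task.status == TaskStatus.FAILED) {
                // Record the abnormal termination so the failure is visible
                // and the stream becomes reprocessable in the normal way.
                task.errorStream = "Task for stream " + task.streamId
                        + " did not complete normally (e.g. OOM, storage"
                        + " failure, thread interruption)";
                task.status = TaskStatus.COMPLETE;
                recovered.add(task);
            }
        }
        return recovered;
    }

    public static void main(String[] args) {
        List<ProcessorTask> tasks = List.of(
                new ProcessorTask(101, TaskStatus.COMPLETE),
                new ProcessorTask(102, TaskStatus.FAILED));
        System.out.println("recovered=" + recover(tasks).size());
    }
}
```

Run periodically (like other Stroom jobs), this would turn an abnormal termination into an ordinary error stream rather than a dead end in the database.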
As part of the reprocessing we would need to ensure that the code that deletes superseded data from previous processing also deletes any locked streams that may have been created during the previous failed process.
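The cleanup extension above could look something like this. Again a hedged sketch: `SupersededDataCleanup`, `Stream` and `State` are made-up names for illustration; the point is only that the deletion filter selects locked leftovers from the failed run as well as superseded output.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of superseded-data cleanup extended to cover locked
// streams left behind by a failed run. Names are illustrative only.
public class SupersededDataCleanup {

    enum State { UNLOCKED, LOCKED }

    record Stream(long id, State state, boolean superseded) { }

    /**
     * Select streams to delete before reprocessing: superseded output from
     * the previous run, plus any stream still locked by the failed process.
     */
    static List<Stream> selectForDeletion(List<Stream> streams) {
        return streams.stream()
                .filter(s -> s.superseded() || s.state() == State.LOCKED)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Stream> streams = List.of(
                new Stream(1, State.UNLOCKED, true),   // superseded output
                new Stream(2, State.LOCKED, false),    // left locked by failed run
                new Stream(3, State.UNLOCKED, false)); // current good output
        System.out.println("delete=" + selectForDeletion(streams).size());
    }
}
```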