
On error handler in pipeline #47

Open · skurfuerst wants to merge 4 commits into main

Conversation

skurfuerst
Contributor

This on_error handler is executed if any other step in
the pipeline fails; it can be used, for example, to trigger a
notification on error.

The code can already be reviewed; I'll only add a README example.
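Since the README example is still to come, here is a purely hypothetical sketch of what a pipeline with an on_error handler might look like. The pipelines/tasks/script structure follows prunner's existing config format, but the placement and shape of on_error are assumptions, not the final syntax:

pipelines:
  release:
    tasks:
      build:
        script:
          - make build
      deploy:
        script:
          - ./deploy.sh
        depends_on: [build]
    # Assumed syntax: runs if any other task in the pipeline failed,
    # e.g. to trigger a notification on error.
    on_error:
      script:
        - ./notify-failure.sh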


codecov bot commented Dec 22, 2023

Codecov Report

Attention: 48 lines in your changes are missing coverage. Please review.

Comparison is base (d7fdf27) 71.07% compared to head (f64ad94) 70.42%.

Files Patch % Lines
prunner.go 65.46% 40 Missing and 8 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #47      +/-   ##
==========================================
- Coverage   71.07%   70.42%   -0.65%     
==========================================
  Files          20       20              
  Lines        2199     2330     +131     
==========================================
+ Hits         1563     1641      +78     
- Misses        517      562      +45     
- Partials      119      127       +8     


@Sebobo
Member

Sebobo commented Jan 16, 2024

We need exactly this feature too. How can we help finish this up and get a release out?

@hlubek
Member

hlubek commented Jan 16, 2024

I'm having a look at it right now 👍

Member

@hlubek left a comment


Hi, I think we really need this feature. So thanks for taking the initiative here.

I think we need to change the implementation a bit, though, so that it fits better into prunner's async handling and the on-error task execution does not cause issues (see my comment about the locked mutex).

// We use a detached taskRunner and scheduler to run the onError task: it runs
// synchronously (we are already in an async goroutine here), cannot introduce
// any cycles, and keeps the code simple.
taskRunner := r.createTaskRunner(j)
Member

@hlubek commented Jan 16, 2024


I think we might have a cancellation issue here, since the scheduler / task runner is detached. This could block the whole thing if the on-error task never finishes.

Locking the pipeline runner mutex (which covers all of prunner) for the whole on-error task execution is not good, because it basically blocks the entire prunner process. It is only okay to write-lock the mutex for data structure updates.

I think we need to base this on the *PipelineJob, where we capture the state/context for each running pipeline job. Maybe it's also enough to run the on-error scheduler in a goroutine and use the WaitGroup of *PipelineRunner to fit it into the "normal" waiting behavior.

What about putting the on error task execution in startJob:

	// Run graph asynchronously
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		lastErr := job.sched.Schedule(graph)
		if lastErr != nil {
			// TODO Schedule the on error task (sync)
		}
		r.JobCompleted(job.ID, lastErr)
	}()

(we need to implement some kind of first task error state here though, since the last error is not really helpful)

Ideally we would put this behaviour in taskctl.Scheduler itself, but that is more generic and has no real knowledge of the output store etc. I'm thinking of some kind of ghost task that is already defined but only appears and runs on the first error in the pipeline. The issue here would be getting the failed task's stdout/stderr into the variables for that task 🤔.
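For illustration, here is a minimal, self-contained sketch of the startJob-based approach suggested above: the graph runs in a goroutine tracked by the runner's WaitGroup, and the on-error task is scheduled synchronously inside that goroutine only when the graph fails. All names (pipelineJob, runGraph, runOnErrorTask) are stand-ins, not the actual prunner API.

package main

import (
	"errors"
	"fmt"
	"sync"
)

// Toy stand-in for prunner's *PipelineJob; the name is illustrative only.
type pipelineJob struct {
	id string
}

// runGraph simulates job.sched.Schedule(graph) and pretends the pipeline failed.
func runGraph(j *pipelineJob) error {
	return errors.New("task build failed")
}

// runOnErrorTask stands in for scheduling the on_error task via a detached
// scheduler/taskRunner; here it just prints the failure.
func runOnErrorTask(j *pipelineJob, pipelineErr error) error {
	fmt.Printf("on_error for job %s: %v\n", j.id, pipelineErr)
	return nil
}

func main() {
	var wg sync.WaitGroup
	j := &pipelineJob{id: "42"}

	// Run the graph asynchronously, tracked by the WaitGroup so that a
	// shutdown also waits for a still-running on_error task.
	wg.Add(1)
	go func() {
		defer wg.Done()
		lastErr := runGraph(j)
		if lastErr != nil {
			// Run the on_error task synchronously inside this goroutine,
			// without holding any global runner mutex.
			_ = runOnErrorTask(j, lastErr)
		}
		// In prunner, r.JobCompleted(job.ID, lastErr) would be called here.
	}()

	wg.Wait()
}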

@Sebobo
Member

Sebobo commented Jan 17, 2024

Thanks @skurfuerst and @hlubek for your work on this.

To fully explain our use case: we have a long list of exit codes for the various failures in the pipeline, including a map that translates the error codes for developers and editors in the UI. The onError task should be able to catch that code and, in our case, write it to the database for persistence, so we can collect the error reasons for the whole pipeline history; it should also do some cleanup.
Right now I can only interpret the job logs, which disappear after a while.
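As a purely hypothetical illustration of that kind of mapping (the codes and messages below are made up, not from the actual project), an on_error task could translate the exit code before persisting it:

package main

import "fmt"

// Hypothetical mapping from task exit codes to messages for devs and editors.
var exitCodeMessages = map[int]string{
	2:  "content export failed",
	17: "upstream API not reachable",
}

// translateExitCode returns a human-readable reason that an on_error task
// could write to the database together with the job ID.
func translateExitCode(code int) string {
	if msg, ok := exitCodeMessages[code]; ok {
		return msg
	}
	return fmt.Sprintf("unknown failure (exit code %d)", code)
}

func main() {
	fmt.Println(translateExitCode(17))
	fmt.Println(translateExitCode(99))
}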

@Sebobo
Member

Sebobo commented Jan 31, 2024

Any update on this? (Sorry for nagging)

@hlubek
Member

hlubek commented Jan 31, 2024

@skurfuerst Did you find some time to look at my notes about the change? I think we could basically go with the implementation and improve it later, but I fear that we would pull in more complexity and concepts that might be hard to change afterwards.

@skurfuerst
Contributor Author

@hlubek thanks for your comments :) Sadly I did not find time to address them yet, but I can try to work on this in about two weeks :) In case you can prototype your thoughts, feel free to do so :)

All the best, Sebastian

@Sebobo
Member

Sebobo commented Feb 23, 2024

Ping :D

@Sebobo
Member

Sebobo commented Apr 10, 2024

Pong ;)
