[flytepropeller] Better handling for task aborts #5566

pocheung1

Tracking issue

https://flyte-org.slack.com/archives/C06SYN9QJ5N/p1721238762986439

Why are the changes needed?

When a Flyte agent sends the aborted status of a task execution back to Flyte Propeller, it is recorded as a task execution failure instead:

    case flyteIdl.TaskExecution_ABORTED:
        return core.PhaseInfoFailure(pluginErrors.TaskFailedWithError, "failed to run the job with aborted phase.\n"+resource.Message, taskInfo), nil

The Flyte UI ends up showing both the task execution and workflow execution as FAILED instead of ABORTED. This does not seem to align with user expectations.

What changes were proposed in this pull request?

This draft PR attempts to address the issue by improving handling of aborted task executions, and marking the task execution and workflow execution as ABORTED.

We would like to get some feedback on how aborted task executions should be handled and to see if this PR is a step in the right direction.
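
As a rough illustration of the intended behaviour (not the actual diff in this PR), the sketch below maps an agent-reported state to a recorded phase and keeps aborts distinct from failures. Every type and name in it (`AgentState`, `Phase`, `PhaseInfo`, `MapAgentState`) is a hypothetical stand-in invented for this example; none of them are the real flyteidl or pluginmachinery identifiers.

```go
package main

import "fmt"

// AgentState is a hypothetical stand-in for the state an agent reports.
type AgentState int

const (
	AgentRunning AgentState = iota
	AgentSucceeded
	AgentFailed
	AgentAborted
)

// Phase is a hypothetical stand-in for the phase recorded for a task execution.
type Phase int

const (
	PhaseRunning Phase = iota
	PhaseSuccess
	PhaseFailure
	PhaseAborted
)

// PhaseInfo pairs a phase with a human-readable reason.
type PhaseInfo struct {
	Phase  Phase
	Reason string
}

// MapAgentState converts an agent-reported state into a recorded phase.
// The point of the sketch: an aborted state gets its own phase instead of
// being folded into the failure path.
func MapAgentState(state AgentState, message string) (PhaseInfo, error) {
	switch state {
	case AgentRunning:
		return PhaseInfo{Phase: PhaseRunning}, nil
	case AgentSucceeded:
		return PhaseInfo{Phase: PhaseSuccess}, nil
	case AgentFailed:
		return PhaseInfo{Phase: PhaseFailure, Reason: message}, nil
	case AgentAborted:
		return PhaseInfo{Phase: PhaseAborted, Reason: "task aborted: " + message}, nil
	default:
		return PhaseInfo{}, fmt.Errorf("unknown agent state %d", state)
	}
}
```

With this shape, a caller that currently treats every non-success terminal state as FAILED could check for `PhaseAborted` and propagate it to the workflow level. Whether the workflow should also end as ABORTED is exactly the open question raised above.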

How was this patch tested?

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

welcome bot commented Jul 17, 2024

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

@pocheung1 force-pushed the better-handling-for-task-aborts branch from 44eba72 to e2bcf69 on July 17, 2024 20:44

codecov bot commented Jul 17, 2024

Codecov Report

Attention: Patch coverage is 32.85714% with 47 lines in your changes missing coverage. Please review.

Project coverage is 60.12%. Comparing base (bba8c11) to head (e2bcf69).
Report is 233 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| flytepropeller/pkg/controller/nodes/executor.go | 51.61% | 10 Missing and 5 partials ⚠️ |
| flytepropeller/pkg/controller/workflow/executor.go | 0.00% | 7 Missing and 2 partials ⚠️ |
| ...lytepropeller/pkg/controller/nodes/task/handler.go | 0.00% | 4 Missing and 1 partial ⚠️ |
| ...lyteplugins/go/tasks/pluginmachinery/core/phase.go | 0.00% | 2 Missing ⚠️ |
| .../go/tasks/pluginmachinery/internal/webapi/cache.go | 0.00% | 2 Missing ⚠️ |
| ...propeller/pkg/apis/flyteworkflow/v1alpha1/iface.go | 0.00% | 2 Missing ⚠️ |
| ...ler/pkg/apis/flyteworkflow/v1alpha1/node_status.go | 33.33% | 1 Missing and 1 partial ⚠️ |
| ...tepropeller/pkg/controller/nodes/branch/handler.go | 0.00% | 1 Missing and 1 partial ⚠️ |
| ...er/pkg/controller/nodes/handler/transition_info.go | 33.33% | 2 Missing ⚠️ |
| ...er/pkg/controller/nodes/subworkflow/subworkflow.go | 0.00% | 2 Missing ⚠️ |

... and 3 more

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5566      +/-   ##
==========================================
- Coverage   60.99%   60.12%   -0.87%     
==========================================
  Files         793      646     -147     
  Lines       51325    46020    -5305     
==========================================
- Hits        31305    27669    -3636     
+ Misses      17136    15738    -1398     
+ Partials     2884     2613     -271     

| Flag | Coverage Δ |
|---|---|
| unittests-datacatalog | 69.31% <ø> (ø) |
| unittests-flyteadmin | 58.73% <ø> (+0.01%) ⬆️ |
| unittests-flytecopilot | 17.79% <ø> (ø) |
| unittests-flytectl | ? |
| unittests-flyteidl | 79.06% <ø> (+0.01%) ⬆️ |
| unittests-flyteplugins | 61.82% <0.00%> (+<0.01%) ⬆️ |
| unittests-flytepropeller | 57.24% <35.38%> (-0.08%) ⬇️ |
| unittests-flytestdlib | 65.68% <ø> (-0.15%) ⬇️ |

Flags with carried forward coverage won't be shown.

@ddl-ebrown
Contributor

It looks like this one probably isn't going to gain any traction with the project.

We (at Domino) would still like to see this scenario handled better within Flyte, but we don't have the resources to contribute anything further at the moment, so I'm going to close this one out.

Thanks for trying to figure out an approach @pocheung1 !

@ddl-ebrown closed this on Aug 15, 2024