[BUG] Outputs of a subworkflow in a remote launch plan is handled incorrectly #4369
Comments
BTW it is also very confusing having two config items with overlapping purposes:
I think in general having smaller
@honnix this fix is rather confusing given the plethora of configuration options. Do you know if you are using signed URLs? It seems the proto support for this flow is deprecated, but none of the configuration options are labeled as such. The overall API design suggests that fetching output data within admin is "best effort", and if it is not returned then the client should try fetching it itself. This leads me to believe that rather than attempting to return the error value here (in only the correct scenarios), we should attempt to retrieve the data from flytepropeller, if it's missing, when we retrieve the execution. Thoughts? @pvditt please comment as well!
@hamersaw The signed URLs won't help in cases where flytepropeller needs to fetch outputs from flyteadmin for the following task executions, and currently it doesn't use signed URLs for that, if I'm not mistaken.

Regarding retry: yes, if anything is in the critical path, a proper retry sounds like a great idea. In this case, however, a retry would not make any difference, because there was no RPC failure; flytepropeller was not able to tell whether a workflow indeed didn't output anything or flyteadmin was just refusing to fetch due to the max-size config.
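To make the fallback idea being discussed concrete, here is a hedged Go sketch. The type and function names below are simplified, hypothetical stand-ins, not the real flyteidl/flytepropeller API: if admin returns no inlined outputs, try reading them from blob storage via the output URI, and fail loudly if neither source has them.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified stand-ins for the Flyte types involved
// (the real ones live in flyteidl and flytestdlib/storage).
type LiteralMap struct {
	Literals map[string]string
}

type ExecutionData struct {
	FullOutputs *LiteralMap // inlined outputs; nil/empty if admin skipped them
	OutputURI   string      // location of the outputs protobuf in blob storage
}

// extractOutputs sketches the proposed behavior: instead of treating missing
// FullOutputs as "no outputs", fall back to reading from storage, and return
// an explicit error when there is nothing to read at all.
func extractOutputs(data ExecutionData, readFromStore func(uri string) (*LiteralMap, error)) (*LiteralMap, error) {
	if data.FullOutputs != nil && len(data.FullOutputs.Literals) > 0 {
		return data.FullOutputs, nil
	}
	if data.OutputURI != "" {
		// Admin may have skipped inlining because of maxDownloadMBs; try the store.
		return readFromStore(data.OutputURI)
	}
	return nil, errors.New("execution returned no inlined outputs and no output URI")
}

func main() {
	fromStore := func(uri string) (*LiteralMap, error) {
		return &LiteralMap{Literals: map[string]string{"o0": "value"}}, nil
	}
	out, err := extractOutputs(ExecutionData{OutputURI: "s3://bucket/outputs.pb"}, fromStore)
	fmt.Println(out, err)
}
```

This keeps the "best effort" inlining in admin intact while making the client side deterministic: either real outputs are obtained, or the node fails with a clear error rather than an empty literal map.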
@hamersaw I believe the initial API design intention for retries was for the console. This PR removed the usage of UrlBlob on the console. I don't believe there's been an expectation of retries on these calls for a while. I made an issue to clean up UrlBlob. Good call-out on retries from propeller; I'll add that to this PR.
@hamersaw actually, would it make sense to add retries on the propeller side, when we already have system-error retries?
@honnix this issue should be closed by: #4602

There's a follow-up house-cleaning issue to have that GetExecutionData admin endpoint bubble up errors, which I'll be resolving soon; it doesn't block closing this issue. Please feel free to re-open this issue and tag me if this change does not fix it.
Describe the bug
flyte/flytepropeller/pkg/controller/nodes/subworkflow/launchplan/admin.go
Line 283 in 9fe34db
`storage.limits.maxDownloadMBs` is configured in flyteadmin, and flyteadmin does not fail loudly when the outputs size is larger than `storage.limits.maxDownloadMBs`. What's worse is that flytepropeller happily stores the 0-sized protobuf file to be used as the inputs of the following task. Currently the resulting error message is impossible for users to understand and also very hard for Flyte admins to investigate.
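For reference, the limit in question lives under flyteadmin's storage configuration. A minimal sketch follows; the exact nesting can differ between deployments, and the value shown is purely illustrative:

```yaml
storage:
  limits:
    # flyteadmin skips inlining execution outputs larger than this limit,
    # which triggers the silent truncation described above (value illustrative)
    maxDownloadMBs: 2
```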
Expected behavior
flytepropeller should fail loudly to indicate that it could not receive the expected outputs.
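As a minimal sketch of that expected behavior (hypothetical helper names, not the actual flytepropeller code): refuse to persist an empty outputs payload instead of writing a 0-sized protobuf, and surface a message that points at the likely cause.

```go
package main

import (
	"fmt"
)

// persistOutputs is a hypothetical guard illustrating the expected behavior:
// reject an empty payload outright rather than storing a 0-sized protobuf
// that later fails downstream tasks with an inscrutable error.
func persistOutputs(payload []byte, write func([]byte) error) error {
	if len(payload) == 0 {
		return fmt.Errorf("refusing to store empty outputs: upstream execution " +
			"returned no data (possibly truncated by flyteadmin's maxDownloadMBs limit)")
	}
	return write(payload)
}

func main() {
	// With an empty payload, the guard fails loudly instead of writing.
	err := persistOutputs(nil, func([]byte) error { return nil })
	fmt.Println(err)
}
```

Failing at this point localizes the error to the subworkflow fetch, where both users and Flyte admins can still see which execution and which config limit are involved.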
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?