We are having issues with workflows where task pods complete successfully (outputs.pb is uploaded with the correct content), but these completions are never registered and the follow-up tasks don't start. The web console keeps showing these tasks as running. The only related log entries I can find are from the FlyteAdmin pod, and they make little sense to me:
Does anyone have an idea? Also, found this:
Replies: 2 comments 1 reply
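It looks like the error message is too large. Try changing the failure backoff configuration. Propeller is trying to recover:
https://github.com/flyteorg/flytepropeller/blob/a2bfb996c77172419ea5d1562ce6af38e6eb1119/pkg/controller/nodes/task/pre_post_execution.go#L108

This can be controlled using line 34 of flyte/kustomize/overlays/eks/flyte/config/propeller/core.yaml (at commit 152aa6a).

The pod could be shown as complete without any errors because the maximum number of failure retries is configured to be 50. Refer to discussion #1876.

If you have a large string with large outputs, you could configure propeller storage to increase the limit, although this is not recommended. It is better to offload the data to a file.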
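To illustrate the "offload using a file" suggestion, here is a minimal sketch in plain Python. It is not Flyte's API: `offload_if_large` and `MAX_INLINE_BYTES` are hypothetical names, and the threshold is an arbitrary placeholder rather than the actual propeller storage limit. The idea is only that a large string is written to disk and the task hands back a path instead of the string itself. In a real flytekit task you would probably return that path as a file-typed output, which is an assumption about your setup.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical threshold for illustration only; the real limit depends on
# how propeller storage is configured in your deployment.
MAX_INLINE_BYTES = 1024 * 1024


def offload_if_large(
    text: str,
    directory: Optional[str] = None,
    max_inline_bytes: int = MAX_INLINE_BYTES,
) -> Tuple[str, str]:
    """Return ("inline", text) for small payloads.

    For payloads larger than max_inline_bytes, write them to a file and
    return ("file", path) so only the path travels as the task output.
    """
    data = text.encode("utf-8")
    if len(data) <= max_inline_bytes:
        return ("inline", text)

    # Write the payload to the given directory, or to a fresh temp directory.
    target_dir = Path(directory) if directory else Path(tempfile.mkdtemp())
    path = target_dir / "large_output.txt"
    path.write_bytes(data)
    return ("file", str(path))
```

A task body could call `offload_if_large(result)` just before returning, so that oversized results never end up inline in the task's outputs.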
Also refer to #1876.