I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
I have searched existing issues and could not find a match for this bug
In our environment, we encountered an issue where the workflow continues running after an error occurred.
As shown in the image, the pod node encountered an error, but the step node shows success.
The workflow continues running and causes an unexpected result.
(on the right is a normal error situation)
After some investigation, we found that when workflow-controller is busy, its informer becomes inconsistent with the cluster.
This causes many 409 warning logs:
Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io "my-another-workflow0vcmln": the object has been modified; please apply your changes to the latest version and try again Conflict
It also disrupts the event processing of workflows, which appears to be the root cause of the issue we encountered.
It is difficult to reproduce. Can anyone help with this issue?
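For readers unfamiliar with where the 409 comes from, here is a minimal sketch of Kubernetes-style optimistic concurrency. It is a toy model, not Argo or client-go code: `FakeAPIServer`, `Conflict`, and the object name `wf` are hypothetical names made up for illustration. It only shows that a write based on a stale cached version is rejected, and that it goes through once the latest version is re-read.

```python
# Toy model of optimistic concurrency on resourceVersion.
# FakeAPIServer and Conflict are hypothetical; this is not Argo or client-go code.


class Conflict(Exception):
    """Raised when an update carries a stale resource version (HTTP 409)."""


class FakeAPIServer:
    def __init__(self):
        self._objects = {}  # name -> (resource_version, phase)

    def create(self, name, phase):
        self._objects[name] = (1, phase)

    def get(self, name):
        return self._objects[name]

    def update(self, name, seen_version, phase):
        current_version, _ = self._objects[name]
        if seen_version != current_version:
            raise Conflict(f"{name}: the object has been modified")
        self._objects[name] = (current_version + 1, phase)
        return current_version + 1


server = FakeAPIServer()
server.create("wf", "Pending")

# The controller's cache holds a copy read at version 1.
cached_version, _ = server.get("wf")

# Another write lands before the cache catches up.
server.update("wf", cached_version, "Running")

# Writing from the stale cached version is rejected.
try:
    server.update("wf", cached_version, "Succeeded")
    conflict_seen = False
except Conflict:
    conflict_seen = True

# The write succeeds only after re-reading the latest version.
fresh_version, _ = server.get("wf")
final_version = server.update("wf", fresh_version, "Succeeded")
```

A 409 on its own is therefore an expected signal that the cached copy was behind; the concerning part of this report is what the controller does with the stale state afterwards.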
timeline:
A: the step in which the error occurred
B: the step after A
09:35:39.062 -> A init Pending
09:35:39.073 -> A create pod
09:35:42.109 -> A update to phase Pending (PodInitializing)
09:35:48.943 -> A update to phase Running
09:36:46.040 -> A update to phase Succeeded
09:36:49.260 -> A update to phase Succeeded
09:36:49.279 -> B init Pending
09:36:49.493 -> B create pod
09:36:52.382 -> B update to phase Pending (PodInitializing)
09:36:59.301 -> A update to phase Pending (PodInitializing)
09:37:02.301 -> A update to phase Running
09:37:05.430 -> A update to phase Running
09:37:08.673 -> A update to phase Failed
09:38:15.793 -> A update to phase Failed
09:38:15.793 -> B update to phase Succeeded
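The anomaly in this timeline is that A is recorded as Succeeded and later goes back to Pending. A small sketch that flags such regressions is shown below. The event tuples are transcribed from the timeline above (pod creation lines are left out because they are not phase changes, and timestamps are written with a dot before the milliseconds). `find_regressions` and the `TERMINAL` set are names chosen for this illustration and are not part of Argo.

```python
# Flag nodes whose recorded phase changes after reaching a terminal phase.
# TERMINAL and find_regressions are illustrative, not Argo APIs.
TERMINAL = {"Succeeded", "Failed", "Error"}

# (timestamp, node, phase), transcribed from the timeline in this report.
events = [
    ("09:35:39.062", "A", "Pending"),
    ("09:35:42.109", "A", "Pending"),
    ("09:35:48.943", "A", "Running"),
    ("09:36:46.040", "A", "Succeeded"),
    ("09:36:49.260", "A", "Succeeded"),
    ("09:36:49.279", "B", "Pending"),
    ("09:36:52.382", "B", "Pending"),
    ("09:36:59.301", "A", "Pending"),
    ("09:37:02.301", "A", "Running"),
    ("09:37:05.430", "A", "Running"),
    ("09:37:08.673", "A", "Failed"),
    ("09:38:15.793", "A", "Failed"),
    ("09:38:15.793", "B", "Succeeded"),
]


def find_regressions(events):
    """Return (timestamp, node, previous_phase, new_phase) for each change
    that happens after a node was already in a terminal phase."""
    last_phase = {}
    regressions = []
    for timestamp, node, phase in events:
        previous = last_phase.get(node)
        if previous in TERMINAL and phase != previous:
            regressions.append((timestamp, node, previous, phase))
        last_phase[node] = phase
    return regressions


regressions = find_regressions(events)
for timestamp, node, previous, phase in regressions:
    print(f"{timestamp}: node {node} went from {previous} back to {phase}")
```

On this data the only flagged event is A at 09:36:59.301, which is after B had already been started on the basis of A's earlier Succeeded state.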
@jingkkkkai Given that we only support the last two minor versions (3.6.x and 3.5.y), could you please confirm if this bug exists on 3.5 or 3.6? Thanks
workflow-controller related logs:
Version(s)
v3.4.8
Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
N/A
Logs from the workflow controller
Logs from your workflow's wait container