Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: move taskresult reconcile earlier to avoid workflow stuck. Fixes: #13085 #13859

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

shuangkun
Copy link
Member

@shuangkun shuangkun commented Nov 3, 2024

Fixes #13850

Motivation

Modifications

The root cause is that taskresult has not been reconceiled. So move taskresult reconcile earlier to avoid workflow stuck.

time="2024-11-03T17:44:15.553Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=1753330 namespace=argo workflow=workflow-template-hello-world-gfbjh
time="2024-11-03T17:44:15.554Z" level=debug msg="Task results completion status: map[workflow-template-hello-world-gfbjh-2798872602:false]" namespace=argo workflow=workflow-template-hello-world-gfbjh
time="2024-11-03T17:44:15.554Z" level=debug msg="taskresults of workflow are incomplete or still have daemon nodes, so can't mark workflow completed" fromPhase=Running namespace=argo toPhase=Error workflow=workflow-template-hello-world-gfbjh
time="2024-11-03T17:44:15.554Z" level=error msg="Unable to set ExecWorkflow" error="WorkflowSpec may not change during execution when the controller is set `templateReferencing: Secure`" namespace=argo workflow=workflow-template-hello-world-gfbjh

Verification

The scenario in the issue is verified and the workflow is set to error.

@shuangkun
Copy link
Member Author

/retest

@isubasinghe
Copy link
Member

I am a bit worried we are just moving this around, this was an earlier change that moved the calling of the taskresult reconcile function to where it is now.

@shuangkun
Copy link
Member Author

I am a bit worried we are just moving this around, this was an earlier change that moved the calling of the taskresult reconcile function to where it is now.

Thank you for your reply. After testing, if there is a problem between processing logs and taskResultReconciliation execution (for example, the template in the templateReferencing: Secure scenario is modified), the workflow will be stuck because the taskresult of the created pod will never be reconciled and returned.

@isubasinghe
Copy link
Member

Yeah that makes sense, thanks for taking a look.

@sstarcher
Copy link

@shuangkun thanks for your work. @isubasinghe anything we can do to move this along?

@isubasinghe
Copy link
Member

@sstarcher we cannot move the taskresult reconcile to earlier, it will result in the workflow getting stuck. So we need to somehow address this before we can get this merged.

@shuangkun
Copy link
Member Author

Theoretically, after entering Processing, the earlier the taskresult is processed, the less likely it is to get stuck.

@shuangkun shuangkun added the area/controller Controller issues, panics label Dec 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/controller Controller issues, panics
Projects
None yet
Development

Successfully merging this pull request may close these issues.

templateReferencing: Secure causing workflows to become stuck when workflow templates change
3 participants