templateReferencing: Secure causing workflows to become stuck when workflow templates change #13850

Open
3 of 4 tasks
coreyhinkle opened this issue Nov 1, 2024 · 1 comment · May be fixed by #13859
@coreyhinkle

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened? What did you expect to happen?

When a workflow template is changed while a workflow is running with templateReferencing: Secure set, I expect the workflow to fail. Instead, if the workflow is on its last step when the template is edited, it stays stuck in the Running phase indefinitely after the script completes.

I was able to reproduce this with the workflow template below: submit it, wait for it to reach the sleep, and then add echo "test" after the sleep in the template.
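For readers unfamiliar with the restriction, the check that trips here can be pictured as a comparison between the spec captured when the workflow started and the template as it exists now. The following is only a minimal sketch of that idea, not the controller's actual code: `ScriptSpec` and `specChanged` are hypothetical names, and the use of `reflect.DeepEqual` is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// ScriptSpec is a hypothetical, heavily reduced stand-in for the
// script section of a template. It is not an Argo type.
type ScriptSpec struct {
	Image   string
	Command []string
	Source  string
}

// specChanged reports whether the spec stored at submission time
// differs from the spec currently found in the template.
// Assumption: a plain deep comparison is enough for this illustration.
func specChanged(stored, current ScriptSpec) bool {
	return !reflect.DeepEqual(stored, current)
}

func main() {
	stored := ScriptSpec{
		Image:   "busybox",
		Command: []string{"sh"},
		Source:  "echo \"about to sleep\"\nsleep 60\necho \"done sleeping\"\n",
	}

	// Same edit as in the reproduction: one extra echo after the sleep.
	edited := stored
	edited.Source = "echo \"about to sleep\"\nsleep 60\necho \"test\"\necho \"done sleeping\"\n"

	fmt.Println(specChanged(stored, stored)) // unchanged template
	fmt.Println(specChanged(stored, edited)) // edited template
}
```

In this sketch even a one-line change to the script source counts as a spec change, which is the situation the reproduction creates while the pod is still sleeping.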

Version(s)

v3.5.10, v3.5.12, c702ab72433eb8cd26db07f0025dceba91e5e994c8071b0df89b27b63a73f0d2

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: test
  namespace: argo
spec:
  entrypoint: bash-script-example
  templates:
  - name: bash-script-example
    steps:
    - - name: print
        template: print-message
  - name: print-message
    script:
      image: busybox
      command: ["sh"]
      source: |
        echo "about to sleep"
        sleep 60
        echo "done sleeping"

Logs from the workflow controller

time="2024-11-01T15:21:02.866Z" level=info msg="Processing workflow" Phase= ResourceVersion=2198451 namespace=argo workflow=test-f9gk9
time="2024-11-01T15:21:02.880Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=0 workflow=test-f9gk9
time="2024-11-01T15:21:02.880Z" level=info msg="Updated phase  -> Running" namespace=argo workflow=test-f9gk9
time="2024-11-01T15:21:02.880Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=argo workflow=test-f9gk9
time="2024-11-01T15:21:02.880Z" level=info msg="was unable to obtain node for , letting display name to be nodeName" namespace=argo workflow=test-f9gk9
time="2024-11-01T15:21:02.880Z" level=info msg="Steps node test-f9gk9 initialized Running" namespace=argo workflow=test-f9gk9
time="2024-11-01T15:21:02.880Z" level=info msg="StepGroup node test-f9gk9-2828767984 initialized Running" namespace=argo workflow=test-f9gk9
time="2024-11-01T15:21:02.880Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=argo workflow=test-f9gk9
time="2024-11-01T15:21:02.880Z" level=info msg="Pod node test-f9gk9-1379899069 initialized Pending" namespace=argo workflow=test-f9gk9
time="2024-11-01T15:21:02.905Z" level=info msg="Created pod: test-f9gk9[0].print (test-f9gk9-print-message-1379899069)" namespace=argo workflow=test-f9gk9
time="2024-11-01T15:21:02.905Z" level=info msg="Workflow step group node test-f9gk9-2828767984 not yet completed" namespace=argo workflow=test-f9gk9
time="2024-11-01T15:21:02.905Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=test-f9gk9
time="2024-11-01T15:21:02.905Z" level=info msg=reconcileAgentPod namespace=argo workflow=test-f9gk9
time="2024-11-01T15:21:02.914Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=2198454 workflow=test-f9gk9
time="2024-11-01T15:21:12.867Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=2198454 namespace=argo workflow=test-f9gk9
time="2024-11-01T15:21:12.867Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=1 workflow=test-f9gk9
time="2024-11-01T15:21:12.867Z" level=info msg="node changed" namespace=argo new.message= new.phase=Running new.progress=0/1 nodeID=test-f9gk9-1379899069 old.message= old.phase=Pending old.progress=0/1 workflow=test-f9gk9
time="2024-11-01T15:21:12.868Z" level=info msg="Workflow step group node test-f9gk9-2828767984 not yet completed" namespace=argo workflow=test-f9gk9
time="2024-11-01T15:21:12.868Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=test-f9gk9
time="2024-11-01T15:21:12.868Z" level=info msg=reconcileAgentPod namespace=argo workflow=test-f9gk9
time="2024-11-01T15:21:12.874Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=2198616 workflow=test-f9gk9
time="2024-11-01T15:22:16.126Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=2198616 namespace=argo workflow=test-f9gk9
time="2024-11-01T15:22:16.127Z" level=error msg="Unable to set ExecWorkflow" error="WorkflowSpec may not change during execution when the controller is set `templateReferencing: Secure`" namespace=argo workflow=test-f9gk9

Logs from your workflow's wait container

time="2024-11-01T15:22:06.452Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2024-11-01T15:22:06.452Z" level=info msg="No output parameters"
time="2024-11-01T15:22:06.452Z" level=info msg="No output artifacts"
time="2024-11-01T15:22:06.453Z" level=info msg="S3 Save path: /tmp/argo/outputs/logs/main.log, key: test-f9gk9/test-f9gk9-print-message-1379899069/main.log"
time="2024-11-01T15:22:06.453Z" level=info msg="Creating minio client using static credentials" endpoint="minio:9000"
time="2024-11-01T15:22:06.453Z" level=info msg="Saving file to s3" bucket=my-bucket endpoint="minio:9000" key=test-f9gk9/test-f9gk9-print-message-1379899069/main.log path=/tmp/argo/outputs/logs/main.log
time="2024-11-01T15:22:06.462Z" level=info msg="Save artifact" artifactName=main-logs duration=8.99804ms error="<nil>" key=test-f9gk9/test-f9gk9-print-message-1379899069/main.log
time="2024-11-01T15:22:06.462Z" level=info msg="not deleting local artifact" localArtPath=/tmp/argo/outputs/logs/main.log
time="2024-11-01T15:22:06.462Z" level=info msg="Successfully saved file: /tmp/argo/outputs/logs/main.log"
time="2024-11-01T15:22:06.476Z" level=info msg="Alloc=9151 TotalAlloc=17202 Sys=24149 NumGC=5 Goroutines=10"
@shuangkun
Member

It seems to be related to the WorkflowTaskResult not being completed. After encountering "WorkflowSpec may not change during execution when the controller is set `templateReferencing: Secure`", I saw that the workflow-controller executed:

		err := woc.setStoredWfSpec()
		if err != nil {
			woc.markWorkflowError(ctx, err)
			return err
		}

@shuangkun shuangkun added the area/controller Controller issues, panics label Nov 3, 2024
shuangkun added a commit to shuangkun/argo-workflows that referenced this issue Nov 3, 2024
@shuangkun shuangkun self-assigned this Nov 3, 2024
2 participants