Describe the bug
When deploying a Canary Rollout (using workloadRef) in Argo Rollouts v1.8.0, if the new version's pods fail to come up (e.g., due to ImagePullBackOff), the rollout remains stuck in a "Progressing" state indefinitely. Even with progressDeadlineSeconds: 600 and progressDeadlineAbort: true configured, the rollout never marks itself as failed, and manual abort commands (kubectl argo rollouts abort <rollout-name>) do not take effect unless the failing ReplicaSet is manually deleted. This blocks any subsequent updates, as the rollout remains stuck with the failing new ReplicaSet.
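For reference, a minimal Rollout manifest of the shape described above might look roughly like the sketch below; the replica count, selector labels, referenced Deployment name, and canary steps are assumptions for illustration, while progressDeadlineSeconds and progressDeadlineAbort are set as in this report:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: xapproxy
  namespace: red
spec:
  replicas: 3                       # assumed replica count
  progressDeadlineSeconds: 600      # as configured in this report
  progressDeadlineAbort: true      # expected to abort the update once the deadline is exceeded
  selector:
    matchLabels:
      app: xapproxy                 # assumed pod label
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: xapproxy                  # hypothetical name of the referenced Deployment
  strategy:
    canary:
      steps:                        # four steps, matching "Step: 0/4" in the status output below
        - setWeight: 10
        - pause: {}
        - setWeight: 50
        - pause: {duration: 10m}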
To Reproduce
Create a Canary Rollout using workloadRef to an existing Deployment.
Update the rollout to use a new image that is misconfigured or non-existent so that pods fail (e.g., trigger ImagePullBackOff).
Confirm that the rollout status remains "Progressing" with a message like:
Name:            xapproxy
Namespace:       red
Status:          ◌ Progressing
Message:         more replicas need to be updated
Strategy:        Canary
  Step:          0/4
  SetWeight:     10
  ActualWeight:  0
Images:          dockerhub.io/istio/proxyv2:1.17.1 (canary, stable)
                 dockerhub.io/myproxy:25.2.1.3-745.76dfaae (canary)
                 dockerhub.io/myproxy:25.2.1.3-746.76dfaae (stable)
Observe that the rollout does not transition to a failed or aborted state.
As a workaround, running kubectl argo rollouts undo <rollout-name> -n <namespace> or kubectl argo rollouts abort <rollout-name> -n <namespace>, followed by manually deleting the failing ReplicaSet (e.g., kubectl delete rs <failing-replicaset> -n <namespace>), is required before any new version updates are applied.
Expected behavior
The rollout controller should detect that the new ReplicaSet's pods are not becoming Ready within the defined progress deadline and automatically abort or mark the rollout as failed (roughly as sketched below).
The abort command should work regardless of the pod state, allowing the rollout to revert to the stable ReplicaSet and enabling new updates.
No manual intervention (such as deleting the failing ReplicaSet) should be necessary.
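As an illustration of that expected outcome, a sketch of the status the rollout might report once the deadline is exceeded follows; the phase, reason, and message strings are assumptions modeled on how the progress deadline is surfaced elsewhere, not confirmed controller output:

status:
  abort: true                               # expected once progressDeadlineAbort: true takes effect
  phase: Degraded                           # assumed phase for an exceeded progress deadline
  conditions:
    - type: Progressing
      status: "False"
      reason: ProgressDeadlineExceeded      # assumed reason string
      message: ReplicaSet "xapproxy-<hash>" has timed out progressing.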
Version
Argo Rollouts version: 1.8.0 (stable, released ~2 weeks ago)
Can you confirm whether this behavior exists on older versions? Does this only affect workloadRef? If you don't use workloadRef, does it behave as expected?