Hi @Monokaix, in the current implementation, with the setup above, if the pod state changes from running to failed, the failed event should trigger a restart action on the job. However, if the Volcano controller restarts and the default out-of-sync request is handled first, the job's state becomes finished before the request generated by the failed event is processed, so the restart action is never executed. Will this be covered by #3813?
Happy path:
job running -> pod failed event -> request with restart -> running state executes the restart job action.
Corner case:
job running -> pod failed event -> out-of-sync request handled first -> job status changes to failed -> request with restart (generated by the pod failed event) -> finished state does not execute the action.
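To make the ordering problem concrete, here is a minimal, self-contained model of the two sequences above. It is a sketch under stated assumptions, not Volcano's controller code: `Phase`, `Action`, `Job`, `handle` and `run` are hypothetical names, and the only behaviour modelled is that a running job executes the request's action while a finished (failed) job drops it.

```go
package main

import "fmt"

// Phase is a hypothetical, simplified stand-in for a vcjob phase.
type Phase string

const (
	Running Phase = "Running"
	Failed  Phase = "Failed"
)

// Action is a hypothetical, simplified stand-in for a request action.
type Action string

const (
	SyncJob    Action = "SyncJob"    // default out-of-sync request
	RestartJob Action = "RestartJob" // action produced by the PodFailed policy
)

// Job holds just enough state to show the race.
type Job struct {
	Phase     Phase
	PodFailed bool     // the single worker pod has failed
	Executed  []Action // policy actions that actually ran
}

// handle applies one request: a running job acts on it,
// a finished job silently discards it (assumed behaviour).
func (j *Job) handle(a Action) {
	switch j.Phase {
	case Running:
		switch a {
		case RestartJob:
			j.Executed = append(j.Executed, a)
			j.PodFailed = false // pod is recreated, job stays running
		case SyncJob:
			if j.PodFailed {
				j.Phase = Failed // sync observes the failed pod and finishes the job
			}
		}
	case Failed:
		// Finished state: the action is not executed.
	}
}

// run starts from "job running, pod already failed" and handles requests in order.
func run(order []Action) *Job {
	j := &Job{Phase: Running, PodFailed: true}
	for _, a := range order {
		j.handle(a)
	}
	return j
}

func main() {
	happy := run([]Action{RestartJob, SyncJob})  // restart request handled first
	corner := run([]Action{SyncJob, RestartJob}) // out-of-sync request handled first
	fmt.Println(happy.Phase, happy.Executed)
	fmt.Println(corner.Phase, corner.Executed)
}
```

Running it prints `Running [RestartJob]` for the happy path and `Failed []` for the corner case: the same two requests produce different outcomes depending only on which one the controller handles first.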
Please describe your problem in detail
A vcjob with one worker pod and a policy like the following: if the pod fails, the vcjob transitions into the failed state without restarting the job or pod.
According to the code, the finished state does not execute the actions defined in the policies. I just want to confirm whether this is the intended behaviour:
volcano/pkg/controllers/job/state/finished.go
Lines 28 to 31 in 68fba2c
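The behaviour being asked about can be sketched as two steps: the pod event is mapped to an action through the job's policies, and then the job's current state decides whether that action runs. The sketch below is hypothetical: `Policy`, `actionFor` and `executes` are illustrative names, falling back to a plain sync when no policy matches is an assumption, and treating `Completed`, `Failed` and `Terminated` as the phases handled by the finished state is an assumption rather than a quote of `finished.go`.

```go
package main

import "fmt"

// Event and Action mirror the event -> action shape of a vcjob
// lifecycle policy; these are simplified, hypothetical types.
type Event string
type Action string

const (
	PodFailed  Event  = "PodFailed"
	RestartJob Action = "RestartJob"
	SyncJob    Action = "SyncJob"
)

// Policy pairs an event with the action to take when it occurs.
type Policy struct {
	Event  Event
	Action Action
}

// actionFor returns the action of the first policy matching ev,
// falling back to a plain sync when no policy matches (assumption).
func actionFor(policies []Policy, ev Event) Action {
	for _, p := range policies {
		if p.Event == ev {
			return p.Action
		}
	}
	return SyncJob
}

// executes reports whether the policy action would run in a given phase,
// assuming finished phases discard it as described in the issue.
func executes(phase string) bool {
	switch phase {
	case "Completed", "Failed", "Terminated": // assumed finished phases
		return false
	default:
		return true
	}
}

func main() {
	policies := []Policy{{Event: PodFailed, Action: RestartJob}}
	a := actionFor(policies, PodFailed)
	fmt.Println(a, executes("Running"), executes("Failed"))
}
```

This prints `RestartJob true false`: the policy lookup still yields `RestartJob`, but once the job is in a finished phase the action is not carried out, which is the behaviour the question asks to confirm.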
Any other relevant information
No response