Commit

Merge pull request #19001 from mapk-amazon/random-failure-fix
Fixes random job failures in kubernetes
mvdbeek authored Oct 17, 2024
2 parents b342cf5 + 3576afd commit 409b790
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion lib/galaxy/jobs/runners/kubernetes.py
@@ -772,7 +772,10 @@ def check_watched_item(self, job_state):
 # as probably this means that the k8s API server hasn't
 # had time to fill in the object status since the
 # job was created only too recently.
-if len(job.obj["status"]) == 0:
+# It is possible that k8s didn't account for the status of the pods
+# and they are in the uncountedTerminatedPods status. In this
+# case we also need to wait a moment
+if len(job.obj["status"]) == 0 or job.obj["status"].get("uncountedTerminatedPods"):
     return job_state
 if "succeeded" in job.obj["status"]:
     succeeded = job.obj["status"]["succeeded"]
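
For reference, the new condition in this hunk can be read in isolation as a small predicate. This is a minimal sketch, not code from the PR: the helper name `should_wait_for_status` is hypothetical, and it assumes `status` is a plain dict shaped like the Job's `status` field, where `uncountedTerminatedPods` is either absent, empty, or holds lists of pod UIDs.

```python
def should_wait_for_status(status: dict) -> bool:
    """Return True when the Job status should not be interpreted yet.

    Hypothetical helper mirroring the condition in the diff; it is not
    part of Galaxy's KubernetesJobRunner.
    """
    # Case 1: the API server has not filled in the status at all.
    if len(status) == 0:
        return True
    # Case 2: some terminated pods have not yet been added to the
    # succeeded/failed counters. An absent or empty value is falsy,
    # so only a non-empty uncountedTerminatedPods triggers a wait.
    return bool(status.get("uncountedTerminatedPods"))


# Illustrative inputs (made up for this sketch, not taken from a real cluster).
print(should_wait_for_status({}))
print(should_wait_for_status({"uncountedTerminatedPods": {"succeeded": ["uid-1"]}}))
print(should_wait_for_status({"uncountedTerminatedPods": {}, "succeeded": 1}))
```

Relying on truthiness rather than key presence means a status that carries an empty `uncountedTerminatedPods` entry still falls through to the normal `succeeded`/`failed` handling.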