fix: update the storage-version-migration script to complete execution in finite time #1682
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This PR needs a bit of re-work. The script as is will always complete and return false positives.
if [[ "${jobStatus}" == *"Complete"* ]]; then
  echo "Job ${JOB_NAME} has completed successfully!"
  exit 0
This is not sufficient. condition.type tells you whether the status condition is reporting Complete or Failed. The Complete and Failed condition types should always be populated in the Conditions array.
You need to inspect the condition.status field, which can be True, False, or Unknown.
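As a rough sketch of what that check could look like (not necessarily how this PR should implement it), the helpers below read condition.status for a given condition.type using kubectl's jsonpath filter. The function names job_condition_status, job_is_complete, and job_has_failed are hypothetical; the shipwright-build namespace is taken from the script under review.

```shell
# Hypothetical helper: print the status ("True", "False", or "Unknown") of the
# Job condition whose type matches $2. Prints nothing if no such condition exists.
job_condition_status() {
  job_name="$1"
  condition_type="$2"
  kubectl -n shipwright-build get job "${job_name}" \
    -o jsonpath="{.status.conditions[?(@.type==\"${condition_type}\")].status}"
}

# Succeeds only when the Complete condition is explicitly "True".
job_is_complete() {
  [ "$(job_condition_status "$1" Complete)" = "True" ]
}

# Succeeds only when the Failed condition is explicitly "True".
job_has_failed() {
  [ "$(job_condition_status "$1" Failed)" = "True" ]
}
```

With these, a substring match on the condition type is replaced by an exact comparison against the condition's status, so a Complete condition whose status is False or Unknown is no longer reported as success.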
while [ "$(kubectl -n shipwright-build get job "${JOB_NAME}" -o json | jq -r '.status.completionTime // ""')" == "" ]; do
  echo "[INFO] Storage version migration job is still running"
  sleep 10
done
This is probably where the bug is. Per the Job API documentation:
completionTime(Time):
Represents time when the job was completed. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC. The completion time is set when the job finishes successfully, and only then. The value cannot be updated or removed. The value indicates the same or later point in time as the startTime field.
I suspect that in your case, the job is failing. This script doesn't see a completion time because the Job never succeeds, hence it just runs forever.
See https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/job-v1/#JobStatus
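One way the wait loop could be made to terminate, sketched under assumptions: poll both the Complete and Failed conditions and give up after a deadline. The function name wait_for_job, the 300-second default timeout, the 10-second default interval, and the return codes are all hypothetical choices for illustration, not something taken from this PR.

```shell
# Hypothetical helper: print the status of the Job condition whose type is $2.
job_condition_status() {
  kubectl -n shipwright-build get job "$1" \
    -o jsonpath="{.status.conditions[?(@.type==\"$2\")].status}"
}

# Hypothetical bounded wait.
# Returns 0 if the Job completes, 1 if it fails, 2 if the timeout elapses first.
wait_for_job() {
  job_name="$1"
  timeout="${2:-300}"   # assumed default: 5 minutes
  interval="${3:-10}"   # assumed default: poll every 10 seconds
  elapsed=0

  while [ "${elapsed}" -lt "${timeout}" ]; do
    if [ "$(job_condition_status "${job_name}" Complete)" = "True" ]; then
      echo "[INFO] Job ${job_name} completed successfully"
      return 0
    fi
    if [ "$(job_condition_status "${job_name}" Failed)" = "True" ]; then
      echo "[ERROR] Job ${job_name} failed" >&2
      return 1
    fi
    echo "[INFO] Job ${job_name} is still running"
    sleep "${interval}"
    elapsed=$((elapsed + interval))
  done

  echo "[ERROR] Timed out after ${timeout}s waiting for job ${job_name}" >&2
  return 2
}
```

A caller could then end the script with something like `wait_for_job "${JOB_NAME}" 600 10 || exit $?`, so that a failed or stuck Job surfaces as a non-zero exit code instead of an endless loop.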
Changes
The hack/storage-version-migration.sh script is mentioned in the install steps. When executed, it runs indefinitely without reporting success or failure. This PR adds checks to hack/storage-version-migration.sh so that it reports success or failure and terminates in finite time.
Fixes #1680
Submitter Checklist
See the contributor guide for details on coding conventions, GitHub and Prow interactions, and the code review process.
Release Notes