Test interruptibility of job-runner synchronous prepare/finalize steps #585

Open
bloodearnest opened this issue Feb 27, 2023 · 2 comments

@bloodearnest
Member

If we want to automate the deployment of job-runner, we need to be able to restart at any point.

Previously, interrupting a PREPARING operation could cause issues and jobs to fail. A docker cp subprocess could hang, for example.

We're now on a Linux VM, and we've also moved to the BindMount API, which means docker cp (or Docker at all) is not involved. We've additionally improved the logic that determines whether a job has finished PREPARING or not.

We should test that job-runner can be interrupted during a PREPARING stage, and that it redoes the PREPARING step when it is restarted.
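As a starting point for that test, here is a minimal sketch of the property we want: a prepare step that can be cut short and then redone cleanly. Everything in it is an assumption for illustration, not job-runner's actual code: the `prepare` function, the `.prepared` marker file, and the `Interrupted` exception (which stands in for the process being killed; a real test would kill and restart the process itself).

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Optional

# Hypothetical completion marker; job-runner's real mechanism may differ.
MARKER = ".prepared"


class Interrupted(Exception):
    """Stands in for the process being killed part-way through the copy."""


def prepare(src: Path, dest: Path, fail_after: Optional[int] = None) -> None:
    """Copy the files in src into dest, writing MARKER only after the last file."""
    if (dest / MARKER).exists():
        return  # a previous run completed, nothing to redo
    if dest.exists():
        shutil.rmtree(dest)  # discard the partial copy from an interrupted run
    dest.mkdir(parents=True)
    for count, path in enumerate(sorted(src.iterdir())):
        if fail_after is not None and count >= fail_after:
            raise Interrupted(f"simulated kill after {count} files")
        shutil.copy2(path, dest / path.name)
    (dest / MARKER).touch()


def run_interrupt_then_restart(n_files: int = 5) -> List[str]:
    """Interrupt a prepare step, rerun it, and return the file names in dest."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dest = Path(tmp, "src"), Path(tmp, "dest")
        src.mkdir()
        for i in range(n_files):
            (src / f"input_{i}.csv").write_text(f"row {i}\n")

        try:
            prepare(src, dest, fail_after=2)  # first attempt is cut short
        except Interrupted:
            pass
        assert not (dest / MARKER).exists()  # must not look finished

        prepare(src, dest)  # the "restart" redoes the whole step
        return sorted(p.name for p in dest.iterdir() if p.name != MARKER)
```

Writing the marker last means a partially prepared directory is never mistaken for a finished one, at the cost of redoing the whole copy after an interruption.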

We should also test the performance of redoing the copy (copying to a destination that already exists is slower than copying to one that doesn't). We wouldn't want a big copy to be interrupted and then be slower the second time.
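A rough way to measure this is to time the same copy twice, first into a fresh destination and then into the one that now exists. The sketch below assumes the copy behaves like `shutil.copytree` with `dirs_exist_ok=True` (Python 3.8+); this issue doesn't say which mechanism job-runner uses, and the helper names are made up for the example.

```python
import shutil
import tempfile
import time
from pathlib import Path
from typing import Tuple


def make_tree(root: Path, n_files: int, size: int) -> None:
    """Create n_files files of `size` bytes each under root."""
    root.mkdir(parents=True)
    for i in range(n_files):
        (root / f"file_{i}.bin").write_bytes(b"\0" * size)


def time_copy(src: Path, dest: Path) -> float:
    """Return the seconds taken to copy src into dest (dest may already exist)."""
    start = time.perf_counter()
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return time.perf_counter() - start


def compare_fresh_and_repeat(n_files: int = 50, size: int = 1024) -> Tuple[float, float, int]:
    """Time a copy into a new destination, then the same copy over the result."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dest = Path(tmp, "src"), Path(tmp, "dest")
        make_tree(src, n_files, size)
        fresh = time_copy(src, dest)   # destination does not exist yet
        repeat = time_copy(src, dest)  # destination already fully populated
        copied = len(list(dest.iterdir()))
    return fresh, repeat, copied


if __name__ == "__main__":
    fresh, repeat, copied = compare_fresh_and_repeat()
    print(f"fresh: {fresh:.4f}s  repeat: {repeat:.4f}s  files: {copied}")
```

With small files in a temp directory the timings mostly reflect the page cache, so a meaningful comparison would need realistic input sizes on the filesystem the jobs actually use.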

@lucyb
Contributor

lucyb commented Aug 23, 2023

For now, we should put an alert on it and observe its behaviour.

@madwort
Contributor

madwort commented Aug 23, 2023

job server is preparing - killed & restarted - does it error after the restart?
