[ENH] Mark wandb run as preempting as soon as signal received #17
Mark the job as preempting on wandb as soon as the interruption signal is received, so that if the job suddenly dies, wandb reports it as preempting/preempted rather than crashed/failed. This makes it clear when reading wandb which jobs stopped because they were preempted and which died of their own accord. The change has little effect when the job reaches its time limit and is requeued via SIGUSR1, but it matters more when the job is interrupted with little warning due to queue priority.
Note that we still, somewhat redundantly, call `wandb.mark_preempting()` later in the code in `Checkpointer.preempt_wandb_run`. This is just to make sure the job is definitely marked correctly on wandb even if the two methods are called in quick succession.
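
For readers unfamiliar with the pattern, here is a minimal sketch of marking a run as preempting from inside a signal handler. It is not the code in this PR: `install_preemption_handler` is a hypothetical helper, the marking function is passed in as a callable so the sketch runs without wandb installed, and handling `SIGTERM` alongside `SIGUSR1` is an assumption about how the scheduler delivers preemption notices.

```python
import signal
import threading


def install_preemption_handler(mark_preempting,
                               signals=(signal.SIGUSR1, signal.SIGTERM)):
    """Call `mark_preempting` as soon as one of `signals` arrives.

    Hypothetical helper for illustration. `mark_preempting` is any
    zero-argument callable, for example `wandb.mark_preempting`.
    Returns an Event that is set once a signal has been seen, so the
    training loop can poll it and start checkpointing.
    """
    preempted = threading.Event()

    def _handler(signum, frame):
        # Mark the run before doing anything slow (checkpointing,
        # requeueing), so the status is recorded even if the job is
        # killed shortly afterwards.
        if not preempted.is_set():
            preempted.set()
            mark_preempting()

    for sig in signals:
        signal.signal(sig, _handler)
    return preempted
```

In a training script this would presumably be called once after `wandb.init(...)`, as `install_preemption_handler(wandb.mark_preempting)`. The later call in `Checkpointer.preempt_wandb_run` can stay in place, since marking the run a second time is harmless.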