[ENH] Mark wandb run as preempting as soon as signal received #17

Open
wants to merge 1 commit into
base: main
8 changes: 6 additions & 2 deletions wandb_preempt/checkpointer.py
@@ -115,11 +115,15 @@ def mark_preempted(self, sig: int, frame: Optional[FrameType]):
             sig: The signal number.
             frame: The current stack frame.
         """
+        if self.marked_preempted:
+            self.maybe_print(f"Received signal {sig}, but already preempting.")
+            return
+        self.marked_preempted = True
         self.maybe_print(
-            f"Received signal {sig}. Marking as pre-empted and will halt and requeue"
+            f"Received signal {sig}. Marking as preempting and will halt and requeue"
             " the job at next call of checkpointer.step()."
         )
-        self.marked_preempted = True
+        wandb.mark_preempting()
 
     def checkpoint_path(self, counter: int) -> str:
         """Get the path to a checkpoint file for a given checkpointing step.
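The guard added in this diff makes the handler idempotent: a second signal only logs and returns, so the run is marked as preempting at most once. A minimal sketch of that pattern follows. It assumes a hypothetical `PreemptionFlag` class and an injected `on_preempt` callback standing in for `wandb.mark_preempting()`, so it runs without a wandb session; it is not the project's actual checkpointer.

```python
import signal
from types import FrameType
from typing import Callable, Optional


class PreemptionFlag:
    """Hypothetical stand-in for the checkpointer's signal-handling state."""

    def __init__(self, on_preempt: Callable[[], None], verbose: bool = False):
        # Assumption: in the PR this notification is wandb.mark_preempting();
        # here it is injected so the sketch has no external dependencies.
        self.on_preempt = on_preempt
        self.verbose = verbose
        self.marked_preempted = False

    def maybe_print(self, msg: str) -> None:
        if self.verbose:
            print(msg)

    def mark_preempted(self, sig: int, frame: Optional[FrameType]) -> None:
        # Repeated signals are acknowledged but trigger no further side effects.
        if self.marked_preempted:
            self.maybe_print(f"Received signal {sig}, but already preempting.")
            return
        # Set the flag before notifying, mirroring the ordering in the diff.
        self.marked_preempted = True
        self.maybe_print(f"Received signal {sig}. Marking as preempting.")
        self.on_preempt()


# Simulate the scheduler delivering the same signal twice.
calls = []
flag = PreemptionFlag(on_preempt=lambda: calls.append("preempting"))
flag.mark_preempted(signal.SIGTERM, None)
flag.mark_preempted(signal.SIGTERM, None)
```

In a real job the method would be registered with `signal.signal(...)` for whichever signal the scheduler sends before preemption; the direct calls above only imitate that delivery.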