fix: logfile quoting and scancel error handling #140
Changes from 2 commits
ceec90f
a0cccb4
d48c726
1ede6fa
6f0e3e1
fc7973d
```diff
@@ -137,7 +137,7 @@ def run_job(self, job: JobExecutorInterface):
             f"sbatch "
             f"--parsable "
             f"--job-name {self.run_uuid} "
-            f"--output {slurm_logfile} "
+            f"--output '{slurm_logfile}' "
             f"--export=ALL "
             f"--comment {comment_str}"
         )
```
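The quotes matter when the log path contains whitespace. Assuming the `sbatch` command string is executed through a shell as a single string (for example with `shell=True`), an unquoted `--output` path with a space is split into several arguments. This sketch approximates POSIX shell word splitting with `shlex.split`. The log paths and the `sbatch_tokens` helper are hypothetical, and `shlex.quote` is shown only as an alternative that the PR does not use.

```python
import shlex


def sbatch_tokens(logfile: str, quote_style: str) -> list[str]:
    """Build an sbatch-like command string and split it as a POSIX shell would.

    quote_style: "none" (before the PR), "single" (the PR's '...'),
    or "shlex" (shlex.quote, an alternative not used in the PR).
    """
    if quote_style == "none":
        arg = logfile
    elif quote_style == "single":
        arg = f"'{logfile}'"
    elif quote_style == "shlex":
        arg = shlex.quote(logfile)
    else:
        raise ValueError(f"unknown quote_style: {quote_style}")
    cmd = f"sbatch --parsable --output {arg} --export=ALL"
    # shlex.split approximates the word splitting a POSIX shell performs.
    return shlex.split(cmd)


path = "/tmp/my logs/job.log"  # hypothetical path containing a space
print(sbatch_tokens(path, "none"))
# ['sbatch', '--parsable', '--output', '/tmp/my', 'logs/job.log', '--export=ALL']
print(sbatch_tokens(path, "single"))
# ['sbatch', '--parsable', '--output', '/tmp/my logs/job.log', '--export=ALL']
print(sbatch_tokens("/tmp/bob's logs/job.log", "shlex"))
```

Plain single quotes still break on a path that itself contains `'`: `sbatch_tokens("/tmp/bob's logs/job.log", "single")` raises `ValueError: No closing quotation`, whereas `shlex.quote` escapes the apostrophe.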
```diff
@@ -408,6 +408,13 @@ def cancel_jobs(self, active_jobs: List[SubmittedJobInfo]):
                 )
             except subprocess.TimeoutExpired:
                 self.logger.warning("Unable to cancel jobs within a minute.")
+            except subprocess.CalledProcessError as e:
+                msg = e.stderr.decode().strip()
+                if msg:
+                    msg = f": {msg}"
+                raise WorkflowError(
+                    f"Unable to cancel jobs with scancel (exit code {e.returncode}){msg}"
+                )
```
I am unsure about the exit codes of Slurm's `scancel`. Do you know more? This would fail with a `WorkflowError` for any exit code other than 0. The docs say nothing about exit codes. I recently received an exit code of 8 from `scancel`, and Gemini says the following without being able to give me a source for it:

cc @cmeesters
```diff
 
     async def job_stati(self, command):
         """Obtain SLURM job status of all submitted jobs with sacct
```
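On the exit-code question: assuming `scancel` is launched with `subprocess.check_output` (or `subprocess.run(..., check=True)`), any non-zero exit status raises `CalledProcessError`, so the new `except` branch turns every non-zero status, including the 8 mentioned above, into a `WorkflowError`. This sketch shows how to observe the raw status and stderr without raising. `FAKE_SCANCEL`, `probe`, and `raises_on_nonzero` are hypothetical; the stand-in process merely exits with 8 and says nothing about what that status means to Slurm.

```python
import subprocess
import sys

# Hypothetical stand-in for `scancel`: a child Python process that writes to
# stderr and exits with status 8. It does not model real Slurm behaviour.
FAKE_SCANCEL = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('simulated error'); sys.exit(8)",
]


def probe(cmd: list[str]) -> tuple[int, str]:
    """Run cmd without raising and return (exit status, stripped stderr)."""
    result = subprocess.run(
        cmd, capture_output=True, text=True, check=False, timeout=60
    )
    return result.returncode, result.stderr.strip()


def raises_on_nonzero(cmd: list[str]) -> bool:
    """Return True if check_output raises CalledProcessError for cmd."""
    try:
        subprocess.check_output(cmd, stderr=subprocess.PIPE, timeout=60)
    except subprocess.CalledProcessError:
        return True
    return False


print(probe(FAKE_SCANCEL))              # (8, 'simulated error')
print(raises_on_nonzero(FAKE_SCANCEL))  # True
```

Whether particular statuses should be downgraded to a warning instead of a `WorkflowError` is the open question in this thread; the sketch only makes the status visible.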
Enhance exception handling by using `raise ... from err`

The new exception handling block improves error reporting. However, it is recommended to use `raise ... from err` so that the re-raised exception is distinguished from errors that occur in the exception handling itself. Apply this diff to enhance exception handling:
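As an illustration only (not the reviewer's actual suggestion), the following sketch applies `raise ... from e` to the handler from this PR. `WorkflowError` here is a hypothetical local stand-in for snakemake's class, and `_stderr_text` is a hypothetical helper that also tolerates `e.stderr` being `None` (stderr not captured) or already `str` (when `text=True` is used). In both of those cases the PR's `e.stderr.decode()` would raise `AttributeError`.

```python
import logging
import subprocess
import sys

logger = logging.getLogger(__name__)


class WorkflowError(Exception):
    """Hypothetical stand-in for snakemake's WorkflowError."""


def _stderr_text(err: subprocess.CalledProcessError) -> str:
    """Return captured stderr as stripped text (it may be bytes, str, or None)."""
    if err.stderr is None:  # stderr was not captured
        return ""
    if isinstance(err.stderr, bytes):
        return err.stderr.decode(errors="replace").strip()
    return err.stderr.strip()


def cancel(cmd: list[str]) -> None:
    """Run a cancel command, raising WorkflowError on a non-zero exit status."""
    try:
        subprocess.check_output(cmd, stderr=subprocess.PIPE, timeout=60)
    except subprocess.TimeoutExpired:
        logger.warning("Unable to cancel jobs within a minute.")
    except subprocess.CalledProcessError as e:
        msg = _stderr_text(e)
        if msg:
            msg = f": {msg}"
        # `from e` stores the original error as __cause__; the traceback then
        # says "The above exception was the direct cause of the following
        # exception" rather than "During handling of the above exception,
        # another exception occurred".
        raise WorkflowError(
            f"Unable to cancel jobs with scancel (exit code {e.returncode}){msg}"
        ) from e


# Hypothetical failing command standing in for scancel.
caught = None
try:
    cancel([sys.executable, "-c", "import sys; sys.stderr.write('simulated failure'); sys.exit(1)"])
except WorkflowError as err:
    caught = err

print(caught)  # Unable to cancel jobs with scancel (exit code 1): simulated failure
print(type(caught.__cause__).__name__)  # CalledProcessError
```

Chaining changes only how the traceback is reported and what `__cause__` holds; the message a user sees from `str(caught)` is the same as without `from e`.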