-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: logfile quoting and scancel error handling #140
Conversation
WalkthroughThe changes involve modifications to the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant SLURMExecutor
participant Subprocess
User->>SLURMExecutor: run_job()
SLURMExecutor->>Subprocess: Execute SLURM command with 'slurm_logfile'
Subprocess-->>SLURMExecutor: Return success or error
SLURMExecutor-->>User: Return job status
User->>SLURMExecutor: cancel_jobs()
SLURMExecutor->>Subprocess: Execute scancel command
alt Error Occurred
Subprocess-->>SLURMExecutor: Raise CalledProcessError
SLURMExecutor-->>User: Return error message with exit code
else Success
Subprocess-->>SLURMExecutor: Return success
SLURMExecutor-->>User: Confirm cancellation
end
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Files selected for processing (1)
- snakemake_executor_plugin_slurm/init.py (2 hunks)
Additional context used
Ruff
snakemake_executor_plugin_slurm/__init__.py
415-417: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
Additional comments not posted (1)
snakemake_executor_plugin_slurm/__init__.py (1)
140-140
: LGTM!The change to enclose
slurm_logfile
in single quotes ensures that file paths with spaces or special characters are correctly handled when passed to the SLURM command.The code changes are approved.
except subprocess.CalledProcessError as e: | ||
msg = e.stderr.decode().strip() | ||
if msg: | ||
msg = f": {msg}" | ||
raise WorkflowError( | ||
f"Unable to cancel jobs with scancel (exit code {e.returncode}){msg}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enhance exception handling by using raise ... from err
.
The new exception handling block improves error reporting. However, it is recommended to use raise ... from err
to distinguish exceptions from errors in exception handling.
Apply this diff to enhance exception handling:
-raise WorkflowError(
- f"Unable to cancel jobs with scancel (exit code {e.returncode}){msg}"
-)
+raise WorkflowError(
+ f"Unable to cancel jobs with scancel (exit code {e.returncode}){msg}"
+) from e
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
except subprocess.CalledProcessError as e: | |
msg = e.stderr.decode().strip() | |
if msg: | |
msg = f": {msg}" | |
raise WorkflowError( | |
f"Unable to cancel jobs with scancel (exit code {e.returncode}){msg}" | |
) | |
except subprocess.CalledProcessError as e: | |
msg = e.stderr.decode().strip() | |
if msg: | |
msg = f": {msg}" | |
raise WorkflowError( | |
f"Unable to cancel jobs with scancel (exit code {e.returncode}){msg}" | |
) from e |
Tools
Ruff
415-417: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
except subprocess.CalledProcessError as e: | ||
msg = e.stderr.decode().strip() | ||
if msg: | ||
msg = f": {msg}" | ||
raise WorkflowError( | ||
f"Unable to cancel jobs with scancel (exit code {e.returncode}){msg}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unsure about the exit codes of slurm scancel. Do you know more? THis here would fail with a WorkflowError for anything except 0. In the docs there is nothing about exit codes. I recently received an exit code of 8 from scancel. And gemini says the following without being able to give me a source for it:
0: The job was successfully canceled.
1: The job could not be found.
2: The job is already in the completed state.
3: The job is already in the canceled state.
4: The job is already in the suspended state.
5: The job is already in the pending state.
6: The job is already running.
7: The job is already in the requeued state.
8: The job was successfully canceled.
9: The job was successfully suspended.
10: The job was successfully requeued.
11: The job was successfully resumed.
12: The job was successfully held.
13: The job was successfully released.
14: The job was successfully signaled.
15: The job was successfully terminated.
16: The job was successfully restarted.
17: The job was successfully updated.
18: The job was successfully requeued to a different partition.
19: The job was successfully requeued to a different queue.
20: The job was successfully requeued to a different priority.
21: The job was successfully requeued to a different account.
22: The job was successfully requeued to a different user.
23: The job was successfully requeued to a different group.
24: The job was successfully requeued to a different association.
25: The job was successfully requeued to a different node.
26: The job was successfully requeued to a different nodelist.
27: The job was successfully requeued to a different feature.
28: The job was successfully requeued to a different constraint.
29: The job was successfully requeued to a different time.
30: The job was successfully requeued to a different deadline.
31: The job was successfully requeued to a different dependency.
32: The job was successfully requeued to a different requeue priority.
33: The job was successfully requeued to a different requeue time.
34: The job was successfully requeued to a different requeue deadline.
35: The job was successfully requeued to a different requeue dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @cmeesters
will take a look in the late afternoon - right now, I have meeting after meeting. |
@johanneskoester I tried to read the source code. Didn't help. Consider this: $ scancel
scancel: error: No job identification provided
$ echo $?
1
$ sacct -j 16161523 -o state -X
State
----------
COMPLETED
$ scancel 16161523
$ echo $?
0 Which, according to your list, ought to be 2 for the last line. Also, your list states, that exit codes 8 and 0 are identical! Some of the listed codes do not make sense at all: BTW I like your PR, might have a look into #136 (the feature works for me, but the tests fail, because an apparent NONE type?). |
PS black is ok with the length of line 489, the CI is not. That is an issue in itself, don't you think? after the post about the signals: If we erase considering the error code (because of the questionable purpose), we can delete it and the line shortens. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Files selected for processing (1)
- snakemake_executor_plugin_slurm/init.py (2 hunks)
Additional context used
Ruff
snakemake_executor_plugin_slurm/__init__.py
415-417: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
Additional comments not posted (1)
snakemake_executor_plugin_slurm/__init__.py (1)
140-140
: LGTM!The change to enclose
slurm_logfile
in single quotes ensures that file paths with spaces or special characters are correctly handled.
Thanks for checking this (I really only stupidly pasted the AI output on the error codes, this again shows how useless this can be ATM). Still, since I observed the error code 8, let us keep it in the exception message for now, maybe it is useful for some people. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Files selected for processing (1)
- snakemake_executor_plugin_slurm/init.py (2 hunks)
Additional context used
Ruff
snakemake_executor_plugin_slurm/__init__.py
415-418: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
Additional comments not posted (1)
snakemake_executor_plugin_slurm/__init__.py (1)
140-140
: Approved change to logfile quoting.The change to enclose the
slurm_logfile
variable in single quotes is a good practice to ensure that paths with spaces or special characters are correctly handled. This is crucial for maintaining the integrity of file paths in a UNIX environment.
introduced "raise Error from exception" as suggested by coderabbit - while not necessary, it can make future error handling a bit more traceable. Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Outside diff range, codebase verification and nitpick comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)
415-418
: LGTM! Nitpick: Consider removingfrom e
.Using
from e
is a good practice to distinguish the exception from errors in exception handling. However, in this specific case, it may not be necessary as there are no additional exception handling steps. Consider removing it for simplicity.-raise WorkflowError( - "Unable to cancel jobs with scancel " - f"(exit code {e.returncode}){msg}" -) from e +raise WorkflowError( + "Unable to cancel jobs with scancel " + f"(exit code {e.returncode}){msg}" +)
Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Files selected for processing (1)
- snakemake_executor_plugin_slurm/init.py (2 hunks)
Additional comments not posted (2)
snakemake_executor_plugin_slurm/__init__.py (2)
140-140
: LGTM!Enclosing the
slurm_logfile
in single quotes is a good practice to handle file paths with spaces or special characters correctly.
411-418
: Improved error handling for job cancellation.The new exception handling block for
subprocess.CalledProcessError
enhances error reporting when canceling jobs withscancel
. It captures the exit code and standard error output to provide more context about the failure. This improves the robustness of the job cancellation process.
Gnarf, now the tests fail, because of missing test files. Wonderful. Not something, I will check on a Sunday morning, though. Tomorrow, I will have lectures till noon, only them I might have time to investigate. |
Should be fixed now. |
🤖 I have created a release *beep* *boop* --- ## [0.10.1](v0.10.0...v0.10.1) (2024-09-07) ### Bug Fixes * logfile quoting and scancel error handling ([#140](#140)) ([cb5d656](cb5d656)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary by CodeRabbit
New Features
Bug Fixes