This is just a thought about SLURMCluster for now (since that's the one I'm familiar with), but similar options may be available for other cluster types too. Currently, the cancel_command in the SLURMJob class (dask_jobqueue/slurm.py, line 15 at commit 8713202) is a bare "scancel".
This means that, even when workers are shut down completely gracefully, the Slurm job is marked as CANCELLED. If the command were instead scancel --signal=SIGTERM, the job would be marked as COMPLETED. It's possible there could be cases where we would want a job to be cancelled, which complicates this somewhat.
In the simple case, however, I think this could be implemented by simply changing cancel_command to:
class SLURMJob(Job):
    # Override class variables
    submit_command = "sbatch"
    cancel_command = "scancel --signal=SIGTERM"
    config_name = "slurm"
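To make concrete what the one-line change does to the command that actually gets run, here is a minimal sketch. It assumes the job class turns the cancel_command string into an argument list by splitting it shell-style and appending the job ID (roughly what dask-jobqueue does when closing a job); the function name build_cancel_argv is hypothetical and not part of dask-jobqueue.

```python
import shlex
from typing import List


def build_cancel_argv(cancel_command: str, job_id: str) -> List[str]:
    """Split a cancel command string into argv and append the job ID.

    Hypothetical helper illustrating how a class-level ``cancel_command``
    string can become a subprocess argument list; not dask-jobqueue's code.
    """
    # shlex.split honours shell-style quoting, so "--signal=SIGTERM" stays one token
    return shlex.split(cancel_command) + [job_id]


if __name__ == "__main__":
    # Current behaviour: bare scancel
    print(build_cancel_argv("scancel", "12345"))  # ['scancel', '12345']
    # Proposed behaviour: ask Slurm to deliver SIGTERM instead
    print(build_cancel_argv("scancel --signal=SIGTERM", "12345"))  # ['scancel', '--signal=SIGTERM', '12345']
```

The only difference between the two calls is the extra --signal=SIGTERM token placed before the job ID; everything else about how the job is closed stays the same.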
It'd be great to get some more thoughts on the implications of this.
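Because cancel_command is a class variable, one way to keep a plain cancel available for the cases where a CANCELLED state is actually wanted is to leave the default alone and let users opt in through a subclass. The sketch below shows that override pattern with hypothetical stand-in classes (BaseJob, GracefulJob); they are not dask-jobqueue's Job or SLURMJob.

```python
import shlex
from typing import List


class BaseJob:
    """Hypothetical stand-in for a job class; NOT dask_jobqueue.core.Job."""

    cancel_command = "scancel"  # default: plain cancel, job ends up CANCELLED

    def __init__(self, job_id: str) -> None:
        self.job_id = job_id

    def cancel_argv(self) -> List[str]:
        # Look the command up on the class so subclasses can override it
        return shlex.split(type(self).cancel_command) + [self.job_id]


class GracefulJob(BaseJob):
    """Hypothetical opt-in subclass that asks Slurm to send SIGTERM instead."""

    cancel_command = "scancel --signal=SIGTERM"


if __name__ == "__main__":
    print(BaseJob("42").cancel_argv())      # ['scancel', '42']
    print(GracefulJob("42").cancel_argv())  # ['scancel', '--signal=SIGTERM', '42']
```

In dask-jobqueue itself, SLURMCluster picks its job class from its job_cls class attribute, so a SLURMJob subclass with an overridden cancel_command could presumably be plugged in by subclassing SLURMCluster and setting job_cls; whether that is preferable to changing the default is exactly the trade-off in question.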
Hi! This also sounds perfectly acceptable to me. I don't think there is any case in which we would really want a CANCELLED status! Thanks for proposing this; I think something similar might be possible with other schedulers too!