Improvements to Workflow Queue Tools #2351
Merged
This PR addresses Issue #2350 and Issue #2329. It also improves the queue querying by removing COMPLETED jobs from the list of Slurm job IDs that we request information about, since we already know what happened to those. This could be expanded to all "final" states, but for now I have left it as COMPLETED, which is the most common final state in the processing tables. I'd like to think a little harder about any ramifications of not querying for other "final" states.
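To illustrate the filtering idea, here is a minimal sketch, not the actual desispec code. It assumes each processing table row can be treated as a mapping with a `STATUS` entry and a hypothetical `QID` entry holding the Slurm job ID; the function name `qids_to_query` and the `skip_states` parameter are also made up for this example.

```python
# Illustrative sketch only: the names below are assumptions, not the desispec API.
SKIP_STATES = frozenset({"COMPLETED"})  # could later grow to other "final" states


def qids_to_query(rows, skip_states=SKIP_STATES):
    """Return the job IDs whose state still needs to be requested from Slurm.

    rows is any iterable of mappings with "QID" and "STATUS" keys (hypothetical
    column names). Rows whose STATUS is in skip_states are left out, since
    their outcome is already known.
    """
    return [row["QID"] for row in rows if row["STATUS"] not in skip_states]


example_rows = [
    {"QID": 101, "STATUS": "COMPLETED"},
    {"QID": 102, "STATUS": "FAILED"},
    {"QID": 103, "STATUS": "RUNNING"},
]
remaining = qids_to_query(example_rows)  # job 101 is dropped from the query list
```

Keeping the skipped states in a single set would make it a one-line change to extend this to other final states later.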
This solves #2350 by identifying failed dependencies and not submitting the jobs that depend on them. Instead, it prints a message notifying the user of the failed dependency, along with what it would have attempted had it not been for the failed dependency. It assigns a STATUS of "UNSUBMITTED", which is the same outcome as the code in `main`, except that `main` first spends 3 minutes attempting to submit to Slurm and being refused.

This solves #2329 by including "FAILED" by default in
`desispec.workflow.processing.update_and_recursively_submit()` and any code that calls `desispec.workflow.queue.get_resubmission_states()`. I added a new variable, `no_resub_failed`, which is `False` by default. It can be provided at the command line as `--no-resub-failed` to both `desi_proc_night` and `desi_resubmit_queue_failures` to turn this off if we want the old behavior, where FAILED jobs are not resubmitted in `desi_proc_night`.

Lastly, this cleans some things up. For instance, if no QIDs are provided, sacct returns the three most recent jobs, which is benign, but it is better to intercept that case and return an empty table.
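The empty-QID guard can be sketched as follows. This is a hedged illustration rather than the code in this PR: `query_queue` and the `run_sacct` callable are hypothetical stand-ins, and a plain list stands in for the empty table.

```python
# Illustrative sketch only: query_queue and run_sacct are hypothetical names.
def query_queue(qids, run_sacct):
    """Look up job records for qids, short-circuiting when there are none.

    run_sacct is any callable that takes a list of job IDs and returns a list
    of records. It is never called with an empty list, so the queue is not
    asked an open-ended question that returns unrelated recent jobs.
    """
    qids = list(qids)
    if len(qids) == 0:
        return []  # stand-in for returning an empty table
    return run_sacct(qids)


calls = []


def fake_sacct(qids):
    """Test double that records each call instead of contacting Slurm."""
    calls.append(list(qids))
    return [{"QID": q, "STATE": "RUNNING"} for q in qids]


empty_result = query_queue([], fake_sacct)
one_result = query_queue([42], fake_sacct)
```

Passing the query function in as an argument is only a convenience for this sketch, so the guard can be exercised without a Slurm installation.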
`desispec.workflow.queue.update_from_queue()` was modifying the processing table in-place. I've updated the code to first make a copy that is returned.

I believe I have tested all of the new functions in an ipython session where I ran the various codes and showed that they do what I expected. This includes making a fake processing table row with a CANCELLED dependency and finding that it chooses not to submit the job: