Allow CWL workflows to have jobs use all of a Slurm node's memory #5052
base: master
…options and env vars
…ments use whole nodes' memory
I still need to manually test this to make sure it actually does what it is meant to do.
I wrote a test for this and it does indeed seem to issue jobs that ask for whole Slurm nodes when I use the two new options together. I also fixed Slurm job cleanup when a workflow is killed; it wasn't doing that before because `shutdown()` wasn't doing any killing in …
@DailyDreaming Can you review this?
This should fix #4971. Instead of `--defaultMemory 0`, to run CWL jobs that lack their own `ramMin` with a full Slurm node's memory, you would now pass `--no-cwl-default-ram --slurmDefaultAllMem=True`.

This might cause some new problems:

- CWL jobs that lack their own `ramMin` now get the CWL spec's default `ramMin` instead of Toil's `--defaultMemory`, unless the user passes `--no-cwl-default-ram`. Previously I think we were ignoring the spec and always using the Toil `--defaultMemory`. This might break some workflow runs that used to work because of us giving them more memory than the spec said to.

Also, #4971 says we're supposed to implement a real framework for doing this kind of memory expansion across all batch systems that support it. But I didn't want to add a new bool flag onto `Requirer` for such a specific purpose. Probably if we need it we should combine it with preemptible somehow into a tag/flag system. Or we could implement memory range requirements and allow the top of the range to be unbounded, or treat some threshold upper limit as "all the node's memory" in the Slurm batch system.

Changelog Entry
To be copied to the draft changelog by merger:

- New `--slurmDefaultAllMem` option to run jobs lacking their own memory requirements with Slurm's `--mem=0`, so they get a whole node's memory.
- `toil-cwl-runner` now has `--no-cwl-default-ram` (and `--cwl-default-ram`) to control whether the CWL spec's default `ramMin` is applied, or Toil's own default memory logic is used.
- The `--dont_allocate_mem` and `--allocate_mem` options have been deprecated and replaced with `--slurmAllocateMem`, which can be `True` or `False`.
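
For reviewers, here is a rough sketch of how the two new options are meant to interact. It is not the code in this PR: `cwl_job_memory` and `slurm_memory_args` are hypothetical helper names, and it assumes the CWL spec's default `ramMin` of 256 MiB and that `sbatch` reads a bare `--mem` number as megabytes.

```python
import math
from typing import List, Optional

MIB = 1024 * 1024
# Assumption: the CWL spec's default ramMin is 256 MiB.
CWL_DEFAULT_RAM_MIN_MIB = 256


def cwl_job_memory(ram_min_mib: Optional[float], cwl_default_ram: bool) -> Optional[int]:
    """Hypothetical helper: memory in bytes for a CWL job, or None if unspecified."""
    if ram_min_mib is not None:
        # The tool declared its own ramMin, so that always wins.
        return math.ceil(ram_min_mib * MIB)
    if cwl_default_ram:
        # --cwl-default-ram: fall back to the CWL spec's default.
        return CWL_DEFAULT_RAM_MIN_MIB * MIB
    # --no-cwl-default-ram: leave the decision to Toil's own default logic.
    return None


def slurm_memory_args(
    job_memory_bytes: Optional[int],
    default_memory_bytes: int,
    default_all_mem: bool,
) -> List[str]:
    """Hypothetical helper: the sbatch memory argument for one job."""
    if job_memory_bytes is None:
        if default_all_mem:
            # --slurmDefaultAllMem=True: --mem=0 asks Slurm for the whole node's memory.
            return ["--mem=0"]
        # Otherwise use Toil's --defaultMemory value.
        job_memory_bytes = default_memory_bytes
    # Round up so the job never gets less than it asked for.
    return [f"--mem={math.ceil(job_memory_bytes / MIB)}"]
```

With `--no-cwl-default-ram --slurmDefaultAllMem=True`, a job without `ramMin` ends up with `--mem=0` in this sketch, while a job that declares `ramMin` keeps its own request.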
Reviewer Checklist

- … `issues/XXXX-fix-the-thing` in the Toil repo, or from an external repo.
- … `camelCase` that want to be in `snake_case`.
- … `docs/running/{cliOptions,cwl,wdl}.rst`
Merger Checklist