[Feature Request]: Better interaction / documentation with non-exclusive Slurm jobs #26788
Allocating an interactive Slurm job with […]
Thanks for filing, Paul! For those not following the Gitter, this was spawned from some conversation there.
An additional wrinkle that I'm encountering now that I'm extending the above to multi-node: if you only have a fraction of the CPUs on each of multiple nodes, the tasks may not get the same binding on all of the nodes. For example, I'm trying to run on 16 cores each from two 128-core nodes, launched from the primary of the two (tc-dgx006). The secondary node (tc-dgx007) immediately faults with a bad binding, and then the primary node hangs until the user hits Ctrl+C.
The srun error on the secondary node shows a binding mask of […]. I'm able to work around it by setting […].
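For anyone trying to reproduce this, here is a rough sketch of that kind of allocation. The core and node counts come from the description above; the node names, partition, and the actual workaround are not preserved here, so treat these commands as illustrative only. `--cpu-bind=verbose` just makes srun report the mask each task receives:

```bash
# Hypothetical sketch: 16 cores on each of two 128-core nodes (names/partition will differ).
salloc --nodes=2 --ntasks-per-node=1 --cpus-per-task=16

# From inside the allocation, report the CPU-binding mask each task receives on each
# node; a mask on the secondary node that doesn't cover the cores actually granted
# there is the symptom described above.
srun --nodes=2 --ntasks-per-node=1 --cpus-per-task=16 --cpu-bind=verbose hostname
```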
Summary of Feature
Some of the multi-locale features struggle in heavily contended Slurm environments. I can't comment on all the launchers, but here are a few things I ran into:
1. The `slurm-`-prefixed launchers use `salloc` to run the `_real` jobs ... This presumes that a newly-enqueued job will run within an interactive timescale, which may not be true.
2. The `gasnet_ibv` and `gasnet_ucx` launchers will try to `srun` if the wrapper program is run from within an allocation, which is fine if you have `--exclusive` on your `salloc`, but may hang without insight if you are only using a non-exclusive portion of a node. (Particularly interactively, like `salloc srun --pty`, as any requested resources get assigned to the pty and none remain to run Chapel's inner `srun`s.) The `--oversubscribe` flag to `salloc` and the `--overlap` flag to `srun` (and the equivalent `SLURM_` environment variables) can help here, but don't seem to be incorporated auto-magically.
3. `gasnet_ibv` and `gasnet_ucx` don't seem to consider the current Slurm job's memory when deciding on segment size. Rather, they seem to grab for all the physical memory on the node, which will sometimes trip an OOM in `srun`, but mostly causes a silent SIGKILL, even with GASNet tracing enabled.

(1) is reasonably documented already. Basically, don't use the `slurm-`-prefixed launcher if there's not a reasonable chance you can hop right onto your node(s), or use `CHPL_LAUNCHER_USE_SBATCH` to have it generate a batch script for you. (I haven't tried the latter, as I am mixing both Chapel and non-Chapel workloads in the same batch.)

(2) is a stumbling block for folks with less Slurm experience. Slurm's environment variables can get non-`--exclusive` jobs working out of the box today (see the sketch below). My preference would be to have the Chapel wrapper apply the overlap flags automatically, but I can see a case to be made that this is within the scope of the user's responsibility, in which case some documentation might save folks like me some time.

(3) seems like a bug. The communications layer shouldn't be grabbing for more memory than the Slurm job has available; I don't know offhand whether Chapel or GASNet should enforce that. There is already some reference to `GASNET_PHYSMEM_MAX` in the InfiniBand documentation, but it doesn't include the notion of having effective access to less than the whole node's RAM. IIRC, when I tried passing it (but not exporting it), it didn't pass from `myProgram` to `myProgram_real`, or else otherwise didn't affect the SIGKILL. However, manually setting `GASNET_MAX_SEGSIZE` to some value within my Slurm job's allocation did get me running again.

Edit: on a fresh build, `GASNET_PHYSMEM_MAX` seems sufficient to prevent the SIGKILL.

Trying to summarize some live debugging that happened over the last weeks on Gitter. Please correct any misunderstandings or misinterpretations on my part!
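For concreteness, here's a hedged sketch of the workarounds for (2) and (3). The flag and variable names (`--oversubscribe`, `--overlap`, `GASNET_PHYSMEM_MAX`, `GASNET_MAX_SEGSIZE`) come from the discussion above; the sizes, core counts, program name, and the `SLURM_OVERLAP` shortcut are placeholders and assumptions, not a vetted recipe:

```bash
# Non-exclusive interactive allocation on part of one node; --oversubscribe lets the
# allocation share the node with other jobs.
salloc --nodes=1 --ntasks=1 --cpus-per-task=16 --mem=64G --oversubscribe

# Inside the allocation: cap GASNet's notion of usable physical memory so the segment
# is sized against the job's --mem rather than the whole node's RAM.
export GASNET_PHYSMEM_MAX=56GB     # placeholder value; keep it under the job's --mem
# export GASNET_MAX_SEGSIZE=32GB   # the other knob that also got things running

# Let Chapel's inner srun overlap the resources already granted to this job instead
# of waiting forever for a new allocation (SLURM_OVERLAP=1 mirrors srun --overlap).
export SLURM_OVERLAP=1
./myProgram -nl 1
```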
Steps to reproduce:
2) Try to launch a multi-locale program (even with `-nL 1`) within a non-exclusive Slurm interactive job, without oversubscribe or overlap flags. Doesn't seem to matter whether you use ssh, pmi, or mpi as the spawner.

3) Try to run a multi-locale program (even with `-nL 1`) within a non-exclusive Slurm job, where the `--mem` Slurm flag is some fraction of the node's physical memory. Apply the `GASNET_VERBOSEENV=1` environment variable and look at the value of `GASNET_MAX_SEGSIZE`.
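A hedged sketch of what step 3 might look like in practice (the sizes and program name are placeholders):

```bash
# Placeholder allocation with only a fraction of the node's physical memory.
salloc --nodes=1 --ntasks=1 --cpus-per-task=16 --mem=64G

# GASNET_VERBOSEENV=1 makes GASNet print the settings it resolved at startup; compare
# the reported GASNET_MAX_SEGSIZE against the job's --mem.
GASNET_VERBOSEENV=1 ./myProgram -nl 1
```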