mattaezell
The Slurm srun --external-launcher option, added in 23.11, allows users to start a daemon that has access to all the resources without actually consuming any of them. It was intended for process managers like hydra, orted, or flux-as-a-step-manager to launch tasks, but I think we can use it here also.

The configury needs more work, but I wanted to get some feedback before spending time on that. Nothing older than 23.11 is considered "supported" anymore. Does it make sense to completely remove the BROKEN_SRUN construct at this point?

ezy@borg005:~/mpi_hello> time spindle srun -n2240 -N40 ./mpihi >/dev/null

real    0m6.724s
user    0m0.021s
sys     0m0.054s
ezy@borg005:~/mpi_hello> time srun -n2240 -N40 ./mpihi >/dev/null

real    0m11.810s
user    0m0.015s
sys     0m0.044s
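
If older Slurm releases still had to be handled instead of dropped, the 23.11 cutoff mentioned above could be tested at run time. The following is only a minimal sketch: the function name `slurm_supports_external_launcher` is hypothetical (it is not part of Spindle or Slurm), and it assumes the version string looks like `slurm 23.11.4` or `23.11.4`.

```shell
#!/bin/sh
# Hypothetical helper, not part of Spindle or Slurm.
# Succeeds (exit 0) when the given version string is 23.11 or newer, the
# release in which srun gained --external-launcher. Returns 2 on input it
# cannot parse. Assumes a "MAJOR.MINOR[.PATCH]" version, optionally
# prefixed with "slurm ".
slurm_supports_external_launcher() {
    ver=${1#slurm }                 # drop a leading "slurm " if present
    case $ver in
        [0-9]*.[0-9]*) ;;           # looks like a dotted version
        *) return 2 ;;              # anything else is rejected
    esac
    major=${ver%%.*}                # "23.11.4" -> "23"
    rest=${ver#*.}                  # "23.11.4" -> "11.4"
    minor=${rest%%.*}               # "11.4"    -> "11"
    case $major$minor in
        *[!0-9]*) return 2 ;;       # non-numeric component
    esac
    # test(1) compares these as decimal integers, so "02" or "08" is fine.
    [ "$major" -gt 23 ] || { [ "$major" -eq 23 ] && [ "${minor:-0}" -ge 11 ]; }
}
```

A launcher script could pass the output of `srun --version` to this function and add `--external-launcher` to its srun invocation only when the check succeeds, falling back to the existing path otherwise.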

@mplegendre
Member

I hadn't been aware of the --external-launcher option before, but a quick scan of the man page is promising. That is the kind of thing we're trying to do when we launch spindle_be. Will investigate further.
