A snap package for the Slurm workload manager - a highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm is used by many of the world's supercomputers and computer clusters.
This snap is currently experimental. If you're interested in the development of this snap - and want to contribute - please reach out to me on the Ubuntu HPC Matrix space.
Slurm is a workload manager designed for supercomputers. It is scalable, highly configurable, and supports scheduling for generic computational resources such as GPUs. This snap package includes:
munged
: The daemon responsible for authenticating local MUNGE clients and servicing their credential encode & decode requests.

slurmctld
: The central management daemon of Slurm.

slurmd
: The compute node daemon of Slurm.

slurmdbd
: The Slurm database daemon. Provides an interface to a database for Slurm.

slurmrestd
: The Slurm REST API daemon. Provides an interface to Slurm via a REST API.

- The command line applications used to interface with both Slurm and MUNGE.
- A configuration API for dynamically configuring various Slurm functionality.
Slurm is a highly configurable workload manager. The Slurm snap can be configured using snap set slurm <option>=<value>. The sections below list all the possible configuration options you can set on the snap.
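For example, to set an option and then read it back (the value shown is purely illustrative):

```shell
# Set a configuration option on the snap (illustrative value).
sudo snap set slurm cluster-name=demo-cluster

# Read the option back to confirm it was applied.
sudo snap get slurm cluster-name
```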
Configuration options related to the Slurm workload manager. Please refer to the parameters section of the slurm.conf configuration file manual page for an in-depth description of each configuration option and how it modifies the behaviour of Slurm. A usage sketch follows the list.
accounting-storage-backup-host
accounting-storage-enforce
accounting-storage-external-host
accounting-storage-host
accounting-storage-parameters
accounting-storage-pass
accounting-storage-port
accounting-storage-tres
accounting-storage-type
accounting-storage-user
accounting-store-flags
acct-gather-energy-type
acct-gather-filesystem-type
acct-gather-interconnect-type
acct-gather-node-freq
acct-gather-profile-type
allow-spec-resources-usage
auth-alt-parameters
auth-alt-types
auth-info
auth-type
batch-start-timeout
bcast-exclude
bcast-parameters
burst-buffer-type
cli-filter-plugins
cluster-name
communication-parameters
complete-wait
core-spec-plugin
cpu-freq-def
cpu-freq-governors
cred-type
debug-flags
def-cpu-per-gpu
def-mem-per-cpu
def-mem-per-gpu
def-mem-per-node
dependency-parameters
disable-root-jobs
eio-timeout
enforce-part-limits
epilog
epilog-msg-time
epilog-slurmctld
ext-sensors-freq
ext-sensors-type
fair-share-dampening-factor
federation-parameters
first-job-id
get-env-timeout
gpu-freq-def
gres-types
group-update-force
group-update-time
health-check-interval
health-check-node-state
health-check-program
inactive-limit
include
interactive-step-options
job-acct-gather-frequency
job-acct-gather-params
job-acct-gather-type
job-comp-host
job-comp-loc
job-comp-params
job-comp-pass
job-comp-port
job-comp-type
job-comp-user
job-container-type
job-file-append
job-requeue
job-submit-plugins
kill-on-bad-exit
kill-wait
launch-parameters
licenses
log-time-format
mail-domain
mail-prog
max-array-size
max-batch-requeue
max-job-count
max-job-id
max-mem-per-cpu
max-mem-per-node
max-node-count
max-step-count
max-tasks-per-node
mcs-parameters
mcs-plugin
message-timeout
min-job-age
mpi-default
mpi-params
node-features-plugins
over-time-limit
plug-stack-config
plugin-dir
power-parameters
power-plugin
preempt-exempt-time
preempt-mode
preempt-parameters
preempt-type
prep-parameters
prep-plugins
priority-calc-period
priority-decay-half-life
priority-favor-small
priority-flags
priority-max-age
priority-parameters
priority-site-factor-parameters
priority-site-factor-plugin
priority-type
priority-usage-reset-period
priority-weight-age
priority-weight-assoc
priority-weight-fair-share
priority-weight-job-size
priority-weight-partition
priority-weight-qos
priority-weight-tres
private-data
proctrack-type
prolog
prolog-epilog-timeout
prolog-flags
prolog-slurmctld
propagate-prio-process
propagate-resource-limits
propagate-resource-limits-except
reboot-program
reconfig-flags
requeue-exit
requeue-exit-hold
resume-fail-program
resume-program
resume-rate
resume-timeout
resv-epilog
resv-over-run
resv-prolog
return-to-service
route-plugin
scheduler-parameters
scheduler-time-slice
scheduler-type
scron-parameters
select-type
select-type-parameters
slurm-sched-log-file
slurm-sched-log-level
slurm-user
slurmctld-addr
slurmctld-debug
slurmctld-host
slurmctld-log-file
slurmctld-parameters
slurmctld-pid-file
slurmctld-port
slurmctld-primary-off-prog
slurmctld-primary-on-prog
slurmctld-syslog-debug
slurmctld-timeout
slurmd-debug
slurmd-log-file
slurmd-parameters
slurmd-pid-file
slurmd-port
slurmd-spool-dir
slurmd-syslog-debug
slurmd-timeout
slurmd-user
srun-epilog
srun-port-range
srun-prolog
state-save-location
suspend-exc-nodes
suspend-exc-parts
suspend-exc-states
suspend-program
suspend-rate
suspend-time
suspend-timeout
switch-parameters
switch-type
task-epilog
task-plugin
task-plugin-param
task-prolog
tcp-timeout
tmp-fs
topology-param
topology-plugin
track-wc-key
tree-width
unkillable-step-program
unkillable-step-timeout
use-pam
vsize-factor
wait-time
x11-parameters
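Each option above maps one-to-one onto a snap configuration key, and several keys can be set in a single command. A minimal sketch, with placeholder values rather than recommendations:

```shell
# Illustrative values only: set two scheduler-related options at once.
sudo snap set slurm scheduler-type=sched/backfill max-job-count=20000
```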
Configuration options related to compute nodes. Please refer to the node configuration section of the slurm.conf configuration file manual page for an in-depth description of each configuration option and how it modifies the behaviour of compute nodes. An example follows the list.
nodes.<node-name>.bcast-addr
nodes.<node-name>.boards
nodes.<node-name>.core-spec-count
nodes.<node-name>.cores-per-socket
nodes.<node-name>.cpu-bind
nodes.<node-name>.cpu-spec-list
nodes.<node-name>.cpus
nodes.<node-name>.features
nodes.<node-name>.gres
nodes.<node-name>.mem-spec-limit
nodes.<node-name>.node-addr
nodes.<node-name>.node-hostname
nodes.<node-name>.node-name
nodes.<node-name>.port
nodes.<node-name>.procs
nodes.<node-name>.real-memory
nodes.<node-name>.reason
nodes.<node-name>.sockets
nodes.<node-name>.sockets-per-board
nodes.<node-name>.state
nodes.<node-name>.threads-per-core
nodes.<node-name>.tmp-disk
nodes.<node-name>.weight
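For instance, to describe a compute node (the node name compute-0 and the hardware values are hypothetical):

```shell
# Hypothetical node definition for a node named "compute-0".
sudo snap set slurm \
  nodes.compute-0.cpus=8 \
  nodes.compute-0.real-memory=16000 \
  nodes.compute-0.state=UNKNOWN
```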
Configuration options related to frontend compute nodes. Please refer to the frontend node configuration section of the slurm.conf configuration file manual page for an in-depth description of each configuration option and how it modifies the behaviour of frontend compute nodes.
frontend-nodes.<frontend-node-name>.allow-groups
frontend-nodes.<frontend-node-name>.allow-users
frontend-nodes.<frontend-node-name>.deny-groups
frontend-nodes.<frontend-node-name>.deny-users
frontend-nodes.<frontend-node-name>.frontend-addr
frontend-nodes.<frontend-node-name>.frontend-name
frontend-nodes.<frontend-node-name>.port
frontend-nodes.<frontend-node-name>.reason
frontend-nodes.<frontend-node-name>.state
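As a sketch, a frontend node entry might look like the following (the name frontend-0 and the address are hypothetical):

```shell
# Hypothetical frontend node entry.
sudo snap set slurm frontend-nodes.frontend-0.frontend-addr=10.0.0.5
```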
Configuration options related to down compute nodes. Please refer to the down node configuration section of the slurm.conf configuration file manual page for an in-depth description of each configuration option and how it modifies the behaviour of compute nodes.
down-nodes.nodes
down-nodes.reason
down-nodes.state
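For example, to record a node as down (the node name and reason are hypothetical):

```shell
# Hypothetical down-node record.
sudo snap set slurm down-nodes.nodes=compute-0 down-nodes.state=DOWN down-nodes.reason=maintenance
```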
Configuration options related to node sets. Please refer to the nodeset configuration section of the slurm.conf configuration file manual page for an in-depth description of each configuration option and how it modifies the behaviour of node sets.
node-sets.<node-set>.feature
node-sets.<node-set>.nodes
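For example, to group nodes that share a feature into a set (the set name and feature are hypothetical):

```shell
# Hypothetical node set selecting nodes that advertise the "gpu" feature.
sudo snap set slurm node-sets.gpu.feature=gpu
```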
Configuration options related to partitions. Please refer to the partition configuration section of the slurm.conf configuration file manual page for an in-depth description of each configuration option and how it modifies the behaviour of partitions. An example follows the list.
partitions.<partition-name>.alloc-nodes
partitions.<partition-name>.allow-accounts
partitions.<partition-name>.allow-groups
partitions.<partition-name>.allow-qos
partitions.<partition-name>.alternate
partitions.<partition-name>.cpu-bind
partitions.<partition-name>.def-cpu-per-gpu
partitions.<partition-name>.def-mem-per-cpu
partitions.<partition-name>.def-mem-per-gpu
partitions.<partition-name>.def-mem-per-node
partitions.<partition-name>.default
partitions.<partition-name>.default-time
partitions.<partition-name>.deny-accounts
partitions.<partition-name>.deny-qos
partitions.<partition-name>.disable-root-jobs
partitions.<partition-name>.exclusive-user
partitions.<partition-name>.grace-time
partitions.<partition-name>.hidden
partitions.<partition-name>.lln
partitions.<partition-name>.max-cpus-per-node
partitions.<partition-name>.max-cpus-per-socket
partitions.<partition-name>.max-mem-per-cpu
partitions.<partition-name>.max-mem-per-node
partitions.<partition-name>.max-nodes
partitions.<partition-name>.max-time
partitions.<partition-name>.min-nodes
partitions.<partition-name>.nodes
partitions.<partition-name>.over-subscribe
partitions.<partition-name>.over-time-limit
partitions.<partition-name>.partition-name
partitions.<partition-name>.power-down-on-idle
partitions.<partition-name>.preempt-mode
partitions.<partition-name>.priority-job-factor
partitions.<partition-name>.priority-tier
partitions.<partition-name>.qos
partitions.<partition-name>.req-resv
partitions.<partition-name>.resume-timeout
partitions.<partition-name>.root-only
partitions.<partition-name>.select-type-parameters
partitions.<partition-name>.state
partitions.<partition-name>.suspend-time
partitions.<partition-name>.suspend-timeout
partitions.<partition-name>.tres-billing-weights
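For instance, a minimal partition definition (the partition name batch, the node name, and the limits are hypothetical):

```shell
# Hypothetical partition spanning the node "compute-0" with a 720-minute time limit.
sudo snap set slurm \
  partitions.batch.nodes=compute-0 \
  partitions.batch.max-time=720 \
  partitions.batch.state=UP
```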
munge.key
- Set the munge.key secret used by the munged daemon to authenticate hosts.
munge.max-thread-count
- Set the maximum number of threads that munged can spawn for processing authentication requests.
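For example (the thread count is an arbitrary placeholder):

```shell
# Illustrative value: allow munged to spawn up to 16 worker threads.
sudo snap set slurm munge.max-thread-count=16
```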
slurmd.config-server
- Set the configuration server for slurmd. Required when running slurmd in configless mode. The daemon will download the slurm.conf configuration file from the primary control server.
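For example, to run slurmd in configless mode against a control server (the hostname below is a placeholder; 6817 is slurmctld's default port):

```shell
# Hypothetical control server; slurmd will fetch slurm.conf from it.
sudo snap set slurm slurmd.config-server=slurmctld-0:6817
```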
Configuration options related to the slurmdbd daemon. Please refer to the slurmdbd.conf configuration file manual page for an in-depth description of each configuration option and how it modifies the behaviour of the slurmdbd daemon. An example follows the list.
slurmdbd.archive-dir
slurmdbd.archive-events
slurmdbd.archive-jobs
slurmdbd.archive-resvs
slurmdbd.archive-script
slurmdbd.archive-steps
slurmdbd.archive-suspend
slurmdbd.archive-txn
slurmdbd.archive-usage
slurmdbd.auth-alt-parameters
slurmdbd.auth-alt-types
slurmdbd.auth-info
slurmdbd.auth-type
slurmdbd.commit-delay
slurmdbd.communication-parameters
slurmdbd.dbd-addr
slurmdbd.dbd-backup-host
slurmdbd.dbd-host
slurmdbd.dbd-port
slurmdbd.debug-flags
slurmdbd.debug-level
slurmdbd.debug-level-syslog
slurmdbd.default-qos
slurmdbd.log-file
slurmdbd.log-time-format
slurmdbd.max-query-time-range
slurmdbd.message-timeout
slurmdbd.parameters
slurmdbd.pid-file
slurmdbd.plugin-dir
slurmdbd.private-data
slurmdbd.purge-event-after
slurmdbd.purge-job-after
slurmdbd.purge-resv-after
slurmdbd.purge-step-after
slurmdbd.purge-suspend-after
slurmdbd.purge-txn-after
slurmdbd.purge-usage-after
slurmdbd.slurm-user
slurmdbd.storage-backup-host
slurmdbd.storage-host
slurmdbd.storage-loc
slurmdbd.storage-parameters
slurmdbd.storage-pass
slurmdbd.storage-port
slurmdbd.storage-type
slurmdbd.storage-user
slurmdbd.tcp-timeout
slurmdbd.track-slurmctld-down
slurmdbd.track-wc-key
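For instance, to point slurmdbd at its accounting database (the hostnames and credentials below are placeholders):

```shell
# Hypothetical database backend settings for slurmdbd.
sudo snap set slurm \
  slurmdbd.storage-type=accounting_storage/mysql \
  slurmdbd.storage-host=db-0 \
  slurmdbd.storage-user=slurm
```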
slurmrestd.max-connections
- Set the maximum number of connections to process at one time.
slurmrestd.max-thread-count
- Set the maximum number of threads to spawn for processing client connections.
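For example (both limits are arbitrary placeholders):

```shell
# Illustrative limits for the Slurm REST API daemon.
sudo snap set slurm slurmrestd.max-connections=124 slurmrestd.max-thread-count=20
```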
Want to build and test the Slurm snap locally without pulling from the Snap Store? Want to bundle in your own custom Slurm plugins? Use the following commands to build and install the Slurm snap on your system. These instructions assume that you are running on a Linux distribution that supports installing snap packages. Please see this page for a list of Linux distributions that support using snap packages.
```shell
git clone git@github.com:NucciTheBoss/slurm-snap.git
cd slurm-snap
sudo snap install lxd
sudo lxd init --minimal
sudo snap install snapcraft --classic
snapcraft
sudo snap install ./slurm*.snap --dangerous --classic
```
The Slurm workload manager snap package is a project of the Ubuntu HPC community. It is an open source project that welcomes community involvement, contributions, suggestions, fixes, and constructive feedback. Interested in being involved with the development of this snap package? Check out the links below:
The Slurm snap is free software, distributed under the Apache Software License, version 2.0. See LICENSE for more information. The Slurm workload manager itself is licensed under the GNU General Public License, version 2, or any later version. See Slurm's legal notice for further licensing information about Slurm itself.