
Commit 41ebbfd

[Slurm] Redesign the Slurm simple commands
- Focus the section on the synchrony of sbatch (non-blocking) and salloc (blocking).
- Explain how the blocking launcher (salloc) is used to launch interactive jobs.
- Explain how srun can be used to create jobs implicitly.
1 parent 88f2f16 commit 41ebbfd

3 files changed: +142 −76 lines changed


docs/slurm/commands.md

Lines changed: 78 additions & 45 deletions
@@ -1,69 +1,102 @@
1-
# Main Slurm Commands
1+
# Main Slurm commands
22

3-
## Submit Jobs
3+
## Submitting jobs
44

55
<!--submit-start-->
66

7-
There are three ways of submitting jobs with slurm, using either [`sbatch`](https://slurm.schedmd.com/sbatch.html), [`srun`](https://slurm.schedmd.com/srun.html) or [`salloc`](https://slurm.schedmd.com/salloc.html):
7+
Jobs in the [Slurm scheduler](/slurm/) are executed in batch or interactive mode. Batch jobs are executed asynchronously in the background, whereas interactive jobs allow the user to issue commands directly in a shell session. In both cases, users must request resources for their job, including a finite amount of time for which they can occupy the compute resources.
88

9-
=== "sbatch (passive job)"
10-
```bash
11-
### /!\ Adapt <partition>, <qos>, <account> and <command> accordingly
12-
sbatch -p <partition> [--qos <qos>] [-A <account>] [...] <path/to/launcher.sh>
13-
```
14-
=== "srun (interactive job)"
9+
The batch launcher script may contain `srun` commands to launch [job steps](/slurm/job_steps/). The job steps can run in sequence or in parallel, given that enough resources are available in the job allocation or that resources can be shared. Access to resources such as nodes, memory, and accelerator devices can be requested with appropriate [partition](/partitions/) and [constraint]() options.
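For illustration, a minimal launcher script could look like the following sketch; the job name, resource amounts, and executables are placeholders to adapt.

```bash
#!/bin/bash -l
#SBATCH --job-name=example        # placeholder job name
#SBATCH --partition=batch
#SBATCH --qos=normal
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=01:00:00

# Each srun invocation launches a job step inside the job allocation;
# here the steps run in sequence.
srun --ntasks=4 ./preprocess_data   # placeholder executable
srun --ntasks=4 ./run_simulation    # placeholder executable
```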
10+
11+
### Executing a job in batch mode with `sbatch`
12+
13+
<!--sbatch-start-->
14+
15+
_Batch job scripts_ are submitted to the scheduler with the [`sbatch`](https://slurm.schedmd.com/sbatch.html) command.
16+
17+
- The command adds a resource allocation request to the scheduler job queue together with a _copy_ of the job launcher script to execute in the allocation. The command then exits.
18+
- When the requested resources are available, a job is launched and the job script is executed in the first node of the allocated resources.
19+
- The job allocation is freed when the job script finishes or the allocation times out.
20+
21+
The execution of the job script is thus asynchronous to the execution of the `sbatch` command.
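For instance, after the command returns you can monitor the job while it waits in the queue and runs in the background; a sketch, where the launcher path and job ID are placeholders:

```bash
sbatch <path/to/launcher_script.sh>   # returns immediately, printing the job ID
squeue -u $USER                       # the job shows up as pending (PD) or running (R)
cat slurm-<jobid>.out                 # default output file, written as the job runs
```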
22+
23+
!!! info "Typical `sbatch` (batch job) options"
24+
To submit a bash job script to be executed asynchronously by the scheduler, use the following `sbatch` command.
1525
```bash
16-
### /!\ Adapt <partition>, <qos>, <account> and <command> accordingly
17-
srun -p <partition> [--qos <qos>] [-A <account>] [...] ---pty bash
26+
sbatch --partition=<partition> [--qos=<qos>] [--account=<account>] [...] <path/to/launcher_script.sh>
1827
```
19-
`srun` is also to be using within your launcher script to initiate a _job step_.
28+
Upon job submission, Slurm prints a message with the job's ID; the job ID is used to identify this job in all Slurm interactions.
2029

21-
=== "salloc (request allocation/interactive job)"
22-
```bash
23-
# Request interactive jobs/allocations
24-
### /!\ Adapt <partition>, <qos>, <account> and <command> accordingly
25-
salloc -p <partition> [--qos <qos>] [-A <account>] [...] <command>
30+
!!! warning "Accessing script from a submission script"
31+
If you reference any other script or program from the submission script, ensure that the referenced file is accessible.
32+
33+
- Use the full path to the file referenced.
34+
- Ensure that the file is stored in a networked file system and is accessible from every node, as in the sketch below.
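For instance, a launcher script could call a helper script through its full path on a shared file system; a sketch, where the helper location is hypothetical:

```bash
#!/bin/bash -l
#SBATCH --partition=batch
#SBATCH --time=00:10:00

# Reference helpers by their full path on a networked file system (e.g. the
# home directory) so that they are accessible from the allocated node(s).
srun bash "${HOME}/bin/prepare_input.sh"   # hypothetical helper script
```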
35+
36+
!!! example "Example job submission"
37+
```console
38+
$ sbatch <path/to/launcher_script.sh>
39+
Submitted batch job 864933
2640
```
41+
<!--sbatch-end-->
2742

28-
### `sbatch`
43+
### Executing a job in interactive mode with `salloc`
2944

30-
<!--sbatch-start-->
45+
_Interactive jobs_ are launched with the [`salloc`](https://slurm.schedmd.com/salloc.html) command.
3146

32-
[`sbatch`](https://slurm.schedmd.com/sbatch.html) is used to submit a batch _launcher script_ for later execution, corresponding to _batch/passive submission mode_.
33-
The script will typically contain one or more `srun` commands to launch parallel tasks.
34-
Upon submission with `sbatch`, Slurm will:
47+
- The command submits a resource allocation request to the scheduler job queue, and blocks until the resources are available.
48+
- When the requested resources are available, a job is launched and a command is executed in the first node of the allocated resources.
49+
- The allocation is freed when the interactive session terminates with an `exit` command, or the allocation times out.
3550

36-
* allocate resources (nodes, tasks, partition, constraints, etc.)
37-
* runs a single **copy** of the batch script on the _first_ allocated node
38-
- in particular, if you depend on other scripts, ensure you have refer to them with the _complete_ path toward them.
51+
The main difference between `salloc` and `sbatch` is that `salloc` runs for the whole runtime of the command executed in the allocation; that is, `salloc` is a blocking version of `sbatch`.
3952

40-
When you submit the job, Slurm responds with the job's ID, which will be used to identify this job in reports from Slurm.
53+
!!! info "Typical `salloc` (interactive job) options"
54+
To start an interactive job, use the following `salloc` command.
55+
```bash
56+
salloc --partition=<partition> [--qos=<qos>] [--account=<account>] [--x11] [...] [<command>]
57+
```
58+
- The `salloc` command will block until the requested resources are available, and it will then launch the `<command>` in the first node of the allocation.
59+
- The `<command>` argument is optional; if no command is provided, then the behavior of `salloc` depends on the Slurm configuration. Our site's Slurm is configured to launch an interactive shell when no `<command>` is provided.
60+
Upon job submission, Slurm prints a message with the job's ID; the job ID is used to identify this job in all Slurm interactions.
4161

42-
```bash
43-
# /!\ ADAPT path to launcher accordingly
44-
$ sbatch <path/to/launcher>.sh
45-
Submitted batch job 864933
46-
```
47-
<!--sbatch-end-->
62+
!!! example "Example interactive job submission"
63+
```console
64+
$ salloc --partition=batch --qos=normal --nodes=1 --time=8:00:00
65+
salloc: Granted job allocation 9805184
66+
salloc: Nodes aion-0207 are ready for job
67+
```
4868

49-
### `srun`
69+
??? info "Configuring the default behavior of `salloc`"
5070

51-
[`srun`](https://slurm.schedmd.com/srun.html) is used to initiate parallel _job steps within a job_ **OR** to _start an interactive job_
52-
Upon submission with `srun`, Slurm will:
71+
The [LaunchParameters](https://slurm.schedmd.com/slurm.conf.html#OPT_LaunchParameters) option of the Slurm configuration ([`slurm.conf`](https://slurm.schedmd.com/slurm.conf.html)) is a comma-separated list of options for the job launch plugin. The `use_interactive_step` option makes `salloc` launch a shell on the first node of the allocation; otherwise `salloc` launches a shell locally, on the machine where it was invoked.
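For illustration, the corresponding `slurm.conf` settings could look as follows; this is a sketch, not necessarily our exact site configuration.

```
LaunchParameters=use_interactive_step
InteractiveStepOptions="--interactive --preserve-env --pty $SHELL"
```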
5372

54-
* (_eventually_) allocate resources (nodes, tasks, partition, constraints, etc.) when run for _interactive_ submission
55-
* launch a job step that will execute on the allocated resources.
73+
The [InteractiveStepOptions](https://slurm.schedmd.com/slurm.conf.html#OPT_InteractiveStepOptions) option of the Slurm configuration determines the command run by `salloc` when `use_interactive_step` is included in LaunchParameters. The default value is
74+
```
75+
--interactive --preserve-env --pty $SHELL
76+
```
77+
where `--interactive` creates an "interactive step" that will not consume resources so that other job steps may run in parallel with the interactive step running the shell. The [`--pty` option](https://slurm.schedmd.com/srun.html#OPT_pty) is required when creating an implicit reservation for an interactive shell.
5678

57-
A job can contain multiple job steps executing sequentially
58-
or in parallel on independent or shared resources within the job's
59-
node allocation.
79+
Note that `--interactive` is an internal option and is not meant to be used outside setting the InteractiveStepOptions.
6080

61-
### salloc
81+
To create an allocation without launching any command, use the `--no-shell` option. Then `salloc` exits immediately after allocating the job resources, without running a command. Job steps can still be launched in the job allocation using the `srun` command with the `--jobid=<job allocation id>` option.
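For example, a job allocation can be created with `--no-shell` and used later through its job ID; a sketch with placeholder values:

```bash
salloc --no-shell --partition=<partition> --nodes=1 --time=1:00:00   # exits as soon as the allocation is granted
srun --jobid=<job allocation id> hostname                            # runs a job step inside that allocation
```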
6282

63-
[`salloc`](https://slurm.schedmd.com/salloc.html) is used to _allocate_ resources for a job
64-
in real time. Typically this is used to allocate resources (nodes, tasks, partition, etc.) and spawn a
65-
shell. The shell is then used to execute srun commands to launch
66-
parallel tasks.
83+
### Implicit job creation with `srun`
84+
85+
The [`srun`](https://slurm.schedmd.com/srun.html) command is used to initiate parallel job steps within a job allocation. However, if `srun` is invoked outside an allocation, then
86+
87+
- `srun` automatically allocates a job in a blocking manner similar to `salloc`, and
88+
- when the requested resources become available, it launches a single job step to run the provided command.
89+
90+
??? info "Lunching job with `srun`"
91+
To create an implicit job allocation and launch a job step with `srun`, provide the options usually required by `salloc` or `sbatch`.
92+
```bash
93+
srun --partition=<partition> [--qos=<qos>] [--account=<account>] [...] <command>
94+
```
95+
To launch an interactive shell in the allocation, use the [`--pty` option](https://slurm.schedmd.com/srun.html#OPT_pty).
96+
```bash
97+
srun --partition=<partition> [--qos=<qos>] [--account=<account>] [...] --pty bash --login
98+
```
99+
The `--pty` option instructs `srun` to execute the command in [terminal mode](https://en.wikipedia.org/wiki/Terminal_mode) in a [pseudoterminal](https://en.wikipedia.org/wiki/Pseudoterminal), so you can interact with bash as if it had been launched in your terminal.
67100

68101
<!--submit-end-->
69102

docs/slurm/index.md

Lines changed: 61 additions & 31 deletions
@@ -1,53 +1,83 @@
11
# Slurm Resource and Job Management System
22

3-
ULHPC uses [Slurm](https://slurm.schedmd.com/) (_Simple Linux Utility for Resource Management_) for cluster/resource management and job scheduling.
4-
This middleware is responsible for allocating resources to users, providing a framework for starting, executing and monitoring work on allocated resources and scheduling work for future execution.
3+
UL HPC uses [Slurm](https://slurm.schedmd.com/) (formerly an acronym for _Simple Linux Utility for Resource Management_) for cluster and workload management. The Slurm scheduler performs three main functions:
4+
5+
- allocates access to [resources](#jobs-and-resources) for fixed time intervals,
6+
- provides a framework for starting, executing, and monitoring work on allocated resources, and
7+
- maintains a priority queue that schedules and regulates access to resources.
58

69
[:fontawesome-solid-right-to-bracket: Official docs](https://slurm.schedmd.com/documentation.html){: .md-button .md-button--link }
710
[:fontawesome-solid-right-to-bracket: Official FAQ](https://slurm.schedmd.com/faq.html){: .md-button .md-button--link }
811
[:fontawesome-solid-right-to-bracket: ULHPC Tutorial/Getting Started](https://ulhpc-tutorials.readthedocs.io/en/latest/beginners/){: .md-button .md-button--link }
912

10-
[![](https://hpc-docs.uni.lu/slurm/images/2022-ULHPC-user-guide.png)](https://hpc-docs.uni.lu/slurm/2022-ULHPC-user-guide.pdf)
11-
12-
!!! important "IEEE ISPDC22: ULHPC Slurm 2.0"
13+
??? info "IEEE ISPDC22: ULHPC Slurm 2.0"
1314
If you want more details on the RJMS optimizations performed upon Aion acquisition, check out our [IEEE ISPDC22](https://orbilu.uni.lu/handle/10993/51494) conference paper (21<sup>st</sup> IEEE Int. Symp. on Parallel and Distributed Computing) presented in Basel (Switzerland) on July 13, 2022.
15+
1416
> __IEEE Reference Format__ | [ORBilu entry](https://orbilu.uni.lu/handle/10993/51494) | [slides](https://hpc-docs.uni.lu/slurm/2022-07-13-IEEE-ISPDC22.pdf) <br/>
15-
> Sebastien Varrette, Emmanuel Kieffer, and Frederic Pinel, "Optimizing the Resource and Job Management System of an Academic HPC and Research Computing Facility". _In 21st IEEE Intl. Symp. on Parallel and Distributed Computing (ISPDC22)_, Basel, Switzerland, 2022.
17+
> Sebastien Varrette, Emmanuel Kieffer, and Frederic Pinel, "Optimizing the Resource and Job Management System of an Academic HPC and Research Computing Facility". _In 21st IEEE Intl. Symp. on Parallel and Distributed Computing (ISPDC'22)_, Basel, Switzerland, 2022.
1618

19+
[![](https://hpc-docs.uni.lu/slurm/images/2022-ULHPC-user-guide.png)](https://hpc-docs.uni.lu/slurm/2022-ULHPC-user-guide.pdf)
1720

18-
## TL;DR Slurm on ULHPC clusters
21+
## Overview of the Slurm configuration on UL HPC clusters
1922

2023
<!--tldr-start-->
2124

22-
In its concise form, the Slurm configuration in place on [ULHPC
23-
supercomputers](../systems/index.md) features the following attributes you
24-
should be aware of when interacting with it:
25-
26-
* Predefined [_Queues/Partitions_](../slurm/partitions.md) depending on node type
27-
- `batch` (Default Dual-CPU nodes) _Max_: 64 nodes, 2 days walltime
28-
- `gpu` (GPU nodes nodes) _Max_: 4 nodes, 2 days walltime
29-
- `bigmem` (Large-Memory nodes) _Max_: 1 node, 2 days walltime
30-
- In addition: `interactive` (for quicks tests) _Max_: 2 nodes, 2h walltime
31-
* for code development, testing, and debugging
32-
* Queue Policy: _[cross-partition QOS](../slurm/qos.md)_, mainly tied to _priority level_ (`low` $\rightarrow$ `urgent`)
33-
- `long` QOS with extended Max walltime (`MaxWall`) set to **14 days**
34-
- special _preemptible QOS_ for [best-effort](/jobs/best-effort.md') jobs: `besteffort`.
35-
* [Accounts hierarchy](../slurm/accounts.md) associated to supervisors (multiple
36-
associations possible), projects or trainings
37-
- you **MUST** use the proper account as a [detailed usage
38-
tracking](../policies/usage-charging.md) is performed and reported.
39-
* [Slurm Federation configuration](https://slurm.schedmd.com/federation.html) between `iris` and `aion`
40-
- ensures global policy (coherent job ID, global scheduling, etc.) within ULHPC systems
41-
- easily submit jobs from one cluster to another using `-M, --cluster aion|iris`
25+
The main Slurm configuration options that affect the resources available for jobs on [UL HPC systems](/systems/) are the following.
26+
27+
- [__Queues/Partitions__](/slurm/partitions) group nodes according to the set of hardware _features_ they implement.
28+
- `batch`: default dual-CPU nodes. Limited to _max_:
29+
- 64 nodes, and
30+
- 2 days walltime.
31+
- `gpu`: GPU nodes. Limited to _max_:
32+
- 4 nodes, and
33+
- 2 days walltime.
34+
- `bigmem`: large-memory nodes. Limited to _max_:
35+
- 1 node, and
36+
- 2 days walltime.
37+
- `interactive`: _floating partition_ across all node types, allowing higher-priority allocations for quick tests. Best used in interactive allocations for code development, testing, and debugging. Limited to _max_:
38+
- 2 nodes, and
39+
- 2h walltime.
40+
- [__Queue policies/Quality of Service (QoS's)__](/slurm/qos) apply restrictions to resource access and modify job priority on top of (overriding) access restrictions and priority modifications applied by partitions.
41+
- _Cross-partition QoS's_ are tied to a priority level.
42+
- `low`: Priority 10 and _max_ 300 jobs per user.
43+
- `normal`: Priority 100 and _max_ 100 jobs per user.
44+
- `high`: Priority 200 and _max_ 50 jobs per user.
45+
- `urgent`: Priority 1000 and _max_ 20 jobs per user.
46+
- _Special QoS's_ control priority access to special hardware.
47+
- `iris-hopper`: Priority 100 and _max_ 100 jobs per user.
48+
- _Long_ QoS's have an extended max walltime (`MaxWall`) of _14 days_ and are defined per cluster/partition combination (`<cluster>-<partition>-long`).
49+
- `aion-batch-long`: _max_ 16 nodes and 8 jobs per user.
50+
- `iris-batch-long`: _max_ 16 nodes and 8 jobs per user.
51+
- `iris-gpu-long`: _max_ 2 nodes and 4 jobs per user.
52+
- `iris-bigmem-long`: _max_ 2 nodes and 4 jobs per user.
53+
- `iris-hopper-long`: _max_ 1 GPU and 100 jobs per user.
54+
- Special _preemptible QoS_ for [best-effort](/jobs/best-effort) jobs.
55+
- `besteffort`: jobs in the best-effort QoS can be interrupted by jobs in any other QoS. The processes running at the time of interruption are killed, so executables used in best-effort jobs require a [custom checkpoint-restart mechanism](https://docs.nersc.gov/development/checkpoint-restart/).
56+
- [__Accounts__](/slurm/accounts) organize user access to resources hierarchically. Accounts are associated with organizations (like faculties), supervisors (multiple associations are possible), and activities (like projects and trainings).
57+
- A default account is associated with all users affiliated with the University of Luxembourg.
58+
- Users not affiliated with the University of Luxembourg must have access to an account and specify the account association when allocating resources for a job.
59+
- Users must use the proper account as resource usage is [tracked](/policies/usage-charging) and reported.
60+
- [__Federated scheduling__](https://slurm.schedmd.com/federation.html) supports scheduling jobs across both `iris` and `aion`.
61+
- A global policy (coherent job ID, global scheduling, etc.) is enforced within all UL HPC systems.
62+
- Submission of jobs from one cluster to another is possible using the `-M, --cluster (aion|iris)` option, as in the sketch after this list.
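For instance, jobs on the sibling cluster can be submitted and monitored from either login node; a sketch, where the launcher path is a placeholder:

```bash
sbatch -M aion <path/to/launcher_script.sh>   # submit the job to the aion cluster
squeue -M iris -u $USER                       # list your jobs on the iris cluster
```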
63+
64+
??? info "Features, partitions, and floating partitions"
65+
_Features_ in Slurm are tags that correspond to hardware capabilities of nodes. For instance, the `volta` feature in UL HPC systems denotes that a node has GPUs of the Volta architecture.
66+
67+
_Partitions_ are collections of nodes that usually have a homogeneous set of features. For instance, all nodes of the GPU partition in UL HPC systems have GPUs of the Volta architecture. As a result, partitions tend to be mutually exclusive sets.
68+
69+
_Floating partitions_ contain nodes from multiple partitions. As a result, floating partitions have nodes with variable features. The `-C, --constraint` flag is available to filter nodes in floating partitions according to their features.
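For example, assuming the `interactive` floating partition contains nodes carrying the `volta` feature, such a node can be selected with a constraint; a sketch with placeholder resource values:

```bash
salloc --partition=interactive --constraint=volta --nodes=1 --time=0:30:00
```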
4270

4371
<!--tldr-end-->
4472

45-
For more details, see the appropriate pages in the left menu.
73+
## Jobs and resources
74+
75+
A _job_ is the minimal independent unit of work in a Slurm cluster.
76+
4677

47-
## Jobs
78+
In practice, a job is an allocation of resources, such as compute nodes, assigned to a user for a certain amount of time. Jobs can be _interactive_ or _passive_ (e.g., a batch script scheduled for later execution).
4879

49-
A **job** is an allocation of resources such as compute nodes assigned to a user for an certain amount of time.
50-
Jobs can be _interactive_ or _passive_ (e.g., a batch script) scheduled for later execution.
80+
The resources that the scheduler manages are physical entities like nodes, CPU cores, GPUs, and access to special devices, but also system resources like memory and I/O operations.
5181

5282
!!! question "What characterize a job?"
5383
A user's _jobs_ have the following key characteristics:

docs/slurm/job_steps.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
# Job steps
2+
3+
Job steps are processes launched within a job that consume the job's resources. Job steps are initiated with the `srun` command. The job steps in a job can also be executed in parallel, given that enough resources are available.
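As an illustration, the following launcher sketch runs two job steps in parallel inside one allocation; the executables are placeholders, and depending on the Slurm version the `--exact` option of `srun` may be needed so that the steps do not wait on each other's resources.

```bash
#!/bin/bash -l
#SBATCH --ntasks=4
#SBATCH --time=00:30:00

# Two job steps, each using half of the allocated tasks, run concurrently.
srun --ntasks=2 ./task_a &   # placeholder executable
srun --ntasks=2 ./task_b &   # placeholder executable
wait                         # wait for both job steps to finish
```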
