forked from chaos/slurm
-
Notifications
You must be signed in to change notification settings - Fork 0
/
RELEASE_NOTES
107 lines (87 loc) · 4.5 KB
/
RELEASE_NOTES
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
RELEASE NOTES FOR SLURM VERSION 2.4
9 April 2012
IMPORTANT NOTE:
If using the slurmdbd (SLURM DataBase Daemon) you must update this first.
The 2.4 slurmdbd will work with SLURM daemons of version 2.1.3 and above.
You will not need to update all clusters at the same time, but it is very
important to update slurmdbd first and having it running before updating
any other clusters making use of it. No real harm will come from updating
your systems before the slurmdbd, but they will not talk to each other
until you do. Also at least the first time running the slurmdbd you need to
make sure your my.cnf file has innodb_buffer_pool_size equal to at least 64M.
You can accomplish this by adding the line
innodb_buffer_pool_size=64M
under the [mysqld] reference in the my.cnf file and restarting the mysqld.
This is needed when converting large tables over to the new database schema.
SLURM can be upgraded from version 2.3 to version 2.4 without loss of jobs or
other state information.
HIGHLIGHTS
==========
* Major modifications to support IBM BlueGene/Q systems.
* New SPANK callbacks added to slurmd: slurm_spank_slurmd_{init,exit} and
job epilog/prolog: slurm_spank_job_{prolog,epilog}.
* Added MPI plugin, mpi/pmi2, which supports MPI_Comm_spawn() function.
* Improved throughput rate for short-lived jobs.
CONFIGURATION FILE CHANGES (see "man slurm.conf" for details)
=============================================================
* "PriorityFlags" added
* "RebootProgram" added
* "ReconfigFlags" added
* "SlurmdDebugLevel" and "SlurmctldDebugLevel" now accept string names in
addition to numeric values (e.g. "info", "verbose", "debug", etc.). Output
of scontrol and sview commands also use the string names.
* Changed default value of "StateSaveLocation" configuration parameter from
"/tmp" to "/var/spool" to help avoid purging.
* Change default "SchedulerParameters" "max_switch_wait" field value from 60 to
300 seconds.
* Added new "SchedulerParameters" of "bf_max_job_user", maximum number of jobs
to attempt backfilling per user.
COMMAND CHANGES (see man pages for details)
===========================================
* Modified advance reservation to select resources optimized for network
topology and accept multiple specific block sizes rather than a single node
count.
* Added trigger flag for a permanent trigger. The trigger will NOT be purged
after an event occurs, but only when explicitly deleted.
* Added the ability to reboot all compute nodes after they become idle. The
RebootProgram configuration parameter must be set and an authorized user
must execute the command "scontrol reboot_nodes".
* Added the ability to update a node's NodeAddr and NodeHostName with scontrol.
* Added the option "--name" to the sacct and squeue commands.
* Add support for job allocations with multiple job constraint counts. For
example: salloc -C "[rack1*2&rack2*4]" ... will allocate the job 2 nodes
from rack1 and 4 nodes from rack2. Support for only a single constraint
name been added to job step support.
* Changed meaning of squeue "-n" option to job name from node name for
consistency with other commands. The "-w" option was added for a short
node name option. Long options --names and --nodes remain unchanged.
OTHER CHANGES
=============
* Improve task binding logic by making fuller use of HWLOC library,
especially with respect to Opteron 6000 series processors.
* Changde to output tools labels from "BP" to "Midplane" (i.e. "BP_List" was
changed to "MidplaneList").
* Modified srun to fork a processes which can terminate the job and/or step
allocation if the initial srun process is abnormallly terminated (e.g. by
SIGKILL).
* Added support for Cray GPU memory allocation as GRES (Generic RESources).
* Correct setting of CUDA_VISIBLE_DEVICES for gres/gpu plugin if device files
to be used are not in numeric order (e.g. GPU 1 maps to "/dev/nvidia4").
API CHANGES
===========
* Added the UserID of the user issuing the RPC to the job_submit/lua functions.
Changed members of the following structs
========================================
Added the following struct definitions
======================================
block_info_t: cnode_err_cnt added
slurm_ctl_conf_t priority_flags, reboot_program and reconfig_flags added
trigger_info_t: flags added
update_node_msg_t: node_addr and node_hostname added
Changed the following enums and #defines
========================================
TRIGGER_FLAG_PERM Added
Added the following API's
=========================
Changed the following API's
===========================