forked from SchedMD/slurm
-
Notifications
You must be signed in to change notification settings - Fork 0
/
RELEASE_NOTES
144 lines (133 loc) · 7.95 KB
/
RELEASE_NOTES
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
RELEASE NOTES FOR SLURM VERSION 24.05
IMPORTANT NOTES:
If using the slurmdbd (Slurm DataBase Daemon) you must update this first.
NOTE: If using a backup DBD you must start the primary first to do any
database conversion, the backup will not start until this has happened.
The 24.05 slurmdbd will work with Slurm daemons of version 23.02 and above.
You will not need to update all clusters at the same time, but it is very
important to update slurmdbd first and having it running before updating
any other clusters making use of it.
Slurm can be upgraded from version 23.02 or 23.11 to version 24.05 without loss
of jobs or other state information. Upgrading directly from an earlier version
of Slurm will result in loss of state information.
All SPANK plugins must be recompiled when upgrading from any Slurm version
prior to 24.05.
HIGHLIGHTS
==========
-- Remove support for Cray XC ("cray_aries") systems.
-- Federation - allow client command operation when slurmdbd is unavailable.
-- burst_buffer/lua - Added two new hooks: slurm_bb_test_data_in and
slurm_bb_test_data_out. The syntax and use of the new hooks are documented
in etc/burst_buffer.lua.example. These are required to exist. slurmctld now
checks on startup if the burst_buffer.lua script loads and contains all
required hooks; slurmctld will exit with a fatal error if this is not
successful. Added PollInterval to burst_buffer.conf. Removed the arbitrary
limit of 512 copies of the script running simultaneously.
-- Add QOS limit MaxTRESRunMinsPerAccount.
-- Add QOS limit MaxTRESRunMinsPerUser.
-- Add ELIGIBLE environment variable to jobcomp/script plugin.
-- Always use the QOS name for SLURM_JOB_QOS environment variables.
Previously the batch environment would use the description field,
which was usually equivalent to the name.
-- cgroup/v2 - Require dbus-1 version >= 1.11.16.
-- Allow NodeSet names to be used in SuspendExcNodes.
-- SuspendExcNodes=<nodes>:N now counts allocated nodes in N. The first N
powered up nodes in <nodes> are protected from being suspended.
-- Store job output, input and error paths in SlurmDBD.
-- Add USER_DELETE reservation flag to allow users with access to a reservation
to delete it.
-- Add SlurmctldParameters=enable_stepmgr to enable step management through
the slurmstepd instead of the controller.
-- Added PrologFlags=RunInJob to make prolog and epilog run inside the job
extern step to include it in the job's cgroup.
-- Add ability to reserve MPI ports at the job level for stepmgr jobs and
subdivide them at the step level.
-- slurmrestd - Add --generate-openapi-spec argument.
CONFIGURATION FILE CHANGES (see appropriate man page for details)
=====================================================================
-- CoreSpecPlugin has been removed.
-- Removed TopologyPlugin tree and dragonfly support from select/linear.
If those topology plugins are desired please switch to select/cons_tres.
-- Changed the default value for UnkillableStepTimeout to 60 seconds or five
times the value of MessageTimeout, whichever is greater.
-- An error log has been added if JobAcctGatherParams 'UsePss' or 'NoShare' are
configured with a plugin other than jobacct_gather/linux. In such case these
parameters are ignored.
-- helpers.conf - Added Flags=rebootless parameter allowing feature changes
without rebooting compute nodes.
-- topology/block - Replaced the BlockLevels with BlockSizes in topology.conf.
-- Add contain_spank option to SlurmdParameters. When set, spank_user_init(),
spank_task_post_fork(), and spank_task_exit() will execute within the
job_container/tmpfs plugin namespace.
-- Add SlurmctldParameters=max_powered_nodes=N, which prevents powering up
nodes after the max is reached.
-- Add ExclusiveTopo to a partition definition in slurm.conf.
-- Add AccountingStorageParameters=max_step_records to limit how many steps
are recorded in the database for each job -- excluding batch, extern, and
interactive steps.
COMMAND CHANGES (see man pages for details)
===========================================
-- Add support for "elevenses" as an additional time specification.
-- Add support for sbcast --preserve when job_container/tmpfs configured
(previously documented as unsupported).
-- scontrol - Add new subcommand 'power' for node power control.
-- squeue - Adjust StdErr, StdOut, and StdIn output formats. These will now
consistently print "(null)" if a value is unavailable. StdErr will no
longer display StdOut if it is not distinctly set. StdOut will now
correctly display the default filename pattern for job arrays, and no
longer show it for non-batch jobs. However, the expansion patterns will
no longer be substituted by default.
-- Add --segment to job allocation to be used in topology/block.
-- Add --exclusive=topo for use with topology/block.
-- squeue - Add --expand-patterns option to expand StdErr, StdOut, StdIn
filename patterns as best as possible.
-- sacct - Add --expand-patterns option to expand StdErr, StdOut, StdIn
filename patterns as best as possible.
-- sreport - Requesting format=Planned will now return the expected Planned
time as documented, instead of PlannedDown. To request Planned Down,
one must use now format=PLNDDown or format=PlannedDown explicitly. The
abbreviations "Pl" or "Pla" will now make reference to Planned instead of
PlannedDown.
API CHANGES
===========
-- Removed ListIterator type from <slurm/slurm.h>.
-- Removed slurm_xlate_job_id() from <slurm/slurm.h>
SLURMRESTD CHANGES
==================
-- openapi/dbv0.0.38 and openapi/v0.0.38 plugins have been removed.
-- openapi/dbv0.0.39 and openapi/v0.0.39 plugins have been tagged as
deprecated to warn of their removal in the next release.
-- Changed slurmrestd.service to only listen on TCP socket by default.
Environments with existing drop-in units for the service may need
further adjustments to work after upgrading.
-- slurmrestd - Tagged `script` field as deprecated in
'POST /slurm/v0.0.41/job/submit' in anticipation of removal in future
OpenAPI plugin versions. Job submissions should set the `job.script` (or
`jobs[0].script` for HetJobs) fields instead.
-- slurmrestd - Attempt to automatically convert enumerated string arrays with
incoming non-string values into strings. Add warning when incoming value for
enumerated string arrays can not be converted to string and silently ignore
instead of rejecting entire request. This change affects any endpoint that
uses an enunmerated string as given in the OpenAPI specification. An
example of this conversion would be to 'POST /slurm/v0.0.41/job/submit' with
'.job.exclusive = true'. While the JSON (boolean) true value matches a
possible enumeration, it is not the expected "true" string. This change
automatically converts the (boolean) true to (string) "true" avoiding a
parsing failure.
-- slurmrestd - Add 'POST /slurm/v0.0.41/job/allocate' endpoint. This endpoint
will create a new job allocation without any steps. The allocation will need
to be ended via signaling the job or it will run to the timelimit.
-- slurmrestd - Allow startup when slurmdbd is not configured and avoid loading
slurmdbd specific plugins.
MPI/PMI2 CHANGES
================
-- Jobs submitted with the SLURM_HOSTFILE environment variable set implies
using an arbitrary distribution. Nevertheless, the logic used in PMI2 when
generating their associated PMI_process_mapping values has been changed and
will now be the same used for the plane distribution, as if "-m plane" were
used. This has been changed because the original arbitrary distribution
implementation did not account for multiple instances of the same host being
present in SLURM_HOSTFILE, providing an incorrect process mapping in such
case. This change also enables distributing tasks in blocks when using
arbitrary distribution, which was not the case before. This only affects
mpi/pmi2 plugin.