Adding a job queue or altering node settings for a subset of nodes in SLURM

These instructions were completed using BCM 10 on Ubuntu 22.04, but they should work on all supported platforms running BCM 9.0 and higher.

Presumptions

We presume you have a functional SLURM cluster with multiple nodes that can be split into distinct job queues or given distinct node-level settings.

root@ew-b100-u2204-09-27:~# module load slurm
root@ew-b100-u2204-09-27:~# sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
defq* up infinite 5 idle node[001-005]
root@ew-b100-u2204-09-27:~# srun hostname
node001
root@ew-b100-u2204-09-27:~#
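If you want to see the full definition of the existing partition or of an individual node before making changes, scontrol can display them (the exact fields in the output vary by SLURM version, so it is not reproduced here):

root@ew-b100-u2204-09-27:~# scontrol show partition defq
root@ew-b100-u2204-09-27:~# scontrol show node node001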

Create a Job Queue (optional)

This step creates a new job queue (partition). If your goal is only to alter node-level settings, you can skip this step.

root@ew-b100-u2204-09-27:~# cmsh
[ew-b100-u2204-09-27]% wlm
[ew-b100-u2204-09-27->wlm[slurm]]% jobqueue
[ew-b100-u2204-09-27->wlm[slurm]->jobqueue]% list
Name (key) Nodes
------------ ------------------------
defq node001..node005
[ew-b100-u2204-09-27->wlm[slurm]->jobqueue]% add newq
[ew-b100-u2204-09-27->wlm[slurm]->jobqueue*[newq*]]% commit
[ew-b100-u2204-09-27->wlm[slurm]->jobqueue[newq]]% quit
root@ew-b100-u2204-09-27:~# sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
defq* up infinite 5 idle node[001-005]
newq up infinite 0 n/a
root@ew-b100-u2204-09-27:~#
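If you prefer to script this rather than work interactively, cmsh can run a command string non-interactively with -c. A minimal sketch, assuming a single SLURM WLM instance as in the session above:

root@ew-b100-u2204-09-27:~# cmsh -c "wlm; jobqueue; add newq; commit"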

Now that the new job queue is defined, we need to allocate nodes to it.

Clone an appropriate configuration overlay

We will take an existing configuration overlay and clone it. We will clear all categories and nodes, and raise the priority so that, when the new overlay is later assigned to nodes, its settings take precedence over any lower-priority settings.

root@ew-b100-u2204-09-27:~# cmsh
[ew-b100-u2204-09-27]% configurationoverlay
[ew-b100-u2204-09-27->configurationoverlay]% list
Name (key) Priority All head nodes Nodes Categories Roles
-------------------- ---------- -------------- ---------------- ---------------- ----------------
slurm-accounting 500 yes slurmaccounting
slurm-client 500 no default slurmclient
slurm-server 500 yes slurmserver
slurm-submit 500 no default slurmsubmit
wlm-headnode-submit 600 yes slurmsubmit
[ew-b100-u2204-09-27->configurationoverlay]% clone slurm-client slurm-client-newq
[ew-b100-u2204-09-27->configurationoverlay*[slurm-client-newq*]]% show
Parameter Value
-------------------------------- ------------------------------------------------
Name slurm-client-newq
Revision
All head nodes no
Priority 500
Nodes
Categories default
Roles slurmclient
Customizations <0 in submode>
[ew-b100-u2204-09-27->configurationoverlay*[slurm-client-newq*]]% clear categories
[ew-b100-u2204-09-27->configurationoverlay*[slurm-client-newq*]]% clear nodes
[ew-b100-u2204-09-27->configurationoverlay*[slurm-client-newq*]]% set priority 505

Assign nodes to this configuration

We will assign node003 through node005 to this new overlay, which at this point is still identical to the existing configuration and only contains defq.

[ew-b100-u2204-09-27->configurationoverlay*[slurm-client-newq*]]% set nodes node003..node005
[ew-b100-u2204-09-27->configurationoverlay*[slurm-client-newq*]]% show
Parameter Value
-------------------------------- ------------------------------------------------
Name slurm-client-newq
Revision
All head nodes no
Priority 505
Nodes node003..node005
Categories
Roles slurmclient
Customizations <0 in submode>
[ew-b100-u2204-09-27->configurationoverlay*[slurm-client-newq*]]%
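As an aside, an overlay does not have to be scoped by an explicit node list; it can also be scoped by a node category. The sketch below is purely illustrative and assumes a category named newq-nodes already exists on your cluster:

[ew-b100-u2204-09-27->configurationoverlay*[slurm-client-newq*]]% clear nodes
[ew-b100-u2204-09-27->configurationoverlay*[slurm-client-newq*]]% set categories newq-nodes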

Customize the settings for these nodes

Here we enter the roles submode of the overlay, make the changes we would like (in this case, assigning the newq queue), and commit the change.

[ew-b100-u2204-09-27->configurationoverlay*[slurm-client-newq*]]% roles
[ew-b100-u2204-09-27->configurationoverlay*[slurm-client-newq*]->roles]% use slurmclient
[ew-b100-u2204-09-27->configurationoverlay*[slurm-client-newq*]->roles[slurmclient]]% show
Parameter Value
---------------------------------- ------------------------------------------------
Name slurmclient
Revision
Type SlurmClientRole
Add services yes
WLM cluster slurm
Slots 0
All Queues no
Queues defq
Features
Sockets 0
Cores Per Socket 0
ThreadsPerCore 0
Boards 0
SocketsPerBoard 0
RealMemory 0B
NodeAddr
Weight 0
Port 0
TmpDisk 0
Reason
CPU Spec List
Core Spec Count 0
Mem Spec Limit 0B
GPU auto detect BCM
Node Customizations <0 in submode>
Generic Resources <0 in submode>
Cpu Bindings None
Slurm hardware probe auto detect yes
Memory autodetection slack 0.0%
IMEX no
[ew-b100-u2204-09-27->configurationoverlay*[slurm-client-newq*]->roles[slurmclient]]% set queues newq
[ew-b100-u2204-09-27->configurationoverlay*[slurm-client-newq*]->roles*[slurmclient*]]% commit
[ew-b100-u2204-09-27->configurationoverlay[slurm-client-newq]->roles[slurmclient]]%
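The queue assignment is only one of the parameters the slurmclient role exposes (see the show output above). If your goal is to alter other node-level settings for this subset of nodes, this is the place to do it. The values below are illustrative assumptions only, not recommendations:

[ew-b100-u2204-09-27->configurationoverlay[slurm-client-newq]->roles[slurmclient]]% set features highmem
[ew-b100-u2204-09-27->configurationoverlay*[slurm-client-newq*]->roles*[slurmclient*]]% commit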

Confirm operation

Now that the changes are committed, we can review the assigned nodes and check with SLURM that the changes have been accepted and propagated.

[ew-b100-u2204-09-27->configurationoverlay]% list
Name (key)           Priority   All head nodes Nodes             Categories       Roles
-------------------- ---------- -------------- ----------------- ---------------- ----------------
slurm-accounting 500 yes slurmaccounting
slurm-client 500 no default slurmclient
slurm-client-newq 505 no node003..node005 slurmclient
slurm-server 500 yes slurmserver
slurm-submit 500 no default slurmsubmit
wlm-headnode-submit 600 yes slurmsubmit
[ew-b100-u2204-09-27->configurationoverlay]% quit
root@ew-b100-u2204-09-27:~# sinfo -la
Fri Sep 27 21:33:25 2024
PARTITION AVAIL TIMELIMIT JOB_SIZE ROOT OVERSUBS GROUPS NODES STATE RESERVATION NODELIST
defq* up infinite 1-infinite no NO all 2 idle node[001-002]
newq up infinite 1-infinite no NO all 3 idle node[003-005]
root@ew-b100-u2204-09-27:~# srun -p newq hostname
node003
root@ew-b100-u2204-09-27:~#
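From the user side, the new partition is targeted like any other. For example, a minimal batch job (the script name and contents are illustrative) can request it explicitly:

root@ew-b100-u2204-09-27:~# cat newq-test.sh
#!/bin/bash
#SBATCH --partition=newq
hostname
root@ew-b100-u2204-09-27:~# sbatch newq-test.sh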
Updated on September 27, 2024