Here is an example of enabling QOS (Quality of Service) in SchedMD Slurm and applying the QOS to a partition using Bright.
Note: these steps are valid for Bright 8.2 and earlier releases.
# sacctmgr add qos test
Adding QOS(s)
test
Settings
Description = test
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
# sacctmgr modify qos test set maxjobs=2
Modified qos...
test
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
# sacctmgr show qos format=name,maxjobs
      Name MaxJobs
---------- -------
    normal
      test       2
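To check the limit from a script rather than by eye, one option is sacctmgr's parsable output. The following is a minimal sketch, assuming the `-n` (no header) and `-P` (pipe-delimited) flags are used; `qos_maxjobs` is a hypothetical helper name, not part of Slurm or Bright.

```shell
# Hypothetical helper: print the MaxJobs value of one QOS.
# Reads pipe-delimited "name|maxjobs" lines on stdin, as produced by:
#   sacctmgr -n -P show qos format=name,maxjobs
qos_maxjobs() {
    awk -F'|' -v q="$1" '$1 == q { print $2 }'
}

# Usage (on a node where sacctmgr is available):
#   sacctmgr -n -P show qos format=name,maxjobs | qos_maxjobs test
```

An empty result means the QOS exists without a MaxJobs limit, or does not exist at all.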
Finally, update the partition options in Bright to use the new QOS:
# cmsh
% jobqueue
% use defq
% get options
QoS=N/A ExclusiveUser=NO OverTimeLimit=0 State=UP JobDefaults=(null)
% set options "QoS=test ExclusiveUser=NO OverTimeLimit=0 State=UP JobDefaults=(null)"
% commit
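Once the change has reached Slurm, the partition should report the new QOS. A minimal sketch for checking this, assuming the `scontrol show partition` output carries a `QoS=` field like the one in the options string above; `partition_qos` is a hypothetical helper name.

```shell
# Hypothetical helper: extract the QoS value from partition details on stdin,
# for example from:  scontrol show partition defq
partition_qos() {
    # Print the text after "QoS=" up to the next space; keep the first match.
    sed -n 's/.*QoS=\([^ ]*\).*/\1/p' | head -n 1
}

# Usage (on a node where scontrol is available):
#   scontrol show partition defq | partition_qos
```

If this still prints N/A after the commit, the new options have probably not been written to the Slurm configuration yet.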
Is there any updated version of these steps for Bright 9.x?
Hi Josh,
For example, to limit the number of GPUs that users can consume through Slurm, you can set MaxTRESPerUser=gres/gpu=10 or MaxTRESPerAccount=gres/gpu=10 on a QOS, using the following steps.
Assuming the GRES is already configured in Slurm, add the following to the slurm.conf file, outside the autogenerated section:
AccountingStorageEnforce=qos
AccountingStorageTRES=gres/gpu
Restart slurmctld on the controller node, then run scontrol reconfigure:
# systemctl restart slurmctld
# scontrol reconfigure
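After the restart, it may be worth confirming that the running controller actually picked up the enforcement setting. The sketch below assumes `scontrol show config` prints a line of the form `AccountingStorageEnforce = ...`; `enforces_qos` is a hypothetical helper name. Slurm may list additional flags next to `qos` (for example `associations`), so the check looks for `qos` anywhere in the comma-separated value.

```shell
# Hypothetical helper: succeed (exit 0) if "qos" is among the
# AccountingStorageEnforce flags in "Key = value" lines read from stdin.
enforces_qos() {
    awk -F'=' '
        {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)      # strip padding around the key
            gsub(/[ \t\r]/, "", val)    # strip padding around the value
        }
        key == "AccountingStorageEnforce" {
            n = split(val, flags, ",")
            for (i = 1; i <= n; i++)
                if (flags[i] == "qos") found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage (on a node where scontrol is available):
#   scontrol show config | enforces_qos && echo "QOS limits are enforced"
```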
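You can then set the limit using the sacctmgr command, either per user or per account (replace <qos_name> with the name of your QOS):
# sacctmgr modify QOS <qos_name> set MaxTRESPerUser=gres/gpu=10
–OR–
# sacctmgr modify QOS <qos_name> set MaxTRESPerAccount=gres/gpu=10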
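To confirm that a GPU limit was stored, the TRES string of the QOS can be read back and parsed. This is a sketch under assumptions: it expects pipe-delimited `name|tres` lines such as `sacctmgr -n -P show qos format=name,MaxTRESPerUser` should produce, and `qos_gpu_limit` is a hypothetical helper name. It only matches the untyped `gres/gpu=` entry.

```shell
# Hypothetical helper: print the gres/gpu limit of one QOS.
# Reads "name|tres" lines on stdin, where tres is a comma-separated
# list such as "cpu=4,gres/gpu=10".
qos_gpu_limit() {
    awk -F'|' -v q="$1" '
        $1 == q {
            n = split($2, tres, ",")
            for (i = 1; i <= n; i++)
                if (tres[i] ~ /^gres\/gpu=/) {
                    sub(/^gres\/gpu=/, "", tres[i])   # keep only the number
                    print tres[i]
                }
        }
    '
}

# Usage (on a node where sacctmgr is available; "test" is the QOS name):
#   sacctmgr -n -P show qos format=name,MaxTRESPerUser | qos_gpu_limit test
```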
You can refer to the following SchedMD resources for more details:
https://slurm.schedmd.com/qos.html
https://bugs.schedmd.com/show_bug.cgi?id=4767
https://bugs.schedmd.com/show_bug.cgi?id=3397