Slurmctld shows the error “we don’t have select plugin type 102”

This article is being updated. Please be aware that the content herein, including but not limited to version numbers and minor syntax details, may not match the output from the most recent versions of Bright. This notice will be removed once the content has been updated.

When enabling shared resources in Slurm as described in the related article, you may see the following error in /var/log/slurmctld on the head node:

we don't have select plugin type 102 

Checking through the logs you may also see:

error: Incomplete job record
fatal: Incomplete job state save file, start with '-i' to ignore this
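
To check quickly whether a log contains these messages, a small helper along the following lines may be useful. This is only a sketch: the function name find_slurmctld_errors is hypothetical, and the default path is the /var/log/slurmctld location mentioned in this article, so pass a different path if your logs live elsewhere.

```shell
# Hypothetical helper: print, with line numbers, any log lines that match
# the error messages discussed in this article.
# Default log path is the one named in this article; override with $1.
find_slurmctld_errors() {
    log_file="${1:-/var/log/slurmctld}"
    grep -n -E "select plugin type|Incomplete job (record|state save file)" "$log_file"
}

# Usage (on the head node):
#   find_slurmctld_errors
#   find_slurmctld_errors /path/to/another/slurmctld.log
```

The function returns a non-zero exit status when none of the messages are found, which makes it easy to use in a script.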

Occasionally, when enabling shared resources in Slurm, the job state save file becomes incomplete. To work around this issue, perform the following steps.
First, stop slurmctld in Bright:

# cmsh
% device use master
% services
% stop slurm
% quit
 
Next, make sure SelectType and SelectTypeParameters are set in slurm.conf the way you want them to be configured.
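
Since slurmctld is stopped at this point, one way to confirm the two settings is to read them directly from the file. The sketch below assumes a hypothetical helper name, show_select_settings, and takes the path to slurm.conf as an argument because that path is not given in this article and depends on your installation.

```shell
# Hypothetical helper: print the active (uncommented) SelectType and
# SelectTypeParameters lines from a slurm.conf file.
# $1 is the path to slurm.conf on your system (not specified in this article).
show_select_settings() {
    conf_file="$1"
    # -i because Slurm parameter names are case-insensitive
    grep -i -E '^[[:space:]]*SelectType(Parameters)?[[:space:]]*=' "$conf_file"
}

# Usage:
#   show_select_settings /path/to/slurm.conf
```

Commented-out lines are skipped, so only the values slurmctld will actually read are shown.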
Then, start slurmctld by running the following command on your head node:

# /cm/shared/apps/slurm/current/sbin/slurmctld -i 
This tells slurmctld to start and ignore the incomplete job state save file error. Be aware that any job state affected by the errors is lost when slurmctld is started with -i.
After that, kill the process for slurmctld:

# killall slurmctld 
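
Before starting the service from Bright again, it can help to confirm that the manually started process has actually exited. A minimal sketch, assuming pgrep is available on the head node; the function name wait_for_exit and the 30 second default timeout are illustrative choices, not part of Bright or Slurm.

```shell
# Hypothetical helper: wait until no process with the exact name $1 is running.
# $2 is an optional timeout in seconds (default 30, chosen arbitrarily).
# Returns 0 once the process is gone, 1 if it is still running at the timeout.
wait_for_exit() {
    name="$1"
    timeout="${2:-30}"
    waited=0
    while pgrep -x "$name" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "$name is still running after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage:
#   wait_for_exit slurmctld
```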

Then, start slurmctld from Bright again:

# cmsh
% device use master
% services
% start slurm

slurmctld should now start properly with your desired slurm.conf settings.
You may also need to run the scontrol reconfigure command once slurmctld is started to notify the compute nodes.

# scontrol reconfigure
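
Once slurmctld is running, the settings it has loaded can be compared with what you put in slurm.conf. The sketch below filters the output of scontrol show config; the function name running_select_settings is hypothetical, and it simply reads from standard input.

```shell
# Hypothetical filter: keep only the SelectType and SelectTypeParameters
# lines from "scontrol show config" output supplied on standard input.
running_select_settings() {
    grep -E '^SelectType(Parameters)?[[:space:]]*='
}

# Usage (requires a running slurmctld):
#   scontrol show config | running_select_settings
```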

Updated on October 12, 2020


Comments

  1. This KB article helped us get our cluster back online after switching SLURM from select/linear to select/cons_res. Thanks for posting.
