
ID #1461

Slurmctld shows the error "we don't have select plugin type 102"

When you enable shared resources in Slurm as described in the related article, you may see the following error in /var/log/slurmctld on the head node:

we don't have select plugin type 102 


Checking further through the logs, you may also see:

error: Incomplete job record
fatal: Incomplete job state save file, start with '-i' to ignore this
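To confirm that you are hitting this particular problem rather than some other startup failure, you can scan the log for the messages shown above. The following is a minimal sketch, not a Bright-provided tool: check_slurmctld_log is a hypothetical helper, and it assumes the log path given in this article unless you pass another one.

```shell
#!/bin/sh
# check_slurmctld_log (hypothetical helper): count lines in a slurmctld log
# that match the symptoms described in this article.
# Assumption: the log is /var/log/slurmctld unless a path is passed as $1.
check_slurmctld_log() {
    log="${1:-/var/log/slurmctld}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # -F: match fixed strings; -c: print the number of matching lines.
    # grep exits 1 when nothing matches, so "|| true" keeps the "0" output.
    count=$(grep -F -c \
        -e "we don't have select plugin type" \
        -e "Incomplete job record" \
        -e "Incomplete job state save file" \
        "$log" || true)
    echo "$count"
    # Exit status 0 only if at least one symptom line was found.
    [ "$count" -gt 0 ]
}
```

A non-zero count suggests the workaround below applies; a count of 0 means slurmctld is failing for some other reason.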


Occasionally, after shared resources are enabled in Slurm, slurmctld finds that its job state save file is incomplete and refuses to start. To work around this issue, perform the following steps.


First, stop slurmctld in Bright:

# cmsh
% device use master
% services
% stop slurm
% quit 


Next, make sure that SelectType and SelectTypeParameters are set in slurm.conf to the values you want.
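Because slurmctld is stopped at this point, scontrol cannot report the configuration, so checking the file itself is the simplest option. The sketch below is a hypothetical helper (show_select_settings is not part of Slurm or Bright). It deliberately takes the path to slurm.conf as an argument, because the file's location depends on your Bright and Slurm versions. Values such as SelectType=select/cons_res with SelectTypeParameters=CR_Core_Memory are only an example of a shared-resources setup, not necessarily what your cluster needs.

```shell
#!/bin/sh
# show_select_settings (hypothetical helper): print the SelectType and
# SelectTypeParameters lines from a slurm.conf file.
# Assumption: the path to slurm.conf is passed as $1 (no default, because
# the location differs between installations).
show_select_settings() {
    conf="$1"
    if [ -z "$conf" ] || [ ! -r "$conf" ]; then
        echo "usage: show_select_settings /path/to/slurm.conf" >&2
        return 2
    fi
    # -i: match the keywords case-insensitively; lines that are commented
    # out (starting with #) are not matched. Returns 1 if neither is set.
    grep -i -E '^[[:space:]]*SelectType(Parameters)?[[:space:]]*=' "$conf"
}
```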
Then, start slurmctld by running the following command on your head node:

# /cm/shared/apps/slurm/current/sbin/slurmctld -i 


The -i option tells slurmctld to start while ignoring the errors in the incomplete job state save file.
Once it has started, kill the slurmctld process:

# killall slurmctld 
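killall sends the signal and returns immediately, so slurmctld may still be shutting down when you move on to the next step. A minimal sketch of a wait loop follows; wait_for_exit is a hypothetical helper, and it assumes pgrep (from procps) is installed on the head node. If your killall is the psmisc version, its -w option, which waits for the killed processes to die, is an alternative.

```shell
#!/bin/sh
# wait_for_exit (hypothetical helper): poll once per second until no process
# with exactly the given name is running, or give up after TIMEOUT seconds.
# Assumption: pgrep (procps) is available; if it is missing, pgrep fails and
# the loop ends at once, so check with "command -v pgrep" first.
wait_for_exit() {
    name="$1"
    timeout="${2:-30}"
    while pgrep -x "$name" >/dev/null 2>&1; do
        if [ "$timeout" -le 0 ]; then
            echo "$name is still running" >&2
            return 1
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    return 0
}
```

For example, run wait_for_exit slurmctld 60 right after killall slurmctld, and only start the service from Bright once it returns successfully.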


Then, start slurmctld from Bright again:

# cmsh
% device use master
% services
% start slurm


slurmctld should now start normally with your desired slurm.conf settings.
Once slurmctld is running, you may also need to run scontrol reconfigure so that the compute nodes pick up the new configuration:

# scontrol reconfigure
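To confirm that the running controller has picked up the new settings, you can compare the output of scontrol show config with what you put in slurm.conf. The sketch below assumes that output consists of "Key = value" lines (for example, SelectType = select/cons_res); config_value is a hypothetical helper that reads such lines on standard input.

```shell
#!/bin/sh
# config_value (hypothetical helper): read "Key = value" lines, such as the
# output of "scontrol show config", on stdin and print the value for KEY ($1).
config_value() {
    awk -v key="$1" '
        index($0, "=") > 0 {
            name = substr($0, 1, index($0, "=") - 1)
            sub(/[ \t]+$/, "", name)                 # drop padding after the key
            if (name == key) {
                val = substr($0, index($0, "=") + 1) # keep any "=" inside the value
                sub(/^[ \t]+/, "", val)
                sub(/[ \t]+$/, "", val)
                print val
            }
        }'
}
```

For example, scontrol show config | config_value SelectType should print the same plugin name that slurm.conf sets.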

Tags: Slurm
