ID #1436

How do I run Intel Cluster Checker on a Bright cluster?

This document assumes Intel Cluster Checker 2019. For other versions of Cluster Checker the steps should be similar, but details may differ.

 

  • Install your Bright cluster and bring up all of the nodes
  • Install the Intel Cluster Runtime RPM from the Bright YUM repository

yum install intel-cluster-runtime

  • Have the Intel Cluster Runtime environment module loaded at every login, both for new users and for root (on the head node and in the software image)

echo "module load intel-cluster-runtime" >> /etc/skel/.bashrc
echo "module load intel-cluster-runtime" >> /root/.bashrc
echo "module load shared intel-cluster-runtime" >> /cm/images/default-image/root/.bashrc

NOTE 1: The 'shared' module is needed for root on the compute nodes because, by default, root logins there do not touch /cm/shared. This avoids lock issues when the NFS server is down.

NOTE 2: Cluster Checker is run as root rather than as an ordinary user because dmidecode, which is used to obtain information about the memory DIMMs, requires root privileges.
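
Because ">>" appends unconditionally, running the echo commands above a second time leaves duplicate lines in the .bashrc files. A minimal sketch of an idempotent alternative follows. The helper name append_once is hypothetical and is not part of Bright or Cluster Checker.

```shell
# append_once FILE LINE
# Append LINE to FILE only if FILE does not already contain that exact line.
append_once() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example calls, using the same files and lines as the step above:
#   append_once /etc/skel/.bashrc "module load intel-cluster-runtime"
#   append_once /root/.bashrc "module load intel-cluster-runtime"
#   append_once /cm/images/default-image/root/.bashrc "module load shared intel-cluster-runtime"
```

The whole-line match means the "module load shared intel-cluster-runtime" line is treated as distinct from the plain "module load intel-cluster-runtime" line.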

  • Download Intel Cluster Checker from the Intel website at https://software.intel.com/en-us/intel-cluster-checker
  • Copy the file l_clck_p_2019.0.015.tgz to the root account on the cluster's head node
  • Unpack the archive

tar -xvzf l_clck_p_2019.0.015.tgz

  • Run the installer

cd l_clck_p_2019.0.015/
./install.sh

  • Go through the installer
    • Accept License Agreement by scrolling through and typing "accept"
    • Make sure that Installation target is set to "[ Current system only]" (which is the default)
    • Select option 1 for finishing configuring installation targets
    • Select option 1 for starting the installation
    • Wait until the installation is finished and press Enter a few times to exit
    • Intel Cluster Checker is now installed in /opt/intel/clck_latest
  • Copy the /opt/intel tree into the software image:

cp -a /opt/intel /cm/images/default-image/opt/

  • Propagate the changes to your nodes, for example with the following cmsh command:

device imageupdate -w -c default

  • Load the environment settings

source /opt/intel/clck_latest/bin/clckvars.sh

  • Point Cluster Checker at a shared temporary directory, because root's home directory is not shared across the nodes

export CLCK_SHARED_TEMP_DIR=/home/cmsupport
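
Before exporting the variable it can help to confirm that the chosen directory exists and is writable. The following is a minimal sketch with a hypothetical helper name, check_temp_dir. It only checks the local node and does not prove that the directory is actually shared across the nodes.

```shell
# check_temp_dir DIR
# Succeed only if DIR exists and is writable by the current user.
# Local check only: it cannot tell whether DIR is shared across nodes.
check_temp_dir() {
    dir=$1
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        return 0
    fi
    echo "not a writable directory: $dir" >&2
    return 1
}

# Example, using the path from the step above:
#   check_temp_dir /home/cmsupport && export CLCK_SHARED_TEMP_DIR=/home/cmsupport
```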

  • Create a nodes file

for i in $(seq -w 001 004); do echo node$i; done > nodefile
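
If the node range or the hostname prefix differs on your cluster, the same file can be produced with a small parameterized helper. This is a sketch only: the function name make_nodefile is hypothetical, and it assumes hostnames are a prefix followed by a three-digit, zero-padded number, as in the default Bright naming used above.

```shell
# make_nodefile PREFIX FIRST LAST
# Print one hostname per line: PREFIX followed by a three-digit, zero-padded number.
# FIRST and LAST must be plain integers without leading zeros (e.g. 1 and 4).
make_nodefile() {
    prefix=$1
    first=$2
    last=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        printf '%s%03d\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

# Example: write node001 through node004 to the file "nodefile"
#   make_nodefile node 1 4 > nodefile
```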

  • Run Cluster Checker

clck -f nodefile

  • Check the clck_results file for results or run clck-analyze

clck-analyze -f nodefile

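  • If Cluster Checker reports problems that are not real issues on your cluster (false positives), it may be necessary to run with a custom configuration that disables tests or changes thresholds. Start from a copy of the default configuration: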

cp $CLCK_ROOT/etc/clck.xml ~/my_clck.xml

  • Modify the configuration according to the Cluster Checker documentation and re-run with:

clck -f nodefile -c ~/my_clck.xml
