Loading the correct kernel modules
If you are going to use the built-in gigabit Ethernet interface as your internal cluster network between the head node(s) and the DGX nodes, there is nothing special that needs to be done in terms of loading kernel modules. This is because the igb module is already present by default in the software image’s list of kernel modules that are included in the initrd.
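If in doubt, this can be verified by listing the kernel modules configured for the software image (the image name dgxa100-image matches the examples used throughout this document):
[root@mycluster ~]# cmsh
[mycluster]% softwareimage use dgxa100-image
[mycluster->softwareimage[dgxa100-image]]% kernelmodules
[mycluster->softwareimage[dgxa100-image]->kernelmodules]% list | grep igb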
If you are going to use one of the Mellanox interfaces for the internal cluster network, it is important to add the mlx5_core kernel module to your software image. Without this kernel module, the Mellanox interface will not be visible during the node’s PXE booting process. This can be done as follows:
[root@mycluster ~]# cmsh
[mycluster]% softwareimage
[mycluster->softwareimage]% use dgxa100-image
[mycluster->softwareimage[dgxa100-image]]% kernelmodules
[mycluster->softwareimage[dgxa100-image]->kernelmodules]% add mlx5_core
[mycluster->softwareimage[dgxa100-image*]->kernelmodules]% commit
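Once a DGX node has been provisioned with the updated image, a quick sanity check (using standard Linux tools, nothing Bright-specific) is to confirm on the node that the module is loaded and that the Mellanox interfaces are visible:
[root@dgx-node ~]# lsmod | grep mlx5_core
[root@dgx-node ~]# ip link show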
Fixing predictable device names
If you are using one of the Mellanox interfaces on the DGX nodes for the internal cluster network, then depending on which Linux distribution and kernel you are using, the network interface names seen during the node installation phase may deviate from the network interface names seen once the OS is fully booted. In particular, an interface such as enp225s0f0 may initially come up as enp225s0f0np0. It is currently unknown why this happens, but there is a simple workaround, which is to set the following finalize script for your node category:
#!/bin/bash
#
# Finalize script: rename interface configuration files that come up with a
# trailing "npX" suffix (e.g. enp225s0f0np0) back to the expected name
# (e.g. enp225s0f0), and fix the interface name inside each file.

# Default to Red Hat style network scripts; switch to Debian/Ubuntu style
# if /etc/network/interfaces.d exists.
scriptsdir=/etc/sysconfig/network-scripts
if [ -d /etc/network/interfaces.d ]; then
    scriptsdir=/etc/network/interfaces.d
fi

# The freshly installed node filesystem is mounted under /localdisk.
for f in /localdisk$scriptsdir/*np?; do
    [ -e "$f" ] || continue

    # Strip the "npX" suffix from the configuration file name.
    newname=${f%np?}
    mv "$f" "$newname"

    # Strip the "npX" suffix from the interface name inside the file.
    filename=$(basename "$f")
    ifacename=${filename#ifcfg-}
    newifacename=${ifacename%np?}
    sed -i "s/$ifacename/$newifacename/g" "$newname"
done
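One way to attach this finalize script to the node category is through cmsh. The example below assumes the script has been saved on the head node as /tmp/fix-interface-names.sh (a path chosen for illustration) and that the category is named dgxa100, as elsewhere in this document. cmsh reads the file contents in, so these steps must be repeated whenever the script changes:
[root@mycluster ~]# cmsh
[mycluster]% category use dgxa100
[mycluster->category[dgxa100]]% set finalizescript /tmp/fix-interface-names.sh
[mycluster->category[dgxa100*]]% commit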
Preventing false positive ipmihealth failures
After the cluster has been installed, it is fairly common to see healthcheck failures such as this:
Thu Mar 11 09:50:05 2021 [warning] dgx-03: The trigger 'Failing health checks' is active because the measurable 'ipmihealth' is FAIL (37984)
This is not necessarily a problem, and the ipmihealth healthcheck can simply be disabled as follows:
[root@mycluster ~]# cmsh
[mycluster]% monitoring
[mycluster->monitoring]% measurable
[mycluster->monitoring->measurable]% use ipmihealth
[mycluster->monitoring->measurable[ipmihealth]]% set disabled yes
[mycluster->monitoring->measurable*[ipmihealth*]]% commit
[mycluster->monitoring->measurable[ipmihealth]]%
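If desired, the change can be double-checked with show, which should now list the disabled setting as yes:
[mycluster->monitoring->measurable[ipmihealth]]% show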
Installing CUDA
If you will be using containerized workloads exclusively on the DGX cluster, it is not necessary to install CUDA. If you intend to run GPU workloads natively on the nodes (i.e. without using containers), you will have to install the Bright CUDA packages on the head node. CUDA will be installed in the /cm/shared tree, which is available on all of the nodes in the cluster.
At the time of writing, installing the DGX OS software stack installs version 450 of the NVIDIA driver. This version is not compatible with the latest version of CUDA, so it is recommended to install an older version of CUDA.
If you intend to use Bright’s ML packages on your DGX cluster, it is a good idea to check which CUDA versions the packages that you would like to use are available for, keeping in mind that this CUDA version must be compatible with the NVIDIA driver that is installed as part of the DGX OS software stack.
At the time of writing, it is recommended to install CUDA 10.2:
[root@mycluster ~]# yum install cuda10.2-sdk cuda10.2-toolkit
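After installation, the CUDA toolkit lands under the /cm/shared tree and is exposed through environment modules on all nodes. A quick sanity check from a DGX node could look like the following (the exact module name may differ depending on the package version that was installed):
[root@dgx-node ~]# module avail cuda
[root@dgx-node ~]# module load cuda10.2/toolkit
[root@dgx-node ~]# nvcc --version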
Setting an appropriate disk setup
DGX nodes typically have 2 smaller NVME drives and 4 larger NVME drives. The 2 smaller NVME drives tend to be used as the OS drive in RAID1 and the 4 larger NVME drives are typically configured in RAID0 as a scratch drive.
The default Bright disk layout only uses the first NVME drive, so in order to be able to use all 6 drives, the disk needs to be changed. Disk layouts can be set for individual nodes, but it is recommended to set it for a category of nodes.
The disk layout below assumes that the 2 smaller drives are /dev/nvme1n1 and /dev/nvme2n1, and that the 4 larger drives are /dev/nvme0n1, /dev/nvme3n1, /dev/nvme4n1 and /dev/nvme5n1. This can be verified as follows:
[root@dgx-node ~]# fdisk -l | grep Disk | grep /dev/nvme
Disk /dev/nvme4n1: 3.5 TiB, 3840755982336 bytes, 7501476528 sectors
Disk /dev/nvme5n1: 3.5 TiB, 3840755982336 bytes, 7501476528 sectors
Disk /dev/nvme1n1: 1.8 TiB, 1920383410176 bytes, 3750748848 sectors
Disk /dev/nvme2n1: 1.8 TiB, 1920383410176 bytes, 3750748848 sectors
Disk /dev/nvme3n1: 3.5 TiB, 3840755982336 bytes, 7501476528 sectors
Disk /dev/nvme0n1: 3.5 TiB, 3840755982336 bytes, 7501476528 sectors
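Alternatively, lsblk gives a more compact view of the drive names and sizes:
[root@dgx-node ~]# lsblk -d -o NAME,SIZE /dev/nvme?n1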
The disk setup below can be set by saving it to a file (e.g. /tmp/mydisksetup.xml) and then using cmsh to load it. Note that cmsh reads the contents of the file and stores it as an XML value. Therefore, when the file is updated, the following steps will have to be repeated to update the disk setup that will be used for nodes as they boot.
[root@mycluster ~]# cmsh
[mycluster]% category use dgxa100
[mycluster->category[dgxa100]]% set disksetup /tmp/mydisksetup.xml
[mycluster->category[dgxa100*]]% commit
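The stored XML can be reviewed at any time with get:
[mycluster->category[dgxa100]]% get disksetup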
The following disk setup uses RAID1 for the two smaller OS NVMe drives, and RAID0 to create a filesystem that spans the 4 larger NVMe drives and is mounted under /project.
<diskSetup>
  <device>
    <blockdev>/dev/nvme1n1</blockdev>
    <partition id="boot1" partitiontype="esp">
      <size>512M</size>
      <type>linux raid</type>
    </partition>
    <partition id="swap1" partitiontype="esp">
      <size>16G</size>
      <type>linux raid</type>
    </partition>
    <partition id="os1" partitiontype="esp">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <device>
    <blockdev>/dev/nvme2n1</blockdev>
    <partition id="boot2" partitiontype="esp">
      <size>512M</size>
      <type>linux raid</type>
    </partition>
    <partition id="swap2" partitiontype="esp">
      <size>16G</size>
      <type>linux raid</type>
    </partition>
    <partition id="os2" partitiontype="esp">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <device>
    <blockdev>/dev/nvme0n1</blockdev>
    <partition id="project1" partitiontype="esp">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <device>
    <blockdev>/dev/nvme3n1</blockdev>
    <partition id="project2" partitiontype="esp">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <device>
    <blockdev>/dev/nvme4n1</blockdev>
    <partition id="project3" partitiontype="esp">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <device>
    <blockdev>/dev/nvme5n1</blockdev>
    <partition id="project4" partitiontype="esp">
      <size>max</size>
      <type>linux raid</type>
    </partition>
  </device>
  <raid id="boot">
    <member>boot1</member>
    <member>boot2</member>
    <level>1</level>
    <filesystem>ext2</filesystem>
    <mountPoint>/boot</mountPoint>
    <mountOptions>defaults,noatime,nodiratime</mountOptions>
  </raid>
  <raid id="swap">
    <member>swap1</member>
    <member>swap2</member>
    <level>1</level>
    <swap/>
  </raid>
  <raid id="os">
    <member>os1</member>
    <member>os2</member>
    <level>1</level>
    <filesystem>xfs</filesystem>
    <mountPoint>/</mountPoint>
    <mountOptions>defaults,noatime,nodiratime</mountOptions>
  </raid>
  <raid id="project">
    <member>project1</member>
    <member>project2</member>
    <member>project3</member>
    <member>project4</member>
    <level>0</level>
    <filesystem>xfs</filesystem>
    <mountPoint>/project</mountPoint>
    <mountOptions>defaults,noatime,nodiratime</mountOptions>
  </raid>
</diskSetup>
Because this disk layout uses RAID1 and RAID0, it is necessary to schedule the relevant kernel modules to be loaded for the software image. This will cause the kernel modules to be included in the initrd that is loaded when the nodes are booted.
[root@mycluster ~]# cmsh
[mycluster]% softwareimage use dgxa100-image
[mycluster->softwareimage[dgxa100-image]]% kernelmodules
[mycluster->softwareimage[dgxa100-image]->kernelmodules]% add raid0
[mycluster->softwareimage[dgxa100-image*]->kernelmodules]% add raid1
[mycluster->softwareimage[dgxa100-image*]->kernelmodules]% commit
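The configured module list can be reviewed afterwards with list, which should now include raid0 and raid1:
[mycluster->softwareimage[dgxa100-image]->kernelmodules]% list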
Lastly, you will want to make sure that the /project directory exists in the software image.
[root@mycluster ~]# mkdir -p /cm/images/dgxa100-image/project
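Once a DGX node has been reprovisioned with this disk layout, the RAID arrays and mount points can be verified on the node with standard tools:
[root@dgx-node ~]# cat /proc/mdstat
[root@dgx-node ~]# df -h / /boot /project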