How do I install official DGX packages for RHEL into Bright software images?
NVIDIA provides official RHEL packages for DGX systems. These packages can be installed into Bright software images, and the images can then be deployed on DGX clusters.
The steps are described next. All of the commands that follow must be run on the head node, inside the chrooted environment of the software image.
Get into the chrooted environment
To get into the chrooted environment of a software image called software-image-to-use, run:
# chroot /cm/images/software-image-to-use
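Before entering the chroot, it can help to confirm that the image directory really exists, since a mistyped image name only produces a terse chroot error. The following is a minimal sketch, not part of the official procedure: the function name image_root and the IMAGE_BASE variable are hypothetical, and checking for an etc subdirectory is only a rough heuristic for "this looks like an image".

```shell
# Hypothetical helper: print the root path of a software image if it looks usable.
# IMAGE_BASE is an assumed override; it defaults to /cm/images, as used above.
image_root() {
    base="${IMAGE_BASE:-/cm/images}"
    root="$base/$1"
    if [ -d "$root/etc" ]; then
        printf '%s\n' "$root"
    else
        echo "no software image found at $root" >&2
        return 1
    fi
}

# Example use on the head node (left commented so that sourcing this has no side effects):
# root=$(image_root software-image-to-use) && chroot "$root"
```

With this sketch, the chroot is only attempted when the path check succeeds.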
Prepare the repository
In the chroot environment, run:
# wget -P /etc/pki/rpm-gpg/ https://international.download.nvidia.com/dgx/repos/rhel-files/RPM-GPG-KEY-dgx-cosmos-support
# cat > /etc/yum.repos.d/nvidia-dgx.repo <<EOF
[nvidia-dgx-7]
name=NVIDIA DGX EL7
Install the packages
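Still in the chroot environment, first work around the dependency conflicts with libraries from the base and EPEL packages. The libglvnd packages are downloaded without being installed, and are then installed directly with rpm, which is allowed to replace conflicting files: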
# yum install --downloadonly libglvnd-glx libglvnd-egl
# cd /var/cache/yum/x86_64/7/base/packages/
# rpm -i --replacefiles libglvnd-*.rpm
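The cd command above assumes that the libglvnd packages were downloaded from the base repository. If any of them come from another repository, such as EPEL, yum stores them under that repository's own directory in the cache. The sketch below lists the downloaded libglvnd RPMs across all repository directories so that their location can be checked before running rpm. The function name find_libglvnd_rpms is hypothetical, and the default cache root is assumed from the path used above.

```shell
# Hypothetical helper: list downloaded libglvnd RPMs under a yum cache root.
# The default path is an assumption based on the base repository path used above.
find_libglvnd_rpms() {
    cache_root="${1:-/var/cache/yum/x86_64/7}"
    find "$cache_root" -type f -name 'libglvnd-*.rpm' | sort
}

# Example use inside the chroot (commented out so that sourcing has no side effects):
# find_libglvnd_rpms
```

If the listing shows files outside base/packages/, the rpm command needs to be run against those paths as well.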
Still in the chroot environment, install the rest of the packages and then update the kernel packages:
# yum install -y "@DGX-1 Configurations" dgx-conf-cachefilesd kernel-headers kernel-debug-devel cuda-drivers cuda-drivers-diagnostic dgx-persistence-mode docker "@NVIDIA container runtime" python36 "@DGX System Management"
# yum update -y kernel kernel-tools kernel-tools-libs kernel-devel kernel-debug-devel kernel-headers
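The update above brings the kernel and its development packages forward together. A quick sanity check is to confirm that the newest kernel and kernel-devel builds in the image are the same. Note that inside a chroot, uname -r reports the running kernel of the head node rather than the kernel installed in the image, so the RPM database is queried instead. This is a minimal sketch; the function names newest_version and kernel_devel_matches are hypothetical, and sort -V assumes GNU coreutils.

```shell
# Print the highest version-release string read from standard input.
# Relies on the version sort (-V) option of GNU sort.
newest_version() {
    sort -V | tail -n 1
}

# Succeed only if the newest installed kernel and kernel-devel builds are identical.
# Assumes rpm is available, that is, that this runs inside the chroot.
kernel_devel_matches() {
    k=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' kernel | newest_version)
    d=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' kernel-devel | newest_version)
    [ -n "$k" ] && [ "$k" = "$d" ]
}

# Example use inside the chroot (commented out so that sourcing has no side effects):
# kernel_devel_matches && echo "kernel and kernel-devel match"
```

If the check fails, rerunning the yum update command above is a reasonable first step.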
The image is now ready to be deployed on a DGX system.