  • Create persistent UDEV rules to rename the disks consistently based on HW address

    This article was tested on DGX OS 6.2 with BCM10.

    1. Edit /cm/images/<IMAGENAME>/usr/lib/udev/rules.d/60-persistent-storage-<DGXTYPE>.rules and add the following lines:

       a. For DGX H100:

          ########## persistent nvme rules by HW address ##########
          KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0000:01:00.0", SYMLINK+="disk/by-id/osdisk-1"
          KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0000:02:00.0", SYMLINK+="disk/by-id/osdisk-2"
          KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0000:ab:00.0", SYMLINK+="disk/by-id/raiddisk-1"
          KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0000:ac:00.0", SYMLINK+="disk/by-id/raiddisk-2"
          KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0000:ad:00.0", SYMLINK+="disk/by-id/raiddisk-3"
          KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0000:ae:00.0", SYMLINK+="disk/by-id/raiddisk-4"
          KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="0000:2a:00.0",…
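The rules in this excerpt all follow one pattern: match an NVMe namespace by the PCI address of its controller and add a stable symlink under disk/by-id. As a minimal sketch, the lines could be generated rather than typed by hand. The function names `nvme_symlink_rule` and `nvme_symlink_rules` are hypothetical helpers introduced here for illustration and are not part of the original article.

```shell
# Hypothetical helper (not from the article): print one udev rule that adds
# a stable symlink for the NVMe namespace behind a given PCI address.
nvme_symlink_rule() {
    local pci_addr="$1" link_name="$2"
    printf 'KERNEL=="nvme[0-9]n[0-9]", ATTRS{address}=="%s", SYMLINK+="disk/by-id/%s"\n' \
        "$pci_addr" "$link_name"
}

# Hypothetical helper: read "PCI_ADDRESS LINK_NAME" pairs from stdin,
# one pair per line, and print a rule for each pair.
nvme_symlink_rules() {
    local pci_addr link_name
    while read -r pci_addr link_name; do
        [ -n "$pci_addr" ] || continue   # skip blank lines
        nvme_symlink_rule "$pci_addr" "$link_name"
    done
}

# Example usage (prints the first two rules from the excerpt):
#   printf '0000:01:00.0 osdisk-1\n0000:02:00.0 osdisk-2\n' | nvme_symlink_rules
```

To find the PCI address for a given disk, `udevadm info --attribute-walk --name=/dev/nvme0n1` lists the attributes of the device and its parents. The addresses differ between hardware models, so the values should be checked on the target system before they are written into the rules file.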

  • How to install Azure Managed Lustre FS client on top of BCM 10

    This article was tested on BCM10 with Ubuntu 22.04.

    1. Clone the default software image:

       cmsh
       softwareimage
       clone default-image amlfs-image

    2. Install the Azure LTS kernel inside the amlfs-image:

       cm-chroot-sw-img /cm/images/amlfs-image/
       apt update && apt install linux-image-azure-lts-22.04

    3. Install the Azure LTS headers inside the amlfs-image:

       cm-chroot-sw-img /cm/images/amlfs-image/
       apt update && apt install…

  • How to deploy Longhorn on top of a Bright cluster

    This document was tested on BCM10 with Ubuntu 22.04.

    Preparation

    1. Start by deploying the latest version of Kubernetes using the cm-kubernetes-setup utility, with the following components:

       • containerd
       • Calico as the network plugin
       • No Kyverno
       • Permission manager
       • No additional operators

    2. Install open-iscsi in the software image:

       root@longhorn:~# cm-chroot-sw-img /cm/images/default-image/
       root@default-image:/#…

  • How to install GPUDirect Storage (GDS) on BCM 10 (DGX – BaseOS 6)

    This document was verified on BCM10 with Ubuntu 22.04 (kernel 5.15), GDS 12.3, and MOFED 23.10 on DGX A100 hardware.

    Preparation

    1. Clone the default software image “dgx-os-6.1-a100-image” (or “dgx-os-6.1-h100-image”) to dgxos61-gds-image:

       # cmsh
       % softwareimage
       % clone dgx-os-6.1-a100-image dgxos61-gds-image
       % commit
       (wait until the initrd is generated)

    …
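The clone step in this excerpt is an interactive cmsh session. For scripted use, the same three commands could be passed to cmsh in one call. The sketch below assumes that `cmsh -c` accepts a semicolon-separated command string; the excerpt does not show that option, so check it against the cmsh documentation for your release. The function name `clone_image_cmds` is a hypothetical helper introduced here.

```shell
# Hypothetical helper (not from the article): build the cmsh command string
# that clones a software image and commits the change.
clone_image_cmds() {
    local src_image="$1" dst_image="$2"
    printf 'softwareimage; clone %s %s; commit' "$src_image" "$dst_image"
}

# Example usage on the head node (assumption: cmsh -c takes a
# semicolon-separated command string; not executed here):
#   cmsh -c "$(clone_image_cmds dgx-os-6.1-a100-image dgxos61-gds-image)"
```

As the excerpt notes, the initrd for the new image is generated after the commit, so wait for that to finish before making further changes.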

  • How to install GPUDirect Storage (GDS) on Bright 9.2 (DGX – BaseOS 5.4)

    This document is verified on Bright 9.2 with Ubuntu 20.04, GDS 11.8, and MOFED 5.4.

    Preparation

    Clone the default software image “default-image” to dgx54-gds-image:

    Convert the dgx54-gds-image into a DGX OS image by following these steps:

       a. Pick a DGX compute node and set its software image to the…

  • Installing Kubernetes on Air-Gapped Systems

    Kubernetes is most easily installed on a cluster that is able to access the internet. For clusters without internet access, it is still possible to deploy Kubernetes with a few additional steps. The following document refers to Bright 9.2 on RHEL 8.

    1. Install the following RPM packages, and their…