• Enabling Kdump (RHEL/CentOS)

    If you need to diagnose kernel crashes on a BCM-managed cluster based on RHEL, you can use the kexec tools that are part of the underlying operating system. These instructions have been tested against BCM 10 and RHEL 8, but they should remain similar across a…

  • Troubleshooting provisioning issues on a system with SuperMicro BMC

    Please note that this article does not replace contacting your vendor for guidance on hardware issues. Here are some steps that may help resolve provisioning issues with SuperMicro BMCs. Is the system running the latest BMC firmware? Check the vendor website. Have you attempted a BMC reset? “ipmitool…

  • Is it possible to clone the primary headnode from the secondary in an HA cluster?

    Important note: This process is generally used to recover a primary headnode from a failure state (filesystem corruption, for example). It does not replace a good backup regime. If you intend to use this process to recover a primary headnode, we recommend contacting Bright Support first so we may assess…

  • Can I use Rufus to create a bootable USB drive?

    We recommend not using Rufus to create a bootable USB drive from the Bright ISO. Rufus changes the ISO hybrid disk format, which causes the Bright installer to fail at the rootfs stage.

  • How can I fix “Failed to initialize NVML: Driver/library version mismatch”?

    Background: The “Failed to initialize NVML: Driver/library version mismatch” error generally means the CUDA driver is still running an older release that is incompatible with the CUDA toolkit version currently in use. Rebooting the compute nodes will typically resolve this issue. However, if you do not wish to reboot the compute node,…

  • Enabling QOS in Slurm with Bright 8.2 and earlier releases

    Here is an example of enabling QOS (Quality of Service) in SchedMD Slurm and applying the QOS to a partition using Bright. Note: this is valid for Bright 8.2 and earlier releases. Finally, update the options for the partition in Bright to use this new QOS configuration.