  • How do I add a QCOW image as a software image?

    A software image is a directory on the head node that is used to provision compute nodes. It contains a full Linux filesystem. To create a new software image from a QCOW image, we must first mount the QCOW image and copy its contents. This can be done…

  • How do I upgrade to Bright 9.1?

    The upgrade procedure was originally published in parallel with Bright 9.1-7. Please take the time to read the upgrade manual at https://support.brightcomputing.com/upgrade-manuals/9.1/upgrade-manual.pdf completely before proceeding with the upgrade. As always, please feel free to reach out to the support staff if you need assistance.

  • How do I create an edge test setup?

    Edge setups are characterized by having computational resources in multiple geographic locations. Staging such an environment in a single lab for evaluation or testing purposes can be remarkably challenging. In this article we describe a configuration that can be used to build a Bright setup spanning several edge…

  • How can I get access to nightly builds of packages?

    The packages you will find in the Bright repositories have gone through a QA process. Updated packages are released roughly every 3-4 weeks for the latest version of Bright. Older versions of Bright will receive updates less frequently. It may be desirable to have access to the latest version of…

  • General considerations for installing a Bright DGX cluster

    Loading the correct kernel modules: If you are going to use the built-in gigabit Ethernet interface as your internal cluster network between the head node(s) and the DGX nodes, there is nothing special that needs to be done in terms of loading kernel modules. This is because the igb module…

  • How should I set up Slurm on a DGX cluster?

    Background: A workload management system is helpful for scheduling jobs on a cluster of nodes. The steps below describe how to set up Slurm so that GPUs have to be explicitly requested. This makes it much easier for many users to share GPU and CPU computing resources. Installation steps: 1. To start…

  • How do I validate that my DGX cluster is working properly?

    One of the best ways to stress test your DGX cluster is to use NVIDIA’s HPC benchmarks, which can be found in NGC. Since this software is packaged as a container image, we will need to use a container runtime engine such as Singularity to run it. It is worth…

  • How can I have multiple network interfaces on a node in the same IP subnet?

    Background: When you configure multiple network interfaces on a single machine, each with an IP address in the same IP subnet, you will need to do additional configuration work to allow the networking stack in the Linux kernel to use these interfaces properly. By default, only one of the IP addresses…

  • How do I use Grafana to visualize monitoring data from a Bright cluster?

    Although Bright View has extensive capabilities for visualizing monitoring information, it may be desirable to visualize monitoring data using Grafana (e.g., when Grafana is used as a datacenter-wide monitoring tool). As of Bright 8.2, it is possible to follow the instructions below to…

  • How can I use Grafana to monitor multiple Bright clusters?

    As of Bright 8.2-24 / 9.0-12 / 9.1-2, it is possible to add the cluster as a data source to Grafana so that the Grafana interface can be used to visualize monitoring information. To quickly verify that cmdaemon on your cluster is compatible, you can check that the build date…