• How to avoid the ‘too many measurables’ event message

    The ‘too many measurables’ event message is logged when a monitoring data producer has more than 500 measurables. When the event is logged, you can check the list of measurables created by the producer named in the message. For example, if the event message ‘Too many measurables for: ProcNetDev’ is logged,…

  • How to disable a GPU on a node

    In certain scenarios it can be necessary to disable a GPU on a node, for example when the GPU becomes faulty and a replacement is about to arrive. This article shows two possible ways to disable an NVIDIA GPU on a compute node. Method 1: Collect the…

  • Exclude lists for the DGX Docker nodes

    This article applies to DGX nodes where Docker has been set up without using NVIDIA Bright Cluster Manager. When Docker is set up using Bright, a few Docker files and directories are added to different exclude lists to avoid syncing undesired files between the software image and…

  • How to run SLURM jobs in Singularity containers via Jupyter

    This article demonstrates a procedure to run SLURM jobs in Singularity containers via Jupyter on a Bright 9.2 Ubuntu 20.04 cluster. We assume that Jupyter, SLURM and Singularity have already been set up on the target cluster by following the NVIDIA Bright Cluster Manager manuals. The…

  • Enabling Kdump (Ubuntu)

    The instructions in this article can be followed to enable Kdump on Ubuntu 20.04 compute nodes. As an additional precaution, if you have a test compute node, consider testing this procedure on it first. One possibility is to clone the existing production software image in use on the…

  • How to create a Docker image to run Jupyter kernels

    This article demonstrates a procedure to create a Docker image that can be used to run Jupyter kernels via Kubernetes. Some sample Docker files for creating Jupyter kernel-compatible images can be found in the following directories: In addition to the Docker images mentioned in the sample Docker files, any…