How do I configure Kubernetes to use NVIDIA GPUs on a Bright 8.0 cluster?
Warning: this article is specifically intended for Bright 8.0. For instructions on enabling GPUs in Kubernetes for more recent versions...

How to collect metrics from older GPUs using NVML
The Bright 8.0 metrics system uses DCGM to collect metrics from GPUs. But DCGM doesn’t support older GPUs. Not to...

How can I use GPUs with Bright Deep learning stack?
This article is being updated. Please be aware the content herein, not limited to version numbers and slight syntax changes, may...

Using NVIDIA GPUs in X-application on a headless node via VNC
The following steps can be followed to enable direct rendering from an x-client (glxgears or similar) running on a headless...

How Do I Create Docker images to use NVIDIA GPUs with Spark and XGBoost via RAPIDS?
The steps described in this page can be followed to build a Docker image that is suitable for running distributed...

How can I run a simple test to stress test my GPUs?
Make sure CUDA, git and cmake are installed on the head node of the cluster: Clone the Multi GPU Benchmark...

The NVIDIA GPU Operator with Kubernetes on a Bright Cluster
This article has been written for existing Kubernetes deployments in Bright versions <= 9.2. Prerequisites: This article was written with...

How to install GPUDirect Storage (GDS) on Bright 9.2 (DGX – BaseOS 5.4)
This document is verified on Bright 9.2 with Ubuntu 20.04 with GDS 11.8 and MOFED 5.4. Preparation: Clone the default...

Kubernetes: Limit GPU resource usage for a namespace
About: This article contains Kubernetes resource quota examples for limiting GPU usage per namespace. For further information on setting resources...

How to install GPUDirect Storage (GDS) on BCM 10 (DGX – BaseOS 6)
This document is verified on BCM10 with Ubuntu 22.04 (kernel 5.15) with GDS 12.3 and MOFED 23.10 on DGX A100...

How to disable a GPU on a node
In certain scenarios disabling a node GPU can be necessary, for example when a GPU on a node becomes faulty...