
Major Operating System (OS) upgrades on a BCM cluster

Preamble

There are several paths available for upgrading the operating system of a BCM cluster. This article breaks the upgrade process down into separate sections; apply the ones relevant to your cluster environment.

It is worth noting that Multi-OS and Multi-Architecture support is available in Bright 9.0 and later releases.
Please see the section titled “11.7 Creating Images For Other Distributions And Architectures (Multidistro And Multiarch)” in the administration guide for details.

In a nutshell, if the cluster is running Bright 9.0 or later, the head node can continue to run a different major OS release from the software image on the compute nodes.
For example, a RHEL 7 head node may be used to create a RHEL 8 image if the upgraded software is only required on the compute nodes.

As part of the multidistro setup process, a separate node-installer, /cm/shared and default image are created.
The RHEL 8 nodes in the above example would:
* boot the default-image-rhel8-x86_64 software image,
* provision with /cm/node-installer-rhel8-x86_64, and
* mount /cm/shared-rhel8-x86_64, whereas the head node and the other RHEL 7 nodes would use /cm/shared-rhel7-x86_64.
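The naming pattern in the list above can be captured in a small, hypothetical Python helper. It is not part of BCM and only builds strings: it assumes that other distribution/architecture pairs follow the same "<name>-<distro>-<arch>" pattern as the rhel8/x86_64 example, so confirm the names the multidistro setup actually creates on your cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultidistroPaths:
    """Per-distribution names used by nodes of one distro/arch (illustrative)."""
    software_image: str
    node_installer: str
    shared: str


def multidistro_paths(distro: str, arch: str) -> MultidistroPaths:
    """Derive the names following the rhel8 / x86_64 example.

    Hypothetical helper: assumes every distro/arch pair uses the
    "<name>-<distro>-<arch>" pattern shown in the example.
    """
    suffix = f"{distro}-{arch}"
    return MultidistroPaths(
        software_image=f"default-image-{suffix}",
        node_installer=f"/cm/node-installer-{suffix}",
        shared=f"/cm/shared-{suffix}",
    )


if __name__ == "__main__":
    # Names the RHEL 8 nodes in the example would use.
    print(multidistro_paths("rhel8", "x86_64"))
```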

We recommend confirming whether the Multidistro feature of BCM meets your requirements before embarking on a major OS upgrade.

I am upgrading the Operating System (OS) of the head node.  

The Operating System of the head node in a BCM cluster is a standalone entity that is managed directly by the system administrator of the cluster.
Packages for the operating system are provided by upstream vendors and they generally fall outside the scope of BCM itself.
An exception is DGX hardware running DGX OS, which is provided by NVIDIA.

In-place upgrades of the OS on the head node.

While upstream distributions may support in-place upgrades of the OS on a BCM head node, this is currently not supported or recommended by the BCM team.
Examples of in-place upgrades that BCM does not support:

  • In-place upgrades from Ubuntu 20.04 to 22.04.
  • In-place upgrades from RHEL 7 to 8.

This is because the differences in configuration and packages between major releases (for example, RHEL 7 to RHEL 8/RHEL 9, or Ubuntu 20.04 to 22.04) are too extensive for the process to be automated.

Major upgrades of the OS on the head node.

In general, major upgrades of the operating system on a BCM head node require a full reinstallation of the system.
There is currently no procedure to upgrade the operating system while keeping the same installation of Bright, as there are too many differences in configuration and packages between major OS releases to make this process automated.
It is recommended that the existing system be backed up before the reinstall, as all existing data on the head node is lost.
Examples of a major operating system upgrade on the head node that requires a full reinstallation are:

  • RHEL 7 to RHEL 8 or 9.
  • Ubuntu 20.04 to 22.04.
  • DGX OS 5 to 6.

Major upgrades of the OS on the head node within the same BCM major release.

In a major OS upgrade where a reinstall is required, the configuration from the existing cluster may be saved and imported into the newly installed head node, provided the major BCM version is the same.
For example:

  • Upgrading a head node on BCM 9.2 (RHEL 7) to BCM 9.2 (RHEL 8).
  • Upgrading a head node on BCM 9.2 (Ubuntu 20.04) to BCM 9.2 (Ubuntu 22.04).

In this circumstance, it is strongly recommended that the existing cluster be upgraded to the latest minor release of BCM before re-installation. This limits possible incompatibilities between minor releases of the same major BCM release.

Major upgrades of the OS on the head node including the upgrade of the BCM major release.

In the event of a major OS and BCM upgrade, directly exporting the configuration from the previous cluster into the new cluster is not supported.
This is due to differences in the API and database schemas of major releases. Generally, these upgrades are done through a clean re-installation and configuration of the cluster. The existing configuration may be exported as a reference.

Examples of upgrade paths that require a clean re-installation of the cluster:

  • Upgrading a head node on BCM 9.2 (RHEL 7) to BCM 10 (RHEL 8).
  • Upgrading a head node on BCM 9.2 (Ubuntu 20.04) to BCM 10.0 (Ubuntu 22.04).
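The head node rules above can be condensed into a short Python sketch. The enum and function names are hypothetical and not part of BCM; the sketch assumes BCM versions are given as "major.minor" strings and OS releases as major-release strings such as "RHEL 7", "Ubuntu 20.04" or "DGX OS 5". It deliberately offers no in-place option, since BCM does not support in-place major OS upgrades.

```python
from enum import Enum


class HeadNodePath(Enum):
    # None of these paths is an in-place upgrade.
    OUT_OF_SCOPE = "same major OS release: not a major OS upgrade"
    REINSTALL_AND_IMPORT = "reinstall the head node, then import the saved configuration"
    CLEAN_REINSTALL = "clean reinstall; keep the exported configuration as a reference only"


def bcm_major(version: str) -> int:
    """Return the BCM major release, e.g. "9.2" -> 9, "10.0" -> 10."""
    return int(version.split(".")[0])


def head_node_upgrade_path(current_bcm: str, current_os: str,
                           target_bcm: str, target_os: str) -> HeadNodePath:
    """Classify a head node upgrade using the rules in this article (sketch)."""
    if current_os == target_os:
        return HeadNodePath.OUT_OF_SCOPE
    # Any change of major OS release requires a full reinstallation.
    if bcm_major(current_bcm) == bcm_major(target_bcm):
        # Same BCM major release: update to the latest minor release first,
        # save the configuration, reinstall, then import it.
        return HeadNodePath.REINSTALL_AND_IMPORT
    # Different BCM major release: API and database schemas differ, so the
    # old configuration cannot be imported directly.
    return HeadNodePath.CLEAN_REINSTALL


if __name__ == "__main__":
    examples = [
        ("9.2", "RHEL 7", "9.2", "RHEL 8"),
        ("9.2", "Ubuntu 20.04", "9.2", "Ubuntu 22.04"),
        ("9.2", "RHEL 7", "10", "RHEL 8"),
        ("9.2", "Ubuntu 20.04", "10.0", "Ubuntu 22.04"),
    ]
    for cur_bcm, cur_os, tgt_bcm, tgt_os in examples:
        path = head_node_upgrade_path(cur_bcm, cur_os, tgt_bcm, tgt_os)
        print(f"BCM {cur_bcm} ({cur_os}) -> BCM {tgt_bcm} ({tgt_os}): {path.value}")
```

The four example calls correspond to the four example upgrade paths listed in the two subsections above.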

 

I am upgrading the Operating System (OS) on the compute nodes.

In-place upgrades of the OS on the compute node. 

While upstream distributions may support in-place upgrades of the OS on a BCM compute node, this is currently not supported or recommended by the BCM team.
Examples of in-place upgrades that BCM does not support:

  • In-place upgrades from Ubuntu 20.04 to 22.04.
  • In-place upgrades from RHEL 7 to 8.

One exception is the migration from CentOS 8 to Rocky 8 using the upstream migration scripts provided by the Rocky Linux project. 
Please note that NVIDIA BCM support is unable to assist with third-party migration scripts for CentOS to Rocky Linux.

Major upgrades of the OS on the compute node.

In general, we recommend deploying new software images for major OS upgrades on compute nodes and adding the required applications and configuration to the new software image. This essentially means a clean reinstallation of the compute nodes with the new OS from a new software image.

BCM 9.0 added multi-distribution support, which allows the Linux distribution on the compute nodes to differ from that of the head node.
Please refer to the section titled “11.7 Creating Images For Other Distributions And Architectures (Multidistro And Multiarch)” in the administration guide for supported Linux distributions. 

Software images may be generated using the cm-create-image command when the distribution matches that of the head node, or cm-image when the software image distribution differs from that of the head node.
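The choice between the two image creation commands can be sketched as a tiny Python function. It only returns the command name described in this article; the function name and the distribution labels (such as "rhel7" or "rhel8") are assumptions for illustration, and no command-line options are modelled, so consult the administration guide for the actual invocation.

```python
def image_creation_tool(head_node_distro: str, image_distro: str) -> str:
    """Return the command this article names for creating a software image.

    Hypothetical helper: distributions are compared as plain labels.
    """
    if image_distro == head_node_distro:
        # Image distribution matches the head node.
        return "cm-create-image"
    # Image distribution differs from the head node (multidistro).
    return "cm-image"


if __name__ == "__main__":
    head = "rhel7"
    for distro in ("rhel7", "rhel8"):
        print(f"{distro} image on a {head} head node: {image_creation_tool(head, distro)}")
```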

Major upgrades of the OS on the compute node including the upgrade of the BCM major release.

In general, we recommend deploying new software images for major OS and BCM upgrades on compute nodes.
Taking an existing image from a previous cluster running an older OS (for example, BCM 9.1 on Ubuntu 20.04) and importing it into the new cluster (for example, BCM 10.0 on Ubuntu 22.04) is not supported.
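The compute node image guidance can be expressed as a small, hypothetical Python check. It is not a BCM tool: the function name, the outcome labels and the input format (BCM versions as "major.minor" strings, OS releases as major-release strings) are assumptions made for illustration. It separates the two cases this article describes: a major OS upgrade within the same BCM major release, where a new software image is recommended, and a major OS upgrade combined with a BCM major upgrade, where importing the old image is not supported.

```python
from enum import Enum


class ImageAdvice(Enum):
    # Outcomes paraphrased from the compute node sections of this article.
    NOT_COVERED = "same major OS release: not a major OS upgrade"
    NEW_IMAGE_RECOMMENDED = "build a new software image for the new OS"
    IMPORT_NOT_SUPPORTED = "importing the old image is not supported; build a new software image"


def bcm_major(version: str) -> int:
    """Return the BCM major release, e.g. "9.1" -> 9, "10.0" -> 10."""
    return int(version.split(".")[0])


def old_image_advice(old_bcm: str, old_os: str, new_bcm: str, new_os: str) -> ImageAdvice:
    """Advise whether an image from the old cluster may be carried over (sketch)."""
    if old_os == new_os:
        return ImageAdvice.NOT_COVERED
    if bcm_major(old_bcm) != bcm_major(new_bcm):
        # Major OS and BCM upgrade together: importing the old image is unsupported.
        return ImageAdvice.IMPORT_NOT_SUPPORTED
    # Major OS upgrade within the same BCM major release.
    return ImageAdvice.NEW_IMAGE_RECOMMENDED


if __name__ == "__main__":
    # The example from the paragraph above.
    print(old_image_advice("9.1", "Ubuntu 20.04", "10.0", "Ubuntu 22.04").value)
```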

Updated on March 27, 2024