1. Prerequisites
- This article is written with Bright Cluster Manager 9.2 in mind, where Kubernetes is currently deployed with the default version 1.24.9 using containerd as its container runtime.
- The instructions are written with RHEL 8 and Ubuntu 20.04 in mind.
- These instructions have been run in dev environments a couple of times, and all caveats should be covered by this KB article. We do, however, recommend making a backup of Etcd so that a roll-back to an older version is possible. This backup can be made without interrupting the running cluster. Please follow the instructions at the following URL to create a snapshot of Etcd: https://kb.brightcomputing.com/knowledge-base/etcd-backup-and-restore-with-bright-9-0/ (a minimal example of the snapshot command is sketched after this list).
- DISCLAIMER: Please note that the Pod Security Policies feature has been removed from Kubernetes in version 1.25 (see: https://kubernetes.io/docs/concepts/security/pod-security-policy/). Extra manual work will be needed if this feature has been enabled with "cm-kubernetes-setup --enable-psp".
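As a minimal illustration of the Etcd snapshot mentioned above, the backup typically boils down to a single etcdctl command along the following lines. The endpoint and certificate paths below are placeholders; the exact values for a Bright cluster are given in the KB article linked above:
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=<ca.pem> --cert=<client.pem> --key=<client-key.pem> \   # placeholder cert paths, see the linked KB article
    snapshot save /root/etcd-snapshot-$(date +%Y%m%d).db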
Special note:
Please make sure containerd is used as the container runtime by checking:
root@rb-kube92:~# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node001 Ready worker 8m17s v1.24.9 10.141.0.1 <none> Red Hat Enterprise Linux 8.7 (Ootpa) 4.18.0-425.3.1.el8.x86_64 containerd://1.6.21
node002 Ready worker 8m16s v1.24.9 10.141.0.2 <none> Red Hat Enterprise Linux 8.7 (Ootpa) 4.18.0-425.3.1.el8.x86_64 containerd://1.6.21
node003 Ready worker 8m17s v1.24.9 10.141.0.3 <none> Red Hat Enterprise Linux 8.7 (Ootpa) 4.18.0-425.3.1.el8.x86_64 containerd://1.6.21
rb-kube92 Ready control-plane,master 8m17s v1.24.9 10.141.255.254 <none> Red Hat Enterprise Linux 8.7 (Ootpa) 4.18.0-425.3.1.el8.x86_64 containerd://1.6.21
The CONTAINER-RUNTIME column should say containerd://<version>, as it does in the example output above. If Docker is being used instead, a different upgrade procedure is needed, not the one described in this KB article.
Second note:
We do need to restart the cmd service on the Kubernetes nodes as part of the kubelet updates, because kubelet was being run with flags that have been removed in Kubernetes v1.26. For this reason a systemctl restart cmd appears in the commands to execute later on in this KB article.
2. Upgrade approach
For the purposes of this KB article we will use the following example deployment on six nodes: three control-plane nodes (two of them Head Nodes in an HA setup, which is not a requirement) and three worker nodes make up the Kubernetes cluster.
[root@rb-kube92-a ~]# module load kubernetes/default/1.24.9
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
rb-kube92-a Ready control-plane,master 37m v1.24.9
rb-kube92-b Ready control-plane,master 36m v1.24.9
node001 Ready control-plane,master 37m v1.24.9
node002 Ready worker 37m v1.24.9
node003 Ready worker 37m v1.24.9
node004 Ready worker 37m v1.24.9
[root@rb-kube92-a ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.9", GitCommit:"3cce4a82b44f032d0cd1a1790e6d2f5a55d20aae", GitTreeState:"clean", BuildDate:"2021-08-11T18:16:05Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.9", GitCommit:"3cce4a82b44f032d0cd1a1790e6d2f5a55d20aae", GitTreeState:"clean", BuildDate:"2021-08-11T18:10:22Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
3. Prepare a configuration overlay for control-plane
We’re updating from version 1.24 to 1.26.
Between these versions no additional flags need to be configured; only certain flags have been deprecated, and as long as a sufficiently new CMDaemon is deployed this is not a problem. (We do require a restart of CMDaemon, as mentioned in the preconditions in step 1.)
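If you want to double-check which configuration overlays carry the Kubernetes roles before proceeding, they can be listed from cmsh. The overlay names depend on your deployment, and no changes are made here:
[root@rb-kube92-a ~]# cmsh
[rb-kube92-a]% configurationoverlay
[rb-kube92-a->configurationoverlay]% list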
4. Prepare software images
We will bump the kubernetes package in each software image that is relevant to the Kubernetes cluster. In this example scenario our compute nodes are provisioned from /cm/images/default-image. We will use the cm-chroot-sw-img program to replace the kubernetes package.
[root@rb-kube92-a ~]# cm-chroot-sw-img /cm/images/default-image/ # go into chroot
$ apt install -y cm-kubernetes124- cm-kubernetes126 # for ubuntu
$ yum swap -y cm-kubernetes124 cm-kubernetes126 # for RHEL
$ exit
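Optionally, the package swap can be verified from the Head Node without re-entering the chroot (same package names as above):
[root@rb-kube92-a ~]# rpm -q --root /cm/images/default-image cm-kubernetes126   # for RHEL
[root@rb-kube92-a ~]# chroot /cm/images/default-image dpkg -l | grep cm-kubernetes   # for ubuntu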
5. Update one of the control-plane nodes
We will pick node001. If your cluster does not have control-plane nodes running on compute nodes, see the next section on how to update the Head Nodes, and pick a Head Node that runs as a control-plane node. If you only have control-plane nodes running on compute nodes, step 6 will not be needed and can be skipped.
5.1. Drain the node first
Please refer to the upstream documentation for details w/r/t draining here:
https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/
One note of caution: Pods that are running on the given node will be evicted/terminated, and unless they are managed by a higher-level construct such as a Deployment, they will not be rescheduled on a different node.
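Before draining, it can be useful to list the Pods currently scheduled on the node to get an idea of what will be affected (standard kubectl; only the node name is specific to our example):
[root@rb-kube92-a ~]# kubectl get pods --all-namespaces --field-selector spec.nodeName=node001 -o wide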
We will execute drain on the node with the following command:
root@rb-kube92-a:~# kubectl drain node001 --ignore-daemonsets --delete-emptydir-data
node/node001 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-csvw5
evicting pod local-path-storage/local-path-provisioner-8f77648b6-xrgdr
evicting pod ingress-nginx/ingress-nginx-admission-create-1-5-1-vzvbq
evicting pod ingress-nginx/ingress-nginx-admission-patch-1-5-1-6ncxw
evicting pod kube-system/calico-kube-controllers-7fc4577899-k9c6r
pod/ingress-nginx-admission-patch-1-5-1-6ncxw evicted
pod/ingress-nginx-admission-create-1-5-1-vzvbq evicted
pod/calico-kube-controllers-7fc4577899-k9c6r evicted
pod/local-path-provisioner-8f77648b6-xrgdr evicted
node/node001 drained
We will need the --ignore-daemonsets flag, and likely the --delete-emptydir-data flag, but depending on the workload of the cluster additional flags might be needed as well. The kubectl drain command will hint at such additional flags, similarly to how it does when we do not pass any flags:
root@rb-kube92-a:~# kubectl drain node001
node/node001 cordoned
error: unable to drain node "node001" due to error:cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-csvw5, continuing command...
There are pending nodes to be drained:
 node001
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-csvw5
We can confirm from the drain command output that the node has been successfully drained ("node/node001 drained"). The status for the node will also show "SchedulingDisabled":
root@rb-kube92-a:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node001 Ready,SchedulingDisabled control-plane,master,worker 102m v1.24.9
node002 Ready worker 102m v1.24.9
node003 Ready worker 102m v1.24.9
node004 Ready worker 102m v1.24.9
rb-kube92-a Ready control-plane,master 102m v1.24.9
rb-kube92-b Ready control-plane,master 102m v1.24.9
5.2. Update the node
In our example this node has not yet received an image update, since it is in a separate category from the one used by the workers, so we need to update its image first:
[root@rb-kube92-a ~]# cmsh
[rb-kube92-a]% device
[rb-kube92-a->device]% imageupdate -w node001
Wed Nov 23 15:28:44 2022 [notice] rb-kube92-a: Provisioning started: sending ea-k8s-update:/cm/images/default-image to node001:/, mode UPDATE, dry run = no
Wed Nov 23 15:29:32 2022 [notice] rb-kube92-a: Provisioning completed: sent ea-k8s-update:/cm/images/default-image to node001:/, mode UPDATE, dry run = no
imageupdate -w node001 [ COMPLETED ]
We will now trigger a restart of the Kubernetes services and of Bright Cluster Manager (cmd).
pdsh -w node001 "systemctl daemon-reload; systemctl restart kubelet; systemctl restart kube-proxy; systemctl restart cmd;"
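A quick way to confirm the services came back up on node001, using the same unit names as in the restart command above, is:
pdsh -w node001 'systemctl is-active kubelet kube-proxy cmd'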
At this point we can also check that the API server on the node responds, for example via curl. The 401 Unauthorized response below is expected, since we do not pass any credentials; it confirms the API server is up:
[root@rb-kube92-a ~]# curl -k https://node001:6443; echo
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "Unauthorized",
"reason": "Unauthorized",
"code": 401
}
We should be able to see the updated version appear for node001 in:
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node001 Ready,SchedulingDisabled control-plane,master 47m v1.26.3 <<< updated
node002 Ready worker 47m v1.24.9
node003 Ready worker 47m v1.24.9
node004 Ready worker 47m v1.24.9
rb-kube92-a Ready control-plane,master 47m v1.24.9
rb-kube92-b Ready control-plane,master 47m v1.24.9
Please note that at this point there is a possible version mismatch between the “kubectl” binary and the version of the Kubernetes API server. In case kubectl is still version 1.24 and happens to hit the control-plane we just updated, we will see this warning:
[root@rb-kube92-a ~]# kubectl version
...
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.17", GitCommit:"22a9682c8fe855c321be75c5faacde343f909b04", GitTreeState:"clean", BuildDate:"2023-08-23T23:44:35Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.15", GitCommit:"1649f592f1909b97aa3c2a0a8f968a3fd05a7b8b", GitTreeState:"clean", BuildDate:"2024-03-14T00:54:27Z", GoVersion:"go1.21.8", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.24) and server (1.26) exceeds the supported minor version skew of +/-1
This is because officially, for an upgrade from 1.24 to 1.26, an upgrade to the intermediate version 1.25 has to be done as well; this document describes updating directly to 1.26. The warning might temporarily cause certain Base Command Manager health checks to fail; this will go away as soon as we have completed updating all the control-plane nodes and the Head Nodes (see section 6).
5.3. Undrain the node.
Now we can undrain with “kubectl uncordon node001” as follows:
root@rb-kube92-a:~# kubectl uncordon node001
node/node001 uncordoned
This has the following result on the kubectl get nodes output (the SchedulingDisabled status is gone):
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node001 Ready control-plane,master 47m v1.26.3 <<< updated
node002 Ready worker 47m v1.24.9
node003 Ready worker 47m v1.24.9
node004 Ready worker 47m v1.24.9
rb-kube92-a Ready control-plane,master 47m v1.24.9
rb-kube92-b Ready control-plane,master 47m v1.24.9
6. Updating Head Nodes
In order to continue with the Head Nodes, we need to execute the same steps we did for the software image(s) in step 4 on the Head Nodes themselves. In case there are two, execute the following on both Head Nodes.
[root@rb-kube92-a ~]# apt install -y cm-kubernetes124- cm-kubernetes126 # for ubuntu
[root@rb-kube92-a ~]# yum swap -y cm-kubernetes124 cm-kubernetes126 # for RHEL
If you do not have any control-plane nodes running on Head Nodes, you might want to skip the rest of this section, and repeat the previous section (5) for the other control-plane nodes first.
We also have to take the same steps with regard to draining and undraining the node, as introduced in section 5:
root@rb-kube92-a:~# kubectl drain rb-kube92-a --ignore-daemonsets --delete-emptydir-data
node/rb-kube92-a cordoned
...
node/rb-kube92-a drained
We can now restart the Kubernetes services and BCM (cmd) on one of the Head Nodes; we'll pick the active Head Node in our case:
[root@rb-kube92-a ~]# systemctl daemon-reload; systemctl restart kubelet; systemctl restart kube-proxy; systemctl restart cmd;
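Before checking the node status, a quick sanity check that the relevant units on the Head Node are running again can be done as follows (additional kube-* units may be present on control-plane nodes, depending on the deployment):
[root@rb-kube92-a ~]# systemctl list-units '*kube*.service' --no-pager
[root@rb-kube92-a ~]# systemctl is-active cmd kubelet kube-proxy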
Then wait until the node shows the updated version:
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
rb-kube92-a Ready,SchedulingDisabled control-plane,master 47m v1.26.3 <<< updated
rb-kube92-b Ready control-plane,master 47m v1.24.9
node001 Ready control-plane,master 47m v1.26.3 <<< (updated in previous section)
node002 Ready worker 47m v1.24.9
node003 Ready worker 47m v1.24.9
node004 Ready worker 47m v1.24.9
We then undrain the node:
root@rb-kube92-a:~# kubectl uncordon rb-kube92-a
node/rb-kube92-a uncordoned
Finally, we repeat these steps for the secondary Head Node (the passive one in our case). After that, the Kubernetes control plane should be fully updated.
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
rb-kube92-a Ready control-plane,master 47m v1.26.3 <<< updated
rb-kube92-b Ready control-plane,master 47m v1.26.3 <<< updated
node001 Ready control-plane,master 47m v1.26.3 <<< (was updated in section 5.)
node002 Ready worker 47m v1.24.9
node003 Ready worker 47m v1.24.9
node004 Ready worker 47m v1.24.9
7. Image update one of the workers
We start with a single worker to see if we can update one of the kubelets. This should give us some confidence before upgrading all of the remaining kubelets.
In our example node002 is a worker, and we will first drain the node. Again, see https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/ for more details on draining.
[root@rb-kube92-a ~]# kubectl drain node002 --ignore-daemonsets --delete-emptydir-data
The drain command will evict all Pods and prevent anything from being scheduled on the node. After the command finishes successfully we will issue an imageupdate on node002 via cmsh.
[root@rb-kube92-a ~]# cmsh
[rb-kube92-a]% device
[rb-kube92-a->device]% imageupdate -w node002
Wed Nov 23 15:09:02 2022 [notice] rb-kube92-a: Provisioning started: sending ea-k8s-a:/cm/images/default-image to node002:/, mode UPDATE, dry run = no
Wed Nov 23 15:09:56 2022 [notice] rb-kube92-a: Provisioning completed: sent ea-k8s-a:/cm/images/default-image to node002:/, mode UPDATE, dry run = no
imageupdate -w node002 [ COMPLETED ]
We will now restart cmd, kubelet and kube-proxy services on the node.
[root@rb-kube92-a ~]# pdsh -w node002 'systemctl daemon-reload; systemctl restart cmd; systemctl restart kubelet.service; systemctl restart kube-proxy.service'
After a few moments, verify that the kubelet has been updated correctly.
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
rb-kube92-a Ready control-plane,master 66m v1.26.3
rb-kube92-b Ready control-plane,master 66m v1.26.3
node001 Ready control-plane,master 66m v1.26.3
node002 Ready,SchedulingDisabled worker 66m v1.26.3 <<< updated
node003 Ready worker 66m v1.24.9
node004 Ready worker 66m v1.24.9
Notice how node002 now has its version set to v1.26.3. Now we can re-enable scheduling for the node.
[root@rb-kube92-a ~]# kubectl uncordon node002
node/node002 uncordoned
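For a more targeted check, the kubelet version reported by node002 can also be queried directly (standard kubectl; in our example this should print v1.26.3):
[root@rb-kube92-a ~]# kubectl get node node002 -o jsonpath='{.status.nodeInfo.kubeletVersion}{"\n"}'
v1.26.3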
8. Image update the rest of the workers
This can be done similarly to step 7, one-by-one or in batches. In the case of this KB article we'll do the remaining compute nodes node00[3-4] in one go. First the draining:
root@rb-kube92-a:~# kubectl drain node003 --ignore-daemonsets --delete-emptydir-data
node/node003 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-lt9pq
evicting pod cmkpm-system/cmkpm-controller-manager-8c4b69895-m9m5g
evicting pod kubernetes-dashboard/kubernetes-dashboard-bff8f9bcf-qhkpv
pod/cmkpm-controller-manager-8c4b69895-m9m5g evicted
pod/kubernetes-dashboard-bff8f9bcf-qhkpv evicted
node/node003 drained
root@rb-kube92-a:~# kubectl drain node004 --ignore-daemonsets --delete-emptydir-data
node/node004 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-28pvv
evicting pod kube-system/calico-kube-controllers-7fc4577899-r4rht
evicting pod kube-system/kube-state-metrics-6d44cbdb56-x5488
evicting pod kube-system/metrics-server-77c677b45f-pblwt
pod/kube-state-metrics-6d44cbdb56-x5488 evicted
pod/calico-kube-controllers-7fc4577899-r4rht evicted
pod/metrics-server-77c677b45f-pblwt evicted
node/node004 drained
We issue an imageupdate again, but this time for the whole category in cmsh: device; imageupdate -c default -w
We restart the services:
pdsh -w node00[3-4] 'systemctl daemon-reload; systemctl restart cmd; systemctl restart kubelet.service; systemctl restart kube-proxy.service'
We confirm the version has updated.
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
rb-kube92-a Ready control-plane,master 76m v1.26.3
rb-kube92-b Ready control-plane,master 75m v1.26.3
node001 Ready control-plane,master 76m v1.26.3
node002 Ready worker 76m v1.26.3
node003 Ready,SchedulingDisabled worker 76m v1.26.3
node004 Ready,SchedulingDisabled worker 76m v1.26.3
Then we undrain:
root@rb-kube92-a:~# kubectl uncordon node003
node/node003 uncordoned
root@rb-kube92-a:~# kubectl uncordon node004
node/node004 uncordoned
Then we confirm that scheduling is re-enabled:
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
rb-kube92-a Ready control-plane,master 76m v1.26.3
rb-kube92-b Ready control-plane,master 75m v1.26.3
node001 Ready control-plane,master 76m v1.26.3
node002 Ready worker 76m v1.26.3
node003 Ready worker 76m v1.26.3
node004 Ready worker 76m v1.26.3
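At this point it can also be useful to verify that the evicted workloads have been rescheduled and no Pods are left in a bad state. A quick way (standard kubectl; note that completed Job Pods in the Succeeded phase will also show up) is:
[root@rb-kube92-a ~]# kubectl get pods -A --field-selector status.phase!=Running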
9. Updating Addons
Issuing the following command updates the addons. The output of the command has been omitted to avoid cluttering this KB article, but backups of the original YAML are made to the following directory: /cm/local/apps/kubernetes/var/. This information is also printed as part of the output.
cm-kubernetes-setup -v --update-addons
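The backed-up YAML files can be inspected afterwards; the exact filenames will differ per run:
[root@rb-kube92-a ~]# ls -lt /cm/local/apps/kubernetes/var/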
The update script will have backed up the old configuration inside NVIDIA Base Command Manager as well:
[rb-kube92-a]% kubernetes
[rb-kube92-a->kubernetes[default]]% appgroups
[rb-kube92-a->kubernetes[default]->appgroups]% list
Name (key) Applications
-------------------------------- ------------------------------
system <13 in submode>
system-backup-2022-11-23-193952 <13 in submode>
10. Finalize the update.
Module files: In case the Head Nodes were not part of the Kubernetes cluster we just upgraded, chances are the old module file is still present. In that case a restart of the "cmd" process will force Base Command Manager to generate the new module file. This can be done as follows on all Head Nodes:
[root@rb-kube92-a ~]# systemctl restart cmd
Kubernetes should be ready at this point; we can get rid of the old module file and make one final change to the configuration overlays.
[root@rb-kube92-a ~]# pdsh -A rm -rf /cm/local/modulefiles/kubernetes/default/1.24.9
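After this, loading the Kubernetes module in a new shell should pick up the 1.26 module file, giving a kubectl client that matches the updated cluster (the version string below is from our example and may differ in your environment):
[root@rb-kube92-a ~]# module load kubernetes/default/1.26.3   # version string may differ
[root@rb-kube92-a ~]# kubectl version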
Reboot nodes if needed: We do occasionally see certain Pods end up in a CrashLoopBackOff state after the upgrade. Typically, if this happens, it affects only one or two specific nodes and appears to happen randomly. After inspecting the container logs for the failing Pods, the problem is usually that they cannot communicate with the Kubernetes API server (via the Kubernetes Service network). This is almost always resolved by rebooting the node. We believe it can be a race condition between Calico Pods and/or NVIDIA Network Operator Pods restarting, resulting in existing Pods that no longer have functioning networking. Restarting the Pods on such nodes might fix the problem, but not always; we found that a reboot of the node seems to do the trick. This might be an issue specific to Kubernetes 1.26 and Calico version 3.20.6.
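To locate such Pods and the node(s) they run on, and to reboot an affected node through cmsh, something along the following lines can be used (node003 is only an example here):
[root@rb-kube92-a ~]# kubectl get pods -A -o wide | grep -i crashloop
[root@rb-kube92-a ~]# cmsh -c "device; reboot node003"   # node003 is an example; reboot the affected node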
11. Roll back the update.
In order to go back to the previous version 1.24, we have to follow the reverse of steps 1-10.
Downgrade the addons
This is only needed if Step 9 was executed.
[root@rb-kube92-a ~]# cmsh
[rb-kube92-a]% kubernetes
[rb-kube92-a->kubernetes[default]]% appgroups
[rb-kube92-a->kubernetes[default]->appgroups]% list
Name (key) Applications
-------------------------------- ------------------------------
system <13 in submode>
system-backup-2022-11-24-112008 <13 in submode>
[rb-kube92-a->kubernetes[default]->appgroups]% set system enabled no
[rb-kube92-a->kubernetes*[default*]->appgroups*]% set system-backup-2022-11-24-112008 enabled yes
[rb-kube92-a->kubernetes*[default*]->appgroups*]% commit
This should keep Kubernetes busy for a minute. After it is done restoring all the resources, proceed with the following steps:
Downgrading the packages
We need to replace the newly installed cm-kubernetes126 package with cm-kubernetes124 everywhere.
This means that the following command needs to be executed on both Head Nodes and in the relevant software images.
apt install -y cm-kubernetes126- cm-kubernetes124 # for ubuntu
yum swap -y cm-kubernetes126 cm-kubernetes124 # for RHEL
Image update relevant nodes
We need to image update the relevant nodes next, in order for all Kubernetes nodes to have the Kubernetes 1.24 binaries again. (e.g. imageupdate -c default -w in cmsh)
Restart services
Depending on the state of the cluster, you might, as we did during the upgrade, drain nodes before restarting services and undrain them again once the version has been confirmed to be downgraded. In case Kubernetes is malfunctioning, this can also be skipped; we have successfully tested this procedure without the draining as well.
On all the nodes relevant to the Kubernetes cluster, we need to execute the following reload and restarts; in our example setup this is done as follows. Please note that it includes a restart of Bright Cluster Manager (cmd).
[root@rb-kube92-a ~]# pdsh -w rb-kube92-a,rb-kube92-b,node00[1-4] "systemctl daemon-reload; systemctl restart cmd; systemctl restart '*kube*.service'"
We can clean up the module file for version 1.26 to prevent it from popping up in tab-completion.
[root@rb-kube92-a ~]# pdsh -A rm -rf /cm/local/modulefiles/kubernetes/default/1.26.3
All versions should be back to 1.24.9:
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
rb-kube92-a Ready control-plane,master 22h v1.24.9
rb-kube92-b Ready control-plane,master 22h v1.24.9
node001 Ready control-plane,master 22h v1.24.9
node002 Ready worker 22h v1.24.9
node003 Ready worker 22h v1.24.9
node004 Ready worker 22h v1.24.9
It is very unlikely to be necessary for this downgrade from 1.26 back to 1.24; however, should something end up in an invalid, unrecoverable state, we can restore the Etcd database at this point with the snapshot created in Step 1. The instructions for this are explained in the same KB article referenced in Step 1.