1. Prerequisites
- This article is written with Bright Cluster Manager 9.2 in mind, where Kubernetes is currently deployed with the default version 1.24.9 using containerd as its container runtime.
- The instructions are written with RHEL 8 and Ubuntu 20.04 in mind.
- These instructions have been run in dev environments a couple of times, and all caveats should be covered by this KB article. We do, however, recommend making a backup of Etcd so that a roll-back to an older version is possible. This backup can be made without interrupting the running cluster. Please follow the instructions at the following URL to create a snapshot of Etcd: https://kb.brightcomputing.com/knowledge-base/etcd-backup-and-restore-with-bright-9-0/ (a minimal example of the snapshot command is sketched after this list).
- DISCLAIMER: Please note that the Pod Security Policies feature has been removed from Kubernetes in version 1.25 (see: https://kubernetes.io/docs/concepts/security/pod-security-policy/). Extra manual work will be needed if this feature has been enabled with "cm-kubernetes-setup --enable-psp".
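As a minimal illustration of the Etcd snapshot mentioned above, the backup typically boils down to a single etcdctl command along the following lines. The endpoint and certificate paths below are placeholders; the exact values for a Bright cluster are given in the KB article linked above:
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=<ca.pem> --cert=<client.pem> --key=<client-key.pem> \   # placeholder cert paths, see the linked KB article
    snapshot save /root/etcd-snapshot-$(date +%Y%m%d).db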
Special note:
Please make sure containerd is used as the container runtime by checking:
root@rb-kube92:~# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node001 Ready worker 8m17s v1.24.9 10.141.0.1 <none> Red Hat Enterprise Linux 8.7 (Ootpa) 4.18.0-425.3.1.el8.x86_64 containerd://1.6.21
node002 Ready worker 8m16s v1.24.9 10.141.0.2 <none> Red Hat Enterprise Linux 8.7 (Ootpa) 4.18.0-425.3.1.el8.x86_64 containerd://1.6.21
node003 Ready worker 8m17s v1.24.9 10.141.0.3 <none> Red Hat Enterprise Linux 8.7 (Ootpa) 4.18.0-425.3.1.el8.x86_64 containerd://1.6.21
rb-kube92 Ready control-plane,master 8m17s v1.24.9 10.141.255.254 <none> Red Hat Enterprise Linux 8.7 (Ootpa) 4.18.0-425.3.1.el8.x86_64 containerd://1.6.21
The CONTAINER-RUNTIME column should say containerd://<version>, as it does in the example output above. If Docker is being used instead, a different upgrade procedure is needed, not the one described in this KB article.
Second note:
We do need to restart the cmd service on the Kubernetes nodes as part of the kubelet updates, because kubelet was being run with flags that have been removed in Kubernetes v1.26. For this reason a systemctl restart cmd appears in the commands to execute later on in this KB article.
2. Upgrade approach
For the purposes of this KB article we will use the following example deployment on six nodes: three control-plane nodes (two of them Head Nodes in an HA setup, which is not a requirement) and three worker nodes make up the Kubernetes cluster.
[root@rb-kube92-a ~]# module load kubernetes/default/1.24.9
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
rb-kube92-a Ready control-plane,master 37m v1.24.9
rb-kube92-b Ready control-plane,master 36m v1.24.9
node001 Ready control-plane,master 37m v1.24.9
node002 Ready worker 37m v1.24.9
node003 Ready worker 37m v1.24.9
node004 Ready worker 37m v1.24.9
[root@rb-kube92-a ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.9", GitCommit:"3cce4a82b44f032d0cd1a1790e6d2f5a55d20aae", GitTreeState:"clean", BuildDate:"2021-08-11T18:16:05Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.9", GitCommit:"3cce4a82b44f032d0cd1a1790e6d2f5a55d20aae", GitTreeState:"clean", BuildDate:"2021-08-11T18:10:22Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
3. Prepare a configuration overlay for control-plane
We’re updating from version 1.24 to 1.26.
Between these versions no additional flags need to be configured; only certain flags have been deprecated, and as long as a sufficiently new CMDaemon is deployed this is not a problem. (We do require a restart of CMDaemon, as mentioned in the preconditions in step 1.)
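If you want to double-check which configuration overlays carry the Kubernetes roles before proceeding, they can be listed from cmsh. The overlay names depend on your deployment, and no changes are made here:
[root@rb-kube92-a ~]# cmsh
[rb-kube92-a]% configurationoverlay
[rb-kube92-a->configurationoverlay]% list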
4. Prepare software images
We will bump the kubernetes package in each software image that is relevant to the Kubernetes cluster. In this example scenario our compute nodes are provisioned from /cm/images/default-image. We will use the cm-chroot-sw-img program to replace the kubernetes package.
[root@rb-kube92-a ~]# cm-chroot-sw-img /cm/images/default-image/ # go into chroot
$ apt install -y cm-kubernetes124- cm-kubernetes126 # for ubuntu
$ yum swap -y cm-kubernetes124 cm-kubernetes126 # for RHEL
$ exit
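Optionally, the package swap can be verified from the Head Node without re-entering the chroot (same package names as above):
[root@rb-kube92-a ~]# rpm -q --root /cm/images/default-image cm-kubernetes126   # for RHEL
[root@rb-kube92-a ~]# chroot /cm/images/default-image dpkg -l | grep cm-kubernetes   # for ubuntu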
5. Update one of the control-plane nodes
We will pick node001. If your cluster does not have control-plane nodes running on compute nodes, see the next section on how to update the Head Nodes, and pick a Head Node that runs as a control-plane node. If you only have control-plane nodes running on compute nodes, step 6 will not be needed and can be skipped.
5.1. Drain the node first
Please refer to the upstream documentation for details w/r/t draining here:
https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/
One note of caution: Pods that are running on the given node will be evicted/terminated, and unless they are managed by a higher-level construct such as a Deployment, they will not be rescheduled on a different node.
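Before draining, it can be useful to list the Pods currently scheduled on the node to get an idea of what will be affected (standard kubectl; only the node name is specific to our example):
[root@rb-kube92-a ~]# kubectl get pods --all-namespaces --field-selector spec.nodeName=node001 -o wide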
We will execute drain on the node with the following command:
root@rb-kube92-a:~# kubectl drain node001 --ignore-daemonsets --delete-emptydir-data
node/node001 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-csvw5
evicting pod local-path-storage/local-path-provisioner-8f77648b6-xrgdr
evicting pod ingress-nginx/ingress-nginx-admission-create-1-5-1-vzvbq
evicting pod ingress-nginx/ingress-nginx-admission-patch-1-5-1-6ncxw
evicting pod kube-system/calico-kube-controllers-7fc4577899-k9c6r
pod/ingress-nginx-admission-patch-1-5-1-6ncxw evicted
pod/ingress-nginx-admission-create-1-5-1-vzvbq evicted
pod/calico-kube-controllers-7fc4577899-k9c6r evicted
pod/local-path-provisioner-8f77648b6-xrgdr evicted
node/node001 drained
We will need the --ignore-daemonsets flag, and likely the --delete-emptydir-data flag, but depending on the workload of the cluster additional flags might be needed as well. The kubectl drain command will hint at such additional flags, similarly to how it does when we do not pass any flags:
root@rb-kube92-a:~# kubectl drain node001
node/node001 cordoned
error: unable to drain node "node001" due to error:cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-csvw5, continuing command...
There are pending nodes to be drained:
 node001
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-csvw5
We can confirm from the drain command output that the node has been successfully drained ("node/node001 drained"). The status for the node will also show "SchedulingDisabled":
root@rb-kube92-a:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node001 Ready,SchedulingDisabled control-plane,master,worker 102m v1.24.9
node002 Ready worker 102m v1.24.9
node003 Ready worker 102m v1.24.9
node004 Ready worker 102m v1.24.9
rb-kube92-a Ready control-plane,master 102m v1.24.9
rb-kube92-b Ready control-plane,master 102m v1.24.9
5.2. Update the node
In our example this node has not yet received an image update, since it is in a separate category from the one used by the workers, so we need to update its image first:
[root@rb-kube92-a ~]# cmsh
[rb-kube92-a]% device
[rb-kube92-a->device]% imageupdate -w node001
Wed Nov 23 15:28:44 2022 [notice] rb-kube92-a: Provisioning started: sending ea-k8s-update:/cm/images/default-image to node001:/, mode UPDATE, dry run = no
Wed Nov 23 15:29:32 2022 [notice] rb-kube92-a: Provisioning completed: sent ea-k8s-update:/cm/images/default-image to node001:/, mode UPDATE, dry run = no
imageupdate -w node001 [ COMPLETED ]
We will now trigger a restart of the Kubernetes services and of Bright Cluster Manager (cmd).
pdsh -w node001 "systemctl daemon-reload; systemctl restart kubelet; systemctl restart kube-proxy; systemctl restart cmd;"
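A quick way to confirm the services came back up on node001, using the same unit names as in the restart command above, is:
pdsh -w node001 'systemctl is-active kubelet kube-proxy cmd'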
At this point we can also check that the API server on the node responds, for example via curl. The 401 Unauthorized response below is expected, since we do not pass any credentials; it confirms the API server is up:
[root@rb-kube92-a ~]# curl -k https://node001:6443; echo
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "Unauthorized",
"reason": "Unauthorized",
"code": 401
}
We should be able to see the updated version appear for node001 in:
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node001 Ready,SchedulingDisabled control-plane,master 47m v1.26.3 <<< updated
node002 Ready worker 47m v1.24.9
node003 Ready worker 47m v1.24.9
node004 Ready worker 47m v1.24.9
rb-kube92-a Ready control-plane,master 47m v1.24.9
rb-kube92-b Ready control-plane,master 47m v1.24.9
Please note that at this point there is a possible version mismatch between the “kubectl” binary and the version of the Kubernetes API server. In case kubectl is still version 1.24 and happens to hit the control-plane we just updated, we will see this warning:
[root@rb-kube92-a ~]# kubectl version
...
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.17", GitCommit:"22a9682c8fe855c321be75c5faacde343f909b04", GitTreeState:"clean", BuildDate:"2023-08-23T23:44:35Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.15", GitCommit:"1649f592f1909b97aa3c2a0a8f968a3fd05a7b8b", GitTreeState:"clean", BuildDate:"2024-03-14T00:54:27Z", GoVersion:"go1.21.8", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.24) and server (1.26) exceeds the supported minor version skew of +/-1
This is because officially, for an upgrade from 1.24 to 1.26, an upgrade to the intermediate version 1.25 has to be done as well; this document describes updating directly to 1.26. The warning might temporarily cause certain Base Command Manager health checks to fail; this will go away as soon as we have completed updating all the control-plane nodes and the Head Nodes (see section 6).
5.3. Undrain the node.
Now we can undrain with “kubectl uncordon node001” as follows:
root@rb-kube92-a:~# kubectl uncordon node001
node/node001 uncordoned
This has the following result on the kubectl get nodes output (the SchedulingDisabled status is gone):
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node001 Ready control-plane,master 47m v1.26.3 <<< updated
node002 Ready worker 47m v1.24.9
node003 Ready worker 47m v1.24.9
node004 Ready worker 47m v1.24.9
rb-kube92-a Ready control-plane,master 47m v1.24.9
rb-kube92-b Ready control-plane,master 47m v1.24.9
6. Updating Head Nodes
In order to continue with the Head Nodes, we need to execute the same steps we did for the software image(s) in step 4 on the Head Nodes themselves. In case there are two, execute the following on both Head Nodes.
[root@rb-kube92-a ~]# apt install -y cm-kubernetes124- cm-kubernetes126 # for ubuntu
[root@rb-kube92-a ~]# yum swap -y cm-kubernetes124 cm-kubernetes126 # for RHEL
If you do not have any control-plane nodes running on Head Nodes, you might want to skip the rest of this section, and repeat the previous section (5) for the other control-plane nodes first.
We also have to take the same steps with regard to draining and undraining the node, as introduced in section 5:
root@rb-kube92-a:~# kubectl drain rb-kube92-a --ignore-daemonsets --delete-emptydir-data
node/rb-kube92-a cordoned
...
node/rb-kube92-a drained
We can now restart the Kubernetes services and BCM (cmd) on one of the Head Nodes; we'll pick the active Head Node in our case:
[root@rb-kube92-a ~]# systemctl daemon-reload; systemctl restart kubelet; systemctl restart kube-proxy; systemctl restart cmd;
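Before checking the node status, a quick sanity check that the relevant units on the Head Node are running again can be done as follows (additional kube-* units may be present on control-plane nodes, depending on the deployment):
[root@rb-kube92-a ~]# systemctl list-units '*kube*.service' --no-pager
[root@rb-kube92-a ~]# systemctl is-active cmd kubelet kube-proxy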
Then wait until the node shows the updated version:
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
rb-kube92-a Ready,SchedulingDisabled control-plane,master 47m v1.26.3 <<< updated
rb-kube92-b Ready control-plane,master 47m v1.24.9
node001 Ready control-plane,master 47m v1.26.3 <<< (updated in previous section)
node002 Ready worker 47m v1.24.9
node003 Ready worker 47m v1.24.9
node004 Ready worker 47m v1.24.9
We then undrain the node:
root@rb-kube92-a:~# kubectl uncordon rb-kube92-a
node/rb-kube92-a uncordoned
Finally, we repeat these steps for the secondary Head Node (the passive one in our case). After that, the Kubernetes control plane should be fully updated.
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
rb-kube92-a Ready control-plane,master 47m v1.26.3 <<< updated
rb-kube92-b Ready control-plane,master 47m v1.26.3 <<< updated
node001 Ready control-plane,master 47m v1.26.3 <<< (was updated in section 5.)
node002 Ready worker 47m v1.24.9
node003 Ready worker 47m v1.24.9
node004 Ready worker 47m v1.24.9
7. Image update one of the workers
We start with a single worker to see if we can update one of the kubelets. This should give us some confidence before upgrading all of the remaining kubelets.
In our example node002 is a worker, and we will first drain the node. Again, see https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/ for more details on draining.
[root@rb-kube92-a ~]# kubectl drain node002 --ignore-daemonsets --delete-emptydir-data
The drain command will evict all Pods and prevent anything from being scheduled on the node. After the command finishes successfully we will issue an imageupdate on node002 via cmsh.
[root@rb-kube92-a ~]# cmsh
[rb-kube92-a]% device
[rb-kube92-a->device]% imageupdate -w node002
Wed Nov 23 15:09:02 2022 [notice] rb-kube92-a: Provisioning started: sending ea-k8s-a:/cm/images/default-image to node002:/, mode UPDATE, dry run = no
Wed Nov 23 15:09:56 2022 [notice] rb-kube92-a: Provisioning completed: sent ea-k8s-a:/cm/images/default-image to node002:/, mode UPDATE, dry run = no
imageupdate -w node002 [ COMPLETED ]
We will now restart cmd, kubelet and kube-proxy services on the node.
[root@rb-kube92-a ~]# pdsh -w node002 'systemctl daemon-reload; systemctl restart cmd; systemctl restart kubelet.service; systemctl restart kube-proxy.service'
After a few moments, verify that the kubelet has been updated correctly.
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
rb-kube92-a Ready control-plane,master 66m v1.26.3
rb-kube92-b Ready control-plane,master 66m v1.26.3
node001 Ready control-plane,master 66m v1.26.3
node002 Ready,SchedulingDisabled worker 66m v1.26.3 <<< updated
node003 Ready worker 66m v1.24.9
node004 Ready worker 66m v1.24.9
Notice how node002 now has its version set to v1.26.3. Now we can re-enable scheduling for the node.
[root@rb-kube92-a ~]# kubectl uncordon node002
node/node002 uncordoned
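For a more targeted check, the kubelet version reported by node002 can also be queried directly (standard kubectl; in our example this should print v1.26.3):
[root@rb-kube92-a ~]# kubectl get node node002 -o jsonpath='{.status.nodeInfo.kubeletVersion}{"\n"}'
v1.26.3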
8. Image update the rest of the workers
This can be done similarly to step 7, one-by-one or in batches. In the case of this KB article we'll do the remaining compute nodes node00[3-4] in one go. First the draining:
root@rb-kube92-a:~# kubectl drain node003 --ignore-daemonsets --delete-emptydir-data
node/node003 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-lt9pq
evicting pod cmkpm-system/cmkpm-controller-manager-8c4b69895-m9m5g
evicting pod kubernetes-dashboard/kubernetes-dashboard-bff8f9bcf-qhkpv
pod/cmkpm-controller-manager-8c4b69895-m9m5g evicted
pod/kubernetes-dashboard-bff8f9bcf-qhkpv evicted
node/node003 drained
root@rb-kube92-a:~# kubectl drain node004 --ignore-daemonsets --delete-emptydir-data
node/node004 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-28pvv
evicting pod kube-system/calico-kube-controllers-7fc4577899-r4rht
evicting pod kube-system/kube-state-metrics-6d44cbdb56-x5488
evicting pod kube-system/metrics-server-77c677b45f-pblwt
pod/kube-state-metrics-6d44cbdb56-x5488 evicted
pod/calico-kube-controllers-7fc4577899-r4rht evicted
pod/metrics-server-77c677b45f-pblwt evicted
node/node004 drained
We issue an imageupdate again, but this time for the whole category in cmsh: device; imageupdate -c default -w
We restart the services:
pdsh -w node00[3-4] 'systemctl daemon-reload; systemctl restart cmd; systemctl restart kubelet.service; systemctl restart kube-proxy.service'
We confirm the version has updated.
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
rb-kube92-a Ready control-plane,master 76m v1.26.3
rb-kube92-b Ready control-plane,master 75m v1.26.3
node001 Ready control-plane,master 76m v1.26.3
node002 Ready worker 76m v1.26.3
node003 Ready,SchedulingDisabled worker 76m v1.26.3
node004 Ready,SchedulingDisabled worker 76m v1.26.3
Then we undrain:
root@rb-kube92-a:~# kubectl uncordon node003
node/node003 uncordoned
root@rb-kube92-a:~# kubectl uncordon node004
node/node004 uncordoned
Then we confirm that scheduling is re-enabled:
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
rb-kube92-a Ready control-plane,master 76m v1.26.3
rb-kube92-b Ready control-plane,master 75m v1.26.3
node001 Ready control-plane,master 76m v1.26.3
node002 Ready worker 76m v1.26.3
node003 Ready worker 76m v1.26.3
node004 Ready worker 76m v1.26.3
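At this point it can also be useful to verify that the evicted workloads have been rescheduled and no Pods are left in a bad state. A quick way (standard kubectl; note that completed Job Pods in the Succeeded phase will also show up) is:
[root@rb-kube92-a ~]# kubectl get pods -A --field-selector status.phase!=Running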
9. Updating Addons
Issuing the following command updates the addons. The output of the command has been omitted to avoid cluttering this KB article, but backups of the original YAML are made to the following directory: /cm/local/apps/kubernetes/var/. This information is also printed as part of the output.
cm-kubernetes-setup -v --update-addons
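The backed-up YAML files can be inspected afterwards; the exact filenames will differ per run:
[root@rb-kube92-a ~]# ls -lt /cm/local/apps/kubernetes/var/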
The update script will have backed up the old configuration inside NVIDIA Base Command Manager as well:
[rb-kube92-a]% kubernetes
[rb-kube92-a->kubernetes[default]]% appgroups
[rb-kube92-a->kubernetes[default]->appgroups]% list
Name (key) Applications
-------------------------------- ------------------------------
system <13 in submode>
system-backup-2022-11-23-193952 <13 in submode>
10. Finalize the update.
Module files: In case the Head Nodes were not part of the Kubernetes cluster we just upgraded, chances are the old module file is still present. In that case a restart of the "cmd" process will force Base Command Manager to generate the new module file. This can be done as follows on all Head Nodes:
[root@rb-kube92-a ~]# systemctl restart cmd
Kubernetes should be ready at this point; we can get rid of the old module file and make one final change to the configuration overlays.
[root@rb-kube92-a ~]# pdsh -A rm -rf /cm/local/modulefiles/kubernetes/default/1.24.9
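After this, loading the Kubernetes module in a new shell should pick up the 1.26 module file, giving a kubectl client that matches the updated cluster (the version string below is from our example and may differ in your environment):
[root@rb-kube92-a ~]# module load kubernetes/default/1.26.3   # version string may differ
[root@rb-kube92-a ~]# kubectl version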
Reboot nodes if needed: We do occasionally see certain Pods end up in a CrashLoopBackOff state after the upgrade. Typically, if this happens, it affects only one or two specific nodes and appears to happen randomly. After inspecting the container logs for the failing Pods, the problem is usually that they cannot communicate with the Kubernetes API server (via the Kubernetes Service network). This is almost always resolved by rebooting the node. We believe it can be a race condition between Calico Pods and/or NVIDIA Network Operator Pods restarting, resulting in existing Pods that no longer have functioning networking. Restarting the Pods on such nodes might fix the problem, but not always; we found that a reboot of the node seems to do the trick. This might be an issue specific to Kubernetes 1.26 and Calico version 3.20.6.
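To locate such Pods and the node(s) they run on, and to reboot an affected node through cmsh, something along the following lines can be used (node003 is only an example here):
[root@rb-kube92-a ~]# kubectl get pods -A -o wide | grep -i crashloop
[root@rb-kube92-a ~]# cmsh -c "device; reboot node003"   # node003 is an example; reboot the affected node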
11. Roll back the update.
In order to go back to the previous version 1.24, we have to follow the reverse of steps 1-10.
Downgrade the addons
This is only needed if Step 9 was executed.
[root@rb-kube92-a ~]# cmsh
[rb-kube92-a]% kubernetes
[rb-kube92-a->kubernetes[default]]% appgroups
[rb-kube92-a->kubernetes[default]->appgroups]% list
Name (key) Applications
-------------------------------- ------------------------------
system <13 in submode>
system-backup-2022-11-24-112008 <13 in submode>
[rb-kube92-a->kubernetes[default]->appgroups]% set system enabled no
[rb-kube92-a->kubernetes*[default*]->appgroups*]% set system-backup-2022-11-24-112008 enabled yes
[rb-kube92-a->kubernetes*[default*]->appgroups*]% commit
This should keep Kubernetes busy for a minute. After it is done restoring all the resources, proceed with the following steps:
Downgrading the packages
We need to replace the newly installed cm-kubernetes126 package with cm-kubernetes124 everywhere.
This means that the following command needs to be executed on both Head Nodes and in the relevant software images.
apt install -y cm-kubernetes126- cm-kubernetes124 # for ubuntu
yum swap -y cm-kubernetes126 cm-kubernetes124 # for RHEL
Image update relevant nodes
We need to image update the relevant nodes next, in order for all Kubernetes nodes to have the Kubernetes 1.24 binaries again. (e.g. imageupdate -c default -w in cmsh)
Restart services
Depending on the state of the cluster, you might, as we did during the upgrade, drain nodes before restarting services and undrain them again once the version has been confirmed to be downgraded. In case Kubernetes is malfunctioning, this can also be skipped; we have successfully tested this procedure without the draining as well.
On all the nodes relevant to the Kubernetes cluster, we need to execute the following reload and restarts; in our example setup this is done as follows. Please note that it includes a restart of Bright Cluster Manager (cmd).
[root@rb-kube92-a ~]# pdsh -w rb-kube92-a,rb-kube92-b,node00[1-4] "systemctl daemon-reload; systemctl restart cmd; systemctl restart '*kube*.service'"
We can clean up the module file for version 1.26 to prevent it from popping up in tab-completion.
[root@rb-kube92-a ~]# pdsh -A rm -rf /cm/local/modulefiles/kubernetes/default/1.26.3
All versions should be back to 1.24.9:
[root@rb-kube92-a ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
rb-kube92-a Ready control-plane,master 22h v1.24.9
rb-kube92-b Ready control-plane,master 22h v1.24.9
node001 Ready control-plane,master 22h v1.24.9
node002 Ready worker 22h v1.24.9
node003 Ready worker 22h v1.24.9
node004 Ready worker 22h v1.24.9
It is very unlikely to be necessary for this downgrade from 1.26 back to 1.24; however, should something end up in an invalid, unrecoverable state, we can restore the Etcd database at this point with the snapshot created in Step 1. The instructions for this are explained in the same KB article referenced in Step 1.