How to Patch the KubernetesCertsExpiration health check in BCM <= 10.25.03
This is a regression present in BCM 10.25.03 (and possibly 10.24.11).
The KubernetesCertsExpiration health check script runs on Kubernetes control-plane nodes to keep track of expiring certificates. On affected versions the check fails with a JSONDecodeError:
root@bcmhead1:~# cmsh
[bcmhead1]% device use node001
[bcmhead1->device[node001]]% latesthealthdata
Measurable                  Parameter                  Type           Value      Age        State
--------------------------- -------------------------- -------------- ---------- ---------- --------
Etcd                                                   Etcd           PASS       2m
KubernetesCertsExpiration                              Kubernetes     FAIL       2m         JSONDecodeError: Expecting value: line 1 column 1 (char 0)
KubernetesComponentsStatus                             Kubernetes     PASS       2m
KubernetesNodesStatus                                  Kubernetes     PASS       2m
KubernetesPodsStatus                                   Kubernetes     PASS       2m
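The error text in the output above is the message Python's json module produces when it is asked to parse an empty string or other non-JSON input. The following minimal sketch reproduces that message. What the health check actually tries to parse is not stated in this article, so the empty string used here is only an assumption for illustration, and describe_json_failure is a hypothetical helper name.

```python
import json


def describe_json_failure(raw: str) -> str:
    """Return the error raised when parsing `raw` as JSON, or 'ok' if it parses."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"{type(exc).__name__}: {exc}"
    return "ok"


# Assumption: the parsed input was empty (for example, a command that printed nothing).
print(describe_json_failure(""))
# Well-formed JSON parses without error.
print(describe_json_failure('{"certificates": []}'))
```

An error at line 1, column 1 means the parser failed on the very first character, which usually points at empty or non-JSON input rather than a subtle formatting problem.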
This bug was fixed in a later version, and can be patched on the cluster as follows.
1. Confirming BCM version
We can first confirm which version of BCM is installed with:
root@bcmhead1:~# cm-package-release-info -f cmdaemon
To check whether the Kubernetes master nodes are also running the same BCM version, we can query cmsh as follows.
root@bcmhead1:~# cmsh
[bcmhead1]% device
[bcmhead1->device]% cmdaemonversions
If the BCM version is newer than 10.25.03 everywhere, the patch script does not apply and the remaining steps can be skipped.
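The version rule above can be sketched in Python. This is only an illustration: it assumes versions are available as dotted numeric strings such as 10.25.03, which may not match the exact output format of cm-package-release-info or cmdaemonversions, and the helper names parse_version and patch_applicable are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '10.25.03' into a comparable tuple (10, 25, 3)."""
    return tuple(int(part) for part in text.strip().split("."))


# Last version named as affected in this article.
LAST_AFFECTED = parse_version("10.25.03")


def patch_applicable(versions: list) -> bool:
    """True if at least one node runs a version at or below the last affected one."""
    return any(parse_version(v) <= LAST_AFFECTED for v in versions)


# Hypothetical mix: one node still on 10.25.03, one already upgraded.
print(patch_applicable(["10.25.03", "10.25.05"]))
# Every node newer than 10.25.03: nothing to patch.
print(patch_applicable(["10.25.05"]))
```

Comparing integer tuples rather than raw strings avoids mistakes such as treating "10.9" as newer than "10.25".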
2. Downloading the patch script
Next, we download the patch script.
root@bcmhead1:~# wget https://support2.brightcomputing.com/kb/cm-kube-healthchecks-manage
root@bcmhead1:~# chmod +x cm-kube-healthchecks-manage
When we use this script, we need to specify which Kubernetes cluster to patch, since BCM supports managing multiple Kubernetes clusters. The list of cluster labels can be found in the kubernetes submode inside cmsh.
root@bcmhead1:~# cmsh -c 'kubernetes; list'
Name (key)
------------------
default
In our case, the cluster label used throughout this KB article is default.
3. Check status with patch script
We run the script with the status subcommand; example output is included below. Nothing is modified by this command.
root@bcmhead1:~# ./cm-kube-healthchecks-manage --kube-cluster=default status
2025-05-20 09:16:03,595 - cm-healthchecks-manage - INFO - ##### CLI invoked: ['./cm-kube-healthchecks-manage', '--kube-cluster=default', 'status'] #####
2025-05-20 09:16:04,242 - cm-healthchecks-manage - INFO - Checking health checks status for cluster default
2025-05-20 09:16:04,243 - cm-healthchecks-manage - INFO - Overlay kube-default-worker has role kubelet for kube cluster default
2025-05-20 09:16:04,244 - cm-healthchecks-manage - INFO - Overlay kube-default-master has role kubelet for kube cluster default
2025-05-20 09:16:04,248 - cm-healthchecks-manage - INFO - Master nodes: ci-tmp-100-u2204-stout-refuge-200445, node001
2025-05-20 09:16:04,248 - cm-healthchecks-manage - DEBUG - Could not find software image for node: ci-tmp-100-u2204-stout-refuge-200445
2025-05-20 09:16:04,248 - cm-healthchecks-manage - INFO - Found 1 unique software images
/cm/local/apps/python3/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ci-tmp-100-u2204-stout-refuge-200445: b'500e7ddcc3f70ecaa4ba68d7e6827481'
  warnings.warn(
2025-05-20 09:16:04,660 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration
2025-05-20 09:16:05,019 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py
/cm/local/apps/python3/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for node001: b'c6659c4299b88cf0f60a2ea22b507f2c'
  warnings.warn(
2025-05-20 09:16:05,316 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration
2025-05-20 09:16:05,723 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py
2025-05-20 09:16:05,829 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/images/default-image/cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration
2025-05-20 09:16:05,843 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/images/default-image/cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py
Health Check Scripts Status:
==========================
+--------------------------------------+---------------------------+----------------------------------------------------------------+-------------------------------------------------+
| Node                                 | Script                    | Node MD5                                                       | Software Image MD5                              |
+======================================+===========================+================================================================+=================================================+
| ci-tmp-100-u2204-stout-refuge-200445 | kubernetescertsexpiration | 2ad91afe408396fbe41443c84bae3e65 (needs patch!)                | -                                               |
+--------------------------------------+---------------------------+----------------------------------------------------------------+-------------------------------------------------+
|                                      | kubernetescore.py         | ac430d1886f2077846983bdb280a11fb (needs patch!)                | -                                               |
+--------------------------------------+---------------------------+----------------------------------------------------------------+-------------------------------------------------+
| node001                              | kubernetescertsexpiration | 286643f5a22fa9629332d1019ccf799d (unknown (use patch --force)) | 2ad91afe408396fbe41443c84bae3e65 (needs patch!) |
+--------------------------------------+---------------------------+----------------------------------------------------------------+-------------------------------------------------+
|                                      | kubernetescore.py         | ac430d1886f2077846983bdb280a11fb (needs patch!)                | ac430d1886f2077846983bdb280a11fb (needs patch!) |
+--------------------------------------+---------------------------+----------------------------------------------------------------+-------------------------------------------------+
In the slightly contrived example above, the script recognizes most files by their MD5 hash as needing the patch. Only one file, on node001, has been manually modified, so its MD5 hash is unknown. The script will not patch such files unless --force is specified in the next step.
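The status logic described above can be sketched as an MD5 lookup. This is not the patch script's actual implementation: the hash sets below contain only the values that appear in this article's example output, the real script may know more, and the helper names are hypothetical. Whether --force also rewrites files that are already patched is not stated here, so the sketch assumes it does not.

```python
import hashlib

# MD5 values reported as "needs patch!" in the example status output.
NEEDS_PATCH = {
    "2ad91afe408396fbe41443c84bae3e65",  # kubernetescertsexpiration
    "ac430d1886f2077846983bdb280a11fb",  # kubernetescore.py
}
# MD5 values reported as "ok (already patched)" after patching (step 5).
PATCHED = {
    "96b77ffcf2ca08d6a07fd38c79d4e7ef",  # kubernetescertsexpiration
    "94490aa4b3e76d3542f12b1253a7f410",  # kubernetescore.py
}


def md5_of(data: bytes) -> str:
    """Hex MD5 digest of file contents, comparable to md5sum output."""
    return hashlib.md5(data).hexdigest()


def classify(md5: str) -> str:
    """Map a file hash to one of the status labels shown in the table."""
    if md5 in PATCHED:
        return "ok (already patched)"
    if md5 in NEEDS_PATCH:
        return "needs patch!"
    return "unknown (use patch --force)"


def will_patch(md5: str, force: bool = False) -> bool:
    """Known-bad files are patched; unknown files only with force (assumed)."""
    status = classify(md5)
    if status == "needs patch!":
        return True
    if status == "ok (already patched)":
        return False  # assumption: nothing to do for an already patched file
    return force


modified = "286643f5a22fa9629332d1019ccf799d"  # the manually edited file on node001
print(classify(modified), will_patch(modified), will_patch(modified, force=True))
```

Refusing unknown hashes by default protects local modifications from being overwritten silently, which is why the manually edited file on node001 needs an explicit --force.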
4. Executing the patch
Instead of status we use patch, or patch --force. The next command patches the two files on the master nodes of the specified Kubernetes cluster, as well as in the software images used by those nodes. The patch does not trigger any restarts; it only fixes the health check script that BCM invokes periodically (every 2 minutes by default).
root@bcmhead1:~# ./cm-kube-healthchecks-manage --kube-cluster=default patch --force
2025-05-20 09:16:31,298 - cm-healthchecks-manage - INFO - ##### CLI invoked: ['./cm-kube-healthchecks-manage', '--kube-cluster=default', 'patch', '--force'] #####
2025-05-20 09:16:31,984 - cm-healthchecks-manage - INFO - Patching health check scripts for cluster default
2025-05-20 09:16:31,985 - cm-healthchecks-manage - INFO - Overlay kube-default-worker has role kubelet for kube cluster default
2025-05-20 09:16:31,985 - cm-healthchecks-manage - INFO - Overlay kube-default-master has role kubelet for kube cluster default
2025-05-20 09:16:31,988 - cm-healthchecks-manage - INFO - Master nodes: node001, ci-tmp-100-u2204-stout-refuge-200445
2025-05-20 09:16:31,988 - cm-healthchecks-manage - DEBUG - Could not find software image for node: ci-tmp-100-u2204-stout-refuge-200445
2025-05-20 09:16:31,988 - cm-healthchecks-manage - INFO - Found 1 unique software images
/cm/local/apps/python3/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for node001: b'c6659c4299b88cf0f60a2ea22b507f2c'
  warnings.warn(
2025-05-20 09:16:32,222 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration
2025-05-20 09:16:32,676 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py
2025-05-20 09:16:32,782 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/images/default-image/cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration
2025-05-20 09:16:32,794 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/images/default-image/cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py
/cm/local/apps/python3/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ci-tmp-100-u2204-stout-refuge-200445: b'500e7ddcc3f70ecaa4ba68d7e6827481'
  warnings.warn(
2025-05-20 09:16:32,906 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration
2025-05-20 09:16:33,262 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py
{'ci-tmp-100-u2204-stout-refuge-200445': {'image': {'cert_script': {'md5': None, 'path': None}, 'core_script': {'md5': None, 'path': None}}, 'node': {'cert_script': {'md5': '2ad91afe408396fbe41443c84bae3e65', 'path': '/cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration'}, 'core_script': {'md5': 'ac430d1886f2077846983bdb280a11fb', 'path': '/cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py'}}}, 'node001': {'image': {'cert_script': {'md5': '2ad91afe408396fbe41443c84bae3e65', 'path': '/cm/images/default-image/cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration'}, 'core_script': {'md5': 'ac430d1886f2077846983bdb280a11fb', 'path': '/cm/images/default-image/cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py'}}, 'node': {'cert_script': {'md5': '286643f5a22fa9629332d1019ccf799d', 'path': '/cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration'}, 'core_script': {'md5': 'ac430d1886f2077846983bdb280a11fb', 'path': '/cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py'}}}}
2025-05-20 09:16:33,471 - cm-healthchecks-manage - INFO - Processing node: node001
2025-05-20 09:16:33,560 - cm-healthchecks-manage - INFO - Updating /cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration on node node001 (forced = True)
2025-05-20 09:16:33,859 - cm-healthchecks-manage - INFO - Updating /cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py on node node001 (forced = True)
2025-05-20 09:16:33,972 - cm-healthchecks-manage - INFO - Updating /cm/images/default-image/cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration (software image of node001, ..., force=True)
2025-05-20 09:16:33,973 - cm-healthchecks-manage - INFO - Updating /cm/images/default-image/cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py (software image of node001, ..., force=True)
2025-05-20 09:16:33,974 - cm-healthchecks-manage - INFO - Processing node: ci-tmp-100-u2204-stout-refuge-200445
2025-05-20 09:16:34,067 - cm-healthchecks-manage - INFO - Updating /cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration on node ci-tmp-100-u2204-stout-refuge-200445 (forced = True)
2025-05-20 09:16:34,594 - cm-healthchecks-manage - INFO - Updating /cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py on node ci-tmp-100-u2204-stout-refuge-200445 (forced = True)
2025-05-20 09:16:34,803 - cm-healthchecks-manage - INFO - Successfully updated the following files:
2025-05-20 09:16:34,803 - cm-healthchecks-manage - INFO - - Node node001: /cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration
2025-05-20 09:16:34,803 - cm-healthchecks-manage - INFO - - Node node001: /cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py
2025-05-20 09:16:34,803 - cm-healthchecks-manage - INFO - - Image /cm/images/default-image/cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration
2025-05-20 09:16:34,803 - cm-healthchecks-manage - INFO - - Image /cm/images/default-image/cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py
2025-05-20 09:16:34,803 - cm-healthchecks-manage - INFO - - Node ci-tmp-100-u2204-stout-refuge-200445: /cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration
2025-05-20 09:16:34,803 - cm-healthchecks-manage - INFO - - Node ci-tmp-100-u2204-stout-refuge-200445: /cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py
5. Verify that the patch succeeded
We can now re-run the status command from before to see whether the patch has done its job.
root@bcmhead1:~# ./cm-kube-healthchecks-manage --kube-cluster=default status
2025-05-20 09:16:40,253 - cm-healthchecks-manage - INFO - ##### CLI invoked: ['./cm-kube-healthchecks-manage', '--kube-cluster=default', 'status'] #####
2025-05-20 09:16:40,935 - cm-healthchecks-manage - INFO - Checking health checks status for cluster default
2025-05-20 09:16:40,936 - cm-healthchecks-manage - INFO - Overlay kube-default-worker has role kubelet for kube cluster default
2025-05-20 09:16:40,937 - cm-healthchecks-manage - INFO - Overlay kube-default-master has role kubelet for kube cluster default
2025-05-20 09:16:40,940 - cm-healthchecks-manage - INFO - Master nodes: ci-tmp-100-u2204-stout-refuge-200445, node001
2025-05-20 09:16:40,941 - cm-healthchecks-manage - DEBUG - Could not find software image for node: ci-tmp-100-u2204-stout-refuge-200445
2025-05-20 09:16:40,941 - cm-healthchecks-manage - INFO - Found 1 unique software images
/cm/local/apps/python3/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ci-tmp-100-u2204-stout-refuge-200445: b'500e7ddcc3f70ecaa4ba68d7e6827481'
  warnings.warn(
2025-05-20 09:16:41,135 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration
2025-05-20 09:16:41,476 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py
/cm/local/apps/python3/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for node001: b'c6659c4299b88cf0f60a2ea22b507f2c'
  warnings.warn(
2025-05-20 09:16:41,772 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration
2025-05-20 09:16:42,036 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py
2025-05-20 09:16:42,141 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/images/default-image/cm/local/apps/cmd/scripts/healthchecks/kubernetescertsexpiration
2025-05-20 09:16:42,155 - cm-healthchecks-manage - DEBUG - Executing: md5sum /cm/images/default-image/cm/local/apps/cmd/scripts/healthchecks/kubernetescore.py
Health Check Scripts Status:
==========================
+--------------------------------------+---------------------------+---------------------------------------------------------+---------------------------------------------------------+
| Node                                 | Script                    | Node MD5                                                | Software Image MD5                                      |
+======================================+===========================+=========================================================+=========================================================+
| ci-tmp-100-u2204-stout-refuge-200445 | kubernetescertsexpiration | 96b77ffcf2ca08d6a07fd38c79d4e7ef (ok (already patched)) | -                                                       |
+--------------------------------------+---------------------------+---------------------------------------------------------+---------------------------------------------------------+
|                                      | kubernetescore.py         | 94490aa4b3e76d3542f12b1253a7f410 (ok (already patched)) | -                                                       |
+--------------------------------------+---------------------------+---------------------------------------------------------+---------------------------------------------------------+
| node001                              | kubernetescertsexpiration | 96b77ffcf2ca08d6a07fd38c79d4e7ef (ok (already patched)) | 96b77ffcf2ca08d6a07fd38c79d4e7ef (ok (already patched)) |
+--------------------------------------+---------------------------+---------------------------------------------------------+---------------------------------------------------------+
|                                      | kubernetescore.py         | 94490aa4b3e76d3542f12b1253a7f410 (ok (already patched)) | 94490aa4b3e76d3542f12b1253a7f410 (ok (already patched)) |
+--------------------------------------+---------------------------+---------------------------------------------------------+---------------------------------------------------------+
The output shows that in our case the two control plane nodes have the correctly patched files, and the script is also patched in the software image.
6. Verify that the health check is now working again
We can use cmsh for this again. To force a recheck before looking at latesthealthdata, we can run samplenow, or more specifically samplenow --checks.
root@bcmhead1:~# cmsh
[bcmhead1]% device use node001
[bcmhead1->device[node001]]% latesthealthdata
Measurable                  Parameter                  Type           Value      Age        State
--------------------------- -------------------------- -------------- ---------- ---------- --------
Etcd                                                   Etcd           PASS       2m
KubernetesCertsExpiration                              Kubernetes     PASS       0.759s
KubernetesComponentsStatus                             Kubernetes     PASS       0.758s
KubernetesNodesStatus                                  Kubernetes     PASS       2m
KubernetesPodsStatus                                   Kubernetes     PASS       2m
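When several control-plane nodes need checking, the latesthealthdata output can also be scanned by a small script instead of by eye. The sketch below is a hypothetical helper, not part of BCM: it assumes the whitespace-separated column layout shown in this article and works on captured text, so how the text is obtained (for example via cmsh -c) is left to the reader.

```python
def failing_checks(output: str) -> list:
    """Return the names of measurables whose Value column reads FAIL."""
    failed = []
    for line in output.splitlines():
        fields = line.split()
        # Value is the 3rd field when Parameter is empty, the 4th when it is set.
        if "FAIL" in fields[2:4]:
            failed.append(fields[0])
    return failed


# Sample rows copied from the failing output at the top of this article.
SAMPLE = """\
Measurable                  Parameter  Type        Value  Age  State
--------------------------- ---------- ----------- ------ ---- --------
Etcd                                   Etcd        PASS   2m
KubernetesCertsExpiration              Kubernetes  FAIL   2m   JSONDecodeError: Expecting value: line 1 column 1 (char 0)
KubernetesNodesStatus                  Kubernetes  PASS   2m
"""

print(failing_checks(SAMPLE))
```

After a successful patch and a forced sample, the same function should return an empty list for the node.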