In BCM 9.2 (and also earlier versions) we depended on the Pod Security Policies (PSP) feature of Kubernetes in order to restrict user privileges. PSP provided a way to block host mounts of arbitrary directories, running privileged containers, and so on. However, PSP was deprecated in Kubernetes 1.21, and in version 1.25 support was dropped completely. With Kubernetes 1.27 already being EOL at the time of writing this KB article (September 2024), we need to migrate away from PSP before we can upgrade the Kubernetes version.
In BCM 9.2 (and newer versions) we migrated from PSP to Kyverno in order to achieve similar security restrictions. This KB article describes how to first migrate from PSP to Kyverno in an existing Kubernetes cluster running on BCM 9.2; once this is successful, upgrading to a newer Kubernetes version becomes possible.
Prerequisites
- This KB article assumes a cluster that has been set up with a Kubernetes version <= 1.24.
- It also assumes the PSP feature was enabled at some point (using cm-kubernetes-setup --psp).
- The BCM cluster is running BCM 9.2-16 or later.
- Backup Etcd (see https://kb.brightcomputing.com/knowledge-base/etcd-backup-and-restore-with-bright-9-0/); a minimal sketch is shown after this list.
- Optionally: schedule a window during which the Kubernetes cluster is inaccessible to users, since during the transition of disabling PSP and installing Kyverno there is a window where policies are not being enforced.
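For reference, a minimal etcd snapshot sketch is shown below; the endpoint and certificate paths are placeholders, and the linked KB article describes the exact procedure for a BCM cluster.
# Minimal etcdctl snapshot sketch; the endpoint and certificate paths below are
# placeholders -- follow the linked KB article for the exact BCM procedure.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/path/to/etcd-ca.crt \
  --cert=/path/to/etcd-client.crt \
  --key=/path/to/etcd-client.key \
  snapshot save /root/etcd-backup-$(date +%F).db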
1. Update to latest Permissions Manager package
Please make sure the cluster is updated and running BCM version 9.2-16 (or later). The following packages in particular are important:
cmdaemon
cm-setup
cm-kubernetes-permissions-manager
After these packages are up to date, the Permissions Manager Helm chart needs to be updated as well. This can be done as follows (example for Ubuntu):
apt update && apt install cm-kubernetes-permissions-manager
module load kubernetes
helm upgrade -n cm permissions-manager /cm/shared/apps/kubernetes-permissions-manager/current/helm/cm-kubernetes-permissions-manager-*.tgz
The latest version of the Permissions Manager supports both Pod Security Policies and Kyverno, which is important in order to do the migration properly.
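To verify that the chart was upgraded, the release in the cm namespace can be listed (the chart version shown depends on the packaged release):
helm list -n cm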
2. Verify whether PSP is enabled
We have to check how the Kubernetes API server has been configured by going into cmsh, opening the correct configuration overlay, and listing the admission control parameter for the apiserver role.
root@rb-kube92v124x:~# cmsh
[rb-kube92v124x]% configurationoverlay
[rb-kube92v124x->configurationoverlay]% use kube-default-master
[rb-kube92v124x->configurationoverlay[kube-default-master]]% roles
[rb-kube92v124x->configurationoverlay[kube-default-master]->roles]% use kubernetes::apiserver
[rb-kube92v124x->configurationoverlay[kube-default-master]->roles[Kubernetes::ApiServer]]% get admissioncontrol
NamespaceLifecycle LimitRanger ServiceAccount DefaultStorageClass DefaultTolerationSeconds MutatingAdmissionWebhook ValidatingAdmissionWebhook ResourceQuota PodSecurityPolicy
If PodSecurityPolicy
is not listed above, then it means Pod Security Policies are not being enforced. It is still possible that there is left-over PSP configuration. This can be found by using kubectl as follows.
root@rb-kube92v124x:~# kubectl get psp
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME                         PRIV    CAPS   SELINUX    RUNASUSER   FSGROUP     SUPGROUP    READONLYROOTFS   VOLUMES
cmsupport-restricted         false          RunAsAny   MustRunAs   MustRunAs   MustRunAs   false            secret,hostPath,configMap,emptyDir,persistentVolumeClaim
local-path-provisioner-psp   true           RunAsAny   RunAsAny    RunAsAny    RunAsAny    false            secret,hostPath,configMap,persistentVolumeClaim
privileged                   true    *      RunAsAny   RunAsAny    RunAsAny    RunAsAny    false            *
3. Validate PSP is working (optional)
As shown in the previous example, the user cmsupport has restricted privileges. Another way to list the users is to look either at the Helm charts for each user in the cm-permissions namespace, or at the CmKubernetesPermissionUser resources, as follows.
root@rb-kube92v124x:~# helm list -n cm-permissions
NAME                NAMESPACE        REVISION   UPDATED                                   STATUS     CHART                                 APP VERSION
cmsupport-c0xld1q   cm-permissions   1          2024-09-24 15:20:53.760805571 +0000 UTC   deployed   cm-kubernetes-permission-user-0.0.1   0.0.1
ray-l9x71kv         cm-permissions   1          2024-09-25 08:44:46.087990729 +0000 UTC   deployed   cm-kubernetes-permission-user-0.0.1   0.0.1

root@rb-kube92v124x:~# kubectl get cmkubernetespermissionuser -n cm-permissions
NAME                AGE
cmsupport-c0xld1q   17h
ray-l9x71kv         12m
Optionally, inspect the manifest for one of the users; it should not list anything Kyverno-related yet.
helm get manifest -n cm-permissions ray-l9x71kv
Optionally, verify that PSP is working by creating the following file as a regular user with Kubernetes access.
cat << EOF > /tmp/pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: host-mount-pod
spec:
  containers:
  - name: test-container
    image: busybox
    command: ["/bin/sh"]
    args: ["-c", "ls -al /test ; sleep infinity"]
    volumeMounts:
    - name: host-etc
      mountPath: /test
  volumes:
  - name: host-etc
    hostPath:
      path: /etc
      type: Directory
EOF
Then, as a regular user, create it in their restricted namespace, and expect an error as follows.
ray@rb-kube92v124x:~$ kubectl create -f /tmp/pod.yaml -n ray-restricted
Error from server (Forbidden): error when creating "/tmp/pod.yaml": pods "host-mount-pod" is forbidden: PodSecurityPolicy: unable to admit pod: [spec.volumes[0].hostPath.pathPrefix: Invalid value: "/etc": is not allowed to be used]
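As an optional counter-check, the same pod with its hostPath changed to a location inside the user's home directory is expected to be admitted, assuming the restricted policy confines hostPath mounts to the user's home directory (as its Kyverno equivalent shown later does):
# Hypothetical positive test: point the hostPath at the user's home directory,
# which the restricted policy is expected to allow.
sed 's|path: /etc|path: /home/ray|' /tmp/pod.yaml > /tmp/pod-home.yaml
kubectl create -f /tmp/pod-home.yaml -n ray-restricted
kubectl delete -f /tmp/pod-home.yaml -n ray-restricted   # clean up the test pod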
4. Backup all users
On the Active Headnode, execute the following to generate an export of all the Permission Manager data:
cm-kubernetes-setup --backup-permissions backup_users.yaml
If needed we can restore this at a later point as follows.
cm-kubernetes-setup --restore-permissions backup_users.yaml
5. Disable PSP on the cluster
Please note that this will result in the cluster not enforcing the restrictions that PSP provides from this point on.
cm-kubernetes-setup --disable-psp
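To confirm that PSP is no longer part of the admission chain, the check from Step 2 can be repeated; PodSecurityPolicy should no longer be listed. A one-liner sketch (the overlay name follows the example above; adjust it for your cluster):
cmsh -c "configurationoverlay; use kube-default-master; roles; use kubernetes::apiserver; get admissioncontrol"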
6. Install Kyverno on the cluster
First create a kyverno-values.yaml
file on the Active Head Node, containing the following contents.
admissionController:
  replicas: 1
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
backgroundController:
  replicas: 1
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
cleanupController:
  replicas: 1
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
cleanupJobs:
  admissionReports:
    tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/master
      operator: Exists
    - effect: NoSchedule
      key: node-role.kubernetes.io/control-plane
      operator: Exists
  clusterAdmissionReports:
    tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/master
      operator: Exists
    - effect: NoSchedule
      key: node-role.kubernetes.io/control-plane
      operator: Exists
policyReportsCleanup:
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
reportsController:
  replicas: 1
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
webhooksCleanup:
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
Please note that the leading spaces are important to preserve. Then install the Helm chart as follows.
module load kubernetes
helm repo add kyverno https://kyverno.github.io/kyverno/
helm install --wait --create-namespace --version 3.0.9 -f kyverno-values.yaml --timeout 10m0s --namespace kyverno kyverno kyverno/kyverno
The above version 3.0.9 was chosen since it is the last version of Kyverno that still supports Kubernetes 1.24. Kyverno can be upgraded after the Kubernetes version has been upgraded.
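After the installation completes, all Kyverno controllers should be in the Running state:
kubectl get pods -n kyverno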
7. Prepare Kyverno Policies values file
Next create a kyverno-policies-values.yaml
file on the Active Head Node, containing the following contents.
policyExclude:
  disallow-capabilities:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
  disallow-host-namespaces:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - prometheus
  disallow-host-path:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - '*-restricted'
        - prometheus
        - kube-system
        - cm
        - local-path-storage
  disallow-host-ports:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - prometheus
  disallow-privileged-containers:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - kube-system
        - cm
        - local-path-storage
validationFailureAction: Enforce
The above values file is fairly minimal and configures exceptions to several Kyverno policies. Each policy under policyExclude is set not to apply to certain resources in specific namespaces.
It may be necessary to add namespaces that are appropriate for your cluster to the above configuration. If, for example, the NVIDIA GPU Operator is also installed, the gpu-operator namespace typically has to be added to three exceptions (disallow-host-path, disallow-host-namespaces, and disallow-privileged-containers). Similar requirements can exist for other components, such as Prometheus, Run:ai, local storage, the NVIDIA Network Operator, Ceph, MetalLB, and other components perhaps specific to your organization.
First, we list and explain the policies and their exceptions as they appear in the above values file.
- disallow-capabilities: Restricts Linux capabilities for containers
  - Excludes: Pods in the default namespace
- disallow-host-namespaces: Prevents pods from using host namespaces
  - Excludes: Pods in the default and prometheus namespaces
- disallow-host-path: Blocks pods from mounting host paths
  - Excludes: Pods in the default, any namespace ending with -restricted, prometheus, kube-system, cm, and local-path-storage namespaces
- disallow-host-ports: Prevents pods from binding to host ports
  - Excludes: Pods in the default and prometheus namespaces
- disallow-privileged-containers: Blocks running privileged containers
  - Excludes: Pods in the default, kube-system, cm, and local-path-storage namespaces
Now that the values file makes some sense, we will show a more elaborate policies file that includes all the operators, addons, and other optional components that can be installed by the BCM Kubernetes wizard (some of these come from newer BCM versions, but still make a good example). This version of the file should make it easier, at least for those components, to decide whether they are relevant for your cluster and thus whether they should be kept in or removed from the final kyverno-policies-values.yaml file.
policyExclude:
  disallow-capabilities:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - ceph-csi-rbd      # CEPH
        - metallb-system    # MetalLB (Load Balancer)
        - ovn-kubernetes    # Kubernetes OVN CNI
  disallow-host-namespaces:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - prometheus        # Kube Prometheus Stack
        - gpu-operator      # NVIDIA GPU Operator
        - network-operator  # NVIDIA Network Operator
        - runai             # Run:AI SaaS
        - ceph-csi-rbd      # CEPH
        - metallb-system    # MetalLB (Load Balancer)
        - ovn-kubernetes    # Kubernetes OVN CNI
  disallow-host-path:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - '*-restricted'
        - prometheus        # Kube Prometheus Stack
        - kube-system
        - cm
        - local-path-storage
        - gpu-operator      # NVIDIA GPU Operator
        - network-operator  # NVIDIA Network Operator
        - runai             # Run:AI SaaS
        - ceph-csi-rbd      # CEPH
        - ovn-kubernetes    # Kubernetes OVN CNI
  disallow-host-ports:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - prometheus        # Kube Prometheus Stack
        - ceph-csi-rbd      # CEPH
        - metallb-system    # MetalLB (Load Balancer)
        - ovn-kubernetes    # Kubernetes OVN CNI
  disallow-privileged-containers:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - kube-system
        - cm
        - local-path-storage
        - gpu-operator      # NVIDIA GPU Operator
        - network-operator  # NVIDIA Network Operator
        - runai             # Run:AI SaaS
        - ceph-csi-rbd      # CEPH
        - metallb-system    # MetalLB (Load Balancer)
        - ovn-kubernetes    # Kubernetes OVN CNI
validationFailureAction: Enforce
It is possible that additional exceptions are discovered later; in that case the values file can be updated and the Helm chart upgraded with the new values file. This is explained in Step 9.
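Optionally, before installing in the next step, the chart can be rendered locally to review the ClusterPolicies and exclusions that will be created with these values (this assumes the kyverno Helm repository from Step 6 has already been added):
helm template --version 3.0.7 -f kyverno-policies-values.yaml kyverno-policies kyverno/kyverno-policies | less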
8. Install the Kyverno Policies Helm Chart
Given that we’ve completed the previous Step (7) to the best of our knowledge, we will now execute the helm install as follows.
helm install --wait --version 3.0.7 -f kyverno-policies-values.yaml --timeout 10m0s --namespace kyverno kyverno-policies kyverno/kyverno-policies
The above version 3.0.7 was chosen since it is the last version of the kyverno-policies chart that still supports Kubernetes 1.24. The chart can be upgraded after the Kubernetes version has been upgraded.
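Once the policies chart is installed, the resulting ClusterPolicies, and whether they are set to Enforce, can be listed with:
kubectl get clusterpolicy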
Validation
If both Helm charts installed successfully, we can validate again whether the policies are working. We can repeat the optional Step 3 at this point.
Inspecting the user Helm charts should now also show Kyverno resources, as below.
root@rb-kube92v124x:~# helm get manifest -n cm-permissions ray-l9x71kv | grep kyverno
# Source: cm-kubernetes-permission-user/templates/05-kyverno-drop-privs-hostpath.yaml
apiVersion: kyverno.io/v1
# Source: cm-kubernetes-permission-user/templates/05-kyverno-drop-privs-hostpath.yaml
apiVersion: kyverno.io/v1
# Source: cm-kubernetes-permission-user/templates/05-kyverno-drop-privs-hostpath.yaml
apiVersion: kyverno.io/v1
# Source: cm-kubernetes-permission-user/templates/05-kyverno-drop-privs-hostpath.yaml
apiVersion: kyverno.io/v1
# Source: cm-kubernetes-permission-user/templates/06-kyverno-hostpath-restriction.yaml
apiVersion: kyverno.io/v1
Enforcement should still happen and the end result should be the same, but the textual output presented by the webhook is slightly different for Kyverno.
ray@rb-kube92v124x:~$ kubectl create -f /tmp/pod.yaml -n ray-restricted
Error from server: error when creating "/tmp/pod.yaml": admission webhook "validate.kyverno.svc-fail" denied the request:

resource Pod/ray-restricted/host-mount-pod was blocked due to the following policies

ray-l9x71kv-lhv:
  ray-l9x71kv-lhv: 'validation failure: hostPath volumes are confined to [/home/ray].'
9. Make changes to the Kyverno Policies
Policies may block the creation of Pods or cause other failures. Sometimes these can be found through the Kubernetes API resources themselves, or through Kubernetes events. Usually that means a new exception has been discovered. For example, if in Step 7 we had not added the NVIDIA GPU Operator's namespace to the excludes, and we try to install the GPU Operator now that we have Kyverno, we get the following error message:
root@rb-kube92v124x:~# helm install --wait -n gpu-operator --create-namespace \
    --version v1.10.1 \
    --set driver.enabled=false \
    --set operator.defaultRuntime=containerd \
    --set toolkit.enabled=true \
    --set toolkit.env[0].name=CONTAINERD_CONFIG \
    --set toolkit.env[0].value=/cm/local/apps/containerd/var/etc/conf.d/nvidia-cri.toml \
    gpu-operator nvidia/gpu-operator
Error: INSTALLATION FAILED: admission webhook "validate.kyverno.svc-fail" denied the request:

resource DaemonSet/gpu-operator/gpu-operator-node-feature-discovery-worker was blocked due to the following policies

disallow-host-path:
  autogen-host-path: 'validation error: HostPath volumes are forbidden. The field spec.volumes[*].hostPath must be unset. rule autogen-host-path failed at path /spec/template/spec/volumes/0/hostPath/'
In some cases, errors may only show up after the Helm chart installed successfully. This is usually the case when some kind of feature discovery decides whether or not Pods are scheduled on certain nodes. The Helm chart may then succeed, but failures appear in Kubernetes events.
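A sketch of how such post-install failures can be surfaced, assuming the affected namespace is known (policy reports are only populated when the Kyverno reports controller is running, which it is with the values file above):
# Show recent events in the affected namespace, newest last
kubectl get events -n gpu-operator --sort-by='.lastTimestamp'
# List Kyverno policy reports across all namespaces
kubectl get policyreport -A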
Now that we have found out that we need additional exceptions for our NVIDIA GPU Operator example, we can extract the Kyverno policies values file from Helm, if we don't already have it, using:
helm get values -n kyverno kyverno-policies | sed '/USER-SUPPLIED VALUES:/d' > kyverno-policies-values.yaml
At this point we can open the kyverno-policies-values.yaml
file and add the gpu-operator
namespace to the appropriate exceptions. In this case the changes can be taken from Step 7 where we list the “full” values file including the NVIDIA GPU Operator.
Now we need to "upgrade" the chart (we will only be updating the values file configuration). However, before we execute the helm upgrade command, we need to know which chart version we are dealing with; in this example it is version 3.0.7.
root@rb-kube92v124x:~# helm list -n kyverno
NAME               NAMESPACE   REVISION   UPDATED                                    STATUS     CHART                    APP VERSION
kyverno            kyverno     1          2024-09-25 11:45:34.987572389 +0200 CEST   deployed   kyverno-3.0.9            v1.10.7
kyverno-policies   kyverno     1          2024-09-25 11:48:45.704720815 +0200 CEST   deployed   kyverno-policies-3.0.7   v1.10.6
The version of the kyverno-policies chart has to match the version in the helm upgrade command below, so change the value if needed; otherwise we would also upgrade the chart version at the same time, which may not be what we want to do at this point.
helm upgrade --wait --version 3.0.7 -f kyverno-policies-values.yaml --timeout 10m0s --namespace kyverno kyverno-policies kyverno/kyverno-policies
Now a redeploy of the NVIDIA GPU Operator should succeed, and the Helm chart is no longer blocked by Kyverno.
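For example, the operator's pods should now be created and no longer rejected by the admission webhook:
kubectl get pods -n gpu-operator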
10. Final notes
Now that Pod Security Policies are no longer being used and Kyverno is up and running, it is possible to continue with the Kubernetes upgrade procedure, if that was the motivation for this migration. After that, it is also possible to upgrade Kyverno to a newer version. The version used in this KB article is rather old, since it is the last version that still supports Kubernetes 1.24. Upgrading Kyverno can be done as follows.
helm get values -n kyverno kyverno | sed '/USER-SUPPLIED VALUES:/d' > kyverno-values.yaml
helm get values -n kyverno kyverno-policies | sed '/USER-SUPPLIED VALUES:/d' > kyverno-policies-values.yaml
helm upgrade --wait --version 3.2.6 -f kyverno-values.yaml --timeout 10m0s --namespace kyverno kyverno kyverno/kyverno
helm upgrade --wait --version 3.2.5 -f kyverno-policies-values.yaml --timeout 10m0s --namespace kyverno kyverno-policies kyverno/kyverno-policies
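Afterwards, the new chart versions and the health of the Kyverno pods can be verified:
helm list -n kyverno
kubectl get pods -n kyverno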
Upgrading from Kubernetes 1.24
to 1.26
is described in the following KB article: https://kb.brightcomputing.com/knowledge-base/upgrading-kubernetes-version-1-24-to-1-26-on-a-bright-9-2-cluster/.