
BCM 9.2 Kubernetes Permission Manager migration from Pod Security Policies to Kyverno Policies

In BCM 9.2 (and also earlier versions) we depended on the Pod Security Policies (PSP) feature of Kubernetes to restrict user privileges. PSP provided a way to block host system mounts from arbitrary directories, to prevent running privileged containers, and so on. However, PSP was deprecated in Kubernetes 1.21, and support was dropped completely in 1.25. With Kubernetes 1.27 already being EOL at the time of writing this KB article (September 2024), we need to migrate away from PSP before we can upgrade the Kubernetes version.

In BCM 9.2 (and newer versions) we migrated from PSP to Kyverno in order to achieve similar security improvements. This KB article describes how to migrate an existing Kubernetes cluster running on BCM 9.2 from PSP to Kyverno; once this is successful, upgrading to a newer Kubernetes version becomes possible.

Prerequisites
  • This KB article will assume a cluster that has been set up with a Kubernetes version <= 1.24 (a quick version check is sketched after this list).
  • This KB article also assumes that the PSP feature was enabled at some point (using cm-kubernetes-setup --psp).
  • The BCM cluster is running BCM 9.2-16 or later.
  • Backup Etcd (see https://kb.brightcomputing.com/knowledge-base/etcd-backup-and-restore-with-bright-9-0/)
  • Optionally: schedule a maintenance window for the Kubernetes cluster, since between disabling PSP and installing Kyverno there is a window during which policies are not enforced.
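
The exact commands differ per distribution, but on an Ubuntu-based head node a quick check of these prerequisites could look like the following sketch (the package names are the same ones used in Step 1).

# Sketch: check the Kubernetes server version (should be <= 1.24) and the installed BCM packages (Ubuntu example)
module load kubernetes
kubectl version
dpkg -l cmdaemon cm-setup cm-kubernetes-permissions-manager | grep '^ii'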
1. Update to the latest Permissions Manager package

Please make sure the cluster is updated and is running BCM version 9.2-16 or later. The following packages are especially important:

  • cmdaemon
  • cm-setup
  • cm-kubernetes-permissions-manager

After these packages are up to date, the Permissions Manager Helm chart needs to be updated as well. This can be done as follows (example for Ubuntu):

apt update && apt install cm-kubernetes-permissions-manager 
module load kubernetes
helm upgrade -n cm permissions-manager /cm/shared/apps/kubernetes-permissions-manager/current/helm/cm-kubernetes-permissions-manager-*.tgz

The latest version of the Permissions Manager supports both Pod Security Policies and Kyverno, which is important in order to perform the migration properly.
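
To confirm that the chart upgrade took effect, the deployed release can be listed; the chart version shown will depend on the installed package.

# Confirm the Permissions Manager chart release was upgraded (revision and chart version will vary per cluster)
helm list -n cm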

2. Verify whether PSP is enabled

We have to check whether the Kubernetes API server has been configured with the PodSecurityPolicy admission plugin. This can be done by going into cmsh, opening the correct configuration overlay, and listing the admission control parameter of the apiserver role.

root@rb-kube92v124x:~# cmsh
[rb-kube92v124x]% configurationoverlay 
[rb-kube92v124x->configurationoverlay]% use kube-default-master 
[rb-kube92v124x->configurationoverlay[kube-default-master]]% roles
[rb-kube92v124x->configurationoverlay[kube-default-master]->roles]% use kubernetes::apiserver
[rb-kube92v124x->configurationoverlay[kube-default-master]->roles[Kubernetes::ApiServer]]% get admissioncontrol 
NamespaceLifecycle
LimitRanger
ServiceAccount
DefaultStorageClass
DefaultTolerationSeconds
MutatingAdmissionWebhook
ValidatingAdmissionWebhook
ResourceQuota
PodSecurityPolicy

If PodSecurityPolicy is not listed above, then Pod Security Policies are not being enforced. It is still possible that there is left-over PSP configuration, which can be found using kubectl as follows.

root@rb-kube92v124x:~# kubectl get psp
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME                         PRIV    CAPS   SELINUX    RUNASUSER   FSGROUP     SUPGROUP    READONLYROOTFS   VOLUMES
cmsupport-restricted         false          RunAsAny   MustRunAs   MustRunAs   MustRunAs   false            secret,hostPath,configMap,emptyDir,persistentVolumeClaim
local-path-provisioner-psp   true           RunAsAny   RunAsAny    RunAsAny    RunAsAny    false            secret,hostPath,configMap,persistentVolumeClaim
privileged                   true    *      RunAsAny   RunAsAny    RunAsAny    RunAsAny    false            *
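
For scripting purposes, the admission control check from the cmsh session above can also be run non-interactively; this assumes the configuration overlay is named kube-default-master as in this example.

# Non-interactive equivalent of the cmsh session above (the overlay name may differ on your cluster)
cmsh -c "configurationoverlay; use kube-default-master; roles; use kubernetes::apiserver; get admissioncontrol"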
3. Validate PSP is working (optional)

As in the previous example, the user cmsupport has restricted privileges. Another way to list the users is to look either at the Helm charts for each user in the cm-permissions namespace, or at the CmKubernetesPermissionUser resources, as follows.

root@rb-kube92v124x:~# helm list -n cm-permissions
NAME              NAMESPACE      REVISION UPDATED                                 STATUS   CHART                               APP VERSION
cmsupport-c0xld1q cm-permissions 1        2024-09-24 15:20:53.760805571 +0000 UTC deployed cm-kubernetes-permission-user-0.0.1 0.0.1      
ray-l9x71kv       cm-permissions 1        2024-09-25 08:44:46.087990729 +0000 UTC deployed cm-kubernetes-permission-user-0.0.1 0.0.1      

root@rb-kube92v124x:~# kubectl get cmkubernetespermissionuser -n cm-permissions
NAME                AGE
cmsupport-c0xld1q   17h
ray-l9x71kv         12m

Optionally, inspect the manifest for one of the users; it should not list anything Kyverno-related yet.

helm get manifest -n cm-permissions ray-l9x71kv

Optionally, verify that PSP is working by creating the following file as a regular user with Kubernetes access.

cat << EOF > /tmp/pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: host-mount-pod
spec:
  containers:
  - name: test-container
    image: busybox
    command: ["/bin/sh"]
    args: ["-c", "ls -al /test ; sleep infinity"]
    volumeMounts:
    - name: host-etc
      mountPath: /test
  volumes:
  - name: host-etc
    hostPath:
      path: /etc
      type: Directory
EOF

Then, as a regular user, create it in the user's restricted namespace, and expect an error like the following.

ray@rb-kube92v124x:~$ kubectl create -f /tmp/pod.yaml -n ray-restricted
Error from server (Forbidden): error when creating "/tmp/pod.yaml": pods "host-mount-pod" is forbidden: PodSecurityPolicy: unable to admit pod: [spec.volumes[0].hostPath.pathPrefix: Invalid value: "/etc": is not allowed to be used]
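
Another quick sanity check: the PSP admission controller records which policy admitted each running pod in the kubernetes.io/psp annotation, so a rough overview of the policies currently in use can be obtained with something like the following.

# Count which PSPs admitted the currently running pods (the annotation is set by the PSP admission controller)
kubectl get pods -A -o yaml | grep 'kubernetes.io/psp' | sort | uniq -c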
4. Back up all users

On the Active Headnode, execute the following to generate an export of all the Permission Manager data:

cm-kubernetes-setup --backup-permissions backup_users.yaml

If needed, this backup can be restored at a later point as follows.

cm-kubernetes-setup --restore-permissions backup_users.yaml
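
Since the PSP objects themselves will disappear during the migration, it may also be worth keeping a dated copy of the export and of the current PSP definitions; a minimal sketch (file names are just examples):

# Keep dated copies of the permissions export and of the current PSP objects
cp backup_users.yaml backup_users-$(date +%F).yaml
kubectl get psp -o yaml > psp-objects-$(date +%F).yaml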

5. Disable PSP on the cluster

Please note that from this point on the cluster will no longer enforce the restrictions that PSP provided.

cm-kubernetes-setup --disable-psp
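
This can be verified by repeating the test from Step 3 as the same regular user: with the PSP admission plugin removed and Kyverno not yet installed, the same manifest is expected to be admitted (assuming the ray-restricted namespace from the earlier example), which illustrates why scheduling a maintenance window was recommended.

# With PSP disabled and Kyverno not yet installed, the hostPath pod from Step 3 is expected to be admitted
kubectl create -f /tmp/pod.yaml -n ray-restricted
kubectl delete -f /tmp/pod.yaml -n ray-restricted   # clean up the test pod again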

6. Install Kyverno on the cluster

First create a kyverno-values.yaml file on the Active Head Node, containing the following contents.

admissionController:
  replicas: 1
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
backgroundController:
  replicas: 1
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
cleanupController:
  replicas: 1
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
cleanupJobs:
  admissionReports:
    tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/master
      operator: Exists
    - effect: NoSchedule
      key: node-role.kubernetes.io/control-plane
      operator: Exists
  clusterAdmissionReports:
    tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/master
      operator: Exists
    - effect: NoSchedule
      key: node-role.kubernetes.io/control-plane
      operator: Exists
policyReportsCleanup:
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
reportsController:
  replicas: 1
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
webhooksCleanup:
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists

Please note that the leading spaces are important to preserve. Then install the Helm chart as follows.

module load kubernetes
helm repo add kyverno https://kyverno.github.io/kyverno/
helm install --wait --create-namespace --version 3.0.9 -f kyverno-values.yaml --timeout 10m0s --namespace kyverno kyverno kyverno/kyverno

The above version 3.0.9 was chosen since it is the last version of the Kyverno chart that supports Kubernetes 1.24. Kyverno can be upgraded after the Kubernetes version has been upgraded.
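
Before continuing, it is a good idea to verify that the Kyverno controllers came up and registered their webhooks; for example:

# All Kyverno controller pods should be Running before the policies chart is installed
kubectl get pods -n kyverno
kubectl get validatingwebhookconfigurations | grep kyverno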

7. Prepare Kyverno Policies values file

Next create a kyverno-policies-values.yaml file on the Active Head Node, containing the following contents.

policyExclude:
  disallow-capabilities:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
  disallow-host-namespaces:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - prometheus
  disallow-host-path:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - '*-restricted'
        - prometheus
        - kube-system
        - cm
        - local-path-storage
  disallow-host-ports:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - prometheus
  disallow-privileged-containers:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - kube-system
        - cm
        - local-path-storage
validationFailureAction: Enforce

The above values file is very minimal and configures exceptions to several Kyverno policies. Each policy under policyExclude is set not to apply to the listed resources in the specified namespaces.

It may be necessary to add namespaces that are appropriate for your cluster to the above configuration. If, for example, the NVIDIA GPU Operator is also installed, that typically means the gpu-operator namespace has to be added to three exceptions (disallow-host-path, disallow-host-namespaces, disallow-privileged-containers). Similar requirements can exist for other components, such as Prometheus, Run:AI, local path storage, the NVIDIA Network Operator, Ceph, MetalLB, or components specific to your organization.
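
A quick way to find candidate namespaces is to list the namespaces that exist on the cluster and, if jq is available on the head node, the namespaces that already run pods with hostPath volumes; this is only a rough sketch to guide the decision.

# Namespaces present on this cluster (candidates for exceptions)
kubectl get namespaces
# Namespaces that currently run pods with hostPath volumes (candidates for the disallow-host-path exception)
kubectl get pods -A -o json | jq -r '.items[] | select(.spec.volumes[]?.hostPath) | .metadata.namespace' | sort -u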

First, here are the policies and their exceptions as they appear in the values file above.

  • disallow-capabilities: Restricts Linux capabilities for containers
    • Excludes: Pods in the default namespace
  • disallow-host-namespaces: Prevents pods from using host namespaces
    • Excludes: Pods in the default and prometheus namespaces
  • disallow-host-path: Blocks pods from mounting host paths
    • Excludes: Pods in default, any namespace ending with -restricted, prometheus, kube-system, cm, and local-path-storage namespaces
  • disallow-host-ports: Prevents pods from binding to host ports
    • Excludes: Pods in the default and prometheus namespaces
  • disallow-privileged-containers: Blocks running privileged containers
    • Excludes: Pods in default, kube-system, cm, and local-path-storage namespaces

Now that the values file hopefully makes some sense, we will show a more elaborate policies file that includes all the operators, addons, and other optional components that can be installed by the BCM Kubernetes wizard (some of these may come from newer BCM versions, but they still make a good example). The version of the file listed below should make it easy, at least for those components, to decide whether they are relevant for your cluster, and thus whether they should be removed from or kept in the final kyverno-policies-values.yaml file.

policyExclude:
  disallow-capabilities:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - ceph-csi-rbd   # CEPH
        - metallb-system # MetalLB (Load Balancer)
        - ovn-kubernetes # Kubernetes OVN CNI
  disallow-host-namespaces:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - prometheus       # Kube Prometheus Stack
        - gpu-operator     # NVIDIA GPU Operator
        - network-operator # NVIDIA Network Operator
        - runai            # Run:AI SaaS
        - ceph-csi-rbd     # CEPH
        - metallb-system   # MetalLB (Load Balancer)
        - ovn-kubernetes   # Kubernetes OVN CNI
  disallow-host-path:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - '*-restricted'
        - prometheus       # Kube Prometheus Stack
        - kube-system
        - cm
        - local-path-storage
        - gpu-operator     # NVIDIA GPU Operator
        - network-operator # NVIDIA Network Operator
        - runai            # Run:AI SaaS
        - ceph-csi-rbd     # CEPH
        - ovn-kubernetes   # Kubernetes OVN CNI
  disallow-host-ports:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - prometheus     # Kube Prometheus Stack
        - ceph-csi-rbd   # CEPH
        - metallb-system # MetalLB (Load Balancer)
        - ovn-kubernetes # Kubernetes OVN CNI
  disallow-privileged-containers:
    any:
    - resources:
        kinds:
        - Pod
        namespaces:
        - default
        - kube-system
        - cm
        - local-path-storage
        - gpu-operator     # NVIDIA GPU Operator
        - network-operator # NVIDIA Network Operator
        - runai            # Run:AI SaaS
        - ceph-csi-rbd     # CEPH
        - metallb-system   # MetalLB (Load Balancer)
        - ovn-kubernetes   # Kubernetes OVN CNI
validationFailureAction: Enforce

It is possible that additional exceptions are discovered later; in that case the values file can be updated and the Helm chart upgraded with the new values file. This is explained in Step 9.

8. Install the Kyverno Policies Helm Chart

Given that the previous step (7) has been completed to the best of our knowledge, we can now execute the helm install as follows.

helm install --wait --version 3.0.7 -f kyverno-policies-values.yaml --timeout 10m0s --namespace kyverno kyverno-policies kyverno/kyverno-policies

The above version 3.0.7 was chosen since it is the last version of the kyverno-policies chart that supports Kubernetes 1.24. It can be upgraded after the Kubernetes version has been upgraded.

Validation

Now, if both Helm charts installed successfully, we can again validate whether the policies are working by repeating the optional Step 3 at this point.

Inspecting the user Helm charts should this time show Kyverno output as well, like below.

root@rb-kube92v124x:~# helm get manifest -n cm-permissions ray-l9x71kv | grep kyverno
# Source: cm-kubernetes-permission-user/templates/05-kyverno-drop-privs-hostpath.yaml
apiVersion: kyverno.io/v1
# Source: cm-kubernetes-permission-user/templates/05-kyverno-drop-privs-hostpath.yaml
apiVersion: kyverno.io/v1
# Source: cm-kubernetes-permission-user/templates/05-kyverno-drop-privs-hostpath.yaml
apiVersion: kyverno.io/v1
# Source: cm-kubernetes-permission-user/templates/05-kyverno-drop-privs-hostpath.yaml
apiVersion: kyverno.io/v1
# Source: cm-kubernetes-permission-user/templates/06-kyverno-hostpath-restriction.yaml
apiVersion: kyverno.io/v1

The enforcement should still happen and the end result should be the same, but the textual output presented by the admission webhook is slightly different for Kyverno.

ray@rb-kube92v124x:~$ kubectl create -f /tmp/pod.yaml -n ray-restricted
Error from server: error when creating "/tmp/pod.yaml": admission webhook "validate.kyverno.svc-fail" denied the request: 

resource Pod/ray-restricted/host-mount-pod was blocked due to the following policies 

ray-l9x71kv-lhv:
  ray-l9x71kv-lhv: 'validation failure: hostPath volumes are confined to [/home/ray].'
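
Kyverno also exposes its state as regular Kubernetes API resources, which can be convenient for a quick overview of what is being enforced; the resource names below are the ones shipped with Kyverno 1.10.

# Cluster-wide and namespaced Kyverno policies currently active
kubectl get clusterpolicies.kyverno.io
kubectl get policies.kyverno.io -A
# Policy reports generated by the reports controller (may take a while to populate)
kubectl get policyreports.wgpolicyk8s.io -A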
9. Make changes to the Kyverno Policies

Sometimes policies block the creation of Pods or cause other failures. These can be found through the Kubernetes API resources themselves, or through Kubernetes Events, and usually mean that a new exception has been discovered. For example, if in Step 7 we did not add the NVIDIA GPU Operator's namespace to the excludes, and we try to install the GPU Operator now that Kyverno is in place, we get the following error message:

root@rb-kube92v124x:~# helm install --wait -n gpu-operator --create-namespace \
             --version v1.10.1 \
             --set driver.enabled=false \
             --set operator.defaultRuntime=containerd \
             --set toolkit.enabled=true \
             --set toolkit.env[0].name=CONTAINERD_CONFIG \
             --set toolkit.env[0].value=/cm/local/apps/containerd/var/etc/conf.d/nvidia-cri.toml \
             gpu-operator nvidia/gpu-operator
Error: INSTALLATION FAILED: admission webhook "validate.kyverno.svc-fail" denied the request: 

resource DaemonSet/gpu-operator/gpu-operator-node-feature-discovery-worker was blocked due to the following policies 

disallow-host-path:
  autogen-host-path: 'validation error: HostPath volumes are forbidden. The field
    spec.volumes[*].hostPath must be unset. rule autogen-host-path failed at path
    /spec/template/spec/volumes/0/hostPath/'

In some cases, errors may only show up after the Helm chart has installed successfully. This is usually the case when some form of feature discovery decides whether or not to schedule Pods on certain nodes. In that case the Helm chart might succeed, but failures appear in Kubernetes Events.
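
One way to surface such failures is to look at warning events across all namespaces; grepping for Kyverno- or policy-related messages is a rough but effective filter.

# Look for blocked resources after a seemingly successful install (rough filter on warning events)
kubectl get events -A --field-selector type=Warning | grep -i -e kyverno -e policy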

Now that we have found out that we need additional exceptions for our NVIDIA GPU Operator example, we can extract the Kyverno policies values file from Helm, if we do not already have it, using:

helm get values -n kyverno kyverno-policies | sed '/USER-SUPPLIED VALUES:/d' > kyverno-policies-values.yaml

At this point we can open the kyverno-policies-values.yaml file and add the gpu-operator namespace to the appropriate exceptions. In this case the changes can be taken from Step 7, where the “full” values file including the NVIDIA GPU Operator is listed.

Now we need to “upgrade” the chart (only the values file configuration will be updated). However, before executing the helm upgrade command, we need to know which chart version we are dealing with; in this example it is version 3.0.7.

root@rb-kube92v124x:~# helm list -n kyverno
NAME             NAMESPACE REVISION UPDATED                                  STATUS   CHART                  APP VERSION
kyverno          kyverno   1        2024-09-25 11:45:34.987572389 +0200 CEST deployed kyverno-3.0.9          v1.10.7    
kyverno-policies kyverno   1        2024-09-25 11:48:45.704720815 +0200 CEST deployed kyverno-policies-3.0.7 v1.10.6    

The version of the kyverno-policies chart has to match the version in the helm upgrade command below, so please change the value if needed; otherwise we would upgrade the Kyverno version at the same time, which may not be what we want at this point.

helm upgrade --wait --version 3.0.7 -f kyverno-policies-values.yaml --timeout 10m0s --namespace kyverno kyverno-policies kyverno/kyverno-policies

Now a redeploy of the NVIDIA GPU Operator should succeed, and the Helm chart is no longer blocked by Kyverno.

10. Final notes

Now that Pod Security Policies are no longer being used and Kyverno is up and running, it is possible to continue with the upgrade procedure of Kubernetes, if that was the motivation for this migration. After that, it is also possible to upgrade Kyverno to a newer version. The versions used in this KB article are rather old, since they are the last ones that still support Kubernetes 1.24. Upgrading Kyverno can be done as follows.

helm get values -n kyverno kyverno | sed '/USER-SUPPLIED VALUES:/d' > kyverno-values.yaml
helm get values -n kyverno kyverno-policies | sed '/USER-SUPPLIED VALUES:/d' > kyverno-policies-values.yaml
helm upgrade --wait --version 3.2.6 -f kyverno-values.yaml --timeout 10m0s --namespace kyverno kyverno kyverno/kyverno
helm upgrade --wait --version 3.2.5 -f kyverno-policies-values.yaml --timeout 10m0s --namespace kyverno kyverno-policies kyverno/kyverno-policies

Upgrading from Kubernetes 1.24 to 1.26 is described in the following KB article: https://kb.brightcomputing.com/knowledge-base/upgrading-kubernetes-version-1-24-to-1-26-on-a-bright-9-2-cluster/.

 

Updated on September 25, 2024