0. Prerequisites
- This KB article assumes a Rocky 9 cluster and NVIDIA cluster manager version 9.2.
- This KB article assumes a Kubernetes installation is already present (based on Calico or Flannel).
- The Kubernetes setup wizard does not currently support Cilium. If support is added, we will update this KB article accordingly.
1. Background
The article may be applicable to other Linux distributions, but the motivation was an issue found on Rocky 9.
The problem is that Calico (at least versions 3.20.x and 3.24.x) produces errors in the calico-node deployment on the Head Node(s). Unfortunately, this problem may not be apparent immediately. Depending on when you run the following command, the Pods may all show "Running 1/1" (the 1/1 indicating that all containers are considered Ready/Healthy):
root@rb-rocky9khn:~# kubectl get pod -n kube-system -o wide -l k8s-app=calico-node
NAME                READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
calico-node-8mltn   1/1     Running   0          17h   10.141.0.1       node001        <none>           <none>
calico-node-9wfvv   1/1     Running   0          17h   10.141.255.254   rb-rocky9khn   <none>           <none>
A few seconds later, the same command might show the following (note READY 0/1 for one of the Pods, and notice the zero restarts):
root@rb-rocky9khn:~# kubectl get pod -n kube-system -o wide -l k8s-app=calico-node
NAME                READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
calico-node-8mltn   1/1     Running   0          17h   10.141.0.1       node001        <none>           <none>
calico-node-9wfvv   0/1     Running   0          17h   10.141.255.254   rb-rocky9khn   <none>           <none>
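Because the readiness flaps back and forth, it can help to watch the Pods continuously to catch the 0/1 state (a simple observation aid we suggest, not part of the original diagnosis):

# Refresh every second; watch the READY column flip between 1/1 and 0/1
watch -n 1 'kubectl get pod -n kube-system -o wide -l k8s-app=calico-node'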
The logs tell a different story: calico-node repeatedly complains about incompatible rules already present in nftables.
2023-03-15 09:25:46.080 [ERROR][153583] felix/table.go 855: iptables-save failed because there are incompatible nft rules in the table. Remove the nft rules to continue. ipVersion=0x4 table="filter"
2023-03-15 09:25:46.080 [WARNING][153583] felix/table.go 804: Killing iptables-nft-save process after a failure error=iptables-save failed because there are incompatible nft rules in the table
2023-03-15 09:25:46.080 [WARNING][153583] felix/table.go 763: iptables-nft-save command failed error=iptables-save failed because there are incompatible nft rules in the table ipVersion=0x4 stderr="" table="filter"
2023-03-15 09:25:46.107 [INFO][153583] felix/int_dataplane.go 1060: Linux interface addrs changed. addrs=set.mapSet{} ifaceName="calico_tmp_B"
2023-03-15 09:25:46.107 [INFO][153583] felix/int_dataplane.go 1060: Linux interface addrs changed. addrs=set.mapSet{} ifaceName="calico_tmp_A"
2023-03-15 09:25:46.120 [INFO][153583] felix/int_dataplane.go 1060: Linux interface addrs changed. addrs= ifaceName="calico_tmp_A"
2023-03-15 09:25:46.120 [INFO][153583] felix/int_dataplane.go 1060: Linux interface addrs changed. addrs= ifaceName="calico_tmp_B"
2023-03-15 09:25:46.187 [ERROR][153583] felix/table.go 855: iptables-save failed because there are incompatible nft rules in the table. Remove the nft rules to continue. ipVersion=0x4 table="filter"
2023-03-15 09:25:46.187 [WARNING][153583] felix/table.go 804: Killing iptables-nft-save process after a failure error=iptables-save failed because there are incompatible nft rules in the table
2023-03-15 09:25:46.187 [WARNING][153583] felix/table.go 763: iptables-nft-save command failed error=iptables-save failed because there are incompatible nft rules in the table ipVersion=0x4 stderr="" table="filter"
2023-03-15 09:25:46.391 [ERROR][153583] felix/table.go 855: iptables-save failed because there are incompatible nft rules in the table. Remove the nft rules to continue. ipVersion=0x4 table="filter"
2023-03-15 09:25:46.391 [WARNING][153583] felix/table.go 804: Killing iptables-nft-save process after a failure error=iptables-save failed because there are incompatible nft rules in the table
2023-03-15 09:25:46.391 [WARNING][153583] felix/table.go 763: iptables-nft-save command failed error=iptables-save failed because there are incompatible nft rules in the table ipVersion=0x4 stderr="" table="filter"
2023-03-15 09:25:46.794 [ERROR][153583] felix/table.go 855: iptables-save failed because there are incompatible nft rules in the table. Remove the nft rules to continue. ipVersion=0x4 table="filter"
2023-03-15 09:25:46.794 [WARNING][153583] felix/table.go 804: Killing iptables-nft-save process after a failure error=iptables-save failed because there are incompatible nft rules in the table
2023-03-15 09:25:46.794 [WARNING][153583] felix/table.go 763: iptables-nft-save command failed error=iptables-save failed because there are incompatible nft rules in the table ipVersion=0x4 stderr="" table="filter"
2023-03-15 09:25:46.794 [PANIC][153583] felix/table.go 769: iptables-nft-save command failed after retries ipVersion=0x4 table="filter"
panic: (*logrus.Entry) 0xc00058ea00

goroutine 266 [running]:
github.com/sirupsen/logrus.Entry.log(0xc00007e1e0, 0xc0005f8720, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7fff00000000, ...)
        /go/pkg/mod/github.com/projectcalico/logrus@v1.0.4-calico/entry.go:128 +0x6a5
github.com/sirupsen/logrus.(*Entry).Panic(0xc0000a9f90, 0xc000e30b58, 0x1, 0x1)
        /go/pkg/mod/github.com/projectcalico/logrus@v1.0.4-calico/entry.go:173 +0xfa
github.com/sirupsen/logrus.(*Entry).Panicf(0xc0000a9f90, 0x29c2b7f, 0x1f, 0xc000e30c10, 0x1, 0x1)
        /go/pkg/mod/github.com/projectcalico/logrus@v1.0.4-calico/entry.go:221 +0xc5
github.com/projectcalico/felix/iptables.(*Table).getHashesAndRulesFromDataplane(0xc0005cb800, 0x1309c2d2, 0x3fc7e60)
        /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210729235055-29a866674aea/iptables/table.go:769 +0x49c
github.com/projectcalico/felix/iptables.(*Table).loadDataplaneState(0xc0005cb800)
        /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210729235055-29a866674aea/iptables/table.go:606 +0x1e5
github.com/projectcalico/felix/iptables.(*Table).Apply(0xc0005cb800, 0xc000b5c7b0)
        /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210729235055-29a866674aea/iptables/table.go:990 +0xfe5
github.com/projectcalico/felix/dataplane/linux.(*InternalDataplane).apply.func3(0xc000310040, 0xc000310048, 0xc00070ac00, 0xc000310050, 0xc0005cb800)
        /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210729235055-29a866674aea/dataplane/linux/int_dataplane.go:1818 +0x3c
created by github.com/projectcalico/felix/dataplane/linux.(*InternalDataplane).apply
        /go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20210729235055-29a866674aea/dataplane/linux/int_dataplane.go:1817 +0x6c5
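These logs can be retrieved from the failing calico-node Pod. For example, using the Pod name from our example cluster (substitute your own):

# Show the most recent log lines of the flapping calico-node Pod
kubectl logs -n kube-system calico-node-9wfvv | tail -n 50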
Such existing entries can be queried using the "nft" command-line tool, as shown below.
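For example (the chain name "eth1_masq" comes from our test cluster and will depend on your interface names; "nft list ruleset" dumps everything if you are unsure where to look):

# Inspect the nftables state that conflicts with Calico
nft list ruleset
nft list table filter
nft list chain nat eth1_masq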
We came to the conclusion that entries written by Shorewall, such as the following examples, are confusing Calico:
table ip nat {
        chain eth1_masq {
                ip saddr 10.141.0.0/16 counter packets 1 bytes 60 masquerade
        }
}
table ip filter {
        chain ~log0 {
                counter packets 4 bytes 240 log prefix "Shorewall:net-fw:ACCEPT:" level info
                counter packets 4 bytes 240 accept
        }
        chain shorewall {
                # recent: SET name: %CURRENTTIME side: source mask: 255.255.255.255
                counter packets 0 bytes 0
        }
}
Configuring Calico in eBPF mode did not get rid of the errors either.
Step 1: Removing Calico (or Flannel)
Disable the "calico" application using cmsh as shown below (if you are using Flannel, disable the "flannel" application instead):
root@rb-rocky9khn:~# cmsh
[rb-rocky9khn]% kubernetes
[rb-rocky9khn->kubernetes[default]]% appgroups
[rb-rocky9khn->kubernetes[default]->appgroups]% use system
[rb-rocky9khn->kubernetes[default]->appgroups[system]]% applications
[rb-rocky9khn->kubernetes[default]->appgroups[system]->applications]% set calico enabled no
[rb-rocky9khn->kubernetes*[default*]->appgroups*[system*]->applications*]% commit
[rb-rocky9khn->kubernetes[default]->appgroups[system]->applications]%
This should automatically get rid of all the Calico components:
root@rb-rocky9khn:~# kubectl get pod -n kube-system
NAME                                 READY   STATUS    RESTARTS   AGE
coredns-7c59c9cb69-4fxs4             1/1     Running   0          17h
coredns-7c59c9cb69-5jv77             1/1     Running   0          17h
kube-state-metrics-ddc87b89b-q4kct   1/1     Running   0          17h
metrics-server-bcbd76cfc-bdc2p       1/1     Running   0          17h
metrics-server-bcbd76cfc-hph4f       1/1     Running   0          17h
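As an extra sanity check (our suggestion, not part of the original procedure), the label selector used earlier should no longer match anything:

# Should report "No resources found" once Calico is fully removed
kubectl get pod -n kube-system -l k8s-app=calico-node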
Step 2: Install cilium binary on the Head Node(s)
This can be done with the following snippet (these instructions have been copied from https://github.com/cilium/cilium-cli#installation). The only minor adjustment we made is pinning CILIUM_CLI_VERSION to v0.13.1, since that is the version we chose to test during the creation of this KB article; the upstream snippet installs the latest stable version, which may be newer by the time you read this.
CILIUM_CLI_VERSION=v0.13.1
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
We can test the binary as follows:
root@rb-rocky9khn:~# module load kubernetes/default/1.24.9
root@rb-rocky9khn:~# cilium version
cilium-cli: v0.13.1 compiled with go1.20 on linux/amd64
cilium image (default): v1.13.0
cilium image (stable): v1.13.0
cilium image (running): unknown. Unable to obtain cilium version, no cilium pods found in namespace "kube-system"
Step 3: Prepare Shorewall (firewall) in cmsh
We first need to run a few commands in cmsh, either scripted (first block below) or manually (second block below).
cat << EOT >> add_cilium_firewall_role.txt
device use master
roles
use firewall
zones
add cil
..
..
interfaces
add cil lxc+ detect routeback
add cil cilium_host
..
..
policies
add cil all ACCEPT
..
..
commit
EOT
cmsh -f add_cilium_firewall_role.txt
This should result in a restart of the Shorewall service. The above is equivalent to the following, executed manually:
root@rb-rocky9khn:~# cmsh
[rb-rocky9khn]% device use master
[rb-rocky9khn->device[rb-rocky9khn]]% roles
[rb-rocky9khn->device[rb-rocky9khn]->roles]% use firewall
[rb-rocky9khn->device[rb-rocky9khn]->roles[firewall]]% zones
[rb-rocky9khn->device[rb-rocky9khn]->roles[firewall]->zones]% add cil
[rb-rocky9khn->device*[rb-rocky9khn*]->roles*[firewall*]->zones[1]]% ..
[rb-rocky9khn->device*[rb-rocky9khn*]->roles*[firewall*]->zones]% ..
[rb-rocky9khn->device*[rb-rocky9khn*]->roles*[firewall*]]% interfaces
[rb-rocky9khn->device*[rb-rocky9khn*]->roles*[firewall*]->interfaces]% add cil lxc+ detect routeback
[rb-rocky9khn->device*[rb-rocky9khn*]->roles*[firewall*]->interfaces[2]]% add cil cilium_host
[rb-rocky9khn->device*[rb-rocky9khn*]->roles*[firewall*]->interfaces[3]]% ..
[rb-rocky9khn->device*[rb-rocky9khn*]->roles*[firewall*]->interfaces]% ..
[rb-rocky9khn->device*[rb-rocky9khn*]->roles*[firewall*]]% policies
[rb-rocky9khn->device*[rb-rocky9khn*]->roles*[firewall*]->policies]% add cil all ACCEPT
[rb-rocky9khn->device*[rb-rocky9khn*]->roles*[firewall*]->policies[1]]% ..
[rb-rocky9khn->device*[rb-rocky9khn*]->roles*[firewall*]->policies]% ..
[rb-rocky9khn->device*[rb-rocky9khn*]->roles*[firewall*]]% commit
[rb-rocky9khn->device[rb-rocky9khn]->roles[firewall]]%
If we do not do this, we will later find in the "dmesg -w" output that Cilium traffic is blocked by Shorewall.
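To verify that Shorewall restarted cleanly and picked up the new "cil" zone, something like the following can be used (a suggested check, assuming the shorewall CLI is available on the Head Node):

# Confirm the service state and that the "cil" zone is now defined
systemctl status shorewall
shorewall show zones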
Step 4: Get the correct CIDR for the Pod network from cmsh
In our case this has been configured as 172.29.0.0/16 for the Kubernetes cluster named "default":
root@rb-rocky9khn:~# cmsh
[rb-rocky9khn]% kubernetes
[rb-rocky9khn->kubernetes[default]]% get podnetwork
kube-default-pod
[rb-rocky9khn->kubernetes[default]]% network
[rb-rocky9khn->network]% show kube-default-pod
Parameter                        Value
-------------------------------- ------------------------------------------------
Name                             kube-default-pod
Domain Name                      pod.cluster.local
...
Base address                     172.29.0.0
Broadcast address                172.29.255.255
Dynamic range start              0.0.0.0
Dynamic range end                0.0.0.0
Netmask bits                     16
Gateway                          0.0.0.0
...
We need to pass this CIDR to the “cilium install” command in the next step.
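If you prefer to script this step, the same values can be read non-interactively. Note that the property names below (baseaddress, netmaskbits) are our assumption based on the "show" output above; verify them on your cluster:

# Hypothetical one-liner: print the base address and netmask bits of the Pod network
cmsh -c "network; get kube-default-pod baseaddress; get kube-default-pod netmaskbits"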
Step 5: Install cilium
cilium install --config cluster-pool-ipv4-cidr=172.29.0.0/16
The output of the above install command is fairly verbose; in the end, the result should look similar to:
root@rb-rocky9khn:~# cilium install --config cluster-pool-ipv4-cidr=172.29.0.0/16
ℹ️  Using Cilium version 1.13.0
🔮 Auto-detected cluster name: default
🔮 Auto-detected datapath mode: tunnel
🔮 Auto-detected kube-proxy has not been installed
ℹ️  Cilium will fully replace all functionalities of kube-proxy
...
ℹ️  Manual overwrite in ConfigMap: cluster-pool-ipv4-cidr=172.29.0.0/16
...
✅ Cilium was successfully installed! Run 'cilium status' to view installation health
The status command should look similar to:
root@rb-rocky9khn:~# cilium status
    /¯¯\
 /¯¯\__/¯¯\    Cilium:          OK
 \__/¯¯\__/    Operator:        OK
 /¯¯\__/¯¯\    Hubble Relay:    disabled
 \__/¯¯\__/    ClusterMesh:     disabled
    \__/

Deployment        cilium-operator    Desired: 1, Ready: 1/1, Available: 1/1
DaemonSet         cilium             Desired: 2, Ready: 2/2, Available: 2/2
Containers:       cilium             Running: 2
                  cilium-operator    Running: 1
Cluster Pods:     12/12 managed by Cilium
Image versions    cilium             quay.io/cilium/cilium:v1.13.0@sha256:6544a3441b086a2e09005d3e21d1a4afb216fae19c5a60b35793c8a9438f8f68: 2
                  cilium-operator    quay.io/cilium/operator-generic:v1.13.0@sha256:4b58d5b33e53378355f6e8ceb525ccf938b7b6f5384b35373f1f46787467ebf5: 1
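Optionally, the cilium CLI ships a built-in connectivity test that deploys temporary workloads and verifies Pod-to-Pod and Pod-to-Service traffic. We did not run it as part of the original procedure, but it is a useful extra check:

# Deploys test Pods into a cilium-test namespace and runs a suite of network checks
cilium connectivity test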
Some Pods in a broken state might need to be recreated now that the networking layer of Kubernetes has been replaced.
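One way to do this (our suggestion; review the list before deleting) is to delete all Pods stuck in a non-running phase, so that their controllers recreate them on the new network:

# First review, then recreate broken Pods across all namespaces;
# their Deployments/DaemonSets will spin up replacements
kubectl get pod -A --field-selector=status.phase!=Running,status.phase!=Succeeded
kubectl delete pod -A --field-selector=status.phase!=Running,status.phase!=Succeeded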
Alternative solutions
Weave Networking used to be a valid alternative to Calico in certain situations; however, the latest Weave release dates from January 25th, 2021, which suggests it is no longer actively maintained.
Flannel might work, but we have not tried it, since Flannel is not really designed for larger Kubernetes clusters.
We have deployed Calico in eBPF mode before; however, we still get errors on Rocky 9 when using it (for now; this might improve in future versions).