1. Prerequisites
- This article was written with Bright Cluster Manager 9.1 in mind, but it also applies to versions 8.2, 9.0 and 9.2.
2. Kubernetes networks
During Bright Cluster Manager’s Kubernetes setup wizard, the administrator is asked to define two CIDRs: one for the Kube Pod Network and one for the Kube Service Network.
In cmsh the references to these networks can be found in the Kubernetes submode (some output omitted for brevity):
[root@headnode ~]# cmsh
[headnode]% kubernetes
[headnode->kubernetes[default]]% show
Parameter Value
------------------------------------------- ---------------------------------------------------------------------------------------------------
...
Service Network kube-default-service
Pod Network kube-default-pod
Pod Network Node Mask
Internal Network internalnet
KubeDNS IP 10.150.255.254
...
The networks themselves can be found and configured in the network submode:
[headnode]% network list
Name (key) Type Netmask bits Base address Domain name IPv6
--------------------- -------------- -------------- ---------------- ---------------------- ----
externalnet External 24 192.168.200.0 openstacklocal no
globalnet Global 0 0.0.0.0 cm.cluster
internalnet Internal 16 10.141.0.0 eth.cluster
kube-default-pod Internal 16 172.28.0.0 pod.cluster.local
kube-default-service Internal 16 10.150.0.0 service.cluster.local
3. Kubernetes configuration
Three relevant parameters for the Kube Controller Manager are populated with the aforementioned ranges:
- --cluster-cidr=172.28.0.0/16
- --service-cluster-ip-range=10.150.0.0/16
- --allocate-node-cidrs (required for the other two parameters to function)
Other relevant parameters that we don’t explicitly set or change are:
- --node-cidr-mask-size
- --node-cidr-mask-size-ipv4 (default 24)
- --node-cidr-mask-size-ipv6 (default 64)
The cluster CIDR is a /16 by default, and the mask used for node CIDRs is /24 by default, which means 256 nodes can each be allocated a /24 pod CIDR. The number of nodes that can receive a CIDR can be increased either by enlarging the cluster CIDR (lowering its prefix, e.g. /16 instead of /22, as done later in this article) or by raising the node CIDR mask (e.g. /27 instead of /24), at the cost of fewer pod addresses per node.
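As a quick sanity check of this arithmetic, the counts can be computed directly in the shell (shown for a /16 cluster CIDR with the default /24 node mask and with the /27 mask used in the next section):
# node CIDRs that fit in the cluster CIDR: 2^(node mask - cluster prefix)
echo $(( 1 << (24 - 16) ))   # 256 nodes with a /24 node mask
echo $(( 1 << (27 - 16) ))   # 2048 nodes with a /27 node mask
# pod addresses available per node: 2^(32 - node mask)
echo $(( 1 << (32 - 24) ))   # 256 addresses per node with a /24 node mask
echo $(( 1 << (32 - 27) ))   # 32 addresses per node with a /27 node mask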
3.1. Changing the node-CIDR mask
Since this mask is not managed by Bright Cluster Manager by default, it can be added via cmsh inside the Kubernetes::Controller role as a generic additional option:
[root@headnode ~]# cmsh
[headnode]% configurationoverlay
[headnode->configurationoverlay]% use kube-default-master
[headnode->configurationoverlay[kube-default-master]]% roles
[headnode->configurationoverlay[kube-default-master]->roles]% use kubernetes::controller
[headnode->configurationoverlay[kube-default-master]->roles[Kubernetes::Controller]]% append options "--node-cidr-mask-size-ipv4 27"
[headnode->configurationoverlay*[kube-default-master*]->roles*[Kubernetes::Controller*]]% commit
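To confirm that the extra option has been stored on the role, it can be read back again, for example with a cmsh one-liner (a quick check, assuming the same overlay and role names as above):
[root@headnode ~]# cmsh -c "configurationoverlay; use kube-default-master; roles; use kubernetes::controller; get options"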
3.2. Controller Manager configuration file
With Bright Cluster Manager you can find the CIDR configurations provided to the Kubernetes Controller Manager via the following parameters file:
/cm/local/apps/kubernetes/var/etc/controller-manager
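The exact contents of this file depend on the Kubernetes version and settings, but the CIDR-related flags can be checked with a quick grep on a node where the Controller Manager runs, for example:
grep -E 'cluster-cidr|service-cluster-ip-range|node-cidr-mask' /cm/local/apps/kubernetes/var/etc/controller-manager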
3.3. Kube Proxy configuration file
The second component that receives the cluster CIDR is kube-proxy, via the following configuration file:
/cm/local/apps/kubernetes/var/etc/proxy.yaml
The relevant setting in that YAML file is:
clusterCIDR: 172.28.0.0/16
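This can be checked directly on any node running kube-proxy, for example:
grep clusterCIDR /cm/local/apps/kubernetes/var/etc/proxy.yaml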
4. Changing the Kube Pod Network in cmsh
In this example we’ll change a previously chosen /22 CIDR to a /16.
We have a seven-node Kubernetes cluster, but only four of the nodes have a pod CIDR allocated, as can be seen with the following kubectl command:
[root@localhost ~]# module load kubernetes/default/1.18.15
[root@localhost ~]# kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR} ' | sed 's/ /\n/g'
172.29.3.0/24
172.29.0.0/24
172.29.2.0/24
172.29.1.0/24
Since a /22 can be divided into exactly four /24 subnets, this is expected. We want to expand the cluster CIDR to a /16 so that every node can get a /24 podCIDR.
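A quick way to compare the number of nodes against the number of allocated pod CIDRs (the counts below match the seven-node cluster with four allocations described above):
[root@localhost ~]# kubectl get nodes --no-headers | wc -l
7
[root@localhost ~]# kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}' | wc -w
4
To expand the network, we change the netmask bits of the kube-default-pod network in cmsh: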
[root@localhost ~]# cmsh
[localhost]% network
[localhost->network]% use kube-default-pod
[localhost->network[kube-default-pod]]% show
Parameter Value
-------------------------------- ------------------------------------------------
...
Base address                     172.29.0.0
Broadcast address                172.29.3.255
Dynamic range start 0.0.0.0
Dynamic range end 0.0.0.0
Netmask bits 22
Gateway 0.0.0.0
...
[localhost->network[kube-default-pod]]% set netmaskbits 16
[localhost->network*[kube-default-pod*]]% commit
In a recent enough version of Bright Cluster Manager, the above commit will cause the Kubernetes configuration to be updated and the Kubernetes services to be restarted automatically. These versions are (a quick way to check the installed revision follows the list):
- >= 8.2-28
- >= 9.0-19
- >= 9.1-13
- >= 9.2-3
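One way to check which Bright Cluster Manager version and revision the head node is running (this assumes an RPM-based head node and the standard cmdaemon package name; adjust to your package manager if needed):
[root@headnode ~]# rpm -q cmdaemon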
For older versions, changes made in the network submode do not automatically propagate to the Kubernetes configuration files (this has been fixed in the versions listed above). To make Bright Cluster Manager write out the new configuration, we briefly modify the priority of the Kubernetes master configuration overlay by executing the following in cmsh:
[localhost->network[kube-default-pod]]% configurationoverlay
[localhost->configurationoverlay]% use kube-default-master
[localhost->configurationoverlay[kube-default-master]]% get priority
510
[localhost->configurationoverlay[kube-default-master]]% set priority 511
[localhost->configurationoverlay*[kube-default-master*]]% commit
[localhost->configurationoverlay[kube-default-master]]%
Tue Apr 19 21:03:38 2022 [notice] node004: Service kube-proxy was restarted
Tue Apr 19 21:03:38 2022 [notice] node002: Service kube-controller-manager was restarted
Tue Apr 19 21:03:38 2022 [notice] node001: Service kube-proxy was restarted
Tue Apr 19 21:03:38 2022 [notice] node002: Service kube-proxy was restarted
Tue Apr 19 21:03:38 2022 [notice] node001: Service kube-controller-manager was restarted
Tue Apr 19 21:03:38 2022 [notice] node005: Service kube-proxy was restarted
Tue Apr 19 21:03:38 2022 [notice] node006: Service kube-proxy was restarted
Tue Apr 19 21:03:38 2022 [notice] node003: Service kube-proxy was restarted
After the restarts the priority can be restored to its original value.
[localhost->configurationoverlay[kube-default-master]]% set priority 510
[localhost->configurationoverlay*[kube-default-master*]]% commit
This won’t result in another restart, since we didn’t change any configuration (such as the CIDR) after the previous restart.
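Before moving on, it can be useful to confirm that the new cluster CIDR has been written out to the configuration files described in section 3 (run these on a node where the respective service runs):
grep cluster-cidr /cm/local/apps/kubernetes/var/etc/controller-manager
grep clusterCIDR /cm/local/apps/kubernetes/var/etc/proxy.yaml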
5. Check the node CIDRs
After the Kubernetes service restarts from the previous section, we should already see more nodes being allocated a CIDR:
[root@localhost ~]# module load kubernetes/default/1.18.15
[root@localhost ~]# kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR} ' | sed 's/ /\n/g'
172.29.3.0/24
172.29.4.0/24
172.29.5.0/24
172.29.6.0/24
172.29.0.0/24
172.29.2.0/24
172.29.1.0/24
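The same allocation is also visible per node in kubectl describe, which can be easier to read on larger clusters:
[root@localhost ~]# kubectl describe nodes | grep -i podcidr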
6. Updating Calico Configuration
If Calico is used, calicoctl can be used to list the default IP pool that is created when Calico is first initialized. If that initialization happened with the previous CIDR (which is likely), this is reflected in the output below:
[root@localhost ~]# calicoctl get pool -o wide
NAME CIDR NAT IPIPMODE VXLANMODE DISABLED SELECTOR
default-ipv4-ippool 172.29.0.0/22 true Always Never false all()
In this case we are expanding the CIDR from a /22 to a /16, but the same approach can be used if, for example, 172.29.0.0/16 had been changed to 172.30.0.0/16:
[root@localhost ~]# calicoctl get pool -o yaml > pool.yaml
[root@localhost ~]# vim pool.yaml # change the CIDR
[root@localhost ~]# calicoctl delete -f pool.yaml && calicoctl apply -f pool.yaml
Calico should start using addresses from the /16 range for newly created pods right away; existing pods need to be recreated in order to get an IP from the modified pool.
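For pods that are managed by a Deployment or DaemonSet, a rolling restart is enough to get them recreated with an address from the new pool; for example (the name and namespace below are placeholders):
[root@localhost ~]# kubectl rollout restart deployment <name> -n <namespace>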
7. Updating Flannel Configuration
If Flannel is used, the Controller Manager might keep showing errors such as:
Apr 20 11:43:33 node002 kube-controller-manager[27266]: E0420 11:43:33.265097 27266 controllermanager.go:521] Error starting "nodeipam"
Apr 20 11:43:33 node002 kube-controller-manager[27266]: F0420 11:43:33.265119 27266 controllermanager.go:235] error starting controllers: failed to mark cidr[172.29.1.0/24] at idx [0] as occupied for node: node001: cidr 172.29.1.0/24 is out the range of cluster cidr 10.75.0.0/16
In this example we’ve changed the base network from 172.29.0.0/16 to 10.75.0.0/16. Some more manual work is required for Flannel.
First, delete the interfaces used by Flannel on each Kubernetes node:
sudo ip link del cni0;
sudo ip link del flannel.1
For example by using pdsh:
pdsh -A "sudo ip link del cni0; sudo ip link del flannel.1"
Second, delete the flannel and CoreDNS pods:
root@headnode:~# kubectl delete pod -n kube-system -l app=flannel
pod "kube-flannel-ds-amd64-tqnwx" deleted
pod "kube-flannel-ds-amd64-wc87p" deleted
pod "kube-flannel-ds-amd64-zcq9n" deleted
pod "kube-flannel-ds-amd64-zpmqf" deleted
root@headnode:~# kubectl delete pod -n kube-system -l k8s-app=kube-dns
pod "coredns-b5cdc886c-8vp2b" deleted
pod "coredns-b5cdc886c-jswxd" deleted
Third, we also need to manually change the allocated podCIDRs on the nodes. The easiest way we have found so far is based on a ServerFault answer.
The example bash script below can be copied and pasted on the head node after loading the Kubernetes module file. It recreates each Kubernetes node resource, replacing “172.29.” with “10.75.”.
while read -r node; do
  # Dump the node object, rewrite the pod CIDR prefix, then recreate the node from the saved manifest
  kubectl get node "$node" -o yaml > "/tmp/$node.yaml"
  sed -i.bak 's/172\.29\./10\.75\./g' "/tmp/$node.yaml"
  kubectl delete node "$node"
  kubectl create -f "/tmp/$node.yaml"
done < <(kubectl get nodes --no-headers | awk '{print $1}')
Confirm that the podCIDRs are now correct (the example output below is from a different cluster with only four nodes):
root@headnode:~# kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR} ' | sed 's/ /\n/g'
10.75.1.0/24
10.75.2.0/24
10.75.3.0/24
10.75.0.0/24
Final step: recreate any other Kubernetes pods that still have an IP address from the old CIDR, so that they are migrated to the new range.
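A quick way to list the pods that still have an address from the old range (adjust the grep pattern to your old CIDR):
root@headnode:~# kubectl get pods -A -o wide | grep '172\.29\.'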