Background
When you configure multiple network interfaces on a single machine, each with an IP address in the same IP subnet, additional configuration is needed before the networking stack in the Linux kernel can use these interfaces properly.
By default, only one of the IP addresses that you assign within an IP subnet is usable. This is because the kernel may respond to an incoming packet through a different interface than the interface where the packet came in.
Setting Kernel Parameters
The net.ipv4.conf.<interface>.accept_local kernel parameter needs to be set to 1.
In addition, several other ARP and reverse path kernel parameters should be set appropriately.
There are several ways of accomplishing this, but on a Bright cluster, the easiest way is to create a file /etc/sysctl.d/99-multi-ip-in-subnet.conf in the relevant software images (e.g. /cm/images/default-image) with, for example, the following contents:
# Set defaults
net.ipv4.conf.all.arp_ignore=0
net.ipv4.conf.all.rp_filter=1
# Set ARP and reverse path settings for ib0
net.ipv4.conf.ib0.arp_ignore=1
net.ipv4.conf.ib0.arp_filter=0
net.ipv4.conf.ib0.arp_announce=2
net.ipv4.conf.ib0.rp_filter=0
# Set ARP and reverse path settings for ib1
net.ipv4.conf.ib1.arp_ignore=1
net.ipv4.conf.ib1.arp_filter=0
net.ipv4.conf.ib1.arp_announce=2
net.ipv4.conf.ib1.rp_filter=0
# Set accept_local for interfaces
net.ipv4.conf.ib0.accept_local=1
net.ipv4.conf.ib1.accept_local=1
NOTE: Substitute ib0 and ib1 with appropriate interface names, and expand for any further interfaces. Alternatively, the Linux kernel allows all and default to be specified instead of an actual interface name.
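When more than two interfaces share the subnet, the per-interface block can be generated rather than typed by hand. The following is a minimal sketch, not part of the product: the function name gen_multi_ip_sysctl is hypothetical, and the values it prints are simply the ones from the example file above.

```shell
#!/bin/sh
# Print the sysctl settings from this article for every interface name
# passed as an argument. gen_multi_ip_sysctl is a hypothetical helper name.
gen_multi_ip_sysctl() {
    echo "# Set defaults"
    echo "net.ipv4.conf.all.arp_ignore=0"
    echo "net.ipv4.conf.all.rp_filter=1"
    for iface in "$@"; do
        echo "# ARP, reverse path and accept_local settings for $iface"
        echo "net.ipv4.conf.$iface.arp_ignore=1"
        echo "net.ipv4.conf.$iface.arp_filter=0"
        echo "net.ipv4.conf.$iface.arp_announce=2"
        echo "net.ipv4.conf.$iface.rp_filter=0"
        echo "net.ipv4.conf.$iface.accept_local=1"
    done
}

# Example (path taken from the text above; adjust the image name as needed):
# gen_multi_ip_sysctl ib0 ib1 ib2 > /cm/images/default-image/etc/sysctl.d/99-multi-ip-in-subnet.conf
```

Redirecting the output into the software image, as in the commented example, produces the same layout as the hand-written file.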
Setting up Routes
For RHEL8:
Several routes have to be created on all nodes. This can be done by creating the following files on each node:
/etc/sysconfig/network-scripts/route-<interface>
/etc/sysconfig/network-scripts/rule-<interface>
/etc/iproute2/rt_tables
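To make the purpose of these three files concrete, the sketch below prints the content each one would hold for a single interface. The line formats follow the finalize script later in this section. The function name show_policy_files is hypothetical, and the address 100.127.1.30, network 100.127.0.0/16 and table number 200 are example values only.

```shell
#!/bin/sh
# Print the lines that end up in each of the three files for one interface.
# Arguments: interface name, source IP, network in CIDR form, table number.
show_policy_files() {
    iface=$1 src=$2 net=$3 tblnum=$4
    echo "== /etc/sysconfig/network-scripts/route-$iface"
    echo "$net dev $iface src $src table $iface"
    echo "== /etc/sysconfig/network-scripts/rule-$iface"
    echo "from $src table $iface"
    echo "== /etc/iproute2/rt_tables (appended line)"
    echo "$tblnum $iface"
}

# Example values only; substitute the addresses used on your cluster.
show_policy_files ib0 100.127.1.30 100.127.0.0/16 200
```

Each interface gets its own routing table, named after the interface, and a rule that sends traffic sourced from that interface's address to that table.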
In BCM, the easiest way of accomplishing this is to use a finalize script, which is a script that executes after the node has finished provisioning, but before systemd is started.
A finalize script for a category can be set using Base View or CMSH. In CMSH:
[root@mdv-cluster ~]# cmsh
[mdv-cluster]% category use default
[mdv-cluster->category[default]]% set finalizescript [filename]
...
[mdv-cluster]% commit
For more information about finalize scripts, please consult the BCM documentation.
The following finalize script can be set for a category or for individual nodes to generate the appropriate content:
#!/bin/bash
# Uses bash features (${1//./ }, <<<, let), hence the bash interpreter line.
# Interfaces that share an IP subnet
INTERFACES="ib0 ib1"
# Convert a dotted-quad netmask to a prefix length by counting its set bits
bits_by_netmask () {
    c=0 x=0$( printf '%o' ${1//./ } )
    while [ $x -gt 0 ]; do
        let c+=$((x%2)) 'x>>=1'
    done
    echo $c ; }
tblnum=200
for interface in $INTERFACES; do
    # Netmask and IP are taken from the CMD_INTERFACE_<name>_* variables
    eval netmask=\$CMD_INTERFACE_${interface}_NETMASK
    eval src=\$CMD_INTERFACE_${interface}_IP
    tbl=$interface
    IFS=. read -r i1 i2 i3 i4 <<< "$src"
    IFS=. read -r m1 m2 m3 m4 <<< "$netmask"
    # Network base address: bitwise AND of each IP octet with the netmask octet
    base=`printf "%d.%d.%d.%d\n" "$((i1 & m1))" "$((i2 & m2))" "$((i3 & m3))" "$((i4 & m4))"`
    bits=`bits_by_netmask $netmask`
    net="$base/$bits"
    echo $net dev $interface src $src table $tbl >/localdisk/etc/sysconfig/network-scripts/route-$interface
    echo from $src table $tbl >/localdisk/etc/sysconfig/network-scripts/rule-$interface
    # Register the routing table only once
    if ! grep -q $tblnum /localdisk/etc/iproute2/rt_tables; then
        echo $tblnum $interface >>/localdisk/etc/iproute2/rt_tables
    fi
    tblnum=$((tblnum+1))
done
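The two calculations at the heart of the finalize script, turning a netmask into a prefix length and deriving the network base address, can be tried in isolation. The sketch below is a simplified restatement that counts bits octet by octet, not the script's own octal trick. The function names netmask_to_prefix and network_base are hypothetical.

```shell
#!/bin/sh
# Count the set bits of a dotted-quad netmask, octet by octet.
netmask_to_prefix() {
    bits=0
    old_ifs=$IFS; IFS=.
    for octet in $1; do
        while [ "$octet" -gt 0 ]; do
            bits=$((bits + (octet & 1)))
            octet=$((octet >> 1))
        done
    done
    IFS=$old_ifs
    echo "$bits"
}

# AND each address octet with the matching netmask octet.
network_base() {
    old_ifs=$IFS; IFS=.
    set -- $1 $2      # $1..$4 = address octets, $5..$8 = netmask octets
    IFS=$old_ifs
    echo "$(($1 & $5)).$(($2 & $6)).$(($3 & $7)).$(($4 & $8))"
}

# Example values only.
echo "$(network_base 100.127.1.30 255.255.0.0)/$(netmask_to_prefix 255.255.0.0)"
```

With the example values, the last line prints 100.127.0.0/16, which is the network that the finalize script would write into route-ib0 for that address and netmask.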
For RHEL9:
1) Connect to the software image chroot used by the compute nodes, for example:
# cm-chroot /cm/images/default-rhel9
2) Enable NetworkManager-dispatcher:
# systemctl enable NetworkManager-dispatcher.service
3) Add the following rule/route policy scripts:
- 10-add-policy-based-rules.sh:
cat > /etc/NetworkManager/dispatcher.d/10-add-policy-based-rules.sh <<'EOF'
#!/bin/bash
# The quoted 'EOF' delimiter above keeps $ and backticks from being expanded while this file is written.
# IPoIB addresses and interface names; the sed expression assumes a /16 prefix length
IPoIB=(`ip -br a | grep ib | awk -F" " '{print $3}' | sed -e 's/\/16//g'`)
INTERFACES=(`ip -br a | grep ib | awk -F" " '{print $1}'`)
RANGE=${#IPoIB[@]}
for itr in $(seq 0 $((RANGE-1)) )
do
    if [[ `ip rule show table ${INTERFACES[$itr]}` ]]
    then
        # Rule already present: print it
        ip rule show table ${INTERFACES[$itr]}
    else
        # Add the rule and route from the files generated by the finalize script
        ip rule add `cat /etc/sysconfig/network-scripts/rule-${INTERFACES[$itr]}`
        #ip rule add from ${IPoIB[$itr]} table ${INTERFACES[$itr]}
        ip route add `cat /etc/sysconfig/network-scripts/route-${INTERFACES[$itr]}`
        #ip route add ${IPoIB[$itr]}/16 dev ${INTERFACES[$itr]} src ${IPoIB[$itr]} table ${INTERFACES[$itr]}
    fi
done
EOF
- 20-add-policy-based-routing.sh:
cat > /etc/NetworkManager/dispatcher.d/20-add-policy-based-routing.sh << 'EOF'
#!/bin/bash
# The quoted 'EOF' delimiter above keeps $ and backticks from being expanded while this file is written.
# IPoIB addresses and interface names; the sed expression assumes a /16 prefix length
IPoIB=(`ip -br a | grep ib | awk -F" " '{print $3}' | sed -e 's/\/16//g'`)
INTERFACES=(`ip -br a | grep ib | awk -F" " '{print $1}'`)
RANGE=${#IPoIB[@]}
for itr in $(seq 0 $((RANGE-1)) )
do
    if [[ `ip route show table ${INTERFACES[$itr]}` ]]
    then
        # Route already present: print it
        ip route show table ${INTERFACES[$itr]}
    else
        # Add the route from the file generated by the finalize script
        #ip rule add `cat /etc/sysconfig/network-scripts/rule-${INTERFACES[$itr]}`
        #ip rule add from ${IPoIB[$itr]} table ${INTERFACES[$itr]}
        ip route add `cat /etc/sysconfig/network-scripts/route-${INTERFACES[$itr]}`
        #ip route add ${IPoIB[$itr]}/16 dev ${INTERFACES[$itr]} src ${IPoIB[$itr]} table ${INTERFACES[$itr]}
    fi
done
EOF
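The dispatcher scripts contain $3, ${...} and backticks, so how the here-document delimiter is written matters when they are created from a shell. The short demonstration below, using a made-up one-line body, shows the difference between an unquoted and a quoted delimiter. The variable names unquoted and quoted are illustrative only.

```shell
#!/bin/sh
# Unquoted delimiter: the invoking shell expands $3 before the text is written.
unquoted=$(cat <<EOF
awk '{print $3}'
EOF
)

# Quoted delimiter: the body is passed through literally.
quoted=$(cat <<'EOF'
awk '{print $3}'
EOF
)

echo "unquoted: $unquoted"   # $3 has been replaced by the shell's third positional parameter
echo "quoted:   $quoted"     # $3 is preserved for awk
```

When the shell has no third positional parameter, the unquoted variant writes an awk program with an empty print statement, which would print whole lines instead of the address field.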
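4) Make both scripts executable, because NetworkManager-dispatcher only runs scripts that are executable:
# chmod 755 /etc/NetworkManager/dispatcher.d/10-add-policy-based-rules.sh
# chmod 755 /etc/NetworkManager/dispatcher.d/20-add-policy-based-routing.sh
5) The following finalize script can be set for a category or for individual nodes to generate the appropriate content: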
#!/bin/bash
# Uses bash features (${1//./ }, <<<, let), hence the bash interpreter line.
# Interfaces that share an IP subnet
INTERFACES="ib0 ib1"
# Convert a dotted-quad netmask to a prefix length by counting its set bits
bits_by_netmask () {
    c=0 x=0$( printf '%o' ${1//./ } )
    while [ $x -gt 0 ]; do
        let c+=$((x%2)) 'x>>=1'
    done
    echo $c ; }
tblnum=200
for interface in $INTERFACES; do
    # Netmask and IP are taken from the CMD_INTERFACE_<name>_* variables
    eval netmask=\$CMD_INTERFACE_${interface}_NETMASK
    eval src=\$CMD_INTERFACE_${interface}_IP
    tbl=$interface
    IFS=. read -r i1 i2 i3 i4 <<< "$src"
    IFS=. read -r m1 m2 m3 m4 <<< "$netmask"
    # Network base address: bitwise AND of each IP octet with the netmask octet
    base=`printf "%d.%d.%d.%d\n" "$((i1 & m1))" "$((i2 & m2))" "$((i3 & m3))" "$((i4 & m4))"`
    bits=`bits_by_netmask $netmask`
    net="$base/$bits"
    echo $net dev $interface src $src table $tbl >/localdisk/etc/sysconfig/network-scripts/route-$interface
    echo from $src table $tbl >/localdisk/etc/sysconfig/network-scripts/rule-$interface
    # Register the routing table only once
    if ! grep -q $tblnum /localdisk/etc/iproute2/rt_tables; then
        echo $tblnum $interface >>/localdisk/etc/iproute2/rt_tables
    fi
    tblnum=$((tblnum+1))
done
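The finalize script guards the rt_tables append with a plain grep for the table number, which also matches that number anywhere else in the file (for example inside 1200). A stricter variant is sketched below as one possible refinement. The function name add_rt_table is hypothetical, and the check assumes the table number is the first field on its line, which is the usual rt_tables layout.

```shell
#!/bin/sh
# Append "<number> <name>" to an rt_tables file unless a line already
# starts with that table number.
# Arguments: table number, table name, path to the rt_tables file.
add_rt_table() {
    num=$1 name=$2 file=$3
    # Anchor the match to the first field so that 200 does not match 1200.
    if ! grep -Eq "^[[:space:]]*$num[[:space:]]" "$file" 2>/dev/null; then
        echo "$num $name" >> "$file"
    fi
}

# Example, using the path from the finalize script:
# add_rt_table 200 ib0 /localdisk/etc/iproute2/rt_tables
```

Calling the function repeatedly with the same table number leaves a single entry in the file, so it is safe to run on every provisioning pass.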
For Ubuntu:
1) Connect to the software image chroot used by the compute nodes, for example:
# cm-chroot /cm/images/default-ubuntu2204
2) Enable networkd-dispatcher:
# systemctl unmask networkd-dispatcher.service
# systemctl enable networkd-dispatcher.service
3) Enable systemd-networkd:
# systemctl unmask systemd-networkd
# systemctl enable systemd-networkd
4) Create a script to write out the ip rules and routes:
# cat > /etc/networkd-dispatcher/routable.d/10-policy-based-rules.sh << 'EOF'
#!/bin/bash
# The quoted 'EOF' delimiter above keeps $ and backticks from being expanded while this file is written.
IPoIB=(`ip -br a | grep ib | awk -F" " '{print $3}' | awk -F"/" '{print $1}'`)
INTERFACES=(`ip -br a | grep ib | awk -F" " '{print $1}'`)
NETMASK=`ip -br a| grep ib | awk -F" " '{print $3}' | awk -F"/" '{print $2}' | head | uniq`
tblnum=200
for ((index=0; index<${#INTERFACES[*]}; index++))
do
    # Register the routing table once; this script runs every time an interface becomes routable
    if ! grep -q "^$tblnum " /etc/iproute2/rt_tables; then
        echo "$tblnum ${INTERFACES[$index]}" >> /etc/iproute2/rt_tables
    fi
    if [[ `ip rule show table ${INTERFACES[$index]}` ]]
    then
        ip rule show table ${INTERFACES[$index]}
    else
        ip rule add from ${IPoIB[$index]} table ${INTERFACES[$index]}
    fi
    if [[ `ip route show table ${INTERFACES[$index]}` ]]
    then
        ip route show table ${INTERFACES[$index]}
    else
        ip route add `ipcalc ${IPoIB[$index]}/$NETMASK | grep Network | awk -F" " '{print $2}'` dev ${INTERFACES[$index]} src ${IPoIB[$index]} table ${INTERFACES[$index]}
    fi
    tblnum=$((tblnum+1))
done
EOF
5) Make the script executable:
# chmod 755 /etc/networkd-dispatcher/routable.d/10-policy-based-rules.sh
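The Ubuntu script relies on ipcalc to derive the network address, so that tool has to be present in the software image. If it is not, the same value can be computed in plain shell arithmetic. The sketch below is one possible alternative, not part of the original procedure. The function name cidr_network is hypothetical.

```shell
#!/bin/sh
# Print the network address, in CIDR form, for an "address/prefix" argument.
cidr_network() {
    addr=${1%/*}
    prefix=${1#*/}
    old_ifs=$IFS; IFS=.
    set -- $addr          # $1..$4 = address octets
    IFS=$old_ifs
    ip=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    if [ "$prefix" -eq 0 ]; then
        mask=0
    else
        mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    fi
    net=$(( ip & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))/$prefix"
}

# Example value only.
cidr_network 100.127.1.30/16
```

The output can be substituted for the ipcalc pipeline in the ip route add line, for example as $(cidr_network ${IPoIB[$index]}/$NETMASK).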
NOTE: Make sure to modify the INTERFACES list appropriately in the finalize script set for the category. After the finalize script has been set, the nodes should be rebooted.
When they come up, check that the rules and routes have been installed correctly on the nodes:
$ ip rule show table ib0
32764: from 100.127.1.30 lookup ib0
$ ip route show table ib0
100.127.0.0/16 dev ib0 proto kernel scope link src 100.127.1.30
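On nodes with several interfaces, this check can be scripted. The sketch below is a hedged example: the function names rule_present, route_present and check_interface are hypothetical, and the matching assumes ip prints rules and routes in the form "from <ip> lookup <table>" and "... dev <iface> ... src <ip>".

```shell
#!/bin/sh
# Succeeds if the rule listing in $1 sends traffic from address $2 to table $3.
rule_present() {
    printf '%s\n' "$1" | grep -qwF "from $2 lookup $3"
}

# Succeeds if the route listing in $1 uses device $2 with source address $3.
route_present() {
    printf '%s\n' "$1" | grep -wF "dev $2" | grep -qwF "src $3"
}

# Check one interface on the running node (calls the real ip command).
# Arguments: interface name (also the table name), expected source IP.
check_interface() {
    iface=$1 src=$2
    rule_present "$(ip rule show table "$iface")" "$src" "$iface" ||
        { echo "$iface: rule missing"; return 1; }
    route_present "$(ip route show table "$iface")" "$iface" "$src" ||
        { echo "$iface: route missing"; return 1; }
    echo "$iface: OK"
}

# Example, with the address used above:
# check_interface ib0 100.127.1.30
```

Running check_interface once per interface gives a quick pass or fail line for each, which is easier to scan across many nodes than the raw ip output.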