When you configure multiple network interfaces on a single machine, each with an IP address in the same IP subnet, you need some additional configuration before the networking stack in the Linux kernel can use these interfaces properly. By default, only one of the IP addresses that you assign within the subnet is usable. This is because the kernel may send the reply to an incoming packet out through a different interface than the one on which the packet arrived.
Setting kernel parameters
The net.ipv4.conf.<interface>.accept_local kernel parameter needs to be set to 1. In addition, a number of other ARP and reverse path filtering kernel parameters should be set appropriately. There are several ways of accomplishing this, but on a Bright cluster the easiest way is to create a file /etc/sysctl.d/99-multi-ip-in-subnet.conf in the relevant software images (e.g. /cm/images/default-image) with contents such as the following:
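# Global settings (net.ipv4.conf.all.* applies to every interface)
# Note: for rp_filter the kernel uses the higher of the "all" value and the
# per-interface value, so while all.rp_filter=1 the per-interface
# rp_filter=0 settings below have no effect.
net.ipv4.conf.all.arp_ignore=0
net.ipv4.conf.all.rp_filter=1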
# Set ARP and reverse path settings for ib0
net.ipv4.conf.ib0.arp_ignore=1
net.ipv4.conf.ib0.arp_filter=0
net.ipv4.conf.ib0.arp_announce=2
net.ipv4.conf.ib0.rp_filter=0
# Set ARP and reverse path settings for ib1
net.ipv4.conf.ib1.arp_ignore=1
net.ipv4.conf.ib1.arp_filter=0
net.ipv4.conf.ib1.arp_announce=2
net.ipv4.conf.ib1.rp_filter=0
# Set accept_local for interfaces
net.ipv4.conf.ib0.accept_local=1
net.ipv4.conf.ib1.accept_local=1
It is important to substitute ib0 and ib1 with the appropriate interface names, and to add a corresponding block for any further interfaces. Alternatively, the Linux kernel accepts all (every interface) and default (interfaces created later) in place of an actual interface name.
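When there are more than two interfaces, the per-interface blocks can be generated instead of typed by hand. The following is a minimal sketch, assuming the interface names are passed as arguments; the function name gen_multi_ip_sysctl is hypothetical. It prints the same per-interface settings as the example file, so the two global net.ipv4.conf.all lines still have to be written once at the top.

```shell
#!/bin/sh
# Hypothetical helper: print the per-interface sysctl settings from the
# example file for every interface name given as an argument.
gen_multi_ip_sysctl () {
    for iface in "$@"; do
        echo "# Set ARP and reverse path settings for $iface"
        echo "net.ipv4.conf.$iface.arp_ignore=1"
        echo "net.ipv4.conf.$iface.arp_filter=0"
        echo "net.ipv4.conf.$iface.arp_announce=2"
        echo "net.ipv4.conf.$iface.rp_filter=0"
        echo "net.ipv4.conf.$iface.accept_local=1"
    done
}
```

For example, gen_multi_ip_sysctl ib0 ib1 ib2 >> /cm/images/default-image/etc/sysctl.d/99-multi-ip-in-subnet.conf would append the blocks for three interfaces after the global lines.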
Setting up routes
For RHEL8:
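A policy routing rule and a separate routing table have to be created for each interface on all nodes, so that traffic from each IP address leaves through the interface that owns that address. This can be done by creating the following files on each node: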
/etc/sysconfig/network-scripts/route-<interface>
/etc/sysconfig/network-scripts/rule-<interface>
/etc/iproute2/rt_tables
In Bright, the easiest way of accomplishing this is to use a finalize script, which executes after the node has finished provisioning but before systemd is started. A finalize script for a category can be set using Bright View or cmsh. In cmsh:
[root@mdv-cluster ~]# cmsh
[mdv-cluster]% category use default
[mdv-cluster->category[default]]% set finalizescript [filename]
...
[mdv-cluster]% commit
For more information about finalize scripts, please consult the Bright Cluster Manager documentation.
The following finalize script can be set for a category or for individual nodes to generate the appropriate content:
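#!/bin/bash
# Interfaces that share one IP subnet; adjust to match the nodes.
INTERFACES="ib0 ib1"

# Count the 1 bits in a dotted-quad netmask (e.g. 255.255.0.0 -> 16).
bits_by_netmask () {
  c=0 x=0$( printf '%o' ${1//./ } )
  while [ $x -gt 0 ]; do
    let c+=$((x%2)) 'x>>=1'
  done
  echo $c ; }

# Routing table numbers 200, 201, ... (253-255 are reserved by the kernel).
tblnum=200
for interface in $INTERFACES; do
  # CMD_INTERFACE_<interface>_* variables are provided to the finalize script by Bright.
  eval netmask=\$CMD_INTERFACE_${interface}_NETMASK
  eval src=\$CMD_INTERFACE_${interface}_IP
  tbl=$interface
  # Network address = IP address AND netmask, octet by octet.
  IFS=. read -r i1 i2 i3 i4 <<< "$src"
  IFS=. read -r m1 m2 m3 m4 <<< "$netmask"
  base=$(printf "%d.%d.%d.%d\n" "$((i1 & m1))" "$((i2 & m2))" "$((i3 & m3))" "$((i4 & m4))")
  bits=$(bits_by_netmask "$netmask")
  net="$base/$bits"
  # The node's local filesystem is mounted under /localdisk while the finalize script runs.
  echo "$net dev $interface src $src table $tbl" >/localdisk/etc/sysconfig/network-scripts/route-$interface
  echo "from $src table $tbl" >/localdisk/etc/sysconfig/network-scripts/rule-$interface
  # Register the table name only once.
  if ! grep -q "^$tblnum[[:space:]]" /localdisk/etc/iproute2/rt_tables; then
    echo "$tblnum $interface" >>/localdisk/etc/iproute2/rt_tables
  fi
  tblnum=$((tblnum+1))
done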
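Before assigning the finalize script to a category, the files it produces for one interface can be previewed on any Linux machine. The following is a minimal sketch, not Bright tooling: the function write_policy_files is hypothetical, its ROOT argument stands in for /localdisk, its IP and netmask arguments stand in for the CMD_INTERFACE_* variables, and it counts netmask bits with a plain shift loop instead of the octal trick used in bits_by_netmask. The address 100.127.1.30/255.255.0.0 is only an example value.

```shell
#!/bin/sh
# Hypothetical dry-run helper that mirrors the finalize script for ONE interface.
# Usage: write_policy_files ROOT IFACE IP NETMASK TABLE_NUMBER
write_policy_files () {
    root=$1 iface=$2 ip=$3 mask=$4 tblnum=$5
    # Split the address and the netmask into octets.
    oldifs=$IFS; IFS=.
    set -- $ip;   i1=$1 i2=$2 i3=$3 i4=$4
    set -- $mask; m1=$1 m2=$2 m3=$3 m4=$4
    IFS=$oldifs
    # Network address = IP AND netmask, octet by octet.
    base="$((i1 & m1)).$((i2 & m2)).$((i3 & m3)).$((i4 & m4))"
    # Prefix length = number of 1 bits in the netmask.
    bits=0
    for m in $m1 $m2 $m3 $m4; do
        while [ "$m" -gt 0 ]; do
            bits=$((bits + (m & 1)))
            m=$((m >> 1))
        done
    done
    dir=$root/etc/sysconfig/network-scripts
    mkdir -p "$dir" "$root/etc/iproute2"
    echo "$base/$bits dev $iface src $ip table $iface" > "$dir/route-$iface"
    echo "from $ip table $iface" > "$dir/rule-$iface"
    touch "$root/etc/iproute2/rt_tables"
    # Add the table entry only if the number is not registered yet.
    if ! grep -q "^$tblnum[[:space:]]" "$root/etc/iproute2/rt_tables"; then
        echo "$tblnum $iface" >> "$root/etc/iproute2/rt_tables"
    fi
}
```

For example, write_policy_files /tmp/dryrun ib0 100.127.1.30 255.255.0.0 200 writes route-ib0, rule-ib0 and an rt_tables entry under /tmp/dryrun, where they can be inspected before any node is rebooted.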
For RHEL9:
1) chroot into the target software image of the compute nodes (substitute the name of your RHEL9 image):
cm-chroot /cm/images/default-image
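2) Enable NetworkManager-dispatcher:
systemctl enable NetworkManager-dispatcher.service
3) Add the following policy routing scripts. They read the rule-<interface> and route-<interface> files that the finalize script in step 4 generates: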
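# Quote 'EOF' so that $(...) and ${...} are written literally instead of being expanded now
cat > /etc/NetworkManager/dispatcher.d/10-add-policy-based-rules.sh <<'EOF'
#!/bin/bash
# Add the policy rule and route for every IPoIB interface, using the
# rule-<interface> and route-<interface> files written by the finalize script.
IPoIB=($(ip -br a | grep '^ib' | awk '{print $3}' | cut -d/ -f1))
INTERFACES=($(ip -br a | grep '^ib' | awk '{print $1}'))
RANGE=${#IPoIB[@]}
for itr in $(seq 0 $((RANGE-1))); do
    if [[ -n $(ip rule show table "${INTERFACES[$itr]}") ]]; then
        # Rule already present: just display it
        ip rule show table "${INTERFACES[$itr]}"
    else
        ip rule add $(cat /etc/sysconfig/network-scripts/rule-"${INTERFACES[$itr]}")
        # Alternative without the generated file:
        #ip rule add from ${IPoIB[$itr]} table ${INTERFACES[$itr]}
        ip route add $(cat /etc/sysconfig/network-scripts/route-"${INTERFACES[$itr]}")
        # Alternative without the generated file (assumes a /16 subnet):
        #ip route add ${IPoIB[$itr]}/16 dev ${INTERFACES[$itr]} src ${IPoIB[$itr]} table ${INTERFACES[$itr]}
    fi
done
EOF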
cat > /etc/NetworkManager/dispatcher.d/20-add-policy-based-routing.sh <<'EOF'
#!/bin/bash
# Add the route for every IPoIB interface whose routing table is still empty.
IPoIB=($(ip -br a | grep '^ib' | awk '{print $3}' | cut -d/ -f1))
INTERFACES=($(ip -br a | grep '^ib' | awk '{print $1}'))
RANGE=${#IPoIB[@]}
for itr in $(seq 0 $((RANGE-1))); do
    if [[ -n $(ip route show table "${INTERFACES[$itr]}") ]]; then
        # Route already present: just display it
        ip route show table "${INTERFACES[$itr]}"
    else
        ip route add $(cat /etc/sysconfig/network-scripts/route-"${INTERFACES[$itr]}")
        # Alternative without the generated file (assumes a /16 subnet):
        #ip route add ${IPoIB[$itr]}/16 dev ${INTERFACES[$itr]} src ${IPoIB[$itr]} table ${INTERFACES[$itr]}
    fi
done
EOF
NetworkManager only runs dispatcher scripts that are executable and owned by root, so make both scripts executable:
chmod 755 /etc/NetworkManager/dispatcher.d/10-add-policy-based-rules.sh /etc/NetworkManager/dispatcher.d/20-add-policy-based-routing.sh
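4) Set the following finalize script for a category or for individual nodes. It generates the route-<interface> and rule-<interface> files and the /etc/iproute2/rt_tables entries that the dispatcher scripts from step 3 read:
#!/bin/bash
# Interfaces that share one IP subnet; adjust to match the nodes.
INTERFACES="ib0 ib1"

# Count the 1 bits in a dotted-quad netmask (e.g. 255.255.0.0 -> 16).
bits_by_netmask () {
  c=0 x=0$( printf '%o' ${1//./ } )
  while [ $x -gt 0 ]; do
    let c+=$((x%2)) 'x>>=1'
  done
  echo $c ; }

# Routing table numbers 200, 201, ... (253-255 are reserved by the kernel).
tblnum=200
for interface in $INTERFACES; do
  # CMD_INTERFACE_<interface>_* variables are provided to the finalize script by Bright.
  eval netmask=\$CMD_INTERFACE_${interface}_NETMASK
  eval src=\$CMD_INTERFACE_${interface}_IP
  tbl=$interface
  # Network address = IP address AND netmask, octet by octet.
  IFS=. read -r i1 i2 i3 i4 <<< "$src"
  IFS=. read -r m1 m2 m3 m4 <<< "$netmask"
  base=$(printf "%d.%d.%d.%d\n" "$((i1 & m1))" "$((i2 & m2))" "$((i3 & m3))" "$((i4 & m4))")
  bits=$(bits_by_netmask "$netmask")
  net="$base/$bits"
  # The node's local filesystem is mounted under /localdisk while the finalize script runs.
  echo "$net dev $interface src $src table $tbl" >/localdisk/etc/sysconfig/network-scripts/route-$interface
  echo "from $src table $tbl" >/localdisk/etc/sysconfig/network-scripts/rule-$interface
  # Register the table name only once.
  if ! grep -q "^$tblnum[[:space:]]" /localdisk/etc/iproute2/rt_tables; then
    echo "$tblnum $interface" >>/localdisk/etc/iproute2/rt_tables
  fi
  tblnum=$((tblnum+1))
done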
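The dispatcher scripts in step 3 rescan all interfaces on every NetworkManager event. NetworkManager passes each dispatcher script the interface name as its first argument and the action (for example up or down) as its second, so a script could return early for events it does not care about. A minimal sketch of such a guard, assuming IPoIB interface names start with ib; the function name handles_event is hypothetical:

```shell
#!/bin/sh
# Hypothetical guard for the top of a NetworkManager dispatcher script.
# $1 = interface name, $2 = action, as passed by NetworkManager.
handles_event () {
    case "$1" in
        ib*) ;;              # only IPoIB interfaces (naming assumption)
        *) return 1 ;;
    esac
    [ "$2" = "up" ]          # only act when the interface comes up
}
```

Inside a dispatcher script it would be used as handles_event "$1" "$2" || exit 0, placed before the loop that adds rules and routes.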
For Ubuntu:
1) chroot into the target software image of the compute nodes:
cm-chroot /cm/images/default-ubuntu2204
2) enable networkd-dispatcher:
systemctl unmask networkd-dispatcher.service
systemctl enable networkd-dispatcher.service
3) enable systemd-networkd:
systemctl unmask systemd-networkd
systemctl enable systemd-networkd
4) create a script that adds the IP rules and routes (it uses ipcalc, which must be installed in the software image):
cat > /etc/networkd-dispatcher/routable.d/10-policy-based-rules.sh << 'EOF'
#!/bin/bash
# Names and IPv4 addresses of the IPoIB interfaces
IPoIB=($(ip -br a | grep '^ib' | awk '{print $3}' | awk -F/ '{print $1}'))
INTERFACES=($(ip -br a | grep '^ib' | awk '{print $1}'))
# All interfaces are in the same subnet, so the first prefix length is used
NETMASK=$(ip -br a | grep '^ib' | awk '{print $3}' | awk -F/ '{print $2}' | head -n1)
tblnum=200
for ((index=0; index<${#INTERFACES[@]}; index++)); do
    iface=${INTERFACES[$index]}
    # Register the routing table only once; the script may run many times
    if ! grep -q "^$tblnum[[:space:]]" /etc/iproute2/rt_tables; then
        echo "$tblnum $iface" >> /etc/iproute2/rt_tables
    fi
    if [[ -n $(ip rule show table "$iface") ]]; then
        ip rule show table "$iface"
    else
        ip rule add from "${IPoIB[$index]}" table "$iface"
    fi
    if [[ -n $(ip route show table "$iface") ]]; then
        ip route show table "$iface"
    else
        ip route add $(ipcalc "${IPoIB[$index]}/$NETMASK" | grep Network | awk '{print $2}') dev "$iface" src "${IPoIB[$index]}" table "$iface"
    fi
    tblnum=$((tblnum+1))
done
EOF
5) make the script executable
chmod 755 /etc/networkd-dispatcher/routable.d/10-policy-based-rules.sh
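The Ubuntu script relies on ipcalc only to turn an address and prefix length into a network address. If ipcalc is not available in the image, the same value can be computed with shell arithmetic. A minimal IPv4-only sketch; the function name cidr_network is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the network of an IPv4 address in CIDR notation,
# e.g. 100.127.1.30/16 -> 100.127.0.0/16.
cidr_network () {
    addr=${1%/*} prefix=${1#*/}
    oldifs=$IFS; IFS=.
    set -- $addr
    IFS=$oldifs
    # Pack the four octets into one 32-bit integer.
    n=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    # Build the netmask from the prefix length.
    if [ "$prefix" -eq 0 ]; then
        mask=0
    else
        mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    fi
    net=$(( n & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))/$prefix"
}
```

Defined inside the dispatcher script, $(cidr_network "${IPoIB[$index]}/$NETMASK") could take the place of the ipcalc pipeline in the ip route add line.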
NOTE: For RHEL8 and RHEL9, make sure to adjust the INTERFACES list in the finalize script that you set for the category. After the finalize script has been set (and, for RHEL9 and Ubuntu, the software image has been updated), the nodes should be rebooted.
When the nodes come up, check that the policy rule and route for each interface are in place, for example for ib0:
$ ip rule show table ib0
32764: from 100.127.1.30 lookup ib0
$ ip route show table ib0
100.127.0.0/16 dev ib0 proto kernel scope link src 100.127.1.30
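The kernel parameters can be checked on a booted node as well. For rp_filter, arp_ignore and arp_announce the kernel applies the higher of the net.ipv4.conf.all value and the per-interface value, so reading only the per-interface setting can be misleading; with the example sysctl file, for instance, all.rp_filter=1 makes rp_filter effectively 1 on ib0. A minimal sketch with the hypothetical function name effective_values; its optional second argument exists only so the function can be tried against a copy of the /proc tree:

```shell
#!/bin/sh
# Hypothetical check: print the effective values of the "max"-combined
# ARP and reverse path parameters for one interface.
# Usage: effective_values IFACE [CONF_DIR]
effective_values () {
    iface=$1
    conf=${2:-/proc/sys/net/ipv4/conf}
    for p in rp_filter arp_ignore arp_announce; do
        a=$(cat "$conf/all/$p")
        i=$(cat "$conf/$iface/$p")
        # The kernel uses the larger of the two values.
        if [ "$a" -gt "$i" ]; then e=$a; else e=$i; fi
        echo "$iface $p=$e (all=$a, $iface=$i)"
    done
}
```

Running effective_values ib0 and effective_values ib1 on a node after reboot shows whether the settings from /etc/sysctl.d/99-multi-ip-in-subnet.conf are actually in effect.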