Linux kernel ABI compatibility appears to have been broken in newer Linux kernel releases.
This appears to affect the kernels in SLES 12 SP5, RHEL/CentOS 7.7 onwards, and RHEL/CentOS 8.1 onwards.
Other versions, such as SLES 15, may be affected as well.
The issue shows up as follows when attempting to start the openibd service:
# service openibd start
Module mlx4_core belong to kernel which is not a part of ML[FAILED] skipping…
Module mlx4_ib belong to kernel which is not a part of MLNX[FAILED] skipping…
Module mlx4_core belong to kernel which is not a part of ML[FAILED] skipping…
Module mlx4_en belong to kernel which is not a part of MLNX[FAILED] skipping…
Module mlx5_core belong to kernel which is not a part of ML[FAILED] skipping…
Module mlx5_ib belong to kernel which is not a part of MLNX[FAILED] skipping…
Module mlx5_fpga_tools does not exist, skipping… [FAILED]
Module ib_umad belong to kernel which is not a part of MLNX[FAILED] skipping…
Module ib_uverbs belong to kernel which is not a part of ML[FAILED] skipping…
Module ib_ipoib belong to kernel which is not a part of MLN[FAILED]skipping…
Loading HCA driver and Access Layer: [ OK ]
Module rdma_cm belong to kernel which is not a part of MLNX[FAILED]skipping…
Module ib_ucm does not exist, skipping… [FAILED]
Module rdma_ucm belong to kernel which is not a part of MLN[FAILED]skipping…
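To see which copy of each module the running kernel would pick up, the module file paths can be inspected. The following is a minimal sketch, not part of the openibd script: the helper names module_origin and report_modules are hypothetical, and the classification assumes the usual layout in which in-kernel modules live under a kernel/ directory and weak-linked ones under weak-updates/.

```shell
#!/bin/sh
# module_origin PATH: classify a module file by the directory it lives in.
# Assumption: standard /lib/modules/<version>/ layout.
module_origin() {
    case "$1" in
        */weak-updates/*) echo "weak-updates" ;;
        */kernel/*)       echo "in-kernel" ;;
        *)                echo "out-of-tree" ;;
    esac
}

# report_modules NAME...: show where modinfo resolves each module
# for the running kernel (modinfo -n prints the module file name).
report_modules() {
    for m in "$@"; do
        path=$(modinfo -n "$m" 2>/dev/null) || path=""
        if [ -n "$path" ]; then
            printf '%s\t%s\t%s\n' "$m" "$(module_origin "$path")" "$path"
        else
            printf '%s\tnot found\n' "$m"
        fi
    done
}
```

For instance, `report_modules mlx5_core mlx5_ib ib_umad` lists one line per module. Modules reported as in-kernel are presumably the ones openibd declines to load in the output above.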
There are two potential solutions.
Option 1:
To work around this issue, the upstream Mellanox installer provides an "--add-kernel-support" flag. Unfortunately, the Bright-packaged version of the Mellanox OFED does not provide this functionality, as it has the potential to break MPI and workload manager compatibility.
As a workaround for the Bright packages, perform the following:
1. In /etc/init.d/openibd, on line 132, change FORCE=0 to FORCE=1. This causes openibd to ignore the kernel difference, but it then relies on weak-updates.
2. Edit /etc/infiniband/openib.conf and set UCM_LOAD=no and MLX5_FPGA_LOAD=no. As most customers aren't using legacy cards or FPGAs, this should not be an issue.
3. Restart the openibd service.
Once complete, the Mellanox OFED modules should load as expected.
# service openibd start
Loading HCA driver and Access Layer: [ OK ]
The above changes would also need to be applied to your software images.
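The file edits in steps 1 and 2 can be sketched as a small shell helper, which may be convenient when the same change has to be repeated for several software images. This is an illustration under stated assumptions rather than a supported tool: the function names and the optional root-prefix argument are hypothetical, GNU sed is assumed, and FORCE=0 is matched by content instead of by line number, since line 132 may differ between OFED versions.

```shell
#!/bin/sh
# Sketch only: applies steps 1 and 2 to the files under an optional root prefix.
# The prefix argument is an assumption of this example, meant for pointing the
# helper at a software image directory instead of the live system.

# set_key FILE KEY VALUE: rewrite an existing "KEY=..." line, or append one.
set_key() {
    if grep -q "^$2=" "$1"; then
        sed -i "s/^$2=.*/$2=$3/" "$1"
    else
        printf '%s=%s\n' "$2" "$3" >> "$1"
    fi
}

# apply_workaround [ROOT]: ROOT defaults to the empty string (live system).
apply_workaround() {
    root="${1:-}"
    init="$root/etc/init.d/openibd"
    conf="$root/etc/infiniband/openib.conf"

    # Refuse to continue if either file is missing.
    [ -f "$init" ] && [ -f "$conf" ] || return 1

    # Step 1: change FORCE=0 to FORCE=1, keeping any leading indentation.
    sed -i 's/^\([[:space:]]*\)FORCE=0/\1FORCE=1/' "$init"

    # Step 2: set UCM_LOAD=no and MLX5_FPGA_LOAD=no.
    set_key "$conf" UCM_LOAD no
    set_key "$conf" MLX5_FPGA_LOAD no
}
```

Called with no argument as root, `apply_workaround` edits the live system, after which `service openibd restart` performs step 3. Called with an image directory as its argument (for example /cm/images/default-image, assuming the image is stored there), it edits the copies inside that image.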
Option 2:
The alternative to the above steps is to use the upstream Mellanox installer with the --add-kernel-support flag.