This is known to occur with some BMCs (baseboard management controllers). It can have several causes, including the following:
kipmi0 is a kernel helper thread involved in handling IPMI interfaces. Within IPMI, there are several standard classes of interfaces. Some of these classes, like KCS (Keyboard Controller Style) and SMIC (Server Management Interface Chip), do not use interrupt requests (IRQs) to signal changes, and thus require polling to obtain command results. The kipmiN kernel helper threads perform this polling. Thus, it is normal for these threads to consume significant CPU time while an IPMI operation is in progress. In the problem case, however, a fault in the interaction between the driver and the hardware or firmware leads the driver to believe that an operation is still in progress, so the high CPU load continues until the system is rebooted.
You can find a detailed analysis in the Linux kernel documentation:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/IPMI.txt?id=HEAD
Fortunately, the kipmiN kernel helper threads run at low priority (so as not to hog the CPU), and this should not cause problems under normal usage scenarios. Because the Linux kernel implements IPMI support in terms of the interface classes rather than individual IPMI chipsets, the kernel cannot work around this issue effectively, and it should be addressed by the hardware vendor instead.
[root@node001 ~]# ps -l ax | grep ipmi
1 S 0 339 2 0 99 19 - 0 ipmi_t ? 9:16 [kipmi0]
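As a rough sketch of how the low priority and the CPU share of these threads could be checked, the snippet below filters ps output down to the kipmiN threads. The function name filter_kipmi is hypothetical, and the column selection is only one reasonable choice.

```shell
# Hypothetical helper: keep the header line plus any row whose last
# column is a kipmiN thread name (with or without square brackets).
filter_kipmi() {
    awk 'NR == 1 || $NF ~ /^\[?kipmi[0-9]+\]?$/'
}

# Show PID, nice value, CPU percentage, CPU time, and command name.
ps -eo pid,ni,pcpu,time,comm | filter_kipmi
```

A nice value of 19 in the NI column corresponds to the lowest scheduling priority.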
Since the ipmi_si module is built into the kernel in RHEL6, the following can be appended to the end of the kernel line in /etc/grub.conf:
ipmi_si.kipmid_max_busy_us=<time_in_microseconds>
The kipmid_max_busy_us option sets the maximum amount of time, in microseconds, that kipmid will spin before sleeping for a tick. This value sets a balance between performance and CPU waste and needs to be tuned to your needs.
Unfortunately, there is no “catch-all” value that can be recommended here. Testing and iterating is the best way to determine a suitable value for the environment. For reference, you can start with a value of:
500 microseconds = 0.0005 seconds
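While iterating on a value, it can help to confirm what the running kernel is currently using. The following is a minimal sketch that assumes the parameter is exposed under /sys/module/ipmi_si/parameters/, which may not hold on every kernel build. The function name show_kipmid_max_busy and the PARAM_FILE variable are hypothetical.

```shell
# Assumed sysfs location, derived from the module and parameter names.
PARAM_FILE="${PARAM_FILE:-/sys/module/ipmi_si/parameters/kipmid_max_busy_us}"

# Hypothetical helper: print the current value, or a note if the
# parameter file is not readable on this system.
show_kipmid_max_busy() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "parameter not exposed at $1"
    fi
}

show_kipmid_max_busy "$PARAM_FILE"
```

On kernels where this file is also writable, writing a new value to it as root may allow a candidate setting to be tried without a reboot. That behavior is an assumption and should be verified on the target system.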
For a given software image you can modify the kernel parameters as follows:
[root@demo ~]# cmsh
[demo]% softwareimage
[demo->softwareimage]% use default-image
[demo->softwareimage[default-image]]% get kernelparameters
rdblacklist=nouveau
[demo->softwareimage[default-image]]% set kernelparameters "rdblacklist=nouveau ipmi_si.kipmid_max_busy_us=500"
[demo->softwareimage*[default-image*]]% commit
[demo->softwareimage[default-image]]% get kernelparameters
rdblacklist=nouveau ipmi_si.kipmid_max_busy_us=500
In the example above, the software image is called “default-image”. You might have to repeat the procedure for multiple software images.
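When the procedure is repeated for several images, each image may already carry different kernel parameters. The sketch below builds the new parameter string from whatever `get kernelparameters` returned, so that existing options are preserved and the IPMI option is not added twice. The function name append_kernel_param is hypothetical, and the snippet does not call cmsh itself.

```shell
# Hypothetical helper: append a kernel option to an existing parameter
# string unless an option with the same name is already present.
append_kernel_param() {
    current="$1"
    new="$2"
    name="${new%%=*}"          # option name without its value
    for word in $current; do
        if [ "${word%%=*}" = "$name" ]; then
            printf '%s\n' "$current"   # already set: leave unchanged
            return 0
        fi
    done
    if [ -n "$current" ]; then
        printf '%s %s\n' "$current" "$new"
    else
        printf '%s\n' "$new"
    fi
}

# Example with the value shown for default-image above.
append_kernel_param "rdblacklist=nouveau" "ipmi_si.kipmid_max_busy_us=500"
```

The printed string can then be passed, in quotes, to `set kernelparameters` for the corresponding image.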