
Why is the kipmi0 process consuming so much CPU time?

This is known to occur with some BMCs (baseboard management controllers), for the reasons described below.

kipmi0 is a kernel helper thread involved in handling IPMI interfaces.
IPMI defines several standard classes of system interface. Some of
these classes, such as KCS (Keyboard Controller Style) and SMIC (System
Management Interface Chip), frequently operate without interrupt
requests (IRQs) to signal changes, so the driver must poll to obtain
command results. The kipmiN kernel helper threads perform this polling,
so it is normal for these threads to consume significant CPU time while
an IPMI operation is in progress. The problem arises when the
interaction between the driver and the hardware/firmware goes wrong: the
driver believes that an operation is still in progress, and the high CPU
load continues until the system is rebooted.

You can find a detailed analysis in the Linux kernel documentation:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/IPMI.txt?id=HEAD

Fortunately, the kipmiN kernel helper threads run at low priority so
that they do not hog the CPU, and this should not cause problems under
normal usage. Because the Linux kernel implements IPMI support in terms
of interface classes rather than individual IPMI chipsets, the kernel
cannot work around this issue effectively; it should be addressed by the
hardware vendor instead.

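For example, the following output shows kipmi0 running at nice value 19
(NI column) after accumulating 9 minutes 16 seconds of CPU time (TIME
column):

[root@node001 ~]# ps -l ax | grep ipmi
1 S     0   339     2  0  99  19 -     0 ipmi_t ?          9:16 [kipmi0]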
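
To reduce the CPU time spent polling, you can cap how long kipmid
busy-waits. Because the ipmi_si driver is built into the RHEL 6 kernel
rather than loaded as a module, its parameters are passed on the kernel
command line: append the following to the end of the kernel line in
/etc/grub.conf: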

ipmi_si.kipmid_max_busy_us=<time_in_microseconds>
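
If you prefer to make this edit from a script, the following is a minimal sketch rather than a supported tool. It assumes the GRUB legacy format used by RHEL 6, where each boot entry has a line beginning with kernel, and it needs GNU sed for in-place editing; the function name append_kernel_arg is hypothetical. It saves a copy of the original file with a .bak suffix and leaves alone any kernel line that already sets the same parameter.

```shell
#!/bin/sh
# Hypothetical helper (not part of any distribution): append a kernel
# parameter to every "kernel" line of a GRUB legacy config such as
# RHEL 6's /etc/grub.conf. Assumes GNU sed (-i) and that the argument
# contains no "/" or "&" characters, which sed would treat specially.
append_kernel_arg() {
  local conf="$1"   # path to grub.conf
  local arg="$2"    # e.g. ipmi_si.kipmid_max_busy_us=500
  local name name_re
  name=${arg%%=*}   # parameter name without its value
  # Turn "." into "[.]" so sed matches the name literally.
  name_re=$(printf '%s\n' "$name" | sed 's/[.]/[.]/g')
  cp -p -- "$conf" "$conf.bak" || return 1   # keep a backup first
  # On kernel lines that do not already set this parameter, append it.
  sed -i '/^[[:space:]]*kernel[[:space:]]/{/[[:space:]]'"$name_re"'=/!s/$/ '"$arg"'/;}' "$conf"
}
```

For example, append_kernel_arg /etc/grub.conf ipmi_si.kipmid_max_busy_us=500 adds the setting to each boot entry; like any kernel command-line change, it takes effect at the next reboot.
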

The kipmid_max_busy_us option sets the maximum amount of time, in
microseconds, that kipmid spins before sleeping for a tick. This value
is a trade-off between IPMI performance and wasted CPU time, and must
be tuned for your environment.
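
After rebooting, you may want to confirm which value the kernel actually received. The sketch below uses hypothetical function names and assumes a POSIX shell. It reads /proc/cmdline and, where the kernel exposes module parameters in sysfs, the driver's own view under /sys/module/ipmi_si/parameters/. If the parameter appears more than once on the command line, the last occurrence is reported, since later settings override earlier ones.

```shell
#!/bin/sh
# Hypothetical helpers: show the kipmid_max_busy_us value requested on the
# kernel command line and, if available, the value reported by ipmi_si.

# Print the value of ipmi_si.kipmid_max_busy_us from a kernel command line
# (default: /proc/cmdline). Prints nothing if the parameter is absent;
# if it appears several times, prints only the last occurrence.
busy_us_from_cmdline() {
  local cmdline
  if [ $# -gt 0 ]; then cmdline=$1; else cmdline=$(cat /proc/cmdline); fi
  # One word per line, then keep only the value of the matching word.
  printf '%s\n' "$cmdline" | tr -s ' \t' '\n\n' \
    | sed -n 's/^ipmi_si[.]kipmid_max_busy_us=//p' | tail -n 1
}

# Report both views; the sysfs file may not exist on every kernel.
show_kipmid_busy() {
  local sysfs=/sys/module/ipmi_si/parameters/kipmid_max_busy_us
  echo "kernel command line: $(busy_us_from_cmdline)"
  if [ -r "$sysfs" ]; then
    echo "ipmi_si driver:      $(cat "$sysfs")"
  else
    echo "ipmi_si driver:      $sysfs not found"
  fi
}
```
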

Unfortunately, there is no “catch-all” value that can be recommended
here. Testing and iterating is the best way to determine the right
value for your environment. As a starting point, try a value of:

500 microseconds = 0.0005 seconds
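
To compare candidate values, it can help to measure how much CPU time the kipmiN threads use over a fixed interval on an otherwise idle node, instead of reading the cumulative TIME column of ps once. The sketch below uses hypothetical function names and assumes a POSIX shell on Linux. It sums utime and stime (fields 14 and 15 of /proc/PID/stat, both in clock ticks) before and after a sleep.

```shell
#!/bin/sh
# Hypothetical helpers: measure CPU time (in clock ticks) used by a process.

# Print utime + stime for a PID, from fields 14 and 15 of /proc/PID/stat.
cpu_ticks() {
  local pid="$1" stat rest
  stat=$(cat "/proc/$pid/stat") || return 1
  # Drop "pid (comm) " so that field 3 (state) becomes $1; the comm field
  # may contain spaces, so strip up to the last ") ".
  rest=${stat##*') '}
  set -- $rest
  # Field N of the stat line is now $((N - 2)): utime is $12, stime is $13.
  echo $(( ${12} + ${13} ))
}

# Print the ticks a PID consumed during an interval (default 10 seconds).
ticks_during() {
  local pid="$1" secs="${2:-10}" before after
  before=$(cpu_ticks "$pid") || return 1
  sleep "$secs"
  after=$(cpu_ticks "$pid") || return 1
  echo $(( after - before ))
}
```

For example, ticks_during "$(pgrep -x kipmi0)" 10 prints the ticks kipmi0 used in 10 seconds; dividing that by 10 times the output of getconf CLK_TCK gives the fraction of one CPU it consumed. Repeating the measurement after each reboot with a different kipmid_max_busy_us gives a like-for-like comparison.
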

For nodes provisioned from a software image, you can set the kernel
parameter for that image in cmsh as follows:

[root@demo ~]# cmsh
[demo]% softwareimage
[demo->softwareimage]% use default-image
[demo->softwareimage[default-image]]% get  kernelparameters
rdblacklist=nouveau
[demo->softwareimage[default-image]]% set kernelparameters "rdblacklist=nouveau ipmi_si.kipmid_max_busy_us=500"
[demo->softwareimage*[default-image*]]% commit
[demo->softwareimage[default-image]]% get kernelparameters
rdblacklist=nouveau ipmi_si.kipmid_max_busy_us=500
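
In the example above, the software image is called “default-image”.
Because set replaces the whole kernelparameters value, the existing
parameters (here rdblacklist=nouveau) must be included along with the
new one. Repeat the procedure for each software image that your nodes
use; the new parameter takes effect the next time the nodes boot.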
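
When updating several images by hand, it is easy to drop an existing parameter or add a second kipmid_max_busy_us setting. The following hypothetical helper (not part of cmsh, and assuming a POSIX shell with awk) takes the value printed by get kernelparameters and a name=value setting, removes any earlier setting with the same name, and appends the new one; you would then paste its output into the set kernelparameters command.

```shell
#!/bin/sh
# Hypothetical helper: compute a new kernel parameter string.
# $1: current value (as printed by "get kernelparameters")
# $2: setting to add or replace, in name=value form
set_kernel_param() {
  printf '%s\n' "$1" | awk -v setting="$2" '
    BEGIN { name = setting; sub(/=.*/, "", name) }   # name without value
    {
      out = ""
      for (i = 1; i <= NF; i++) {
        n = $i; sub(/=.*/, "", n)
        if (n != name) out = out $i " "   # keep unrelated parameters
      }
      print out setting                   # new setting goes last
    }'
}
```

For example, set_kernel_param "rdblacklist=nouveau" ipmi_si.kipmid_max_busy_us=500 prints rdblacklist=nouveau ipmi_si.kipmid_max_busy_us=500, which matches the value used in the cmsh session above.
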

Updated on August 14, 2020