
What is an MCE failure? Why is it running alongside the disk burn test?

This article is being updated. Please be aware that the content herein, including but not limited to version numbers and minor syntax, may not match the output of the most recent versions of Bright. This notice will be removed when the content has been updated.

MCE stands for Machine Check Exception: a hardware error that the CPU detects and reports to the kernel. MCEs should not be ignored.

If the kernel reports MCEs, it is highly likely that the hardware the kernel is running on is not functioning properly and that the vendor needs to fix something.
Most commonly, MCEs are uncovered during Bright’s burn (stress test) of the cluster.

Why is it running alongside the disk burn test?

Quite often the problem is memory-related, but MCEs can surface under any kind of load. The mce_check burn test therefore monitors the kernel for MCE reports continuously, which is why it runs in parallel with the disk burn test, as well as with almost all other tests. In some cases, stressing the disks also triggers an MCE. The exact MCE errors are logged to a file in the node’s burn spool.
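To illustrate the idea of watching the kernel for MCE reports, here is a minimal Python sketch. It is not the mce_check implementation used by Bright. The function names `find_mce_lines` and `read_kernel_log` are hypothetical, and the search pattern is an assumption based on messages that Linux kernels commonly print, such as "mce: [Hardware Error]: Machine check events logged". The exact wording varies between kernel versions.

```python
import re
import subprocess

# Assumed pattern: strings that Linux kernels commonly print for machine
# checks. Adjust it to match what your kernel version actually logs.
MCE_PATTERN = re.compile(r"\bmce:|machine check|\[Hardware Error\]", re.IGNORECASE)


def find_mce_lines(lines):
    """Return the log lines that look like MCE reports (hypothetical helper)."""
    return [line.rstrip("\n") for line in lines if MCE_PATTERN.search(line)]


def read_kernel_log():
    """Read the kernel ring buffer via dmesg; this may require root."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout.splitlines()
```

Calling `find_mce_lines(read_kernel_log())` on a node returns the matching kernel messages, or an empty list if none are found. A match is only a hint that the hardware reported an error. The file in the node’s burn spool remains the place to look for the exact errors recorded during a burn.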

See the appendix on “Burning Nodes” for more on running burns in general.

Updated on August 17, 2020
