
ID #1027

Why do mails flood in after updating my software image?

Short answer

Most likely because mail has accumulated in the mail spool inside the software image.

 

When, and on which versions, does it happen?

 

This issue occurs in rare cases when installing some packages from non-Bright repositories on older Bright Cluster Manager versions.

 

For Bright Cluster Manager version 5.2 and above, a health check (chrootprocess) alerts the administrator about the problem via cmgui or cmsh.

 

An autofix based on this health check is in release 5.2-32 and higher. This means that this rare problem is even less likely to show up from this release onwards, although it could still occur, since not every situation can be anticipated.

 

For Bright Cluster Manager 5.1 and below, there is no health check detection in place, so the problem can show up without any alert message.

 

How does it happen?

 

The first warning an administrator may get is a flood of mails from the nodes.

 

In such a case, what is most likely happening is that a chrooted crond process is running inside the software image stored on the head node (or *was* running there, if the head node has been rebooted since that process was started). This can happen if the crond RPM was updated inside the image, and the post-installation scriptlet of the RPM then issued a 'service crond restart'.

If the head node has not yet been rebooted after the image update, then two crond processes can be seen running with 'ps auxw | grep crond'. One of them is the regular crond of the head node, and the other is the unwanted crond running inside the image. This can be checked by running:

 ls -ld /proc/$pid/root

where $pid is the PID of either crond. One crond will have its root directory set to /, the other to the software image.
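
The same check can be done for every crond in one pass. The following is a minimal sketch, not something shipped with Bright Cluster Manager: the function name list_roots is hypothetical, and it assumes a kernel that provides /proc/PID/comm (on older kernels, the PIDs can be taken from the ps output instead). It should be run as root so that the root link of each process is readable.

```shell
# Print "PID ROOT" for every process whose command name equals $1.
# $2 is the proc mount point (default /proc); it is a parameter only
# so that the function can be tried against a test directory.
# Assumption: the kernel exposes /proc/PID/comm.
list_roots() {
    name=$1
    proc=${2:-/proc}
    for dir in "$proc"/[0-9]*; do
        # Skip entries that vanished or have no readable comm file.
        [ -r "$dir/comm" ] || continue
        [ "$(cat "$dir/comm" 2>/dev/null)" = "$name" ] || continue
        # The root link shows the directory the process treats as /.
        root=$(readlink "$dir/root" 2>/dev/null) || continue
        printf '%s %s\n' "${dir##*/}" "$root"
    done
}
```

Running 'list_roots crond' on the head node should then show one line per crond. Any line whose second field is a path under the software image directory, rather than /, belongs to the unwanted chrooted crond.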

 

The crond running in the image is the one that has been generating mail inside the software image. There is no mail server running inside the software image on the head node, so that mail is never delivered. The mail spool in the image just keeps growing while the chrooted crond runs. However, when a node reboots, all the mail in the mail spool inside the software image is transferred to the node along with the rest of the software image during typical node provisioning. Since a mail server does run on the nodes, the mail spool is processed there, and mail floods in each time the nodes reboot.

 

The administrator can see the mails in the mail spool of the default software image default-image by running:


  chroot /cm/images/default-image mailq

 

How can it be fixed?

To fix it:

1. The crond that is running in the image should be killed.

 

2. The administrator should make sure that /cm/images/$image/var/spool/postfix/* does not contain any files. A command that does this for the default image default-image is:


  find /cm/images/default-image/var/spool/postfix -type f -exec rm -f {} \;

As a test, it can be verified that no new mail floods in after rebooting nodes.
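
For clusters with several software images, the second step can be wrapped in a small helper. This is only a sketch under stated assumptions: the function name purge_image_spool is hypothetical, and it takes the image root directory (for example /cm/images/default-image) as its only argument. It uses the same find command as above, and additionally reports how many files were removed.

```shell
# Remove all regular files from the postfix spool of a software image
# and report how many were deleted. $1 is the image root directory.
# purge_image_spool is a hypothetical helper name.
purge_image_spool() {
    spool=$1/var/spool/postfix
    if [ ! -d "$spool" ]; then
        echo "no spool directory at $spool" >&2
        return 1
    fi
    # Count first, so that the report reflects what was actually there.
    count=$(find "$spool" -type f | wc -l | tr -d ' ')
    # Same removal command as in step 2; directories are left in place.
    find "$spool" -type f -exec rm -f {} \;
    echo "removed $count file(s) from $spool"
}
```

It could be called as 'purge_image_spool /cm/images/default-image', after the chrooted crond from step 1 has been killed. Otherwise that crond may simply start filling the spool again.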
