When does a node need to be restarted?

When does a node need to be restarted? Why does a node need to be restarted? Can I ignore it? How do I clear that status?

This article is being updated. Please be aware the content herein, not limited to version numbers and slight syntax changes, may not match the output from the most recent versions of Bright. This notation will be removed when the content has been updated.

Can I ignore it?

Not really, unless you really know what you are doing. You can see if a node needs restarting from the device status command (alias: ds):

In cmsh:

bright60% device status

apc01 .................... [ UP ] health check failed
devhp .................... [ UP ] health check failed
node001 .................. [ UP ] restart-required
node002 .................. [ UP ] health check failed

Or from cmgui -> nodes[node001] -> hostname[state]: restart-required.
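On a large cluster it can help to filter the status listing by script. The following is a minimal Python sketch, assuming the output of device status has been captured as text and follows the line layout shown above. The function name restart_required_nodes and the regular expression are illustrative and not part of Bright.

```python
import re

# Matches lines such as "node001 .................. [ UP ] restart-required".
# Assumption: hostname, a run of dots, a bracketed state, then free-form info.
STATUS_LINE = re.compile(
    r"^(?P<host>\S+)\s+\.+\s+\[\s*(?P<state>[^\]]*?)\s*\]\s*(?P<info>.*)$"
)


def restart_required_nodes(status_text):
    """Return the hostnames whose status line mentions restart-required."""
    nodes = []
    for line in status_text.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match and "restart-required" in match.group("info"):
            nodes.append(match.group("host"))
    return nodes


# Sample text copied from the listing in this article.
sample = """\
apc01 .................... [ UP ] health check failed
devhp .................... [ UP ] health check failed
node001 .................. [ UP ] restart-required
node002 .................. [ UP ] health check failed
"""

print(restart_required_nodes(sample))  # prints ['node001']
```

Lines that do not match the expected layout are skipped rather than reported, so the sketch should be checked against the output of the Bright version in use.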

When does a node need to be restarted?
The restart-required flag is set when a commit on a node changes any of the following:

category/image/ip/hostname/diskSetup/pxelabel/initialize script/finalize script/install boot record.

Similar rules apply when a category or an image is committed.

These settings all have fields used by the node-installer.

False positives are possible. For example, adding a newline to a script marks the node as restart-required.

Many things can differ after such a change, however, and there is no guarantee that all settings from the new category have been applied until the node is rebooted. The restart-required message is there to warn you that the node may be in an inconsistent state. For example, if a node is moved from category B to a new category A, it may still be using the software image that was set for category B.
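The rule described in this section can be illustrated with a short Python sketch. It is not how CMDaemon is implemented. It only assumes that the flag follows from a plain comparison of the listed properties before and after a commit. The dictionary keys reuse the names from the list above, and the function name changed_installer_fields is hypothetical.

```python
# Properties named in this article. The keys are labels for illustration,
# not actual CMDaemon field names.
NODE_INSTALLER_FIELDS = {
    "category", "image", "ip", "hostname", "diskSetup", "pxelabel",
    "initialize script", "finalize script", "install boot record",
}


def changed_installer_fields(old, new):
    """Return the sorted node-installer properties whose values differ."""
    return sorted(
        field for field in NODE_INSTALLER_FIELDS
        if old.get(field) != new.get(field)
    )


before = {"category": "B", "finalize script": "echo done"}

# A trailing newline changes the script text, so a plain comparison
# reports a difference even though the script behaves the same.
with_newline = {**before, "finalize script": "echo done\n"}

# A property outside the list does not count.
with_note = {**before, "notes": "rack 12"}

print(changed_installer_fields(before, with_newline))  # prints ['finalize script']
print(changed_installer_fields(before, with_note))     # prints []
```

A non-empty result corresponds to the node being marked restart-required. The newline case shows why a false positive can arise from a change that has no practical effect.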

Why does a node need to be restarted?

The reason for the flag is often given in parentheses:

bright60% device status

node060 .................. [ UP ] (eth0 changed) restart-required
node061 .................. [ UP ] (category changed) restart-required
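To collect the reasons for many nodes at once, the parenthesized text can be pulled out of the captured status output. This is a minimal Python sketch, assuming the line layout shown above. The function name restart_reasons is illustrative, and a node without a parenthesized reason is mapped to None.

```python
import re

# Assumption: hostname, dots, bracketed state, optional "(reason)", then info.
REASON_LINE = re.compile(
    r"^(?P<host>\S+)\s+\.+\s+\[[^\]]*\]\s*"
    r"(?:\((?P<reason>[^)]*)\))?\s*(?P<info>.*)$"
)


def restart_reasons(status_text):
    """Map each restart-required hostname to its stated reason, or None."""
    reasons = {}
    for line in status_text.splitlines():
        match = REASON_LINE.match(line.strip())
        if match and "restart-required" in match.group("info"):
            reasons[match.group("host")] = match.group("reason")
    return reasons


# Sample text copied from the listing in this article.
sample_with_reasons = """\
node060 .................. [ UP ] (eth0 changed) restart-required
node061 .................. [ UP ] (category changed) restart-required
"""

for host, reason in restart_reasons(sample_with_reasons).items():
    print(host, "->", reason)
```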

Sometimes the info message gives a clue about the reason for a failure:

[bright60->device]% status node001
node001 .................. [ DOWN ] pingable, restart-required, health check failed

In that case you can investigate the reason further, e.g. by checking the health checks with latesthealthdata:

[bright60->device]% latesthealthdata node001
Health Check                 Severity Value            Age (sec.) Info Message
---------------------------- -------- ---------------- ---------- ----------------------------------------
nanchecker                   10       FAIL             1090
DeviceIsUp                   40       FAIL             10
ssh2node                     0        PASS             1090       Not UP according to CMDaemon
[bright60->device]%

How do I clear that status?

You can clear the restart-required flag without a reboot in cmsh by closing and opening the node:

device open --reset -n node001..node100
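If the reset has to be run from a script rather than from an interactive cmsh session, the command line can be assembled as in the following Python sketch. It assumes that cmsh accepts a -c option with semicolon-separated commands. Verify this against the cmsh help of your Bright version before relying on it. The function name reset_command and the character check on the node range are illustrative.

```python
import re
import shlex

# Assumption: node ranges use only letters, digits and a few separators,
# as in "node001..node100". Anything else is rejected.
NODE_RANGE = re.compile(r"^[A-Za-z0-9.,\-\[\]]+$")


def reset_command(node_range):
    """Build an argument list that runs the article's command via cmsh -c."""
    if not NODE_RANGE.match(node_range):
        raise ValueError(f"unexpected characters in node range: {node_range!r}")
    # Enter device mode, then run the open --reset command from this article.
    return ["cmsh", "-c", f"device; open --reset -n {node_range}"]


argv = reset_command("node001..node100")

# Show the command for review instead of executing it.
print(shlex.join(argv))
```

Printing the command first makes it possible to review the node range before anything is changed. The list can then be passed to subprocess.run on the head node.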

Updated on August 24, 2020

Tagged: node restart
