When does a node need to be restarted? Why does a node need to be restarted? Can I ignore it? How do I clear that status?
This article is being updated. Please be aware the content herein, not limited to version numbers and slight syntax changes, may not match the output from the most recent versions of Bright. This notation will be removed when the content has been updated.
Can I ignore it?
Not really, unless you know exactly what you are doing. You can see whether a node needs restarting with the device status command (alias: ds):
In cmsh:
bright60% device status
apc01 .................... [ UP ] health check failed
devhp .................... [ UP ] health check failed
node001 .................. [ UP ] restart-required
node002 .................. [ UP ] health check failed
Or from cmgui -> nodes[node001] -> hostname[state]: restart-required.
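On a large cluster it can help to filter the status output for the flag. The following is a minimal Python sketch, assuming the output of device status has been captured as text and follows the line layout shown above. The function name nodes_needing_restart and the regular expression are illustrative and are not part of Bright.

```python
import re

# Matches lines such as "node001 .................. [ UP ] restart-required".
# The layout is inferred from the sample output above, not from a documented format.
STATUS_LINE = re.compile(
    r"^(?P<host>\S+)\s+\.+\s+\[\s*(?P<state>[^\]]*?)\s*\]\s*(?P<info>.*)$"
)


def nodes_needing_restart(status_output):
    """Return hostnames whose status line mentions restart-required."""
    hosts = []
    for line in status_output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match and "restart-required" in match.group("info"):
            hosts.append(match.group("host"))
    return hosts


sample = """\
apc01 .................... [ UP ] health check failed
devhp .................... [ UP ] health check failed
node001 .................. [ UP ] restart-required
node002 .................. [ UP ] health check failed
"""
print(nodes_needing_restart(sample))  # ['node001']
```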
When does a node need to be restarted?
The restart-required flag is set when a commit on a node changes any of the following settings:
category, image, ip, hostname, diskSetup, pxelabel, initialize script, finalize script, install boot record.
Similar rules apply when a category or an image is committed.
These settings all have fields used by the node-installer.
It is possible to get false positives. For example, adding a newline to a script marks the node as restart-required.
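The rule and the false positive can be modelled with a small Python sketch. This is only an illustration of the behaviour described above, not how CMDaemon detects changes internally. The dictionary keys and the function name restart_triggers are hypothetical labels chosen for the example.

```python
# Field labels follow the list above; they are hypothetical dictionary keys,
# not CMDaemon property names.
RESTART_FIELDS = {
    "category", "image", "ip", "hostname", "disksetup",
    "pxelabel", "initialize", "finalize", "installbootrecord",
}


def restart_triggers(before, after):
    """Return the restart-relevant fields whose values differ between two settings dicts."""
    return sorted(
        field for field in RESTART_FIELDS
        if before.get(field) != after.get(field)
    )


old = {"category": "default", "finalize": "echo done"}
new = {"category": "default", "finalize": "echo done\n"}

# The trailing newline alone counts as a change, so the node would be flagged.
print(restart_triggers(old, new))  # ['finalize']
```

A plain value comparison like this cannot tell a meaningful edit from a cosmetic one, which is why a harmless newline is enough to raise the flag.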
However, many things can differ after such changes, and there is no guarantee that all settings from the new category have been applied until the node is rebooted. The restart-required message warns you that the node may be in an inconsistent state. For example, a node moved from category B to a new category A may still be using the software image that was set for category B.
Why does a node need to be restarted?
The reason for the restart-required flag is often given in parentheses:
bright60% device status
node060 .................. [ UP ] (eth0 changed) restart-required
node061 .................. [ UP ] (category changed) restart-required
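When many nodes are flagged, grouping them by the reported reason gives a quick overview. The following Python sketch assumes the reason appears in parentheses directly before restart-required, as in the output above. The function name restart_reasons is illustrative.

```python
import re
from collections import defaultdict

# Assumes the reason appears as "(...)" directly before "restart-required",
# as in the sample output above.
REASON_LINE = re.compile(
    r"^(\S+)\s+\.+\s+\[[^\]]*\]\s*\(([^)]*)\)\s*restart-required"
)


def restart_reasons(status_output):
    """Map each reported reason to the list of nodes that show it."""
    reasons = defaultdict(list)
    for line in status_output.splitlines():
        match = REASON_LINE.match(line.strip())
        if match:
            host, reason = match.groups()
            reasons[reason].append(host)
    return dict(reasons)


sample = """\
node060 .................. [ UP ] (eth0 changed) restart-required
node061 .................. [ UP ] (category changed) restart-required
"""
print(restart_reasons(sample))
```

Lines without a parenthesised reason are skipped, so the result only covers nodes for which a reason was reported.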
Sometimes the status message gives a clue about the cause:
[bright60->device]% status node001
node001 .................. [ DOWN ] pingable, restart-required, health check failed
In that case you can investigate further, for example by checking the latest health check results:
[bright60->device]% latesthealthdata node001
Health Check                 Severity Value            Age (sec.) Info Message
---------------------------- -------- ---------------- ---------- ----------------------------------------
nanchecker                   10       FAIL             1090
DeviceIsUp                   40       FAIL             10
ssh2node                     0        PASS             1090       Not UP according to CMDaemon
[bright60->device]%
How do I clear that status?
You can clear the restart-required flag without a reboot in cmsh by closing and opening the node:
device open --reset -n node001..node100
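If the command text is generated from a script, the node range can be built from a prefix and two numbers. The following Python sketch only produces the command string in the form shown above and does not run cmsh. The helper names node_range and reset_command and the three-digit padding are assumptions made for the example.

```python
def node_range(prefix, first, last, width=3):
    """Build a range such as node001..node100 in the form used above."""
    if first > last:
        raise ValueError("first must not exceed last")
    return f"{prefix}{first:0{width}d}..{prefix}{last:0{width}d}"


def reset_command(prefix, first, last):
    """Return the cmsh command text that clears the flag for a node range."""
    return f"device open --reset -n {node_range(prefix, first, last)}"


print(reset_command("node", 1, 100))  # device open --reset -n node001..node100
```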