ID #1050

When does a node need to be restarted?

When does a node need to be restarted? Why does a node need to be restarted? Can I ignore it? How do I clear that status?

Can I ignore it?

Not really, unless you know exactly what you are doing. You can check whether a node needs restarting with the device status command (alias: ds):

In cmsh:

bright60% device status

apc01 .................... [   UP   ] health check failed
devhp .................... [   UP   ] health check failed
node001 .................. [   UP   ] restart-required
node002 .................. [   UP   ] health check failed
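If you want to spot flagged nodes from a script, the status output can be filtered for the restart-required flag. The following is a minimal Python sketch. It assumes the output of device status has been captured as text and that every device line follows the `name .... [ STATE ] message` layout shown above. The function name restart_required_nodes is hypothetical and not part of Bright's tooling.

```python
import re

# Hypothetical helper. Assumes each device line from cmsh "device status"
# looks like "<name> .... [ STATE ] <message>", as in the example output.
STATUS_LINE = re.compile(r"^\s*(\S+)\s+\.+\s+\[\s*([^\]]*?)\s*\]\s*(.*)$")


def restart_required_nodes(status_text):
    """Return the names of devices whose status message mentions restart-required."""
    nodes = []
    for line in status_text.splitlines():
        match = STATUS_LINE.match(line)
        if match is None:
            continue  # skip prompts, blank lines and anything unexpected
        name, _state, message = match.groups()
        if "restart-required" in message:
            nodes.append(name)
    return nodes


sample = """\
bright60% device status
apc01 .................... [   UP   ] health check failed
devhp .................... [   UP   ] health check failed
node001 .................. [   UP   ] restart-required
node002 .................. [   UP   ] health check failed
"""

print(restart_required_nodes(sample))  # ['node001']
```

Matching on the message text rather than the state column means a node that is DOWN but still flagged is also reported.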

Alternatively, in cmgui, go to Nodes and select node001; its state is shown as restart-required.

When does a node need to be restarted?
A restart-required flag is set when a commit is done on a node that changes the state of:

category, image, ip, hostname, diskSetup, pxelabel, initialize script, finalize script, or install boot record.

Similar rules apply when a category or a software image is committed.

All of these settings correspond to fields used by the node-installer, which applies them while the node boots.

False positives are possible. For example, adding a newline to a script marks the node as restart-required, even though nothing meaningful has changed.
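The rule above, and the newline false positive, can be illustrated with a small model. This is a hypothetical Python sketch, not CMDaemon's actual logic: the field names are copied from the list above, and it assumes the comparison is a plain equality check on the stored values, which is enough to explain why an extra newline in a script counts as a change.

```python
# Hypothetical model of the restart-required rule. Field names come from the
# list in this entry; the real CMDaemon comparison may differ.
TRIGGER_FIELDS = {
    "category", "image", "ip", "hostname", "diskSetup", "pxelabel",
    "initialize script", "finalize script", "install boot record",
}


def needs_restart(old, new):
    """Return True if any trigger field differs between two settings dicts."""
    return any(old.get(field) != new.get(field) for field in TRIGGER_FIELDS)


before = {"hostname": "node001", "finalize script": "#!/bin/bash\necho done"}
# Only a trailing newline is added to the script.
after_newline = {**before, "finalize script": before["finalize script"] + "\n"}
# A change to a field that is not in the trigger list.
after_other = {**before, "comment": "rack 3"}

print(needs_restart(before, after_newline))  # True: plain text comparison sees the newline
print(needs_restart(before, after_other))    # False
```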

There are, however, potentially many things that can differ after such changes, and there is no guarantee that all settings (for example, those of a new category) have been applied until you reboot the node. The restart-required message is there to warn you that the node may be in an inconsistent state: for example, after moving a node from category B to a new category A, it may still be running the software image that was set for category B.

Why does a node need to be restarted?

The reason a restart is required is often given in parentheses:

bright60% device status

node060 .................. [ UP ] (eth0 changed) restart-required
node061 .................. [ UP ] (category changed) restart-required
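The parenthesized reason can also be pulled out programmatically, for example to group flagged nodes by cause. A minimal Python sketch, assuming the reason appears in parentheses directly before the restart-required flag as in the output above; restart_reasons is a hypothetical name.

```python
import re

# Hypothetical helper. Assumes lines such as
# "node060 .... [ UP ] (eth0 changed) restart-required".
REASON = re.compile(
    r"^\s*(?P<node>\S+)\s+\.+.*\((?P<reason>[^)]*)\)\s*restart-required"
)


def restart_reasons(status_text):
    """Map each restart-required node to the reason given in parentheses."""
    reasons = {}
    for line in status_text.splitlines():
        match = REASON.match(line)
        if match:
            reasons[match.group("node")] = match.group("reason")
    return reasons


sample = """\
bright60% device status
node060 .................. [ UP ] (eth0 changed) restart-required
node061 .................. [ UP ] (category changed) restart-required
"""

print(restart_reasons(sample))  # {'node060': 'eth0 changed', 'node061': 'category changed'}
```

Lines without a parenthesized reason are simply skipped, so nodes flagged without an explanation would need a separate check.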

Sometimes the status message gives a clue to the reason for a failure:

 [bright60->device]% status node001
 node001 .................. [  DOWN  ] pingable, restart-required, health check failed

In that case you can investigate further, for example by checking the latest health check data:

 [bright60->device]% latesthealthdata node001
 Health Check                 Severity Value            Age (sec.) Info Message                            
 ---------------------------- -------- ---------------- ---------- ----------------------------------------
 nanchecker                   10       FAIL             1090                                               
 DeviceIsUp                   40       FAIL             10                                                 
 ssh2node                     0        PASS             1090       Not UP according to CMDaemon            
 [bright60->device]%
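When the table is long, it can help to list only the failing checks. A minimal Python sketch, assuming the latesthealthdata output has been captured as text and that health check names contain no spaces (true for the checks shown here); failing_checks is a hypothetical name.

```python
# Hypothetical helper. Assumes data rows start with: name, severity, value,
# and that check names contain no spaces.
def failing_checks(table_text):
    """Return (check name, severity) for every row whose Value is FAIL."""
    failures = []
    for line in table_text.splitlines():
        fields = line.split()
        # The header, the dashed ruler and the cmsh prompt never have FAIL
        # in the third column, so they are skipped automatically.
        if len(fields) >= 3 and fields[2] == "FAIL":
            failures.append((fields[0], int(fields[1])))
    return failures


table = """\
[bright60->device]% latesthealthdata node001
Health Check                 Severity Value            Age (sec.) Info Message
---------------------------- -------- ---------------- ---------- ------------
nanchecker                   10       FAIL             1090
DeviceIsUp                   40       FAIL             10
ssh2node                     0        PASS             1090       Not UP according to CMDaemon
[bright60->device]%
"""

print(failing_checks(table))  # [('nanchecker', 10), ('DeviceIsUp', 40)]
```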

How do I clear that status?

You can clear the restart-required flag without a reboot by re-opening the node in cmsh with the --reset option:

 device open --reset -n node001..node100
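If the flagged nodes are not contiguous, a node list for -n can be built from their names. This is a hypothetical Python sketch that compresses names such as node001, node002, node003 into node001..node003. It assumes node names are a text prefix followed by zero-padded digits, and that cmsh's -n option accepts a comma-separated mix of ranges and single names. That second assumption is not stated in this entry, so check it against your cmsh version.

```python
import re


def _split(name):
    """Split 'node007' into ('node', 7, 3): prefix, number, digit width."""
    match = re.fullmatch(r"(\D+)(\d+)", name)
    if match is None:
        raise ValueError(f"unexpected node name: {name!r}")
    digits = match.group(2)
    return match.group(1), int(digits), len(digits)


def node_range(names):
    """Compress node names into 'first..last' runs joined by commas (hypothetical helper)."""
    parsed = sorted(_split(name) for name in set(names))
    parts = []
    i = 0
    while i < len(parsed):
        prefix, start, width = parsed[i]
        end = start
        j = i + 1
        # Extend the run while the next name is the same prefix/width, number + 1.
        while j < len(parsed) and parsed[j] == (prefix, end + 1, width):
            end += 1
            j += 1
        first = f"{prefix}{start:0{width}d}"
        if end == start:
            parts.append(first)
        else:
            parts.append(f"{first}..{prefix}{end:0{width}d}")
        i = j
    return ",".join(parts)


print("device open --reset -n " + node_range(["node003", "node001", "node002", "node010"]))
# device open --reset -n node001..node003,node010
```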
