How do I drain a node or node group with cmsh from the command line?

Purpose

Setting a node to the CLOSED state is typically done to take an unhealthy node out of management by the cluster management daemon (CMDaemon). The node can still be UP, in which case its status is shown as UP/CLOSED.

However, the node can continue running workload jobs in this state, since workload managers run independently of CMDaemon.

If the workload manager is still running, it continues to handle jobs on the node, even though CMDaemon is no longer aware of the node's state until the node is reopened. For this reason, a node is usually drained before it is closed.

Other common purposes for draining include:

Planned maintenance
Hardware troubleshooting
Preventing new jobs during system changes
Isolating problematic nodes

Steps

Enter device mode in cmsh.
```
# cmsh
% device
```
Select the node that you want to drain with the use command, then run drain:
```
% use <node>
% drain
```
Alternatively, rather than selecting an individual node, you can drain a list of nodes with -n (or all nodes in a node group with -g <group>):
```
% drain -n <nodes>
```
You can also drain all nodes in a category:
```
% drain -c <category>
```
To drain all nodes in a configuration overlay, use -e:
```
% drain -e <overlay>
```
When the work on the node or nodes is complete, undrain them to return them to service:
```
% undrain
```

This command uses the same options as the drain command:
```
% undrain -n <nodes>
% undrain -c <category>
% undrain -e <overlay>
```
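Because cmsh also accepts commands non-interactively with -c (as in the help command shown under Additional Details), the same drain and undrain steps can be run directly from a shell prompt or a maintenance script. The following is a minimal sketch, assuming it runs as root on the active head node with cmsh in PATH. The function names drain_nodes and undrain_nodes and the CMSH override variable are hypothetical helpers introduced here, not part of cmsh.

```shell
#!/bin/sh
# Minimal sketch: drain or undrain nodes without entering the interactive cmsh shell.
# Assumption: run as root on the active head node, with cmsh in PATH.
# CMSH is a hypothetical override for the cmsh command (defaults to "cmsh").
CMSH="${CMSH:-cmsh}"

# drain_nodes NODES: drain a comma-separated node list, e.g. node001,node002
drain_nodes() {
    "$CMSH" -c "device; drain -n $1"
}

# undrain_nodes NODES: undrain a comma-separated node list after maintenance
undrain_nodes() {
    "$CMSH" -c "device; undrain -n $1"
}
```

For example, drain_nodes node001,node002 runs cmsh -c "device; drain -n node001,node002". The -c <category> and -e <overlay> variants can be wrapped the same way.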

Additional Details

You can see a complete list of available options for draining nodes by running the following command on the active head node:
```
# cmsh -c "device help drain"
```

Excerpt of the output:

```
Name: drain - Drain jobs (not data) on a set of nodes

Options:
    -n, --nodes <node>

    -g, --group <group>
    Include all nodes that belong to the node group, e.g. testnodes or
    test01,test03

    -c, --category <category>
    Include all nodes that belong to the category, e.g. default or default,gpu

Examples:
    drain                          Drain the current node
    drain node001                  Drain node001
    drain -r rack01                Drain all nodes in rack01
    drain --setactions reboot      Drain the current node, and append reboot when all jobs are completed
    drain --appendactions reboot   Append reboot to existing drain actions for the current node
```
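The --setactions option from the examples above can also be used from the command line, for instance to reboot the nodes of a category once their jobs have completed. This is a minimal sketch, assuming that --setactions can be combined with -c in the same way as in the drain --setactions reboot example for the current node; check device help drain on your version before relying on it. The function name drain_category_then and the CMSH override variable are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: drain every node in a category and set an action
# (for example reboot) to run when all jobs on the node are completed.
# Assumptions: run as root on the active head node; --setactions is
# accepted together with -c. CMSH is a hypothetical override for cmsh.
CMSH="${CMSH:-cmsh}"

# drain_category_then CATEGORY ACTION
drain_category_then() {
    "$CMSH" -c "device; drain -c $1 --setactions $2"
}
```

For example, drain_category_then default reboot runs cmsh -c "device; drain -c default --setactions reboot".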

Updated on September 5, 2025
