Purpose
By default, a Bright cluster provides a multi-user environment rather than a multi-tenant environment. In the context of a cluster, the difference between multi-user and multi-tenant systems lies in how much visibility users have into the rest of the environment.
Multi-User System
In a multi-user system, all users log in to the same login nodes and submit their jobs to the same workload management (WLM) system. Users may have access to directories containing shared data.
In a multi-user system, all computational resources are typically shared through a single WLM system instance.
Multi-user systems tend to work well when all users belong to the same organization, and there is no need to isolate groups of people within the organization from each other.
Multi-Tenant System
In a multi-tenant system, users may come from different organizations or groups within an organization that should be isolated from each other. Users may even come from organizations that are direct competitors.
In such a scenario, users who belong to one tenant must have no visibility into what users from another tenant are doing on the system. A multi-tenant system is also useful when certain types of workloads (e.g., classified versus non-classified workloads) must be kept strictly isolated.
In a multi-tenant system, computational resources are typically partitioned, and each partition is dedicated to one particular tenant. Users typically belong to a single tenant (although it is possible that they may belong to multiple tenants). Each partition of the cluster typically runs its own WLM system instance, so users have no visibility into jobs that are being run by other tenants.
Multi-Tenant Environment in BCM
BCM can be used to build a multi-tenant user environment where a single administrator or group of administrators manages the entire cluster, but where each user only has visibility into what happens within the partition of the cluster that belongs to their tenant. Administrators can scale individual partitions up or down by assigning or removing computational resources.
If more isolation is required, BCM also provides Cluster-on-Demand features that allow groups of users (e.g., tenants) to have their own, isolated cluster. Such a cluster can be hosted inside AWS, Azure, OpenStack, or VMware (as of Bright version 9.1). For more information, please consult the latest Cloudbursting Manual from the documentation page. The rest of this article will focus on creating a multi-tenant user environment within a single Bright cluster.
Establishing a Multi-Tenant Environment in BCM
BCM can run multiple WLM system instances within the same cluster. The cm-wlm-setup utility is used to create a new WLM system instance.
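For example, running the utility on the head node without arguments starts an interactive wizard that prompts for the WLM type, an instance name, and the server, client, and submit nodes (the exact prompts vary between BCM versions, and the non-interactive options can be listed with --help):
# cm-wlm-setup           # interactive setup wizard
# cm-wlm-setup --help    # list non-interactive options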
WLM Instance Management
Each WLM instance has a unique name, and several configuration overlays are created for each WLM system instance. A configuration overlay is a construct in BCM that binds roles to individual nodes or categories of nodes. A role assigns a particular function to a node (e.g., provisioning, monitoring, WLM system client, or WLM system server). For example:
# cmsh
% device list -f hostname,category,status
hostname (key)       category             status
-------------------- -------------------- --------------------
fire                 login                [ UP ]
mdv-bigcluster                            [ UP ]
node001              default              [ UP ]
node002              default              [ UP ]
node003              default              [ UP ]
node004              default              [ UP ]
node005              default              [ UP ]
water                login                [ UP ]
% configurationoverlay
% list
Name (key)          Priority   All head nodes Nodes                  Categories       Roles
------------------- ---------- -------------- ---------------------- ---------------- ----------------
slurm-accounting    500        yes                                                    slurmaccounting
slurm-fire-client   500        no             node003..node005                        slurmclient
slurm-fire-server   500        no             fire                                    slurmserver
slurm-fire-submit   500        no             fire,node003..node005                   slurmsubmit
slurm-water-client  500        no             node001,node002                         slurmclient
slurm-water-server  500        no             water                                   slurmserver
slurm-water-submit  500        no             water,node001,node002                   slurmsubmit
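The settings of an individual overlay (priority, nodes, categories, and assigned roles) can be inspected with the use and show commands in the same configurationoverlay mode, for example:
% use slurm-fire-client
% show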
In a multi-tenant system, it is advisable to create a dedicated login node for each partition, rather than having all users share the same set of login nodes (as in a multi-user system) or log in to the head node.
As seen above, we have defined two partitions: water and fire. Each partition has its own login node (the nodes named water and fire, respectively).
WLM Instance Member Management
When creating user accounts, it is a good idea to make each user a member of a tenant-specific group.
# cmsh
% user list
Name (key)       ID (key)         Primary group    Secondary groups
---------------- ---------------- ---------------- ----------------
alice            1001             alice            water
bob              1002             bob              water
charlie          1003             charlie          fire
donna            1004             donna            fire
ernie            1005             ernie            fire
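As a sketch, a new user (here a hypothetical user frank for the water tenant) could be created and added to the tenant group from the head node as follows; the group property name (members) is an assumption and can be verified with the show command in cmsh's group mode:
# cmsh -c "user; add frank; commit"
# cmsh -c "group; use water; append members frank; commit"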
This membership can then be used to restrict access to tenant-specific login nodes.
For example, to limit access to the water and fire login nodes:
# cd /cm/images/default-image
# mkdir -p cm/conf/node/{water,fire}/etc/security
# mkdir -p cm/conf/node/{water,fire}/etc/pam.d
# cp etc/pam.d/{system-auth,password-auth} cm/conf/node/water/etc/pam.d
# cp etc/pam.d/{system-auth,password-auth} cm/conf/node/fire/etc/pam.d
# cp etc/security/access.conf cm/conf/node/fire/etc/security
# cp etc/security/access.conf cm/conf/node/water/etc/security
# cd cm/conf/node
# echo +:water:ALL >> water/etc/security/access.conf
# echo +:root:ALL >> water/etc/security/access.conf
# echo -:ALL:ALL >> water/etc/security/access.conf
# echo +:fire:ALL >> fire/etc/security/access.conf
# echo +:root:ALL >> fire/etc/security/access.conf
# echo -:ALL:ALL >> fire/etc/security/access.conf
# echo account required pam_access.so >> fire/etc/pam.d/system-auth
# echo account required pam_access.so >> water/etc/pam.d/system-auth
# echo account required pam_access.so >> fire/etc/pam.d/password-auth
# echo account required pam_access.so >> water/etc/pam.d/password-auth
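After the login nodes have been rebooted (or otherwise re-provisioned) so that they pick up the node-specific files placed in the image above, the restriction can be sanity-checked from the head node; charlie belongs to the fire tenant and alice to the water tenant:
# ssh charlie@water hostname   # expected to be denied by pam_access (charlie is not in group water)
# ssh alice@water hostname     # expected to succeed (alice is a member of group water)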
Then, to prevent users from logging in directly to compute nodes, we can use the usernodelogin setting in cmsh. This setting restricts direct user logins from outside the WLM:
# cmsh
% category use default
% set usernodelogin never
% commit
To prevent users who are not in the admin group from logging in to the head node, we can apply a similar configuration on the head node itself:
# echo account required pam_access.so >> /etc/pam.d/system-auth
# echo account required pam_access.so >> /etc/pam.d/password-auth
# echo +:admin:ALL >> /etc/security/access.conf
# echo +:root:ALL >> /etc/security/access.conf
# echo -:ALL:ALL >> /etc/security/access.conf
We have created a setup in which ordinary users cannot log into the head node or compute nodes. Instead, they must log in to the login node for their cluster partition.
WLM Instance Job Execution
From this login node, users may submit jobs that will be executed on nodes assigned to the cluster partition belonging to the user’s tenant.
Example session for user alice:
[alice@water ~]$ module load slurm
[alice@water ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
defq* up infinite 2 idle node[001-002]
[alice@water ~]$ srun hostname
node001
Example session for user ernie:
[ernie@fire ~]$ module load slurm
[ernie@fire ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
defq* up infinite 3 idle node[003-005]
[ernie@fire ~]$ srun hostname
node003
WLM Instance Node Management
Nodes can be allocated to or removed from a partition by adding them to or removing them from the relevant configuration overlays. To move nodes from one partition to another, the movenodes command in cmsh's configurationoverlay mode is useful:
# cmsh
% configurationoverlay
% list | grep client
slurm-fire-client   500        no             node003..node005                        slurmclient
slurm-water-client  500        no             node001,node002                         slurmclient
% movenodes slurm-fire-client slurm-water-client -n node003..node004
% movenodes slurm-fire-submit slurm-water-submit -n node003..node004
*% list | grep client
slurm-fire-client   500        no             node005                                 slurmclient
slurm-water-client  500        no             node001..node004                        slurmclient
*% commit
This now gives us the following setup:
# ssh root@water "module load slurm; sinfo"
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
defq* up infinite 4 idle node[001-004]
# ssh root@fire "module load slurm; sinfo"
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
defq* up infinite 1 idle node005
NOTE: It is a good idea to drain nodes before moving them from one cluster partition to another. Draining allows running jobs to finish and prevents new jobs from being scheduled on the node, so no jobs are killed by the move.
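For example, node003 and node004 could be drained through the fire partition's Slurm instance before being moved, and moved only once they show no running jobs (cmsh also provides drain and undrain commands in device mode):
# ssh root@fire "module load slurm; scontrol update NodeName=node[003-004] State=DRAIN Reason='moving to water'"
# ssh root@fire "module load slurm; sinfo -R"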
Depending on the level of isolation needed, it may be desirable to place nodes assigned to a particular partition into a different category. This would also allow nodes to mount different external storage depending on which partition they belong to. The downside of this approach is that it will require nodes to be rebooted after being moved to a different cluster partition.
When moving nodes between partitions, it may be a good idea to re-image the node from scratch to ensure there are no leftovers on the file system anywhere (e.g., in /scratch, /tmp, or /data). This can be done by setting the nextinstallmode property of the node to FULL and then rebooting the node.
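For example, assuming node003 is being moved into a hypothetical per-tenant category named water-compute, the category change and the full re-install can be combined in a single cmsh session:
# cmsh
% device use node003
% set category water-compute
% set nextinstallmode FULL
% commit
% reboot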