Integrating NVIDIA’s Base Command Manager (BCM) with Microsoft Active Directory (AD) allows organizations to centralize user authentication and simplify access management across GPU infrastructure. In this guide, we’ll walk through how to achieve this integration using System Security Services Daemon (SSSD), a powerful open-source solution for managing authentication and identity information.
This guide applies to: BCM 10.x on Ubuntu 22.04
Prerequisites
- A functional AD domain.
- The BCM head node has been configured to use the domain controller as its primary DNS resolver.
- The system time on BCM nodes and the AD domain controller is synchronized.
- Credentials for an AD user with permission to join machines in the domain.
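The time synchronization prerequisite matters because Kerberos rejects authentication when clocks drift too far apart; the usual default tolerance is five minutes. A minimal sketch of such a check follows. The helper name clock_skew_ok and the sample timestamps are assumptions for illustration and are not part of BCM or SSSD.

```shell
# Hypothetical helper (not part of BCM or SSSD): succeed when two Unix
# timestamps differ by no more than a limit in seconds. The default of
# 300 mirrors the usual Kerberos clock skew tolerance of five minutes.
clock_skew_ok() {
    local_ts=$1
    remote_ts=$2
    max_skew=${3:-300}
    diff=$((local_ts - remote_ts))
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    [ "$diff" -le "$max_skew" ]
}

# Example with made-up timestamps that are 120 seconds apart.
if clock_skew_ok 1700000000 1700000120; then
    echo "clock skew within tolerance"
else
    echo "clock skew too large"
fi
```

On the head node, date +%s gives the local timestamp, and timedatectl reports whether the system clock is synchronized. How the domain controller's time is obtained is left open here.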
Step 1: Install Required Packages
BCM Head node
# apt install -y sssd-ad sssd-tools realmd adcli krb5-user
Step 2: Check DNS Resolution
# dig srv _ldap._tcp.dc._msdcs.your-domain.com
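The query above asks DNS for the SRV records that advertise the domain controllers. As a rough sketch, the answers can be reduced to a host:port list for a quick reachability review. The helper names ad_srv_name and srv_to_hostport, and the sample answer line, are assumptions for illustration; only dig is a real tool here.

```shell
# Hypothetical helpers for illustration; only dig itself is a real tool.

# Build the SRV record name used to locate AD domain controllers.
ad_srv_name() {
    printf '_ldap._tcp.dc._msdcs.%s\n' "$1"
}

# Read `dig +short` SRV answers on stdin (priority weight port target)
# and print each target as host:port, dropping the trailing dot.
srv_to_hostport() {
    awk 'NF == 4 { sub(/\.$/, "", $4); print $4 ":" $3 }'
}

# On the head node this could be run as:
#   dig +short srv "$(ad_srv_name your-domain.com)" | srv_to_hostport
# Here a made-up answer line stands in for real dig output.
printf '0 100 389 dc01.your-domain.com.\n' | srv_to_hostport
```

An empty result suggests that the head node is not using the domain controller as its DNS resolver, which is one of the prerequisites listed earlier.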
Step 3: Joining the Domain
Verify Domain Discovery
Before attempting to join the AD domain, it is essential to verify that the BCM Head Node system can correctly discover the domain and its associated services.
NOTE: Replace your-domain.com with your actual AD domain name in the command below:
# realm -v discover your-domain.com
If the configuration is correct, you should see a successful discovery output indicating that the domain was found and is reachable.
Join the Domain
You will be prompted for the password of an AD user with join permissions.
NOTE: By default, realm will use the domain’s “Administrator” account to request to join. If you need to use another account, pass it to the tool with the -U option.
# realm join your-domain.com
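To join with an account other than Administrator, the -U option takes the account name. The sketch below only prints the command so that it can be reviewed before being run as root. The helper name realm_join_cmd and the account name svc-join are hypothetical.

```shell
# Hypothetical helper: print the realm join command for a domain and an
# optional joining account, without executing anything.
realm_join_cmd() {
    domain=$1
    user=${2:-}
    if [ -n "$user" ]; then
        printf 'realm join -U %s %s\n' "$user" "$domain"
    else
        printf 'realm join %s\n' "$domain"
    fi
}

# svc-join is a made-up account name used only for this example.
realm_join_cmd your-domain.com svc-join
```

After the join completes, realm list shows the domain the machine is enrolled in.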
Step 4: Verification Post-Domain Join
After successfully joining the BCM Head Node machine to the domain, a computer account object is automatically created within the AD domain.
To verify that the computer account was created:
- Open Active Directory Users and Computers on a domain controller.
- Navigate to the Computers container (or the specific OU if one was defined during the join).
- Look for the hostname of the BCM Head Node machine in the list. The name should match the system’s hostname.
Step 5: Verify SSSD Configuration
After joining the domain, the realm tool automatically:
- Creates the /etc/sssd/sssd.conf configuration file
- Adds the appropriate PAM and NSS modules
- Configures the /etc/krb5.conf configuration file
- Starts the required services
Example /etc/sssd/sssd.conf:
[sssd]
domains = your-domain.com
config_file_version = 2
services = nss, pam
default_domain_suffix = YOUR-DOMAIN.COM

[domain/your-domain.com]
ad_domain = your-domain.com
krb5_realm = YOUR-DOMAIN.COM
realmd_tags = manages-system joined-with-adcli
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
default_shell = /bin/bash
ldap_id_mapping = True
use_fully_qualified_names = True
fallback_homedir = /home/%u
access_provider = simple
simple_allow_groups = your-ad-group1, your-ad-group2
NOTE: The sssd.conf file must have permissions set to 0600 and be owned by root:root. If not, SSSD will fail to start.
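The permission requirement can be checked before restarting the service. The following is a small sketch, assuming GNU stat as shipped with Ubuntu; the helper name conf_perms_ok is hypothetical.

```shell
# Hypothetical helper: succeed when a file has mode 600 and the expected
# numeric owner. The default owner 0:0 is root:root. Relies on GNU stat.
conf_perms_ok() {
    file=$1
    owner=${2:-0:0}
    [ "$(stat -c '%a %u:%g' "$file" 2>/dev/null)" = "600 $owner" ]
}

# Example on the head node (run as root):
#   conf_perms_ok /etc/sssd/sssd.conf || {
#       chown root:root /etc/sssd/sssd.conf
#       chmod 0600 /etc/sssd/sssd.conf
#   }
```

The sssctl config-check command from the sssd-tools package installed in Step 1 can additionally validate the file's contents.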
Tips and best practices shown in the example file above:
- ad_domain and domains values should be lowercase (e.g., your-domain.com)
- krb5_realm and default_domain_suffix values should be in UPPERCASE (e.g., YOUR-DOMAIN.COM)
- Case mismatches can lead to Kerberos or DNS resolution errors!
- cache_credentials: When set to True, this directive allows users who have logged in before to log in when the AD server is unreachable.
- fallback_homedir: This directive sets the home directory used when AD does not define one for the user. By default, realm sets it to /home/<user>@<domain>. For example, the AD user john will have a home directory of /home/john@your-domain.com. The example file above uses /home/%u instead, which gives /home/john.
- simple_allow_groups: This directive tells SSSD to allow only members of the specified AD groups to log in. It belongs to the simple access provider, so it takes effect only when access_provider = simple. Group names must match precisely as defined in AD and are case-sensitive. If a group name contains spaces, use backslashes to escape them (e.g., Domain\ Admins).
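The case conventions in the tips above can be checked mechanically. The following is a minimal sketch that reads the two values from an sssd.conf-style file and compares their case. The helper names conf_value and realm_case_ok are hypothetical, and the parsing is deliberately simple: it takes the first matching key and ignores section boundaries.

```shell
# Hypothetical helper: print the value of a "key = value" option from an
# sssd.conf-style file (first match, surrounding whitespace trimmed).
conf_value() {
    awk -F= -v key="$2" '
        { k = $1; gsub(/[ \t]/, "", k) }
        k == key {
            v = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            print v
            exit
        }' "$1"
}

# Succeed when ad_domain is all lowercase and krb5_realm is all uppercase.
realm_case_ok() {
    domain=$(conf_value "$1" ad_domain)
    realm=$(conf_value "$1" krb5_realm)
    [ -n "$domain" ] && [ -n "$realm" ] &&
        [ "$domain" = "$(printf '%s' "$domain" | tr '[:upper:]' '[:lower:]')" ] &&
        [ "$realm" = "$(printf '%s' "$realm" | tr '[:lower:]' '[:upper:]')" ]
}

# Example on the head node (run as root):
#   realm_case_ok /etc/sssd/sssd.conf || echo "check ad_domain / krb5_realm case"
```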
Once you’ve updated your SSSD configuration, apply the changes by restarting the service:
# systemctl restart sssd
Step 6: Testing Active Directory Authentication
Step 1: Create a test user in Active Directory
- Create a test user account in your Active Directory server (e.g., testuser) within the appropriate Organizational Unit (OU).
- Add this user to a group that is explicitly allowed by your sssd.conf configuration. For example, if using the simple_allow_groups directive, ensure the user is added to the group specified (e.g., cluster-users).
Step 2: Verify User Retrieval from AD
On your BCM node, use the getent command to query user information from AD:
# getent passwd your-ad-username@your-domain.com
If properly configured, the command will return output similar to:
testuser@your-domain.com:*:181401106:181400513:testuser:/home/testuser:/bin/bash
This confirms that SSSD is successfully retrieving user information from AD.
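The returned line uses the standard passwd format, so its fields can be pulled apart to confirm the UID, home directory, and shell that SSSD assigned. The following is a small sketch; the helper name passwd_summary is hypothetical, and a sample line in the same format stands in for a live lookup.

```shell
# Hypothetical helper: read passwd-format lines on stdin and print the
# fields that matter when checking an AD login.
passwd_summary() {
    awk -F: 'NF >= 7 { printf "user=%s uid=%s gid=%s home=%s shell=%s\n", $1, $3, $4, $6, $7 }'
}

# On the head node:
#   getent passwd testuser@your-domain.com | passwd_summary
# Here a sample line is used instead of a live lookup.
printf '%s\n' 'testuser@your-domain.com:*:181401106:181400513:testuser:/home/testuser:/bin/bash' | passwd_summary
```

The id command, given the same fully qualified username, additionally lists the AD groups the user belongs to, which helps when checking group-based access rules.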
Step 3: Test logon on BCM Head Node Using AD Credentials
On the BCM Head Node, switch to the login prompt using the following command:
# login
When prompted, enter the test Active Directory username (e.g., testuser@your-domain.com) and its password.
Expected Output Example:
root@head-01:~# login
head-01 login: testuser@your-domain.com
Password: xxxxxxxxxxxxx
Welcome to Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-113-generic x86_64)

 * Documentation: https://help.ubuntu.com
 * Management: https://landscape.canonical.com
 * Support: https://ubuntu.com/pro

To see these additional updates run: apt list --upgradable

9 additional security updates can be applied with ESM Apps.
Learn more about enabling ESM Apps service at https://ubuntu.com/esm

Welcome to Base Command Manager 10.0
Based on Ubuntu Jammy Jellyfish 22.04
Cluster Manager ID: #00000

Use the following commands to adjust your environment:
'module avail' - show available modules
'module add <module>' - adds a module to your environment for this session
'module initadd <module>' - configure module to be loaded at every login
(Note: initadd is available only for Tcl modules)
-------------------------------------------------------------------------------
Last login: Thu Jun 26 19:54:30 EDT 2025 on pts/2
testuser@your-domain.com@head-01:~$
Step 7: Enabling BCM Access for Federated AD Users
To allow a federated (Active Directory) user to interact with the NVIDIA Base Command Manager (BCM) platform, you must prepare their certificate and correctly set up the environment.
Perform the steps below as the root user.
Step 1: Copy the certificate to the federated user account
First, we must generate and copy the certificate to the federated user’s home directory.
To do this, use the script located at:
# /cm/local/apps/cluster-tools/other/external-user-cert.py
The script is accessible once the cluster-tools module is loaded, for example:
# module load cluster-tools
# external-user-cert.py <profile> <user1> [<user2> ...] --home=<home-prefix> [-g <group>]
For example:
# ./external-user-cert.py admin testuser
<profile> must be a valid profile. The external-user-cert script supports three profiles for regular users:
- readonly – View-only access (recommended for regular users)
- admin – Full cluster management access
- portal – User Portal access
<user> must be a valid user.
<home-prefix> is usually /home.
<group> is a group, such as wheel.
The script generates the user’s certificate and key (cert.pem and cert.key) and places them in the directory:
/home/testuser/.cm/
Step 2: Log in with the federated (AD) account and load CMSH
root@head-01:~# login testuser@your-domain.com
Password: xxxxxxxx
Step 3: Check that the certificate is present
ls -l ~/.cm/cert.pem ~/.cm/cert.key
NOTE: Both files of the certificate pair (cert.pem and cert.key) should be listed under the user’s ~/.cm/ directory.
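The presence check can also be scripted, for example when certificates are issued to several users. The following is a minimal sketch; the helper name cm_cert_present is hypothetical, and it only tests that both files exist and are non-empty.

```shell
# Hypothetical helper: succeed when both certificate files exist and are
# non-empty in the .cm directory under the given home directory.
cm_cert_present() {
    [ -s "$1/.cm/cert.pem" ] && [ -s "$1/.cm/cert.key" ]
}

# As the federated user:
#   cm_cert_present "$HOME" && echo "certificate pair found"
```

Assuming cert.pem is a PEM-encoded X.509 certificate, as the file name suggests, openssl x509 -in ~/.cm/cert.pem -noout -subject -enddate would show its subject and expiry date.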
Step 4: Add CMSH module
# module add cmsh
Step 5: Run CMSH command
~$ cmsh -c "device list"
Output sample
Type          Hostname (key)  MAC                Category  IP            Network      Status
------------  --------------  -----------------  --------  ------------  -----------  ---------------------------
HeadNode      head-01         BC:24:11:26:B6:A1            192.168.86.4  internalnet  [ UP ], health check failed
PhysicalNode  node001         BD:24:11:28:B7:A2  default   192.168.86.1  internalnet  [ UP ], health check failed
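Because cmsh -c prints plain text, its output can be filtered with standard tools. The sketch below lists the devices that report a failed health check. The helper name failed_health_hosts is hypothetical, and it assumes the hostname is the second whitespace-separated column, as in the sample output above.

```shell
# Hypothetical helper: read `cmsh -c "device list"` output on stdin and
# print the hostname (second column) of every device whose status
# mentions a failed health check.
failed_health_hosts() {
    awk '/health check failed/ { print $2 }'
}

# As the federated user:
#   cmsh -c "device list" | failed_health_hosts
# Here a made-up line in the same format stands in for live output.
printf '%s\n' 'PhysicalNode node001 BD:24:11:28:B7:A2 default 192.168.86.1 internalnet [ UP ], health check failed' | failed_health_hosts
```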
Sources
For more information, please see the following portions of the BCM admin guide: