Using NVIDIA GPUs in X applications on a headless node via VNC

The following steps enable direct rendering from an X client (glxgears or similar) running on a headless node, using VNC with the headless X server display. The display can then be viewed from a VNC client elsewhere, such as a Jupyter VNC session.

0. Preparation

Define the path of the software image used for package installation and modifications:

# export IMAGE_PATH="/cm/images/default-image"

1. Install VirtualGL packages

For the RHEL and CentOS distributions, update the software image with the following packages:

# yum install \
    -y --installroot=${IMAGE_PATH} \
    cuda-driver VirtualGL tigervnc-server

Note: for RHEL 8/CentOS 8, the VirtualGL repository must be configured first:

# wget https://virtualgl.org/pmwiki/uploads/Downloads/VirtualGL.repo \
    -O ${IMAGE_PATH}/etc/yum.repos.d/VirtualGL.repo

For Ubuntu 18, use:

# wget https://sourceforge.net/projects/virtualgl/files/2.6.4/virtualgl_2.6.4_amd64.deb \
    -P ${IMAGE_PATH}/tmp/
# chroot ${IMAGE_PATH} \
    bash -c '\
        apt-get update \
        && apt -y install \
            /tmp/virtualgl_2.6.4_amd64.deb \
            libglu1-mesa \
            cuda-driver \
            tightvncserver \
            xserver-xorg \
            xinit \
    '

However, for Ubuntu 18 it may be better to install TurboVNC instead of TightVNC, because TightVNC does not support all OpenGL features:

# wget https://sourceforge.net/projects/turbovnc/files/2.2.5/turbovnc_2.2.5_amd64.deb \
    -P ${IMAGE_PATH}/tmp
# chroot ${IMAGE_PATH} \
    bash -c '
        apt-get update \
        && apt -y install /tmp/turbovnc_2.2.5_amd64.deb
    '

For TurboVNC, run /opt/TurboVNC/bin/vncserver instead of the default vncserver.

2. Create xorg.conf include file

Create an additional Xorg configuration file inside the software image:

# cat << EOF > ${IMAGE_PATH}/etc/X11/xorg.conf.d/10-bcm-nvidia.conf
Section "Files"
    ModulePath "/cm/local/apps/cuda-driver/libs/current/lib64/xorg/modules"
    ModulePath "/lib64/xorg/modules"
EndSection
EOF

On Ubuntu 18:

# mkdir -p ${IMAGE_PATH}/etc/X11/xorg.conf.d
# cat << EOF > ${IMAGE_PATH}/etc/X11/xorg.conf.d/10-bcm-nvidia.conf
Section "Files"
    ModulePath "/cm/local/apps/cuda-driver/libs/current/lib64/xorg/modules"
    ModulePath "/usr/lib/xorg/modules"
EndSection
EOF

3. Create a systemd unit

# cat <<EOF > ${IMAGE_PATH}/etc/systemd/system/xinit.service
[Unit]
Description=xserver
Requires=multi-user.target
After=multi-user.target

[Service]
ExecStart=/usr/bin/xinit -- /usr/bin/X -nolisten tcp

[Install]
WantedBy=multi-user.target
EOF

4. Install the CUDA toolkit

# yum install cuda10.2-toolkit

For Ubuntu 18:

# apt update && apt install cuda10.2-toolkit

5. Make sure the GPU is available

Reboot the target node (the node that will host the X applications) and make sure the GPU is visible:

# module load shared cuda10.2/toolkit
# nvidia-smi 
Wed Sep 30 18:12:15 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 00000000:00:07.0 Off |                  Off |
| N/A   32C    P8    12W / 150W |      0MiB /  8129MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
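
For scripted checks, the same information can be requested in a machine-readable form. The following is a minimal sketch, assuming the installed nvidia-smi supports the --query-gpu and --format=csv,noheader options. The helper name count_gpus is hypothetical; it only counts non-empty lines on standard input, and the sample line is modelled on the node shown above.

```shell
# count_gpus (hypothetical helper): count non-empty lines read from stdin.
# Each line is expected to describe one GPU, as printed by
#   nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv,noheader
count_gpus() {
    grep -c . || true   # grep exits non-zero when the count is 0; keep going
}

# Illustration with a sample line (no GPU is needed to run this part).
sample='0, Tesla M60, 00000000:00:07.0'
gpu_count=$(printf '%s\n' "$sample" | count_gpus)
echo "GPUs found: ${gpu_count}"
```

On the target node, the helper would be fed from the real command: nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv,noheader | count_gpus. A result of 0 suggests that the driver is not loaded or that no GPU is visible.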

6. Create xorg.conf

The xorg.conf file for the X server must be generated on the target node, that is, on a node with an NVIDIA GPU. First load the CUDA module:

# module load shared cuda10.2/toolkit

Fetch information about the PCI address of the GPU:

# nvidia-xconfig --query-gpu-info
Number of GPUs: 1

GPU #0:
  Name      : Tesla M60
  UUID      : GPU-c621812a-c0e1-dfdd-bc4d-9c65264d6956
  PCI BusID : PCI:0:7:0

  Number of Display Devices: 0

Create an /etc/X11/xorg.conf file with the following command:

# nvidia-xconfig -a --allow-empty-initial-configuration --busid PCI:0:7:0 --no-connected-monitor

Here PCI:0:7:0 is the address of the GPU obtained in the previous step.
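
When the step is scripted, the bus ID can be picked out of the query output instead of being copied by hand. The following is a sketch only: extract_busids is a hypothetical helper, and it assumes the output keeps the "PCI BusID : ..." layout shown above. The sample text is the output from this article.

```shell
# extract_busids (hypothetical helper): print the last field of every
# line containing "PCI BusID" in text read from stdin.
extract_busids() {
    awk '/PCI BusID/ { print $NF }'
}

# Sample text copied from the nvidia-xconfig --query-gpu-info output above.
sample_output='Number of GPUs: 1

GPU #0:
  Name      : Tesla M60
  UUID      : GPU-c621812a-c0e1-dfdd-bc4d-9c65264d6956
  PCI BusID : PCI:0:7:0

  Number of Display Devices: 0'

# Take the first GPU only and show the command that would be run.
busid=$(printf '%s\n' "$sample_output" | extract_busids | head -n 1)
echo "nvidia-xconfig -a --allow-empty-initial-configuration --busid ${busid} --no-connected-monitor"
```

On a real node, the helper would read from the tool directly: nvidia-xconfig --query-gpu-info | extract_busids. On a node with several GPUs it prints one bus ID per line.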

7. Edit xorg.conf

The "Files" section must be removed from the /etc/X11/xorg.conf file:

# sed -i -e '/Section "Files"/,/EndSection/d' /etc/X11/xorg.conf
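
The effect of the sed expression can be tried on a scratch file before touching the real configuration. This is a sketch with a hand-written, hypothetical xorg.conf fragment, not the file that nvidia-xconfig generates, and it assumes GNU sed for the -i option.

```shell
# Work on a scratch copy so the real /etc/X11/xorg.conf is not touched.
tmp_conf=$(mktemp)
cat > "$tmp_conf" <<'EOF'
Section "ServerLayout"
    Identifier     "Layout0"
EndSection

Section "Files"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    BusID          "PCI:0:7:0"
EndSection
EOF

# Same expression as above: delete from the Section "Files" line through
# the first EndSection line that follows it.
sed -i -e '/Section "Files"/,/EndSection/d' "$tmp_conf"

# Report whether a "Files" section is still present.
if grep -q 'Section "Files"' "$tmp_conf"; then
    files_section="present"
else
    files_section="removed"
fi
echo "Files section: ${files_section}"
```

Because the range ends at the first EndSection after the match, the other sections in the file are left untouched.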

8. Test xserver

# xinit

If no errors occur, the command above keeps running. Press Ctrl-C to stop it.
Errors are written to the /var/log/Xorg.0.log file.
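
Xorg marks error lines in its log with (EE), so the log can be filtered rather than read in full. The following is a sketch: xorg_errors is a hypothetical helper, and the sample log is hand-written for illustration. Anchoring the match at the start of the line is intended to skip the marker legend near the top of a real log, which also mentions (EE).

```shell
# xorg_errors (hypothetical helper): print the lines of an Xorg log file
# that start with the (EE) error marker, optionally after a
# "[ timestamp ]" prefix.
xorg_errors() {
    grep -E '^(\[[^]]*\] )?\(EE\)' "$1" || true   # no match is not a failure
}

# Hand-written sample log, for illustration only.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
[    10.001] (II) LoadModule: "glx"
[    10.002] (WW) Example warning line
[    10.003] (EE) Example error line
EOF

xorg_errors "$sample_log"
```

On the node itself the call would be xorg_errors /var/log/Xorg.0.log. Empty output means that no line carries the error marker.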

9. Start and enable the xinit service

# systemctl enable xinit.service
# systemctl start xinit.service

10. Make changes permanent

Copy the xorg.conf file from the target node (node003 in this example) back to the image so that the settings persist across reboots, and enable the systemd unit in the image:

# scp node003:/etc/X11/xorg.conf /cm/images/default-image/etc/X11/
# chroot /cm/images/default-image systemctl enable xinit.service

Usage

To use the GPU inside VNC, the user needs to run the application using the form:

$ vglrun APP

For example:

$ vglrun glxgears

In order to use a GPU other than the default (for example, if the node has multiple GPUs), the user needs to specify its index. The first GPU is specified as:

$ vglrun -display :0.0 glxgears

The second GPU is specified as:

$ vglrun -display :0.1 glxgears
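
The pattern generalizes to any GPU index. A small wrapper can build the display string, as in the sketch below. vgl_display_for_gpu is a hypothetical helper, and it assumes that screen N of display :0 corresponds to GPU N, as in the two examples above.

```shell
# vgl_display_for_gpu (hypothetical helper): print the X display string
# for a zero-based GPU index, following the :0.N pattern shown above.
vgl_display_for_gpu() {
    case "$1" in
        ''|*[!0-9]*)
            echo "GPU index must be a non-negative integer" >&2
            return 1
            ;;
    esac
    printf ':0.%s\n' "$1"
}

vgl_display_for_gpu 0   # prints :0.0
vgl_display_for_gpu 1   # prints :0.1
```

Inside the VNC session it could be used as: vglrun -display "$(vgl_display_for_gpu 1)" glxgears.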

glxgears is available as part of the glx-utils package (mesa-utils on Ubuntu 18):

# yum install glx-utils

In Ubuntu 18:

# apt install mesa-utils

Updated on October 2, 2020