The following steps enable direct rendering from an X client (glxgears or similar) running on a headless node, using VNC with the headless X server display. The display can then be viewed in a VNC client elsewhere, such as a Jupyter VNC session.
0. Preparation
Define the software image path used for package installation and modifications:
# export IMAGE_PATH="/cm/images/default-image"
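A quick check that the image directory exists before making changes (optional):
# ls -d ${IMAGE_PATH}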
1. Install VirtualGL packages
For the RHEL and CentOS distributions, update the software image with the following packages:
# yum install \
-y --installroot=${IMAGE_PATH} \
cuda-driver VirtualGL tigervnc-server
Note: for RHEL 8/CentOS 8 the VirtualGL repository must be configured first:
# wget https://virtualgl.org/pmwiki/uploads/Downloads/VirtualGL.repo \
-O ${IMAGE_PATH}/etc/yum.repos.d/VirtualGL.repo
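After the installation, a quick check confirms that the packages are present in the image (a sketch, using the ${IMAGE_PATH} defined in step 0):
# chroot ${IMAGE_PATH} rpm -q VirtualGL tigervnc-server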
For Ubuntu 18, you can use:
# wget https://sourceforge.net/projects/virtualgl/files/2.6.4/virtualgl_2.6.4_amd64.deb \
-P ${IMAGE_PATH}/tmp/
# chroot ${IMAGE_PATH} \
bash -c '\
apt-get update \
&& apt -y install \
/tmp/virtualgl_2.6.4_amd64.deb \
libglu1-mesa \
cuda-driver \
tightvncserver \
xserver-xorg \
xinit \
'
However, for Ubuntu 18 it may be better to install TurboVNC instead of TightVNC, because TightVNC does not support all OpenGL features:
# wget https://sourceforge.net/projects/turbovnc/files/2.2.5/turbovnc_2.2.5_amd64.deb \
-P ${IMAGE_PATH}/tmp
# chroot ${IMAGE_PATH} \
bash -c '\
apt-get update \
&& apt -y install /tmp/turbovnc_2.2.5_amd64.deb \
'
For TurboVNC, instead of running the default vncserver, you need to run /opt/TurboVNC/bin/vncserver.
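For example, to start a TurboVNC session on display :1 with a fixed resolution (a sketch; the display number and geometry are arbitrary choices):
$ /opt/TurboVNC/bin/vncserver :1 -geometry 1920x1080 -depth 24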
2. Create xorg.conf include file
Create an additional configuration file for xorg inside software image:
# cat << EOF > ${IMAGE_PATH}/etc/X11/xorg.conf.d/10-bcm-nvidia.conf
Section "Files"
ModulePath "/cm/local/apps/cuda-driver/libs/current/lib64/xorg/modules"
ModulePath "/lib64/xorg/modules"
EndSection
EOF
On Ubuntu 18:
# mkdir -p ${IMAGE_PATH}/etc/X11/xorg.conf.d
# cat << EOF > ${IMAGE_PATH}/etc/X11/xorg.conf.d/10-bcm-nvidia.conf
Section "Files"
ModulePath "/cm/local/apps/cuda-driver/libs/current/lib64/xorg/modules"
ModulePath "/usr/lib/xorg/modules"
EndSection
EOF
3. Create a systemd unit
# cat <<EOF > ${IMAGE_PATH}/etc/systemd/system/xinit.service
[Unit]
Description=xserver
Requires=multi-user.target
After=multi-user.target
[Service]
ExecStart=/usr/bin/xinit -- /usr/bin/X -nolisten tcp
[Install]
WantedBy=multi-user.target
EOF
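The unit file can be sanity-checked inside the image before rebooting (optional; systemd-analyze may print warnings about dependencies it cannot resolve inside the chroot):
# chroot ${IMAGE_PATH} systemd-analyze verify /etc/systemd/system/xinit.service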
4. Install the CUDA toolkit
# yum install cuda10.2-toolkit
For Ubuntu 18:
# apt update && apt install cuda10.2-toolkit
5. Make sure the GPU is available
Reboot the target node (the node which will host the X applications) and make sure the GPU is visible:
# module load shared cuda10.2/toolkit
# nvidia-smi
Wed Sep 30 18:12:15 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 00000000:00:07.0 Off |                  Off |
| N/A   32C    P8    12W / 150W |      0MiB /  8129MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
6. Create xorg.conf
On the target node, an xorg.conf configuration file needs to be generated for the X server. To do that, the following commands can be executed on a node with an NVIDIA GPU:
# module load shared cuda10.2/toolkit
Fetch information about the PCI address of the GPU:
# nvidia-xconfig --query-gpu-info
Number of GPUs: 1
GPU #0:
Name : Tesla M60
UUID : GPU-c621812a-c0e1-dfdd-bc4d-9c65264d6956
PCI BusID : PCI:0:7:0
Number of Display Devices: 0
Create an /etc/X11/xorg.conf file with the following command:
# nvidia-xconfig -a --allow-empty-initial-configuration --busid PCI:0:7:0 --no-connected-monitor
Here PCI:0:7:0 is the PCI address of the GPU obtained in the previous step.
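The generated /etc/X11/xorg.conf will contain, among other sections, a Device section pointing at the GPU, roughly like the following excerpt (illustrative only; the exact contents depend on the driver version):
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "Tesla M60"
    BusID          "PCI:0:7:0"
EndSection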
7. Edit xorg.conf
The Section "Files" must be removed from the /etc/X11/xorg.conf file:
# sed -i -e '/Section "Files"/,/EndSection/d' /etc/X11/xorg.conf
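To confirm that the section is gone, the following grep should print nothing:
# grep -A2 'Section "Files"' /etc/X11/xorg.conf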
8. Test xserver
# xinit
If no errors occur, the command above stays running. Press Ctrl-C to stop it.
Errors are written to the /var/log/Xorg.0.log file.
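While xinit is running, direct rendering can be verified from a second terminal on the node, assuming glxinfo is installed (it ships with the glx-utils/mesa-utils package mentioned in the Usage section). The output should name the NVIDIA GPU rather than a software renderer:
# DISPLAY=:0 glxinfo | grep "OpenGL renderer"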
9. Start and enable the xinit service
# systemctl enable xinit.service
# systemctl start xinit.service
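The service state and its log can then be checked with:
# systemctl status xinit.service
# journalctl -u xinit.service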
10. Make changes permanent
Copy the xorg.conf file back to the image to make sure the parameters persist across reboots, and enable the systemd unit in the image:
# scp node003:/etc/X11/xorg.conf /cm/images/default-image/etc/X11/
# chroot /cm/images/default-image systemctl enable xinit.service
Usage
To use the GPU inside VNC, the user needs to run the application using the form:
$ vglrun APP
For example:
$ vglrun glxgears
In order to use a GPU other than the default (for example, if the node has multiple GPUs), the user needs to specify its index. The first GPU is specified as:
$ vglrun -display :0.0 glxgears
The second GPU is specified as:
$ vglrun -display :0.1 glxgears
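To confirm which GPU is actually serving the application, the renderer string can be queried through VirtualGL in the same way (assuming glxinfo is installed, see the package note below):
$ vglrun -display :0.1 glxinfo | grep "OpenGL renderer"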
glxgears is available as part of the glx-utils package (mesa-utils on Ubuntu 18):
# yum install glx-utils
On Ubuntu 18:
# apt install mesa-utils
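Putting the pieces together, a typical session on the node might look like the following (a sketch; the display number :1 and the matching VNC port 5901 are arbitrary examples, and on RHEL/CentOS the default vncserver from tigervnc-server is used instead of the TurboVNC path):
$ /opt/TurboVNC/bin/vncserver :1
$ export DISPLAY=:1
$ vglrun glxgears
A VNC client (for example, the Jupyter VNC session) can then connect to the node on port 5901 to view the rendered output.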