In this article we demonstrate a procedure for running SLURM jobs in Singularity containers via Jupyter on a Bright 9.2 Ubuntu 20.04 cluster. We assume that Jupyter, SLURM, and Singularity have already been set up on the target cluster by following the NVIDIA Bright Cluster Manager manuals.
The procedure involves building a Docker image that includes a Jupyter kernel and pushing it to a local registry. Docker and its registry can be set up on a Bright-managed cluster with the cm-docker-setup and cm-container-registry-setup commands; the details are covered in the Bright containerization manual.
Creating an image with a Jupyter kernel
Compose a Dockerfile to build an image with a Jupyter kernel. Here, jupyter/datascience-notebook is used as the base image.
# mkdir custom-datascience-notebook
# cd custom-datascience-notebook
# vim Dockerfile
FROM jupyter/datascience-notebook
RUN pip install cm-jupyter-eg-kernel-wlm==3.0.0
Build the image and push it to the Docker registry.
# module load docker
# docker build -t master:5000/custom-datascience-notebook:latest .
# docker push master:5000/custom-datascience-notebook:latest
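As a quick sanity check that the push succeeded, the registry can be queried through the Docker Registry HTTP API v2, whose /v2/<repository>/tags/list endpoint returns the available tags. The sketch below is a minimal illustration, not part of any Bright tooling: the helper names tags_url and parse_tags are ours, and the registry hostname master:5000 is taken from this example setup (on the cluster the URL could be fetched with curl -k, since the registry certificate is self-signed here).

```python
import json


def tags_url(registry: str, repository: str) -> str:
    """Build the Docker Registry HTTP API v2 endpoint that lists tags."""
    return f"https://{registry}/v2/{repository}/tags/list"


def parse_tags(body: str) -> list:
    """Extract the tag list from a /tags/list JSON response body."""
    return json.loads(body).get("tags", [])


# URL to query for this example setup's image:
url = tags_url("master:5000", "custom-datascience-notebook")
print(url)

# A successful response body should look roughly like this,
# and parse_tags() pulls out the tag names:
sample_body = '{"name": "custom-datascience-notebook", "tags": ["latest"]}'
print(parse_tags(sample_body))
```

If the pushed tag appears in the list, the registry side is working and any remaining pull problems are on the Singularity side (typically the certificate issue addressed in the next section).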
Configure Singularity to allow pulling images from the registry
If the certificate used by the registry is not from a known Certificate Authority, Singularity may show an error when pulling an image. In this example setup the Docker registry is hosted on the head node, so Singularity is configured to allow pulling images from the registry with hostname ‘master’ by creating the directory /etc/containers and adding the relevant entries to /etc/containers/registries.conf in the software image used by the Singularity nodes. In this example setup, the Singularity compute nodes use the default-image.
# cm-chroot-sw-img /cm/images/default-image
# mkdir /etc/containers
# vim /etc/containers/registries.conf
[[registry]]
insecure=true
blocked=false
location="master"
To sync the changes in the software image to the nodes, the cmsh command imageupdate can be used.
# cmsh -c "device imageupdate -m default-image -w"
To check that the pushed image can be pulled properly, the following command can be run on a node with cm-singularity installed.
$ singularity pull custom-datascience-notebook.sif docker://master:5000/custom-datascience-notebook:latest
Creating a kernel template
A Jupyter kernel template can be created from the existing SLURM-Pyxis kernel template.
# module load jupyter
# cd $JUPYTER_KERNEL_TEMPLATES_DIR
# cp -prv jupyter-eg-kernel-slurm-pyxis-py39 jupyter-eg-kernel-slurm-singularity-py39
# cd jupyter-eg-kernel-slurm-singularity-py39/
Next, the meta.yaml and kernel.json.j2 files of the newly created kernel template need to be modified. In meta.yaml, the display_name, features, modules, image, and job_prefix parameters can be updated as follows (lines with dots represent skipped lines; the sections of the file not shown below can be kept unchanged):
---
display_name: "Python 3.9 via SLURM and Singularity"
features: ["slurm"]
parameters:
  modules:
    type: list
    definition:
      getter: uri
      path: /kernelcreator/envmodules
    display_name: "Modules loaded for spawned job"
    default:
      - shared
      - slurm
      - singularity
  image:
    type: list
    definition:
      getter: static
      default:
        - "docker://master:5000/custom-datascience-notebook:latest"
      values:
        - "docker://master:5000/custom-datascience-notebook:latest"
    display_name: "Image to run"
    limits:
      max_len: 1
      min_len: 1
  .
  .
  .
  job_prefix:
    type: str
    definition:
      getter: static
      default: jupyter-eg-kernel-slurm-singularity-py39
    display_name: "Prefix of the job name"
  display_name:
    type: str
    definition:
      getter: shell
      exec:
        - echo "Python 3.9 via SLURM and Singularity $(date +%y%m%d%H%M%S)"
    display_name: "Display name of the kernel"
.
.
.
In the kernel.json.j2 file, replace the line containing the srun command with the following singularity command (keep the same indentation as the srun line being replaced):
"singularity exec --bind {{ homedir }}:{{ homedir }} --pwd {{ homedir }} {{ image[0] }} {kernel_cmd}"
Launch a Jupyter kernel using the template
A kernel template named “Python 3.9 via SLURM and Singularity” should now be shown in the Bright extension section of the JupyterLab web interface, where it can be used to launch kernels. Note that the first attempt to launch a notebook with a kernel created from this template may take a while if the image used for the Singularity container is large. If a gateway timeout message is shown after launching the notebook, wait (up to around half an hour for a large image) and try again. The timeout for starting Jupyter kernels can also be increased if necessary; the KB article titled “Raising timeouts to start Jupyter kernels” can be helpful in this regard.
When a notebook is launched with a kernel created from the “Python 3.9 via SLURM and Singularity” template, the Jupyter kernel should run in a Singularity container on a node with the slurmclient role.
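To confirm from inside a running notebook that the kernel really is executing in a Singularity container on a compute node, one can check the environment variables that Singularity exports into container processes (SINGULARITY_NAME and SINGULARITY_CONTAINER) together with the node's hostname. The sketch below can be pasted into a notebook cell; the helper name running_in_singularity is illustrative, not part of any Bright or Singularity tooling.

```python
import os
import socket


def running_in_singularity(environ=os.environ) -> bool:
    """Singularity exports SINGULARITY_NAME (image name) and
    SINGULARITY_CONTAINER (image path) into container processes."""
    return "SINGULARITY_NAME" in environ or "SINGULARITY_CONTAINER" in environ


# Run in a notebook cell: this should print a compute node's hostname
# (not the head node's) and True when the kernel runs in the container.
print(socket.gethostname())
print(running_in_singularity())
```

If the second line prints False, the kernel was launched outside the container, which usually points to a mistake in the singularity exec line of kernel.json.j2.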