How to run SLURM jobs in Singularity containers via Jupyter

In this article we demonstrate a procedure for running SLURM jobs in Singularity containers via Jupyter on a Bright 9.2 Ubuntu 20.04 cluster. We assume that Jupyter, SLURM, and Singularity have already been set up on the target cluster by following the NVIDIA Bright Cluster Manager manuals.

The sample procedure in this article involves building a Docker image that includes a Jupyter kernel and pushing it to a local registry. Docker and its registry can be set up on a Bright-managed cluster with the cm-docker-setup and cm-container-registry-setup commands; a minimal invocation is sketched below, and the details are covered in the Bright containerization manual.
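
For reference, a minimal sketch of that setup, run as root on the head node, could look as follows (both utilities are interactive, so the prompts and available options depend on the Bright version in use):

# cm-docker-setup
# cm-container-registry-setup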

Creating an image with a Jupyter kernel

Compose a Dockerfile to build an image with the Jupyter kernel. Here, jupyter/datascience-notebook is used as the base image.

# mkdir custom-datascience-notebook
# cd custom-datascience-notebook
# vim Dockerfile
FROM jupyter/datascience-notebook
RUN pip install cm-jupyter-eg-kernel-wlm==3.0.0

Build the image and push it to the Docker registry.

# module load docker
# docker build -t master:5000/custom-datascience-notebook:latest .
# docker push master:5000/custom-datascience-notebook:latest
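
Optionally, the push can be verified by querying the Docker Registry v2 HTTP API (this assumes the registry listens on master:5000 as above; -k skips certificate verification for a self-signed certificate):

# curl -k https://master:5000/v2/_catalog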

Configure Singularity to allow pulling images from the registry

If the certificate used by the registry is not from a known Certificate Authority, Singularity may show an error when pulling an image. In this example setup the Docker registry is hosted on the head node, so Singularity is configured to allow pulling images from the registry with hostname ‘master’. This is done by creating the directory /etc/containers and adding the relevant entries to /etc/containers/registries.conf in the software image used by the Singularity nodes. In this example setup, the Singularity compute nodes use the default-image.

# cm-chroot-sw-img /cm/images/default-image
# mkdir /etc/containers
# vim /etc/containers/registries.conf
[[registry]]
insecure=true
blocked=false
location="master"

To sync the changes in the image to the nodes, the cmsh command imageupdate can be used.

# cmsh -c "device imageupdate -m default-image -w"

To check if the pushed image can be pulled properly, the following command can be run on a node with cm-singularity installed.

$ singularity pull custom-datascience-notebook.sif docker://master:5000/custom-datascience-notebook:latest
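
As a further sanity check, the resulting SIF can be executed directly (a minimal sketch; the exact Python version shipped in jupyter/datascience-notebook may differ):

$ singularity exec custom-datascience-notebook.sif python3 --version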

Creating a kernel template

A Jupyter kernel template can be created from the existing kernel template for SLURM-Pyxis.

# module load jupyter
# cd $JUPYTER_KERNEL_TEMPLATES_DIR
# cp -prv jupyter-eg-kernel-slurm-pyxis-py39 jupyter-eg-kernel-slurm-singularity-py39
# cd jupyter-eg-kernel-slurm-singularity-py39/

The meta.yaml and kernel.json.j2 files of the newly created kernel template then need to be modified. In meta.yaml, the display_name and features fields and the modules, image, job_prefix, and display_name parameters can be updated as follows (the lines with dots represent skipped lines; the sections of the file not shown below can be kept unchanged):

---
display_name: "Python 3.9 via SLURM and Singularity"
features: ["slurm"]
parameters:
  modules:
    type: list
    definition:
      getter: uri
      path: /kernelcreator/envmodules
      display_name: "Modules loaded for spawned job"
      default:
      - shared
      - slurm
      - singularity
  image:
    type: list
    definition:
      getter: static
      default:
      - "docker://master:5000/custom-datascience-notebook:latest"
      values:
      - "docker://master:5000/custom-datascience-notebook:latest"
      display_name: "Image to run"
    limits:
      max_len: 1
      min_len: 1
.
.
.
  job_prefix:
    type: str
    definition:
      getter: static
      default: jupyter-eg-kernel-slurm-singularity-py39
      display_name: "Prefix of the job name"
  display_name:
    type: str
    definition:
      getter: shell
      exec:
      - echo "Python 3.9 via SLURM and Singularity $(date +%y%m%d%H%M%S)"
      display_name: "Display name of the kernel"
.
.
.

In the kernel.json.j2 file, the line with the srun command can be replaced with a line containing the singularity command as follows (keep the same indentation as the srun line being replaced):

"singularity exec --bind {{ homedir }}:{{ homedir }} --pwd {{homedir}} {{ image[0] }} {kernel_cmd}"

Launch a Jupyter kernel using the template

A kernel template named “Python 3.9 via SLURM and Singularity” should now be shown in the Bright extension section of the JupyterLab web interface and can be used to launch kernels. Please note that the first attempt to launch a notebook with a kernel created from the template may take a while if the image used for the Singularity container is large. If a gateway timeout message is shown after launching the notebook, wait for around half an hour and try again. The timeout for starting Jupyter kernels can also be increased if necessary; the KB article titled Raising timeouts to start Jupyter kernels can be helpful in this regard.
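
While the kernel is starting, it can be helpful to confirm that the corresponding SLURM job has actually been submitted. A minimal sketch (the job name starts with the job_prefix configured above):

$ squeue -u $USER -o "%.10i %.50j %.8T %.10M %R"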

When a notebook is launched with a kernel created from the “Python 3.9 via SLURM and Singularity” template, the Jupyter kernel should run in a Singularity container on a node with the slurmclient role.
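
To verify where the kernel ended up, a notebook cell can run shell escapes that print the execution node and the container-related environment (this assumes the SLURM and Singularity environment variables are propagated into the container, which is the default behaviour):

!hostname
!env | grep -E 'SLURM_JOB_ID|SINGULARITY'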

Updated on April 19, 2023
