The steps described on this page can be followed to build a Docker image suitable for running distributed Spark applications that use XGBoost and leverage RAPIDS to take advantage of NVIDIA GPUs.
A Python application that requires this Docker image is provided by Bright as a Jupyter notebook. It is distributed with the cm-jupyter package and can be found under /cm/shared/examples/jupyter/notebooks/.
1. Software versions
The resulting Docker image will provide software with the following main versions:
- Operating System: Debian GNU/Linux 10
- Apache Spark: 3.0.1
- NVIDIA CUDA: 10.2.89
- NVIDIA Spark RAPIDS plugin: 0.1.0
- NumPy: 1.19.2
- Python: 3.7.3
- RAPIDS cuDF: 0.14
- XGBoost4J: 1.0.0
- XGBoost4J-Spark: 1.0.0
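Once the image has been built (section 4 below), these versions can be spot-checked from a throwaway container. A minimal sketch, assuming the image tag used later in this article and that the image entrypoint allows the command to be overridden:
# docker run --rm head-node-name:5000/spark-xgboost-v1 python3 --version
# docker run --rm head-node-name:5000/spark-xgboost-v1 python3 -c "import numpy; print(numpy.__version__)"
# docker run --rm head-node-name:5000/spark-xgboost-v1 sh -c '"$SPARK_HOME"/bin/spark-submit --version'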
2. Prerequisites
The steps described in this article have been tested in the following environment:
- Docker version: 19.03.13
- Docker registry (environment: Bright CM 9.1, with default values for cm-docker-registry-setup)
- GPU-capable host (environment: AWS EC2, with g4dn.12xlarge instance)
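Before building the image, the Docker and GPU prerequisites can be verified on the build host. A quick sanity check, assuming the NVIDIA driver is already installed on that host:
# docker --version
# nvidia-smi --query-gpu=name,driver_version --format=csv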
3. Dockerfile
The following Dockerfile will be used in this knowledge base article:
FROM brightcomputing/jupyter-kernel-sample:k8s-spark-py37-1.2.1
# ref: https://gitlab.com/nvidia/container-images/cuda/-/blob/master/dist/10.2/ubuntu18.04-x86_64/base/Dockerfile
# Install CUDA repositories
RUN apt-get update \
&& apt-get install -y --no-install-recommends gnupg2 curl ca-certificates unzip \
&& curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub | apt-key add - \
&& echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list \
&& echo "deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list \
&& rm -rf /var/lib/apt/lists/*
ENV CUDA_VERSION 10.2.89
ENV CUDA_PKG_VERSION 10-2=$CUDA_VERSION-1
# Install CUDA packages
RUN mkdir -p /usr/share/man/man1/ \
&& apt-get update \
&& apt-get install -y --no-install-recommends \
cuda-cudart-$CUDA_PKG_VERSION \
cuda-compat-10-2 \
cuda-toolkit-10-2 \
cuda-nvtx-10-2 \
&& rm -rf /var/lib/apt/lists/*
# Required for nvidia-docker v1
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf \
&& echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV NVIDIA_REQUIRE_CUDA "cuda>=10.2 brand=tesla,driver>=396,driver<397 brand=tesla,driver>=410,driver<411 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=440,driver<441"
# ref: https://github.com/NVIDIA/spark-xgboost-examples/blob/spark-3/getting-started-guides/on-prem-cluster/standalone-python.md
# Install cuDF and RAPIDS
RUN cd $SPARK_HOME/jars \
&& curl -fsSL https://repo1.maven.org/maven2/ai/rapids/cudf/0.14/{cudf-0.14-cuda10-2.jar} -o '#1' \
&& curl -fsSL https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/0.1.0/{rapids-4-spark_2.12-0.1.0.jar} -o '#1'
# Install XGBoost & NumPy
RUN cd $SPARK_HOME/jars \
&& curl -fsSL https://repo1.maven.org/maven2/com/nvidia/xgboost4j_3.0/1.0.0-0.1.0/{xgboost4j_3.0-1.0.0-0.1.0.jar} -o '#1' \
&& curl -fsSL https://repo1.maven.org/maven2/com/nvidia/xgboost4j-spark_3.0/1.0.0-0.1.0/{xgboost4j-spark_3.0-1.0.0-0.1.0.jar} -o '#1' \
&& pip3 install numpy==1.19.2
ENV LIBS_PATH ${SPARK_HOME}/jars
ENV SPARK_JARS ${LIBS_PATH}/cudf-0.14-cuda10-2.jar,${LIBS_PATH}/xgboost4j_3.0-1.0.0-0.1.0.jar,${LIBS_PATH}/xgboost4j-spark_3.0-1.0.0-0.1.0.jar
ENV JAR_RAPIDS ${LIBS_PATH}/rapids-4-spark_2.12-0.1.0.jar
# Make XGBoost available in Python
ENV PYTHONPATH ${PYTHONPATH}:${LIBS_PATH}/xgboost4j-spark_3.0-1.0.0-0.1.0.jar
# Download example dataset for NVIDIA notebook
# https://github.com/NVIDIA/spark-xgboost-examples/blob/880f8b8a6fde21f2f8308450883c3a980f6d434e/examples/notebooks/python/mortgage-gpu.ipynb
RUN curl https://raw.githubusercontent.com/NVIDIA/spark-xgboost-examples/880f8b8a6fde21f2f8308450883c3a980f6d434e/datasets/mortgage-small.tar.gz -o /tmp/880f8b8a6fd-mortgage-small.tar.gz
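The SPARK_JARS, JAR_RAPIDS and PYTHONPATH variables baked into the image are intended to be handed to Spark when an application is launched. A minimal sketch of how they might be consumed from a shell inside a container based on this image; the master URL and the application file app.py are placeholders, and spark.plugins=com.nvidia.spark.SQLPlugin together with spark.rapids.sql.enabled=true are the standard settings for enabling the RAPIDS Accelerator, not something configured by the image itself:
# "$SPARK_HOME"/bin/spark-submit \
    --master spark://<spark-master>:7077 \
    --jars ${SPARK_JARS},${JAR_RAPIDS} \
    --conf spark.plugins=com.nvidia.spark.SQLPlugin \
    --conf spark.rapids.sql.enabled=true \
    app.py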
4. Image creation
On a cluster where the Docker registry is reachable at head-node-name on port 5000, connect to a host that meets all the prerequisites and run:
# mkdir spark-xgboost-image
# cd spark-xgboost-image
# curl https://support.brightcomputing.com/kb-articles/spark-xgboost/Dockerfile -o Dockerfile
# docker build -t head-node-name:5000/spark-xgboost-v1 .
# docker push head-node-name:5000/spark-xgboost-v1
It should now be possible to pull the image just created:
# docker pull head-node-name:5000/spark-xgboost-v1
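The pulled image can then be referenced by Spark applications that run their executors on Kubernetes, such as the Jupyter notebook mentioned at the beginning of this article. As an illustration only, and assuming spark-submit is available on the submitting host, a job could point its executors at the image with a configuration along these lines (the Kubernetes API server address, namespace and application file are placeholders):
# spark-submit \
    --master k8s://https://<kubernetes-api-server>:6443 \
    --deploy-mode cluster \
    --conf spark.kubernetes.namespace=<namespace> \
    --conf spark.kubernetes.container.image=head-node-name:5000/spark-xgboost-v1 \
    local:///path/to/app.py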