This article is being updated. Please be aware that the content herein, including but not limited to version numbers and slight syntax changes, may not match the output of the most recent versions of Bright. This notice will be removed when the content has been updated.
This is a Dockerfile that can be used to create a Bright Deep Learning Docker image. Once instantiated using the nvidia-docker command, applications running within the container will have access to the GPU. The container is lightweight: while idle, it uses only about 8 MB of memory.
The Dockerfile
FROM centos:latest
ENV CUDA_VERSION 8.0.61
LABEL com.nvidia.cuda.version="${CUDA_VERSION}"
LABEL com.nvidia.volumes.needed="nvidia_driver"
ENV NVIDIA_CUDA_VERSION $CUDA_VERSION
ENV CUDA_PKG_VERSION 8-0-$CUDA_VERSION-1
# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
COPY cm.repo /etc/yum.repos.d/cm.repo
COPY epel-testing.repo /etc/yum.repos.d/epel-testing.repo
COPY epel.repo /etc/yum.repos.d/epel.repo
COPY RPM-GPG-KEY-EPEL-7 /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
COPY RPM-GPG-KEY-cm /etc/pki/rpm-gpg/RPM-GPG-KEY-cm
COPY http-parser-2.7.1-3.sdl7.x86_64.rpm /root/http-parser-2.7.1-3.sdl7.x86_64.rpm
RUN yum -y install /root/http-parser-2.7.1-3.sdl7.x86_64.rpm environment-modules cm-ml-distdeps
RUN echo 'export MODULEPATH=$MODULEPATH:/cm/shared/modulefiles' >> /root/.bashrc
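The NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES variables set above are consumed by the nvidia-container-runtime. As a rough equivalent of the nvidia-docker wrapper used later in this article, the image could also be started with plain docker; this invocation is a sketch, not taken from the article, and assumes the "nvidia" runtime has been registered with the Docker daemon:

```shell
# Sketch: start the image with plain docker plus the nvidia-container-runtime
# instead of the nvidia-docker wrapper. Assumes "nvidia" is registered as a
# runtime with the Docker daemon (e.g. in /etc/docker/daemon.json).
docker run --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
  -v /cm/shared:/cm/shared \
  -ti bright/dl/min/1
```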
The build directory
# ls -l
-rw------- 1 root root 785 Sep 2 13:30 cm.repo
-rw-r--r-- 1 root root 838 Sep 2 13:32 Dockerfile
-rw-r--r-- 1 root root 951 Sep 2 13:31 epel.repo
-rw-r--r-- 1 root root 1050 Sep 2 13:31 epel-testing.repo
-rw-r--r-- 1 root root 30784 Sep 2 13:31 http-parser-2.7.1-3.sdl7.x86_64.rpm
-rw-r--r-- 1 root root 1714 Sep 2 13:31 RPM-GPG-KEY-cm
-rw-r--r-- 1 root root 1662 Sep 2 13:31 RPM-GPG-KEY-EPEL-7
Build the image
# docker build -t bright/dl/min/1 .
Push the image to the local Docker repository
# docker tag bright/dl/min/1 localhost:6000/bright/dl/min/1
# docker push localhost:6000/bright/dl/min/1
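Pushing to a local registry on port 6000 over plain HTTP typically requires that the Docker daemon trust it. The step below is hypothetical and only needed if the registry is not served over TLS; the registry address is taken from the commands above:

```shell
# Hypothetical: mark the local registry as insecure so the daemon will push
# to it over plain HTTP. Note: this overwrites any existing daemon.json --
# merge the key into the existing file instead if one is present.
echo '{ "insecure-registries": ["localhost:6000"] }' > /etc/docker/daemon.json
systemctl restart docker
```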
Run the container
Log into a node that has a GPU, then load the CUDA and nvidia-docker modules. Instantiate the container using the 'nvidia-docker' command. The '-v /cm/shared:/cm/shared' argument is required to mount the host's /cm/shared directory inside the container, because the Bright Deep Learning stack is installed in /cm/shared.
# ssh rstober@gpu01
$ module load cuda80/toolkit nvidia-docker
$ nvidia-docker run -v /cm/shared:/cm/shared -ti virgo-head:6000/bright/dl/min/1
The container starts. Load the CUDA and tensorflow modules (inside the container this time) so that you can access the GPU, then run TensorFlow.
[root@7ca3a4e45601 /]# module load cuda80/toolkit tensorflow
[root@7ca3a4e45601 /]# nvidia-smi
Sat Sep 2 22:23:06 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro K6000 Off | 0000:82:00.0 Off | 0 |
| 26% 36C P8 20W / 225W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
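When a scripted check is preferable to the full table, nvidia-smi's query interface can report just the fields of interest. This is a sketch; the field names used here belong to nvidia-smi's standard --query-gpu set:

```shell
# Sketch: machine-readable GPU check inside the container.
# Emits one CSV line per GPU, e.g. name, total memory, used memory, utilization.
nvidia-smi --query-gpu=name,memory.total,memory.used,utilization.gpu \
           --format=csv,noheader
```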
Now that we know we can see the GPU, we're ready to run TensorFlow.
[root@7ca3a4e45601 /]# python /cm/shared/apps/dl/tensorflow-scripts/comp-graph.py
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Quadro K6000
major: 3 minor: 5 memoryClockRate (GHz) 0.9015
pciBusID 0000:82:00.0
Total memory: 11.17GiB Free memory: 11.10GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro K6000, pci bus id: 0000:82:00.0)
[3.0, 4.0]
('node3:', <tf.Tensor 'Add:0' shape=() dtype=float32>) ('sess.run(node3):', 7.0)
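The comp-graph.py script ships with the Bright Deep Learning stack and its source is not shown here. A minimal TensorFlow 1.x script consistent with the output above (the [3.0, 4.0] line and the node3 tuples) might look like the sketch below; the variable names are guesses, and the tuple-style lines in the captured output suggest it was run under Python 2's print statement:

```python
# Hypothetical reconstruction of a script producing output like the above.
# Uses the TensorFlow 1.x graph/session API that was current at the time.
import tensorflow as tf

node1 = tf.constant(3.0, tf.float32)  # prints as 3.0
node2 = tf.constant(4.0)              # dtype float32 is inferred
node3 = tf.add(node1, node2)

sess = tf.Session()
print(sess.run([node1, node2]))          # [3.0, 4.0]
print("node3:", node3)                   # the Tensor object, not its value
print("sess.run(node3):", sess.run(node3))
```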