ID #1383

How can I use GPUs with the Bright Deep Learning stack?

The Dockerfile below creates a Bright Deep Learning Docker image. When the container is started with the nvidia-docker command, applications running inside it have access to the GPU. The container is lightweight: while idle, it uses only 8 MB of memory.

The Dockerfile

# CentOS 7 is assumed here, matching the EPEL-7 key copied below
FROM centos:7

ENV CUDA_VERSION 8.0.61
LABEL com.nvidia.cuda.version="${CUDA_VERSION}"
LABEL com.nvidia.volumes.needed="nvidia_driver"
ENV NVIDIA_CUDA_VERSION $CUDA_VERSION
ENV CUDA_PKG_VERSION 8-0-$CUDA_VERSION-1

# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility

COPY cm.repo /etc/yum.repos.d/cm.repo
COPY epel-testing.repo /etc/yum.repos.d/epel-testing.repo
COPY epel.repo /etc/yum.repos.d/epel.repo
COPY RPM-GPG-KEY-EPEL-7 /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
COPY RPM-GPG-KEY-cm /etc/pki/rpm-gpg/RPM-GPG-KEY-cm
COPY http-parser-2.7.1-3.sdl7.x86_64.rpm /root/http-parser-2.7.1-3.sdl7.x86_64.rpm

RUN yum -y install /root/http-parser-2.7.1-3.sdl7.x86_64.rpm environment-modules cm-ml-distdeps

RUN echo 'export MODULEPATH=$MODULEPATH:/cm/shared/modulefiles' >> /root/.bashrc

 

The build directory

# ls -l
-rw------- 1 root root   785 Sep  2 13:30 cm.repo
-rw-r--r-- 1 root root   838 Sep  2 13:32 Dockerfile
-rw-r--r-- 1 root root   951 Sep  2 13:31 epel.repo
-rw-r--r-- 1 root root  1050 Sep  2 13:31 epel-testing.repo
-rw-r--r-- 1 root root 30784 Sep  2 13:31 http-parser-2.7.1-3.sdl7.x86_64.rpm
-rw-r--r-- 1 root root  1714 Sep  2 13:31 RPM-GPG-KEY-cm
-rw-r--r-- 1 root root  1662 Sep  2 13:31 RPM-GPG-KEY-EPEL-7
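
Every file named in a COPY instruction must be present in the build directory, otherwise the build stops at that step. The following is a minimal sketch of a check that could be run before building. The function name check_build_context is hypothetical; the file list is taken from the Dockerfile and the directory listing above.

```shell
# Hypothetical helper: verify that the Dockerfile and the files it COPYs exist.
# Usage: check_build_context [DIR]   (DIR defaults to the current directory)
check_build_context() {
    dir="${1:-.}"
    missing=0
    for f in Dockerfile cm.repo epel.repo epel-testing.repo \
             RPM-GPG-KEY-EPEL-7 RPM-GPG-KEY-cm \
             http-parser-2.7.1-3.sdl7.x86_64.rpm; do
        if [ ! -f "$dir/$f" ]; then
            # Report each missing file on stderr and remember the failure
            echo "missing: $f" >&2
            missing=1
        fi
    done
    # Return 0 when everything is present, 1 otherwise
    return "$missing"
}
```

Running check_build_context in the build directory returns a non-zero status and names any file that is absent.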

 

Build the image

# docker build -t bright/dl/min/1 .

 

Push the image to the local Docker registry

# docker tag bright/dl/min/1 localhost:6000/bright/dl/min/1
# docker push localhost:6000/bright/dl/min/1
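
The tag and push steps always use the same registry-qualified name, so they can be wrapped in a small function. This is only a sketch: image_ref and push_image are hypothetical names, and the registry address is whatever your cluster uses (localhost:6000 in this example).

```shell
# Hypothetical helper: build a registry-qualified image reference.
# Usage: image_ref REGISTRY IMAGE
image_ref() {
    printf '%s/%s\n' "$1" "$2"
}

# Hypothetical wrapper: tag a local image for a registry, then push it.
# Usage: push_image REGISTRY IMAGE
push_image() {
    ref=$(image_ref "$1" "$2")
    # Only push if tagging succeeded
    docker tag "$2" "$ref" && docker push "$ref"
}
```

For the image built above, the call would be: push_image localhost:6000 bright/dl/min/1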

 

Run the container

Log in to a node that has a GPU, then load the CUDA and nvidia-docker modules. Start the container with the 'nvidia-docker' command. The '-v /cm/shared:/cm/shared' argument is required: it mounts the host's /cm/shared directory inside the container, and the Bright Deep Learning stack is installed in /cm/shared. In this example the GPU node reaches the registry as virgo-head:6000.

# ssh rstober@gpu01
$ module load cuda80/toolkit nvidia-docker
$ nvidia-docker run -v /cm/shared:/cm/shared -ti virgo-head:6000/bright/dl/min/1

 

The container starts. Load the CUDA and TensorFlow modules (inside the container this time) so that you can access the GPU, then check that the GPU is visible with nvidia-smi.

[root@7ca3a4e45601 /]# module load cuda80/toolkit tensorflow 
[root@7ca3a4e45601 /]# nvidia-smi
Sat Sep  2 22:23:06 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K6000        Off  | 0000:82:00.0     Off |                    0 |
| 26%   36C    P8    20W / 225W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
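
The table above is meant for reading by eye. For a scripted check, nvidia-smi can print one line per GPU with its --query-gpu and --format options. The sketch below counts those lines; gpu_count is a hypothetical name, and the function treats a missing or failing nvidia-smi as zero GPUs.

```shell
# Hypothetical helper: count the GPUs visible in the current environment.
# Prints 0 if nvidia-smi is not installed or exits with an error.
gpu_count() {
    if names=$(nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null); then
        # One non-empty line per GPU; grep -c exits 1 on zero matches, so mask it
        printf '%s\n' "$names" | grep -c . || :
    else
        echo 0
    fi
}
```

Inside the container, a test such as [ "$(gpu_count)" -gt 0 ] can then guard the TensorFlow run.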

 

Now that we know we can see the GPU, we're ready to run TensorFlow.

[root@7ca3a4e45601 /]# python /cm/shared/apps/dl/tensorflow-scripts/comp-graph.py
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Quadro K6000
major: 3 minor: 5 memoryClockRate (GHz) 0.9015
pciBusID 0000:82:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro K6000, pci bus id: 0000:82:00.0)
[3.0, 4.0]
('node3:', <tf.Tensor 'Add:0' shape=() dtype=float32>)
('sess.run(node3):', 7.0)
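
The session above is interactive. The same run could probably be done as a single command, for example from a job script. The sketch below is untested against a Bright cluster and rests on two assumptions: that a login shell (bash -l) inside the image defines the module command, and that it reads the MODULEPATH line the Dockerfile appends to /root/.bashrc. The name run_in_container is hypothetical.

```shell
# Hypothetical wrapper: run one command in a fresh container, then remove it.
# Assumes a login shell inside the image provides `module` and MODULEPATH.
# Usage: run_in_container IMAGE COMMAND [ARGS...]
run_in_container() {
    image="$1"
    shift
    # --rm removes the container when the command exits
    nvidia-docker run --rm -v /cm/shared:/cm/shared "$image" \
        bash -lc "module load cuda80/toolkit tensorflow && $*"
}
```

With the image from this example, the call would be: run_in_container virgo-head:6000/bright/dl/min/1 python /cm/shared/apps/dl/tensorflow-scripts/comp-graph.py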

