
How can I create an RDMA-ready cloud instance and software image?

This article is being updated. Please be aware that the content herein, including but not limited to version numbers and slight syntax changes, may not match the output from the most recent versions of Bright. This notice will be removed when the content has been updated.

Prepare the Azure RDMA cloud instances:

1. Create an availability set in Azure Portal

  1. Click on “All Services” (top left corner)
  2. Choose “Availability Set”
  3. Click on +Add to add a new availability set as follows:
    1. Set an arbitrary name
    2. Choose the correct subscription
    3. Use the existing resource group to which the cluster extension belongs. You can check the cloudsettings of the cloud director to see which resource group should be used.
    4. In Bright 8.0, only “classic” unmanaged disks are allowed.
    5. Get the Resource ID of the created Availability Set. This is used later when creating the cloud instances in Bright.
  4. Set the availability set for the cloud nodes in Bright:
[root@ma-c-02-06-b80-c7u2 ~]# cmsh
[ma-c-02-06-b80-c7u2]% device use westeurope-cnode002  
[ma-c-02-06-b80-c7u2->device[westeurope-cnode002]]% cloudsettings
[ma-c-02-06-b80-c7u2->device[westeurope-cnode002]->cloudsettings]%set availabilitysetid "/subscriptions/2b8fad2b-aaf1-425a-bf45-36cfd495107e/resourceGroups/ma-c-02-06-b80-c7u2-westeurope-bcm/providers/Microsoft.Compute/availabilitySets/azure-rdma-test"
[ma-c-02-06-b80-c7u2->device*[westeurope-cnode002*]->cloudsettings*]% commit
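A malformed resource ID is a common cause of failed commits, so it can be worth sanity-checking the string first. A minimal sketch, using the example ID from the transcript above (the pattern reflects the standard Azure resource ID layout):

```shell
# Check that the availability set resource ID follows the expected Azure
# layout before committing it in cmsh. The ID below is the example value
# from the transcript above; substitute your own.
id="/subscriptions/2b8fad2b-aaf1-425a-bf45-36cfd495107e/resourceGroups/ma-c-02-06-b80-c7u2-westeurope-bcm/providers/Microsoft.Compute/availabilitySets/azure-rdma-test"
if echo "$id" | grep -Eq '^/subscriptions/[0-9a-fA-F-]+/resourceGroups/[^/]+/providers/Microsoft\.Compute/availabilitySets/[^/]+$'; then
    echo "resource ID format OK"
else
    echo "resource ID format unexpected" >&2
fi
```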
  2. Set the VM size to one that supports RDMA:
[root@ma-c-02-06-b80-c7u2 ~]# cmsh
[ma-c-02-06-b80-c7u2]% device use westeurope-cnode002  
[ma-c-02-06-b80-c7u2->device[westeurope-cnode002]]% cloudsettings
[ma-c-02-06-b80-c7u2->device[westeurope-cnode002]->cloudsettings]%set vmsize standard_h16m
[ma-c-02-06-b80-c7u2->device*[westeurope-cnode002*]->cloudsettings*]% commit
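Only certain Azure sizes expose the RDMA network. A small sketch that warns before committing a size; the list here is illustrative and not exhaustive (consult the Azure documentation for the current set of RDMA-capable sizes):

```shell
# Warn if the chosen vmsize is not in a known-RDMA list before committing it
# in cmsh. The list is an illustrative assumption, not authoritative.
rdma_sizes="standard_a8 standard_a9 standard_h16r standard_h16mr standard_h16m"
vmsize="standard_h16m"
case " $rdma_sizes " in
    *" $vmsize "*) echo "OK: $vmsize is in the RDMA list" ;;
    *)             echo "WARN: $vmsize is not in the RDMA list" >&2 ;;
esac
```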

Prepare the Azure RDMA image:

  1. Install, enable, and configure the WALinuxAgent in the software image, and change /etc/waagent.conf to support RDMA (the WALinuxAgent is responsible for bringing up the IB interfaces on the cloud nodes):

[root@ma-c-02-06-b80-c7u2 ~]# yum install WALinuxAgent --installroot=/cm/images/cloud-image
[root@ma-c-02-06-b80-c7u2 ~]# chroot /cm/images/cloud-image/
[root@ma-c-02-06-b80-c7u2 /]# systemctl enable waagent
[root@ma-c-02-06-b80-c7u2 /]# grep -vE "^#|^$" /etc/waagent.conf
Provisioning.Enabled=n
Provisioning.UseCloudInit=n
Provisioning.DeleteRootPassword=n
Provisioning.RegenerateSshHostKeyPair=n
Provisioning.SshHostKeyPairType=rsa
Provisioning.MonitorHostName=n
Provisioning.DecodeCustomData=n
Provisioning.ExecuteCustomData=n
Provisioning.AllowResetSysUser=n
ResourceDisk.Format=n
ResourceDisk.Filesystem=ext4
ResourceDisk.MountPoint=/mnt/resource
ResourceDisk.EnableSwap=n
ResourceDisk.SwapSizeMB=0
ResourceDisk.MountOptions=None
Logs.Verbose=y
OS.RootDeviceScsiTimeout=300
OS.OpensslPath=None
OS.SshDir=/etc/ssh
OS.EnableRDMA=y
AutoUpdate.Enabled=n
OS.EnableFirewall=n

  2. Download and install the msft-rdma-drivers package provided by Microsoft in the software image (note that the actual URL of the msft-rdma-drivers package may need to be changed in the commands below):
[root@ma-c-02-06-b80-c7u2 ~]#  wget http://download.microsoft.com/download/6/8/F/68FE11B8-FAA4-4F8D-8C7D-74DA7F2CFC8C/msft-rdma-drivers-4.2.3.1-20180209.x86_64.rpm
[root@ma-c-02-06-b80-c7u2 ~]#  wget http://download.microsoft.com/download/6/8/F/68FE11B8-FAA4-4F8D-8C7D-74DA7F2CFC8C/msft-rdma-drivers-4.2.3.1-20180209.src.rpm
[root@ma-c-02-06-b80-c7u2 ~]#  rpm -ivh msft-rdma-drivers-4.2.3.1-20180209.x86_64.rpm --root=/cm/images/cloud-image
  3. Check the kernel version supported by the msft-rdma-drivers:
[root@ma-c-02-06-b80-c7u2 ~]# chroot /cm/images/cloud-image/
[root@ma-c-02-06-b80-c7u2 /]# cd /opt/microsoft/rdma/rhel74/
[root@ma-c-02-06-b80-c7u2 rhel74]# rpm -qlp kmod-microsoft-hyper-v-rdma-4.2.3.1.144-20180209.x86_64.rpm
/etc/depmod.d/hyperv.conf
/lib/modules/3.10.0-693.17.1.el7.x86_64
/lib/modules/3.10.0-693.17.1.el7.x86_64/extra
/lib/modules/3.10.0-693.17.1.el7.x86_64/extra/microsoft-hyper-v-rdma
/lib/modules/3.10.0-693.17.1.el7.x86_64/extra/microsoft-hyper-v-rdma/hid-hyperv.ko
/lib/modules/3.10.0-693.17.1.el7.x86_64/extra/microsoft-hyper-v-rdma/hv_balloon.ko
/lib/modules/3.10.0-693.17.1.el7.x86_64/extra/microsoft-hyper-v-rdma/hv_netvsc.ko
/lib/modules/3.10.0-693.17.1.el7.x86_64/extra/microsoft-hyper-v-rdma/hv_network_direct.ko
/lib/modules/3.10.0-693.17.1.el7.x86_64/extra/microsoft-hyper-v-rdma/hv_sock.ko
/lib/modules/3.10.0-693.17.1.el7.x86_64/extra/microsoft-hyper-v-rdma/hv_storvsc.ko
/lib/modules/3.10.0-693.17.1.el7.x86_64/extra/microsoft-hyper-v-rdma/hv_utils.ko
/lib/modules/3.10.0-693.17.1.el7.x86_64/extra/microsoft-hyper-v-rdma/hv_vmbus.ko
/lib/modules/3.10.0-693.17.1.el7.x86_64/extra/microsoft-hyper-v-rdma/hyperv-keyboard.ko
/lib/modules/3.10.0-693.17.1.el7.x86_64/extra/microsoft-hyper-v-rdma/hyperv_fb.ko
/lib/modules/3.10.0-693.17.1.el7.x86_64/extra/microsoft-hyper-v-rdma/pci-hyperv.ko
/lib/modules/3.10.0-693.17.1.el7.x86_64/extra/microsoft-hyper-v-rdma/uio_hv_generic.ko
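The kernel version the kmod package targets can be derived from the file list instead of read off by eye. A sketch using a sample line from the `rpm -qlp` output above; on the head node you would pipe `rpm -qlp …` into the same sed:

```shell
# Derive the kernel version the kmod package was built for from its file
# list, so it can be compared against the image kernel. The sample line is
# taken from the rpm -qlp output above.
kmod_files="/lib/modules/3.10.0-693.17.1.el7.x86_64/extra/microsoft-hyper-v-rdma/hv_vmbus.ko"
target_kernel=$(echo "$kmod_files" | sed -n 's|^/lib/modules/\([^/]*\)/.*|\1|p' | sort -u)
echo "kmod package targets kernel: $target_kernel"
```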
  4. Update to a kernel version which matches what is available from Microsoft:
[root@ma-c-02-06-b80-c7u2 ~]# yum update --installroot=/cm/images/cloud-image
[root@ma-c-02-06-b80-c7u2 ~]# cmsh
[ma-c-02-06-b80-c7u2]% softwareimage use cloud-image  
[ma-c-02-06-b80-c7u2->softwareimage[cloud-image]]%set kernelversion 3.10.0-693.17.1.el7.x86_64  
[ma-c-02-06-b80-c7u2->softwareimage*[cloud-image*]]% commit
  5. Install the Infiniband Support group in the software image:

[root@ma-c-02-06-b80-c7u2 ~]# chroot /cm/images/cloud-image/
[root@ma-c-02-06-b80-c7u2 /]# yum groupinstall "Infiniband Support"

  6. Install kmod-microsoft-hyper-v-rdma and microsoft-hyper-v-rdma in the software image:
[root@ma-c-02-06-b80-c7u2 ~]# chroot /cm/images/cloud-image/
[root@ma-c-02-06-b80-c7u2 /]# rpm -ivh /opt/microsoft/rdma/rhel74/kmod-microsoft-hyper-v-rdma-4.2.3.1.144-20180209.x86_64.rpm
[root@ma-c-02-06-b80-c7u2 /]# rpm -ivh --noscripts /opt/microsoft/rdma/rhel74/microsoft-hyper-v-rdma-4.2.3.1.144-20180209.x86_64.rpm

  7. Install hypervkvpd, which is required by the waagent to bring up the RDMA interface:

[root@ma-c-02-06-b80-c7u2 ~]# chroot /cm/images/cloud-image/
[root@ma-c-02-06-b80-c7u2 /]# yum install hypervkvpd

  8. Enable the openlogic repositories in the software image:
[root@ma-c-02-06-b80-c7u2 ~]# chroot /cm/images/cloud-image/
[root@ma-c-02-06-b80-c7u2 /]# cat > /etc/yum.repos.d/openlogic.repo <<'EOF'
[openlogic]
name=CentOS-$releasever - openlogic packages for $basearch
baseurl=http://olcentgbl.trafficmanager.net/openlogic/$releasever/openlogic/$basearch/
enabled=1
gpgcheck=0
EOF
  9. Reboot the cloud nodes and make sure that the kernel modules are loaded properly and the extra interface is up:
[root@westeurope-cnode002 ~]# lsmod | grep hv_
hv_network_direct     100138  0
hv_balloon             22073  0
ib_core               211874  14 rdma_cm,ib_cm,iw_cm,rpcrdma,ib_srp,ib_ucm,ib_iser,ib_srpt,ib_umad,hv_network_direct,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert
hv_storvsc             22716  2
hv_utils               25798  2
scsi_transport_fc      64007  1 hv_storvsc
ptp                    19231  6 igb,tg3,bnx2x,ixgbe,hv_utils,e1000e
hv_netvsc              45611  0
hv_vmbus               72582  8 hv_balloon,hyperv_keyboard,hv_netvsc,hid_hyperv,hv_utils,hyperv_fb,hv_storvsc,hv_network_direct
[root@westeurope-cnode002 ~]# lsmod | grep rdma
rpcrdma                86152  0
rdma_ucm               26841  0
ib_uverbs              64636  2 ib_ucm,rdma_ucm
rdma_cm                54426  4 rpcrdma,ib_iser,rdma_ucm,ib_isert
ib_cm                  47287  5 rdma_cm,ib_srp,ib_ucm,ib_srpt,ib_ipoib
iw_cm                  46260  1 rdma_cm
ib_core               211874  14 rdma_cm,ib_cm,iw_cm,rpcrdma,ib_srp,ib_ucm,ib_iser,ib_srpt,ib_umad,hv_network_direct,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert
sunrpc                348674  23 nfs,nfsd,auth_rpcgss,lockd,nfsv3,rpcrdma,nfs_acl
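The module check above can be scripted instead of eyeballed. A sketch; a two-line sample stands in for real `lsmod` output (on a node you would pipe `lsmod` itself, and the module list can be extended as needed):

```shell
# Verify that the Hyper-V RDMA modules are loaded. Sample lsmod lines are
# used here for illustration; on a real node, capture lsmod_out="$(lsmod)".
lsmod_out="hv_network_direct     100138  0
hv_vmbus               72582  8"
missing=""
for mod in hv_network_direct hv_vmbus; do
    echo "$lsmod_out" | grep -q "^$mod " || missing="$missing $mod"
done
if [ -z "$missing" ]; then
    echo "all RDMA modules loaded"
else
    echo "missing:$missing" >&2
fi
```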
[root@westeurope-cnode001 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:0d:3a:38:fa:84 brd ff:ff:ff:ff:ff:ff
    inet 10.42.0.5/16 brd 10.42.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::20d:3aff:fe38:fa84/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:15:5d:33:ff:34 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.43/16 brd 172.16.255.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::215:5dff:fe33:ff34/64 scope link
       valid_lft forever preferred_lft forever
5: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1024
    link/none
    inet 172.31.0.1/16 brd 172.31.255.255 scope global tun0
       valid_lft forever preferred_lft forever
    inet6 fe80::ed66:b3fb:a329:ef95/64 scope link flags 800
       valid_lft forever preferred_lft forever
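Checking that the extra interface came up with an address can also be scripted. A sketch using a condensed sample of the eth1 stanza above; on a node, `ip -o -4 addr show dev eth1` produces an equivalent one-line form:

```shell
# Confirm the extra (RDMA) interface is up with an IPv4 address. The sample
# line is a stand-in; on a node use: ip_out="$(ip -o -4 addr show dev eth1)".
ip_out="3: eth1    inet 172.16.1.43/16 brd 172.16.255.255 scope global eth1"
addr=$(echo "$ip_out" | sed -n 's|.*inet \([0-9.]*/[0-9]*\).*|\1|p')
if [ -n "$addr" ]; then
    echo "eth1 is up with $addr"
else
    echo "eth1 has no IPv4 address" >&2
fi
```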

Test running MPI jobs:

  1. Run a simple MPI job:
[cmsupport@westeurope-cnode001 ~]$ module load intel/mpi/mic/5.1.3/2016.4.258
[cmsupport@westeurope-cnode001 ~]$ which mpirun
/cm/shared/apps/intel/compilers_and_libraries/2016.4.258/linux/mpi/intel64/bin/mpirun
[cmsupport@westeurope-cnode001 2017]$ mpirun -hosts westeurope-cnode001,westeurope-cnode002 -n 2 -ppn 1 -env I_MPI_FABRICS=shm:dapl -env I_MPI_DAPL_PROVIDER=ofa-v2-ib0 -env I_MPI_DYNAMIC_CONNECTION=0 hostname
westeurope-cnode001
westeurope-cnode002
  2. Run a PingPong IMB test:
[cmsupport@westeurope-cnode001 ~]$ module load intel/mpi/64/5.1.3/2016.4.258
[cmsupport@westeurope-cnode001 ~]$ which mpirun
/cm/shared/apps/intel/compilers_and_libraries/2016.4.258/linux/mpi/intel64/bin/mpirun
[cmsupport@westeurope-cnode001 ~]$ mpirun -hosts westeurope-cnode001,westeurope-cnode002 -ppn 1 -n 2 -env I_MPI_DEBUG 5 -env I_MPI_FABRICS=shm:dapl -env I_MPI_DAPL_PROVIDER=ofa-v2-ib0 -env I_MPI_DYNAMIC_CONNECTION=0 /cm/shared/apps/intel/compilers_and_libraries/2016.4.258/linux/mpi/intel64/bin/IMB-MPI1 pingpong
[0] MPI startup(): Multi-threaded optimized library
[0] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-ib0
[1] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-ib0
[1] MPI startup(): DAPL provider ofa-v2-ib0
[1] MPI startup(): shm and dapl data transfer modes
[0] MPI startup(): DAPL provider ofa-v2-ib0
[0] MPI startup(): shm and dapl data transfer modes
[0] MPID_nem_init_dapl_coll_fns(): user set DAPL collective mask = 0000
[0] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[0] MPI startup(): Rank    Pid      Node name            Pin cpu
[0] MPI startup(): 0       5882     westeurope-cnode001  {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
[0] MPI startup(): 1       3563     westeurope-cnode002  {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
[1] MPID_nem_init_dapl_coll_fns(): user set DAPL collective mask = 0000
[0] MPI startup(): I_MPI_DAPL_PROVIDER=ofa-v2-ib0
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_DYNAMIC_CONNECTION=0
[0] MPI startup(): I_MPI_FABRICS=shm:dapl
[1] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_DIST=10,20,20,10
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_MAP=mlx4_0:-1
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=2
[0] MPI startup(): I_MPI_PIN_MAPPING=1:0 0
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 4.1 Update 1, MPI-1 part
#------------------------------------------------------------
# Date                  : Mon Mar 26 11:05:25 2018
# Machine               : x86_64
# System                : Linux
# Release               : 3.10.0-693.17.1.el7.x86_64
# Version               : #1 SMP Thu Jan 25 20:13:58 UTC 2018
# MPI Version           : 3.0
# MPI Thread Environment:
# New default behavior from Version 3.2 on:
# the number of iterations per message size is cut down
# dynamically when a certain run time (per message size sample)
# is expected to be exceeded. Time limit is defined by variable
# "SECS_PER_SAMPLE" (=> IMB_settings.h)
# or through the flag => -time
# Calling sequence was:
# /cm/shared/apps/intel/compilers_and_libraries/2016.4.258/linux/mpi/intel64/bin/IMB-MPI1 pingpong
# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   : MPI_BYTE
# MPI_Datatype for reductions    : MPI_FLOAT
# MPI_Op                         : MPI_SUM
#
#
# List of Benchmarks to run:
# PingPong
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
     #bytes #repetitions      t[usec] Mbytes/sec
          0         1000         3.29         0.00
          1         1000         3.39         0.28
          2         1000         3.30         0.58
          4         1000         3.30         1.16
          8         1000         3.30         2.31
         16         1000         3.31         4.61
         32         1000         2.65        11.51
         64         1000         2.64        23.12
        128         1000         2.70        45.23
        256         1000         3.12        78.22
        512         1000         3.11       157.23
       1024         1000         3.25       300.84
       2048         1000         3.88       503.84
       4096         1000         5.13       760.86
       8192         1000         6.36      1227.52
      16384         1000         8.41      1858.89
      32768         1000        11.25      2778.86
      65536          640        17.37      3597.44
     131072          320        30.12      4149.39
     262144          160        58.17      4297.39
     524288           80       102.09      4897.46
    1048576           40       182.74      5472.38
    2097152           20       349.38      5724.45
    4194304           10       687.24      5820.37

# All processes entering MPI_Finalize
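As a sanity check, IMB's throughput column follows from the message size and latency: Mbytes/sec = (bytes / 2^20) / (t in seconds). A quick awk sketch using the 4 MiB row from the table above (687.24 usec):

```shell
# Recompute IMB's Mbytes/sec for the 4194304-byte row: IMB uses MiB-based
# megabytes, i.e. (bytes / 1048576) divided by the time in seconds.
awk 'BEGIN { bytes = 4194304; t_usec = 687.24;
             printf "%.2f Mbytes/sec\n", (bytes / 1048576) / (t_usec / 1e6) }'
```

This reproduces the ~5820 Mbytes/sec figure reported in the last row of the table.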

Updated on October 7, 2020
