InfiniBand in the lab…

Okay, the original title was going to be ‘InfiniBand in the lab… who can afford 10/40 GbE?’. I’ve looked at 10GbE switches in the past, and nearly pulled the trigger a few times. Even now that the prices of switches like the Netgear ProSafe or Cisco SG500X are coming down, the cost of 10GbE adapters is still high. Having tested VSAN in the lab, I knew I wanted more speed for replication and access to the data than what I experienced. The kick in the butt in network acceleration came from InfiniBand.

If you search on eBay, you will find lots of very cheap InfiniBand host channel adapters (HCA) and cables. A dual-port 20Gbps adapter will cost you between $40 and $80, and the cables vary between $15 and up to $150 depending on the type of cable. One interesting fact is that you can use InfiniBand in a point-to-point configuration. Each InfiniBand network needs a Subnet Manager; this is the configuration of the fabric, akin to Fibre Channel zoning.

InfiniBand Data Rates

An InfiniBand link is a serial link operating at one of five data rates: single data rate (SDR), double data rate (DDR), quad data rate (QDR), fourteen data rate (FDR), and enhanced data rate (EDR).

  1. 10 Gbps or Single Data Rate (SDR)
  2. 20 Gbps or Double Data Rate (DDR)
  3. 40 Gbps or Quad Data Rate (QDR)
  4. 56 Gbps or Fourteen Data Rate (FDR)
  5. 100 Gbps or Enhanced Data Rate (EDR)
  6. 2014 should see the announcement of High Data Rate (HDR)
  7. And the roadmap continues with Next Data Rate (NDR)

There is a great InfiniBand entry on Wikipedia that discusses the different InfiniBand signaling rates in more detail.

InfiniBand Host Channel Adapters

Two weeks ago, I found a great lead, and that information pushed me to purchase six InfiniBand adapters.

3x Mellanox InfiniBand MHGH28-XTC Dual Port DDR/CX4 (PCIe Gen2) at $50.
3x Mellanox InfiniBand MCX354A-FCBT CX354A Dual Port FDR/QDR (PCIe Gen3) at $300.

InfiniBand Physical Interconnection

Early InfiniBand used copper CX4 cable for SDR and DDR rates with 4x ports — also commonly used to connect SAS (Serial Attached SCSI) HBAs to external (SAS) disk arrays. With SAS, this is known as an SFF-8470 connector, and is referred to as an “InfiniBand-style” Connector.

Cisco 10GB CX4 to CX4 InfiniBand Cable 1.5 m

The latest connectors, used for 4x ports at up to QDR and FDR speeds, are QSFP (Quad SFP) and can be copper or fiber, depending on the length required.

InfiniBand Switch

While you can create a triangle configuration with three hosts using dual-port cards, as Vladan Seget (@Vladan) describes in his very interesting article Homelab Storage Network Speed with InfiniBand, I wanted to see how an InfiniBand switch would work. I only invested in an older SilverStorm 9024-CU24-ST2 that supports SDR at 10Gbps, but it has 24 such ports. Not bad for a $400 switch with 24x 10Gbps ports.

SilverStorm 10Gbps 24-port InfiniBand switch 9024-CU24-ST2

In my configuration, each dual-port Mellanox MHGH28-XTC (DDR capable) connects to my SilverStorm switch at only SDR 10Gbps speed, but I have two ports from each host. I can also increase the number of hosts connected to the switch, and use a single Subnet Manager and a single IPoIB (IP over InfiniBand) network addressing scheme. At the present time, I think this single IPoIB network addressing scheme might be what matters for the implementation of VSAN in the lab.
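As a simple illustration of that single addressing scheme (the subnet, host names and interface names below are made up for the example, not my actual values), each host gets one vmkernel interface in the same IPoIB subnet:

  • esx01: vmk1 192.168.99.11/24 on the IPoIB port group
  • esx02: vmk1 192.168.99.12/24 on the IPoIB port group
  • esx03: vmk1 192.168.99.13/24 on the IPoIB port group

Since all the vmkernel interfaces sit in one flat subnet behind the switch, VSAN and vMotion traffic can use the InfiniBand fabric without any routing.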

Below you see the IB Port Statistics with three vSphere 5.1 hosts connected (one cable per ESXi host, as I’m waiting on a 2nd batch of CX4 cables).

Silverstorm 3x SDR Links

The surprise I had when connecting to the SilverStorm 9024 switch is that it does not have a built-in Subnet Manager. But Raphael Schitz (@hypervisor_fr), with the work and help of others (William Lam & Stjepan Groš) and great tools (the ESX Community Packaging Tool by Andreas Peetz @vFrontDE), has successfully repackaged the OpenFabrics Enterprise Distribution OpenSM (Subnet Manager) so that it can be loaded on vSphere 5.0 and vSphere 5.1. This vSphere installable VIB can be found in his blog article InfiniBand@home votre homelab a 20Gbps (in French).

The link states in the screenshot above went to Active once ib-opensm was installed on the vSphere 5.1 hosts, the MTU was set, and the partitions.conf configuration file was written. Without Raphael’s ib-opensm, my InfiniBand fabric would have had no Subnet Manager and would not have passed IPoIB traffic in my lab.


Installing the InfiniBand Adapters in vSphere 5.1

Here is the process I used to install the InfiniBand drivers after adding the Host Channel Adapters. You will need three files:

  1. VMware’s Mellanox 10Gb Ethernet driver, which supports products based on the Mellanox ConnectX Ethernet adapters
  2. Mellanox’s InfiniBand OFED Driver for VMware vSphere 5.x
  3. OpenFabrics.org Enterprise Distribution’s OpenSM for VMware vSphere 5.1, packaged by Raphael Schitz

You will need to transfer these three packages to each vSphere 5.x host and install them using the esxcli command line. Before installing the VMware Mellanox ConnectX driver, you need to unzip the file, as it’s the offline bundle zip inside it that you want to supply to the ‘esxcli software vib install’ command. I push all the files via SSH into the /tmp folder. I recommend putting the host in maintenance mode, as you will need to reboot after the drivers are installed.
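For example, the transfer and the maintenance mode step can be done like this (the hostname esx01 is just a placeholder for your ESXi host):

  • scp mlx4_en-mlnx-1.6.1.2-471530.zip MLNX-OFED-ESX-1.8.1.0.zip ib-opensm-3.3.15.x86_64.vib root@esx01:/tmp/
  • esxcli system maintenanceMode set --enable true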

esxcli software vib install

The commands are

  • unzip mlx4_en-mlnx-1.6.1.2-471530.zip
  • esxcli software vib install -d /tmp/mlx4_en-mlnx-1.6.1.2-offline_bundle-471530.zip --no-sig-check
  • esxcli software vib install -d /tmp/MLNX-OFED-ESX-1.8.1.0.zip --no-sig-check
  • esxcli software vib install -v /tmp/ib-opensm-3.3.15.x86_64.vib --no-sig-check

Be careful with ib-opensm: the esxcli -d option (used for offline bundles) becomes -v for a single VIB file.
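Before rebooting, you can double-check that the packages registered with something like the following (the exact VIB names in the output may differ between driver versions):

  • esxcli software vib list | grep -i -E 'mlx|opensm'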

At this point, reboot the host. Once the host comes back up, there are two more steps: set the MTU to 4092, and configure OpenSM per adapter with the partitions.conf file.

The partitions.conf file is a simple one line file that contains the following config.

Default=0x7fff,ipoib,mtu=5:ALL=full;
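For reference, 0x7fff is the default partition key, the ipoib flag enables IP over InfiniBand on that partition, and mtu=5 is the InfiniBand MTU code for 4096 bytes (1=256, 2=512, 3=1024, 4=2048, 5=4096). If you prefer to create the file directly on the host rather than copying it over, a one-liner such as this should do the job:

  • echo 'Default=0x7fff,ipoib,mtu=5:ALL=full;' > /tmp/partitions.conf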

esxcli set IB mtu and copy partitions.conf

The commands are

  • esxcli system module parameters set -m=mlx4_core -p=mtu_4k=1
  • cp partitions.conf /scratch/opensm/adapter_1_hca/
  • cp partitions.conf /scratch/opensm/adapter_2_hca/
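To verify that the module parameter and the files are in place (a quick sketch, assuming the same /scratch/opensm paths as above):

  • esxcli system module parameters list -m mlx4_core | grep mtu_4k
  • ls /scratch/opensm/adapter_1_hca/ /scratch/opensm/adapter_2_hca/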

At this point you will be able to configure the Mellanox adapters in the vSphere Web Client (ConnectX for the MHGH28-XTC).

ESXi Network Adapter ConnectX

The vSwitch view is as follows:

vSwitch1 Dual vmnic_ib
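If you prefer the command line over the Web Client, a rough equivalent for one host looks like the sketch below. The vSwitch1 name matches my screenshots and the vmnic_ib0 uplink name follows what the OFED driver presents, but the IPoIB port group name, the vmk1 interface and the IP address are assumptions for the example:

  • esxcli network vswitch standard add -v vSwitch1
  • esxcli network vswitch standard uplink add -v vSwitch1 -u vmnic_ib0
  • esxcli network vswitch standard set -v vSwitch1 -m 4092
  • esxcli network vswitch standard portgroup add -v vSwitch1 -p IPoIB
  • esxcli network ip interface add -i vmk1 -p IPoIB -m 4092
  • esxcli network ip interface ipv4 set -i vmk1 --ipv4 192.168.99.11 --netmask 255.255.255.0 --type static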


Configure the Mellanox adapter in the vSphere Client (ConnectX-3 for the MCX354A-FCBT):

ESXi Network Adapter ConnectX3

I’m still waiting on the delivery of some QSFP cables for the ConnectX-3 adapters. That config will be done in a triangular design until I find a QDR switch at a reasonable cost.

This article wouldn’t be complete without a benchmark. Here is a screenshot I quickly took of the vCenter Server Appliance (bumped to 4 vCPUs and 22GB of RAM) being vMotioned between two hosts with SDR (10Gbps) connectivity.

vCSA 22GB vMotion at SDR speed
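If you want a quick sanity check of the IPoIB path before running a vMotion, a large non-fragmenting vmkping from one host to another is a simple test (the target IP below is a placeholder; 4064 is the 4092 MTU minus the 20-byte IP and 8-byte ICMP headers):

  • vmkping -d -s 4064 192.168.99.12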


This is where I’m going to stop for now.  Hope you enjoyed it.


  • Pingback: InfiniBand install & config for vSphere 5.5 | Erik Bussink

  • Mark Malewski

    Erik, what Mellanox firmware version are you using on the MHGH28-XTC Infiniband cards, and also what software/firmware version are you using on the SilverStorm 9024 switch? I have the exact same hardware, and I’m getting ready to set it up. Do you happen to have a blog post about how you setup/configured your SilverStorm 9024? Did you need to update the firmware on it? I just want to make sure that I’m using the same firmware versions that you are using, so I don’t run into any problems. (I am trying to get a vSphere 5.5 home lab setup similar/identical to yours). Can you please post what firmware versions you are using for the Infiniband cards and switch? Thank-you!

    • You will find a link to the latest firmware for the SilverStorm 9024 in this article https://www.bussink.ch/?p=1410

      I have not checked the firmware of my HCA cards, nor upgraded them. I used them as is, and they work for me.

  • degdeg

    Hi Erik. I don’t understand why you use 6 cards? 3 MHGH28-XTC and 3 MCX354A-FCBT CX354A. Can I have the lab with 3 cards only. I will buy the same INFINIBAND switch. What is you storage ? Nexenta ? Thank you and sorry for my english

    • Deg,
      The three CX354A were an impulse buy at a great price. They are not used yet in my lab. I will need to purchase a QDR or FDR switch.

      You can start a lab with only three MHGH28-XTC and a cheap InfiniBand switch. You need the switch as you cannot build a triangle config.

    • Deg, check the slides from the London VMUG, you will have more details.
      https://www.bussink.ch/?p=1402

    • I was using a Nexenta, but it died on me today. I didn’t get the time to play with IB connectivity in Nexenta, as I focused on VSAN with InfiniBand.

  • degdeg

    Thanks Erik 🙂

  • Pete

    After setting 4k on the command line, did you set the vSwitch and the respective vmkernel both to 4092 as well? If so, did it go through successfully?

  • AlexMercer

    Hi,

    thanks for the great tutorial! 🙂 I’ve just discovered how cost-effective and awesome infiniband could be. But I ran into an odd problem : Can’t seem to set the MTU higher than 2044 and transmit at high speeds. At MTU=2044 I get [ 4] local 192.168.13.36 port 5001 connected with 192.168.13.37 port 53336
    [ ID] Interval Transfer Bandwidth
    [ 4] 0.0-10.0 sec 5.03 GBytes 4.32 Gbits/sec
    *measured with iperf by hypervisor.fr
    When I set the MTU higher than 2044 all I get is poor performance, clocking at ~800Mbits/s . And if I try to set the MTU = 4092, the Vsphere complains and doesn’t let me do it, because the infiniband vmnic doesn’t allow it, which is odd, since I’ve set up the mtu_4k parameter to true.
    Has somebody run into a similar problem and what was the workaround.
    Thanks in advance!
    🙂
    Alex

    • Alex, at this point I know of no one who has been able to get 4K MTU working on vSphere 5.5. It worked on vSphere 5.1 (if you had a switch capable of handling 4K MTU), but not on vSphere 5.5.

  • AlexMercer

    Thanks for the fast reply! 🙂
    So if I can’t transmit jumbo frames > 4k, how do I get the full capability of the infiniband fabric ? The speed between 2 ESXi hosts ( running on 5.1 ) is ~5Gbit/s, shouldn’t it be more like 8-9Gbit/s. I am using a HP 4x DDR IB Siwtch Module ( by specification it is capable of transmitting = 4k MTUs ) and the adapters are seen as 20Gbit vnics. But still I get an error message when trying to set up the mtu=4k. Can it be that the adapters themselves aren’t capable of handling such big jumbo frames ?
    I am sorry for bothering you and your readers with the problems I am encountering, but maybe it might be helpful for somebody with the same infiniband/vmware issues 🙂

    I also tried to use the infiniband fabric for virtual machine traffic and the results were really poor, while I am seeing 5Gbits between ESXi hosts, I get a measly 200Mbits between VMs on the same ESXi hosts using the infiniband link. Any ideas what’s causing this problem ?
    Thanks in advance!

    • Looks like some of the InfiniBand switches and drivers are stuck at an MTU of 2044.

  • Mark

    Great article Erik! I’m looking at setting up high speed network connections between my VMware ESXi 5.5 host and Hyper-V 2012 R2 host on my home network. Mostly for VM migration testing from one platform to the other. Seeing as you wrote this article over a year ago, I’m curious as to what hardware you would recommend now to achieve this. Thanks!

    • If you only have two hosts, I would recommend the Mellanox ConnectX-3 VPI adapter and a QSFP cable.

  • OLDo

    Hi Erik, great write up, thank you! It should hopefully help me on my way to setting up my own IB lab. I wonder if you can tell me how loud the 9024-CU24-ST2 switches are? Any idea if they can be run fanless, or at least if the fans could be switched out for something like a 40mm Noctua NF-A4x10?