InfiniBand install & config for vSphere 5.5

A follow-up to my adventures with InfiniBand in the lab... and the vBrownBag Tech Talk about InfiniBand in the lab that I gave at VMworld 2013 in Barcelona.

 

In this post I will cover how to install the InfiniBand drivers and the various protocols in vSphere 5.5. This post and the commands below are only applicable if you are not using Mellanox ConnectX-3 VPI Host Channel Adapters, or if you do not have an InfiniBand switch with a hardware-integrated Subnet Manager. Mellanox states that the ConnectX-3 VPI should allow normal IP over InfiniBand (IPoIB) connectivity with the default 1.9.7 drivers on the ESXi 5.5.0 installation CD-ROM.

This post will be most useful to people with one of the following configurations:

  • Two ESXi 5.5 hosts with direct InfiniBand host-to-host connectivity (no InfiniBand switch)
  • Two/three ESXi 5.5 hosts with InfiniBand host-to-storage connectivity (no InfiniBand switch, and a storage array like Nexenta Community Edition)
  • Multiple ESXi 5.5 hosts with an InfiniBand switch that doesn't have a Subnet Manager

The installation in these configurations has only been possible since early this morning (October 22nd at 00:08 CET), when Raphael Schitz (@hypervisor_fr) released an updated version of OpenSM, 3.3.16-64, compiled in 64-bit for use on vSphere 5.5 and vSphere 5.1.

First things first… let’s rip the Mellanox 1.9.7 drivers from a new ESXi 5.5.0 install

 

Removing Mellanox 1.9.7 drivers from ESXi 5.5

Yes, the first step to getting IP over InfiniBand (for VMkernel adapters like vMotion or VSAN) or the SCSI RDMA Protocol (SRP) is to remove the new Mellanox 1.9.7 drivers from the newly installed ESXi 5.5.0. These drivers don't work with the older Mellanox OFED 1.8.2 package, and the new OFED 2.0 package is still pending… Let's cross our fingers for an early 2014 release.

You need to connect to your ESXi 5.5 host using SSH and run the following command; you will then need to reboot the host for the driver to be removed from memory.

  • esxcli software vib remove -n=net-mlx4-en -n=net-mlx4-core
  • reboot the ESXi host

esxcli software vib remove
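
To confirm the removal, you can list whatever Mellanox VIBs remain once the host is back up; after a successful removal and reboot this should return nothing:

  • esxcli software vib list | grep mlx4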

 

Installing the Mellanox 1.6.1 drivers, OFED and OpenSM

After the reboot you will need to download the following files and copy them to /tmp on the ESXi 5.5 host:

  1. VMware ESXi 5.0 Driver 1.6.1 for Mellanox ConnectX Ethernet Adapters (Requires myVMware login)
  2. Mellanox InfiniBand OFED 1.8.2 Driver for VMware vSphere 5.x
  3. OpenFabrics.org Enterprise Distribution’s OpenSM 3.3.16-64 for VMware vSphere 5.5 (x86_64) packaged by Raphael Schitz
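
A simple way to get the files onto the host is scp from your workstation, assuming SSH is enabled on the host (the esxi55 hostname below is just a placeholder for your host's name or IP):

  • scp mlx4_en-mlnx-1.6.1.2-471530.zip MLNX-OFED-ESX-1.8.2.0.zip ib-opensm-3.3.16-64.x86_64.vib root@esxi55:/tmp/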

Once the files are in /tmp (or on shared storage, if you want to keep a copy there), you will need to unzip the Mellanox 1.6.1 driver file. Be careful with ib-opensm-3.3.16-64: it ships as a single VIB rather than an offline bundle, so the esxcli -d flag becomes a -v during the install. The other change since vSphere 5.1 is that we need to set the esxcli software acceptance level to CommunitySupported in order to install some of the drivers and binaries.

The commands are:

  • unzip mlx4_en-mlnx-1.6.1.2-471530.zip
  • esxcli software acceptance set --level=CommunitySupported
  • esxcli software vib install -d /tmp/mlx4_en-mlnx-1.6.1.2-offline_bundle-471530.zip --no-sig-check
  • esxcli software vib install -d /tmp/MLNX-OFED-ESX-1.8.2.0.zip --no-sig-check
  • esxcli software vib install -v /tmp/ib-opensm-3.3.16-64.x86_64.vib --no-sig-check
  • reboot the ESXi host

esxcli software vib install infiniband
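
Once the host is back up, it is worth a quick sanity check that all three packages are present and that the acceptance level change stuck:

  • esxcli software vib list | grep -E 'mlx4|opensm'
  • esxcli software acceptance get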

 

Setting MTU and Configuring OpenSM

After the reboot there are a few more commands to run.

  • esxcli system module parameters set -m=mlx4_core -p=mtu_4k=1
  • copy partitions.conf /scratch/opensm/<adapter_1_hca>/
  • copy partitions.conf /scratch/opensm/<adapter_2_hca>/

The partitions.conf file only contains the following text:

  • Default=0x7fff,ipoib,mtu=5:ALL=full;

 

cp partitions.conf
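
For reference, here is roughly what that sequence looks like in the shell. The GUID directory names below are placeholders; list /scratch/opensm/ to see the directories created for your HCA ports, and note that the mtu_4k setting can be verified through esxcli:

  • esxcli system module parameters list -m=mlx4_core | grep mtu_4k
  • ls /scratch/opensm/
  • cp /tmp/partitions.conf /scratch/opensm/0x0002c903000xxxx1/
  • cp /tmp/partitions.conf /scratch/opensm/0x0002c903000xxxx2/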

I recommend that you check the state of your InfiniBand adapters (mlx4_0) using the following command:

  • ./opt/opensm/bin/ibstat mlx4_0

ibstat mlx4_0

I also recommend that you write down the adapter HCA Port GUID numbers if you are going to use the SCSI RDMA Protocol between the ESXi host and a storage array; they will come in handy later (and in an upcoming post).
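
If you just want the GUIDs, you can filter the ibstat output, since each port reports its own Port GUID line:

  • ./opt/opensm/bin/ibstat mlx4_0 | grep "Port GUID"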

Now you are ready to add the new adapters to a vSwitch/dvSwitch and create the VMkernel adapters. Here is the current config for vMotion, VSAN and Fault Tolerance on dual 20Gbps IB adapters (which only cost $50!); a command-line sketch follows the screenshot.

vSwitch1 with IB VMkernels
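
For those who prefer the shell, here is a minimal sketch of one such VMkernel adapter. The uplink name vmnic_ib0, the portgroup name and the IP address are assumptions for illustration; run esxcli network nic list to see the actual IPoIB uplink names on your host:

  • esxcli network vswitch standard add -v vSwitch1
  • esxcli network vswitch standard uplink add -v vSwitch1 -u vmnic_ib0
  • esxcli network vswitch standard portgroup add -v vSwitch1 -p IB-vMotion
  • esxcli network ip interface add -i vmk1 -p IB-vMotion
  • esxcli network ip interface ipv4 set -i vmk1 -t static -I 10.10.10.11 -N 255.255.255.0

Tagging the VMkernel interface for vMotion, VSAN or Fault Tolerance can then be done in the vSphere Web Client as usual.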

I aim to put the various VMkernel traffic types in their own VLANs, but I still need to dig into the partitions.conf file.

 

If you have an older switch that does not support an MTU of 4K, make sure you set your vSwitch/dvSwitch to an MTU of 2044 (2048 minus 4 bytes) and do the same for the various VMkernel interfaces; the matching esxcli commands are shown below the screenshot.

VMkernel MTU at 2044
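
The same MTU can be applied from the shell; vSwitch1 and vmk1 below are the names from the earlier sketch:

  • esxcli network vswitch standard set -v vSwitch1 -m 2044
  • esxcli network ip interface set -i vmk1 -m 2044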

 

 

Here is a quick glossary of the various protocols that can use the InfiniBand fabric.

What is IPoIB?

IPoIB (IP-over-InfiniBand) is a protocol that defines how to send IP packets over IB; Linux, for example, has an "ib_ipoib" driver that implements this protocol. This driver creates a network interface for each InfiniBand port on the system, which makes a Host Channel Adapter (HCA) act like an ordinary Network Interface Card (NIC).

IPoIB does not make full use of the HCA's capabilities; network traffic goes through the normal IP stack, which means a system call is required for every message, and the host CPU must handle breaking data up into packets, etc. However, it does mean that applications that use normal IP sockets will work on top of IB (although the CPU will probably not be able to run the IP stack fast enough to use a 32 Gb/sec QDR IB link at full speed).

Since IPoIB provides a normal IP NIC interface, one can run TCP (or UDP) sockets on top of it. TCP throughput well over 10 Gb/sec is possible using recent systems, but this will burn a fair amount of CPU.
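
To make this concrete on the storage side, bringing up an IPoIB interface on a Linux target is typically just a couple of commands (assuming the ib_ipoib module names the first port ib0; the address is only an example):

  • modprobe ib_ipoib
  • ip addr add 10.10.10.21/24 dev ib0
  • ip link set ib0 up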

 

What is SRP?

The SCSI RDMA Protocol (SRP) is a protocol that allows one computer to access SCSI devices attached to another computer via remote direct memory access (RDMA). The SRP protocol is also known as the SCSI Remote Protocol. The use of RDMA makes higher throughput and lower latency possible than what is achievable through, for example, the TCP/IP communication protocol. RDMA is only possible with network adapters that support RDMA in hardware, such as InfiniBand HCAs and 10 GbE network adapters with iWARP support. While the SRP protocol has been designed to use RDMA networks efficiently, it is also possible to implement it over networks that do not support RDMA.

As with the iSCSI Extensions for RDMA (iSER) communication protocol, there is the notion of a target (a system that stores the data) and an initiator (a client accessing the target), with the target performing the actual data movement. In other words, when a user writes to a target, the target actually executes a read from the initiator, and when a user issues a read, the target executes a write to the initiator.

While the SRP protocol is easier to implement than the iSER protocol, iSER offers more management functionality, e.g. the target discovery infrastructure enabled by the iSCSI protocol. Furthermore, the SRP protocol never made it into an official standard. The latest draft of the SRP protocol, revision 16a, dates from July 3, 2002.

 

What is iSER?

The iSCSI Extensions for RDMA (iSER) is a computer network protocol that extends the Internet Small Computer System Interface (iSCSI) protocol (which connects computers to storage devices) to use Remote Direct Memory Access (RDMA). Typically RDMA is provided by either the Transmission Control Protocol (TCP) with RDMA services (iWARP) or InfiniBand. It permits data to be transferred directly into and out of SCSI computer memory buffers without intermediate data copies.

The motivation for iSER is to use RDMA to avoid unnecessary data copying on the target and initiator. The Datamover Architecture (DA) defines an abstract model in which the movement of data between iSCSI end nodes is logically separated from the rest of the iSCSI protocol; iSER is one Datamover protocol. The interface between iSCSI and a Datamover protocol, iSER in this case, is called the Datamover Interface (DI).

 

  • raphael schitz

    Nice post Erik. One could also use PowerCLI instead of SSH to uninstall/install VIBs, following this post: http://www.hypervisor.fr/?p=3972

    • Thanks, Raphael. Without you it would not have been possible.

  • Very nice. Just waiting for my InfiniBand switch (eBay price with shipping: $240) to build my VSAN cluster. For now I'm using Raphael's VIB to run a triangle (without a switch) – 2 ESXi 5.5 hosts and a physical Nexenta box.

  • Pingback: HA Configuration error following VIB install. | Erik Bussink

  • Cody Weaver

    This excited me enough that I wanted to try it myself in my lab (I was running 4.0 for the longest time). I'm using InfiniHost III Ex cards (no subnet manager, only a host connected to a storage array running Solaris 11), and I noticed that once I installed all the VIBs, there was no /scratch/opensm folder available at all, so no adapters were listed. Maybe that means it's not detecting my HCAs? They show in lspci, so I'm not sure what is going on there, or maybe I'm just missing something.

    • Which version of the ib-opensm did you install?

      The latest ones from Raphael can be found at http://www.hypervisor.fr/?p=4662

      • Cody Weaver

        So I'm back after way too long! The hardware stayed the same, but I had to go on a long migration project and came back with some old ConnectX VPI InfiniBand cards for free that I've swapped in (from SDR -> DDR, go me).

        BUT, I can say that my attempts over the last couple of days to implement this also failed on my ESX host.

        ESX can fully see the card when trying to configure it for passthrough, but while the VIBs install correctly, it doesn’t appear to help me.

        Stepping through my process: unzipping the drivers works fine, and changing my host level (to CommunitySupported) is perfectly fine. There is also no issue installing all those VIBs. Upon reboot (this may mess up the formatting), I'm not getting anything useful.

        I'm told to copy partitions.conf to the opensm scratch folder, but there is no such folder:

        /opt/opensm/bin # cd /scratch/
        /vmfs/volumes/524026b4-849cd5c4-358f-0025902e0a98/.locker # ls
        core downloads log var vsantraces

        There is an opt/opensm/bin folder but I can’t execute ibstat like you ask because it states it can’t find an adapter:

        ~ # ./opt/opensm/bin/ibstat mlx4_0
        ibpanic: [62742] main: 'mlx4_0' IB device can't be found: No such file or directory

        Which is odd because lspci notes that it's there, and even the kernel has given it a vmnic name (though the NIC is nowhere to be found in vCenter)

        0000:06:00.0 Serial bus controller: Mellanox Technologies MT25418 [ConnectX VPI – 10GigE / IB DDR, PCIe 2.0 2.5GT/s] [vmnic4]

        There is a direct connection between my storage array and my ESX host with a single InfiniBand cable for testing. There are currently no lights on either card. I'm not sure why I keep having bad luck with these things 😀

        I'm kind of stumped on what to do at this point, but I do intend to keep playing with it.

        Oddly enough, when double checking that my acceptance level was Community Supported, I found this (this was with all the VIBs installed):

        /tmp # esxcli software acceptance set --level=CommunitySupported
        [AcceptanceConfigError]
        Unable to set acceptance level of community due to installed VIBs Intel_bootbank_ib-opensm_3.3.16-64 having a lower acceptance level.
        Please refer to the log file for more details.

        Odd, given that CommunitySupported is as low as it gets! 😀

        I checked esxupdate.log, and it didn't have any better information for me; it even notes that it's going from community to community, so I have no idea what is up with that error:

        2014-02-11T14:31:08Z esxupdate: HostImage: INFO: Attempting to change the host acceptance level from community to community
        2014-02-11T14:31:08Z esxupdate: root: ERROR: Traceback (most recent call last):
        2014-02-11T14:31:08Z esxupdate: root: ERROR: File "/usr/lib/vmware/esxcli-software", line 441, in
        2014-02-11T14:31:08Z esxupdate: root: ERROR: main()
        2014-02-11T14:31:08Z esxupdate: root: ERROR: File "/usr/lib/vmware/esxcli-software", line 432, in main
        2014-02-11T14:31:08Z esxupdate: root: ERROR: ret = CMDTABLE[command](options)
        2014-02-11T14:31:08Z esxupdate: root: ERROR: File "/usr/lib/vmware/esxcli-software", line 141, in AcceptanceSetCmd
        2014-02-11T14:31:08Z esxupdate: root: ERROR: h.SetHostAcceptance(ACCEPTANCE_INPUT[opts.level])
        2014-02-11T14:31:08Z esxupdate: root: ERROR: File "/build/mts/release/bora-1331820/bora/build/esx/release/vmvisor/sys-boot/lib/python2.6/site-packages/vmware/esximage/HostImage.py", line 397, in SetHostAcceptance
        2014-02-11T14:31:08Z esxupdate: root: ERROR: AcceptanceConfigError: Unable to set acceptance level of community due to installed VIBs Intel_bootbank_ib-opensm_3.3.16-64 having a lower acceptance level.

        InfiniBand has been hard labor with no fruit so far; I hope I can get this figured out! 🙂

        • Cody Weaver

          So to conclude my own saga (that only took 5 months, eh?), the lesson to be learned from this? Update your firmware. My IB cards were running FW version 2.5. I went to the Mellanox website and downloaded the latest 2.9 FW (released in 2011), then passed the IB card into a Windows virtual machine. I burned the FW using flint (part of the MST tools), then reversed everything by disassociating the card from the VM, removing it from passthrough, and rebooting the host. The card was immediately recognized, and OpenSM created the config directories I had complained about right away.

          So again, if anyone in the future comes across this, just remember to update your firmware first before pulling your hair out.

          Sincerely,

          Cody Weaver

        • Hiya Cody,
          Can you tell me one thing please, which I cannot gather from the replies you had here: which version of the OFED are you running on your ESXi hosts?

          • Cody Weaver

            Hi Erik,

            Thanks for the response! For both hosts, I continued to use the OFED driver zip version laid out in this article, MLNX-OFED-ESX-1.8.2.0.zip. The only difference between this article's files and my hosts is that I'm using ib-opensm-x64-3.3.15-6.x86_64.vib for opensm, as posted in http://www.hypervisor.fr/?p=4662. I have used ib-opensm-3.3.16-64.x86_64.vib as well in the past while trying to get everything to work.

      • Cody Weaver

        Another night of working on this, and while I continue to get closer, full resolution is always just out of reach. I'm still using the FW 2.9 on my ConnectX VPI cards, as the 2.5 that was on before was the reason why the cards never appeared. I am still using a direct-connect topology between 2 ESXi 5.5 hosts. I also moved to the latest .15 driver recommended instead of the .16 driver that is in this post. The cards have appeared, the modules are installing fine, and OpenSM sees the dual ports, but the link state never gets past Initializing. I've tried having OpenSM on both hosts, then on one host; I've tried turning off 4K support (that's when I get the correct "rate" of 20 like you have, instead of 8), and all sorts of little things I've probably forgotten. Can you suggest anything I might be missing? This appears at this point to be an OpenSM issue for me. The following is what my ports are showing (I'm only connecting one right now):

        ESX Host 1:

        ~ # ./opt/opensm/bin/ibstat mlx4_0
        CA 'mlx4_0'
        CA type: MT25418
        Number of ports: 2
        Firmware version: 2.9.1000
        Hardware version: a0
        Node GUID: 0x0008f104039a18d9
        System image GUID: 0x0008f104039a18dc
        Port 1:
        State: Initializing
        Physical state: LinkUp
        Rate: 20
        Base lid: 1
        LMC: 0
        SM lid: 1
        Capability mask: 0x0251086a
        Port GUID: 0x0008f104039a18da
        Link layer: InfiniBand
        Port 2:
        State: Down
        Physical state: Polling
        Rate: 28
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x0251086a
        Port GUID: 0x0008f104039a18db
        Link layer: InfiniBand

        ESX Host 2:

        ~ # ./opt/opensm/bin/ibstat mlx4_0
        CA 'mlx4_0'
        CA type: MT25418
        Number of ports: 2
        Firmware version: 2.9.1000
        Hardware version: a0
        Node GUID: 0x0008f104039a1795
        System image GUID: 0x0008f104039a1798
        Port 1:
        State: Initializing
        Physical state: LinkUp
        Rate: 20
        Base lid: 1
        LMC: 0
        SM lid: 1
        Capability mask: 0x0251086a
        Port GUID: 0x0008f104039a1796
        Link layer: InfiniBand
        Port 2:
        State: Down
        Physical state: Polling
        Rate: 28
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x0251086a
        Port GUID: 0x0008f104039a1797
        Link layer: InfiniBand

        Thanks again for being that inspiration that’s gotten me this far!

        • Richie

          I had a similar issue with ConnectX cards running the latest firmware, 2.9.1000.
          I downgraded the firmware to the oldest I could find, 2.7.000, and the connection state went to active, which fixed the problem.

  • Matt Mabis

    Hey Erik,

    Your post was very helpful for me in getting my lab up... I have a few questions I was hoping you might be able to answer.

    Using a direct connect, I seem to get a massive amount of these errors in the logs.

    Dec 02 00:51:22 143881 [F44AF700] 0x01 -> osm_vendor_send: ERR 5430: Send p_madw = 0x32015df0 of size 256 TID 0x51000024b3 failed -5 (Invalid argument)
    Dec 02 00:51:22 143889 [F44AF700] 0x01 -> vl15_send_mad: ERR 3E03: MAD send failed (IB_UNKNOWN_ERROR)

    also

    Dec 02 00:56:57 396478 [F4632700] 0x01 -> Received SMP on a 1 hop path: Initial path = 0,1, Return path = 0,0
    Dec 02 00:56:57 396487 [F4632700] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT): SubnGet(SMInfo), attr_mod 0x0, TID 0x1cb7
    Dec 02 00:56:57 396493 [F4632700] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3120 Timeout while getting attribute 0x20 (SMInfo); Possible mis-set mkey?
    Dec 02 00:57:02 400842 [F44AF700] 0x01 -> osm_vendor_send: ERR 5430: Send p_madw = 0x32015ed0 of size 256 TID 0x510000255d failed -5 (Invalid argument)
    Dec 02 00:57:02 400864 [F44AF700] 0x01 -> vl15_send_mad: ERR 3E03: MAD send failed (IB_UNKNOWN_ERROR)
    Dec 02 00:57:07 405503 [F4632700] 0x01 -> log_send_error: ERR 5411: DR SMP Send completed with error (IB_TIMEOUT) — dropping
    Method 0x1, Attr 0x20, TID 0x1cb8

    I am using ESXi 5.5 with OpenSM 3.3.16-64 as per your guidelines here (on both hosts). I also tried the 3.3.15-64 (rebuilt recently) version, and I was getting massive multicast errors and the links would go up and down a lot… (the 3.3.16 is at least somewhat more stable on the links)

    These errors happen consistently, and I don't think they should; I'm wondering if you have an idea what they are about?

    I am using Connect-X cards (dual port, the cheaper models) with CX4 cables as a crossover between 2 hosts… I would like to have a triangle eventually with my 3 hosts and wait for a good price on an IB switch that does DDR.

    • Richard

      You have a small typo in one of the commands above where you have a single-dash and double-equals, but they should be reversed. I.e., to set the acceptance level the correct command should be:
      esxcli software acceptance set --level=CommunitySupported

      (This was meant to be in response to Erik, not Matt)

    • Matt,

      "I would like to have a triangle eventually with my 3 hosts and wait for a good price on an IB switch that does DDR"

      I got a Voltaire ISR 9024D (DDR) switch from China for $300 + $100 shipping. It doesn’t have a SubnetManager in hardware. But it works great. Considering the price, it was a bargain.

      The Voltaire 9024D is different from the Voltaire 9024D-M. The 9024D-M has a management interface (but I don't know more details about the Management and/or SubnetManager). But because of the management, it has an integrated CPU with cooling, which cannot be turned off.
      There are rumours that you can disable the fans in the 9024D without much impact on stability. So that is the reason I decided to go with the Voltaire 9024D instead of the Voltaire 9024D-M.

      Check this out for a Voltaire ISR 9024D
      http://www.ebay.com/itm/Voltaire-ISR-9024-ISR9024-ISR9024D-24-port-Grid-Infiniband-SDR-DDR-Switch-10GB-/151212611366?pt=COMP_EN_Servers&hash=item2334f95726

      I think if the device works (in good shape and without a technical malfunction) it's a real bargain.

      • Matt Mabis

        I ended up getting a Flextronics FX430066 for free from a guy and replaced its fans (SUPER QUIET); that didn't resolve the issue I was having with vSphere 5.5, as it turned out I was hitting another issue. I have been working with Raphael (Hypervisor.fr); he has built a few new revisions of the subnet manager, and so far the latest build has proven VERY STABLE with vSAN. We are still testing the other pieces, like a node going offline and rebooting functionality.

        Hopefully Raphael will post that revision soon, as it has a lot more verbose logging that talks about it programming the IB switch. Even in the triangle network I was having issues (vSAN doesn't work in a triangle network, because all hosts need to be on the same subnet!) where the ports would flip/flop up and down in the logs. Raphael said he wasn't seeing that issue with his latest build.

        My switch required me to run at MTU 2044, but I got the switch for free! Hope this helps update you with what I know!

        • I’m also running the MTU at 2044 with my two switches.

          • Matt Mabis

            Raphael told me he was going to post the new build of the IB-OpenSM software to his site soon, as it has yielded more stable use of the subnet manager.

          • Matt, I did test this release with Raphael. It works fine for vMotion type of traffic, but with VSAN, following the ib-opensm shutdown & recovery it broke the VSAN sync.
            I will test (time permitting) the ib-opensm builds from Raphael.

          • Matt Mabis

            I noticed the same thing; the best way to fix it is to throw the VMkernels onto a NIC and then back onto the Mellanox adapters, and it works… Something with the multicast scope, methinks…

        • tbuehn

          Slightly OT, but what fans did you use for the Flextronics FX430066? I have 2 that I'm trying to quiet down. They are louder than the DL160s they are attached to!

          • Matt Mabis

            If I recall correctly you will have to move the pins appropriately, but these are the fans I got.

            http://www.amazon.com/gp/product/B006ODM76C/ref=oh_details_o05_s00_i00?ie=UTF8&psc=1

          • tbuehn

            I ordered one of these and plugged it in, but the fan won't turn on. The stock fans all run, but LOUDLY; looking at them further, they are rated @ 3.3v whereas this fan is a 12v one. I tested the fan on a motherboard and it spins fine, so perhaps a voltage issue? Did you need to modify the fan at all?

          • Matt Mabis

            Hey tbuehn, as I mentioned above, you have to rearrange the pins on the fan to match the pins on the old fan. Common colors like power and negative should be easy to spot; then just place the sensor pin in the right spot as well and you should get it working…

            Yours were rated at 3.3 volts??? That doesn't seem right… Mine had 12v ones in them when I replaced them.

          • TBuehn

            Yep, that did it 🙂 I had to change the wiring as you mentioned (and DID mention in your original reply; my apologies) and then the fan spun up. It is notably quieter now! More fans on order. Thank you! 😀 These are the fans that came in it (and in the other Flextronics FX430066 I have, which came from a different source), so I'm guessing they are stock, but I couldn't say with certainty. SUPER quiet, but at the expense of reduced airflow. I'll take that tradeoff 🙂 Thanks again!

  • Pingback: Upgrade to ESXi 5.5 with C6100 and Mellanox

  • Fred O Frog

    I have some more information from Mellanox on the ESXi support. Firstly, the 1.8.2 drivers are separate from the 1.9.8 drivers. The former are for InfiniBand and the latter are for Ethernet. They are currently incompatible. You must run every port on every Mellanox card in an ESXi box as either IB or Ethernet (you cannot mix, i.e. VPI is not supported under ESXi). VPI support under ESXi is currently a low priority and won't be happening in an upcoming driver at this stage unless a strong business case warrants it. iSER will not be included in the upcoming 1.9.9 driver scheduled for release this year; however, it is scheduled to be included in the upcoming Ethernet driver 1.9.10, which is scheduled to be released in early 2014.

    • Thanks a million for sharing this. I didn't know the details about the status of the various drivers from Mellanox.

  • Steve Furniss

    What MTU settings should I use with an MT25418 adapter? Should the MTU be set at the command line and also on the vSwitch and VMkernel port?

    I am currently trying to use this in a point-to-point setup with two ESXi 5.5 hosts.

    Thanks for writing this blog post… it has great information that is really useful.

    • If you are using vSphere 5.5, I think for now you are going to be limited to a 2048 MTU. I don't know why yet, but we can't seem to get the 4K MTU to work under ESXi 5.5.

  • Saddust

    This article is great! I have 3 ESXi hosts equipped with 10GbE Mellanox NICs; one is now running vSphere 5.5, but the 10GbE NICs are not seen anymore, so it is impossible to carry on with this upgrade! Anyway, I learned some useful things by reading this.
    I will try this ASAP and let you know.

  • Phorkus Maximus

    Wonderful article, Erik! I wanted to chime in and mention that if you want to uninstall the 1.6.1.2 version of the mlx4_en driver, you will need to do a little magic to fix things up when you reinstall the "depot" edition of the drivers. The magic commands are:

    esxcli system module parameters set -m mlx4_core -p debug_level=0
    esxcli system module parameters set -m mlx4_core -p msi_x=1

    The msi_x and debug_level parameters are invalid going from 1.6.1.2 to the 1.9.7 version of the driver. The above corrects the parameters that cause issues when trying to automatically load the mlx4_core and mlx4_en drivers after reinstall. Cheers!

  • Monty

    Thanks for the great article… very useful. Although when following it line for line in my own setup I seem to be running into problems (I'm guessing it's because I am not using a switch). Essentially I am finding that pings between my 3 hosts randomly stop working.

    Additionally, I have an OmniOS box where I created an SRP target. This initially worked as expected; it was just picked up, no dramas. However it has now randomly stopped working altogether too.

    Any ideas/suggestions as to what to try/not to try would be welcome.

    • Monty

      I managed to sort out the SRP storage; it was simply down to not having the same IB port in use after I removed the cables to ensure they were working. So I am 50% of the way through this; I just cannot get IP connectivity working between my 2 x vSphere 5.5 hosts.

  • Rain

    Can anyone confirm that the older ConnectX cards (not -2 or -3, just ConnectX) still work with ESXi 5.5? You probably have to use a much lower MTU (2K) as others have reported with 5.1, but they should work, right?

    Better yet: is anyone actively using the older ConnectX(-1) cards with ESXi 5.5? Anything I should be aware of before I purchase a few used cards and follow the above steps to get OpenSM working?

    • I'm using those as a reference (bought used on eBay), but it seems that those are the "4x"….. HP 452372-001 InfiniBand PCI-E 4X DDR. vSphere 5.5 U1 with ESXi 1881737 as the latest build, and MTU 2044 with a Mellanox IB switch. And recently I flashed my Dell PERC H310, which originally provides a queue depth of only 25, with IT firmware to get a queue depth of 600 :-). http://www.vladan.fr/flash-dell-perc-h310-with-it-firmware/

      • Rain

        Vladan, thanks for the quick response! I really appreciate it!

        Some quick research leads me to believe that the HP 452372-001 cards are rebranded Mellanox MHGH28-XTC cards. Can you confirm? The MHGH28-XTC cards are what I’m specifically interested in because of a recent eBay listing. I’d assume that the firmware is the same on the HP cards you’re using, so the MHGH28-XTC’s should work just fine as well.

        I’m not sure exactly what you mean about them being “4x” cards though. What does the “4x” part of the model name mean exactly?

        • The exact reference of those cards is:
          HP 452372-001 InfiniBand PCI-E 4X DDR Dual Port Storage Host Channel Adapter HCA (3 pieces – £26 each on eBay)
          The "4x" – forget it. I got confused.

          I haven't flashed them with other firmware. To be honest, I'm glad that they just work. There is a discussion on the Mellanox forum about two models similar to the one I use. I think it might be of interest to you: http://community.mellanox.com/thread/1178

          • Rain

            Thanks for all the information! Judging by that discussion on the Mellanox forum I think I’ll be fine with the MHGH28-XTC cards.

  • Антон Бородин

    Well, I have four ESXi 5.5 hosts with onboard MT26428 adapters. I also bought an IS5022 switch and MC2206130-00A cables. However, the indicators on the switch ports are off, and the vSphere Client app also shows a lack of connection (see image in attachment). What could be the problem?

    • What software stack (OFED) did you install?
      Did you create the partitions.conf file for each adapter?
      Is there anything you need to do on the IS5022?
      I don't have an InfiniBand environment running anymore, so I'm just helping based on my knowledge from last year.

      • Антон Бородин

        As I understand it, I have not installed an OFED stack. Also, tell me whether I understand correctly that VPI adapters are initialized by the driver? I.e., the currently installed driver initializes the adapter in 10GbE mode, and therefore the switch does not see the link with this adapter?

      • Антон Бородин

        So, I installed MLNX-OFED-ESX-1.9.10.0-10EM-550.0.0.1331820. Here is the output of ibstat:
        CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 1
        Firmware version: 2.9.1000
        Hardware version: b0
        Node GUID: 0x5404a64a13730000
        System image GUID: 0x5404a64a13730003
        Port 1:
        State: Down
        Physical state: Disabled
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x00010000
        Port GUID: 0x5604a6fffe4a1373
        Link layer: Ethernet
        How can I change the link layer from Ethernet to InfiniBand?

        I also created the partitions.conf file, but after a reboot it disappears from /scratch/opensm/0x5604a6fffe4a1373

        Perhaps it is because I boot from USB?

  • Pingback: Scrap Lab - Home lab for the cheap - Michael Ryom

  • Alex

    Hi,

    I have a question – I have 4 nodes connected together, all of them running ESXi 5.1. Everything worked fine with the software SM (well, not everything – according to iperf I was getting around 5Gbps between the nodes, and I was expecting ~8-9Gbps). I thought that the software SM was slowing me down, so I introduced a Topspin 120 switch to the topology. But I haven't been able to make it work. The only configuration the switch has is: ib sm subnet-prefix fe:80:00:00:00:00:00:00 priority 0. So my question is – has anybody been able to connect together a topology with an InfiniBand switch in it? I would be glad if somebody shared their experience. I thought that it would be pretty much plug'n'play, but this hasn't been the case so far.

    Thanks in advance.
    And thanks to Erik for this great blog 🙂

  • Audi Chris

    Does this apply to VMware 6.0?

    [root@ESXi-1:~] esxcli system module paramters set -m=mlx4_core -p=mtu_4k=1

    Error: Unknown command or namespace system module paramters set

    ———————–
    [root@ESXi-1:/vmfs/volumes/557663ce-e4c6a5b6-df89-2047477d7c30] esxcli software vib list|grep mlx4

    net-mlx4-core 1.8.2.4-1OEM.500.0.0.472560 Mellanox PartnerSupported 2015-06-24

    net-mlx4-en 1.6.1.2-1OEM.500.0.0.406165 Mellanox VMwareCertified 2015-06-24

    net-mlx4-ib 1.8.2.4-1OEM.500.0.0.472560 Mellanox PartnerSupported 2015-06-24

    nmlx4-core 3.0.0.0-1vmw.600.0.0.2494585 VMware VMwareCertified 2015-06-09

    nmlx4-en 3.0.0.0-1vmw.600.0.0.2494585 VMware VMwareCertified 2015-06-09

    nmlx4-rdma 3.0.0.0-1vmw.600.0.0.2494585 VMware VMwareCertified 2015-06-09

    ————————
    untouched VMware 6.0:

    [root@ESXi-2:/var/log] esxcli software vib list|grep mlx4

    net-mlx4-core 1.9.7.0-1vmw.600.0.0.2494585 VMware VMwareCertified 2015-05-11

    net-mlx4-en 1.9.7.0-1vmw.600.0.0.2494585 VMware VMwareCertified 2015-05-11

    nmlx4-core 3.0.0.0-1vmw.600.0.0.2494585 VMware VMwareCertified 2015-05-11

    nmlx4-en 3.0.0.0-1vmw.600.0.0.2494585 VMware VMwareCertified 2015-05-11

    nmlx4-rdma 3.0.0.0-1vmw.600.0.0.2494585 VMware VMwareCertified 2015-05-11

  • Johnathan Wee

    I have a couple of dual-port ConnectX-2 VPI cards which I want to use between an Xpenology NAS and an ESXi 5.5 host. I managed to get the drivers for the Xpenology NAS. Are there any updated drivers for ESXi 5.5? Is the above process (e.g. deleting the 1.9.7 drivers) still necessary?

  • Fred O Frog

    Just a quick note to everybody that new ESXi 6.0 drivers have been released by Mellanox; however, they are hidden. On this page – http://www.mellanox.com/page/products_dyn?product_family=29&mtag=vmware_driver – there are OFED drivers as well as Ethernet drivers. In particular, you probably want the 1.9.10.5 driver for ESXi 6.0 if you're running iSER and IPoIB. If you only want IPoIB and/or Ethernet in the same driver, you want v2.4.0, and if you want Ethernet support only, you probably want v3.2.0.15.

    Tech tip: you need an earlier version of MFT and MST to upgrade the firmware. Try 3.7.0. I actually installed the most recent MFT and used the built-in VMware drivers to do the firmware upgrade, before I removed them to install the 1.9.10.5 package.