Upgrading Mellanox ConnectX firmware within ESXi

Last summer, while reading the ServeTheHome.com website, I saw a great link to eBay for Mellanox ConnectX-3 VPI cards (MCX354A-FCBT). These cards were selling at $299 on eBay, so I took three of them. These Mellanox ConnectX-3 VPI adapters were simply too good to be true… Dual FDR 56Gb/s or 40/56GbE using PCIe Generation 3 slots. Having three of these Host Channel Adapters without an InfiniBand switch is limiting, though.

With my new Homelab 2014 design, I now have two vSphere hosts that have PCIe Generation 3 slots, and using a simple QSFP+ Fiber Cable, I can create a direct point-to-point connection between the two vSphere hosts.

The Mellanox Firmware Tools (MFT) can run within vSphere 5.5, and allow you to check the state of the InfiniBand adapter and even update its firmware.

MFT for vSphere

Installing the tools is very straightforward.

# esxcli software vib install -d /tmp/mlx-fw/MLNX-MFT-ESXi5.5-3.5.1.7.zip

Install Mellanox MST

Unfortunately it requires a reboot.

The next steps are to start the MST service, check the status of the Mellanox devices and query them to check the current level of firmware.

I don’t need to have the Mellanox MST driver running all the time, so I will simply start it using /opt/mellanox/bin/mst start.

Next we will query the state of all Mellanox devices in the host using /opt/mellanox/bin/mst status -v from which we will get the path to the devices.

We then use the flint tool to query the devices to get their stats.

/opt/mellanox/bin/flint -d /dev/mt4099_pci_cr0 hw query

and

/opt/mellanox/bin/flint -d /dev/mt4099_pci_cr0 query

which returns the current firmware version and the GUIDs and MACs of the Host Channel Adapter.

Mellanox firmware upgrade 01

Well, as I’m only running FW Version 2.10.700, it’s time to upgrade this firmware to release 2.30.8000.
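As a side note, the "is my firmware older?" check is easy to get wrong if you compare the version strings as plain text. Here is a minimal Python sketch that compares them numerically. The helper name parse_fw_version is my own invention for illustration and is not part of the Mellanox tools; the two version strings are the ones mentioned above.

```python
def parse_fw_version(version):
    """Turn a dotted firmware version such as '2.10.700' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


running = parse_fw_version("2.10.700")     # version reported by flint query
available = parse_fw_version("2.30.8000")  # version of the downloaded image

# Tuples compare element by element, so 2.9.x sorts before 2.10.x,
# which a plain string comparison would get wrong.
needs_upgrade = running < available
print("Upgrade needed:", needs_upgrade)
```

Comparing tuples of integers avoids the classic trap where the string "2.9" sorts after "2.10".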

 /opt/mellanox/bin/flint -d /dev/mt4099_pci_cr0 -i /tmp/mlx-fw/fw-ConnectX3-rel-2_30_8000-MCX354A-FCB_A1-FlexBoot-3.4.151_VPI.bin burn does the trick.

Mellanox firmware upgrade 02

And we can quickly check the new running firmware on the InfiniBand adapter.

 

 

Upgrading the X9SRH-7TF LSI HBA 2308 and LSI HBA 9207-8i

Here is a summary of how to upgrade the LSI HBA 2308 chipset on the Supermicro X9SRH-7TF and an LSI SAS2 HBA 9207-8i card to the latest BIOS & Firmware using the UEFI mode. This is applicable to my homelab Supermicro X9SRH-7TF or any other motherboard with a UEFI Built-In EFI Shell.

I’ve found the UEFI mode to be more practical than the old method of an MS-DOS bootable USB key, and this is the way more and more Firmware and BIOS updates will be released.

Tom and Duncan showed you last week how to upgrade an LSI 9207-4i4e from within VMware vSphere 5.5 CLI. In this article I’m going to show you how to use the UEFI Shell for the upgrade.

Preamble.

Since last week, I have been running the PernixData FVP (Flash Virtualization Platform) 1.5 solution on my two ESXi hosts, and I have found that the LSI HBA 2308 on the motherboard had a tendency to drop all the drives and SSDs under heavy I/O load. Last week I upgraded the LSI HBA 2308 from the original Phase 14 Firmware to Phase 16, but that didn’t solve the issue. Unfortunately, I have not yet found a newer Phase 18 Firmware or BIOS release for the embedded adapter on the Supermicro Support site.

So I dropped another adapter in the box, an LSI HBA 9207-8i, which is also based on the LSI 2308 chip. And lo and behold, my two LSI adapters seemed to have nearly the exact same Firmware & BIOS.

two_adapters_lsi

Well, if the LSI Embedded HBA and the LSI 9207-8i are nearly identical and use the same chipset… who knows if I burn the Firmware & BIOS on the motherboard…

 

Preparation.

First you need to head over to the LSI website for the LSI 9207-8I and download a few files to a local computer. For the LSI HBA 9207-8i you can jump to the Software Downloads section. You want to download three files, extract them and put the files on a USB key.

  • The Installer_P18_for_UEFI which contains the firmware updater (sas2flash.efi)
  • The UEFI_BSD_P18 which contains the BIOS for the updater (X64SAS2.ROM)
  • The 9207_8i_Package_P18_IR_IT_Firmware_BIOS_for_MSDOS_Windows which contains the 9207-8.bin firmware.

lsi_site

At this point you put all those extracted files mentioned above on a USB key.

You reboot your server, and modify the Boot parameters in the BIOS of the server to boot in UEFI Built-In EFI Shell.

UEFI_Build-In_EFI_Shell

When you reboot, also jump into the LSI HBA Adapter to collect the controller’s SAS address. It’s a 9-digit number you can find on the following interface. Notice that it starts with a 0 on the left of the quote.

lsi_sas_address_1

and

lsi_sas_address_2

For my adapters it would be 005A68BB0 for the SAS9207-8I and 0133DBE00 for the embedded SMC2308-IT.

 

Upgrading BIOS & Firmware.

Let’s plug the USB key into the server, and boot into the UEFI Built-In EFI Shell.

UEFI_booting

And let’s move over to the USB key. For me the USB key is mapped as fs1: but you could also have a fs0:. A quick dir command will list the files on the USB key.

usb_dir

Using the sas2flash.efi -listall command (extracted from the Installer_P18_for_UEFI file) we can list all the local LSI HBA adapters and see the various versions of the Firmware & BIOS.

sas2flash_listall_old

We can also get more details about a specific card using the sas2flash.efi -c 0 -list

sas2flash_list_old_9207

and sas2flash.efi -c 1 -list

sas2flash_list_old_2308

Now let’s upgrade the BIOS with the X64SAS2.ROM file found in the UEFI_BSD_P18 download, and the Firmware with the 9207-8.bin that we found in the 9207_8i_Package_P18_IR_IT_Firmware_BIOS_for_MSDOS_Windows file.

As you see, the -c option selects the controller to which the BIOS and Firmware are applied.

sas2flash_upgrade_0

and

sas2flash_upgrade_1

Let’s have a peek again at just one of the LSI adapters. Controller 1, which is the embedded one, now seems to have the Board name SAS9207-8i. A bit confusing, but it seems to have worked.

sas2flash_1_list

Using the sas2flash.efi -listall command now shows us the new Firmware and BIOS applied to both cards.

sas2flash_listall_new

Now power off the server, so the new BIOS & Firmware are properly loaded, and make sure to change your Boot option in the server BIOS back to the USB key or hard drive that contains the vSphere hypervisor.

Both LSI 9207-8i and the Embedded LSI HBA 2308 now show up as LSI2308_1 and LSI2308_2 in the vSphere Client.

esxi_storage_adapters

 

Homelab 2014 upgrade

I’ve been looking for a while for a new, more powerful homelab (for home) that scales and gets past the limits I currently have. I had great success last year with the Supermicro X9SRL-F motherboard for the Home NAS (running NexentaStor 3.1.5), so I knew I loved the Supermicro X9 single LGA2011 series. Thanks to the Intel C600 series chipset, you can break the 32GB barrier you find on most motherboards (otherwise the X79 chipset allows you up to 64GB).

As time passes, you see product solutions coming out (vCOPS, Horizon View, vCAC, DeepSecurity, ProtectV, Veeam VBR, Zerto) with memory requirements just exploding. You need more and more memory. I’m done with homelabs that you need to replace just because you can’t raise the top limit of the memory. So bye bye to the current cluster of four Shuttle XH61v with 16GB.

With the Supermicro X9SRH-7TF you can easily go to 128GB (8x16GB) for now. It’s really just a $$$ choice. 256GB (8x32GB) is still out of reach for now, but that might change in 2 years.

I have attempted to install PernixData FVP 1.5 on my Homelab 2013 Shuttle XH61v, but the combo of the motherboard/AHCI/Realtek R8168 makes for an unstable ESXi 5.5. Sometimes the PernixData FVP Management Server sees the SSD on my host, then it loses it. I did work with PernixData engineers (and Satyam Vaghani), but my homelab is just not stable. Having been invited to the PernixPro program doesn’t give me the right to use hours and hours of PernixData engineers’ time to solve my homelab issues. This has made the choice of my two X9SRH-7TF boxes much easier.

The Supermicro X9SRH-7TF is a great motherboard choice because of the integrated management, the F in X9SRH-7TF. It’s a must these days. Having the dual Intel X540 10GbE network card on the motherboard will allow me to start using the network with a dual gigabit link, and when I have the budget for a Netgear XS708E or XS712T it will scale to dual 10GBase-T. In the meantime I can also have a single point-to-point 10GbE link between the two X9SRH-7TF boxes for vMotion and the PernixData data synchronization. The third component that comes on the X9SRH-7TF is the integrated LSI 2308 SAS2 HBA. This will allow me to build a great VSAN cluster once I go from two to three servers at a later date. It’s very important to ensure you have a good storage adapter for VSAN. I have been using LSI adapters for a few years and I trust them. Purchasing a motherboard, then adding a dual X540 10GbE NIC and an LSI HBA would have cost a lot more than the X9SRH-7TF.

For the CPU, Frank Denneman (@FrankDenneman) and I came to the same conclusion: the Intel Xeon E5-1650 v2 is the perfect balance between number of cores, cache and speed. Here is another description of the Intel Xeon E5-1650 v2 launch (CPUworld).

For the Case, I have gone, just like Frank Denneman in his vSphere 5.5 home lab, with the Fractal Design Define R4 (Black). I used a Fractal Design Arc Midi R2 for my Home NAS last summer, and I really liked the case’s flexibility, the interior design, the two SSD slots below the motherboard. I removed the two default Fractal Design Silent R2 12cm cooling fans in the case and replaced them with two Noctua NF-A14 FLX fans that are even quieter, and are mounted using rubber holders so they vibrate even less. It’s all about having a quiet system. The Home NAS is in the guest room, and people sleep next to it without noticing it. Also the Define R4 case is just short of 47cm in height, meaning you can lay it down in a 19″ rack if there is such a need/opportunity.

For the CPU Cooler, I ordered two Noctua NH-U12DX i4 coolers, which support the Narrow ILM socket. It’s a bit bigger than the NH-U9DX i4 that Frank ordered, so we will be able to compare. I burned myself last year with the Narrow ILM socket. I purchased a water cooling solution for the Home NAS and it just couldn’t fit on the Narrow ILM socket. That was before I found out the difference between a normal square LGA2011 socket and the Narrow ILM sockets used on some of the Supermicro boards. Here is a great article that explains the differences: Narrow ILM vs Square ILM LGA 2011 Heatsink Differences (ServeTheHome.com).

For the Power supply, I invested last year in an Enermax Platimax 750W for the Home NAS. This time the selection is the Enermax Revolution X’t 530W power supply. This is a very efficient 80 Plus Gold PSU, which supports ATX 12V v2.4 (can drop to 0.5W on standby) and uses the same modular connectors as my other power supplies. These smaller 500W power supplies are very efficient when they run at 20% to 50% load. This should also be a very quiet PSU.

I made some quick calculations yesterday for the power consumption. I expect the maximum power consumed by this new X9SRH-7TF build to be around 180-200W, but it should run at around 100-120W on a normal basis. At normal usage I should hit about 20% of the power supply load, so the efficiency of the PSU should be around 87%, a bit lower than Frank’s choice of the Corsair RM550. This is the reason why I try to pick a smaller PSU rather than one of the large 800W or even 1000W PSUs.
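For those who want to redo the math, here is a small Python sketch of the load calculation. It only uses the wattage figures quoted in this post (a 530W PSU, 100-120W normal draw, 180-200W peak); the function name load_percent is my own, and the efficiency at each load point still has to be read from the manufacturer's curve.

```python
PSU_RATED_WATTS = 530  # Enermax Revolution X't 530W


def load_percent(draw_watts, rated_watts=PSU_RATED_WATTS):
    """Return the PSU load as a percentage of its rated output."""
    return 100.0 * draw_watts / rated_watts


# Estimated draw figures from the paragraph above.
for label, watts in [("normal (low)", 100), ("normal (high)", 120),
                     ("peak (low)", 180), ("peak (high)", 200)]:
    print(f"{label}: {watts} W = {load_percent(watts):.1f}% load")
```

With these numbers the normal draw lands at roughly 19% to 23% of the rated output, which is where the "about 20% load" figure comes from.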

xt_530w_efficiency

For the Memory, I’m going to reuse what I purchased last year for my Home NAS. So each box will receive 4x16GB Kingston 1600MHz ECC for now.

The SSDs that I will use in this rig are the Intel SSD S3700 100GB enterprise SSD and some Samsung 840 Pro 512GB. What is crucial for me in the Intel S3700 is that its endurance is rated at 10 drive writes per day for 5 years. For the 100GB model, it means that it’s designed to write 1TB each day. This is very important for solutions like PernixData or VSAN. Just to compare, the latest Intel enthusiast SSD, the SSD 730 240GB that I purchased for my wife’s computer, has an endurance rating of 50GB per day for 5 years (70GB for the 480GB model). The Intel SSD 730, just like its enterprise cousins (S3500 and S3700), comes with enhanced power-loss data protection using power capacitors. The second crucial characteristic of an enterprise SSD is its sustained IOPS rating.

I’m also adding an Intel Ethernet Server Adapter I350-T2 network card for the vSphere Console management. I’m used to having a dedicated Console Management vNIC on my ESXi hosts. These will be configured in the old but trusty Standard vSwitch.
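To put the two endurance ratings side by side, here is a quick Python sketch. It only uses the figures quoted above (10 drive writes per day on 100GB, 50GB per day on 240GB, both over 5 years); the function names are mine and the calculation ignores leap days.

```python
YEARS = 5
DAYS_PER_YEAR = 365


def daily_writes_gb(capacity_gb, drive_writes_per_day):
    """GB that can be written per day for a given DWPD rating."""
    return capacity_gb * drive_writes_per_day


def dwpd(capacity_gb, gb_written_per_day):
    """Drive writes per day implied by a GB/day endurance rating."""
    return gb_written_per_day / capacity_gb


# Intel S3700 100GB: 10 drive writes per day
s3700_daily_gb = daily_writes_gb(100, 10)
s3700_total_tb = s3700_daily_gb * DAYS_PER_YEAR * YEARS / 1000

# Intel SSD 730 240GB: 50GB per day
ssd730_dwpd = dwpd(240, 50)

print(f"S3700 100GB: {s3700_daily_gb} GB/day, {s3700_total_tb:.0f} TB over {YEARS} years")
print(f"SSD 730 240GB: {ssd730_dwpd:.2f} drive writes per day")
```

Expressed in the same unit, the enthusiast drive is rated for roughly a fifth of a drive write per day against ten for the S3700.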

Another piece of equipment that I already own and that I will plug on the new X9SRH-7TF are the Mellanox ConnectX-3 Dual FDR 56Gb/s  InfiniBand Adapters I purchased last year. This will allow me to test and play with a point-to-point 56Gb/s link between the two ESXi hosts. Some interesting possibilities here…  I currently don’t have a QDR or FDR InfiniBand switch, and these switches are also very noisy, so that is something I will look at in Q3 this year.

I live in Switzerland, so my pricing will be a bit more expensive than what you find in other European countries. I’m purchasing my equipment from a large distributor in Switzerland, Brack.ch. Even if the Supermicro X9SRH-7TF is not on their pricing list, they are able to order it for me. The price I got for the X9SRH-7TF is 670 Swiss Francs, and the Intel E5-1650 v2 is 630 Swiss Francs. As you see, the cost of one of these servers is closing in on the 1800-1900 Euro price range. I realize it’s not cheap. And it’s the reason for my previous article on the increasing costs of a dedicated homelab, the Homelab shift…

Last but not least, in my Homelab 2013 I focused a lot on the Wife Acceptance Factor (WAF). I aimed for Small, Quiet and Efficient. This time, the only part that I will not be able to keep is the Small. This design is still a Quiet and Efficient configuration. Let’s hope I won’t get into too much trouble with the wife.

I also need to thank Frank Denneman (@FrankDenneman), as we discussed this home lab topic extensively over the past 10 days, fine-tuning some of the choices going into this design. Without his input, my design for the homelab 2014 might have gone with the Supermicro A1SAM-2750F, a nifty little motherboard with quad gigabit and 64GB memory support, but lacking in CPU performance. Thanks Frank.

The homelab shift…

I believe that we are at a point of time where we will see a shift in the vSphere homelab designs.

One homelab design, which I see as becoming more and more popular is the Nested Homelab using either a VMware Workstation or VMware Fusion base.
There are already a lot of great blogs on Nested homelabs (William Lam), and I must at least mention the excellent AutoLab project. AutoLab is a quick and easy
way to build a vSphere environment for testing and learning, and the latest release of AutoLab supports the vSphere 5.5 release.

The other homelab design is a dedicated homelab. Some of the solutions that people want to test on the homelabs are becoming larger and with more components (Horizon, vCAC), requiring more resources. So it is painful to admit, but I believe the dedicated homelab is heading towards a more expensive direction.

Let me explain my view with these two points.

The first one, and the more recent one, is that if you want to lab Virtual SAN, you need to spend a non-negligible amount of money on your lab. You need to invest in at least three SSDs, one in each of three hosts, and you need to invest in a storage controller that is on the VMware VSAN Hardware Compatibility List.

Recently Duncan Epping mentioned once again that unfortunately the Advanced Host Controller Interface (AHCI) standard for SATA is not supported with VSAN, and you can lose the integrity of your VSAN storage. That is something you don’t want to happen in production, nor do you want to lose the hours of precious time spent configuring VMs. Therefore if you want to lab Virtual SAN, you will need to get a storage controller that is supported. This will cost money and will limit the whitebox motherboards that support VSAN without add-on cards. I really hope that the AHCI standard will be supported in the near future, but there is no guarantee.

The second one, and the one I see as a serious trend, is network driver support. The network drivers used in most homelab computers are not updated for the current release of vSphere (5.5) and don’t have a bright future with upcoming vSphere releases.

With vSphere 5.5, VMware has started its migration to a new Native Driver Architecture, slowly moving away from the Linux kernel drivers that are plugged into the VMkernel using shims (great blog entry by Andreas Peetz on Native Driver Architecture).

All those users that need the Realtek R8168 driver in the current vSphere 5.5 release need to extract the driver from the latest vSphere 5.1 offline bundle, and inject the .vib driver into the vSphere 5.5 ISO file. You can read more about this in my popular article “Adding Realtek R8168 Driver to ESXi 5.5.0 ISO”.

My homelab 2013 implementation uses these Realtek network cards, and the driver works well with my Shuttle XH61v. But if you take a closer peek at the many replies to my article, a big trend seems to emerge. People use a lot of various Realtek NICs in their computers, and they have to use these R8168/R8169 drivers. Yet these drivers don’t work well for everyone. I get a lot of queries about why the drivers stop working, or are slow, but hey, I’m just an administrator that cooked a driver into the vSphere ISO, I’m not a driver developer.

vSphere is a product aimed at large enterprises, so it is to be expected that driver development prioritizes this market. VMware seems to have dropped/lagged the development of these non-Enterprise oriented drivers. I don’t believe we will see further development of these Realtek drivers from the VMware development team; only Realtek could really pick up this job.

This brings me up to the fact that for the future, people will need to move to more professional computers/workstations and controllers if they want to keep using and learning vSphere at home on a dedicated homelab.
I really hope to be proven wrong here… So you are most welcome to reply to me that I’m completely wrong.

 


 

 

28/03/2014 Some spelling corrects and some

VSAN Lab issues due to Infiniband OpenSM failover

This isn’t really a blog post where you will get a recipe on how to implement VMware Virtual SAN (VSAN) or InfiniBand technologies, but more a small account of the troubles I experienced yesterday with my infrastructure. I did publish a picture yesterday on Twitter that didn’t look too good.

VSAN Infrastructure in bad shape

Cause: Network infrastructure transporting the VSAN traffic became unavailable for 5-6 minutes

Issue: All VMs became frozen, as all Read/Write operations were blocked. I powered off all the VMs. Each VM became an Unidentified object as seen above.

Remediation: Restarted all VSAN hosts at the same time, and let the infrastructure stabilize about 10 minutes before restarting the first VM.

I got myself into this state because I was messing with the core networking infrastructure in my lab. This was not a VSAN product error, but a side effect of the network loss. After publishing this tweet and picture, I had a dinner that lasted a few hours, and when I got home, I simply decided to restart the four VSAN nodes at the same time, let the infrastructure simmer for 10 minutes while looking at the host logs, and then restarted my VMs.

 

Preamble.

Since the beginning of December 2013, I have been running all my VMs directly from my VSAN datastore; no other iSCSI/NFS repository is used. If VSAN goes down, everything goes down (including Domain Controllers, SQL Server and vCenter).

 

Network Issue.

As some of you know, the VSAN traffic in my lab is transported over InfiniBand. Each host has two 20Gbps connections to the InfiniBand switches. My InfiniBand switches are described in my LonVMUG presentation about using InfiniBand in the Lab. An InfiniBand fabric needs a Subnet Manager to control the various entries. I got lucky with my first InfiniBand switch purchase: I got myself a Silverstorm 9024-CU24-ST2 model from 2005.

silverstorm9024chassis

Yet the latest firmware can still be found on Intel’s 9000 Edge Managed Series website, and the latest firmware, 4.2.5.5.1 from July 2012, now adds a hardware Subnet Manager. This is simply awesome for a switch created in 2005.

Silverstorm 9024


Okay, I digress here… bear with me. Now, not all InfiniBand switches come with a Subnet Manager; actually only a select few, more expensive switches have this feature. What can you do when you have an InfiniBand switch without a management stack? Well, you run the software version, the Open Subnet Manager (OpenSM), directly on the ESXi host or on a dedicated Linux node.

Yesterday, I was validating a new build of the OpenSM daemon compiled by Raphael Schitz (@Hypervisor_fr) that has some improvements. I had placed the new code on each of my VSAN nodes, and shut down the hardware Subnet Manager to use only the software Subnet Manager. It worked well enough, and I saw only a simple 2-second RDP interruption to the vCenter.

It was only when I attempted to fake the death of the Master OpenSM on my esx13.ebk.lab host that I created enough fluctuation in the InfiniBand fabric to cause an outage, which I estimate to have lasted between 3 and 5 minutes. But as the InfiniBand fabric is used to transport all my VSAN traffic at high speed, all my VMs became frozen, all I/O suspended, leaving me only the option to connect with the vSphere C# Client to the hosts directly and wait to see if things would stabilize. Unfortunately, that did not seem to be the case after 10 minutes, so I powered off the running VMs.

Yet each of my hosts was now disconnected from the other VSAN nodes, and the vsanDatastore was not showing at its usual 24TB, but at 8TB. A bit of panic set in, and I tweeted about a Shattered VSAN Cluster.

When I came home a few hours later, I simply restarted all four of my VSAN nodes (3 Storage+Compute and 1 Compute-Only), let some synchronization take place, and I was able to restart my VMs.

 

Recommendations

These recommendations only apply if you use VSAN with an InfiniBand backbone to replicate the storage objects across nodes. If you have an InfiniBand switch which supports a hardware Subnet Manager, use it. If you have an unmanaged InfiniBand switch, you need to ensure that the Subnet Manager is kept stable and always available.

If you use InfiniBand as the network backbone for vMotion or other IP over IB, the impact of having a software Subnet Manager election is not the same (HA reactivity).

I don’t have a better answer yet, but I know Raphael Schitz (@Hypervisor_fr) has some ideas, and we will test new OpenSM builds for this kind of issue.

 

Your comments are welcome…

 

 

 

VSAN and the LSI SAS 9300-4i Host Bus Adapter

As part of the VSAN Cluster that I’m building, I wanted to dig deeper and test the LSI Host Bus Adapters. These cards have been used extensively in the past few years with storage appliances that move the management, compute and error handling to the operating system, rather than using RAID adapters. I have built various storage appliances using Nexenta Community Edition. Even as I speak, my office lab is using such a Nexenta Community Edition 3.1.5 server to provide shared storage to my vSphere 5.5 Cluster. I’ve used various LSI Host Bus Adapters in my Nexenta boxes, like the LSI SAS 9207-8i in my recent home storage, or the LSI SAS 9201-16i in my office storage. These are very reliable cards that I highly recommend.

For the implementation of VSAN in the office lab, I have decided to turn to the latest LSI SAS 9300-4i card, so that each of my Cisco UCS C200 M2 LFF hosts (4x 3.5″ disk slots) can have a powerful & stable card. The LSI SAS 9300-4i is a PCIe Generation 3 card, but it works great in my PCIe Gen2 slot. The LSI SAS3004 chipset supports 12Gb/s SAS connections using an (x4) internal mini-SAS HD (SFF8643) connector. The card is affordable, and should be around $245 (as advertised on the LSI store). For servers with only four disk slots (1 SSD and 3 HDD), the LSI SAS 9300-4i is a nice fit, and leaves room for future usage.

I added an Adapter HD-SAS Cable 2279900-R (Right Angled) to ensure the cabling fits nicely in the 1U server.

Here is a view of a Cisco UCS 1U server with 1 SSD (Intel S3700) and 3 Seagate Constellation CS 3TB hard drives. I think this kind of server is the right configuration for the VSAN building blocks.

Cisco UCS C200 M2 LFF

Here is the view of the Storage Adapter in vSphere 5.5

Storage Adapters

The interesting thing is that the LSI SAS 9300-4i presents the four devices in the ESXi (esx14.ebk.lab) host with a Transport Protocol of “Parallel SCSI”, instead of the expected Block Adapter.

Claim Disks for VSAN Use

This has not stopped the claiming of the Disks to create a 24.55 TiB VSAN Cluster.
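The 24.55 TiB figure lines up nicely with the raw disk capacity once you convert from the decimal terabytes printed on the drive label to the binary TiB that vSphere reports. Here is a small Python sketch of that conversion. It assumes three hosts each contributing three 3TB hard drives, which is my reading of the setup described in this post rather than something shown in the screenshot, and that the SSDs do not count towards the datastore capacity.

```python
TB = 10**12   # decimal terabyte, as printed on the drive label
TIB = 2**40   # binary tebibyte, as reported by vSphere


def raw_capacity_tib(hosts, disks_per_host, disk_tb):
    """Raw capacity in TiB of a set of hosts with identical hard drives."""
    return hosts * disks_per_host * disk_tb * TB / TIB


# Assumed layout: 3 hosts x 3 drives x 3TB each
cluster_tib = raw_capacity_tib(hosts=3, disks_per_host=3, disk_tb=3)
single_host_tib = raw_capacity_tib(hosts=1, disks_per_host=3, disk_tb=3)

print(f"Cluster:     {cluster_tib:.2f} TiB")
print(f"Single host: {single_host_tib:.2f} TiB")
```

The result is within a few hundredths of a TiB of the value shown by the vSphere Client, and the single-host figure is the roughly 8TB a node sees when it is cut off from the others.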

Virtual SAN is Turned ON

I expect another two LSI SAS 9300-4i by the end of the week, and then I will be able to start some serious VSAN scalability and performance testing (which I can’t publish due to the VSAN Beta agreement)

I’m aware that the Intel S3700 are only 100GB, and are way undersized for the amount of total storage provided in each host, but I just don’t have the budget for 400GB or 800GB Intel S3700. I might test this config at some point with Samsung 840 Pro (512GB) if I see that the VSAN Observer is reporting excessive Congestion or WriteBuffer Fills. It’s going to be interesting.

At the time of writing this article, the AHCI bug identified in the VSAN Beta has not yet been fixed. This contributed to my selection of the LSI SAS 9300-4i Host Bus Adapter. I have added the LSI SAS 9300-4i to the VSAN Community HCL.

 

VSAN Community HCL

A few days before VMworld 2013 Barcelona, I started the VSAN.info website to document VSAN configurations, list whitepapers, and redirect people to VSAN resources and VSAN implementations. It’s been a bit quiet since then on the surface. Well, I’ve been working in the background to push a new feature, the VSAN Community HCL.

One of the features I wanted to add to the VSAN.info site from the get-go was a VSAN Community Hardware Compatibility List. Equipment and configurations on this list would not appear on the official VMware VSAN HCL. Now starting such a list is a very large endeavour that needs dedicated resources, probably lots of management, user management, password management, moderators. In short, a lot of things that would ensure I never find the time to keep it up and running. Goodwill in new projects only takes you so far, and it would have died slowly… So why try to re-invent the wheel?

What better place to host such a VSAN Community HCL than the official VMware Community website, in the Community Hardware Software forum? Yes, now you can head to the Community Hardware Software (CSHWSW) forum and check out which hardware & software can be used to run a VSAN environment.

Here is the direct link to View All the entries for VSAN Beta on the CSHWSW Forum. You can then select the Infrastructure to list all the vSphere 5.5 or VSAN Beta entries.

CSHWSW View All Entries VSAN Beta

When you click on the entry you will be able to see the Configuration Tested field that explains how I have designed and configured this small VSAN node.

CSHWSH Configuration Tested

 

It is now time to populate the Community Hardware Forum with your VSAN Configs.

This modification of the Community Hardware Software forum would not have been possible without the help of Corey Romero (@vCommunityGuy) and the team managing the Communities Forums. I also want to thank John Troyer (@jtroyer) and Mike Laverick (@Mike_Laverick), who helped facilitate my contact with Corey Romero. To all of you… THANK YOU...

 

 

 

InfiniBand install & config for vSphere 5.5

A follow-up to my adventures of InfiniBand in the lab... and the vBrownbag Tech Talk about InfiniBand in the lab I did at VMworld 2013 in Barcelona.

 

In this post I will cover how to install the InfiniBand drivers and various protocols in vSphere 5.5. This post and the commands below are only applicable if you are not using Mellanox ConnectX-3 VPI Host Channel Adapters or if you have an InfiniBand switch with a hardware integrated Subnet Manager. Mellanox states that the ConnectX-3 VPI should allow normal IP over InfiniBand (IPoIB) connectivity with the default 1.9.7 drivers on the ESXi 5.5.0 install CD-ROM.

This post will be most useful to people that have the following configuration

  • Two ESXi 5.5 hosts with direct InfiniBand host-to-host connectivity (no InfiniBand switch)
  • Two/Three ESXi 5.5 hosts with InfiniBand host-to-storage connectivity (no InfiniBand switch and a storage array like Nexenta Community Edition)
  • Multiple ESXi 5.5 hosts with an InfiniBand switch that doesn’t have a Subnet Manager

The installation in these configurations has only been possible since early this morning (October 22nd at 00:08 CET), when Raphael Schitz (@hypervisor_fr) released an updated version of OpenSM, 3.3.16-64, which was compiled in 64-bit for usage on vSphere 5.5 and vSphere 5.1.

First things first… let’s rip the Mellanox 1.9.7 drivers from a new ESXi 5.5.0 install

 

Removing Mellanox 1.9.7 drivers from ESXi 5.5

Yes, the first thing to do to get IP over InfiniBand (for VMkernel adapters like vMotion or VSAN) or SCSI RDMA Protocol (SRP) is to remove the new Mellanox 1.9.7 drivers from the newly installed ESXi 5.5.0. These drivers don’t work with the older Mellanox OFED 1.8.2 package, and the new OFED 2.0 package is pending… Let’s cross fingers for an early 2014 release.

You need to connect to your ESXi 5.5 host using SSH and run the following command, then reboot the host for the driver to be removed from memory.

  • esxcli software vib remove -n=net-mlx4-en -n=net-mlx4-core
  • reboot the ESXi host

esxcli software vib remove

 

Installing Mellanox 1.6.1 drivers, OFED and OpenSM

After the reboot you will need to download the following files and copy them to the /tmp on the ESXi 5.5 host

  1. VMware ESXi 5.0 Driver 1.6.1 for Mellanox ConnectX Ethernet Adapters (Requires myVMware login)
  2. Mellanox InfiniBand OFED 1.8.2 Driver for VMware vSphere 5.x
  3. OpenFabrics.org Enterprise Distribution’s OpenSM 3.3.16-64 for VMware vSphere 5.5 (x86_64) packaged by Raphael Schitz

Once the files are in /tmp or if you want to keep a copy on the shared storage, you will need to unzip the Mellanox 1.6.1 driver file. Careful with the ib-opensm-3.3.16-64, the esxcli -d becomes a -v for the vib during the install. The other change since vSphere 5.1, is that we need to set the esxcli software acceptance level to CommunitySupported level, to install some of the drivers and binaries.

The commands are

  • unzip mlx4_en-mlnx-1.6.1.2-471530.zip
  • esxcli software acceptance set --level=CommunitySupported
  • esxcli software vib install -d /tmp/mlx4_en-mlnx-1.6.1.2-offline_bundle-471530.zip --no-sig-check
  • esxcli software vib install -d /tmp/MLNX-OFED-ESX-1.8.2.0.zip --no-sig-check
  • esxcli software vib install -v /tmp/ib-opensm-3.3.16-64.x86_64.vib --no-sig-check
  • reboot the ESXi host
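The -d versus -v switch is easy to mix up, so here is a small Python sketch that builds the right esxcli command line from the file extension. It only prints the commands and does not run anything. The helper name vib_install_command is my own; the paths and flags are the ones from the list above.

```python
import os


def vib_install_command(path, no_sig_check=True):
    """Build the esxcli argument list for an offline bundle (.zip) or a single .vib."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".zip":
        source_flag = "-d"   # depot / offline bundle
    elif ext == ".vib":
        source_flag = "-v"   # single VIB file
    else:
        raise ValueError(f"unsupported package type: {path}")
    cmd = ["esxcli", "software", "vib", "install", source_flag, path]
    if no_sig_check:
        cmd.append("--no-sig-check")
    return cmd


packages = [
    "/tmp/mlx4_en-mlnx-1.6.1.2-offline_bundle-471530.zip",
    "/tmp/MLNX-OFED-ESX-1.8.2.0.zip",
    "/tmp/ib-opensm-3.3.16-64.x86_64.vib",
]

for pkg in packages:
    print(" ".join(vib_install_command(pkg)))
```

The printed lines can be pasted into the SSH session in the same order as the list above, after the acceptance level has been set.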

esxcli software vib install infiniband

 

Setting MTU and Configuring OpenSM

After the reboot we have a few more commands to run.

  • esxcli system module parameters set -m=mlx4_core -p=mtu_4k=1
  • cp partitions.conf /scratch/opensm/<adapter_1_hca>/
  • cp partitions.conf /scratch/opensm/<adapter_2_hca>/

The partitions.conf file only contains the following text:

  • Default=0x7fff,ipoib,mtu=5:ALL=full;

 

cp partitions.conf
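
In the partitions.conf line, mtu=5 is the InfiniBand MTU code for 4096 bytes (the codes run 1=256, 2=512, 3=1024, 4=2048, 5=4096). The sketch below shows that mapping and a loop that writes the file into every sub-directory of a base directory. Both function names are invented for this example, and it assumes that each directory under /scratch/opensm belongs to one adapter port, as in the cp commands above.

```shell
# ib_mtu_bytes: convert an InfiniBand MTU code (1..5) to bytes.
ib_mtu_bytes() {
    case "$1" in
        1|2|3|4|5) echo $(( 128 << $1 )) ;;
        *) echo "invalid IB MTU code: $1" >&2; return 1 ;;
    esac
}

# install_partitions_conf: write partitions.conf into every
# sub-directory of the given base directory.
install_partitions_conf() {
    base="$1"
    for dir in "$base"/*/; do
        [ -d "$dir" ] || continue   # skip when the glob matched nothing
        printf '%s\n' 'Default=0x7fff,ipoib,mtu=5:ALL=full;' > "${dir}partitions.conf"
    done
}

ib_mtu_bytes 5   # prints 4096

# On the ESXi host (assumed layout, see above):
# install_partitions_conf /scratch/opensm
```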

I recommend that you check the state of your InfiniBand adapters (mlx4_0) using the following command

  • /opt/opensm/bin/ibstat mlx4_0

ibstat mlx4_0
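
To pull just the Port GUIDs out of that output, a small filter is enough. This is a sketch that assumes this ibstat build prints lines of the form "Port GUID: 0x…", as the standard infiniband-diags ibstat does; the helper name port_guids is illustrative.

```shell
# port_guids: read ibstat output on stdin and print each Port GUID value.
port_guids() {
    awk -F': *' '/Port GUID/ { print $2 }'
}

# Only run ibstat when it is installed at the path used above.
if [ -x /opt/opensm/bin/ibstat ]; then
    /opt/opensm/bin/ibstat mlx4_0 | port_guids
fi
```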

I also recommend that you write down the HCA Port GUIDs if you are going to use the SCSI RDMA Protocol between the ESXi host and a storage array. They will come in handy later (and in an upcoming post).

Now you are ready to add the new adapters to a vSwitch/dvSwitch and create the VMkernel adapters. Here is the current config for vMotion, VSAN and Fault Tolerance on a dual-port 20Gbps IB adapter (which only costs $50!).

vSwitch1 with IB VMkernels

I aim to put the various VMkernel traffic types in their own VLANs, but I still need to dig into the partitions.conf file.

 

If you have an older switch that does not support an MTU of 4K, make sure you set your vSwitch/dvSwitch to an MTU of 2044 (2048 minus 4 bytes), and do the same for the various VMkernel interfaces.

VMkernel MTU at 2044
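
The 2044 value is simply the InfiniBand MTU minus the 4-byte IPoIB encapsulation header. The sketch below does that arithmetic and prints (rather than runs) esxcli commands that would apply the value on a standard vSwitch. The names vSwitch1 and vmk1 are placeholders to substitute with your own, and the helper name ipoib_mtu is made up for this example.

```shell
# ipoib_mtu: IP MTU usable over IPoIB for a given InfiniBand MTU in bytes.
# The 4 bytes subtracted are the IPoIB encapsulation header.
ipoib_mtu() {
    echo $(( $1 - 4 ))
}

mtu=$(ipoib_mtu 2048)    # 2044, for fabrics limited to a 2K IB MTU

# Print the matching commands for review; vSwitch1 and vmk1 are placeholders.
echo "esxcli network vswitch standard set -v vSwitch1 -m $mtu"
echo "esxcli network ip interface set -i vmk1 -m $mtu"
```

By the same arithmetic, a fabric that supports the 4K MTU end to end would give 4096 minus 4, that is 4092.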

 

 

Here is a quick glossary of the various protocols that can use the InfiniBand fabric.

 What is IPoIB ?

IPoIB (IP-over-InfiniBand) is a protocol that defines how to send IP packets over IB. Linux, for example, has an “ib_ipoib” driver that implements this protocol. This driver creates a network interface for each InfiniBand port on the system, which makes a Host Channel Adapter (HCA) act like an ordinary Network Interface Card (NIC).

IPoIB does not make full use of the HCA’s capabilities; network traffic goes through the normal IP stack, which means a system call is required for every message and the host CPU must handle breaking data up into packets, etc. However, it does mean that applications that use normal IP sockets will work on top of the full speed of the IB link (although the CPU will probably not be able to run the IP stack fast enough to use a 32 Gb/sec QDR IB link).

Since IPoIB provides a normal IP NIC interface, one can run TCP (or UDP) sockets on top of it. TCP throughput well over 10 Gb/sec is possible using recent systems, but this will burn a fair amount of CPU.
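
The 32 Gb/sec figure quoted above for QDR comes from the link encoding: a 4x QDR link signals at 40 Gb/sec, but with 8b/10b encoding only 8 of every 10 bits on the wire carry data. A quick sketch of that arithmetic (the helper name ib_data_rate is illustrative):

```shell
# ib_data_rate: usable data rate in Gb/s for a link using 8b/10b encoding,
# given the signaling rate in Gb/s.
ib_data_rate() {
    echo $(( $1 * 8 / 10 ))
}

ib_data_rate 40   # 4x QDR: prints 32
ib_data_rate 20   # 4x DDR: prints 16
```

The same calculation explains why a “20Gbps” DDR port tops out at 16 Gb/sec of actual data.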

 

 What is SRP ?

The SCSI RDMA Protocol (SRP) is a protocol that allows one computer to access SCSI devices attached to another computer via remote direct memory access (RDMA). The SRP protocol is also known as the SCSI Remote Protocol. The use of RDMA makes higher throughput and lower latency possible than what is possible through e.g. the TCP/IP communication protocol. RDMA is only possible with network adapters that support RDMA in hardware. Examples of such network adapters are InfiniBand HCAs and 10 GbE network adapters with iWARP support. While the SRP protocol has been designed to use RDMA networks efficiently, it is also possible to implement the SRP protocol over networks that do not support RDMA.

As with the iSCSI Extensions for RDMA (iSER) communication protocol, there is the notion of a target (a system that stores the data) and an initiator (a client accessing the target) with the target performing the actual data movement. In other words, when a user writes to a target, the target actually executes a read from the initiator and when a user issues a read, the target executes a write to the initiator.

While the SRP protocol is easier to implement than the iSER protocol, iSER offers more management functionality, e.g. the target discovery infrastructure enabled by the iSCSI protocol. Furthermore, the SRP protocol never made it into an official standard. The latest draft of the SRP protocol, revision 16a, dates from July 3, 2002.

 

What is iSER ?

The iSCSI Extensions for RDMA (iSER) is a computer network protocol that extends the Internet Small Computer System Interface (iSCSI) protocol to use Remote Direct Memory Access (RDMA). Typically RDMA is provided by either the Transmission Control Protocol (TCP) with RDMA services (iWARP) or InfiniBand. It permits data to be transferred directly into and out of SCSI computer memory buffers (which connects computers to storage devices) without intermediate data copies.

The motivation for iSER is to use RDMA to avoid unnecessary data copying on the target and initiator. The Datamover Architecture (DA) defines an abstract model in which the movement of data between iSCSI end nodes is logically separated from the rest of the iSCSI protocol; iSER is one Datamover protocol. The interface between the iSCSI and a Datamover protocol, iSER in this case, is called Datamover Interface (DI).

 

vBrownbag TechTalk “InfiniBand in the Lab” presentation.

For the past few weeks I have slowly begun to build a working InfiniBand infrastructure on my vSphere cluster hosted in the office. I’m still missing some cables. With VMworld 2013 EMEA in Barcelona behind us, I’ve now got the time to publish the presentation I did in the Community zone for the vBrownbag Tech Talks. On Tuesday at noon, I was the first one to start the series of Tech Talks, and the infrastructure to record and process the video/audio feed had not been tuned properly. Unfortunately you will notice this in the video of the presentation: for the first 2 minutes 08 seconds the audio is just horrible… So I URGE you to jump into the video at the 3 minute mark if you value your ears.

Here is the direct link to the Tech Talk about “InfiniBand in the Lab” and the link to the other Tech Talks done at VMworld 2013 EMEA.

I’m not used to doing a presentation sitting in front of multiple cameras. Some of the later slides are too fuzzy on the video, so I’m now publishing the presentation in this article.

InfiniBand_in_the_Lab

 

The InfiniBand Host Channel Adapters (HCA) with dual 20Gbps ports (DDR speed) can be found on eBay for $50 or £35.

I hope this video link and the presentation will be useful to those of you who want to build a faster intra-cluster vSphere backbone for vMotion, Fault Tolerance or VSAN traffic.

I enjoyed doing the presentation, and I have to thank the following people for making it possible: Raphael Schitz, William Lam, Vladan Seget, Gregory Roche.