Homelab 2014 upgrade

I’ve been looking for a while for a new, more powerful homelab that scales past the limits I currently have. I had great success last year with the Supermicro X9SRL-F motherboard for the Home NAS (running NexentaStor 3.1.5), so I already knew I liked the Supermicro X9 single-socket LGA2011 series. Thanks to the Intel C600 series chipset, you can break the 32GB barrier you find on most motherboards (the X79 chipset, by comparison, allows up to 64GB).

As time passes, you see product solutions coming out (vCOPS, Horizon View, vCAC, DeepSecurity, ProtectV, Veeam VBR, Zerto) with memory requirements that are just exploding. You need more and more memory. I’m done with a homelab where you have to replace the hardware just because you can’t raise the memory ceiling. So bye bye to the current cluster of four Shuttle XH61v with 16GB.

With the Supermicro X9SRH-7TF you can easily go to 128GB (8x16GB) for now. It’s really just a $$$ choice. 256GB (8x32GB) is still out of reach, but that might change in 2 years.

I have attempted to install PernixData FVP 1.5 on my Homelab 2013 Shuttle XH61v, but the combination of the motherboard, AHCI and the Realtek R8168 makes for an unstable ESXi 5.5. Sometimes the PernixData FVP Management Server sees the SSD on my host, then it loses it. I did work with PernixData engineers (and Satyam Vaghani), but my homelab is just not stable. Having been invited to the PernixPro program doesn’t give me the right to use hours and hours of PernixData engineers’ time to solve my homelab issues. This has made the choice of my two X9SRH-7TF boxes much easier.

The Supermicro X9SRH-7TF is a great motherboard choice because of the integrated management, the F in X9SRH-7TF. It’s a must these days. Having the dual-port Intel X540 10GbE network card on the motherboard will allow me to start using the network with a dual gigabit link, and when I have the budget for a Netgear XS708E or XS712T it will scale to dual 10GBase-T. In the meantime I can also have a single point-to-point 10GbE link between the two X9SRH-7TF boxes for vMotion and the PernixData data synchronization. The third component that comes on the X9SRH-7TF is the integrated LSI storage adapter, the LSI 2308 SAS2 HBA. This will allow me to build a great VSAN cluster once I go from two to three servers at a later date. It’s very important to ensure you have a good storage adapter for VSAN. I have been using LSI adapters for a few years and I trust them. Purchasing a motherboard, then adding the dual X540 10GbE NIC and an LSI HBA, would have cost a lot more than the X9SRH-7TF.

For the CPU, Frank Denneman (@FrankDenneman) and I came to the same conclusion: the Intel Xeon E5-1650 v2 is the perfect balance between number of cores, cache and speed. Here is another description of the Intel Xeon E5-1650 v2 launch (CPUworld).

For the Case, I have gone, just like Frank Denneman in his vSphere 5.5 home lab, with the Fractal Design Define R4 (Black). I used a Fractal Design Arc Midi R2 for my Home NAS last summer, and I really liked the case’s flexibility, the interior design, and the two SSD slots below the motherboard. I removed the two default Fractal Design Silent R2 12cm cooling fans from the case and replaced them with two Noctua NF-A14 FLX fans that are even quieter, and are mounted using rubber holders so they vibrate even less. It’s all about having a quiet system. The Home NAS is in the guest room, and people sleep next to it without noticing it. Also, the Define R4 case is just short of 47cm in height, meaning you can lay it down in a 19″ rack if there is such a need/opportunity.

For the CPU Cooler, I ordered two Noctua NH-U12DX i4 coolers, which support the Narrow ILM socket. It’s a bit bigger than the NH-U9DX i4 that Frank ordered, so we will be able to compare. I burned myself last year with the Narrow ILM socket. I purchased a water cooling solution for the Home NAS and it just couldn’t fit on the Narrow ILM socket. That was before I found out the difference between a normal square LGA2011 socket and the Narrow ILM sockets used on some of the Supermicro boards. Here is a great article that explains the differences: Narrow ILM vs Square ILM LGA 2011 Heatsink Differences (ServeTheHome.com).

For the Power supply, I invested last year in an Enermax Platimax 750W for the Home NAS. This time the selection is the Enermax Revolution X’t 530W power supply. This is a very efficient 80 Plus Gold PSU, which supports ATX 12V v2.4 (can drop to 0.5W on standby) and uses the same modular connectors as my other power supplies. These smaller 500W power supplies are very efficient when they run at 20% to 50% load. This should also be a very quiet PSU.

I made some quick calculations yesterday for the power consumption. I expect the maximum power consumed by this new X9SRH-7TF build to be around 180-200W, but it should run at around 100-120W on a normal basis. At normal usage, I should hit about 20% of the power supply load, so the efficiency of the PSU should be around 87%, a bit lower than Frank’s choice of the Corsair RM550. This is the reason why I picked a smaller PSU rather than one of the large 800W or even 1000W PSUs.
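The load figures above can be checked with a few lines of Python. This is only a rough sketch: it treats my estimated draw as the load on the PSU output, and the 1000W unit is just there for comparison. The function name is mine, not from any tool.

```python
def psu_load_percent(draw_watts: float, psu_rated_watts: float) -> float:
    """Return the PSU load as a percentage of its rated output."""
    return 100.0 * draw_watts / psu_rated_watts

PSU_RATED_W = 530      # Enermax Revolution X't 530W
BIG_PSU_RATED_W = 1000  # hypothetical large PSU, for comparison only

# Estimated draw of the X9SRH-7TF build (normal range and maximum range)
for label, watts in [("normal", 100), ("normal", 120), ("max", 180), ("max", 200)]:
    small = psu_load_percent(watts, PSU_RATED_W)
    big = psu_load_percent(watts, BIG_PSU_RATED_W)
    print(f"{label:>6} {watts}W: {small:.1f}% of 530W, {big:.1f}% of 1000W")
```

At 100-120W the 530W unit sits right around the 20% mark, while a 1000W unit would idle at 10-12% load, below the 20% to 50% range where these PSUs are at their best.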

xt_530w_efficiency

For the Memory, I’m going to reuse what I purchased last year for my Home NAS. So each box will receive 4x16GB Kingston 1600MHz ECC for now.

The SSDs that I will use in this rig are the Intel SSD S3700 100GB enterprise SSD and some Samsung 840 Pro 512GB. What is crucial for me in the Intel S3700 is that its endurance rating is 10 drive writes per day for 5 years. For the 100GB model, this means it’s designed to write 1TB each day. This is very important for solutions like PernixData or VSAN. Just to compare, the latest Intel enthusiast SSD, the SSD 730 240GB that I purchased for my wife’s computer, has an endurance rating of 50GB per day for 5 years (70GB for the 480GB model). The Intel SSD 730, just like its enterprise cousins (S3500 and S3700), comes with enhanced power-loss data protection using power capacitors. The second crucial design point in an enterprise SSD is its sustained IOPS rating.
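To make the endurance comparison concrete, here is a small sketch of the arithmetic. It assumes decimal units (1TB = 1000GB) and 365-day years, and the function names are my own.

```python
def daily_writes_gb(capacity_gb: float, drive_writes_per_day: float) -> float:
    """Daily write budget in GB for a drive rated in drive writes per day."""
    return capacity_gb * drive_writes_per_day

def lifetime_writes_tb(daily_gb: float, years: int) -> float:
    """Total write budget in TB over the rated period (365-day years)."""
    return daily_gb * 365 * years / 1000

# Intel S3700 100GB: 10 drive writes per day for 5 years
s3700_daily = daily_writes_gb(100, 10)
print(s3700_daily, "GB/day,", lifetime_writes_tb(s3700_daily, 5), "TB over 5 years")

# Intel SSD 730 240GB: rated directly at 50GB per day for 5 years
print(50, "GB/day,", lifetime_writes_tb(50, 5), "TB over 5 years")
```

The 100GB S3700 comes out at 1000GB per day, twenty times the daily budget of the much larger SSD 730.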

I’m also adding an Intel Ethernet Server Adapter I350-T2 network card for the vSphere Console management. I’m used to having a dedicated Console Management vNIC on my ESXi hosts. These will be configured on the old but trusty Standard vSwitch.

Another piece of equipment that I already own and will plug into the new X9SRH-7TF boxes is the pair of Mellanox ConnectX-3 Dual FDR 56Gb/s InfiniBand adapters I purchased last year. This will allow me to test and play with a point-to-point 56Gb/s link between the two ESXi hosts. Some interesting possibilities here… I currently don’t have a QDR or FDR InfiniBand switch, and these switches are also very noisy, so that is something I will look at in Q3 this year.

I live in Switzerland, so my pricing will be a bit more expensive than what you find in other European countries. I’m purchasing my equipment from a large distributor in Switzerland, Brack.ch. Even though the Supermicro X9SRH-7TF is not on their price list, they are able to order it for me. The price I got for the X9SRH-7TF is 670 Swiss Francs, and 630 Swiss Francs for the Intel E5-1650 v2. As you can see, the cost of one of these servers is closing in on the 1800-1900 Euro price range. I realize it’s Not Cheap. And it’s the reason for my previous article on the increased costs of a dedicated homelab, the Homelab shift…

Last but not least, in my Homelab 2013 I focused a lot on the Wife Acceptance Factor (WAF). I aimed for Small, Quiet, Efficient. This time, the only part that I will not be able to keep is the Small. This design is still a Quiet and Efficient configuration. Let’s hope I won’t get into too much trouble with the wife.

I also need to thank Frank Denneman (@FrankDenneman), as we discussed this home lab topic extensively over the past 10 days, fine-tuning some of the choices going into this design. Without his input, my design for the homelab 2014 might have gone with the Supermicro A1SAM-2750F, a nifty little motherboard with Quad Gigabit and 64GB memory support, but lacking in CPU performance. Thanks Frank.

The homelab shift…

I believe that we are at a point of time where we will see a shift in the vSphere homelab designs.

One homelab design, which I see as becoming more and more popular, is the Nested Homelab using either a VMware Workstation or VMware Fusion base. There are already a lot of great blogs on Nested homelabs (William Lam), and I must at least mention the excellent AutoLab project. AutoLab is a quick and easy way to build a vSphere environment for testing and learning, and the latest release of AutoLab supports the vSphere 5.5 release.

The other homelab design is a dedicated homelab. Some of the solutions that people want to test in their homelabs are becoming larger, with more components (Horizon, vCAC), and require more resources. So it is painful to admit, but I believe the dedicated homelab is heading in a more expensive direction.

Let me explain my view with these two points.

The first and more recent one is that if you want to lab Virtual SAN, you need to spend a non-negligible amount of money on your lab. You need to invest in at least three SSDs, one in each of three hosts, and in a storage controller that is on the VMware VSAN Hardware Compatibility List.

Recently Duncan Epping mentioned once again that, unfortunately, the Advanced Host Controller Interface (AHCI) standard for SATA is not supported with VSAN, and that you can lose the integrity of your VSAN storage. That is something you don’t want to happen in production, nor do you want to lose the hours of precious time spent configuring VMs. Therefore if you want to lab Virtual SAN, you will need to get a storage controller that is supported. This will cost money and will limit the whitebox motherboards that support VSAN without add-on cards. I really hope that the AHCI standard will be supported in the near future, but there is no guarantee.

The second one, and the one I see as a serious trend, is network driver support. The network drivers used in most homelab computers are not updated for the current release of vSphere (5.5) and don’t have a bright future with upcoming vSphere releases.

With vSphere 5.5, VMware has started its migration to a new Native Driver Architecture and is slowly moving away from the Linux kernel drivers that are plugged into the VMkernel using shims (great blog entry by Andreas Peetz on the Native Driver Architecture).

Users who need the Realtek R8168 driver in the current vSphere 5.5 release have to extract the driver from the latest vSphere 5.1 offline bundle and inject the .vib driver into the vSphere 5.5 ISO file. You can read more about this in my popular article “Adding Realtek R8168 Driver to ESXi 5.5.0 ISO”.

My homelab 2013 implementation uses these Realtek network cards, and the driver works well with my Shuttle XH61v. But if you take a closer peek at the many replies to my article, a big trend seems to emerge. People use a lot of different Realtek NICs in their computers, and they have to use these R8168/R8169 drivers. Yet these drivers don’t work well for everyone. I get a lot of queries about why the drivers stop working, or are slow, but hey, I’m just an administrator who cooked a driver into the vSphere ISO, I’m not a driver developer.

vSphere is a product aimed at large enterprises, so it is to be expected that driver development is prioritized for this market. VMware seems to have dropped, or at least let lag, the development of these non-enterprise drivers. I don’t believe we will see further development of these Realtek drivers from the VMware development team; only Realtek could really pick up this job.

This brings me to the conclusion that in the future, people will need to move to more professional computers/workstations and controllers if they want to keep using and learning vSphere at home on a dedicated homelab.
I really hope to be proven wrong here… So you are most welcome to reply to me that I’m completely wrong.

 


 

 

28/03/2014 Some spelling corrections and some

VSAN Lab issues due to Infiniband OpenSM failover

This isn’t really a blog post where you will get a recipe on how to implement VMware Virtual SAN (VSAN) or InfiniBand technologies, but more a small account of the troubles I experienced yesterday with my infrastructure. I did publish a picture yesterday on Twitter that didn’t look too good.

VSAN Infrastructure in bad shape

Cause: The network infrastructure transporting the VSAN traffic became unavailable for 5-6 minutes.

Issue: All VMs became frozen, as all reads and writes were blocked. I powered off all the VMs. Each VM became an Unidentified object, as seen above.

Remediation: Restarted all VSAN hosts at the same time, and let the infrastructure stabilize for about 10 minutes before restarting the first VM.

I got myself into this state because I was messing with the core networking infrastructure in my lab. This was not a VSAN product error, but a side effect of the network loss. After publishing this tweet and picture, I had a dinner that lasted a few hours, and when I got home, I simply decided to restart the four VSAN nodes at the same time and let the infrastructure simmer for 10 minutes while looking at the host logs. Then I restarted my VMs.

 

Preamble.

Since the beginning of December 2013, I have been running all my VMs directly from my VSAN datastore; no other iSCSI/NFS repository is used. If VSAN goes down, everything goes down (including Domain Controllers, SQL Server and vCenter).

 

Network Issue.

As some of you know, the VSAN traffic in my lab is transported over InfiniBand. Each host has two 20Gbps connections to the InfiniBand switches. My InfiniBand switches are described in my LonVMUG presentation about using InfiniBand in the Lab. An InfiniBand fabric needs a Subnet Manager to control the various entries. I got lucky with my first InfiniBand switch purchase: I got myself a Silverstorm 9024-CU24-ST2 model from 2005.

silverstorm9024chassis

Yet the latest firmware can still be found on Intel’s 9000 Edge Managed Series website, and that latest firmware, 4.2.5.5.1 from July 2012, adds a hardware Subnet Manager. This is simply awesome for a switch created in 2005.

Silverstorm 9024


Okay, I digress here… bear with me. Now, not all InfiniBand switches come with a Subnet Manager; actually only a select few, more expensive switches have this feature. What can you do when you have an InfiniBand switch without a management stack? Well, you run the software version, the Open Subnet Manager (OpenSM), directly on the ESXi host or on a dedicated Linux node.

Yesterday, I was validating a new build of the OpenSM daemon compiled by Raphael Schitz (@Hypervisor_fr) that has some improvements. I had placed the new code on each of my VSAN nodes, and shut down the hardware Subnet Manager to use only the software Subnet Manager. It worked well enough, with only a simple 2 second RDP interruption to the vCenter.

It was only when I attempted to fake the death of the Master OpenSM on my esx13.ebk.lab host that I created enough fluctuation in the InfiniBand fabric to cause an outage, which I estimate lasted between 3 and 5 minutes. But as the InfiniBand fabric is used to transport all my VSAN traffic at high speed, all my VMs became frozen and all I/O was suspended, leaving me only the option to connect with the vSphere C# Client to the hosts directly and wait to see if things would stabilize. Unfortunately, that did not seem to be the case after 10 minutes, so I powered off the running VMs.

Yet each of my hosts was now disconnected from the other VSAN nodes, and the vsanDatastore was not showing its usual 24TB, but 8TB. A bit of panic set in, and I tweeted about a Shattered VSAN Cluster.

When I came home a few hours later, I simply restarted all four of my VSAN nodes (3 Storage+Compute and 1 Compute-Only), let some synchronization take place, and I was able to restart my VMs.

 

Recommendations

These recommendations only apply if you use VSAN with an InfiniBand backbone to replicate the storage objects across nodes. If you have an InfiniBand switch which supports a hardware Subnet Manager, use it. If you have an unmanaged InfiniBand switch, you need to ensure that the Subnet Manager is kept stable and always available.

If you use InfiniBand as the network backbone for vMotion or other IP over IB traffic, the impact of a software Subnet Manager election is not the same (HA reactivity).

I don’t have a better answer yet, but I know Raphael Schitz (@Hypervisor_fr) has some ideas, and we will test new OpenSM builds for this kind of issue.

 

Your comments are welcome…

 

 

 

VSAN and the LSI SAS 9300-4i Host Bus Adapter

As part of the VSAN Cluster that I’m building, I wanted to dig deeper and test the LSI Host Bus Adapters. These cards have been used extensively in the past few years with storage appliances that move the management, compute and error handling to the operating system, rather than using RAID adapters. I have built various storage appliances using Nexenta Community Edition. Even as I speak, my office lab is using such a Nexenta Community Edition 3.1.5 server to provide shared storage to my vSphere 5.5 Cluster. I’ve used various LSI Host Bus Adapters in my Nexenta boxes, like the LSI SAS 9207-8i in my recent home storage, or the LSI SAS 9201-16i in my office storage. These are very reliable cards that I highly recommend.

For the implementation of VSAN in the office lab, I have decided to turn to the latest LSI SAS 9300-4i card, so that each of my Cisco UCS C200 M2 LFF hosts (4x 3.5″ disk slots) can have a powerful & stable card. The LSI SAS 9300-4i is a PCIe Generation 3 card, but it works great in my PCIe Gen2 slot. The LSI SAS3004 chipset supports 12Gb/s SAS connections using an (x4) internal mini-SAS HD (SFF8643) connector. The card is affordable, and should be around $245 (as advertised on the LSI store). For servers with only four disk slots (1 SSD and 3 HDD), the LSI SAS 9300-4i is a nice fit, and leaves room for future use.

I added an Adapter HD-SAS Cable 2279900-R (Right Angled) to ensure the cabling fits nicely in the 1U server.

Here is a view of a Cisco UCS 1U server with 1 SSD (Intel S3700) and 3 Seagate Constellation CS 3TB hard drives. I think this kind of server is the right configuration for the VSAN building blocks.

Cisco UCS C200 M2 LFF

Here is the view of the Storage Adapter in vSphere 5.5

Storage Adapters

The interesting thing is that the LSI SAS 9300-4i presents the four devices in the ESXi (esx14.ebk.lab) host with a Transport Protocol of “Parallel SCSI”, instead of the expected Block Adapter.

Claim Disks for VSAN Use

This has not stopped the claiming of the Disks to create a 24.55 TiB VSAN Cluster.

Virtual SAN is Turned ON

I expect another two LSI SAS 9300-4i by the end of the week, and then I will be able to start some serious VSAN scalability and performance testing (which I can’t publish due to the VSAN Beta agreement).

I’m aware that the Intel S3700 are only 100GB, and are way undersized for the total amount of storage provided in each host, but I just don’t have the budget for 400GB or 800GB Intel S3700. I might test this config at some point with Samsung 840 Pro (512GB) if I see that the VSAN Observer is reporting excessive Congestion or WriteBuffer Fills. It’s going to be interesting.
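To put a number on "undersized", here is a quick sketch of the flash-to-disk ratio per host. It assumes raw decimal capacities (3TB = 3000GB) and the 1 SSD plus 3 HDD layout described earlier; the function name is mine.

```python
def flash_ratio_percent(ssd_gb: float, hdd_gb_each: float, hdd_count: int) -> float:
    """SSD capacity as a percentage of the raw HDD capacity in one host."""
    return 100.0 * ssd_gb / (hdd_gb_each * hdd_count)

# One host: 1 SSD in front of 3x Seagate Constellation CS 3TB
print(f"Intel S3700 100GB:     {flash_ratio_percent(100, 3000, 3):.2f}%")
print(f"Samsung 840 Pro 512GB: {flash_ratio_percent(512, 3000, 3):.2f}%")
```

The 100GB S3700 is just over 1% of the raw disk capacity behind it, and even the 512GB Samsung would only bring that to under 6%.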

At the time of writing this article, the AHCI bug identified in the VSAN Beta has not yet been fixed. This contributed to my selection of the LSI SAS 9300-4i Host Bus Adapter. I have added the LSI SAS 9300-4i to the VSAN Community HCL.

 

HA Configuration error following VIB install.

I ran into a strange problem with my InfiniBand infrastructure. For some reason, one of my ESXi hosts would not enter a configured HA mode. I kept getting the “Cannot upgrade vCenter agent on host. Unknown installer error” message.

Unable configuring vSphere HA

In my post on how to install and configure InfiniBand on vSphere 5.5, I mention applying the esxcli software acceptance set --level=CommunitySupported command before installing the various drivers, the InfiniBand protocol stack (OFED) and the OpenSM package. I must have forgotten to run that command on one of my ESXi hosts. Yet, I was able to install all the InfiniBand drivers.

While trying to find a solution, I came across this VMware KnowledgeBase article kb.vmware.com/kb/2032101 Configuring HA on an ESXi 5.x host fails with the error: Cannot install the vCenter agent service. Unknown installer error  (2032101)

At the bottom of the page, I found the summary: this issue occurs due to acceptance level issues with the vSphere Installation Bundle (VIB) and the host, and the fix is to change the host acceptance level to the VIB’s acceptance level.

So I tried to change the acceptance level using esxcli software acceptance set --level=CommunitySupported

But it could not change the acceptance level because I still had ib-opensm-3.3.16-64 installed. So the trick was to remove the OpenSM VIB, change the host acceptance level, reboot the host, make sure VMware HA can be enabled, and reinstall the ib-opensm-3.3.16-64 VIB afterwards.

Remove ib-opensm dryrun

esxcli software vib remove -n ib-opensm

esxcli software acceptance set --level=CommunitySupported

reboot

Ensure vSphere HA can be installed

esxcli software vib install -v /tmp/ib-opensm-3.3.16-64.vib --no-sig-check

 

Hope this can save someone some trouble at some point.

 

 

 

 

 

 

VSAN Community HCL

A few days before VMworld 2013 Barcelona, I started the VSAN.info website to document VSAN configurations, list whitepapers, and redirect people to VSAN resources and VSAN implementations. It’s been a bit quiet on the surface since then. Well, I’ve been working in the background to push a new feature, the VSAN Community HCL.

One of the features I wanted to add to the VSAN.info site from the get-go was a VSAN Community Hardware Compatibility List. Equipment and configurations on this list would not appear on the official VMware VSAN HCL. Now, starting such a list is a very large endeavour that needs dedicated resources: probably lots of management, user management, password management, moderators. In short, a lot of things that I would not find the time to keep up and running. Goodwill in new projects only takes you so far, and the list would have died slowly… So why try to re-invent the wheel?

What better place to host such a VSAN Community HCL than the official VMware Community website, in the Community Hardware Software forum? Yes, now you can head to the Community Hardware Software (CSHWSW) forum and check out which hardware & software can be used to run a VSAN environment.

Here is the direct link to View All the entries for VSAN Beta on the CSHWSW Forum. You can then select the Infrastructure to list all the vSphere 5.5 or VSAN Beta entries.

CSHWSW View All Entries VSAN Beta

When you click on the entry you will be able to see the Configuration Tested field that explains how I have designed and configured this small VSAN node.

CSHWSW Configuration Tested

 

It is now time to populate the Community Hardware Forum with your VSAN Configs.

This modification of the Community Hardware Software forum would not have been possible without the help of Corey Romero (@vCommunityGuy) and the team managing the Communities Forums. I also want to thank John Troyer (@jtroyer) and Mike Laverick (@Mike_Laverick), who helped facilitate my contact with Corey Romero. To all of you… THANK YOU...

 

 

 

NVIDIA Cards List for VMware Horizon View 3D Acceleration

During VMworld 2013 in Barcelona, I spent some time at the NVIDIA booth trying to figure out more about which graphics cards are supported for accelerating the End User Computing solutions. Wanting to lab and test the acceleration provided by using a dedicated graphics card for my View Desktop, I took a closer look at the NVIDIA cards that support the feature. This article will give you a list of the official cards, the ones shown to work and a list of cards that might work.

The VMware official Hardware Compatibility List for vSGA is here, and you will quickly see that a lot of cards are missing.

Before I go further I want to point to some excellent resources, like these white papers:

With VMware Horizon View we can segment users by the performance they need, from Task Workers to Workstation Users.

Virtual Desktop User Segmentation

Here is another quick table that explains the differences.

Graphics Driver Comparison

Now let’s get to the meat of this article, the list of the current generation of NVIDIA Cards that support the Virtual Shared Graphics Acceleration (vSGA) and Virtual Dedicated Graphics Acceleration (vDGA).

NVIDIA Cards and EUC Solutions

 

I have added two cards to this list following a discussion and follow-up email exchange with an NVIDIA representative, and from my observations of the Hewlett-Packard demo running a Quadro 3000M graphics card in a blade, running vDGA on ESXi. So I added the Quadro 3000M and the Quadro 1000M.

Why, you ask?

Well there are some Blades that can be configured with these cards, as well as some customized Laptops.

So while enterprise customers will be looking to NVIDIA Kepler 5000/6000 or NVIDIA GRID K1 and K2 cards, the Small Business market and professionals could get VMware Horizon View running with Quadro 4000 or Quadro 2000.

As for the laptop I just mentioned, the Clevo P570WM, a true beast with 32GB, a 6-core Xeon E5-1650, up to 4 SSDs and two GPUs, can be purchased from resellers like Sager Notebook (US), Eurocom Panther 5SE (Canada), MySchenker (UK & Germany). The Clevo P570WM is able to run vSphere 5.1 natively, as it comes with an Intel 82579V gigabit network connection.

I’m currently out of budget for the rest of 2013, so I won’t try to lab vDGA in my ESXi with Quadro 4000 cards yet…

Hope this article can be useful in listing the NVIDIA cards that are supported or probably work with Virtual Dedicated Graphics Acceleration (vDGA).

 

 

InfiniBand install & config for vSphere 5.5

A followup to my adventures of InfiniBand in the lab... and the vBrownbag Tech Talk about InfiniBand in the lab I did at VMworld 2013 in Barcelona.

 

In this post I will cover how to install the InfiniBand drivers and various protocols in vSphere 5.5. This post and the commands below are only applicable if you are not using Mellanox ConnectX-3 VPI Host Channel Adapters or if you have an InfiniBand switch with a hardware integrated Subnet Manager. Mellanox states that the ConnectX-3 VPI should allow normal IP over InfiniBand (IPoIB) connectivity with the default 1.9.7 drivers on the ESXi 5.5.0 install CD-ROM.

This post will be most useful to people that have the following configuration

  • Two ESXi 5.5 hosts with direct InfiniBand host-to-host connectivity (no InfiniBand switch)
  • Two/Three ESXi 5.5 hosts with InfiniBand host-to-storage connectivity (no InfiniBand switch and a storage array like Nexenta Community Edition)
  • Multiple ESXi 5.5 hosts with an InfiniBand switch that doesn’t have a Subnet Manager

The installation in these configurations has only been possible since early this morning (October 22nd at 00:08 CET), when Raphael Schitz (@hypervisor_fr) released an updated version of OpenSM, 3.3.16-64, which was compiled in 64-bit for use on vSphere 5.5 and vSphere 5.1.

First things first… let’s rip the Mellanox 1.9.7 drivers from a new ESXi 5.5.0 install

 

Removing Mellanox 1.9.7 drivers from ESXi 5.5

Yes, the first step to get IP over InfiniBand (for VMkernel adapters like vMotion or VSAN) or SCSI RDMA Protocol (SRP) working is to remove the new Mellanox 1.9.7 drivers from the newly installed ESXi 5.5.0. These drivers don’t work with the older Mellanox OFED 1.8.2 package, and the new OFED 2.0 package is pending… Let’s cross our fingers for an early 2014 release.

You need to connect to your ESXi 5.5 host using SSH and run the following command. You will then need to reboot the host for the driver to be removed from memory.

  • esxcli software vib remove -n=net-mlx4-en -n=net-mlx4-core
  • reboot the ESXi host

esxcli software vib remove

 

Installing Mellanox 1.6.1 drivers, OFED and OpenSM

After the reboot you will need to download the following files and copy them to /tmp on the ESXi 5.5 host

  1. VMware ESXi 5.0 Driver 1.6.1 for Mellanox ConnectX Ethernet Adapters (Requires myVMware login)
  2. Mellanox InfiniBand OFED 1.8.2 Driver for VMware vSphere 5.x
  3. OpenFabrics.org Enterprise Distribution’s OpenSM 3.3.16-64 for VMware vSphere 5.5 (x86_64) packaged by Raphael Schitz

Once the files are in /tmp (or on the shared storage, if you want to keep a copy there), you will need to unzip the Mellanox 1.6.1 driver file. Careful with the ib-opensm-3.3.16-64: the esxcli -d becomes a -v for the vib during the install. The other change since vSphere 5.1 is that we need to set the esxcli software acceptance level to CommunitySupported to install some of the drivers and binaries.

The commands are

  • unzip mlx4_en-mlnx-1.6.1.2-471530.zip
  • esxcli software acceptance set --level=CommunitySupported
  • esxcli software vib install -d /tmp/mlx4_en-mlnx-1.6.1.2-offline_bundle-471530.zip --no-sig-check
  • esxcli software vib install -d /tmp/MLNX-OFED-ESX-1.8.2.0.zip --no-sig-check
  • esxcli software vib install -v /tmp/ib-opensm-3.3.16-64.x86_64.vib --no-sig-check
  • reboot the ESXi host
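The -d versus -v distinction is easy to get wrong, so here is a small sketch that picks the flag from the file extension. The helper install_command is hypothetical and only builds the command strings; it does not run esxcli.

```python
def install_command(path: str) -> str:
    """Build the esxcli install line for an offline bundle (.zip) or a single VIB (.vib)."""
    if path.endswith(".zip"):
        flag = "-d"  # offline bundle (depot)
    elif path.endswith(".vib"):
        flag = "-v"  # single VIB file
    else:
        raise ValueError(f"unexpected file type: {path}")
    return f"esxcli software vib install {flag} {path} --no-sig-check"

# The three files from this post, in install order
FILES = [
    "/tmp/mlx4_en-mlnx-1.6.1.2-offline_bundle-471530.zip",
    "/tmp/MLNX-OFED-ESX-1.8.2.0.zip",
    "/tmp/ib-opensm-3.3.16-64.x86_64.vib",
]

for path in FILES:
    print(install_command(path))
```

The printed lines match the install commands listed above: the two .zip bundles get -d, and only the OpenSM .vib gets -v.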

esxcli software vib install infiniband

 

Setting MTU and Configuring OpenSM

After the reboot there are a few more commands to run.

  • esxcli system module parameters set -m=mlx4_core -p=mtu_4k=1
  • cp partitions.conf /scratch/opensm/<adapter_1_hca>/
  • cp partitions.conf /scratch/opensm/<adapter_2_hca>/

The partitions.conf file only contains the following text:

  • Default=0x7fff,ipoib,mtu=5:ALL=full;
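The mtu=5 in that line is not a byte count but an InfiniBand MTU code. As a quick sketch of the mapping (codes 1 to 5 correspond to 256, 512, 1024, 2048 and 4096 bytes), with a function name of my own:

```python
def ib_mtu_bytes(code: int) -> int:
    """Translate an InfiniBand MTU code (1..5) into its size in bytes."""
    if code not in range(1, 6):
        raise ValueError(f"invalid InfiniBand MTU code: {code}")
    return 128 << code  # 1 -> 256, 2 -> 512, 3 -> 1024, 4 -> 2048, 5 -> 4096

for code in range(1, 6):
    print(f"mtu={code} -> {ib_mtu_bytes(code)} bytes")
```

So mtu=5 asks for the 4K MTU, which matches the mtu_4k=1 module parameter set on the mlx4_core driver.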

 

cp partitions.conf

I recommend that you check the state of your InfiniBand adapters (mlx4_0) using the following command

  • /opt/opensm/bin/ibstat mlx4_0

ibstat mlx4_0

I also recommend that you write down the adapter HCA Port GUID numbers if you are going to use SCSI RDMA Protocol between the ESXi host and a storage array with SCSI RDMA Protocol. It will come in handy later (and in an upcoming post).

Now you are ready to add the new adapters to a vSwitch/dvSwitch and create the VMkernel adapters. Here is the current config for vMotion, VSAN and Fault Tolerance on a dual 20Gbps IB adapter (which only costs $50!).

vSwitch1 with IB VMkernels

I aim to put the various VMkernel traffic types in their own VLANs, but I still need to dig into the partitions.conf file.

 

If you have an older switch that does not support an MTU of 4K, make sure you set your vSwitch/dvSwitch to an MTU of 2044 (2048-4 bytes), and the same for the various VMkernel interfaces.
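The 2044 value is the InfiniBand MTU minus the 4 bytes of IPoIB encapsulation. A minimal sketch of that subtraction, assuming the same 4-byte overhead also applies on a 4K fabric (the function name is mine):

```python
IPOIB_HEADER_BYTES = 4  # IPoIB encapsulation header

def ipoib_mtu(ib_mtu_bytes: int) -> int:
    """IP MTU to configure on the vSwitch/VMkernel for a given InfiniBand MTU."""
    return ib_mtu_bytes - IPOIB_HEADER_BYTES

print(ipoib_mtu(2048))  # older switches limited to a 2K MTU
print(ipoib_mtu(4096))  # fabrics running with the 4K MTU
```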

VMkernel MTU at 2044

 

 

Here is just a Quick Glossary about the various protocols that can use the InfiniBand fabric.

 What is IPoIB ?

IPoIB (IP-over-InfiniBand) is a protocol that defines how to send IP packets over IB; for example, Linux has an “ib_ipoib” driver that implements this protocol. This driver creates a network interface for each InfiniBand port on the system, which makes a Host Channel Adapter (HCA) act like an ordinary Network Interface Card (NIC).

IPoIB does not make full use of the HCA’s capabilities; network traffic goes through the normal IP stack, which means a system call is required for every message and the host CPU must handle breaking data up into packets, etc. However it does mean that applications that use normal IP sockets will work on top of the full speed of the IB link (although the CPU will probably not be able to run the IP stack fast enough to use a 32 Gb/sec QDR IB link).

Since IPoIB provides a normal IP NIC interface, one can run TCP (or UDP) sockets on top of it. TCP throughput well over 10 Gb/sec is possible using recent systems, but this will burn a fair amount of CPU.
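To illustrate the point that nothing InfiniBand-specific is needed in the application, here is a plain TCP echo round trip in Python. It is a sketch that runs over the loopback address so it works anywhere; on a host with IPoIB configured you would pass the IP address of the IPoIB interface instead, and the code stays the same. The function names are my own.

```python
import socket
import threading


def echo_once(server_sock):
    """Accept one connection and send back whatever arrives first."""
    conn, _ = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024))


def round_trip(bind_addr, payload):
    """Send payload to a one-shot echo server on bind_addr and return the reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((bind_addr, 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=echo_once, args=(server,))
    worker.start()
    with socket.create_connection((bind_addr, port), timeout=5) as client:
        client.sendall(payload)
        reply = client.recv(1024)
    worker.join()
    server.close()
    return reply


# 127.0.0.1 is a stand-in; use the IPoIB interface address on a real host.
print(round_trip("127.0.0.1", b"hello over IP"))
```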

 

 What is SRP ?

The SCSI RDMA Protocol (SRP) is a protocol that allows one computer to access SCSI devices attached to another computer via remote direct memory access (RDMA). The SRP protocol is also known as the SCSI Remote Protocol. The use of RDMA makes higher throughput and lower latency possible than what is possible through e.g. the TCP/IP communication protocol. RDMA is only possible with network adapters that support RDMA in hardware. Examples of such network adapters are InfiniBand HCAs and 10 GbE network adapters with iWARP support. While the SRP protocol has been designed to use RDMA networks efficiently, it is also possible to implement the SRP protocol over networks that do not support RDMA.

As with the iSCSI Extensions for RDMA (iSER) communication protocol, there is the notion of a target (a system that stores the data) and an initiator (a client accessing the target) with the target performing the actual data movement. In other words, when a user writes to a target, the target actually executes a read from the initiator and when a user issues a read, the target executes a write to the initiator.

While the SRP protocol is easier to implement than the iSER protocol, iSER offers more management functionality, e.g. the target discovery infrastructure enabled by the iSCSI protocol. Furthermore, the SRP protocol never made it into an official standard. The latest draft of the SRP protocol, revision 16a, dates from July 3, 2002.
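The reversed direction of the data movement is easier to see in a toy model. The sketch below is not SRP and does no real RDMA; the class and method names are invented for illustration. It only shows that the target is the side that moves the data in both cases.

```python
class Initiator:
    """Client side: owns memory buffers that it exposes to the target."""

    def __init__(self):
        self.buffers = {}


class Target:
    """Storage side: performs every data transfer itself (toy model)."""

    def __init__(self):
        self.blocks = {}
        self.log = []

    def handle_write(self, initiator, buffer_id, lba):
        # The user writes: the target pulls the data out of the
        # initiator's buffer, which models an RDMA read.
        self.log.append("RDMA READ from initiator")
        self.blocks[lba] = initiator.buffers[buffer_id]

    def handle_read(self, initiator, buffer_id, lba):
        # The user reads: the target pushes the data into the
        # initiator's buffer, which models an RDMA write.
        self.log.append("RDMA WRITE to initiator")
        initiator.buffers[buffer_id] = self.blocks[lba]


host = Initiator()
array = Target()
host.buffers["out"] = b"vm data"
array.handle_write(host, "out", lba=0)  # user-level write
array.handle_read(host, "in", lba=0)    # user-level read
print(array.log, host.buffers["in"])
```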

 

What is iSER ?

The iSCSI Extensions for RDMA (iSER) is a computer network protocol that extends the Internet Small Computer System Interface (iSCSI) protocol to use Remote Direct Memory Access (RDMA). Typically RDMA is provided by either the Transmission Control Protocol (TCP) with RDMA services (iWARP) or InfiniBand. It permits data to be transferred directly into and out of SCSI memory buffers (SCSI being the protocol that connects computers to storage devices) without intermediate data copies.

The motivation for iSER is to use RDMA to avoid unnecessary data copying on the target and initiator. The Datamover Architecture (DA) defines an abstract model in which the movement of data between iSCSI end nodes is logically separated from the rest of the iSCSI protocol; iSER is one Datamover protocol. The interface between the iSCSI and a Datamover protocol, iSER in this case, is called Datamover Interface (DI).
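The layering idea can be sketched in a few lines of Python. This is a toy model, not the real Datamover Interface: the class names and the put_data method are simplified placeholders of my own, and the "copies" counter merely stands in for intermediate buffer copies. The point is that the iSCSI layer talks to an abstract datamover and does not care which one sits underneath.

```python
from abc import ABC, abstractmethod


class Datamover(ABC):
    """Hypothetical, simplified stand-in for the Datamover Interface (DI)."""

    @abstractmethod
    def put_data(self, payload):
        """Deliver payload to the peer; return (delivered, copies_made)."""


class TcpDatamover(Datamover):
    def put_data(self, payload):
        staging = bytearray(payload)  # models one intermediate staging copy
        return bytes(staging), 1


class IserDatamover(Datamover):
    def put_data(self, payload):
        return payload, 0  # models direct placement, no staging copy


class IscsiLayer:
    """iSCSI logic that is unaware of which datamover it runs on."""

    def __init__(self, datamover):
        self.datamover = datamover

    def write(self, payload):
        return self.datamover.put_data(payload)


for mover in (TcpDatamover(), IserDatamover()):
    data, copies = IscsiLayer(mover).write(b"block")
    print(type(mover).__name__, data, "copies:", copies)
```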

 

vBrownbag TechTalk “InfiniBand in the Lab” presentation.

For the past few weeks I have slowly begun to build a working InfiniBand infrastructure on my vSphere cluster hosted in the office. I’m still missing some cables. With VMworld 2013 EMEA in Barcelona behind us, I’ve now got the time to publish the presentation I did in the Community zone for the vBrownbag Tech Talks. On Tuesday at noon, I was the first one to start the series of Tech Talks, and the infrastructure to record and process the video/audio feed had not been tuned properly. Unfortunately you will notice this in the video of the presentation: for the first 2 minutes 08 seconds, the audio is just horrible. So I URGE you to jump into the video at the 3 minute mark if you value your ears.

Here is the direct link to the Tech Talk about “InfiniBand in the Lab” and the link to the other Tech Talks done at VMworld 2013 EMEA.

I’m not used to doing a presentation sitting in front of multiple cameras. Some of the later slides are too fuzzy on the video, so I’m now publishing the presentation in this article.

InfiniBand_in_the_Lab

 

InfiniBand Host Channel Adapters (HCAs) with dual 20Gbps ports (DDR speed) can be found on eBay for $50 or £35.

I hope this video link and the presentation will be useful to those of you who want a faster intra-cluster vSphere backbone for vMotion, Fault Tolerance or VSAN traffic.

I enjoyed doing the presentation, and I have to thank the following people for making it possible: Raphael Schitz, William Lam, Vladan Seget, Gregory Roche.