VSAN Community HCL

A few days before VMworld 2013 Barcelona, I started the VSAN.info website to document VSAN configurations, list whitepapers, and redirect people to VSAN resources and implementations. It has been a bit quiet since then on the surface, but I have been working in the background to push a new feature: the VSAN Community HCL.

One of the features I wanted to add to the VSAN.info site from the get-go was a VSAN Community Hardware Compatibility List, for equipment and configurations that would not appear on the official VMware VSAN HCL. Now, starting such a list is a very large endeavour that needs dedicated resources: lots of management, user management, password management, moderators. In short, a lot of things that would make sure I don’t find the time to keep it up and running. Goodwill in new projects only takes them so far before they die a slow death… So why try to reinvent the wheel?

What better place to host such a VSAN Community HCL than the official VMware Community website, in the Community Hardware Software forum? Yes, you can now head to the Community Hardware Software (CSHWSW) forum and check out which hardware and software can be used to run a VSAN environment.

Here is the direct link to View All the entries for the VSAN Beta on the CSHWSW forum. You can then select the Infrastructure category to list all the vSphere 5.5 or VSAN Beta entries.

CSHWSW View All Entries VSAN Beta

When you click on the entry you will be able to see the Configuration Tested field that explains how I have designed and configured this small VSAN node.

CSHWSH Configuration Tested

 

It is now time to populate the Community Hardware Forum with your VSAN Configs.

This modification of the Community Hardware Software forum would not have been possible without the help of Corey Romero (@vCommunityGuy) and the team managing the Community Forums. I also want to thank John Troyer (@jtroyer) and Mike Laverick (@Mike_Laverick), who helped facilitate my contact with Corey Romero. To all of you… THANK YOU.


NVIDIA Cards List for VMware Horizon View 3D Acceleration

During VMworld 2013 in Barcelona, I spent some time at the NVIDIA booth trying to find out more about which graphics cards are supported for accelerating the End User Computing solutions. Wanting to lab and test the acceleration provided by using a dedicated graphics card for my View desktop, I took a closer look at the NVIDIA cards that support the feature. This article will give you a list of the official cards, the ones shown to work, and the cards that might work.

The VMware official Hardware Compatibility List for vSGA is here, and you will quickly see that a lot of cards are missing.

Before I go further I want to point out some excellent resources, like these white papers:

With VMware Horizon View we can use a segmentation of users by the performance they need, from Task Workers to Workstation Users.

Virtual Desktop User Segmentation

Here is another quick table that explains the differences.

Graphics Driver Comparison

Now let’s get to the meat of this article, the list of the current generation of NVIDIA Cards that support the Virtual Shared Graphics Acceleration (vSGA) and Virtual Dedicated Graphics Acceleration (vDGA).

NVIDIA Cards and EUC Solutions

 

I have added two cards to this list following a discussion and a follow-up email exchange with an NVIDIA representative, and from my observations of the Hewlett-Packard demo running vDGA on ESXi with a Quadro 3000M graphics card in a blade. So I added the Quadro 3000M and the Quadro 1000M.

Why, you ask?

Well, there are some blades that can be configured with these cards, as well as some customized laptops.

So while enterprise customers will be looking to NVIDIA Kepler 5000/6000 or NVIDIA GRID K1 and K2 cards, the Small Business market and professionals could get VMware Horizon View running with Quadro 4000 or Quadro 2000.

As for the laptop I mentioned before, the Clevo P570WM, a true beast with 32GB of RAM, a 6-core Xeon E5-1650, up to four SSDs and two GPUs, can be purchased from resellers like Sager Notebook (US), Eurocom (as the Panther 5SE, Canada) and MySchenker (UK & Germany). The Clevo P570WM is able to run vSphere 5.1 natively, as it comes with an Intel 82579V gigabit network connection.

I’m currently out of budget for the rest of 2013, so I won’t try to lab vDGA on my ESXi hosts with Quadro 4000 cards yet…
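If you do get one of these cards into an ESXi host, a quick way to confirm that the hypervisor actually sees the GPU before configuring vSGA or vDGA is to list the PCI devices from the ESXi shell. This is just a quick check with the standard tools; the rest of the vDGA passthrough configuration is then done in the vSphere Client.

  • lspci | grep -i nvidia    # quick check that the card is detected
  • esxcli hardware pci list    # full details, including vendor and device IDs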

I hope this article is useful as a list of the NVIDIA cards that are supported, or that will probably work, with Virtual Dedicated Graphics Acceleration (vDGA).


InfiniBand install & config for vSphere 5.5

A follow-up to my adventures with InfiniBand in the lab, and to the vBrownbag Tech Talk about InfiniBand in the lab that I did at VMworld 2013 in Barcelona.

 

In this post I will cover how to install the InfiniBand drivers and the various protocols in vSphere 5.5. This post and the commands below are only applicable if you are not using Mellanox ConnectX-3 VPI Host Channel Adapters, or if you do not have an InfiniBand switch with a hardware-integrated Subnet Manager. Mellanox states that the ConnectX-3 VPI should allow normal IP over InfiniBand (IPoIB) connectivity with the default 1.9.7 drivers on the ESXi 5.5.0 install CD-ROM.

This post will be most useful to people that have one of the following configurations:

  • Two ESXi 5.5 hosts with direct InfiniBand host-to-host connectivity (no InfiniBand switch)
  • Two or three ESXi 5.5 hosts with InfiniBand host-to-storage connectivity (no InfiniBand switch and a storage array like Nexenta Community Edition)
  • Multiple ESXi 5.5 hosts with an InfiniBand switch that doesn’t have a Subnet Manager

The installation in these configurations has only been possible since early this morning (October 22nd at 00:08 CET), when Raphael Schitz (@hypervisor_fr) released an updated version of OpenSM, 3.3.16-64, compiled in 64-bit for use on vSphere 5.5 and vSphere 5.1.

First things first… let’s rip the Mellanox 1.9.7 drivers from a new ESXi 5.5.0 install

 

Removing Mellanox 1.9.7 drivers from ESXi 5.5

Yes, the first step to get IP over InfiniBand (for VMkernel adapters like vMotion or VSAN) or the SCSI RDMA Protocol (SRP) working is to remove the new Mellanox 1.9.7 drivers from the newly installed ESXi 5.5.0. The new driver doesn’t work with the older Mellanox OFED 1.8.2 package, and the new OFED 2.0 package is still pending… Let’s cross our fingers for an early 2014 release.

You need to connect to your ESXi 5.5 host using SSH and run the following command; you will then need to reboot the host for the driver to be removed from memory.

  • esxcli software vib remove -n=net-mlx4-en -n=net-mlx4-core
  • reboot the ESXi host

esxcli software vib remove
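If you want to double-check which Mellanox VIBs are present before and after the removal, a quick look from the ESXi shell does the trick:

  • esxcli software vib list | grep mlx4    # lists net-mlx4-en and net-mlx4-core before the removal, nothing after the reboot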

 

Installing Mellanox 1.6.1 drivers, OFED and OpenSM

After the reboot you will need to download the following files and copy them to /tmp on the ESXi 5.5 host:

  1. VMware ESXi 5.0 Driver 1.6.1 for Mellanox ConnectX Ethernet Adapters (Requires myVMware login)
  2. Mellanox InfiniBand OFED 1.8.2 Driver for VMware vSphere 5.x
  3. OpenFabrics.org Enterprise Distribution’s OpenSM 3.3.16-64 for VMware vSphere 5.5 (x86_64) packaged by Raphael Schitz

Once the files are in /tmp (or on shared storage, if you want to keep a copy), you will need to unzip the Mellanox 1.6.1 driver file. Careful with ib-opensm-3.3.16-64: the esxcli -d (depot) option becomes a -v (vib) option during the install. The other change since vSphere 5.1 is that we need to set the esxcli software acceptance level to CommunitySupported in order to install some of the drivers and binaries.

The commands are

  • unzip mlx4_en-mlnx-1.6.1.2-471530.zip
  • esxcli software acceptance set --level=CommunitySupported
  • esxcli software vib install -d /tmp/mlx4_en-mlnx-1.6.1.2-offline_bundle-471530.zip --no-sig-check
  • esxcli software vib install -d /tmp/MLNX-OFED-ESX-1.8.2.0.zip --no-sig-check
  • esxcli software vib install -v /tmp/ib-opensm-3.3.16-64.x86_64.vib --no-sig-check
  • reboot the ESXi host

esxcli software vib install infiniband
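To verify that the three packages landed correctly, a quick check from the ESXi shell (before or after the reboot) looks like this:

  • esxcli software acceptance get    # should return CommunitySupported
  • esxcli software vib list | grep mlx    # should list the 1.6.1 driver and the OFED 1.8.2 VIBs
  • esxcli software vib list | grep opensm    # should list ib-opensm 3.3.16-64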

 

Setting MTU and Configuring OpenSM

After the reboot we have a few more commands to run.

  • esxcli system module parameters set -m=mlx4_core -p=mtu_4k=1
  • copy partitions.conf to /scratch/opensm/<adapter_1_hca>/
  • copy partitions.conf to /scratch/opensm/<adapter_2_hca>/

The partitions.conf file only contains the following text:

  • Default=0x7fff,ipoib,mtu=5:ALL=full;

 

cp partitions.conf
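Here is a minimal sketch of that step from the ESXi shell, assuming the OpenSM package created one directory per HCA under /scratch/opensm (the <adapter_x_hca> names are placeholders, run the ls first to see the real directory names on your host):

  • echo 'Default=0x7fff,ipoib,mtu=5:ALL=full;' > /tmp/partitions.conf
  • ls /scratch/opensm/    # shows the per-adapter directories
  • cp /tmp/partitions.conf /scratch/opensm/<adapter_1_hca>/
  • cp /tmp/partitions.conf /scratch/opensm/<adapter_2_hca>/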

I recommend that you check the state of your InfiniBand adapters (mlx4_0) using the following command

  • /opt/opensm/bin/ibstat mlx4_0

ibstat mlx4_0

I also recommend that you write down the adapter HCA Port GUID numbers if you are going to use SCSI RDMA Protocol between the ESXi host and a storage array with SCSI RDMA Protocol. It will come in handy later (and in an upcoming post).
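To pull just the GUIDs out of the ibstat output and keep them somewhere handy, something like the following works (the file name is only an example):

  • /opt/opensm/bin/ibstat mlx4_0 | grep -i guid > /tmp/ib_guids.txt
  • cat /tmp/ib_guids.txt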

Now you are ready to add the new adapters to a vSwitch/dvSwitch and create the VMkernel adapters. Here is the current config for vMotion, VSAN and Fault Tolerance using dual-port 20Gbps IB adapters (which only cost $50!).

vSwitch1 with IB VMkernels

I aim to put the various VMkernel traffic types in their own VLANs, but I still need to dig into the partitions.conf file.

 

If you have an older switch that does not support an MTU of 4K, make sure you set your vSwitch/dvSwitch to an MTU of 2044 (2048 minus 4 bytes), and do the same for the various VMkernel interfaces.

VMkernel MTU at 2044
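For those who prefer the command line, here is a minimal sketch of the vSwitch and VMkernel MTU configuration from the ESXi shell. The uplink name (vmnic_ib0), the VMkernel interface (vmk2), the portgroup name and the IP address are just examples from my lab, so adjust them to your host (esxcli network nic list shows the real uplink names); the vMotion, Fault Tolerance and VSAN tagging of the VMkernel interface is then done in the vSphere Web Client.

  • esxcli network vswitch standard add --vswitch-name=vSwitch1
  • esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=2044
  • esxcli network vswitch standard uplink add --vswitch-name=vSwitch1 --uplink-name=vmnic_ib0
  • esxcli network vswitch standard portgroup add --vswitch-name=vSwitch1 --portgroup-name=IB-vMotion
  • esxcli network ip interface add --interface-name=vmk2 --portgroup-name=IB-vMotion
  • esxcli network ip interface set --interface-name=vmk2 --mtu=2044
  • esxcli network ip interface ipv4 set --interface-name=vmk2 --type=static --ipv4=10.10.10.11 --netmask=255.255.255.0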


Here is a quick glossary of the various protocols that can use the InfiniBand fabric.

What is IPoIB?

IPoIB (IP-over-InfiniBand) is a protocol that defines how to send IP packets over IB; for example, Linux has an “ib_ipoib” driver that implements this protocol. This driver creates a network interface for each InfiniBand port on the system, which makes a Host Channel Adapter (HCA) act like an ordinary Network Interface Card (NIC).

IPoIB does not make full use of the HCA’s capabilities; network traffic goes through the normal IP stack, which means a system call is required for every message and the host CPU must handle breaking data up into packets, etc. However, it does mean that applications that use normal IP sockets will work on top of the full speed of the IB link (although the CPU will probably not be able to run the IP stack fast enough to saturate a 32 Gb/sec QDR IB link).

Since IPoIB provides a normal IP NIC interface, one can run TCP (or UDP) sockets on top of it. TCP throughput well over 10 Gb/sec is possible using recent systems, but this will burn a fair amount of CPU.

 

What is SRP?

The SCSI RDMA Protocol (SRP) is a protocol that allows one computer to access SCSI devices attached to another computer via remote direct memory access (RDMA). The SRP protocol is also known as the SCSI Remote Protocol. The use of RDMA makes higher throughput and lower latency possible than what can be achieved through, for example, the TCP/IP communication protocol. RDMA is only possible with network adapters that support RDMA in hardware; examples of such network adapters are InfiniBand HCAs and 10 GbE network adapters with iWARP support. While the SRP protocol has been designed to use RDMA networks efficiently, it is also possible to implement it over networks that do not support RDMA.

As with the ISCSI Extensions for RDMA (iSER) communication protocol, there is the notion of a target (a system that stores the data) and an initiator (a client accessing the target) with the target performing the actual data movement. In other words, when a user writes to a target, the target actually executes a read from the initiator and when a user issues a read, the target executes a write to the initiator.

While the SRP protocol is easier to implement than the iSER protocol, iSER offers more management functionality, e.g. the target discovery infrastructure enabled by the iSCSI protocol. Furthermore, the SRP protocol never made it into an official standard. The latest draft of the SRP protocol, revision 16a, dates from July 3, 2002.

 

What is iSER?

The iSCSI Extensions for RDMA (iSER) is a computer network protocol that extends the Internet Small Computer System Interface (iSCSI) protocol to use Remote Direct Memory Access (RDMA). Typically RDMA is provided by either the Transmission Control Protocol (TCP) with RDMA services (iWARP) or InfiniBand. It permits data to be transferred directly into and out of SCSI computer memory buffers (which connects computers to storage devices) without intermediate data copies.

The motivation for iSER is to use RDMA to avoid unnecessary data copying on the target and initiator. The Datamover Architecture (DA) defines an abstract model in which the movement of data between iSCSI end nodes is logically separated from the rest of the iSCSI protocol; iSER is one Datamover protocol. The interface between the iSCSI and a Datamover protocol, iSER in this case, is called Datamover Interface (DI).

 

vBrownbag TechTalk “InfiniBand in the Lab” presentation.

For the past few weeks I have slowly begun to build a working InfiniBand infrastructure on my vSphere cluster hosted in the office; I’m still missing some cables. With VMworld 2013 EMEA in Barcelona behind us, I’ve now got the time to publish the presentation I did in the Community zone for the vBrownbag Tech Talks. On Tuesday at noon I was the first to start the series of Tech Talks, and the infrastructure to record and process the video/audio feed had not been tuned properly yet. Unfortunately you will see this in the video link of the presentation: the audio of the first 2 minutes and 8 seconds is just horrible… So I URGE you to jump to the 3-minute mark of the video if you value your ears.

Here is the direct link to the Tech Talk about “InfiniBand in the Lab” and the link to the other Tech Talks done at VMworld 2013 EMEA.

I’m not used to doing a presentation sitting in front of multiple cameras. Some of the later slides are too fuzzy on the video, so I’m now publishing the presentation in this article.

InfiniBand_in_the_Lab

 

The InfiniBand Host Channel Adapters (HCA) with dual 20Gbps ports (DDR speed) can be found on eBay for $50 or £35.

I hope this video link and the presentation will be useful to some of you who want to beef up the intra-cluster backbone of a vSphere cluster for vMotion, Fault Tolerance or VSAN traffic.

I enjoyed doing the presentation, and I have to thank the following people for making it possible: Raphael Schitz, William Lam, Vladan Seget, Gregory Roche.


VSAN.info website

Introducing the new VSAN.info website. This website is not aimed at corporate vSphere VSAN infrastructures, but at the people implementing VSAN in their homelabs. We have noticed a new interest in building labs to test out VSAN, but also many questions about configurations and components. By building a list of the various articles and blogs that talk about VSAN, people will be able to quickly check the various configs.

Head over to http://www.vsan.info

 

VSAN Observer showing Degraded status…

This is just a quick follow-up on my previous “Using VSAN Observer in vCenter 5.5” post. As mentioned recently by Duncan Epping (@DuncanYB) in his blog entry Virtual SAN news flash pt 1, the VSAN engineers have done a full root cause analysis of the AHCI controller issues that have been reported recently. The fix is not out yet. As a precaution, and because I use the AHCI chipset in my homelab servers, I have not scaled up my usage of VSAN, and I have been closely monitoring the VMs I have deployed on the VSAN datastore.
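As a reminder from that previous post, VSAN Observer is started from the Ruby vSphere Console (RVC) on the vCenter Server; the cluster path below is a placeholder for your own environment:

  • vsan.observer <path-to-your-cluster> --run-webserver --force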

VSAN Observer DEGRADED status on a host

VSAN Observer degraded

This is curious, as neither the vSphere Web Client nor the vSphere Client on Windows has reported anything at a high level. No alarms, as can be seen in the following two screenshots.

VSAN Virtual Disks

VSAN Virtual Disks

To catch any glimpse of an error, you need to drill deeper into the Hard disk entries to see the following.

VSAN Virtual Disks Expanded

VSAN Disk Groups

VSAN Disk Groups

 

So what to do in this case? Well, I tried to put the host into Maintenance Mode and migrate the data from the degraded ESXi host to another one.

Virtual SAN data migration

There are three modes in which you can put a host of a Virtual SAN cluster into Maintenance Mode. They are the following:

  1. Full data migration: Virtual SAN migrates all data that resides on this host. This option results in the largest amount of data transfer and consumes the most time and resources.
  2. Ensure accessibility: Virtual SAN ensures that all virtual machines on this host will remain accessible if the host is shut down or removed from the cluster. Only partial data migration is needed. This is the default option.
  3. No data migration: Virtual SAN will not migrate any data from this host. Some virtual machines might become inaccessible if the host is shut down or removed from the cluster.

 

Maintenance Mode - Full Data Migration

So I selected the Full data migration option. But this didn’t work out well for me.

General VSAN fault

I had to fall back to the Ensure accessibility option to get the host into maintenance mode.
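For reference, the same operation can also be attempted from the ESXi shell. This is only a sketch, assuming your ESXi 5.5 build exposes the --vsanmode option of esxcli system maintenanceMode set; its three values map to the three modes described above:

  • esxcli system maintenanceMode set --enable=true --vsanmode=evacuateAllData    # Full data migration
  • esxcli system maintenanceMode set --enable=true --vsanmode=ensureObjectAccessibility    # Ensure accessibility (default)
  • esxcli system maintenanceMode set --enable=false    # exit maintenance mode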

Unfortunately, even after a reboot of the ESXi host and its return from maintenance mode, the VSAN Observer keeps telling me that my component residing on that ESXi host is still in a DEGRADED state. I guess I will have to patiently wait for the release of the AHCI controller VSAN fix, and see how it performs then.

 

Open Questions:

  • Is VSAN Observer picking up some extra info that is not raised by the vCenter Server 5.5?
  • Is the info from the vCenter Server 5.5 not presented properly in the vSphere Web Client?

 

Supporting Information.

My hosts have two gigabit network interfaces. I have created two VSAN VMkernel interfaces in two different IP ranges, as per the recommendations. Each VSAN VMkernel interface goes out over one physical interface and will not fail over to the second one.
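To double-check which VMkernel interfaces are actually carrying the VSAN traffic on each host, a quick look from the ESXi shell helps:

  • esxcli vsan network list    # shows the VMkernel interfaces bound to VSAN
  • esxcli vsan cluster get    # shows this host’s view of the VSAN cluster membership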