Notes & Photos of the Homelab 2014 build

I’ve had a few questions about my Homelab 2014 upgrade hardware and settings, so here is a follow-up. This is just a photo collection of the various stages of the build. Compared to my previous homelabs, which were designed for a small footprint, this one isn’t; this homelab version has been built to be a quiet environment.

I started my build with only two hosts. For the cases I used the very nice Fractal Design Define R4. These are ATX chassis in a sleek black color that can house 8x 3.5″ disks and support a lot of extra fans. Some of those fans you can see on the right side; those are Noctua NF-A14 FLX. For the power supply I picked up some Enermax Revolution Xt PSUs.

IMG_4584

For the CPU I went with the Intel Xeon E5-1650v2 (6 cores @ 3.5GHz) and a large Noctua NH-U12DX i4. The special thing about the NH-U12DX i4 model is that it comes with socket brackets for the Narrow ILM that you find on the Supermicro X9SRH-7TF motherboard.

IMG_4591

The two Supermicro X9SRH-7TF motherboards and two add-on Intel I350-T2 dual 1Gbps network cards.

IMG_4594

Getting everything ready for the build stage.

In the next photo you will see quite a large assortment of parts. There are five small yet long-lasting Intel S3700 100GB SSDs, 8x Seagate Constellation 3TB disks, some LSI HBAs like the LSI 9207-8i and LSI 9300-8i, and two Mellanox ConnectX-3 VPI dual-port 40/56Gbps InfiniBand and Ethernet adapters that I got for a steal (~$320 USD) on eBay last summer.

IMG_4595

Remember that if you only have two hosts with 10Gbps or 40Gbps Ethernet, you can build a point-to-point config without having to purchase a network switch. These ConnectX-3 VPI adapters are recognized as 40Gbps Ethernet NICs by vSphere 5.5.
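
If you want to double-check how vSphere sees the cards, a quick way is to list the physical NICs from an SSH session on the ESXi host (a minimal sketch; it simply uses the standard esxcli NIC listing):

# esxcli network nic list

Once the two hosts are cabled back-to-back, the ConnectX-3 ports should show up as additional vmnics with a 40000 Mbps link speed.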

Let’s have a closer look at the Fractal Design Define R4 chassis.

Fractal Design Define R4 Front

The Fractal Design Define R4 has two 14cm fans, one in the front and one in the back. I’m replacing the back one with a Noctua NF-A14 FLX, and I put another one in the top of the chassis to extract what little warm air there is out of the top.

The inside of the chassis has a nice feel, with easy access to the various elements, space for 8x 3.5″ disks in the front, and you can route the power cables on the other side of the chassis.

Fractal Design Define R4 Inside

A few years ago, I bought a very nice yet expensive Tyan dual-processor motherboard, and I installed it with all the other components before getting around to putting the CPU on the motherboard. It had bent pins under the CPU socket cover. This is something motherboard manufacturers and distributors give no warranty for. That was an expensive lesson, and it was the end of my Tyan allegiance. Since then I have moved to Supermicro.

LGA2011 socket close-up. Always check the pins for damage.

Here is a close-up of the Supermicro X9SRH-7TF.

Supermicro X9SRH-7TF

I now always put the CPU on the motherboard before the motherboard goes in the chassis. Note in the next picture the Narrow ILM mounting around the socket for the cooler.

Intel Xeon E5-1650v2 and Narrow ILM

Here is the difference between the Fractal Design Silent Series R2 fan and the Noctua NF-A14 FLX.

Fractal Design Silent Series R2 & Noctua NF-A14 FLX

What I like about the Noctua NF-A14 FLX are the rubber fasteners that replace the screws holding the fan. That is one more way to keep items in a chassis from vibrating and making noise. The Noctua NF-A14 FLX also runs at 1200RPM by default, but it comes with two in-line Low-Noise Adapters (LNA) that can bring the speed down to 1000RPM or 800RPM. Fewer rotations means less noise.

Noctua NF-A14 FLX Details

Putting the motherboard in the Chassis.

IMG_4623

Now we need to modify the mounting brackets for the CPU cooler. The Noctua NH-U12DX i4 comes with Narrow ILM brackets that can replace the standard ones on it. In the picture below, the top one is the Narrow ILM bracket, while the bottom one still needs to be replaced.

IMG_4621

And a close-up of everything installed in the chassis.

IMG_4629

To hold the SSDs in the chassis, I’m using an Icy Dock MB996SP-6SB, which holds multiple SSDs in a single 5.25″ front bay. As SSDs don’t heat up like 2.5″ HDDs, you can choose to cut the power to the fan.

IMG_4611

This Icy Dock MB996SP-6SB gives the front of the chassis a nice look.

IMG_4631

How does it look inside… okay, to be honest I have tidied up the SATA cables since the build.

IMG_4632

 

Here is a picture of my 2nd vSphere host during the build. You can see the cabling is done better here.

IMG_4647

 

The two Mellanox ConnectX-3 VPI 40/56Gbps cards I have were half-height adapters, so I just had to adapt the brackets a little so that the 40Gbps NICs were firmly secured in the chassis.

IMG_4658

Here is the Homelab 2014 after the first build.

IMG_4648

 

At the end of August 2014, I got a new core network switch to expand the Homelab: the Cisco SG500XG-8F8T, a 16-port 10Gb Ethernet switch. Eight ports are RJ45, eight are SFP+, and there is one additional port for management.

Cisco SG500XG-8F8T

I built a third vSphere host using the same config as the first ones. And here is the current 2014 Homelab.

Homelab 2014

And if you want to hear how much noise it makes at home, check out this YouTube video. I used the dBUltraPro app on the iPad to measure the noise level.

And this page would not be complete if it didn’t have a vCenter cluster screenshot.

Homelab 2014 Cluster

Speed testing 40G Ethernet in the Homelab

In my previous post, I described building two Linux virtual machines to benchmark the network. Here are the results.

homelab_network_1g_10g_40g_iperf_testing

 

The first blip is iperf running at maximum speed between the two Linux VMs at 1Gbps, on separate hosts using the Intel I350-T2 adapters.

The second spike (on vmnic0) is iperf running at maximum speed between the two Linux VMs at 10Gbps. The two ESXi hosts are using Intel X540-T2 adapters.

The third mountain (on vmnic4), and the most impressive result, is iperf running between the Linux VMs over 40Gb Ethernet. The two ESXi hosts are using Mellanox ConnectX-3 VPI adapters.
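
For reference, here is the kind of iperf run I used between the two Linux VMs (a minimal sketch; the IP address, stream count and duration are example values, not my exact settings):

  • on the receiving VM: iperf -s
  • on the sending VM: iperf -c 192.168.1.12 -P 4 -t 60 -i 10

A few parallel streams (-P) help when a single TCP stream runs out of CPU before it saturates the faster links.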

The Homelab 2014 ESXi hosts use the Supermicro X9SRH-7TF motherboard, which comes with an embedded Intel X540-T2. We can see the results of the 10Gbps iperf test more closely in the following picture.

homelab_network_10g_iperf_testing

Last summer I also got a set of Mellanox ConnectX-3 VPI dual-port adapters from eBay for $300. These cards support InfiniBand at 40Gb/s and 56Gb/s, and Ethernet at 10Gb/s and 40Gb/s. By default, vSphere 5.5 recognizes these adapters as 40Gb Ethernet adapters. And I really wanted to test these adapters at 40Gb Ethernet… the results are great. I can push up to 37.3 Gbits/sec through a single 40Gb Ethernet link, or 4299 MBytes/sec. Just have a peek at the following screenshot.

homelab_network_40g_iperf_testing

I guess having 40Gb Ethernet for vMotion is almost too fast… The vMotion of a 12GB VM takes 15-16 seconds, of which only 3 seconds are used for the memory transfer; the rest goes to the memory snapshot, process freeze, CPU register cloning and so on.

All the tests run at 10Gb Ethernet and 40Gb Ethernet were done with Jumbo Frames. For 40Gb Ethernet they make a real (x2.5) difference in bandwidth.
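
For the record, enabling Jumbo Frames from the ESXi command line looks roughly like this (a sketch assuming a standard vSwitch named vSwitch1 and a VMkernel interface vmk1; your names will differ, and the guest OS and any physical switch ports need the larger MTU too):

  • esxcli network vswitch standard set -v vSwitch1 -m 9000
  • esxcli network ip interface set -i vmk1 -m 9000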

This was a fun thing to test in the homelab.

Upgrading Mellanox ConnectX firmware within ESXi

Last summer, while reading the ServeTheHome.com website, I saw a great link to eBay for Mellanox ConnectX-3 VPI cards (MCX354A-FCBT). These cards were selling at $299 on eBay. I picked up three of these awesome cards. These Mellanox ConnectX-3 VPI adapters were simply too good to be true… dual FDR 56Gb/s or 40/56GbE using PCIe Generation 3 slots. But having three of these Host Channel Adapters without an InfiniBand switch is limiting.

With my new Homelab 2014 design, I now have two vSphere hosts that have PCIe Generation 3 slots, and using a simple QSFP+ Fiber Cable, I can create a direct point-to-point connection between the two vSphere hosts.

The Mellanox Firmware Tools (MFT) can run within vSphere 5.5 and allow you to check the state of the InfiniBand adapter and even update the firmware.

MFT for vSphere

Installing the tools is very straightforward.

# esxcli software vib install -d /tmp/mlx-fw/MLNX-MFT-ESXi5.5-3.5.1.7.zip

Install Mellanox MST

Unfortunately it requires a reboot.

The next steps are to start the MST service, check the status of the Mellanox devices, and query them to check the current firmware level.

I don’t need to have the Mellanox MST driver running all the time, so I will simply start it using /opt/mellanox/bin/mst start.

Next we will query the state of all Mellanox devices in the host using /opt/mellanox/bin/mst status -v, which gives us the path to the devices.

We then use the flint tool to query the devices to get their stats.

/opt/mellanox/bin/flint -d /dev/mt4099_pci_cr0 hw query

and

/opt/mellanox/bin/flint -d /dev/mt4099_pci_cr0 query

which returns the current firmware version and the GUIDs and MACs of the host channel adapters.

Mellanox firmware upgrade 01

Well, as I’m only running firmware version 2.10.700, it’s time to upgrade to release 2.30.8000.

 /opt/mellanox/bin/flint -d /dev/mt4099_pci_cr0 -i /tmp/mlx-fw/fw-ConnectX3-rel-2_30_8000-MCX354A-FCB_A1-FlexBoot-3.4.151_VPI.bin burn does the trick.

Mellanox firmware upgrade 02

And we can quickly check the new running firmware on the InfiniBand adapter.
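
This is the same flint query as before, which should now report the 2.30.8000 release:

/opt/mellanox/bin/flint -d /dev/mt4099_pci_cr0 query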

 

 

InfiniBand install & config for vSphere 5.5

A follow-up to my adventures with InfiniBand in the lab... and the vBrownbag Tech Talk about InfiniBand in the lab that I did at VMworld 2013 in Barcelona.

 

In this post I will cover how to install the InfiniBand drivers and the various protocols in vSphere 5.5. This post and the commands below are only applicable if you are not using Mellanox ConnectX-3 VPI Host Channel Adapters, or if you don’t have an InfiniBand switch with a hardware-integrated Subnet Manager. Mellanox states that the ConnectX-3 VPI should allow normal IP over InfiniBand (IPoIB) connectivity with the default 1.9.7 drivers on the ESXi 5.5.0 install CD-ROM.

This post will be most useful to people with one of the following configurations:

  • Two ESXi 5.5 hosts with direct InfiniBand host-to-host connectivity (no InfiniBand switch)
  • Two/three ESXi 5.5 hosts with InfiniBand host-to-storage connectivity (no InfiniBand switch and a storage array like Nexenta Community Edition)
  • Multiple ESXi 5.5 hosts with an InfiniBand switch that doesn’t have a Subnet Manager

The installation in these configurations has only been possible since early this morning (October 22nd at 00:08 CET), when Raphael Schitz (@hypervisor_fr) released an updated version of OpenSM 3.3.16-64, compiled in 64-bit for use on vSphere 5.5 and vSphere 5.1.

First things first… let’s rip the Mellanox 1.9.7 drivers from a new ESXi 5.5.0 install

 

Removing Mellanox 1.9.7 drivers from ESXi 5.5

Yes, the first thing to do to get IP over InfiniBand (for VMkernel adapters like vMotion or VSAN) or the SCSI RDMA Protocol (SRP) is to remove the new Mellanox 1.9.7 drivers from the newly installed ESXi 5.5.0. These drivers don’t work with the older Mellanox OFED 1.8.2 package, and the new OFED 2.0 package is still pending… Let’s cross fingers for an early 2014 release.

You need to connect to your ESXi 5.5 host using SSH and run the following command, and you will then need to reboot the host for the driver to be removed from memory.

  • esxcli software vib remove -n=net-mlx4-en -n=net-mlx4-core
  • reboot the ESXi host

esxcli software vib remove

 

Installing Mellanox 1.61 drivers, OFED and OpenSM

After the reboot you will need to download the following files and copy them to /tmp on the ESXi 5.5 host:

  1. VMware ESXi 5.0 Driver 1.6.1 for Mellanox ConnectX Ethernet Adapters (Requires myVMware login)
  2. Mellanox InfiniBand OFED 1.8.2 Driver for VMware vSphere 5.x
  3. OpenFabrics.org Enterprise Distribution’s OpenSM 3.3.16-64 for VMware vSphere 5.5 (x86_64) packaged by Raphael Schitz

Once the files are in /tmp (or on shared storage if you want to keep a copy), you will need to unzip the Mellanox 1.6.1 driver file. Be careful with ib-opensm-3.3.16-64: the esxcli -d option becomes a -v for the vib during the install. The other change since vSphere 5.1 is that we need to set the esxcli software acceptance level to CommunitySupported in order to install some of the drivers and binaries.

The commands are

  • unzip mlx4_en-mlnx-1.6.1.2-471530.zip
  • esxcli software acceptance set --level=CommunitySupported
  • esxcli software vib install -d /tmp/mlx4_en-mlnx-1.6.1.2-offline_bundle-471530.zip --no-sig-check
  • esxcli software vib install -d /tmp/MLNX-OFED-ESX-1.8.2.0.zip --no-sig-check
  • esxcli software vib install -v /tmp/ib-opensm-3.3.16-64.x86_64.vib --no-sig-check
  • reboot the ESXi host

esxcli software vib install infiniband

 

Setting MTU and Configuring OpenSM

After the reboot we have a few more commands to run (a worked example follows below).

  • esxcli system module parameters set -m=mlx4_core -p=mtu_4k=1
  • copy partitions.conf to /scratch/opensm/<adapter_1_hca>/
  • copy partitions.conf to /scratch/opensm/<adapter_2_hca>/

The partitions.conf file only contains the following text:

  • Default=0x7fff,ipoib,mtu=5:ALL=full;
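
Put together, the sequence looks something like this (a sketch; <adapter_1_hca> and <adapter_2_hca> are the per-adapter directories you find under /scratch/opensm on your own host):

  • echo 'Default=0x7fff,ipoib,mtu=5:ALL=full;' > /tmp/partitions.conf
  • esxcli system module parameters set -m=mlx4_core -p=mtu_4k=1
  • cp /tmp/partitions.conf /scratch/opensm/<adapter_1_hca>/
  • cp /tmp/partitions.conf /scratch/opensm/<adapter_2_hca>/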

 

cp partitions.conf

I recommend that you check the state of your InfiniBand adapters (mlx4_0) using the following command

  • /opt/opensm/bin/ibstat mlx4_0

ibstat mlx4_0

I also recommend that you write down the HCA Port GUID numbers of the adapters if you are going to use the SCSI RDMA Protocol (SRP) between the ESXi host and a storage array. They will come in handy later (and in an upcoming post).

Now you are ready to add the new adapters to a vSwitch/dvSwitch and create the VMkernel adapters. Here is the current config for vMotion, VSAN and Fault Tolerance on dual 20Gbps IB adapters (which only cost $50!).

vSwitch1 with IB VMkernels
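
If you prefer the command line over the vSphere Client for this part, the rough equivalent for one VMkernel adapter would be something like the following (a sketch with assumed names: vSwitch1, a vMotion-IB port group, vmk2, an example IP, and vmnic_ib0 standing in for whatever name the IPoIB uplink gets on your host; none of these are my exact settings):

  • esxcli network vswitch standard add -v vSwitch1
  • esxcli network vswitch standard uplink add -v vSwitch1 -u vmnic_ib0
  • esxcli network vswitch standard portgroup add -v vSwitch1 -p vMotion-IB
  • esxcli network ip interface add -i vmk2 -p vMotion-IB
  • esxcli network ip interface ipv4 set -i vmk2 -t static -I 192.168.42.1 -N 255.255.255.0

Tagging the VMkernel adapter for vMotion, VSAN or Fault Tolerance traffic is then done in the vSphere Web Client as usual.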

I aim to put the various VMkernel traffic types in their own VLANs, but I still need to dig into the partitions.conf file.

 

If you have an older switch that does not support an MTU of 4K, make sure you set your vSwitch/dvSwitch to an MTU of 2044 (2048 minus 4 bytes) and do the same for the various VMkernel interfaces.

VMkernel MTU at 2044

 

 

Here is just a quick glossary of the various protocols that can use the InfiniBand fabric.

What is IPoIB?

IPoIB (IP-over-InfiniBand) is a protocol that defines how to send IP packets over IB; for example, Linux has an “ib_ipoib” driver that implements this protocol. This driver creates a network interface for each InfiniBand port on the system, which makes a Host Channel Adapter (HCA) act like an ordinary Network Interface Card (NIC).

IPoIB does not make full use of the HCA’s capabilities; network traffic goes through the normal IP stack, which means a system call is required for every message and the host CPU must handle breaking data up into packets, etc. However, it does mean that applications that use normal IP sockets will work on top of the IB link (although the CPU will probably not be able to run the IP stack fast enough to use a 32 Gb/sec QDR IB link at full speed).

Since IPoIB provides a normal IP NIC interface, one can run TCP (or UDP) sockets on top of it. TCP throughput well over 10 Gb/sec is possible using recent systems, but this will burn a fair amount of CPU.
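
As an illustration of that “ordinary NIC” behaviour, on a Linux machine with the ib_ipoib driver loaded you would configure the IPoIB interface like any other (a sketch; the interface name ib0 and the addresses are just examples):

  • modprobe ib_ipoib
  • ip addr add 192.168.50.1/24 dev ib0
  • ip link set ib0 up
  • ping 192.168.50.2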

 

What is SRP?

The SCSI RDMA Protocol (SRP) is a protocol that allows one computer to access SCSI devices attached to another computer via remote direct memory access (RDMA). The SRP protocol is also known as the SCSI Remote Protocol. The use of RDMA makes higher throughput and lower latency possible than what is achievable through, for example, the TCP/IP communication protocol. RDMA is only possible with network adapters that support RDMA in hardware. Examples of such network adapters are InfiniBand HCAs and 10 GbE network adapters with iWARP support. While the SRP protocol has been designed to use RDMA networks efficiently, it is also possible to implement the SRP protocol over networks that do not support RDMA.

As with the iSCSI Extensions for RDMA (iSER) communication protocol, there is the notion of a target (a system that stores the data) and an initiator (a client accessing the target), with the target performing the actual data movement. In other words, when a user writes to a target, the target actually executes a read from the initiator, and when a user issues a read, the target executes a write to the initiator.

While the SRP protocol is easier to implement than the iSER protocol, iSER offers more management functionality, e.g. the target discovery infrastructure enabled by the iSCSI protocol. Furthermore, the SRP protocol never made it into an official standard. The latest draft of the SRP protocol, revision 16a, dates from July 3, 2002.

 

What is iSER?

The iSCSI Extensions for RDMA (iSER) is a computer network protocol that extends the Internet Small Computer System Interface (iSCSI) protocol (which connects computers to storage devices) to use Remote Direct Memory Access (RDMA). Typically RDMA is provided by either the Transmission Control Protocol (TCP) with RDMA services (iWARP) or InfiniBand. It permits data to be transferred directly into and out of SCSI computer memory buffers without intermediate data copies.

The motivation for iSER is to use RDMA to avoid unnecessary data copying on the target and initiator. The Datamover Architecture (DA) defines an abstract model in which the movement of data between iSCSI end nodes is logically separated from the rest of the iSCSI protocol; iSER is one Datamover protocol. The interface between the iSCSI and a Datamover protocol, iSER in this case, is called Datamover Interface (DI).