NSX Advanced Load Balancer (Avi Networks) Components

In this first post, I will describe the different components of the NSX Advanced Load Balancer (ALB). Following the purchase of Avi Networks by VMware in June 2019, the product Avi Vantage has been renamed NSX Advanced Load Balancer. If you deploy the latest release, 18.2.7, you will notice it still carries the Avi Networks branding. This will change in the near future.

NSX Advanced Load Balancer is a software-defined solution that provides Application Delivery Services in an automated, elastic deployment. It has a built-in, best-of-breed operational dashboard for advanced analytics and reporting, it is completely API-driven, and it supports a wide range of infrastructure and cloud providers. In this series of blog posts, I will focus on the vSphere integration in my home datacenter.
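As a quick illustration of how API-driven the product is, here is a minimal sketch of a REST call against the Controller. The controller address and credentials are placeholders from my lab, and the X-Avi-Version header should match the release you are actually running:

# List the configured Virtual Services on the Controller (address and credentials are lab placeholders)
curl -k -u admin:'VMware1!' \
     -H "X-Avi-Version: 18.2.7" \
     https://avi-controller.lab.local/api/virtualservice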

Version 18.2 of the product works over distributed virtual port groups belonging to both vSphere Distributed Switches (regular VLAN networks) and Logical Switches (NSX-V VXLAN-based or NSX-T Geneve networks), as it is agnostic of the underlying network infrastructure. Integration with an NSX-T network is possible today, but still requires some manual configuration steps. The automation of this process will come in a future release of NSX ALB.

The management plane is composed of one or three Controllers, and the data plane is composed of the Service Engines (SE).

The Controller is the central repository for configuration and policies, and it manages the full lifecycle of the Service Engines (creation, control and deletion). The Controllers run on dedicated virtual machines. I’ve used a Controller integrated with my vSphere infrastructure to automatically deploy and configure the Service Engines across the selected cluster.

Each Controller node has recommended minimum requirements of 8 vCPUs, 24GB RAM and a 128GB disk. Sizing the appliance above these minimums depends on the amount of analytics data and the number of flows: https://avinetworks.com/docs/latest/avi-controller-sizing/

Once the Controllers are deployed, we define a Cloud Infrastructure. Here I have configured my vCenter as the target. With administrator credentials, the Controller will be able to provision the required Service Engines on the cluster.

The Service Engines are lightweight data plane engines that distribute connections based on load-balancing algorithms or HTTP/S headers. They sit in front of the back-end servers and execute all the data plane Application Delivery Controller operations, such as:

  • Health monitoring and performance testing of the back-end servers
  • Persisting requests to back-end servers
  • Caching response content for potential re-use
  • Protecting against security threats (DoS, suspicious client IPs)
  • Delivering high-performance web security with iWAF
  • Offloading SSL decryption from back-end servers, re-encrypting if required

Service Engines are then grouped into a Service Engine Group, just as vSphere hosts are grouped into a cluster.

When you create a Virtual Service to load-balance an application, the service is deployed across the Service Engine Group.

In the next blog post, I will cover more details of the configuration and the deployment of a simple Load Balanced workload.

PKI Infrastructure (01) – Introduction & Certificate Lifetimes

With this first post, I will start a small series of blogs on how to install and configure a Microsoft Public Key Infrastructure (PKI). There are plenty of other sets of articles on the internet on how to set up PKI solutions on Windows Server and Linux. But I’m writing my own set of articles, partly to document my own implementation and partly to give other people an idea of how I’ve done some configurations, like the CRL publication, that I have not seen elsewhere. I’m using PowerShell and command scripts to deploy the Root CA and the Issuing CA. I also use OpenSSL to create and document my certificates. I originally started writing this blog post over 2 years ago when I set up my PKI infrastructure on Windows Server 2012 R2, but I have since moved on to Windows Server 2016 for my infrastructure, and it’s what I’m going to use for the Root CA, the Issuing CA and the web server publishing the Certificate Revocation List.

You will also note that my certificates have much more aggressive expiration dates than those in most blogs covering this topic. My reasoning is to make sure that I need to work on the PKI infrastructure regularly (every 2 years) to ensure that the certificates, the Issuing CA and the Root CA are kept up to date.

Hashing algorithms are bundled in Cryptographic Service Providers (CSPs). A list of the different Microsoft Cryptographic Service Providers can be found at Cryptographic Service Providers.

My first implementation of a PKI infrastructure happened in 2003, when I set up a two-tier corporate infrastructure on Windows Server 2003. We started with a Root CA certificate that had a validity period of 20 years (the default used by so many other people). Yet at the half-life of the Root CA in 2013, I needed to refresh the Issuing CA and Root CA. At that point in time the SHA-1 hashing algorithm used in 2003 was being deprecated (SHA-1 Deprecation policy). While changing from an RSA 1024-bit to an RSA 2048-bit key was easy, SHA-2 was not present in the Microsoft Cryptographic Service Provider I had initially selected in 2003. So in 2013 I had to swap to a stronger Cryptographic Service Provider, which forced me to re-install my entire PKI from scratch. Selecting the correct Cryptographic Service Provider is essential.

For the past year I have been running two sets of PKI: one based on the “RSA#Microsoft Software Key Storage Provider”, which I labelled G1 (Generation 1 tier), and a second, G2 (Generation 2 tier), based on the Elliptic Curve Diffie-Hellman (ECDH) algorithm published in the NSA Suite B cryptography. Unfortunately, Elliptic Curve algorithms, while much more secure, are not available in most products that use certificates. In the past 4 years, I have not been able to use ECDH/ECDSA to encrypt or sign with various products. For this implementation of the two-tier PKI infrastructure I will therefore stick with the “RSA#Microsoft Software Key Storage Provider” CSP.

The document Cryptographic Services by Microsoft is a great PKI 101 guide that covers the basics, from primitives to secret-key encryption, public-key encryption and digital signatures.

To finish off this first article, I want to talk about certificate lifetimes and create the base CApolicy.inf configuration files for the Root CA and the Issuing CA.

The Root Certificate Authority certificate has an associated lifetime; it can be renewed, but based on 15 years of experience with Public Key Infrastructure, it will most likely be replaced by a new one instead.

One of the biggest Public Key Infrastructure design questions concerns the advantages and disadvantages of the different key sizes and validity periods. The larger the key size, the more secure the certificate is from attackers, but the more processing it requires. The longer the validity period, the less certificate maintenance is required (and potentially less service disruption), but the more vulnerable the certificate is to being compromised.

At the Vegas 2009 Microsoft Management Summit (MMS), Chris Adams and Ben Shy from Microsoft presented a breakout session sharing their experience implementing native mode and Internet-based client management at Microsoft. What they shared with customers was their strategy for deciding on the key size and validity period. Their numbers are based on RSA research into how long it would take an attacker to compromise a certificate: the larger the key size, the more secure the certificate (at the cost of extra processing). The simple matrix they presented at MMS looked like this:

  • Key length of 1024:  Validity period = not greater than 6-12 months
  • Key length of 2048:  Validity period = not greater than 2 years
  • Key length of 4096:  Validity period = not greater than 16 years

If we start the Root Certificate Authority with a validity of 12 years, the Root CA should have a key length of 4096 bits and use a Cryptography Next Generation (CNG) hash algorithm from the SHA-2 family: SHA256 (SHA-2 256-bit), SHA384 (SHA-2 384-bit) or SHA512 (SHA-2 512-bit).

On a 64-bit (x64) platform, SHA512 is actually faster than the smaller SHA256 (reference: http://en.wikipedia.org/wiki/SHA-2).

Here is the beginning of my Root CA & Issuing CA configurations. The CApolicy.inf file will be used during the creation of the Root CA certificate or the Issuing CA certificate.

RootCA – CApolicy.inf

[Certsrv_Server]
RenewalKeyLength=4096
CNGHashAlgorithm=SHA512
AlternateSignatureAlgorithm=0

For the Issuing CA, which has a validity period of 6 years, we should also set the Key Length to 4096.

IssuingCA – CApolicy.inf

[Certsrv_Server]
RenewalKeyLength=4096
CNGHashAlgorithm=SHA512
AlternateSignatureAlgorithm=0

The Issuing Certificate Authority is the CA that will issue the server and user certificates. The Issuing CA is generally configured with half the lifetime of the Root Certificate Authority. Various vendors now refuse to work with server certificates that have a lifetime longer than 2 years, so you need to plan your Root CA and Issuing CA lifetimes properly. Just taking the defaults is not good enough.

However, you also need to take into account what your CA hierarchy can support. A Certificate Authority cannot issue a certificate with a longer validity period than its own certificate. That one is easy to remember, but there is also a ticking time limit, because a Certificate Authority cannot issue certificates with a validity period longer than its own remaining validity period. This is the reason why some designs increase the Root CA validity period from 20 years to 21 years, so that the Issuing CA can stay at a 10-year validity.

So let’s expand the Root CA and Issuing CA policies.

RootCA – CApolicy.inf

[Certsrv_Server]
RenewalKeyLength=4096
RenewalValidityPeriod = Years
RenewalValidityPeriodUnits = 12
CNGHashAlgorithm=SHA512
AlternateSignatureAlgorithm=0

IssuingCA – CApolicy.inf

[Certsrv_Server]
RenewalKeyLength=4096
RenewalValidityPeriod = Years
RenewalValidityPeriodUnits = 6
CNGHashAlgorithm=SHA512
AlternateSignatureAlgorithm=0

 

AlternateSignatureAlgorithm = 0

While the Microsoft Press 2008 PKI documentation refers to the DiscreteSignatureAlgorithm, it should really be the AlternateSignatureAlgorithm. With the support of Cryptography Next Generation (CNG) and the new Suite B signature and encryption algorithms, it is necessary to include information about the algorithms in both certificates and certificate requests. If this information is not included, an entity processing the certificate or certificate request may not be able to verify the signature of the object.

The AlternateSignatureAlgorithm option, when assigned a value of 1, enables support for the PKCS#1 v2.1 signature format for both the CA certificate and CA certificate requests. If implemented on a Root CA, the Root CA will generate a root certificate that includes the PKCS#1 v2.1 signature format. If implemented on a subordinate CA, the subordinate CA will generate a certificate request that includes the PKCS#1 v2.1 signature format.

Warning: a side effect of using the AlternateSignatureAlgorithm=1 option is that the signature algorithm changes from SHA384RSA to RSASSA-PSS. This newer signature algorithm is not recognized by all certificate implementations; some Citrix implementations have been found to break with this configuration.
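If you want to verify which signature format a CA certificate actually carries before rolling it out, a quick check with OpenSSL (which I already use alongside my scripts) looks like this; cacert.cer is a placeholder for your exported certificate:

# Show the signature algorithm of a certificate (e.g. sha512WithRSAEncryption vs rsassaPss)
openssl x509 -inform DER -in cacert.cer -noout -text | grep "Signature Algorithm"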

There are still plenty of other options that we can insert in the CApolicy.inf configuration, but I will not cover them right now. You will see the end result CApolicy.inf in the corresponding blog entries.
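For reference, once a CApolicy.inf like the one above is in place in C:\Windows, the Root CA role itself can be installed with PowerShell. This is only a sketch of the approach, using the values from this post; the CA common name is an example and should be adjusted to your own naming:

# Install the AD CS role and configure a standalone Root CA (values match the CApolicy.inf above)
Install-WindowsFeature ADCS-Cert-Authority -IncludeManagementTools
Install-AdcsCertificationAuthority -CAType StandaloneRootCA `
    -CACommonName "Root CA G1" `
    -CryptoProviderName "RSA#Microsoft Software Key Storage Provider" `
    -KeyLength 4096 -HashAlgorithmName SHA512 `
    -ValidityPeriod Years -ValidityPeriodUnits 12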

Running ScaleIO in the HomeDC

In this post, I will describe how I went about deploying ScaleIO Software-Defined Storage in the Home Datacenter. Over the course of 2016, I upgraded my clusters from VMware Virtual SAN Hybrid (flash for the caching tier and SAS enterprise disks for the capacity tier) to All-Flash. This freed up multiple 4TB SAS enterprise disks from the vSAN configuration. Rather than remove them from the hosts, I decided to learn and test the Free and Frictionless edition of DellEMC ScaleIO.

My ScaleIO design crosses the boundaries of three VMware vSphere clusters and is hosted across eight tower-case servers in the Home Datacenter. In a normal production ScaleIO cluster, the recommendation is to have a minimum of 6 disk drives per ScaleIO Data Server (the servers sharing the storage). As you will see, in my design I spread the SAS enterprise disks across the eight servers.

I’m not going to cover the definition of Protection Domains or Storage Pools in this article, but for this design I have a single Protection Domain (pd1) with a single Storage Pool, which I named SAS_pool. I did divide the Protection Domain into three separate Fault Sets (fs1, fs2 and fs3), so as to spread failures across the hosts based on the power phase they use in my datacenter.

I’ve run ScaleIO across my cluster for 10 months for some specific workloads that I just could not fit or did  not want to fit on my VMware vSAN All-Flash environment.

Here is a large screenshot of my ScaleIO configuration as it’s re-balancing the workload across the hosts. 

 

Each ScaleIO Data Server (SDS) was a CentOS 7 VM running on the ESXi host, with two or three physical devices attached to it using RDM. Each SDS had an SSD device for the RFcache (read cache) and one or two SAS disk drives.
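For reference, attaching a physical disk to an SDS VM as an RDM is done with vmkfstools on the ESXi host. This is a sketch with placeholder device, datastore and folder names, not my exact commands:

# Identify the local SAS disk, then create a physical-mode RDM pointer for it (names are examples)
ls /vmfs/devices/disks/
vmkfstools -z /vmfs/devices/disks/naa.5000c50084xxxxxx \
    /vmfs/volumes/local-datastore/sds01/sds01_sas1_rdm.vmdk
# The resulting .vmdk is then added to the SDS VM as an existing disk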

At the peak of this deployment, the ScaleIO configuration had 41.8TB of storage. I set the spare capacity at 8TB, leaving 34.5TB of usable storage. With the two mirrored copies ScaleIO keeps of each storage object, I could only present 17.2TB of data to my VMs and vSphere hosts.

Over the past 10 months of using ScaleIO, I’ve found two main limitations.

  1. The ScaleIO release cycle, even more so for people using the Free & Frictionless version of ScaleIO. The release cycle is out of sync with vSphere releases. Some versions are only released to Dell EMC customers with support contracts, and some versions take between 6 and 8 weeks to move from restricted access to public access. At the end of March 2017, there was no version of ScaleIO that supported vSphere 6.5.
  2. Maintenance & Operations. Whenever I wanted or needed to upgrade an ESXi host with a patch, a driver change or a new version of NSX-v, I had to plan the power-off of the SDS VM running on that ESXi host. You can only put a single SDS into planned maintenance mode per Protection Domain, so only one ESXi host could be patched at a time. A simple cluster upgrade that would normally be hands-off with DRS now takes much longer and requires more manual steps: put the SDS in ScaleIO maintenance mode, shut down the SDS VM (and take the time to patch the Linux in the SDS VM), put the host in maintenance mode, patch ESXi, restart ESXi, exit maintenance mode, restart the SDS VM, exit the ScaleIO maintenance mode, wait for ScaleIO to rebuild the redundancy, then move on to the next host. A sketch of this sequence is shown after this list.
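Here is a rough outline of that per-host sequence; the scli flags are from my notes on the ScaleIO 2.x CLI and the SDS and host names are placeholders, so treat this as a sketch rather than a copy-paste procedure:

# On the ScaleIO MDM: put the SDS of the host being patched into maintenance mode
scli --login --username admin
scli --enter_maintenance_mode --sds_name sds-esx01
# Shut down the SDS VM (and patch its CentOS guest while it is down), then patch the host
esxcli system maintenanceMode set --enable true
# ... patch and reboot ESXi, exit host maintenance mode, power the SDS VM back on ...
scli --exit_maintenance_mode --sds_name sds-esx01
# Wait for the rebuild/rebalance to finish before moving to the next host
scli --query_all_sds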

I’ve now decommissioned the ScaleIO storage tier as I needed to migrate to vSphere 6.5 for some new product testing.

Issuing CA Renewal operation

There is a German proverb, “Übung macht den Meister” (practice makes perfect), that I have always tried to apply to my day-to-day computer science skills. While dealing with my Public Key Infrastructure in the home datacenter (#HomeDC), this means having a proper multi-tier PKI infrastructure with a standalone Root CA, an Issuing CA, and a PKI web publishing server for certificates and Certificate Revocation Lists. Nearly everyone can set up a PKI infrastructure with Microsoft Windows Server using Next, Next, Next and a 40-year Root Certificate Authority, but I had to make this a bit more challenging and design it so that it needs a yearly maintenance process, to keep my PKI skills fresh.

My PKI Certificate Lifecycle is based on the following schema:

You can find the original diagram in this Microsoft PKI Certificate Lifecycle article. So instead of having a Root CA that is valid for 20 years and an Issuing CA that is valid for 10 years, I went with smaller validity periods: 8 years for the Root CA and 4 years for the Issuing CA.

I use two different generations of PKI infrastructure. The G1 tier, which this article is about, uses a Root CA with an RSA (4096-bit) public key and a sha512RSA signature algorithm, and the same for its Issuing CA. The G2 tier, which you will see in some of the screenshots, is based on a Root CA with an Elliptic Curve Cryptography (ECC) P-521 key and a sha512ECDSA signature algorithm.

My infrastructure has been running since 2015, so I’m now closing in on the half-way point of the Issuing CA validity period. I decided on the following renewal:

  • At T+4 years the Issuing CA certificate will be renewed with a new key pair. This action enforces the 4 year lifetime of the RSA key pair as agreed to when designing the PKI and PKI security. This will create a new CA certificate with a new key pair. This will also force the CA to generate a new CRL file, since there is a new key pair. A CRL signed by the “old” key pair will continue to be generated as long as the CA certificate associated with the “old” key pair is still time valid.

When you do a certificate renewal, the new version has a (1) behind it. The certificate request would now be called Issuing CA G1(1).req

Let’s have a look at the original Issuing CA certificate on the Root CA.

And the Issuing CA detail is

This is now impacting me when I attempt to sign new certificates with a validity of over 24 months, because those are now limited to a validity ending on the 4th of December 2019.

The first step on the Issuing CA is to stop the PKI service and launch the Renew CA Certificate process. I decided to generate a new public and private key, so my new Issuing CA request file is now named Issuing CA G1(1). Take the certificate request to the Root CA. On the Root CA, revoke the current Issuing CA certificate as Superseded and submit a new request with the Issuing CA G1(1) request file. Issue the new SubCA certificate. We now have an Issuing CA certificate with two entries.

I need to export the signed certificate (I used the PKCS #7 .p7b with certificate path format), move it to the Issuing CA and Import CA Certificate.

In the following steps I’m doing a few more operations on the Root CA. Now that I have revoked the original Issuing CA certificate (yeah, with hindsight I might have been better off not revoking it… I might need to update this article if I run into issues), it’s time to do the annual publishing of the Certificate Revocation List (CRL).

I can now see the serial number of the old, revoked Issuing CA certificate in my Root CA CRL.

Moving along to the Issuing CA, I’m publishing the updated Root CA CRL into Active Directory using certutil -dsPublish RootCA.crl RootCA
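For completeness, the AD publishing commands look roughly like this when run from an elevated prompt on a domain-joined machine; the file names follow the naming used in this post:

REM Publish the updated Root CA CRL and the renewed Issuing CA certificate into Active Directory
certutil -dsPublish RootCA.crl RootCA
certutil -dsPublish "Issuing CA G1(1).crt" SubCA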

For the computers and operating systems that are not in Active Directory and therefore cannot check the state of the certificates from AD, I have a Windows server running the IIS web server that publishes the CRLs. This server, while having the FQDN pki-web.bussink.org, is also referenced by the alias pki.bussink.org on my network. I copied the updated Issuing CA G1(1) certificate and the Root CA CRL into the directory mapped by the IIS server.

On the Issuing CA in the Enterprise PKI tab, you can ensure that all paths to the Certificates, Certificate Revocation List and Delta CRL work. As you see in the top part of the following screenshot I had not yet copied the Issuing CA(1) certificate. That is corrected in the bottom part of the screenshot.

With the Issuing CA running again, I forced a publication of the Issuing CA CRLs. You can see them below on the web server in purple. There are two sets of CRLs: one for the original Issuing CA certificate and one for the updated Issuing CA G1(1) certificate.
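Forcing that CRL publication from the command line is a one-liner on the Issuing CA, shown here for reference:

REM Force the Issuing CA to generate and publish new base (and, if configured, delta) CRLs
certutil -crl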

The files in the red boxes are the ones I manually added to my PKI-WEB repository. They are the annual Root CA CRL and the new Issuing CA G1(1) certificate (as I already mentioned above, I might have been a bit premature in removing the original Issuing CA G1 certificate; I will update this article if I run into serious issues).

I wrote this blog article mostly for myself, as a recap of the operations, because I will have to redo them before 2021. While that is only 4 years down the road, I have already had the opportunity once in my career to set up a Root CA infrastructure, in 2004 with Windows Server 2003, and to have to renew it completely 10 years later in 2014. That was a lot more complicated, as I had to change the PKI crypto provider from the old one, which only supported SHA-1, to one that supported SHA-2. This is a reminder to all professionals: if you set up a PKI, you might have to work on it again a decade later.

 

 

Network core switch Cisco Nexus 3064PQ

Here is my new network core switch for the Home Datacenter, a Cisco Nexus 3064PQ-10GE.

Cisco Nexus 3064PQ-10GE (48x SFP+ & 4x QSFP+)


But before I say more about the Cisco Nexus 3064PQ-10GE, let me just bring you back in time… Two years ago, I purchased a Cisco SG500XG-8F8T 16-port 10-Gigabit stackable managed switch, first described in my Homelab 2014 build. It was the most expensive networking investment I had ever made. Over the past two years, as the lab grew, I used the SG500XG and two SG500X-24 switches for my networking stack. This stack is still running on the 1.4.0.88 firmware.

sg500xg_stack

During these past two years, I learned the hard way that network chipsets for 10GbE over RJ-45 cabling output much more heat than SFP+ chipsets. My initial Virtual SAN Hybrid implementation, using a cluster of three ESXi hosts with Supermicro X9SRH-7TF boards (the network chipset is an Intel X540-AT2), crashed more than once when the network chipset became so hot that I lost my 10G connectivity while the ESXi host kept on running. Only a power-down and cool-off of the motherboard would allow my host to restart with 10G connectivity. This also led me to expand the VSAN Hybrid cluster from three to four hosts and to take a closer look at the heating issues when running 10G over RJ-45.

Small-business network switches with 10GBase-T connectivity are more expensive than the more enterprise-oriented SFP+ switches, and they also output much more heat (measured in BTU/hr). Sure, once the 10GBase-T switch is purchased, Category 6A cables are cheaper than passive copper SFP+ cables, which are limited to 7 meters.

The Cisco SG500XG-8F8T is a great switch as it allows me to connect using both RJ-45 and SFP+ cables.

As the lab expanded, I started to ensure that my new hosts either have no 10GBase-T adapters on the motherboard or use SFP+ adapters (like my recent X10SDV-4C-7TP4F ESXi host). I have started using Intel X710 dual SFP+ adapters in some of my hosts. I like this Intel network adapter, as the chipset gives off less heat than previous-generation chipsets and has a firmware update function that can be run from the command prompt inside vSphere 6.0.

This brings me to the fact that I was starting to run out of SFP+ ports as the lab expands. I found some older Cisco Nexus switches on eBay, and the one that caught my eye for its number of ports, its price and its capabilities is the Cisco Nexus 3064PQ-10GE. These babies are going for about $1200-$1500 on eBay now.

3064pq_on_ebay

The switch comes with 48 SFP+ ports and 4 QSFP+ ports. These four ports can be configured as either 16x 10G using fan-out cables or 4x 40G; a software command on the switch changes from one mode to the other (see the sketch below).
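From what I have read in the Nexus 3064 documentation, the mode change is done with the hardware profile portmode command followed by a reload; the exact keywords below are from my notes and should be checked against your NX-OS release:

nexus3064# configure terminal
nexus3064(config)# hardware profile portmode 48x10g+4x40g
nexus3064(config)# exit
nexus3064# copy running-config startup-config
nexus3064# reload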

Here is my switch with the interface output. I’m using a Get-Console Airconsole to extend the console port to my iPad over Bluetooth.

nexus_3064pq_10g_40g-1

My vSphere 6.0 host is now connected to the switch using an Intel XL710-QDA2 40GbE network adapter and a QSFP+ copper cable.

esxi_40G

I’m going to use the four QSFP+ connectors on the Cisco Nexus 3064PQ-10GE to connect my Compute cluster with NSX and VSAN All-Flash.

3064_10g_40g_show_int

 

The switch came with NX-OS 5.0(3)U5(1f).

3068_nx-os

 

Concerning the heat output of the Cisco Nexus 3064PQ-10GE (datasheet), I was pleasantly surprised to note that its output is rather small at 488 BTU/hr when all 48 SFP+ ports are used. I also noted that the noise level of the fans is linked to the fan speed and the load of the switch, going from 59 dBA at 40% duty cycle, to 66 dBA at 60% duty cycle, to 71 dBA at 100% duty cycle.

Here is the back of the Cisco Nexus 3064PQ-10GE. I purchased the switch with DC power supplies (top right of the switch), because the specific switch I wanted had both the LAN_BASE_SERVICES and LAN_ENTERPRISE_SERVICES licenses. I sourced two N2200-PAC-400W-B AC power supplies from another place.

nexus_3064pq_back-1

Link to the Cisco Nexus 3064PQ Architecture.

 

Intel Xeon D-1518 (X10SDV-4C-7TP4F) ESXi & Storage server build notes

These are the build notes for my latest server. This server is based around the Supermicro X10SDV-4C-7TP4F motherboard that I already described in my previous article (Bill-of-Materials). For the case I selected a Fractal Design Node 804, a small square chassis described as being able to handle up to 10x 3.5″ disks.

Fractal Design Node 804

Here is the side view where the motherboard is fitted. The chassis supports Mini-ITX, Micro-ATX and the Flex-ATX form factor of the Supermicro motherboard. Two 3.5″ hard drives or 2.5″ SSDs can be fitted on the bottom plate.

x10sdv_node804--2

The right section of the chassis contains the space for eight 3.5″ hard drives, fixed in two sliding frames at the top.

x10sdv_node804--3

Let’s compare the size of the Chassis, the Power Supply Unit and the Motherboard in the next photo.

Fractal Design Node 804, Supermicro X10SDV-4C-7TP4F and Corsair RM750i


When you zoom in on the picture above, you can see three red squares on the bottom right of the motherboard. Before you insert the motherboard in the chassis, make sure you have moved the mSATA standoff from the position in the photo to the second position, otherwise you will not be able to secure the mSATA drive. You need to unscrew the holding grommet from below the motherboard. People who have purchased the Supermicro E300-8D will have a nasty surprise. The red square in the center of the motherboard marks the M.2 standoff at the 2280 position; if you have a 22110 M.2 storage stick, you had better move that holding grommet as well.

Here is another closer view of the Supermicro X10SDV-4C-7TP4F motherboard with the two Intel X552 SFP+ connectors, and the 16 SAS2 ports managed by the onboard LSI 2116 SAS Chipset.

X10SDV-4C-7TP4F

In the next picture you see the mSATA holding grommet moved to accommodate the Samsung 850 EVO Basic 1TB mSATA SSD, and the Samsung SM951 512GB NVMe SSD in the M.2 socket.

X10SDV-4C-7TP4F

In the next picture we see the size of the motherboard in the chassis. At the top left you will see a feature of the Fractal Design Node 804: a switch that allows you to change the voltage of three fans. This switch gets its power through a SATA power connector. It’s on this fan controller that I was able to put a Y-power cable and drive the Noctua A6x25 PWM CPU fan, which fits perfectly on top of the CPU heatsink. This allowed me to bring the CPU heat buildup during the Memtest86+ run down from 104°C to 54°C.

X10SDV in Node 804

I used two spare Noctua fan-to-heatsink mounting clips to hold the Noctua A6x25 PWM on the heatsink, and a zip tie to hold those two clips together (sorry, I’m not sure there is a proper name for those metal mounting brackets). Because the Noctua gets its power from the chassis and not the motherboard, the Supermicro BIOS does not attempt to increase or decrease the fan’s rpm. This allows me to keep a steady airflow on the heatsink.

Noctua A6x25 PWM fixed on heatsink


I have fitted my server with a single 4TB SAS drive. To do this I use an LSI SAS cable (L5-00222-00), shown here.

lsi_sas_l5-00222-00_cable

This picture shows the 4TB SAS drive in the leftmost storage frame. Due to the length of the connector, the SAS cable would otherwise be blocked by the power supply unit, so I will only be able to expand to 4x 3.5″ SAS disks in this chassis. Using SATA drives, the chassis could take up to 10 disks.

Node 804 Storage and PSU side

View from the back once all is assembled and powered up.

x10sdv_node804--12

This server, with an Intel Xeon D-1518 and 128GB of RAM, is part of my Secondary Site.

ESXi60P03

The last picture shows my HomeDC Secondary Site. The Fractal Design Node 804 is sitting next to a Fractal Design Define R5. The power consumption is rated at 68 Watts for an X10SDV-4C-7TP4F with two 10GbE SFP+ passive copper connections, two SSDs and a single 4TB SAS drive.

HomeDC Secondary Site


Supermicro X10SDV-4C-7TP4F server Bill-of-Materials

Another new host has joined the Home Datacenter (#HomeDC). This is the first low-powered Intel Xeon D-1500 server I have gotten my hands on. There have been some great install guides about other Supermicro X10SDV motherboards on many sites, and I would recommend that you head over to Paul Braren’s (@tinkertry) TinkerTry site for a lot of great content. There are now also two small servers from Supermicro that came out recently, the E200-8D and E300-8D. The motherboard I selected for my new host closely matches the one in the Supermicro E300-8D, described on TinkerTry.

I was looking for a motherboard that had great storage capabilities, 10G connectivity and low power. As my Home Datacenter (#HomeDC) is growing, I find myself using more and more 10G SFP+ connectivity. 10G SFP+ connectivity consumes fewer watts in the chipset, creating less heat inside the servers, and it allows me to use cheaper network switches. 10G Ethernet over RJ-45 carries a price premium, even if Category 6A cables are cheaper than passive copper SFP+ cables.

I selected the Supermicro X10SDV-4C-7TP4F motherboard: it has a 7-year product life, supports two SFP+ 10G connections, and comes with an LSI/Avago 2116 SAS/SATA chipset providing a total of 16 SAS ports, more than enough for a storage server. It also has an M.2 socket and an mSATA socket. The Intel Xeon D-1518 is a quad-core processor running at 2.2GHz. All in all, a very good set of specifications on such a small Flex-ATX motherboard.

X10SDV-7TP4F_spec

The X10SDV series of motherboards comes with the Intel X552 dual 10G network controller. In case you experience network connectivity issues, it is important to make sure your motherboard has the proper firmware. When I received my motherboard with the default BIOS 1.0, it gave me a serious scare: I was unable to get the two 10G links up with my Cisco SG500X and SG500XG switches. I had to upgrade to version 1.0a and clear the CMOS to get it to work.

I’ve been a long-time user of Fractal Design cases, and I wanted something small for the Flex-ATX board, yet with lots of space for adding disks. So I selected the Fractal Design Node 804 cube chassis, which supports Micro-ATX, Mini-ITX and Flex-ATX boards like the Supermicro X10SDV series. The Node 804 is capable of holding up to 10x 3.5″ disks. The case comes with three fans and a fan selector that is powered by a SATA power connector, so the fans can run independently of the motherboard connectors. This is very useful when you add a small Noctua NF-A6x25 PWM fan on top of the CPU heatsink: it does not spin up and down at the whim of the Supermicro motherboard. I also liked the square look of the chassis.

fd-ca-node-804

For my power supply, I decided to change from my usual Enermax to a Corsair RM750i. I wanted a power supply capable of driving a lot of disks if I decided to increase their number, and one that would be quiet under low load. As you see below, there are plenty of connectors, and the power supply stays fan-less until it reaches 45% of its rated load. I added a Seagate Enterprise Capacity 4TB SAS drive to the chassis, and when it’s running vSphere with some quiet VMs, the system only consumes 69 Watts.

RMi_750_04RM750i_NOISE_WEB_121714

The Supermicro X10SDV-4C-7TP4F comes with the following expansions for storage.

PCI-Express
  • 2 PCI-E 3.0 x8 slots
M.2
  • Interface: PCI-E 3.0 x4
  • Form Factor: M Key 2242/2280/22110
  • Support SATA devices
Mini PCI-E
  • Interface: PCI-E 2.0 x1
  • Support mSATA

In the M.2 socket I added a Samsung SM951 512GB NVMe solid-state disk, and in the Mini PCI-E (mSATA) socket I added a Samsung 850 EVO Basic 1TB solid-state disk. The mSATA drive is used as the boot device and provides a large datastore to keep VMs local to the host. The Samsung SM951 512GB NVMe SSD can be used for the caching tier of a VSAN design or as RFcache when running ScaleIO.

Another upfront warning: before you place this motherboard in a chassis, make sure to unscrew and move the mSATA standoff to the right position so you can use a standard mSATA drive. There is a tiny screw at the top and bottom of the mSATA standoff.

The Supermicro X10SDV-4C-7TP4F CPU cooling is done with a passive CPU heatsink. But during the initial memory testing, I found that the IPMI CPU sensor was showing a critical heat warning during a Memtest86+ run. I decided to add a Noctua A6x25 PWM fan on top of the Xeon D-1518 heatsink. The fit is perfect, and when this fan is connected to the chassis fan subsystem (see the top-right section of the photo at the bottom), the critical heat issues disappeared.

So let’s recap the Bill-of-Materials (BoM) for this server as I have configured it. The pricing has been assembled from Amazon/Newegg for the US, Amazon/azerty.nl for the Euro pricing and Brack.ch for Switzerland. I have left out the cost of the HDDs, as Your Mileage May Vary.

X10SDV Cost

I will create a 2nd post on the build notes and pictures, but here is a teaser.

Node804_X10SDV

 

Enabling vRealize Log Insight agent on vRealize Automation 7 appliance

While deploying a vRealize Automation (vRA) appliance at a customer yesterday, I wanted to set up monitoring of vRealize Automation in vRealize Log Insight (vRLI) with the new vRA 7 Content Pack. This Content Pack is available for download from the Marketplace, directly on each vRLI server.

vra_cp

When you install the vRA 7 Content Pack, you can find the setup instructions in the Tools tab. There you will realize that you need to add two more Content Packs to be able to properly monitor a vRealize Automation installation.

vra7_cp_setup_instructions

Let’s add the Apache – CLF and vRealize Orchestrator Content Packs from the vRLI Marketplace.

apache_clf_cp vro_cp

 

 

 

The vRealize Automation 7.0.1 Build 3622989 release now comes with a vRealize Log Insight Agent pre-installed. This agent is already running on the appliance when it is deployed, but it is not configured. You just need to connect using SSH to the vRA appliance and edit the /etc/liagent.ini configuration file.

edit the /etc/liagent.ini file


The three fields to edit are the vRLI hostname, the protocol (cfapi) and the port. Once this is done, you just need to restart the Log Insight Agent using service liagentd restart.

service liagentd restart


At this point the Log Insight Agent starts talking to your vRealize Log Insight host or cluster Virtual IP address.
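For reference, the relevant [server] section of /etc/liagent.ini ends up looking roughly like this. The hostname is a placeholder for my vRLI cluster VIP and 9543 is the default cfapi port, so double-check the values against your own install:

; /etc/liagent.ini – point the agent at the vRLI host or cluster VIP (example values)
[server]
hostname=vrli.lab.local
proto=cfapi
port=9543
ssl=yes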

When checking the Agents tab in the vRealize Log Insight Administration pane, I now see the vRA appliance communicating in the All Agents view.

vrli_all_agents

The Log Insight Agent is now communicating, but it does not yet have a configuration telling it what to send back. When an agent polls the Log Insight server, it queries what it needs to monitor. This is where we start to have fun and see the real power of vRealize Log Insight agent configuration.

We are going to use the appropriate vRA agent configuration template and make it an agent configuration that can be applied to our implementation.

Select the appropriate vRA agent group and Copy Template


From the Agent Drop-Down menu, scroll to the vRealize Automation 7 – Linux template and select on the right the Copy Template button. We will keep the configuration name as vRealize Automation 7 – Linux.

vrli_agent_name

The next step is to select which Log Insight Agents this configuration applies to. In the next screenshot I have chosen to target this agent configuration using the IP address of my vRA appliance: I re-type the IP address of the vRA appliance in the filter and use Save New Group at the bottom.

vrli_agent_add_by_ip_address

Once saved, you can refresh the view and you will see the configuration that is sent out to the qualifying agents on their next regular poll of the vRLI host/cluster.

Here is another view, of the vRealize Automation 7 – Windows group, where you can see that I have applied this configuration only to the Windows servers of a vRA Enterprise deployment.

vrli_agent_group

You can apply multiple Log Insight Agent configurations to the same server. Here is another vRA example, monitoring the IIS components of the vRA IaaS elements.

vrli_agent_group_iis

Example of monitoring Microsoft – Active Directory 2012 servers

vrli_agent_group_ad

You can see now how easy and fast it is to monitor servers with specific Log Insight Agent configurations.

Intel NUC Skull Canyon (NUC6I7KYK) and ESXi 6.0

As part of my ongoing expansion of the HomeDC, I was excited to learn about the availability of the latest quad-core Intel NUC a few months ago. Last Friday I received my first Intel NUC Skylake NUC6I7KYK, and I only started setting it up this afternoon. I usually disable a few settings in the BIOS, but following warnings from fellow bloggers that people had issues getting the Intel NUC running with ESXi [virtuallyghetto.com], I took a deeper look prior to the install. I was able to install ESXi 6.0 Update 2 (Build 3620759) on my 4th try, after disabling more settings in the BIOS.

Here is the screenshot of the ESXi Host Client of the Intel NUC6I7KYK with BIOS 0034.

nuc6i7kyk_ehc

Here is a quick screenshot of the physical machine. I was planning to use the SDXC slot with a 32GB SDXC card to store the ESXi boot configuration, but unfortunately the SDXC card did not show up as a valid target during the ESXi install process, so I kept the USB key I booted from and selected it as the target. In the screenshot below you will also notice an extra network card, the StarTech USB 3.0 Gigabit Ethernet network adapter, whose driver you can get from VirtuallyGhetto’s page Functional USB 3.0 Ethernet Adapter (NIC) driver for ESXi 5.5 & 6.0. Thanks William for this driver.

nuc6i7kyk_startech

The Bill-of-Materials (BOM) of my assembly…

Here below you can see the Intel NUC with the two Samsung SM951 NVMe disks and the Crucial memory.

nuc6i7kyk_open

To get ESXi 6.0 Update 2 to install, I disabled the following BIOS settings. But as people have commented back after more testing, you really only need to disable the Thunderbolt Controller to get ESXi to install.

BIOS\Devices\USB

  • disabled – USB Legacy (Default: On)
  • disabled – Portable Device Charging Mode (Default: Charging Only)
  • unchanged – USB Ports (Ports 01-08 enabled)

BIOS\Devices\SATA

  • disabled – Chipset SATA (Default AHCI & SMART Enabled)
  • M.2 Slot 1 NVMe SSD: Samsung MZVPV256HDGL-00000
  • M.2 Slot 2 NVMe SSD: Samsung MZVPV512HDGL-00000
  • disabled – HDD Activity LED (Default: On)
  • disabled – M.2 PCIe SSD LED (Default: On)

BIOS\Devices\Video

  • IGD Minimum Memory – 64MB (Default)
  • IGD Aperture Size – 256MB (Default)
  • IGD Primary Video Port – Auto (Default)

BIOS\Devices\Onboard Devices

  • disabled – Audio (Default: On)
  • LAN (Default)
  • disabled – Thunderbolt Controller (Default: On)
  • disabled – WLAN (Default: On)
  • disabled – Bluetooth (Default: On)
  • Near Field Communication – Disabled (Default is Disabled)

BIOS\Devices\Onboard Devices\Legacy Device Configuration

  • disabled – Enhanced Consumer IR (Default: On)
  • disabled – High Precision Event Timers (Default: On)
  • disabled – Num Lock (Default: On)

BIOS\PCI

  • M.2 Slot 1 – Enabled
  • M.2 Slot 2 – Enabled
  • M.2 Slot 1 NVMe SSD: Samsung MZVPV256HDGL-00000
  • M.2 Slot 2 NVMe SSD: Samsung MZVPV512HDGL-00000

Cooling

  • CPU Fan Header
  • Fan Control Mode : Cool

Performance\Processor

  • disabled Real-Time Performance Tuning (Default: On)

Power

  • Select Max Performance Enabled (Default: Balanced Enabled)

Secondary Power Settings

  • disabled – Intel Ready Mode Technology (Default: On)
  • disabled – Power Sense (Default: On)
  • After Power Failure: Power On (Default was stay off)

Sample view of the BIOS Onboard Devices as I deactivate some Legacy Device Configuration.

nuc6i7kyk_bios_onboard

 

26/05 Update: Only the Thunderbolt Controller stops the ESXi 6.0 Update 2 installer from running properly. Re-activating it after the install did not cause any issue in my limited testing.

Using a virtual Synology in a scale-out distributed storage architecture

I’ve recently finished upgrading the Home Datacenter (#HomeDC) to vSphere 6.0 with four hosts running VSAN 6.0 with dual 10GbE networking for each host.

vsan

Even running a few large virtual machines on the VSAN Datastore like VDP 6.0 with a 4TB backed disk, I found myself with a lot of spare storage. I’ve invested in the SAS disks (Seagate Enterprise Capacity 4TB SAS 7200rpm) backing the VSAN datastore, so the budget is gone for replacing the aging Synology DS1010+.

I’ve recently studied various reviews of the Synology DS2015xs, but found its CPU a bit lacking to drive the dual 10GbE SFP+ links, and the Synology DS3615xs is a bit expensive. So why not leverage the 10GbE NICs in my management cluster for ultra-fast connections? The fast CPUs on my hosts are a nice addition too. The biggest advantage is “cheap” 10GbE file server connectivity.

The rest of this blog goes into a grey zone… it’s #unsupported.

Let me show you the goods first.

virtual Synology DS3615xs running on VSAN datastore

The concept is to create a storage appliance that leverages the VSAN datastore and its read/write acceleration, and that provides a flexible structure where you can increase the storage on an as-needed basis, or create temporary storage while migrating from one Synology to a newer one, all of it running on a vSphere host. It’s a concept that a lot of other companies implement with their Virtual Storage Appliances.

I’m going to use the XPEnology operating system, which is based on the Synology DiskStation Manager (DSM).

  • In the design and implementation I describe here, the virtual Synology has an 8TB disk. The appliance does not do any RAID on this disk, as it is already protected on the VSAN datastore by a Number of Failures to Tolerate of 1 policy (FTT=1).
  • Another way would be to create two or four virtual disks with a number of failures to tolerate of 0, and do a Software RAID in the appliance.
  • A third way could be to use four physical disks and two SSDs on a host, create RDM links, present all these disks to the virtual Synology appliance, do software RAID on the disks, and use the SSDs for caching (SSD cache). This virtual storage appliance would not be able to move to another host using vMotion, but you could mitigate this restriction using Synology High-Availability.

To build the virtual Synology you will need to retrieve the latest copy of the XPEnology DS3615xs files. You are looking for XPEnoboot_DS3615xs_5.1-5022.3.vmdk or a more recent version. Each version can have its own deployment process; the process I describe below uses the XPEnoboot_DS3615xs_5.1-5022.3.vmdk version.

There is also a huge forum with lots of contributions and interesting links at the XPEnology forums.

1) Creating the vSynology

I’m going to say upfront that you will need to upload the XPEnoboot_DS3615xs_5.1-5022.3.vmdk twice to the virtual storage appliance: once for the initial install, which will format all disks of the appliance (including the boot vmdk), and then again to boot the appliance.

We start by creating a new Virtual Machine.

01 - Create new VM

We give it a name and place it in a Cluster.

02 - Name VM

And we store the virtual machine and its configuration files on an existing datastore. I have selected my vsanDatastore.

04 - Select VSAN Datastore

We define the hardware compatibility of the virtual machine and select the guest OS. We are going to use Linux, Other 3.x Linux (64-bit).

06 - Select Guess OS Linux 3.2

I have selected two vCPUs and 8GB of memory. Because my appliance won’t do any software RAID, 2 vCPUs are more than enough.

07 - Base Hardware

I have added a second VMXNET3 network interface, which I put on a dedicated 10GbE Distributed Port Group. So eth0 goes out using uplink1 and eth1 goes out using uplink2. You see these changes in the summary of the appliance below.

08 - ds3615xs Hardware Summary

2) Changing the Boot disk

We can now go back to the appliance and edit it. We remove the boot disk and delete it from the datastore. (Yeah, the screenshot for this step is missing.)

We then use the datastore browser to upload the XPEnoboot_DS3615xs_5.1-5022.3.vmdk into the appliance folder for the first time.

09 - Upload XPE vmdk on vsanDatastore

And we add this existing virtual disk to the appliance

10a - Select the XPE vmdk

The new boot disk is attached as an IDE disk on port IDE(0:0)

10b - Add XPE vmdk as IDE0-0

In the following screenshot, I’m adding the main disk to the storage appliance. I’m creating an 8TB (8192GB) virtual disk and selecting my VM storage policy “VSAN High Perf”. The “VSAN High Perf” policy is defined with a Number of Failures to Tolerate of 1 and a Number of Disk Stripes per Object of 2.

11 - XPE non-persistent and 8TB
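If you prefer to create such a policy from PowerCLI instead of the Web Client, a sketch along these lines should produce the same “VSAN High Perf” policy. The vCenter name is a placeholder and the capability names are the standard VSAN ones exposed through SPBM:

# Recreate the "VSAN High Perf" VM storage policy: FTT=1, stripe width 2 (PowerCLI SPBM sketch)
Connect-VIServer vcenter.lab.local
$rules = New-SpbmRuleSet -AllOfRules @(
    (New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.hostFailuresToTolerate") -Value 1),
    (New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.stripeWidth") -Value 2)
)
New-SpbmStoragePolicy -Name "VSAN High Perf" -AnyOfRuleSets $rules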

Now you can start the appliance. Look closely at the IP addresses of the appliance and the MAC addresses; you will want to configure the IP addresses on the proper NIC later.

12a Start VM and check eth0 eth1

Using the Synology Assistant you can now see your appliance appear on the network.

12b - Use Synology Assistant to find new DS3615xs

Use your browser and aim it to the IP address shown in the Synology Assistant to do the initial install.

12c - Open the Web Assistant

We are installing the DSM using the Manual install.

12d - Install DiskStation Manager

Here you upload the DSM 5.1-5022 pattern file that you retrieved from the Synology download center in the DS3615xs selection.

12e - Select Manual install and select DS3615xs 5022 pat

It will now prompt you that it will erase all partitions on the attached disks of the appliance. This includes the XPEnoboot disk of the appliance.

12f - Format disks with 5022.3 PAT

Accordingly, the expected behavior now is that the boot disk is wiped and won’t boot.

13 - Both disk formatted.

Stop the appliance and, using the datastore browser, erase the XPEnoboot disk. Then upload the XPEnoboot_DS3615xs_5.1-5022.3.vmdk into the folder again, for the second time.

14 - Erase XPEnoboot vmdk and replace with original one

3) Configuration using Synology Assistant

You can now restart the appliance. You will notice that the second time the appliance boots, some of the messages, like the IP address, are not there anymore. And using the Synology Assistant, you see that the DHCP configuration isn’t applied: the IP addresses are now 169.254.x.y.

Select the proper network interface in the Synology Assistant using the MAC address, and select Setup. If you don’t select the proper MAC address, you might need to swap IP addresses later, so save yourself some time and select the eth0 one.

15 - Reboot DS3615xs and use Synology Assistant

The Synology assistant wizard will now start.

16 - Synology Assistant

The admin password at this point is blank; don’t enter any value. You can change the password later.

17 - Synology Assitant - Blank password

Enter the appliance Network settings.

18 - Synology Assitant - Final Network settings for eth0

Refreshing the Synology Assistant shows that you have the proper IP address now.

19 - Now ready for Web configuration

Time to connect to your newly deployed appliance.

20 - Configuration

You are now only a few steps away from using your storage appliance.

21 - Web Config

It is now time to change your admin account password.

22 - Server name

We can now update the DSM 5.1-5022 version to the latest 5.1-5022-5 version. Depending on the CPU of your host, you will never have seen a Synology reboot so fast.

23 Patch DSM

If you intend to use this virtual Synology appliance to store data, I recommend you do some conditioning tests first, to see how it reacts in your environment.

I like the flexibility of the virtual Synology appliance:

  • Adding a temporary repository for a data migration becomes easy if you have a lot of underlying VSAN datastore space.
  • Want to try out Synology High-Availability? Add a second appliance and create the High-Availability cluster.
  • Want to test a Synology with a 10GbE interface? Easy, if your ESXi host has a 10G interface. (*)

In the coming weeks, I’m looking forward to deploying other storage appliances on my VSAN datastore that can scale out in this distributed storage architecture.

(*) I have found that while having the virtual Synology appliance with 10GbE on the backbone is awesome, I ran into bandwidth limits when trying to upload data: my sources were connected to the core switch over 1GbE links, or the virtual machines used as sources for testing had their disks stored on 1GbE NFS/iSCSI LUNs. To test the virtual Synology I copied large files from various sources. I had three sources pushing out 100-120MB/s, 60-70MB/s and 80-90MB/s of large sequential files to get the second screenshot at the top and see the virtual Synology write stats at 220MB/s.