Thursday, December 2, 2021

HAProxy Problem Solved

Okay, an update on my latest post on HAProxy. It turns out that the issue had nothing to do with HAProxy at all, but that the clustering on the back-ends was not working properly.

So, I have reverted the HAProxy back to its original configuration, prior to getting into debug mode. 

Note: The Stats page in HAProxy, by the way, is an invaluable way to help see if your proxy is working correctly. You can see the load distribution, you can see the back-end health checks, etc. Plus the GUI is simple, informative and intuitive.

Below is a quick look at the front and back end configurations. In this configuration, we are using HAProxy as a Reverse Proxy, where we terminate incoming requests (SSL Termination), and rather than forward to our back-ends in the clear, we utilize back-end certificates to proxy to the back-ends using SSL. This is a bit unconventional perhaps in a typical reverse proxy scenario.

#---------------------------------------------------------------------
# ssl frontend
#---------------------------------------------------------------------
frontend sslfront
   mode tcp
   option tcplog
   bind *:443 ssl crt /etc/haproxy/cert/yourcert.pem
   default_backend sslback

#---------------------------------------------------------------------
# sticky load balancing to back-end nodes based on source ip.
#---------------------------------------------------------------------
backend sslback
   # redirects http requests to https which makes site https-only.
   #redirect scheme https if !{ ssl_fc }
   mode http
   balance roundrobin
   option ssl-hello-chk

   option httpchk GET /ping
   http-check expect string TESTPING

   stick-table type ip size 30k expire 30m
   stick on src
   #stick-table type binary len 32 size 30k expire 30m
   #stick on ssl_fc_session_id
   default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions

   server servername01 192.168.20.10:443 ssl check id 1 maxconn 1024
   server servername02 192.168.20.11:443 ssl check id 2 maxconn 1024
   server servername03 192.168.20.12:443 ssl check id 3 maxconn 1024

Monday, October 18, 2021

HAProxy - Aggravating Problem I Have Not Solved

I have not ever really blogged on proxies. I don't have a lot of proxy experience and don't consider myself a guru with proxies, load balancers, etc.

But more and more often, solutions have come in that require load distribution to an N+1 (Active Active) cluster. And, HAProxy is supposed to be a rather lightweight and simple approach, especially in situations where the mission is not totally critical, or the load is not seriously high.

I originally set HAProxy up to distribute load to a Cloudify cluster. And Cloudify provided the configuration for HAProxy that they had tested in their lab, and that they knew worked well. Later, I set HAProxy up to load balance our Morpheus cluster. Initially it was working fine. 

Or, so it seemed. Later, I noticed errors. The first thing you generally do when you see errors, is to tell HAProxy to use one node (and not 2 or 3), so that you can reduce troubleshooting complexity and examine the logs on just one back-end node.  So in doing this, I managed to rather quickly figure out that if I told HAProxy to use one back-end node, things worked fine. When I told HAProxy to use two or more back-end nodes, things didn't work.  

So that's where it all started.

The Problem
Below is a picture of what we are doing with HAProxy, and based on the picture below, web access comes in on the northbound side of the picture, and web access is not the problem we are having.  The problem, is that VMs that are deployed onto various internal networks by Morpheus "phone home" and they phone home on a different network interface. 

This works fine with a single back-end enabled. But if you enable more than one back-end in HAProxy, Morpheus fails to fully transition the state of the VM to "running".


HAProxy Flow

In testing this out a bit and dumping traffic, we initially noticed something interesting. The Source IP coming into each Morpheus node, was not the HAProxy VIP - it was the interface IP address. We wound up solving this, by telling KeepAliveD to delete and re-create the routes with the VIP to be used as the Source IP - but only when it had control of the VIP. But in the end, while this made traffic analysis (tcpdump on the Morpheus nodes) a bit more clear about the traffic flow, it did not solve the actual issue.

I STILL don't know why it works with one back-end, and not two or more. I have had Proxy experts in our organization come in and look, and they seem to think HAProxy is doing its job properly, and that the issue is on the back-end clustering. The vendor, however, is telling us the issue is with HAProxy.

Our next step may be to configure a different load balancer. That should definitely rule things out. I know Squid Proxy is quite popular, but these tools do have a Learning Curve, and I have zero zilch experience with Squid Proxy. I think we may use a Netscaler Load Balancer if we wind up going with another one.

I should mention that the HAProxy configuration is not the simplest. And as a result of configuring this, I have increased my general knowledge on Load Balancing.


Monday, October 4, 2021

The first Accelerated VNF on our NFV platform

 I haven't posted anything since April but that isn't because I haven't been busy.

We have our new NFV Platform up and running, and it is NOT on OpenStack. It is NOT on VMWare VIO. It also, is NOT on VMWare Telco Cloud!

We are using ESXi, vCenter, NSX-T for the SD-WAN, and Morpheus as a Cloud Management solution. Morpheus has a lot of different integrations, and a great user interface that gives tenants a place to log in and call home and self-manage their resources.

The diagram below depicts what this looks like from a Reference Architecture perspective.

The OSS, which is not covered in the diagram, is a combination of Zabbix and VROPS, both working in tandem to ensure that the clustered hosts and management functions are behaving properly.

The platform is optimized with E-NVDS, which is also referred to commonly as Enhanced Datapath which requires special DPDK drivers to be loaded on the ESXi hosts, for starters, as well as some configuration in the hypervisors. There are also settings to be made in the hypervisors to ensure that the E-NVDS is configured properly (separate upcoming post).

Now that the platform is up and running, it is time to start discussing workload types. There are a number of Workload Categories that I tend to use:

  1. Enterprise Workloads - Enterprise Applications, 3-Tier Architectures, etc.
  2. Telecommunications Workloads
    • Control Plane Workloads
    • Data Plane Workloads

Control Plane workloads are have more tolerances for latency and system resources than Data Plane Workloads do. 

Why? Because Control Plane workloads are typically TCP-based,  frequently use APIs (RESTful),  and tend to be more periodic in their behavior (periodic updates).  Most of the time, when you see issues related to Control Plane, it is related to back-hauling a lot of measurements and statistics (Telemetry Data). But generally speaking, this data in of itself does not have stringent requirements.

From a VM perspective, there are a few key things you need to do to ensure your VNF behaves as a true VNF and not as a standard workload VM. These include:

  • setting Latency Sensitivity to High, which turns off interrupts and ensures that poll mode drivers are used.
  • Enable Huge Pages on the VM by going into VM Advanced Settings and adding the parameter: sched.mem.lpage.enable1GHugePage = TRUE

Note: Another setting worth checking, although we did not actually set this parameter ourselves, is: sched.mem.pin = TRUE

Note: Another setting, sched.mem.maxmemctl ensures that ballooing is turned off. We do NOT have this setting, but it was mentioned to us, and we are researching this setting.

One issue we seemed to continually run into, was a vCenter alert called Virtual Machine Memory Usage, displaying in vCenter as a red banner with "Acknowledge and Reset to Green" links. The VM was in fact running, but vCenter seemed to have issues with it. The latest change we made that seems to have fixed this error, was to check the "Reserve all guest memory (All locked)" option checkbox.

This checkbox to Reserve all guest memory seemed intimidating at first, because the concern was that the VM could reserve all memory on the host. That is NOT what this setting does!!! What it does, is allow the VM to reserve all of its memory up-front - but just the VM memory that is specified (i.e. 24G). If the VM has has HugePages enabled, it makes sense that one would want the entire allotment of VM memory to  memory to be reserved up front and be contiguous. When we enabled this, our vCenter alerts disappeared.

Lastly, we decided to change DRS to Manual in VM Overrides. To find this setting amongst the huge number of settings hidden in vCenter, you go to the Cluster (not the Host, not the VM, not the Datacenter) and the option for VM Overrides is there, and you have four options:

  • None
  • Manual
  • Partial
  • Full

The thinking here, is that VMs with complex settings may not play well with vMotion. I will be doing more research on DRS for VNFs before considering setting this (back) to Partial or Full.

Monday, April 26, 2021

Tenancy is Critical on a Cloud Platform

With this new VMWare platform, it was ultimately decided to go with ESXi hypervisors, managed by vCenter, and NSX-T.  

During the POC, it was pointed out that this combination of solutions had some improvements and enhancements over OpenStack (DRS, vMotion, et al). But one thing seemed to be overlooked, and we pointed it out: Tenancy

VMWare attempts to address Tenancy with Vertical Stack point solutions, like vCloud Director (positioned at Service Providers), or vRealize Automation. The latter, is going through a complete transformation in its latest version.  These solutions are also expensive. And, if you don't have the budget, what are your options??

One option is to set up Resource Pools and Folders in vCenter. Not the cleanest solution because you cannot set policies, workflows, etc.

What else can you do? Well, you can use a Cloud Management solution.

We had Cloudify as an Orchestrator. And we evaluated that as a Cloud Management solution. But what we found in the end, was that Cloudify excelled at complex orchestration, but it was not designed and built, ground-up, to be a Cloud Management Platform.

It seemed that this (lack of) Tenancy seemed to become apparent to everyone all at once - once the platform came up on VMWare.  And, with Cloudify we lacked the Blueprint development to do the scores to hundreds of tasks that we needed to have. It needed integrations with NSX-T, vCenter, and a host of other solutions.

We looked at a couple of other solutions, and settled on a solution called Morpheus.

I will blog a bit more about Morpheus in upcoming posts. I have been very hands-on with it lately. 

Migrating from OpenStack to VMWare

Been a while since I have posted anything. The latest news, is that my employer has decided to replace OpenStack with VMWare. We had to do a side-by-side comparison POC (Proof of Concept), between VMWare with Contrail as the network SDN, and VMWare with NSX-T as the network SDN.

In the end, they chose NSX-T, and the architecture looked like:

  • Cloudify - Cloud Management  and Orchestration
  • VMWare vCenter Cloud (Private Cloud)
  • NSX-T - (I will probably blog more on NSX-T later)

So, I have had to go through a series of Bootcamp trainings on the following.

  • NSX-T
  • VeloCloud
 I will make some brief posts about these, as well as some other things, going forward.

 

Wednesday, February 3, 2021

DPDK Hands-On Part X - Launching a DPDK virtual machine with Virsh Libvirt

In DPDK Hands-On Part IX, we launched a virtual machine with a bash script, calling qemu with sufficient command line options that it would launch a virtual machine with DPDK ports.

To launch a virtual machine that has DPDK ports using virsh (LibVirt), the first thing you need to know, is that you can not use the virt-manager GUI. Why? The GUI does not understand vhostuser (DPDK) ports - at least not in the version I happen to be running. But aside of this, the process remains the same. Your virtual machine needs to be defined by xml, which is parsed by virsh when the VM is launched. In order to launch a DPDK virtual machine, the xml will need to be crafted properly, and this post will discuss the important sections of that xml.

I made several attempts at finding an xml file that would work properly. It took quite a while. There are a few examples on OpenVSwitch websites, as well as Intel-sponsored DPDK websites.  In the end, I found a link from an R&D engineer named Tomek Osinski that helped me more than any other, and I will share that here:

Configuring OVS-DPDK with VM for performance testing 

In this link, is a full-fledged xml (file).

Let me pick through the relevant sections of this file, and comment on the important sections.

Memory

<currentMemory unit='KiB'>8399608</currentMemory>
<memoryBacking>
  <hugepages>
    <page size='1' unit='G' nodeset='0'/>
  </hugepages>
</memoryBacking>

This section specifies that the VM will use in excess of 8G of RAM, but tells virsh (libvirt/KVM) to place the VM's memory on HugePages.

If you try to launch this VM, and the VM Host (server running KVM) does not have Hugepages enabled, or have enough Hugepages available, the launch of the VM will fail.

CPU

<vcpu placement='static'>8</vcpu> 
<cpu mode='host-model'>
  <model fallback='allow'/>
  <topology sockets='2' cores='4' threads='1'/>
  <numa>
    <cell id='0' cpus='0-1' memory='4194304' unit='KiB' memAccess='shared'/>
  </numa>
</cpu> 
<cputune>
  <shares>4096</shares>
  <vcpupin vcpu='0' cpuset='14'/>
  <vcpupin vcpu='1' cpuset='15'/>
  <emulatorpin cpuset='11,13'/>
</cputune>

In this section, 8 virtual CPU (2 sockets x 4 cores each) are specified by the virtual machine. A NUMA cell is specified with virtual CPUs 0 and 1.

Of these 8, two virtual CPUs are pinned to the last two cores of the host (assuming a 16 core host that has cores 0-15).

The CPU pinning is optional. I elected to avoid CPU pinning on my own VM. One reason I avoided it is because I am already pinning my OpenVSwitch to specific cores using the pmd mask. But I am still covering this here because it is worth pointing out.

Network Interface (vhostuser)

    <interface type='vhostuser'>
      <mac address='00:00:00:00:00:01'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/dpdkvhostuser0' mode='client'/>
       <model type='virtio'/>
      <driver queues='2'>
        <host mrg_rxbuf='off'/>
      </driver>
    </interface>
 
    <interface type='vhostuser'>
      <mac address='00:00:00:00:00:02'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/dpdkvhostuser1' mode='client'/>
      <model type='virtio'/>
      <driver queues='2'>
        <host mrg_rxbuf='off'/>
      </driver>
    </interface>

In this section, the user has defined a network interface of type vhostuser.

As you may recall from my earlier blogs on the topic, a vhostuser port means that the VM acts as a client and OpenVSwitch acts as a server (which is not recommended by OpenVSwitch these days - they prefer vhostuserclient ports be used instead so that a reboot of the switch does not strands multiple VMs connected to it).

Tip: The Virt-Manager KVM GUI does not allow you to pick this kind of network interface from the menu when you set up an adaptor. So you need to put this in your xml, and if you do so, it indeed will show up in the GUI as a "vhost" interface.

Each mac address is set distinctively (loading two interfaces with same mac address is obviously going to cause trouble). Each NIC is specified to use multi-queuing (2 Rx/Tx queues per NIC). And, the specific socket to connect to on the OpenVSwitch is specified.

The super important thing to know about these interface directives, is that, prior to launching the VM, the following needs to be in place as a pre-requisite.

  • OpenVSwitch needs to be running
  • The DPDK vhostuser ports need to be created on the appropriate bridges on OpenVSwitch.
  • The datapath on the bridges of the OpenVSwitch that host the vhostuser ports need to be set to netdev (DPDK) rather than the default of system.

Once your xml is crafted properly, boot the VM with "virsh start [ vm name ]", and see if the VM boots with network connectivity. 

Tip: When your VM boots, KVM should number the interfaces as eth0, eth1, eth2 according to the order they are in the xml file. But I have found that it doesn't always do this, so it is a good idea to map the xml file mac addresses to those which show up in the VM so that you can ensure you are putting the right IP addresses on the right interfaces!!! Doing this will save you a TON OF DEBUGGING!

SLAs using Zabbix in a VMware Environment

 Zabbix 7 introduced some better support for SLAs. It also had better support for VMware. VMware, of course now owned by BroadSoft, has prio...