Monday, March 30, 2020

How to run OpenStack on a single server - using a veth pair



I decided I wanted to implement OpenStack using OpenVSwitch. On one server.

The way I decided to do this was to spin up a KVM virtual machine (VM) as an OpenStack controller, and have it communicate with the bare-metal CentOS 7 Linux host (which runs the KVM hypervisor, libvirt/qemu).

I did not appreciate how difficult this would be until I realized that OpenVSwitch cannot leverage existing Linux bridges (bridges on the host).

OpenVSwitch allows you to create, delete and otherwise manipulate bridges - but ONLY bridges that are under the control of OpenVSwitch itself. So, if you happen to have a bridge on the Linux host (we will call it br0), you cannot snap that bridge into OpenVSwitch.

What you would normally do is create a new bridge in OpenVSwitch (i.e. br-ex) and migrate your connections from br0 to br-ex.
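For the record, that migration is usually just a couple of ovs-vsctl commands - roughly the sketch below, where eth0 as the physical uplink is an assumption on my part:

# Create the OpenVSwitch bridge and move the physical uplink onto it
ovs-vsctl add-br br-ex
ovs-vsctl add-port br-ex eth0

The catch is that any IP address that lived on eth0 (or on br0) has to move to br-ex as well, or you lose connectivity to the host.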

That's all well and good - and straightforward, most of the time. But if you want to run a virtual machine (i.e. an OpenStack Controller VM) and have that virtual machine communicate with OpenStack Compute processes running on the bare-metal host, abandoning the host bridges becomes a problem.

Virt-Manager does NOT know anything about OpenVSwitch, nor about the bridges that OpenVSwitch controls. So when you create your VM, if everything is under an OpenVSwitch bridge (i.e. br-ex), Virt-Manager will only offer you a series of macvtap interfaces (macvtap, and for that matter macvlan, are topics in and of themselves that we won't get into here).

So. I did not want to try to use macvtap interfaces and jump through hoops to get them to communicate with the underlying host (yes, there are presumably some tricks with macvlan that can do this, but the rabbit hole was getting deeper).

As it turns out, "you can have your cake, and eat it too". You can create a Linux bridge (br0) and plumb it into OpenVSwitch with a veth pair, which exists for exactly this purpose. A veth pair is essentially a virtual patch cable between two bridges - and you need one because you cannot join bridges directly (joining bridges is called cascading bridges, and this is not allowed in Linux networking).

So here is what we wound up doing.
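In broad strokes, the plumbing looks something like the sketch below. The veth interface names (veth-br0 and veth-ovs) are illustrative choices on my part; br0 is the existing Linux bridge and br-ex is the OpenVSwitch bridge.

# Create the veth pair (one virtual "patch cable" with two ends)
ip link add veth-br0 type veth peer name veth-ovs

# Attach one end to the Linux bridge br0
ip link set veth-br0 master br0

# Attach the other end to the OpenVSwitch bridge br-ex
ovs-vsctl add-port br-ex veth-ovs

# Bring both ends up
ip link set veth-br0 up
ip link set veth-ovs up

With that in place, the VM can sit on br0 (which Virt-Manager understands), while everything on br-ex - including the OpenStack Compute traffic on the host - can reach it through the veth pair.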


Monday, March 9, 2020

CPU Isolation - and how dangerous it can be

I noticed that in an implementation of OpenStack, they had CPU Pinning configured. I wasn't sure why, so I asked, and I was told that it allowed an application (composed of several VMs on several Compute hosts in an Availability Zone) to achieve bare-metal performance.

I didn't think too much about it.

THEN - when I finally DID start to look into it - I realized that the feature was not turned on.

CPU Pinning, as they were configuring it, consisted of 3 parameters:
  1. isolcpus - a Linux kernel boot parameter, passed to the kernel on the grub command line 
  2. vcpu_pin_set - defined in nova.conf, an OpenStack configuration file 
  3. reserved_host_cpus - defined in nova.conf, an OpenStack configuration file 
These settings have tremendous impact. For instance, they can affect how many CPUs OpenStack sees on the host. 

isolcpus takes a comma-delimited list of CPUs. vcpu_pin_set is also a list of CPUs, and it allows OpenStack Nova to place VMs (qemu processes), via the libvirt APIs, on all or a subset of the full bank of isolated CPUs. 

So, for example, you might isolate 44 CPUs on a 48-CPU system (24 cores x 2 threads per core). Then you might specify 24 of those 44 to be pinned by Nova/libvirt - and perhaps the remaining 20 are used for non-OpenStack userland processes (i.e. OpenContrail vRouter processes that broker packets in and out of the virtual machines and the compute hosts).
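To make that concrete, the three settings for that 48-CPU example would look roughly like the sketch below. The specific CPU lists and the reserved_host_cpus value are illustrative assumptions, not values taken from the actual system.

# /etc/default/grub - isolate 44 of the 48 CPUs (here 2, 4, 26 and 28 are left to the host OS)
GRUB_CMDLINE_LINUX="... isolcpus=0-1,3,5-25,27,29-47"

# /etc/nova/nova.conf - hand 24 of the isolated CPUs to Nova for pinning,
# leaving 20 isolated CPUs (27,29-47) for things like the vrouter
[DEFAULT]
vcpu_pin_set = 0-1,3,5-25
reserved_host_cpus = 4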

So. In a lab environment, with isolcpus isolating 44 CPUs, and those same 44 CPUs listed in the vcpu_pin_set array, a customer emailed and complained about sluggish performance. I logged in, started up htop, added the PROCESSOR column, and noticed that everything was running on a single CPU core.

Ironically enough, I had just read this interesting article that helped me realize very quickly what was happening.


Obviously, running every single userland process on a single processor core is a killer.

So why was everything running on one core?

It turned out that, when launching the images, there is a property that needed to be attached to the flavors, called hw:cpu_policy=dedicated. 

When specified on the flavor, this property causes Nova to pass that information to libvirt, which then knows to pin the virtual machine to specific isolated CPUs.

When it is NOT specified, it appears that libvirt just shoves the task onto the first available CPU on the system - CPU 0. CPU 0 was indeed an isolated CPU, because the CPUs left out of the isolcpus and vcpu_pin_set lists were 2, 4, 26 and 28. 

So the qemu virtual machine process wound up on an isolated CPU (as it should have). But since there is no load balancing across isolated CPUs, the tasks all piled up on CPU 0. 

Apparently, the flavor property hw:cpu_policy=dedicated is CRITICAL in telling libvirt to map an instance's vCPUs onto the CPUs in the pinning array.
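For reference, that property is set on the flavor as an extra spec, along these lines (the flavor name here is made up):

# Tag a flavor so Nova pins its instances to dedicated CPUs
openstack flavor set --property hw:cpu_policy=dedicated m1.large.pinned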

Changing the flavor properties was not an option in this case, so what wound up happening was removing the vcpu_pin_set array from /etc/nova/nova.conf, and removing the isolcpus parameter from the grub boot loader. This fixed the issue of images with no property landing on a single CPU. We also noticed that if a flavor STILL used the flavor property hw:cpu_policy=dedicated, a CPU assignment would still get generated into the libvirt XML file - and the OS would place (and manage) the task on that CPU.
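That CPU assignment shows up in the instance's libvirt domain XML as a cputune block - something like the snippet below, where the vCPU-to-host-CPU mapping is purely illustrative:

<vcpu placement='static'>2</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='5'/>
  <vcpupin vcpu='1' cpuset='29'/>
</cputune>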

Thursday, March 5, 2020

Mounting a Linux Volume over a File System - the bind mount trick

Logged into a VM today trying to help troubleshoot issues. There was nothing in /var/log! No Syslog!

It turns out that something had been mounted on top of /var/log. Linux will indeed let you mount on top of pretty much any directory, because after all, any directory can serve as a mount point as far as Linux is concerned.

But what happens to the files in the original directory? I used to think they were lost. They're not. They're still there, just shielded. They can be recovered with a neat trick called a bind mount!

All described here! Learn something new every day.

A snippet of dialog from the link below:
https://unix.stackexchange.com/questions/198542/what-happens-when-you-mount-over-an-existing-folder-with-contents

Q. Right now /tmp has some temporary files in it. When I mount my hard drive (/dev/sdc1) on top of /tmp, I can see the files on the hard drive. What happens to the actual content of /tmp when my hard drive is mounted? 

 A. Pretty much nothing. They're just hidden from view, not reachable via normal filesystem traversal.

Q. Is it possible to perform r/w operations on the actual content of /tmp while the hard drive is mounted?

A. Yes. Processes that had open file handles inside your "original" /tmp will continue to be able to use them. You can also make them "reappear" somewhere else by bind-mounting / elsewhere.

# mount -o bind / /somewhere/else
# ls /somewhere/else/tmp  
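Applied to the /var/log situation above, the recovery looks something like this (the /mnt/rootfs path is just an arbitrary choice):

# Bind-mount the root filesystem at a second location
mkdir -p /mnt/rootfs
mount -o bind / /mnt/rootfs

# The original, hidden log files are reachable again here
ls /mnt/rootfs/var/log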

 
