Monday, March 9, 2020

CPU Isolation - and how dangerous it can be

I noticed that an OpenStack deployment I was working with had CPU Pinning configured. I wasn't sure why, so I asked, and I was told that it allowed an application (made up of several VMs on several Compute hosts in an Availability Zone) to achieve bare-metal performance.

I didn't think too much about it.

THEN - when I finally DID start to look into it - I realized that the feature was not turned on.

CPU Pinning, as they were configuring it, was controlled by 3 parameters:
  1. isolcpus - a Linux kernel boot parameter, passed to the kernel on the GRUB command line. 
  2. vcpu_pin_set - defined in nova.conf - an OpenStack (Nova) configuration file 
  3. reserved_host_cpus - also defined in nova.conf 
These settings have a tremendous impact. For instance, they determine how many CPUs OpenStack sees (and will schedule) on the host. 
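For reference, here is a rough sketch of where each of these lives (paths and section names can vary by distro and OpenStack release, so treat this as illustrative rather than exact):

  # Kernel side: append to GRUB_CMDLINE_LINUX in /etc/default/grub,
  # then regenerate grub.cfg and reboot
  isolcpus=<cpu-list>

  # Nova side: nova.conf on each compute host
  [DEFAULT]
  vcpu_pin_set = <cpu-list>
  reserved_host_cpus = <count of CPUs held back for the host itself>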

isolcpus takes a comma-delimited list of CPUs (ranges are also allowed). vcpu_pin_set is likewise a list of CPUs; it allows OpenStack Nova, via the libvirt APIs, to place VMs (qemu processes) on all or a subset of the full bank of isolated CPUs. 

So, for example, you might isolate 44 CPUs on a system with 48 hardware threads (24 cores x 2 threads per core). Then you might specify 24 of those 44 to be pinned by Nova/libvirt - and perhaps the remaining 20 are used for non-OpenStack userland processes (e.g. OpenContrail vrouter processes that broker packets in and out of the virtual machines and the compute host).
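The post doesn't spell out the exact lists for that hypothetical split, but it might look something like this (the CPU numbering here is purely illustrative):

  # 48 hardware threads, numbered 0-47; keep 0-3 for the host, isolate the other 44
  isolcpus=4-47

  # let Nova pin guest vCPUs onto 24 of the isolated CPUs...
  vcpu_pin_set = 4-27

  # ...leaving 28-47 (20 CPUs) for other pinned userland work, e.g. the vrouter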

So. In a lab environment, with isolcpus isolating 44 CPUs, and those same 44 CPUs listed in vcpu_pin_set, a customer emailed and complained about sluggish performance. I logged in, started up htop, added the PROCESSOR column, and noticed that everything was running on a single CPU core.
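Besides htop's PROCESSOR column, you can confirm the same thing from a shell with standard Linux tools (the qemu process name varies by distro; qemu-kvm is assumed here):

  # PSR = the CPU each qemu thread last ran on
  ps -C qemu-kvm -L -o pid,tid,psr,pcpu,comm

  # the CPU affinity list a given qemu process is allowed to run on
  taskset -cp <qemu pid>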

Ironically enough, I had just read this interesting article that helped me realize very quickly what was happening.


Obviously, running every single userland process on a single processor core is a killer.

So why was everything running on one core?

It turned out that when launching the instances, there is a property that needs to be attached to the flavors, called hw:cpu_policy=dedicated. 

When specified on the flavor, this property causes Nova to pass the pinning request to libvirt, which knows to assign each of the virtual machine's vCPUs to specific isolated CPUs.
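Attaching the property to a flavor is a one-liner with the standard openstack CLI (the flavor name here is made up):

  openstack flavor set m1.large.pinned --property hw:cpu_policy=dedicated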

When NOT specified, it appears that libvirt just shoves the task onto the first available CPU - cpu 0. cpu 0 was indeed an isolated CPU, because the only CPUs left out of the isolcpus and vcpu_pin_set lists were 2, 4, 26 and 28. 

So the qemu virtual machine processes wound up on isolated CPUs (as they should have). But since the kernel scheduler does no load balancing across isolated CPUs, all of the tasks just piled up on CPU 0. 

Apparently, the flavor property hw:cpu_policy=dedicated is CRITICAL in telling Nova/libvirt to map each vCPU of an instance to its own CPU in the pinned set.

Changing the flavor properties was not an option in this case, so what we wound up doing was removing the vcpu_pin_set list from /etc/nova.conf and removing the isolcpus parameter from the grub boot loader. This fixed the issue of instances with no property landing on a single CPU. We also noticed that if a flavor STILL used the property hw:cpu_policy=dedicated, a CPU assignment would still get generated into the libvirt XML file - and the OS would place (and manage) the task on that CPU.
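For what it's worth, the difference is visible in the libvirt domain XML that Nova generates. The snippet below is a hand-written sketch (not captured from the lab) of the two cases: with hw:cpu_policy=dedicated each vCPU gets its own vcpupin element, while without it the guest only gets a broad cpuset affinity across the whole vcpu_pin_set:

  <!-- with hw:cpu_policy=dedicated: each vCPU pinned to one physical CPU -->
  <cputune>
    <vcpupin vcpu='0' cpuset='5'/>
    <vcpupin vcpu='1' cpuset='7'/>
  </cputune>

  <!-- without it (but with vcpu_pin_set defined): affinity over the whole set -->
  <vcpu placement='static' cpuset='0-1,3,5-25,27,29-47'>2</vcpu>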
