
Monday, October 4, 2021

The first Accelerated VNF on our NFV platform

 I haven't posted anything since April but that isn't because I haven't been busy.

We have our new NFV Platform up and running, and it is NOT on OpenStack. It is NOT on VMware VIO. It is also NOT on VMware Telco Cloud!

We are using ESXi, vCenter, NSX-T for the SD-WAN, and Morpheus as a Cloud Management solution. Morpheus has a lot of different integrations, and a great user interface that gives tenants a place to log in, call home, and self-manage their resources.

The diagram below depicts what this looks like from a Reference Architecture perspective.

The OSS, which is not covered in the diagram, is a combination of Zabbix and VROPS, both working in tandem to ensure that the clustered hosts and management functions are behaving properly.

The platform is optimized with E-NVDS, which is also commonly referred to as Enhanced Datapath. This requires special DPDK drivers to be loaded on the ESXi hosts, as well as some additional configuration in the hypervisors to ensure that the E-NVDS is set up properly (separate upcoming post).

Now that the platform is up and running, it is time to start discussing workload types. There are a number of Workload Categories that I tend to use:

  1. Enterprise Workloads - Enterprise Applications, 3-Tier Architectures, etc.
  2. Telecommunications Workloads
    • Control Plane Workloads
    • Data Plane Workloads

Control Plane workloads have more tolerance for latency and contention for system resources than Data Plane workloads do. 

Why? Because Control Plane workloads are typically TCP-based, frequently use (RESTful) APIs, and tend to be more periodic in their behavior (periodic updates). Most of the time, when you see issues related to the Control Plane, they are related to back-hauling a lot of measurements and statistics (Telemetry Data). But generally speaking, this data in and of itself does not have stringent requirements.

From a VM perspective, there are a few key things you need to do to ensure your VNF behaves as a true VNF and not as a standard workload VM. These include:

  • Set Latency Sensitivity to High, which turns off interrupts and ensures that poll mode drivers are used.
  • Enable Huge Pages on the VM by going into VM Advanced Settings and adding the parameter: sched.mem.lpage.enable1GHugePage = TRUE

Note: Another setting worth checking, although we did not actually set this parameter ourselves, is: sched.mem.pin = TRUE

Note: Another setting, sched.mem.maxmemctl, ensures that ballooning is turned off. We do NOT have this setting configured, but it was mentioned to us, and we are researching it.
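
For reference, these advanced parameters end up as entries in the VM's .vmx file. A minimal sketch of the relevant lines is below; the latency-sensitivity line and the maxmemctl value of 0 are assumptions based on VMware's documented parameter names and common guidance, not values pulled from our hosts (as noted above, we only set the huge page parameter ourselves):

    sched.cpu.latencySensitivity = "high"
    sched.mem.lpage.enable1GHugePage = "TRUE"
    sched.mem.pin = "TRUE"
    sched.mem.maxmemctl = "0"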

One issue we seemed to continually run into was a vCenter alert called Virtual Machine Memory Usage, displayed in vCenter as a red banner with "Acknowledge and Reset to Green" links. The VM was in fact running, but vCenter seemed to have issues with it. The latest change we made that seems to have fixed this error was to check the "Reserve all guest memory (All locked)" checkbox.

This checkbox to Reserve all guest memory seemed intimidating at first, because the concern was that the VM could reserve all memory on the host. That is NOT what this setting does!!! What it does is allow the VM to reserve all of its own memory up-front - but just the amount of memory specified for the VM (i.e. 24G). If the VM has HugePages enabled, it makes sense that one would want the entire allotment of VM memory to be reserved up front and be contiguous. When we enabled this, our vCenter alerts disappeared.

Lastly, we decided to change DRS to Manual in VM Overrides. To find this setting amongst the huge number of settings hidden in vCenter, you go to the Cluster (not the Host, not the VM, not the Datacenter); the option for VM Overrides is there, and you have four options:

  • None
  • Manual
  • Partial
  • Full

The thinking here is that VMs with complex settings may not play well with vMotion. I will be doing more research on DRS for VNFs before considering setting this (back) to Partial or Full.

Wednesday, April 1, 2020

Enabling Jumbo Frames on Tenant Virtual Machines - Should We?

I noticed that all of our OpenStack virtual machines had a 1500 MTU on their interfaces. This seemed wasteful to me, since I knew that everything upstream (private MPLS network) was using jumbo frames.

I went looking for answers as to why the tenants were enabled with only a 1500 MTU, which led me to look into who was responsible for setting the MTU in the first place.

  • OpenStack?
  • Neutron?
  • LibVirt?
  • Contrail?
  • something else?
As it turns out, Contrail, which kicks Neutron out of the way and manages the networking with its L3 VPN solution (MPLS over GRE/UDP), works in tandem with Neutron via a bi-directional plugin (so you can administer your networks and ports from Horizon, or through the Contrail GUI).

But, as I have learned from a web discussion thread, Contrail takes no responsibility for setting the MTU of the virtual machine interfaces. It pleads the 5th.

The thread mentions that the MTU can be set in the Contrail DHCP server. I am not sure if that would work if you used pre-defined ports, though (do those still use a DHCP MAC reservation approach to getting an assigned IP Address?). Do other DHCP servers assign MTUs? DHCP can do a lot of things (it can't cook you a good breakfast, unfortunately) - I didn't realize DHCP servers could set MTUs, too, until I read that. It turns out this is a standard DHCP option: option 26, Interface MTU.
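
Just to illustrate the mechanism (this is a generic dnsmasq example, not Contrail's own DHCP implementation, and the 8950 value is an arbitrary placeholder):

    # dnsmasq.conf - push an interface MTU to DHCP clients (option 26 = Interface MTU)
    dhcp-option=26,8950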

Now - the big question. If we can set the MTU on virtual machines, should we? Just because you can, doesn't necessarily mean you should, right?

I set about looking into that. And I ran into some really interesting discussions (and slide decks) on this very topic, and some outright debates on it.

This link below was pretty informative, I thought.

Discussion: What advantage does enabling Jumbo Frames provide?

Make sure you expand the discussion out with "Read More Comments" - that is where the good stuff lies!

The author brings up several considerations:
  • Everything in front of you, including WAN Accelerators and Optimizers, would need to support the larger MTUs.
  • Your target VM on the other side of the world would need to support the larger MTU - unless you use Path MTU Discovery (PMTUD), and I read a lot of bad things about PMTUD.
  • Your MTU setting in a VM would need to account for any encapsulation that is done to the frames - and Contrail, being an L3 VPN, does indeed encapsulate the packets (see the rough arithmetic below).
  • On any OpenStack compute host running Contrail, the Contrail vRouter already places the payload into 9000 MTU frames to send over the transport network - maybe making it unnecessary to use jumbo frames at the VM level?
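
As a back-of-the-envelope check (my arithmetic, not numbers from the discussion thread), the encapsulation overhead on a 9000-byte transport MTU looks roughly like this:

    transport MTU                          9000
    outer IPv4 header                       -20
    GRE header (MPLSoGRE)                    -4   (UDP header for MPLSoUDP: -8)
    MPLS label                               -4
    -------------------------------------------
    max guest MTU without fragmentation   ~8972   (~8968 for MPLSoUDP)
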
Interesting stuff.


Monday, March 30, 2020

How to run OpenStack on a single server - using veth pair



I decided I wanted to implement OpenStack using OpenVSwitch. On one server.

The way I decided to do this, was to spin up a KVM virtual machine (VM) as an OpenStack controller, and have it communicate to the bare metal CentOS7 Linux host (that runs the KVM hypervisor libvirt/qemu).

I did not realize how difficult this would be, until I realized that OpenVSwitch cannot leverage Linux bridges (bridges on the host).

OpenVSwitch allows you to create, delete and otherwise manipulate bridges - but ONLY bridges that are under the control of OpenVSwitch. So, if you happen to have a bridge on the Linux host (we will call it br0), you cannot snap that bridge into OpenVSwitch.

What you would normally do, is to create a new bridge on OpenVSwitch (i.e. br-ex), and migrate your connections from br0, to br-ex.

That's all well and good - and straightforward, most of the time. But, if you want to run a virtual machine (i.e. an OpenStack Controller VM), and have that virtual machine communicate to OpenStack Compute processes running on the bare metal host, abandoning the host bridges becomes a problem.

Virt-Manager does NOT know anything about OpenVSwitch, nor about the bridges that OpenVSwitch controls. So when you create your VM, if everything is under an OpenVSwitch bridge (i.e. br-ex), Virt-Manager will only offer you a series of macvtap interfaces (macvtap, and for that matter macvlan, are topics in and of themselves that we won't get into here).

So. I did not want to try to use macvtap interfaces - and jump through hoops to get them to communicate with the underlying host (yes, there are some tricks with macvlan that can presumably do this, but the rabbit hole was getting deeper).

As it turns out, "you can have your cake, and eat it too". You can create a Linux bridge (br0), and plumb that into OpenVSwitch with a veth pair. A veth pair is used just for this very purpose. It is essentially a virtual patch cable between two bridges, since you cannot join bridges (joining bridges is called cascading bridges, and this is not allowed in Linux Networking).

So here is what we wound up doing.
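
Roughly, the plumbing looks like this (a sketch, assuming the Linux bridge br0 and the OVS bridge br-ex discussed above; the veth endpoint names are my own placeholders):

    # create a veth pair to act as a virtual patch cable between the two bridges
    ip link add veth-br0 type veth peer name veth-ovs

    # attach one end to the Linux bridge, the other to the OpenVSwitch bridge
    ip link set veth-br0 master br0
    ovs-vsctl add-port br-ex veth-ovs

    # bring both ends up
    ip link set veth-br0 up
    ip link set veth-ovs up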


Monday, March 9, 2020

CPU Isolation - and how dangerous it can be

I noticed that in an implementation of OpenStack, they had CPU Pinning configured. I wasn't sure why, so I asked, and I was told that it allowed an application (comprised of several VMs on several Compute hosts in an Availability Zone), to achieve bare-metal performance.

I didn't think too much about it.

THEN - when I finally DID start to look into it - I realized that the feature was not turned on.

CPU Pinning, as they were configuring it, consisted of 3 parameters:
  1. isolcpus - a Linux kernel boot parameter, passed in on the grub command line. 
  2. vcpu_pin_set - defined in nova.conf - an OpenStack configuration file 
  3. reserved_host_cpus - defined in nova.conf - an OpenStack configuration file 
These settings have tremendous impact. For instance, they can impact how many CPUs OpenStack sees on the host. 

isolcpus takes a comma-delimited list of CPUs. vcpu_pin_set is also a list of CPUs, and what it does is allow OpenStack Nova to place VMs (qemu processes), via the libvirt APIs, on all or a subset of the full bank of isolated CPUs. 

So for example, you might isolate 44 CPUs on a 48-core system (24 cores x 2 threads per core). Then you might specify 24 of those 44 to be pinned by Nova/libvirt - and perhaps the remaining 20 are used for non-OpenStack userland processes (i.e. OpenContrail vRouter processes that broker packets in and out of the virtual machines and the compute hosts).
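
To make that concrete, the configuration lives in two places - the grub command line and nova.conf. The sketch below is illustrative only: the CPU list matches the lab described further down (where CPUs 2, 4, 26 and 28 were left out of the isolated set), and the reserved_host_cpus value is a placeholder:

    # /etc/default/grub - remove 44 of the 48 CPUs from the kernel scheduler
    GRUB_CMDLINE_LINUX="... isolcpus=0-1,3,5-25,27,29-47"

    # nova.conf - the CPUs Nova/libvirt may pin instances onto,
    # plus host CPUs held back from instances
    [DEFAULT]
    vcpu_pin_set = 0-1,3,5-25,27,29-47
    reserved_host_cpus = 2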

So. In a lab environment, with isolcpus isolating 44 CPUs, and these same 44 CPUs listed in the vcpu_pin_set array, a customer emailed and complained about sluggish performance. I logged in, started up htop, added the PROCESSOR column, and noticed that everything was running on a single CPU core.

Ironically enough, I had just read this interesting article that helped me realize very quickly what was happening.


Obviously, running every single userland process on a single processor core is a killer.

So why was everything running on one core?

It turned out that when launching the images, there is a property that needs to be attached to the flavors, called hw:cpu_policy=dedicated. 

When specified on the flavor, this property causes Nova to pass the information to libvirt, which knows to pin the virtual machine to specific isolated CPUs.

When NOT specified, it appears that libvirt just shoves the task onto the first available CPU on the system - CPU 0. CPU 0 was indeed an isolated CPU, because the ones left out of the isolcpus and vcpu_pin_set arrays were 2, 4, 26 and 28. 

So the qemu virtual machine process wound up on an isolated CPU (as it should have). But since the kernel scheduler does not load-balance tasks across isolated CPUs, everything just piled up on CPU 0. 

Apparently, the flavor property hw:cpu_policy=dedicated is CRITICAL in telling Nova/libvirt to map an instance to a vCPU in the array.
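
For reference, the property is set on the flavor with the OpenStack CLI, roughly like this (the flavor name is a made-up example):

    openstack flavor set vnf.large --property hw:cpu_policy=dedicated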

Changing the flavor properties was not an option in this case, so what wound up happening was to remove the vcpu_pin_set array from nova.conf, and to remove the isolcpus parameter from the grub boot loader. This fixed the issue of images with no property landing on a single CPU. We also noticed that if a flavor did STILL use the flavor property hw:cpu_policy=dedicated, a CPU assignment would still get generated into the libvirt XML file - and the OS would place (and manage) the task on that CPU.

Wednesday, January 1, 2020

Cloudify

I have been doing some work with Cloudify.

First, someone gave me access to an instance. I always try to see if I can intuitively figure a product out, without spending up-front time reading up on it.

Not the case with Cloudify.

I had to take some steps to "get into" Cloudify, and I will recap some of those.

1. I went to YouTube, and watched a couple of Videos.

This was somewhat valuable, but I felt this was "hands-on" technology. I knew I would need to install this in my home lab to get proficient with it; that was clear from watching the videos.

2. I logged onto a Cloudify Instance, and looked through the UI

I saw the Blueprints, but couldn't read any of the meta information. Finally I figured out that if I switched browsers, I could scroll down and see the descriptors.

3. Reading up on TOSCA - and Cloudify TOSCA specifically

In examining the descriptors, I realized they were Greek to me, and had to take a step back and read and learn. So I first started reading up on some of the TOSCA standards - and standards like these are tedious and, frankly, quite boring after a while. But as a result of doing this, I started to realize that Cloudify has extended the TOSCA descriptors. So there is a degree of proprietariness with regard to Cloudify, and from reading a few blogs, Cloudify "sorta kinda" follows the ETSI MANO standards, but in extending (and probably changing) some of the TOSCA YAML descriptors, they are going to create some vendor lock-in. They tout this as "value add" and "innovation", of course. Hey - that is how people try to make money with standards.

4. Finally, I decided to stick my toe in the water

I logged onto Cloudify Manager, and decided I would try the openstack-example-network.

It wouldn't upload, so I had to look into why. We had the v3.x version of the OpenStack Plugin, which requires a compat.xml file that was not loaded. In figuring this out, I realized we probably shouldn't even be using that version of the plugin, since it is supported on version 5.x of Cloudify Manager and we were running version 4.6.

So, I uninstalled version 3.x of the OpenStack plugin, tried to upload the sample example blueprint again, and voilà - success. I stopped there, because I wanted to see if I could create my own blueprint.

5. Created my own Blueprint

Our initial interest in a use case was not to deploy services per se, but to onboard new customers onto an OpenStack platform. So, I saw the palette in Cloudify Composer for the OpenStack Plugin v2.14.7, and it allows you to create all kinds of OpenStack objects. I decided to put a User and a Project on the palette. I used some web documentation to learn about secrets (which were already created by a Cloudify Consultant who set up the system), and used those to configure the openstack_config items on the project and user. I then configured the user and project sections (a rough sketch of what such a blueprint looks like follows the list below).
  1. I saved the blueprint, 
  2. validated the blueprint (no complaints from Composer on that), 
  3. and then uploaded the blueprint to Cloudify Manager.
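
For illustration, a trimmed-down blueprint along those lines might look something like this. This is only a sketch based on the OpenStack plugin 2.x node types and the usual keystone secrets pattern from the Cloudify documentation - the node names, resource IDs and secret keys are placeholders, not the blueprint I actually built:

    tosca_definitions_version: cloudify_dsl_1_3

    imports:
      - http://www.getcloudify.org/spec/cloudify/4.6/types.yaml
      - plugin:cloudify-openstack-plugin

    dsl_definitions:
      openstack_config: &openstack_config
        username: { get_secret: keystone_username }
        password: { get_secret: keystone_password }
        tenant_name: { get_secret: keystone_tenant_name }
        auth_url: { get_secret: keystone_url }

    node_templates:
      customer_project:
        type: cloudify.openstack.nodes.Project
        properties:
          resource_id: customer-a
          openstack_config: *openstack_config

      customer_user:
        type: cloudify.openstack.nodes.User
        properties:
          resource_id: customer-a-admin
          openstack_config: *openstack_config
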
6. I then attempted to Deploy the Blueprint

This seemed to work, but I did not see a new project or user on the system. I saw a bunch of console messages on the Cloudify Manager GUI, but didn't really see any errors.

It is worth noting that I don't see any examples on the web of people trying to "onboard" an OpenStack tenant. Just about all examples are people instantiating some kind of VM on an already-configured OpenStack platform (tenants, users, projects, et al already created).

7. Joined the Cloudify Slack Community

At this point, I signed up for the Cloudify Slack Community, and am trying to seek some assistance there in figuring out why my little blueprint did not seem to execute on the target system.

...Meanwhile, I spun up a new thread, and did some additional things:

8. Installed the Cloudify qcow2 image

If you try to do this, it directs you to the Cloudify Sales page. But there is a link to get the back versions, and I downloaded version 4.4 of the qcow2 image. 

NOTE: I did not launch this in OpenStack. It surprised me that this was what they seemed to want you to do, because most Orchestrators I have seen operate from outside the OpenStack domain (as a VM outside of OpenStack).

This qcow2 is a CentOS7 image, and I could not find a password to get into the operating system image itself (i.e. as root). What they instead ask you to do is just hit the IP address from a browser using http (not https!), and see if you get a GUI for Cloudify Manager (I did). Then use your default login. I did log in successfully, and that is as far as I have gotten for now.
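
If you launch the qcow2 under plain KVM rather than inside OpenStack, something along these lines works with virt-install (a sketch; the name, sizing, disk path and bridge are placeholders of mine, not Cloudify's documented requirements):

    virt-install --name cloudify-manager --memory 16384 --vcpus 4 \
      --disk path=/var/lib/libvirt/images/cloudify-manager-4.4.qcow2,format=qcow2 \
      --import --os-variant centos7.0 \
      --network bridge=br0 --noautoconsole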

9. Installed the CLI

The CLI is an RPM, which installed successfully. So I plan to configure it, and use it to learn the CLI and interact with Cloudify Manager.
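
Pointing the CLI at a manager looks roughly like this (a sketch; the IP, credentials and tenant are placeholders):

    # create a CLI profile against the manager, then sanity-check connectivity
    cfy profiles use 192.168.1.50 -u admin -p <password> -t default_tenant
    cfy status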

So, let's see what I learn to get to the next steps. More on this later.

Thursday, August 1, 2019

OpenStack - Discussion on Cells

I have a partner who is still using OpenStack Newton.

I was asked to look into this, because OpenStack Newton is no longer supported by the OpenStack community; it has been End of Life'd (EOL).

OpenStack Ocata is still supported. I at one time set this up, and I didn't see any notable differences between Ocata and Newton, and my Service Orchestrator (Open Baton) seemed to still work with Ocata.

Ocata introduces the concept of Cells. Cells is an architectural concept that apparently (if I understand correctly) replaces (enhances?) the previous concept of Regions. It changes the way OpenStack is run, in terms of control and delegation of nodes and resources (Compute Node resources, specifically). It is a more hierarchical approach.

Here is a link on cells that I read to get this understanding: Discussion about OpenStack Cells

I didn't stop there, either. I read some more.

It turns out CERN (Particle Physics!? They run those Particle Accelerators and do stuff more complex than anything I am writing about!?) - THEY are (I assume they still are) big on OpenStack. Tons and tons of material on what CERN is doing. Architectures, Topologies, yada yada. I don't have time to read all of that.

But, I did read THIS article, on moving from Cells v1 to Cells v2. It told me all I needed to know. If you are using Cells, you need to jump over the Ocata release and use Queens or later, because more than half the OpenStack modules were deaf, dumb and blind as to the notion of what a Cell is. Obviously this causes problems.

So I guess the concept of a Cell is somewhat Beta, and partially supported in Ocata.

I guess you could move to Ocata in a small lab if you are not using Cells, and if the APIs remain constant with respect to whatever happens to be leveraging them.

If anyone reads this, by all means feel free to correct and comment as necessary.

Thursday, July 11, 2019

Palo Alto Firewall VM Series - Integration and Evaluation - Part IV

Okay this is probably a final post on the Palo Alto Firewall VM Series Integration project.

This project was aimed at doing a proof of concept reference implementation of a fully integrated and provisioned SD-WAN, using a Palo-Alto Virtual Firewall to protect the edges of two campus networks (e.g. East and West).

It uses a Service Orchestration software platform (Service Orchestrator) to model and specify services (networks) that the Orchestrator will deploy, out to the field, to the CPE, at the push of a button.

What are we deploying?

We are deploying overlay networks (SD-WAN) to Customer Premise Edge (CPE) devices. These devices are typically deployed smack on the edge of a premises network, and for that reason have optical SFP termination ports and Gigabit (or in some cases 10 Gig) ports.

Now to make these overlay networks work in these modern times, we rely on Virtualization. Virtual Machines, Containers (i.e. Docker), OpenStack, et al. (depending on the CPE and software chosen).

Network Function Virtualization (NFV) - meaning non-iron-based network functions like Routing, Switching, Network Stacks in Namespaces, and Overlay networks (i.e. secure overlays) - ties all of this together. NFV is a fairly recent technology and has made networking (as if it weren't complicated enough already) a bit MORE complicated. But virtualizing applications and processes without also virtualizing the networking they rely on to intercommunicate doesn't achieve the objectives that everyone is after by using virtualization in the first place. It is like going halfway and stopping.

So - we have been deploying this new VM-Series Palo Alto Firewall in a network "topology" in an automated fashion, using the Palo Alto API to provision all of the "things that need to be provisioned", a few of which are mentioned below.

  • Interfaces
  • Zones
  • Security Policies
  • Static Routes
  • Policy-Based Forwarding Rules

This is done by using the Palo-Alto XML API.
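
To give a flavor of what that looks like: the PAN-OS XML API is driven over HTTPS - you generate an API key, push configuration elements by xpath, and then commit. The calls below are a sketch; the management IP, credentials, xpath and element values are placeholders, not our actual provisioning calls:

    # generate an API key
    curl -k "https://<fw-mgmt-ip>/api/?type=keygen&user=<user>&password=<password>"

    # set a piece of configuration (illustrative xpath/element for a layer3 zone)
    curl -k "https://<fw-mgmt-ip>/api/?type=config&action=set&key=<api-key>&xpath=<xpath-to-zone-entry>&element=<network><layer3/></network>"

    # commit the candidate configuration
    curl -k "https://<fw-mgmt-ip>/api/?type=commit&cmd=<commit></commit>&key=<api-key>"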

So to do this, at least in this proof of concept, we provision:
  1. A Palo-Alto Firewall
  2. A virtual machine (temporary) that provisions the Palo-Alto Firewall
  3. A SD-WAN Gateway behind the Palo-Alto Firewall
  4. A client or server behind the Gateway.
Topology Provisioned on a CPE device - Site West
screenshot: ADVA Ensemble Orchestrator

Sounds easy, right? It's not. Here are just a few of the challenges you face:
  • Timing and Order
    • The Firewall needs to be deployed first, naturally. Otherwise you cannot provision it.
    • The Firewall Provisioner configures all of the interfaces, routes, zones, policies and rules. This is done on a management network - separate from the service plane, or traffic plane as it is often referred to.
    • With a properly configured firewall in place, the Gateway can make outbound calls to the Cloud.
    • Once the Gateway is up and cloud-connected, the server can begin listening for and responding to requests.
  • Routing
    • Dynamic routing stacks are difficult to understand, set up and provision; especially BGP. Trying to do all of this "auto-magically" at the push of a button is even more challenging.
    • In our case, we used static routes, and it is important to know how these routes need to be established. 
  • Rules
    • Indeed, it is sometimes difficult to understand why traffic is not flowing properly, or at all. Often it is due to routing. It can also be due to not having the proper rules configured.
  • Programmatic APIs
    • Making API calls is software development. So the code has to be written. It has to be tested. And re-tested. And re-tested.
When you deploy a network at the push of a button, the visual ease with which it all happens makes people take it for granted, and NOT fully appreciate all of the complexity and under-the-hood inner workings that make it all come together.

Most of the network troubleshooting (or perhaps all of it) had to do with missing or incorrect routes, or missing rules. It can be a chicken-or-egg problem trying to figure out which one is the culprit.

In this particular CPE, it runs:
  • an OS (Linux)
  • two docker containers - that contain OpenStack nodes (Controller node and Compute node)
  • a FastPath Data Switch - that maps physical ports to virtual ports 

The "magic" is in the automation and integration. The data switch is pre-provisioned and custom developed for performance. The integration with OpenStack allows virtual ports and virtual switch services to be created / removed as new Virtualized Network Functions (VNFs) are initialized or terminated on demand.

Then, on top of this, you have the VNFs themselves; in this case here, the Palo Alto Firewall, the Edge Gateway behind this firewall, and a Server on a trusted network behind the gateway.

Traffic from the server will run through the firewall before it gets to the Gateway, so that the firewall can inspect that traffic before the gateway encrypts it for outbound transport.

Then, the Gateway will encrypt and send the traffic back through the firewall, and out the door, on a virtual circuit (or multiple virtual circuits if multi-pathing is used), to the complementary gateway on the other side (think East to West across two corporate campus networks here).

