
Tuesday, September 24, 2019

Vector Packet Processing - Part I

Yesterday, I was reading up on something called Vector Packet Processing (VPP). I had not heard of it, nor of the organization called Fd.io (pronounced "Fido"), which can be found at the following link: http://fd.io

Chasing links to get more up to speed, I found this article, which does a very good job of introducing these newer networking technologies, which have emerged to support virtualization because of the overhead (and redundancy) associated with forwarding packets from NICs, to virtualization hosts, and into the virtual machines.

https://software.intel.com/en-us/articles/an-overview-of-advanced-server-based-networking-technologies

I like how the article progresses from the old-style interrupt processing, to OpenVSwitch (OVS), to SR-IOV, to DPDK, and then, finally, to VPP.

I am familiar with OpenVSwitch, which I came into contact with through OpenStack, which has OpenVSwitch drivers (and required you to install OpenVSwitch on the controller and compute nodes).

I was only familiar with SR-IOV because I stumbled upon it and took the time to read up on what it was. I think it was a virtual Palo Alto Firewall that had SR-IOV NIC types, if I'm not mistaken. I spent some time trying to figure out whether the servers I am running support SR-IOV; they don't have it enabled, that's for sure. Whether they support it at all would take more research.

And DPDK I had read up on, because a lot of hardware vendors were including FastPath data switches that utilized DPDK for their own in-house virtual switches, or were using the DPDK-enabled OpenVSwitch implementation.

But Vector Packet Processing (VPP) somehow missed me. So I have been doing some catch-up on VPP, which I won't go into detail on in this post, or share additional resources on, since it is such a large topic. But the link above to Fd.io is essentially touting VPP.
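
Just to pin down the core idea for later: the "vector" in VPP refers to processing packets in batches rather than one at a time, so each node in the processing graph stays hot in the instruction cache and the per-packet overhead gets amortized. Below is a rough, purely illustrative Python sketch of that difference (the real Fd.io code is C and far more involved; the function and node names here are made up):

# Scalar model: every packet walks the whole pipeline by itself.
def scalar_forward(packets, nodes):
    for pkt in packets:
        for node in nodes:
            pkt = node(pkt)

# VPP-style model: each node processes a whole batch ("vector") of packets
# before the batch moves on to the next node.
def vector_forward(packets, nodes, vector_size=256):
    for i in range(0, len(packets), vector_size):
        vector = packets[i:i + vector_size]
        for node in nodes:
            vector = [node(pkt) for pkt in vector]

# Toy usage: the "nodes" are just functions that transform a packet.
pipeline = [lambda p: p + 1, lambda p: p * 2]
scalar_forward(list(range(1000)), pipeline)
vector_forward(list(range(1000)), pipeline)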

UPDATE:
I found this link, which is also spectacularly written:
https://www.metaswitch.com/blog/accelerating-the-nfv-data-plane

And, same blog with another link for those wanting the deep dive into VPP:
https://www.metaswitch.com/blog/fd.io-takes-over-vpp

Thursday, July 25, 2019

ONAP - Just Too Big and Heavy?

I have been working with Service Orchestrators for a while now. Here are three of them I have had experience with:

  • Heat - which is an OpenStack Project, so while OpenStack can be considered the VIM (Virtual Infrastructure Manager), Heat is an Orchestrator that runs on top of OpenStack and allows you to deploy and manage services 
  • Open Baton - this was the ORIGINAL Reference Implementation for the ETSI MANO standards, out of a think tank in Germany (Fraunhofer FOKUS).
  • ADVA Ensemble - which is an ETSI-based Orchestrator that is not in the public domain. It is the IPR of ADVA Optical Networking, based out of Germany.
There are a few new Open Source initiatives that have surpassed Open Baton for sure, and probably Heat also. Here are a few of the more popular open source ones:
  • ONAP - a Tier 1 carrier solution, backed by the likes of at&t. 
  • OSM - I have not examined this one fully. TODO: Update this entry when I do.
  • Cloudify - a private commercial implementation that bills itself as being more lightweight than the ONAP solution.
I looked at ONAP today. Some initial YouTube presentations were completely inadequate for allowing me to "get started". One was a presentation by an at&t Vice President. Another was done by some architect who didn't show a single slide on the video (the camera was trained on the speaker the whole time).

This led me to do some digging around. I found THIS site: Setting up ONAP

Well, if you scroll down to the bottom of that page, here is your "footprint" - meaning, the system requirements to install it.

ONAP System Requirements
Okay. This is for a Full Installation, I guess. The 3 TB of disk is not that bad. You can put a NAS out there and achieve that, no problem. But 148 vCPUs???? THREE HUNDRED THIRTY-SIX gigs of RAM? OMG - that is a deal killer in terms of being able to install this in a lab here.

I can go through and see if I can pare this down, but I have a feeling that I cannot install ONAP. This is a toy for big boys, who have huge servers and lots of dinero.

I might have to go over and look at OSM to see if that is more my size.

I will say that the underlying technologies include Ubuntu, OpenStack, Docker and MySQL - which are pretty much mainstream.

Thursday, July 18, 2019

Q-In-Q Tunneling and VLAN Translation


I have been working on this Customer Premise Edge (CPE) project, in which a Service Orchestrator deploys Virtual Network Functions (VNFs) to a "piece of hardware". In the specific implementation I have been working with, the CPE runs 3-4 Docker containers that, between them, run:

  • OpenStack (Compute Node in one container, Controller Node in another)
  • VRFs
  • a FastPath Data Switch

The architecture looks, essentially, as shown below:

Two Customer Premise Edge Devices Connecting over a Secure Transport Overlay Network (SD-WAN)

This architecture relies on Layer 2 (Ethernet frame) forwarding. So what happens, essentially, is that when a virtual network is created, a "service" is generated at runtime, which connects two "interface ports" (these can be L3, L2, et al). But because an interface is a physical (i.e. operating-system-managed) device, traffic is run through a virtual construct called a "service port" as it comes off the wire. Depending on what kind of service it is, there are different topologies and rulesets that can (and must) be configured and applied to make traffic handling and flows work properly.

Initially, I was not very interested in this concept of the "match sets" that are required to configure these services. I just keyed in what I was told (which was an asterisk, to allow all traffic).

But, finally, I became more interested in a deep-dive on these rules. I noticed that there was a rule to configure "inner VLAN" and "outer VLAN" settings. Huh? What does that mean? A VLAN is a VLAN, right? Well, sort of. Kind of. Not exactly.

As it turns out, in order to handle multi-tenant traffic (a service provider managing multiple customers), VLAN IDs can overlap from customer to customer. And you cannot have Customer A's traffic going to Customer B, or you will find yourself out of business pretty quickly (possibly with a liability lawsuit).

So - they came up with concepts like Q-in-Q Tunneling and VLAN Translation to deal with these problems. Now, you can have a Customer VLAN (C-VLAN) and a Service Provider VLAN (S-VLAN), and you can map and manage the frames based on S-VLANs without manipulating or disturbing the original customer VLAN that is set on the frame.

So NOW - I understand why these match set rules have fields for an "inner" and an "outer" VLAN. 

Just to be thorough, the outer VLAN, by the way, is the S-VLAN (therefore the inner VLAN is the C-VLAN).
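
To make the inner/outer idea concrete, here is a minimal Python sketch (standard library only, illustrative VLAN IDs) that builds a double-tagged 802.1ad frame and then pulls the S-VLAN and C-VLAN back out of it:

import struct

ETH_P_8021AD = 0x88A8   # outer tag TPID (S-VLAN, service provider)
ETH_P_8021Q  = 0x8100   # inner tag TPID (C-VLAN, customer)

def vlan_ids(frame: bytes):
    """Return (s_vlan, c_vlan) for a double-tagged frame; None where untagged."""
    ethertype = struct.unpack("!H", frame[12:14])[0]
    s_vlan = c_vlan = None
    offset = 12
    if ethertype == ETH_P_8021AD:
        # Tag Control Information: PCP (3 bits) | DEI (1 bit) | VLAN ID (12 bits)
        tci = struct.unpack("!H", frame[offset + 2:offset + 4])[0]
        s_vlan = tci & 0x0FFF
        offset += 4
        ethertype = struct.unpack("!H", frame[offset:offset + 2])[0]
    if ethertype == ETH_P_8021Q:
        tci = struct.unpack("!H", frame[offset + 2:offset + 4])[0]
        c_vlan = tci & 0x0FFF
    return s_vlan, c_vlan

# Build a fake double-tagged frame: outer S-VLAN 200, inner C-VLAN 10.
frame = (b"\xaa" * 6 + b"\xbb" * 6                  # dst / src MAC addresses
         + struct.pack("!HH", ETH_P_8021AD, 200)    # S-tag (TPID + TCI)
         + struct.pack("!HH", ETH_P_8021Q, 10)      # C-tag (TPID + TCI)
         + struct.pack("!H", 0x0800) + b"payload")  # inner EtherType + payload
print(vlan_ids(frame))   # -> (200, 10)

The service provider can push, pop, or translate the outer tag as the frame crosses its network, while the inner (customer) tag rides along untouched - which is exactly why the match set rules expose both fields.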

Credit for this explanation and understanding goes to this link (although there are probably numerous sources for this concept available on the world wide web):

Thursday, July 11, 2019

Palo Alto Firewall VM Series - Integration and Evaluation - Part IV

Okay, this is probably the final post on the Palo Alto Firewall VM-Series integration project.

This project was aimed at doing a proof of concept reference implementation of a fully integrated and provisioned SD-WAN, using a Palo-Alto Virtual Firewall to protect the edges of two campus networks (e.g. East and West).

It uses a Service Orchestration software platform (Service Orchestrator) to model and specify services (networks) that the Orchestrator will deploy, out to the field, to the CPE, at the push of a button.

What are we deploying?

We are deploying overlay networks (SD-WAN) to Customer Premise Edge (CPE) devices. These devices are typically deployed smack on the edge of a premise network, and for that reason have optical SFP termination ports, and Gigabit (or 10 Gig in some cases) ports.

Now to make these overlay networks work in these modern times, we rely on Virtualization. Virtual Machines, Containers (i.e. Docker), OpenStack, et al. (depending on the CPE and software chosen).

Network Function Virtualization (NFV) - meaning non-iron-based network functions like routing, switching, network stacks in namespaces, and overlay networks (i.e. secure overlays) - ties all of this together. NFV is a fairly recent technology and has made networking (as if it weren't complicated enough already) a bit MORE complicated. But virtualizing applications and processes without also virtualizing the networking they rely on to intercommunicate doesn't achieve the objectives that everyone is after by using virtualization in the first place. It is like going halfway and stopping.

So - we have been deploying this new VM-Series Palo Alto Firewall in a network "topology" in an automated fashion, using the Palo Alto API to provision all of the "things that need to be provisioned", a few of which are mentioned below.

  • Interfaces
  • Zones
  • Security Policies
  • Static Routes
  • Policy-Based Forwarding Rules

This is done by using the Palo Alto XML API.
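
For a flavor of what those calls look like, here is a rough Python sketch against the PAN-OS XML API (the firewall address, credentials, and zone name are placeholders, and the exact xpaths should be double-checked against the PAN-OS documentation for your version rather than trusted from this sketch):

import xml.etree.ElementTree as ET
import requests

FW = "https://192.0.2.10"   # hypothetical firewall management address

def api_key(user, password):
    # type=keygen returns an API key that is passed on all subsequent calls
    r = requests.get(f"{FW}/api/", verify=False,   # verify=False only for lab self-signed certs
                     params={"type": "keygen", "user": user, "password": password})
    return ET.fromstring(r.text).findtext(".//key")

def set_config(key, xpath, element):
    # type=config / action=set pushes a configuration fragment at the given xpath
    r = requests.get(f"{FW}/api/", verify=False,
                     params={"type": "config", "action": "set",
                             "xpath": xpath, "element": element, "key": key})
    return ET.fromstring(r.text).get("status")   # "success" or "error"

def commit(key):
    # nothing takes effect until the candidate configuration is committed
    r = requests.get(f"{FW}/api/", verify=False,
                     params={"type": "commit", "cmd": "<commit></commit>", "key": key})
    return ET.fromstring(r.text).get("status")

key = api_key("admin", "changeme")
zone_xpath = ("/config/devices/entry[@name='localhost.localdomain']"
              "/vsys/entry[@name='vsys1']/zone/entry[@name='untrust']")
print(set_config(key, zone_xpath,
                 "<network><layer3><member>ethernet1/1</member></layer3></network>"))
print(commit(key))

The temporary "provisioner" VM in the list below does essentially this, just with more set calls (interfaces, zones, routes, policies, PBF rules) before the final commit.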

So to do this, at least in this proof of concept, we provision:
  1. A Palo-Alto Firewall
  2. A virtual machine (temporary) that provisions the Palo-Alto Firewall
  3. An SD-WAN Gateway behind the Palo-Alto Firewall
  4. A client or server behind the Gateway.
Topology Provisioned on a CPE device - Site West
screenshot: ADVA Ensemble Orchestrator

Sounds easy, right? It's not. Here are just a few of the challenges you face:
  • Timing and Order
    • The Firewall needs to be deployed first, naturally. Otherwise you cannot provision it.
    • The Firewall Provisioner configures all of the interfaces, routes, zones, policies and rules. This is done on a management network - separate from the service plane, or traffic plane as it is often referred to.
    • With a properly configured firewall in place, the Gateway can make outbound calls to the Cloud.
    • Once the Gateway is up and cloud-connected, the server can begin listening for and responding to requests.
  • Routing
    • Dynamic routing stacks are difficult to understand, set up and provision; especially BGP. Trying to do all of this "auto-magically" at the push of a button is even more challenging.
    • In our case, we used static routes, and it is important to know how these routes need to be established. 
  • Rules
    • Indeed, it is sometimes difficult to understand why traffic is not flowing properly, or at all. Often it is due to routing. It can also be due to not having the proper rules configured.
  • Programmatic APIs
    • Making API calls is software development. So the code has to be written. It has to be tested. And re-tested. And re-tested.
When you deploy a network at the push of a button, the visual ease with which it all happens starts to make people take it for granted, and NOT fully appreciate all of the complexity and under-the-hood inner workings that make it all come to life.

Most of the network troubleshooting (or all of it, perhaps) had to do with missing or incorrect routes, or missing rules. It can be a chicken-or-egg problem trying to figure out which one is the culprit.

This particular CPE runs:
  • an OS (Linux)
  • two docker containers - that contain OpenStack nodes (Controller node and Compute node)
  • a FastPath Data Switch - that maps physical ports to virtual ports

The "magic" is in the automation and integration. The data switch is pre-provisioned and custom developed for performance. The integration with OpenStack allows virtual ports and virtual switch services to be created / removed as new Virtualized Network Functions (VNFs) are initialized or terminated on demand.
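
The CPE's own OpenStack integration is proprietary, but as a rough illustration of the "virtual ports on demand" idea, this is roughly what the equivalent looks like against a plain OpenStack using the openstacksdk (the cloud name, network, image, and flavor names below are all hypothetical):

import openstack

conn = openstack.connect(cloud="cpe-east")            # cloud entry from clouds.yaml (hypothetical)

net = conn.network.find_network("service-plane")      # hypothetical tenant network
port = conn.network.create_port(network_id=net.id, name="vnf-fw-eth1")

server = conn.compute.create_server(
    name="palo-alto-vm",                               # the VNF instance
    image_id=conn.compute.find_image("pa-vm-series").id,
    flavor_id=conn.compute.find_flavor("vnf.large").id,
    networks=[{"port": port.id}],                      # attach the pre-created virtual port
)
conn.compute.wait_for_server(server)

# Terminating the VNF is the reverse: delete the server, then the port,
# which is the "removed ... on demand" half of the story.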

Then, on top of this, you have the VNFs themselves; in this case here, the Palo Alto Firewall, the Edge Gateway behind this firewall, and a Server on a trusted network behind the gateway.

Traffic from the server will run through the firewall before it gets to the Gateway, so that the firewall can inspect that traffic before the gateway encrypts it for outbound transport.

Then, the Gateway will encrypt and send the traffic back through the firewall, and out the door, on a virtual circuit (or multiple virtual circuits if multi-pathing is used), to the complementary gateway on the other side (think East to West across two corporate campus networks here).


Wednesday, October 31, 2018

Data Plane Development Kit (DPDK)


I kept noticing that a lot of the carrier OEMs are implementing their "own" Virtual Switches.

I wasn't really sure why, and decided to look into the matter. After all, there is the fast-performing OpenVSwitch, which, while fairly complex, is powerful, flexible, and, well, open source.

Come to learn, there is actually a faster way to do networking than with native OpenVSwitch.

OpenVSwitch minimizes all of the context switching between user space and kernel space when it comes to taking packets from a physical port and forwarding those packets to virtualized network functions (VNFs) and back.

But DPDK provides a means to circumvent the kernel and have practically everything in user space interact directly with the hardware.

This is fast, indeed, if you can do it. But it bypasses everything a kernel network stack is there for, so there has to be some sacrifice (which I need to look into and understand better). One of the ways it bypasses the kernel is through Direct Memory Access (DMA), based on some limited reading. Frankly, reading this material, digesting it, and understanding it usually requires several passes and a bit of concentration, as this stuff gets very complex very fast.

The other question I have: if DPDK is bypassing the kernel en route to a physical NIC, what about other kernel-based networking services that are using that same NIC? How does that work?

I've got questions. More questions.

But up to now, I was unaware of this DPDK and its role in the new generation of virtual switches coming out. Even OpenVSwitch itself has a DPDK version.

Sunday, October 28, 2018

Service Chaining and Service Function Forwarding

I had read about the concepts of service chaining and service function forwarding early on, in an SD-WAN / NFV book that was, at the time, ahead of its time. I hadn't actually SEEN this, or implemented it, until just recently on my latest project.

Now, we have two "Cloud" initiatives going on at the moment, plus one that's been in play for a while.
  1. Ansible - chosen over Puppet and Chef in a research initiative, this technology is essentially used to automate the deployment and configuration of VMs (libvirt KVM guests, to be accurate).
    • But there is no service chaining or service function forwarding in this.
  2. OpenStack / OpenBaton - this is a project to implement Service Orchestration - using ETSI MANO descriptors to "describe" Network Functions, Services, etc.
    • But we only implemented a single VNF, and did not chain them together with chaining rules, or forwarding rules. 
  3. Kubernetes - this is a current project to deploy technology into containers. And while there are dependencies between the containers, including scaling and autoscaling, I would not say that we have implemented Service Chaining or Service Function Forwarding the way they were conceptualized academically and in the standards.
The latest project I was involved with DID make use of Service Chaining and Service Function Forwarding.  We had to deploy a VNF onto a Ciena 3906mvi device, which had a built-in Network Virtualization module that ran on a Linux operating system. This ran "on top" of an underlying Linux operating system that dealt with the more physical aspects of the box (fiber ports, ethernet ports both 1G and 100G, et al).

It's my understanding that the terms Service Chaining and Service Function Forwarding have their roots in the YANG reference model. https://en.wikipedia.org/wiki/YANG

This link has a short primer on YANG. 

YANG is a data modeling language for NETCONF: it models the configuration and state data (and can define new operations) that extend the base set of network operations spelled out in the NETCONF standard (feel free to research this - NETCONF and YANG are both topics in and of themselves).
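
NETCONF itself is just an RPC protocol (usually over SSH on port 830) whose payloads are structured according to the YANG models the device supports. As a quick illustration, assuming a device that actually speaks NETCONF and using the Python ncclient library (host and credentials below are placeholders):

from ncclient import manager

with manager.connect(host="192.0.2.20", port=830,
                     username="admin", password="changeme",
                     hostkey_verify=False) as m:
    # <get-config> on the running datastore; the reply is XML whose structure
    # is defined by the device's YANG models.
    reply = m.get_config(source="running")
    print(reply.xml[:500])

    # The session hello also advertises which capabilities / YANG modules
    # the device supports.
    for cap in m.server_capabilities:
        print(cap)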

In summary, it was rather straightforward to deploy the VNF. You had to know how to do it on this particular box, but once you did, it was straightforward enough. What was NOT straightforward was figuring out how you wanted your traffic to flow, and configuring the Service Chaining and Service Function Forwarding rules.

What really hit home to me is that the Virtual Switch (fabric) is the epicenter of the technology. Without knowing how these switches are configured and inter-operate, you can't do squat - automated, manual, with descriptors, or not. And this includes troubleshooting them.

Now, with Ciena, the virtual switch on this box was proprietary. So you were configuring Flooding Domains, Ports, Logical Ports, Traffic Classifiers, VLANs, etc. This is the only way you can make sure your traffic hop-scotches around the box the way you want it to, based on the rules you specify.

Here is another link on Service Chaining and Service Function Forwarding that's worth a read.

