
Tuesday, December 3, 2019

Virtualized Networking Acceleration Technologies - Part II


In Part I of this series, I recapped my research on virtualized networking acceleration technologies, with the aim of building an understanding of:

  • what they are
  • the history and evolution between them
What I did not cover was a couple of further questions:
  1. When to Use Them
  2. Can you Combine Them?
The link below is a fantastic discussion of item number one. Now, I can't tell how "right" or "accurate" the author is, and I typically look down in the comments for rebuttals and refutations (I didn't see any, and most commenters seemed relatively uninformed on this topic).

He concludes that for East-West traffic (server-to-server, within the data center), DPDK wins, and for North-South traffic (into and out of the data center), SR-IOV wins.
https://www.telcocloudbridge.com/blog/dpdk-vs-sr-iov-for-nfv-why-a-wrong-decision-can-impact-performance/

Friday, November 15, 2019

OpenContrail - Part 1

When I came to this shop and found out that they were running OpenStack but were not running Neutron, I about panicked. Especially when I found out they were running OpenContrail.

OpenContrail uses BGP and XMPP as its control plane protocols for route advertisement and exchange, and it uses MPLS over GRE/UDP to encapsulate and direct packets. The documentation says it CAN use VXLAN - which Neutron also seems to favor (over GRE tunneling). But here, at least, it is being run the way the designers of OpenContrail intended it to run - which is as an MPLS L3VPN.

I am going to drop some links in here real quick and come back and flesh this blog entry out.

Here is an Architectural Guide on OpenContrail. Make sure you have time to digest this.

https://www.juniper.net/us/en/local/pdf/whitepapers/2000535-en.pdf

Once you read the architecture, here is a Gitbook on OpenContrail that can be used to get more familiarity.

https://sureshkvl.gitbooks.io/opencontrail-beginners-tutorial/content/

Perhaps the real stash of gold was a 2013 video from one of the developers of vRouter itself. It turns out that most of the material in this video is still relevant to OpenContrail several years later. I could not find the slides anywhere, so I made my own slide deck that highlights the important discussions that took place in this video, as well as some of the key concepts shown.

https://www.youtube.com/watch?v=xhn7AYvv2Yg

If you read these, you are halfway there. Maybe more than halfway actually.

Thursday, September 12, 2019

Graphical Network Simulator-3 (GNS3) - Part II Installation on a Linux Server

Okay, for Part II of GNS3, I came in today looking to install GNS3 on a Linux server.

I noticed that GNS3 is designed to run on Ubuntu Linux, and as I tend to run in a CentOS7 shop, I am now faced with the choice of either putting an Ubuntu server in here, or trying to get this to run on CentOS7. It should run on CentOS7, right? After all, this is a Linux world, right? 😏

I decided to take one of my 32 GB RAM servers, an HP box running CentOS7, and follow a cookbook for installing GNS3 on it.

I followed this link:
https://gns3.com/discussions/how-to-install-gns3-on-centos-7-

I chose this box because it runs X Windows. It didn't have Python 3.6 on it, or the pip3.6 tool used for installing and managing Python 3.6 packages.

A lot of steps in this thing.

Some questions I have about this cookbook that I need to look into:

1. Why does the cookbook use VirtualBox on Linux? I have KVM installed. Surely I can use that instead of VirtualBox. I only use VirtualBox on my Win10 laptop. So I have, for now, skipped that section.

2. What is IOU support? I will need to google that.

UPDATE: IOU (IOS on Unix, also called IOL, IOS on Linux) is basically a Cisco IOS simulator that runs on an i386 chipset. You would need and want it if you run any Cisco elements in the GNS3 simulator.

Thursday, July 18, 2019

Q-In-Q Tunneling and VLAN Translation


I have been working on this Customer Premise Edge (CPE) project, in which a Service Orchestrator deploys Virtual Network Functions (VNFs) to a "piece of hardware". In the specific implementation I have been working with, the CPE runs a FastPath DataSwitch, and 3-4 Docker containers that in turn run:

  • OpenStack (Compute Node in one container, Controller Node in another)
  • VRFs
  • a FastPath Data Switch

The architecture looks, essentially, as shown below:

Two Customer Premise Edge Devices Connecting over a Secure Transport Overlay Network (SD-WAN)

This architecture relies on Layer 2 (Ethernet frame) forwarding. So what happens, essentially, is that when a virtual network is created, a "service" is generated at runtime, which connects two "interface ports" (these can be L3, L2, etc.). But because an interface is a physical (i.e. operating-system-managed) device, traffic is run through a virtual concept called a "service port" as it comes off the wire. Depending on what kind of service it is, there are different topologies and rulesets that can (and must) be configured and applied to make traffic handling and flows work properly.

Initially, I was not altogether very interested in this concept of "match sets" that are required to configure these services. I just keyed in what I was told (which was an asterisk, to allow all traffic).

But, finally, I became more interested in a deep-dive on these rules. I noticed that there was a rule to configure "inner VLAN" and "outer VLAN" settings. Huh? What does that mean? A VLAN is a VLAN, right? Well, sort of. Kind of. Not exactly.

As it turns out, in order to handle multi-tenant traffic (a service provider managing multiple customers), VLANs can overlap. And you cannot have Customer A's traffic going to Customer B, or you will find yourself out of business pretty quickly (possibly with a liability lawsuit).

So - they came up with concepts like Q-in-Q Tunneling and VLAN Translation to deal with these problems. Now you can have a Customer VLAN (C-VLAN) and a Service Provider VLAN (S-VLAN), and you can map and manage frames based on the S-VLAN without manipulating or disturbing the original customer VLAN that is set on the frame.

So NOW - I understand why these match set rules have fields for an "inner" and an "outer" VLAN. 

Just to be thorough, the outer VLAN, by the way, is the S-VLAN (therefore the inner VLAN is the C-VLAN).
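To make the inner/outer relationship concrete, here is a minimal Python sketch that builds the byte layout of a Q-in-Q frame. The MAC addresses and VLAN IDs are made-up values (not from the project): the service provider's S-TAG (TPID 0x88A8) sits on the outside, and the customer's original C-TAG (TPID 0x8100) rides untouched on the inside.

```python
import struct

def vlan_tag(tpid: int, vlan_id: int, pcp: int = 0) -> bytes:
    """Build a 4-byte VLAN tag: 16-bit TPID plus 16-bit TCI (PCP/DEI/VID)."""
    tci = (pcp << 13) | (vlan_id & 0x0FFF)
    return struct.pack("!HH", tpid, tci)

dst_mac = bytes.fromhex("ffffffffffff")   # hypothetical destination MAC
src_mac = bytes.fromhex("020000000001")   # hypothetical source MAC

s_tag = vlan_tag(0x88A8, 1001)  # outer tag: Service Provider VLAN (S-VLAN)
c_tag = vlan_tag(0x8100, 10)    # inner tag: original Customer VLAN (C-VLAN)
ethertype = struct.pack("!H", 0x0800)     # IPv4 payload follows the tags
payload = bytes(46)                       # dummy payload

frame = dst_mac + src_mac + s_tag + c_tag + ethertype + payload
print(frame.hex())
# The provider forwards on the S-VLAN (1001) and never touches the C-VLAN (10) -
# exactly the "outer" and "inner" VLANs those match-set fields refer to.
```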

Credit for this explanation and understanding goes to this link (although there are probably numerous sources for this concept available on the world wide web):

Thursday, June 6, 2019

The Network Problem From Hell - Fixed - Circuitous Routing



Life is easy when you use a single network interface adaptor.  But when you start using multiple adaptors, you start running into complexities because packets can start taking multiple paths. 

One particular thing most network engineers want to avoid is the situation where traffic leaves through door #1 (e.g. NIC 1) and the replies come back through door #2. Fixing this, though, requires some more advanced networking techniques and tricks (separate routing tables per NIC, and corresponding rules to direct packets to use those separate routing tables).
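For reference, that technique looks roughly like the sketch below, driving the iproute2 "ip" tool from Python. The interface names, addresses, gateways, and table numbers are hypothetical placeholders, not the ones from this incident.

```python
import subprocess

def ip(*args: str) -> None:
    """Run an iproute2 command and fail loudly if it errors."""
    subprocess.run(["ip", *args], check=True)

# Hypothetical example: give each NIC its own routing table, plus a rule
# so that traffic sourced from that NIC's address is answered via that NIC.
nics = [
    {"dev": "em1", "addr": "172.20.0.10", "gw": "172.20.0.1", "table": "100"},
    {"dev": "em2", "addr": "172.22.0.10", "gw": "172.22.0.1", "table": "200"},
]

for nic in nics:
    # Per-NIC routing table with its own default route.
    ip("route", "add", "default", "via", nic["gw"], "dev", nic["dev"], "table", nic["table"])
    # Rule: packets sourced from this NIC's address use that table.
    ip("rule", "add", "from", nic["addr"], "table", nic["table"])
```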

So, I had this problem where an OpenStack-managed virtual machine stopped working because it could not reach OpenStack itself, which was running on the SAME machine that the virtual machine was running on. It was driving me insane.

I thought the problem might be iptables on the host machine. I disabled those. Nope. 

I thought the problem might be OpenVSwitch. I moved the cable to a new NIC, and changed the bridge the virtual machine was using. Nope.

Compounding the problem was that the OpenStack host could ping the virtual machine, but the virtual machine could not ping the host. Why would it work one way, and not the other?

The Virtual Machine could ping the internet. It could ping the IP of the OpenStack router. It could ping the router that the host was connected to.

OpenStack uses Linux network (IP) namespaces, and in our case the Neutron Open vSwitch agent. An examination of these showed that the networking seemed to be configured just as it appeared in the Horizon dashboard's "Network Topology" visual interface.

One thing that is worth mentioning is that the bridge mappings for provider networks are in the ml2_conf.ini and openvswitch_agent.ini files. But the EXTERNAL OpenStack networks use a bridge based on a parameter setting in the l3_agent.ini file! So if the l3_agent.ini file has a bridge setting of, say, "br-ex" for external networks, and you don't have that bridge correspondingly configured in the other files, OpenStack will give you a message when you create the external network that it cannot reach the external network. We did run into this when trying to create different external networks on different bridges to solve the problem.
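A quick way to catch that kind of mismatch is to compare the files directly. Here is a rough sketch of such a check; the file paths, section names, and the external_network_bridge / bridge_mappings option names are assumptions based on a typical Neutron/OVS layout and may differ by release.

```python
import configparser

def read_opt(path: str, section: str, option: str) -> str:
    """Return an option from an ini file, or '' if the file/section/option is absent."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read(path)
    try:
        return cfg.get(section, option)
    except (configparser.NoSectionError, configparser.NoOptionError):
        return ""

# Assumed paths and option names for a typical Neutron/OVS deployment.
ext_bridge = read_opt("/etc/neutron/l3_agent.ini", "DEFAULT", "external_network_bridge")
mappings = read_opt("/etc/neutron/plugins/ml2/openvswitch_agent.ini", "ovs", "bridge_mappings")

# bridge_mappings looks like "physnet1:br-ex,physnet2:br-other"
mapped_bridges = [m.split(":")[1] for m in mappings.split(",") if ":" in m]

if ext_bridge and ext_bridge not in mapped_bridges:
    print(f"WARNING: {ext_bridge} (l3_agent.ini) is not in bridge_mappings: {mappings!r}")
```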

At wit's end, I finally called over one of the more advanced networking guys in the company, and we began troubleshooting with tcpdump. We finally realized that when the VM pinged the OpenStack host, the ICMP request packets were arriving on the expected NIC (em1 below), but no responses were going out on em1. When we changed tcpdump to listen on the "any" interface, we saw no responses at all. Clearly the host was dropping the packets. But iptables was flushed! Who, or what, was dropping the packets? (To be honest, we still aren't sure about this - more research is required.) But we did figure out that the problem was a "circuitous routing" problem.

We figured maybe reverse path filtering was causing the issue. So we disabled that in the kernel. Didn't fix it.  

Finally, we realized what was happening: the VM sends all of its packets through the external network bridge, which was attached to a 172.22.0.0/24 network; the packet went to the router, which routed it to its 172.20.0.0/24 port, and then to the host machine. But because the host machine had TWO NICs on BOTH of those networks, it did not send replies back the same way they came in. It sent the replies out of its em2 NIC, which was bridged to br-compute. And it was HERE that the packets were getting dropped. Since that NIC is managed by Open vSwitch, we believe a loop-prevention flow rule in Open vSwitch, or perhaps Spanning Tree Protocol, caused the packets to be dropped.

Illustration of the Circuitous Routing Issue & How it was solved

The final solution was to put in a host route, so that any packet to that particular VM would be sent outside of the host, upstream to the router, and back in through the appropriate 172.22.0.0/24 port on the host/bridge, to the OpenStack Router, where it would be NATd back to the 192.168.100.19 IP of the virtual machine. 
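In concrete terms, that fix amounted to a single /32 host route on the OpenStack host, something like the sketch below. The VM's floating IP and the router address shown are hypothetical stand-ins, not the actual addresses from the incident.

```python
import subprocess

# Hypothetical addresses: 172.22.0.50 stands in for the VM's floating IP,
# 172.20.0.1 for the upstream router's address on the host's em1 network.
# The /32 host route forces replies to that VM out of em1, up to the router,
# and back in via the 172.22.0.0/24 leg, instead of out the em2/br-compute path.
subprocess.run(
    ["ip", "route", "add", "172.22.0.50/32", "via", "172.20.0.1", "dev", "em1"],
    check=True,
)
```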

Firewalls and Routing. These two seem to be where most problems occur in networking.

Wednesday, May 8, 2019

Berkeley Packet Filtering - replacement for iptables - AND nftables?

I came across this blog, from Jonathan Corbet, dated Feb 19th, 2018.

BPF Comes to Firewalls, by Jonathan Corbet

I found this rather fascinating, since I was aware that nftables seemed pre-ordained to be the successor to iptables. I had even purchased and read Steven Suehring's Linux Firewalls book, which covers both iptables and nftables.

At the end of the day, I only see iptables and firewalls based on iptables (e.g. FirewallD) being used. I have not encountered any nftables firewalls yet.

And the other noted point is that nftables IS in the current version of the Linux kernel, while the BPF-based firewalling discussed in that article is not (yet).

But, can BPF come into Linux distributions alongside nftables soon, and wind up replacing nftables?

That is the question.

Another interesting blog post addressing the impetus for BPF is this one:

why-is-the-kernel-community-replacing-iptables


Sunday, October 28, 2018

Service Chaining and Service Function Forwarding

I had read about the concepts of service chaining and service function forwarding early on, in an SD-WAN / NFV book that was, at the time, ahead of its time. I hadn't actually SEEN this, or implemented it, until just recently on my latest project.

Now, we have two "Cloud" initiatives going on at the moment, plus one that's been in play for a while.
  1. Ansible - chosen over Puppet and Chef in a research initiative, this technology is essentially used to automate the deployment and configuration of VMs (libvirt KVM guests, to be accurate). 
    • But there is no service chaining or service function forwarding in this.
  2. OpenStack / OpenBaton - this is a project to implement Service Orchestration - using ETSI MANO descriptors to "describe" Network Functions, Services, etc.
    • But we only implemented a single VNF, and did not chain them together with chaining rules, or forwarding rules. 
  3. Kubernetes - this is a current project to deploy technology into containers. And while there is reliance and dependencies between the containers, including scaling and autoscaling, I would not say that we have implemented Service Chaining or Service Function Forwarding the way it was conceptualized academically and in standards.
The latest project I was involved with DID make use of Service Chaining and Service Function Forwarding.  We had to deploy a VNF onto a Ciena 3906mvi device, which had a built-in Network Virtualization module that ran on a Linux operating system. This ran "on top" of an underlying Linux operating system that dealt with the more physical aspects of the box (fiber ports, ethernet ports both 1G and 100G, et al).

It's my understanding that the terms Service Chaining and Service Function Forwarding have their roots in the YANG reference model. https://en.wikipedia.org/wiki/YANG

This link has a short primer on YANG. 

YANG is a data modeling language; it models the configuration and operational data manipulated by a base set of network operations spelled out in a standard called NETCONF (feel free to research this - it and YANG are both topics in and of themselves).
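As a small illustration of those base NETCONF operations, here is a sketch using the ncclient Python library; the device address and credentials are placeholders, and this assumes a NETCONF-enabled device listening on port 830.

```python
from ncclient import manager  # pip install ncclient

# Placeholder host/credentials - substitute a real NETCONF-enabled device.
with manager.connect(
    host="192.0.2.1",
    port=830,
    username="admin",
    password="admin",
    hostkey_verify=False,
) as m:
    # <get-config> is one of the base NETCONF operations; the data that comes
    # back is structured according to the YANG models the device supports.
    reply = m.get_config(source="running")
    print(reply)
```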

In summary, deploying the VNF was rather straightforward - you had to know how to do it on this particular box, but it was straightforward. What was NOT straightforward was figuring out how you wanted your traffic to flow, and configuring the Service Chaining and Service Function Forwarding rules.

What really hit home to me is that the Virtual Switch (fabric) is the epicenter of the technology. Without knowing how these switches are configured and inter-operate, you can't do squat - automated, manual, with descriptors, or not. And this includes troubleshooting them.

Now with Ciena, the virtual switch on this box was proprietary. So you were configuring Flooding Domains, Ports, Logical Ports, Traffic Classifiers, VLANs, etc. This is the only way you can make sure your traffic hop-scotches around the box the way you want it to, based on the rules you specify.

Here is another link on Service Chaining and Service Function Forwarding that's worth a read.


Thursday, October 11, 2018

What is a Flooding Domain?


I have been working on configuring this Ciena 3906MVI premise router, with a Virtualized Network Function (VNF), and connecting that VNF back to some physical network ports.

This is a rather complex piece of hardware (under the hood).

I noticed that in some commands, they were creating these Flooding Domains, and I didn't know what those were (there were sub-types called VPWS and VPLS that I need to look into as well).

These Flooding Domains are then associated with "classifiers", like "Ingress Classifiers".

I didn't truly know what a Flooding Domain was. Not a lot on the web if you search those two words together. There's plenty of stuff on the concept of Flooding, however.

I found a link where someone asked what the difference between Flooding and Broadcasting is, and it is in this link where I found the best clues to get a proper understanding. So I will recap that here:

https://networkengineering.stackexchange.com/questions/36662/what-is-the-difference-between-broadcasting-and-flooding

What is the Difference between Broadcasting and Flooding?
Broadcasting is a term that is used on a broadcast domain, which is bounded by layer-3 (routers). Broadcasts are sent to a special broadcast address, both for layer-2 and layer-3. A broadcast cannot cross a layer-3 device, and every host in a broadcast domain must be interrupted and inspect a broadcast.
Flooding is used by a switch at layer-2 to send unknown unicast frames to all other interfaces. If a frame is not destined for a host which receives it, the host will ignore it and not be interrupted. This, too, is limited to a broadcast domain.
Flooding in OSPF (layer-3) means that the routes get delivered to every OSPF router in an area. It really has nothing to do with a broadcast. OSPF doesn't use broadcasts to send routes, it uses unicast or multicast to connect with its neighbors. Each OSPF router needs to have a full understanding of all the routers and routes in its area, and it tells all its neighbors about all its local routes, and any routes it hears about from other neighbors. (OSPF routers are unrepentant gossips.)
So, a Flooding Domain is essentially a "domain of packet delivery" - where the point at which a packet enters (ingress) is not necessarily where it exits (egress). That's my best definition.
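To tie down the layer-2 part of that answer, here is a small, purely illustrative Python sketch (not Ciena-specific) of a learning switch: frames to unknown destination MACs are flooded out every port except the one they arrived on, while frames to already-learned MACs are forwarded out a single port.

```python
class LearningSwitch:
    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}          # MAC address -> port it was learned on

    def receive(self, in_port, src_mac, dst_mac):
        # Learn which port the source MAC lives behind.
        self.mac_table[src_mac] = in_port
        if dst_mac in self.mac_table:
            # Known unicast: forward out exactly one port.
            return {self.mac_table[dst_mac]}
        # Unknown unicast: flood out every port except the ingress port.
        return self.ports - {in_port}

sw = LearningSwitch(ports=["p1", "p2", "p3", "p4"])
print(sw.receive("p1", "aa:aa", "bb:bb"))   # bb:bb unknown -> flooded to p2, p3, p4
print(sw.receive("p2", "bb:bb", "aa:aa"))   # aa:aa learned on p1 -> {p1} only
```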
