Tuesday, December 3, 2019

Virtualized Networking Acceleration Technologies - Part II


In Part I of this series of posts, I recapped my research on these virtualized networking technologies, with the aim of building an understanding of:

  • what they are
  • the history and evolution between them
What I did not cover was a couple of further questions:
  1. When to Use Them
  2. Can you Combine Them?
The link below is a fantastic discussion of item number one. Now, I can't tell how "right" or "accurate" the author is, and I typically look down in the comments for rebuttals and refutations (I didn't see any, and most commenters seemed relatively uninformed on this topic).

He concludes that for East-West (intra-data-center) traffic, DPDK wins, and for North-South traffic, SR-IOV wins.
https://www.telcocloudbridge.com/blog/dpdk-vs-sr-iov-for-nfv-why-a-wrong-decision-can-impact-performance/

Friday, November 15, 2019

How LibVirt Networking Works - Under the Hood

This is the best link on this topic that I have found.

Lots of great pictures. Pictures are worth a thousand words.

https://www.redhat.com/en/blog/introduction-virtio-networking-and-vhost-net

OpenContrail - Part 1

When I came to this shop and found out that they were running OpenStack but were not running Neutron, I about panicked. Especially when I found out they were running OpenContrail.

OpenContrail uses BGP and XMPP as its control plane protocols for route advertisements/exchanges. And it uses MPLS over GRE/UDP to direct packets. The documentation says it CAN use VXLAN - which Neutron also seems to favor (over GRE tunneling). But here at least, it is being run the way the designers of OpenContrail wanted it to run - which is as an MPLS L3VPN.

I am going to drop some links in here real quick and come back and flesh this blog entry out.

Here is an Architectural Guide on OpenContrail. Make sure you have time to digest this.

https://www.juniper.net/us/en/local/pdf/whitepapers/2000535-en.pdf

Once you read the architecture, here is a Gitbook on OpenContrail that can be used to get more familiarity.

https://sureshkvl.gitbooks.io/opencontrail-beginners-tutorial/content/

Perhaps the real stash of gold was a 2013 video from one of the developers of vRouter itself. It turns out most of the material in this video is still relevant for OpenContrail several years later. I could not find the slides anywhere, so I made my own slide deck that highlights important discussions that took place in this video, as well as some of the key concepts shown.

https://www.youtube.com/watch?v=xhn7AYvv2Yg

If you read these, you are halfway there. Maybe more than halfway actually.

High Packet Loss in the Tx of TAP Interfaces



I was seeing some bond interfaces that had high dropped counts, but these were all Rx drops.

I noticed that the tap interfaces on OpenStack compute hosts - which were hooked to OpenContrail's vRouter - had drops on the Tx.

So, in trying to understand why we would be dropping packets on Tap interfaces, I did some poking around and found this link.

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/ovs-dpdk_end_to_end_troubleshooting_guide/high_packet_loss_in_the_tx_queue_of_the_instance_s_tap_interface

From this article, an excerpt:
"TX drops occur because of interference between the instance’s vCPU and other processes on the hypervisor. The TX queue of the tap interface is a buffer that can store packets for a short while in case that the instance cannot pick up the packets. This would happen if the instance’s CPU is prevented from running (or freezes) for a long enough time."

The article goes on and elaborates on diagnosis, and how to fix by adjusting the Tx Queue Length.

SaltStack


I had heard of Puppet. I had heard of Chef. And I knew Ansible quite well because someone I know looked at all three (Puppet, Chef and Ansible) and chose Ansible for our organization.

I had never heard of Salt.

Until now.

Mirantis uses Salt to manage OpenStack infrastructure.

So in having some familiarity with Ansible, it made sense to type into the search engine:
"ansible vs salt".

Well, sure enough. Someone has done a comparison.

Ansible vs Salt

What I see a number of people doing with Salt, is running remote commands on nodes that they otherwise might not have access to. But - recently, I have started looking more into Salt and it appears to be architected quite similar to Ansible, and is also quite powerful.

One of the features I have recently played around with, is the ability to use "Salt Grains". You can get all kinds of "grains of information" from a host with Salt Grains. In my case, I am calling Salt and telling it to give me all of the grains for all of the hosts in JSON format - and then I parse the json and make a csv spreadsheet. Pretty cool.
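The grains-to-CSV step can be sketched with just the standard library. This is a minimal sketch assuming the general shape of `salt '*' grains.items --out=json` output (a JSON object keyed by minion ID); the hostnames and grain values here are hypothetical.

```python
import csv
import io
import json

# Hypothetical sample of what `salt '*' grains.items --out=json` returns:
# a JSON object keyed by minion ID, each value holding that minion's grains.
raw = json.loads("""
{
  "host1": {"serialnumber": "ABC123", "os": "CentOS", "osrelease": "7.6"},
  "host2": {"serialnumber": "DEF456", "os": "CentOS", "osrelease": "7.7"}
}
""")

# Flatten the grains we care about into CSV rows.
fields = ["minion", "serialnumber", "os", "osrelease"]
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fields)
writer.writeheader()
for minion, grains in sorted(raw.items()):
    row = {"minion": minion}
    row.update({f: grains.get(f, "") for f in fields[1:]})
    writer.writerow(row)

print(out.getvalue())
```

From there it is a one-line `open(...).write(...)` to land the spreadsheet on disk.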

There's a lot more. Like Salt States (roughly equivalent to Ansible Playbooks, I think?). There are Salt Pillars.

They use the "salt" theme pretty well in naming all of their extensions.

This link, is called Salt in Ten Minutes. Gives a pretty good overview.

Salt in Ten Minutes

This link, below, is quite handy in figuring out how to target your minions using regular expressions.
https://docs.saltstack.com/en/latest/topics/targeting/globbing.html#regular-expressions
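For contrast with the default glob targeting, here is a small sketch (hypothetical minion names) of the two matching styles Salt uses, approximated with the Python standard library:

```python
import fnmatch
import re

minions = ["web01.example.com", "web02.example.com", "db01.example.com"]

# Default Salt targeting uses shell-style globs (fnmatch semantics):
#   salt 'web*' test.ping
glob_hits = [m for m in minions if fnmatch.fnmatch(m, "web*")]

# With -E, Salt matches a Python regular expression instead:
#   salt -E 'web0[12]\..*' test.ping
regex_hits = [m for m in minions if re.match(r"web0[12]\..*", m)]

print(glob_hits)   # the two web hosts
print(regex_hits)  # same two hosts, matched by regex
```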

Analyzing Dropped Packets


I recently saw an alert, which created a ticket, for "over 100 dropped packets in 10s".

I thought this was interesting:

  • What KIND of packets?
  • What if the drops are intentional drops?
  • Is 100 dropped packets a LOT? 
    • How often is this happening?
Lots of questions.

This got me into taking a quick look on a number of different Linux hosts, to see what the drop situation looked like on certain interfaces.

I noticed that on one sample of hosts, most drops were Rx packets; on most other hosts, the drops were Tx packets.

In looking at netstat -s, you can get an amazing picture of exactly why packets are being dropped on a Linux host. It could be related to congestion control, like a socket buffer overrun (applications cannot read fast enough due to high CPU perhaps). Or, it could be dropped because it was supposed to be dropped - maybe there is a checksum error, windowing error, or a packet that should never have arrived in the first place.
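A few commands I find useful for this kind of digging (eth0 is a placeholder; the ethtool counter names vary by NIC driver):

```shell
# Protocol-level counters: look for overflow/prune/drop lines
netstat -s | grep -iE 'drop|overflow|prune'

# Per-interface drop counters
ip -s link show eth0

# Driver/queue-level counters (names vary by NIC driver)
ethtool -S eth0 | grep -i drop
```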

One Network Engineer mentioned to me that some packets are dropped due to Packet Cloning, or Packet Redundancy, features. These features were enabled so that far-end routers and switches that lost a packet (for one reason or another) close to the destination didn't have to truck it all the way back to the source for a re-send. But when this feature is used, you can get a lot of dropped packets due to "de-duping". This could create a false positive. Juniper has a document that describes their Packet Redundancy, or Packet Protection, feature.



This link below is also interesting when it comes to finding out how to debug dropped packets.
https://community.pivotal.io/s/article/Network-Troubleshooting-Guide

Here is another interesting link on same topic.
https://jvns.ca/blog/2017/09/05/finding-out-where-packets-are-being-dropped/

Layer 2 Networking Configuration in Linux

I have not had a tremendous amount of exposure to Layer 2 Networking in Linux, or in general.

The SD-WAN product at my previous startup company has a Layer 2 Gateway that essentially would allow corporations to join LAN segments over a wide area network. So people sitting in an office in, say, Seattle, could "theoretically" be sitting next to colleagues in an office in, say, Atlanta - all on the same LAN segment. How the product did this is a separate discussion, since it involved taking Ethernet frames and transporting / tunneling them across the internet (albeit in a very creative and very fast way, due to link aggregation, UDP acceleration, multiple channels for delivering the tunneled packets, et al).

I only scratched the surface in terms of understanding the nuances of L2 with this. For example, I learned quite a bit about Bridging (from a Linux perspective). I learned a bit about Spanning Tree Protocol as well, and BPDUs.

I had heard about protocols like LLDP (Link Layer Discovery Protocol), and LACP (Link Aggregation Control Protocol), but since I was not dealing with commercial switches and things, I had no need for enabling, configuring, tuning or analyzing these protocols.

But - I am in an environment now, where these things start to matter a bit more. We run OpenStack hosts that connect to large Juniper switches. These Linux servers are using Link Aggregation and Bonding, and as such, are configured to use LACP to send PDUs to the switches.

LLDP is also enabled. With LLDP, devices advertise their information to directly-connected peers/neighbors. I found a good link that describes how to enable LLDP on Linux.
https://community.mellanox.com/s/article/howto-enable-lldp-on-linux-servers-for-link-discovery
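For reference, the gist of it on CentOS 7 looks roughly like this (interface name is a placeholder; check the article above for the exact steps):

```shell
# Install and start the LLDP agent
yum install -y lldpad
systemctl enable --now lldpad

# Enable LLDP rx/tx on the interface (eth0 is a placeholder)
lldptool set-lldp -i eth0 adminStatus=rxtx

# Advertise a basic TLV
lldptool -T -i eth0 -V sysName enableTx=yes

# Show what the directly-connected switch advertises back
lldptool get-tlv -n -i eth0
```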

This Juniper document does a pretty good job of discussing LACP.
Understanding LAG and LACP







Friday, November 8, 2019

Some handy sed commands for formatting a large concatenated json file

More on Salt later in a separate post, but I am using Salt to pull salt grains from a number of servers so that I can extract the grain items I need (in my case, serial number, operating system and so forth).

I ran into an issue where Salt concatenates the json file as a bunch of contiguous blocks of json.

When you try to load this into a json parser, it fails.

So in researching how to split this file, I ran into one particularly clever person on the web who said, "don't SPLIT the file! just make it an array of json elements".

I wonder if this guy knew how much wisdom he dispensed.

So - I needed some sed to "prepare" this file.

And here it is:

#!/bin/bash

# This script will take the huge json file from Salt and make it something we can parse by
# making each json block an array element.

# Step 1 - add commas between blocks
sed -i 's/^}/},/g' saltgraininfo.json

# Step 2 - remove that last comma out which I could not figure out how to do in Step 1 in same sed command.
sed -i '$s/,//g' saltgraininfo.json

# Step 3 - put a bracket in beginning of the file
sed -i '1s/^/[\n/' saltgraininfo.json

# Step 4 - put a bracket at the end of the file
sed -i '$s/}/}\n]/g' saltgraininfo.json

After I did this, I used Python and did a json.load on the file, and voila'! It loads!
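The same wrap-it-in-an-array idea can be sanity-checked entirely in Python (the sample blocks here are hypothetical):

```python
import json

# Two contiguous JSON blocks, the way Salt concatenated them (hypothetical sample)
blob = """{
  "host": "host1"
}
{
  "host": "host2"
}"""

# Same transformation the sed script performs: comma-join the blocks
# and wrap the whole thing in [ ... ] so it parses as one JSON array.
wrapped = "[\n" + blob.replace("}\n{", "},\n{") + "\n]"
data = json.loads(wrapped)

print(len(data))        # 2
print(data[1]["host"])  # host2
```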

On to the parsing now....

Friday, October 11, 2019

Beware: Swapping a NIC Card Out Changes the MAC Address

I observed an issue this week where a flapping NIC was replaced by a Dell technician.

When the Dell technician swapped out the NIC card (and left), the interfaces on the card would not come up and go into operation.

They were ABOUT to call Dell back in and swap out the motherboard, when I decided to wander over and take a look and get involved.

It is always important to remember, that when you change out a NIC card, the mac address CHANGES!

And you never know where that previous mac address might have been used! Here are just a few places a mac address might be used:

  • an upstream DHCP server might be assigning an IP address based on mac address
  • firewalls might be using the mac address in certain rules and policies
  • interfaces in the OS (Linux in particular - especially CentOS) might not come up with a new mac address
    • CentOS7 has HWADDR directive in the interface configuration files
    • scripts in rc.local or udev may be using the mac address to do certain things
      • configure certain interfaces to bridges or bonds
In this particular case, a udev script was renaming a specific interface - based on mac address - and assigning it to a nic teaming configuration (bond).
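As a sketch of what that kind of MAC-pinning looks like (the MAC address, file names, and interface name here are hypothetical), a udev rule and an ifcfg file might contain:

```
# /etc/udev/rules.d/70-persistent-net.rules (hypothetical example)
# Rename the NIC with this MAC so it can be enslaved to the bond by name.
# A replacement NIC has a new MAC, so this rule silently stops matching.
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="aa:bb:cc:dd:ee:01", NAME="bond0s0"

# /etc/sysconfig/network-scripts/ifcfg-bond0s0 (excerpt)
# CentOS 7 refuses to bring the interface up if HWADDR no longer matches.
HWADDR=aa:bb:cc:dd:ee:01
```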

Don't just swap NICs out! Figure out who might be paying attention to mac addresses before swapping! It can pay dividends.

Friday, September 27, 2019

Vector Packet Processing - Part IV - Testing and Verification

As I work through the documentation on fd.io, it discusses Installation, and then there is a section called "Running VPP".

The first error I encountered in this documentation had to do with running the VPP Shell. The documentation said to run the following command: "sudo vppctl -s run/vpp/cli-vpp.sock"

On a CentOS7 installation, the cli-vpp.sock file is actually called "cli.sock", not "cli-vpp.sock".  So in correcting this, indeed, I see a CLI shell, which I will show further down.

So there is a CLI to master with this. And to be any kind of guru, one will need to learn this. It does look like a more or less "standardized" CLI, with syntax commands that include the familiar, "show", "set", etc. I ran the "help" command to get a dump of commands, which showed a hefty number of sub-commands to potentially learn.

I decided to run a fairly simple "show interface" command, to see what that would produce. And, here is the result of that.

"show interface" results from VPP shell CLI - all interfaces down
So the CLI sees 4 Gigabit Ethernet interfaces, all in a state of "down". 

This server has two dual-port NIC cards, so it makes sense to me that there would be two found on GigabitEthernet1. Why there is only a single interface found on GigabitEthernet3, I need to look into (seems there should also be two of these). The local0 interface, I presume, is a NIC that is on the motherboard (I could see people confusing local0 with a loopback). 

If you proceed with the fd.io documentation, it actually instructs you to set up a veth pair - not the actual physical NICs on the box - create interfaces that way, enable them, and then do some tracing. It probably makes sense to do that before trying to bring these Gigabit Ethernet NICs up and test those. Why? Well, for one reason, you could knock out your connectivity to the server, which would be bad. So let's leave our physical NICs alone for the time being.

So next step, we will run the veth steps and the tracing steps on the fd.io website.
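From my reading of the fd.io docs, the veth exercise goes roughly like this (interface names and addresses are the tutorial's examples; treat this as a sketch, not gospel):

```shell
# On the host: create a veth pair and bring both ends up
sudo ip link add name vpp1out type veth peer name vpp1host
sudo ip link set dev vpp1out up
sudo ip link set dev vpp1host up
sudo ip addr add 10.10.1.1/24 dev vpp1host

# In the VPP CLI (vppctl): bind VPP to one end of the pair
create host-interface name vpp1out
set interface state host-vpp1out up
set interface ip address host-vpp1out 10.10.1.2/24

# Turn on packet tracing for the af_packet input node, ping 10.10.1.2
# from the host, then inspect the trace
trace add af-packet-input 10
show trace
```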

Then, after that, I noticed there is a VPP Testing site on GitHub.

https://github.com/FDio/vpp/tree/master/test

It is written in Python, so you could run your Makefile commands and, hopefully, run these easily.

Vector Packet Processing - Part III - Ensuring a Supported PCI Device

Okay - today I took a quick look into why the "Unsupported PCI Device" errors were popping up when I started the VPP service.

It turns out that the Realtek network adaptors in that server are, in fact, not supported! Duh. This has nothing to do with VPP. It has to do with the underlying Data Plane Development Kit (DPDK), which VPP is layered on top of (in other words, VPP uses DPDK libraries).

The DPDK site lists the adaptors that are supported, on this page of their website, entitled, "Supported Hardware".
http://core.dpdk.org/supported/

Sure enough, no long-in-the-tooth RealTek NICs are listed here.

So what would you do (on that server) to test and experiment with VPP?

  1. Well, you could swap out the adaptors. If you do that, you better make sure you think about static IP assignments based on MAC address because all of your MACs will change. 
  2. You could use a virtual adaptor that is supported.
Or, you could simply find another server. Which I did. And this server is using Intel adaptors that ARE supported.

VPP Startup with Supported Adaptors
Next, I ran the "vppctl list plugins" command, which dumped out a ton of .so (shared object) files. 

These files are shared libraries, essentially. Rather than linking stuff into all of the binaries (making them larger), a shared object or shared library accommodates multiple binaries using the code (they get their own local data segments but share a pointer to the code segment - as I understand it). 

So - it looks like we have a working VPP service on this server. Yay. What next? Well, here are a couple of possibilities:

1. OpenStack has a Neutron VPP driver. That could be interesting to look into, and see what it's all about, and how well it works.

2. Maybe there are some ways of using or testing VPP in a standalone way. For example, some test clients. 

I think I will look into number 2 first. At this point, I am only interested in functional testing here. I am not doing any kind of performance bake-offs. Not even sure I have the environment and tools for that right now. We're just learning here.
  

Tuesday, September 24, 2019

Vector Packet Processing - Part II - Installing VPP

As a wrap-up to my day, I decided to take one of my CentOS7 servers, and install vpp on it.

I followed the cookbook found at this link:
https://wiki.fd.io/view/VPP/Installing_VPP_binaries_from_packages#RPMs

This link doesn't tell you how to set up the vpp repository, which is necessary to install any of the vpp packages (a yum groupinstall would have been nice for this, actually).

But the link for the repository is here:
https://my-vpp-docs.readthedocs.io/en/latest/gettingstarted/users/installing/centos.html

For convenience I included the snippet below.
$ cat /etc/yum.repos.d/fdio-release.repo
[fdio-release]
name=fd.io release branch latest merge
baseurl=https://nexus.fd.io/content/repositories/fd.io.centos7/
enabled=1
gpgcheck=0
This didn't take long to do at all. No problem installing packages, no problem starting up the vpp service.
But, it looks to me like old hardware and old network cards don't support vpp. So more work to do.
Unsupported PCI Device Errors on vpp service startup

Vector Packet Processing - Part I

Yesterday, I was reading up on something called Vector Packet Processing (VPP). I had not heard of this, nor the organization called Fd.io (pronounced Fido), which can be found at the following link: http://fd.io

Chasing links to get more up to speed, I found this article, which gives a very good introduction to these newer networking technologies, which have emerged to support virtualization, due to the overhead (and redundancy) associated with forwarding packets from NICs, to virtualization hosts, and into the virtual machines.

https://software.intel.com/en-us/articles/an-overview-of-advanced-server-based-networking-technologies

I like how the article progresses from the old-style interrupt processing, to OpenVSwitch (OVS), to SR-IOV, to DPDK, and then, finally, to VPP.

I am familiar with OpenVSwitch, which I came into contact with through OpenStack, which had OpenVSwitch drivers (and required you to install OpenVSwitch on the controller and compute nodes).

I was only familiar with SR-IOV because I stumbled upon it and took the time to read up on what it was. I think it was a virtual Palo Alto Firewall that had SR-IOV NIC Types, if I'm not mistaken. I spent some time trying to figure out if these servers I am running support SR-IOV and they don't seem to have it enabled, that's for sure. Whether they support it would take more research.

And DPDK I had read up on, because a lot of hardware vendors were including FastPath Data switches that were utilizing DPDK for their own in-house virtual switches, or using the DPDK-OpenVSwitch implementation.

But Vector Packet Processing (VPP) somehow missed me. So I have been doing some catch-up on VPP, which I won't go into detail on in this post or share additional resources on, as it is such a large topic. But the link above to Fd.io is essentially touting VPP.

UPDATE:
I found this link, which is also spectacularly written:
https://www.metaswitch.com/blog/accelerating-the-nfv-data-plane

And, same blog with another link for those wanting the deep dive into VPP:
https://www.metaswitch.com/blog/fd.io-takes-over-vpp

Thursday, September 12, 2019

Graphical Network Simulator-3 (GNS3) - Part II Installation on a Linux Server

Okay for Part II of GNS3, I came in today looking to install GNS3 on a Linux Server.

I noticed that GNS3 is designed to run on Ubuntu Linux, and as I tend to run in a CentOS7 shop, I am now faced with the hump of putting an Ubuntu server in here, or trying to get this to run on CentOS7. It should run on CentOS7, right? After all, this is a Linux world, right? 😏

I decided to take one of my 32Gb RAM servers, an HP box, that runs CentOS7, and follow a cookbook for installing GNS3 on it.

I followed this link:
https://gns3.com/discussions/how-to-install-gns3-on-centos-7-

I chose this box because it runs X Windows. It didn't have Python 3.6 on it, or the pip36 used for installing and managing python 3.6 packages.

A lot of steps in this thing.

Some questions I have about this cookbook that I need to look into:

1. Why does the cookbook use VirtualBox on Linux? I have KVM installed. Surely I can use that instead of VirtualBox. I only use VirtualBox on my Win10 laptop. So I have, for now, skipped that section.

2. What is IOU support? I will need to google that.

UPDATE: IOU (also called IOL, which stands for IOS on Linux) is basically an IOS simulator that can run on an i386 chipset. You would need and want that if you run any Cisco elements on the GNS3 simulator.

Friday, September 6, 2019

Graphical Network Simulator-3 (GNS3) - Part I Initial Introduction on a Laptop

Someone told me about this network modeling and simulation tool called Graphical Network Simulator-3. There is a Wikipedia page on this tool, which can be found here:

https://en.wikipedia.org/wiki/Graphical_Network_Simulator-3

Fascinating tool. Allows you to drag and drop network elements onto a canvas - but unlike the old tools, this tool can actually RUN the elements! To do this, you need to import image files as you drag and drop the elements onto the canvas. Below is an example of how dragging a simulated internet cloud onto the canvas prompts for an image to run on a virtual machine.

Image Files Required for Network Elements in GNS3

Then, once you have the elements properly situated on the canvas, you can use a connector to interconnect them (it will prompt you for the NIC interface), and then, once your interconnection points are established, you can click a "run" button.

If all goes well everything turns green and packets start to flow. There is a built-in packet trace on each link line, which will dump packets to a file if you choose to do a packet capture.

Wednesday, August 21, 2019

Linux rpm package management tools with rpmbuilder and rpmreaper

In all of my years of using Linux, I have never created a package with a package manager build tool. I have of course used rpm, all the time. Querying packages, installing packages, removing packages. I just haven't generated, or built, a package myself. Until now.

We use CentOS here, which is a Red Hat Linux distribution. And Red Hat uses the Red Hat Package Manager (rpm) tools to administratively manage software packages on operating systems that are based on Red Hat Linux. Every package on an rpm-based system has a ".rpm" file suffix, and there is a binary called "rpm" that is used to install, uninstall, query, etc any and all packages on a system (that were created with Red Hat Package Management).

I had always heard that working with rpms (generating them) was tedious, painful, and a general pain in the a$$. One reason has to do with package dependencies. You can run into mutual or circular dependencies, nested dependencies, and many other issues. So I probably avoided making packages for these reasons.

One little-known, but very cool tool, is called rpmreaper. It is part of the epel-release repository. If you run this tool, you can visually inspect details about packages, as shown below.

Sample Screenshot of rpmreaper rpm Package Management Tool

So while I had no idea what I was doing, I spent a full day making a package and it didn't go too badly.  The rpm I put together parks a couple of kernel drivers and a configuration file on the system. That's it. Sounds simple, huh? Guess again.

First, it turns out that kernel drivers are compressed on Linux systems now. So I needed to use xz to compress the kernel drivers. Which means an uninstall needs to remove the compressed kernel drivers, because the .ko files won't be there. And, when plopping new kernel modules onto a system, you do need to run commands like depmod to re-generate dependencies between the modules.
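Roughly, the module-handling steps look like this ("mydriver" is a placeholder for the actual driver name, and extra/ is the customary spot for out-of-tree modules):

```shell
# Compress the built module the way the distro kernels expect
xz -f mydriver.ko

# Park it where modprobe will look for out-of-tree modules
install -m 0644 mydriver.ko.xz /lib/modules/$(uname -r)/extra/

# Rebuild the module dependency maps, then load it
depmod -a
modprobe mydriver
```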

Now this rpm probably goes beyond what a typical rpm would do. I think as a best practice, an rpm will move files to where they need to be, or remove files from where they should be. That's it. And, they may do system things in an effort to support that charter.

Dependencies
I built the kernel drivers outside of the rpm. I could have gotten heavy and sophisticated and had the rpm compile the kernel drivers. This opens up a can of worms about chipsets, target architectures, etc. I decided to keep it simple, and that was easy to do, fortunately, because my box was an x86_64 architecture, as were the target boxes they wanted to install the rpms on.

So originally, I had dependencies for packages in the group "Development Tools". I commented those out. I instead put JUST the dependencies that the scripting in the rpm needed.
  • bash
  • xz (for compressing the kernel modules)
  • NetworkManager (nmcli)
  • ModemManager (mmcli)
Package Success or Failure
There was so much scripting to check for or start/stop services, and to load/unload kernel drivers, that I learned that exit codes other than the normal 0 would cause the package install or package remove to fail outright.

My solution to this was to provide feedback in the form of echo commands, and use an "|| true" (or true) to ensure that the command didn't cause the rpm itself to bail out. Because the commands were really for administrator convenience - not so much related to the deployment/removal of necessary files.
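A sketch of what that looks like in a specfile scriptlet ("mydriver" and "myservice" are placeholders):

```
%preun
# $1 is 0 on a full uninstall, 1 on an upgrade
if [ "$1" -eq 0 ]; then
    # Best-effort cleanup: a non-zero exit status here would abort the
    # uninstall, so report and force success rather than letting it fail.
    rmmod mydriver 2>/dev/null || echo "mydriver not loaded, nothing to unload"
    systemctl stop myservice 2>/dev/null || true
fi
```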

Definitions
Originally I was defining shell variables in the specific shell functions of the rpm specfile. That became redundant and error prone very quickly when I needed access to these same variables in pre/post script of both the install/uninstall sections of the rpm specfile.

Hence, I had to quickly learn and make use of definitions, which are sort of like global variables. But definitions are only used during the creation of the rpm itself. They are not referenced when you install or uninstall the package.

Versioning
Rpm specfiles, as you would expect, have nice versioning support, and it is wise to make use of that and document in your specfile what you are doing in each version iteration! 

Ok, in summary, this was interesting to have FINALLY created my own rpm package. I am sure there is a LOT more to learn, and the sophistication can go way beyond what my rpm is doing right now. I have about a 300 line specfile, mainly due to all of the scripting logic. I am only deploying 5 files in this rpm.

Thursday, August 15, 2019

Sierra Wireless EM7455 LTE Card on CentOS7

I had someone approach me trying to get some help. He had a Sierra Wireless LTE card that he wanted to use on CentOS7.  He had Network Manager running, and ModemManager, and he had two kernel modules loaded up called qcserial and qmi_wwan, but ModemManager would not recognize the card. So that's where we start.

I am not a low level expert on drivers these days (I don't do that day in day out), but I have had some experience with drivers for wireless devices, such as USB 802.11x sticks. I had a TrendNet one years ago that wouldn't work with Linux until I found some sketchy drivers on the web that I compiled and got to work. But that entailed NetworkManager and wpa_supplicant... not ModemManager. This was my first dive into LTE cards and drivers. Furthermore, I did not have the card in my hand, or on my own system.

So, apparently Ubuntu supports these cards natively, but CentOS7 doesn't.

I noticed that CentOS 7 does include a sierra.ko (sierra.ko.xz) module, which I thought should work with a Sierra Wireless EM7455 LTE-A card, which uses a Snapdragon X7 chip. We tested that out, by loading the sierra kernel module manually and starting ModemManager. No luck. Maybe it doesn't work with this EM7455 card? Not sure. I did see some support threads on the sierra.ko kernel module where the module only works for Sierra cards because Sierra does some interesting power management stuff with their driver (they made mention of another option.ko kernel module that should work with most other LTE cards). But this card, the EM7455 is indeed a Sierra LTE card. And the sierra.ko module didn't seem to work.

There are also a couple of other kernel modules that ARE on a CentOS7 box. These are called:

  • qcserial
  • qmi_wwan

The qcserial module creates a /dev/ttyUSB interface. The qmi_wwan creates a /dev/cdc-wdm interface. My understanding is that the serial interface is a control protocol for commands and statistics while the other is used for data transmission/reception (Tx/Rx). This is all part of a protocol called QMI; a Qualcomm protocol.
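If you have libqmi installed, you can poke at the card over that control interface; a few example queries (the device path varies, /dev/cdc-wdm0 is typical):

```shell
# Talk QMI over the control device created by qmi_wwan
qmicli -d /dev/cdc-wdm0 --dms-get-manufacturer
qmicli -d /dev/cdc-wdm0 --dms-get-model
qmicli -d /dev/cdc-wdm0 --nas-get-signal-strength
```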

If you want to learn more about these protocols, this link below is absolutely fascinating as it discusses distinctions between GSM and CDMA, and the history of CDMA which has ties to Hollywood and a Beautiful Actress. Eventually it gets into QMI.

https://blogs.gnome.org/dcbw/2010/04/15/mobile-broadband-and-qualcomm-proprietary-protocols/

I think what is/was happening is that when you plug in the EM7455 card, these two drivers, qcserial and qmi_wwan, are loaded, but ModemManager still doesn't recognize the card. Neither does NetworkManager.

So - the engineer heard that if he got access to two new drivers, GobiNet and GobiSerial, which are generated from a Sierra Wireless SDK, the card would work. You would need to blacklist the qcserial and qmi_wwan drivers, though. The problem: how to get the SDK. I guess there might be some reason why Sierra Wireless doesn't release this SDK, which is probably tied to royalties or licensing to Qualcomm.

So we eventually obtained the SDK. We compiled it, and it produces, for our x86_64 architecture, two kernel modules:

  • GobiNet
  • GobiSerial

We (I) created an rpm (separate blog post about rpm package creation) to do all of the voodoo to get these drivers installed, along with the blacklist file, and configure an apn connection to a Verizon LTE access point.

Voila'. The drivers work! I think he said something about it using a ppp interface, though. And we specifically compiled GobiNet to use rawip, with a rawip=1 setting in the Makefile. So we may need to look into that, but at least the LTE modem is now working.

By the way. You cannot rely just on Syslog for information about LTE. Because these are kernel drivers, you need to use dmesg to see what these modules are barking out!

So some more testing the engineer will do. But we have something that seems to work. I will wait to hear more feedback.

Thursday, August 1, 2019

OpenStack - Discussion on Cells

I have a partner who is still using OpenStack Newton.

I was asked to look into this, because OpenStack Newton is no longer supported by the OpenStack community; it has been End of Life'd (EOL).

OpenStack Ocata is still supported. I at one time set this up, and I didn't see any notable differences between Ocata and Newton, and my Service Orchestrator (Open Baton) seemed to still work with Ocata.

Ocata introduces the concept of Cells. Cells is an architecture concept that apparently (if I understand correctly), replaces (enhances?) the previous concept of Regions. It changes the way OpenStack is run, in terms of control and delegation of nodes and resources (Compute Node resources, specifically). It is a more hierarchical approach.

Here is a link on cells that I read to get this understanding: Discussion about OpenStack Cells

I didn't stop there, either. I read some more.

It turns out CERN (Particle Physics!? They run those Particle Accelerators and do stuff more complex than anything I am writing about!?) - THEY are big on OpenStack (I assume they still are). Tons and tons of material on what CERN is doing. Architectures, Topologies, yada yada. I don't have time to read all of that.

But, I did read THIS article, on moving from Cells v1 to Cells v2. It told me all I needed to know. If you are using Cells, you need to jump over the Ocata release and use Queens or later, because in Ocata more than half the OpenStack modules were deaf, dumb and blind as to the notion of what a Cell is. Obviously this causes problems.

So I guess the concept of a Cell is somewhat Beta, and partially supported in Ocata.

I guess you could move to Ocata in a small lab if you are not using Cells, and if the APIs that your tooling depends on remain stable.

If anyone reads this, by all means feel free to correct and comment as necessary.

Wednesday, July 31, 2019

What on Earth is Canonical Doing with Ubuntu? Sheez

I have been using CentOS almost exclusively since I have been working here, first with CentOS 6, and then (now) CentOS7. I have watched the kernels move up from 3.x to 4.x, I have fought with (and still fight with) NetworkManager, etc.

You get used to what you use.

I have also used Ubuntu in the past, 14.04, and 16.04, but it has been a while.

So, I needed to install Ubuntu in order to run OSM, because OSM (on their website at least) wants to use Ubuntu. Ubuntu is probably bigger in Europe, is my guess.

So - for two straight days now, I have been trying to install an Ubuntu cloud image and get it working on a KVM system. Something that SHOULD be simple, right? Wrong.

Here is a rundown of all of the headaches I have run into thus far, which has pissed me off about Ubuntu.

1. On 16.04, the root file system is only 2G for the cloud image you download off the web.

I ran out of disk space in no time flat, trying to install X Windows and a Display Manager, which by default are not installed on the cloud image.

Trying to increase the root file system? Damn near impossible. I tried using qemu-img resize, and that only gave me a /dev/vdb device. The /dev/sda1 root file system was STILL, and REMAINED, 2G. I could not install X Windows, I couldn't do squat. I am sure if I rolled up my sleeves, and got to work with advanced partitioning tools (growing the partition and then the file system), I could have made this happen. Or, maybe not. Point is, this was a hassle. An unnecessary hassle in my opinion.

2. I realized that the 18.04 Ubuntu uses a qcow2 format - which you CAN resize. Again, why Ubuntu is using 2G as a root file system size is beyond me, and this is ANNOYING. This is the year 2019.

So, I resized the image, and put a password on the image (cloud images are not set up for password login at the prompt, only SSH keys, which of course is a good practice, albeit a hassle for what I needed).

3. I launched 18.04 and guess what? NO NETWORKING!!!! @^%$

I realize no networking was set up. At all! WHAT???

4. Let's go set up networking. Yikes! You CAN'T!!!!!! WHY? Because the legacy ifupdown and net-tools packages that everyone in the WORLD uses are not on the machine!

They want you to use this newfangled tool called Netplan to set up your networking!?!?

Fine. You can google this and set it up, which I did.
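For the record, the Netplan config lives in a small YAML file under /etc/netplan/. Here is a minimal static-address sketch - the interface name (ens3) and the addresses are example values, not from my actual machine:

```yaml
# /etc/netplan/01-netcfg.yaml -- example values only
network:
  version: 2
  ethernets:
    ens3:
      addresses: [192.168.122.50/24]
      gateway4: 192.168.122.1
      nameservers:
        addresses: [8.8.8.8]
```

Then `sudo netplan apply` makes it live.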

BUT WHY ARE ALL OF THESE LINUX DISTRIBUTIONS BECOMING SO DIFFERENT?

THAT IS NOT, I SAY...NOT...WHAT LINUX IS ALL ABOUT?????

I remember when Gentoo came out, and how different a beast it was. Now, the distinction between CentOS and Ubuntu is becoming a very wide chasm.

Thursday, July 25, 2019

ONAP - Just Too Big and Heavy?

I have been working with Service Orchestrators for a while now. Here are three of them I have had experience with:

  • Heat - which is an OpenStack Project, so while OpenStack can be considered the VIM (Virtual Infrastructure Manager), Heat is an Orchestrator that runs on top of OpenStack and allows you to deploy and manage services 
  • Open Baton - this was the ORIGINAL Reference Implementation for the ETSI MANO standards, out of a Think Tank in Germany (Fraunhofer FOKUS).  
  • ADVA Ensemble - which is an ETSI-based Orchestrator that is not in the public domain. It is the IPR of ADVA Optical Networking, based out of Germany.
There are a few new Open Source initiatives that have surpassed Open Baton for sure, and probably Heat also. Here are a few of the more popular open source ones:
  • ONAP - a Tier 1 carrier solution, backed by the likes of AT&T. 
  • OSM - I have not examined this one fully. TODO: Update this entry when I do.
  • Cloudify - a private commercial implementation that bills itself as being more lightweight than the ONAP solution.
I looked at ONAP today. Some initial YouTube presentations were completely inadequate for allowing me to "get started". One was a presentation by an AT&T Vice President. Another was done by some architect who didn't show a single slide on the video (the camera was trained on the speaker the whole time).

This led me to do some digging around. I found THIS site: Setting up ONAP

Well, if you scroll down to the bottom of this, here is your "footprint" - meaning, your System Requirements, to install this.

ONAP System Requirements
Okay. This is for a Full Installation, I guess. The 3 TB of disk is not that bad. You can put a NAS out there and achieve that, no problem. But 148 vCPUs???? THREE HUNDRED THIRTY SIX gig of RAM? OMG - That is a deal killer in terms of being able to install this in a lab here. 

I can go through and see if I can pare this down, but I have a feeling that I cannot install ONAP. This is a toy for big boys, who have huge servers and lots of dinero.

I might have to go over and look at OSM to see if that is more my size.

I will say that the underlying technologies include Ubuntu, OpenStack, Docker and MySQL - which are pretty mainstream.

RUST Programming Language - Part II

I spent a few hours with RUST again yesterday.

There's definitely some things to learn with this language.

One thing I noticed, was that the Compiler is really good. Very very intelligent. It can make a lot of intelligent inferences about what you are doing (or trying to do). It will also bark out all kinds of warnings about things.

Cargo
In my previous post, I may have made light mention of something called Cargo. Cargo is RUST's build system and package manager. It can build (cargo build), check (cargo check), or build AND execute (cargo run) your program.

It also manages packages and dependencies, so it is roughly analogous to pip in Python. If you are familiar with yum or some equivalent package manager on a Linux distribution, you can get the gist of what Cargo does from the perspective of pulling in packages you need for your project.

This link is a book on Cargo:  The Cargo Book

So yesterday, I wrote some code from the book I have been using, but deviated from it a little bit, and pulled in a package called strum, which allows you to iterate over an "Enum" object. My Enum has Coin variants (penny, nickel, dime, quarter), and I would use strum to iterate over it and print out the monetary value of each coin. Nothing super sophisticated, but in this RUST language, you definitely need to learn how to do the basics first.
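Pulling strum in was just a Cargo.toml edit. A sketch of the relevant part - the version numbers here are roughly what was current at the time, so treat them as assumptions rather than a recommendation:

```toml
[package]
name = "coins"
version = "0.1.0"
edition = "2018"

[dependencies]
strum = "0.15"         # iteration helpers for enums
strum_macros = "0.15"  # the EnumIter derive macro
```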

Match Expression
Another interesting thing is that you can use simple "if / then" logic, but you can also use this more sophisticated "match" expression, or construct. So this is the "higher IQ" way to do matching, for the more advanced or off the beaten path cases (e.g. destructuring enums, matching ranges of values, etc).

Here is a link on that, which is actually a relative link to a more comprehensive book on RUST that has a lot more good stuff in it than just the Match expression.

https://doc.rust-lang.org/reference/expressions/match-expr.html

Tuesday, July 23, 2019

RUST Programming Language - Part I


In hearing about this "new" language, I have spent some time this week going through a book called "The Rust Programming Language", which can be found at this link:

The Rust Programming Language

I will have to come back and edit this post as I go through the book, but so far, I have been trying to learn "enough" about the language to understand WHY the language has even emerged - in other words, what its selling point is, what deficiencies in other languages it addresses, etc.

What do I have to say right now?

Well, it's a Compiled language. It has been a while since I have seen a new compiled language emerge; we have had nothing but runtime interpreted languages for many years now.

It has some interesting resemblances to C.

It has no advanced traditional object oriented capabilities, like Inheritance and Polymorphism. That said, it does support Encapsulation, and this concept of Traits (which up to now, seem to me to resemble Templates in C++ a bit - but I may revise this statement after I get more familiar with it).

The language is designed to be more "safe" than traditional C or C++, the latter of which, due to direct memory access and manipulation, can cause Segmentation Violations, etc. One example of course is References where one thread might de-reference a pointer that another thread might be using, accessing, manipulating, etc.

Thursday, July 18, 2019

Q-In-Q Tunneling and VLAN Translation


I have been working on this Customer Premise Edge (CPE) project, in which a Service Orchestrator deploys Virtual Network Functions (VNFs) to a "piece of hardware". In the specific implementation I have been working with, the CPE runs 3-4 Docker containers that in turn run:

  • OpenStack (Compute Node in one container, Controller Node in another)
  • VRFs
  • a FastPath Data Switch

The architecture looks, essentially, as shown below:

Two Customer Premise Edge Devices Connecting over a Secure Transport Overlay Network (SD-WAN)

This architecture relies on Layer 2 (Ethernet Frame) Forwarding. So what happens, essentially, is that when a virtual network is created, a "service" is generated at runtime, which connects two "interface ports" (can be L3, L2, et al). But because an interface is a physical (i.e. Operating System managed) device, traffic is run through a virtual concept called a "service port" as it comes off the wire. Depending on what kind of service it is, there are different topologies, and rulesets that can (and must) be configured and applied to make traffic handling and flows work properly.

I was not altogether very interested in this concept of "match sets" that were required to configure these services - initially. I just keyed in what I was told (which was an asterisk to allow all traffic).

But, finally, I became more interested in a deep-dive on these rules. I noticed that there was a rule to configure "inner VLAN" and "outer VLAN" settings. Huh? What does that mean? A VLAN is a VLAN, right? Well, sort of. Kind of. Not exactly.

As it turns out, in order to handle multi-tenant traffic (a service provider managing multiple customers), VLANs can overlap. And you cannot have Customer A's traffic going to Customer B, or you will find yourself out of business pretty quickly (possibly with a liability lawsuit).

So - they came up with concepts like Q-in-Q Tunneling and VLAN Translation to deal with these problems. Now, you can have a Customer VLAN (C-VLAN), and a Service Provider VLAN (S-VLAN), and you can map and manage the packets based on S-VLANs without manipulating or disturbing the original customer VLAN that is set on the frame.

So NOW - I understand why these match set rules have fields for an "inner" and an "outer" VLAN. 

Just to be thorough, the outer VLAN, by the way, is the S-VLAN (therefore the inner VLAN is the C-VLAN).
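To make the inner/outer layering concrete, here is a little Python sketch (my own illustration, not from any product) that packs and unpacks a double-tagged 802.1ad Ethernet header. The outer S-TAG uses TPID 0x88A8 and the inner C-TAG uses 0x8100; the MAC addresses and VLAN IDs are made-up example values:

```python
# Layout of a Q-in-Q (802.1ad) frame header:
#   dst MAC (6) | src MAC (6) | S-TAG (4) | C-TAG (4) | ethertype (2)
import struct

S_TAG_TPID = 0x88A8  # service provider tag (outer)
C_TAG_TPID = 0x8100  # customer tag (inner)

def build_qinq_header(dst_mac, src_mac, s_vlan, c_vlan, ethertype=0x0800):
    """Build the first 22 bytes of an 802.1ad double-tagged frame.
    TCI = PCP(3 bits) | DEI(1 bit) | VLAN ID(12 bits); PCP/DEI left at 0 here."""
    return (dst_mac + src_mac
            + struct.pack("!HH", S_TAG_TPID, s_vlan & 0x0FFF)
            + struct.pack("!HH", C_TAG_TPID, c_vlan & 0x0FFF)
            + struct.pack("!H", ethertype))

def parse_vlans(frame):
    """Return (s_vlan, c_vlan) from a double-tagged frame header."""
    outer_tpid, outer_tci, inner_tpid, inner_tci = struct.unpack("!HHHH", frame[12:20])
    assert outer_tpid == S_TAG_TPID and inner_tpid == C_TAG_TPID
    return outer_tci & 0x0FFF, inner_tci & 0x0FFF

hdr = build_qinq_header(b"\xaa" * 6, b"\xbb" * 6, s_vlan=200, c_vlan=100)
print(parse_vlans(hdr))  # → (200, 100): the provider's S-VLAN, then the customer's C-VLAN
```

The provider can swap or strip the outer tag at its edge without ever touching the customer's tag - which is exactly why the match-set rules need separate inner and outer fields.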

Credit for this explanation and understanding goes to this link (although there are probably numerous sources for this concept available on the world wide web):

Thursday, July 11, 2019

Palo Alto Firewall VM Series - Integration and Evaluation - Part IV

Okay this is probably a final post on the Palo Alto Firewall VM Series Integration project.

This project was aimed at doing a proof of concept reference implementation of a fully integrated and provisioned SD-WAN, using a Palo-Alto Virtual Firewall to protect the edges of two campus networks (e.g. East and West).

It uses a Service Orchestration software platform (Service Orchestrator) to model and specify services (networks) that the Orchestrator will deploy, out to the field, to the CPE, at the push of a button.

What are we deploying?

We are deploying overlay networks (SD-WAN) to Customer Premise Edge (CPE) devices. These devices are typically deployed smack on the edge of a premise network, and for that reason have optical SFP termination ports, and Gigabit (or 10 Gig in some cases) ports.

Now to make these overlay networks work in these modern times, we rely on Virtualization. Virtual Machines, Containers (i.e. Docker), OpenStack, et al. (depending on the CPE and software chosen).

Network Function Virtualization (NFV) - meaning non-iron-based network functions like Routing, Switching, Network Stacks in Namespaces, and Overlay networks (i.e. secure overlay) - ties all of this together. NFV is a fairly recent technology and has made networking (as if it weren't complicated enough already) a bit MORE complicated. But virtualizing applications and processes without also virtualizing the networking they rely on to intercommunicate doesn't achieve the objectives that everyone is after by using virtualization in the first place. It is like going halfway and stopping.

So - we have been deploying this new VM-Series Palo Alto Firewall in a network "topology" in an automated fashion, using the Palo Alto API to provision all of the "things that need to be provisioned", a few of which are mentioned below.

  • Interfaces
  • Zones
  • Security Policies
  • Static Routes
  • Policy-Based Forwarding Rules

This is done by using the Palo-Alto XML API.

So to do this, at least in this proof of concept, we provision:
  1. A Palo-Alto Firewall
  2. A virtual machine (temporary) that provisions the Palo-Alto Firewall
  3. A SD-WAN Gateway behind the Palo-Alto Firewall
  4. A client or server behind the Gateway.
Topology Provisioned on a CPE device - Site West
screenshot: ADVA Ensemble Orchestrator

Sounds easy, right? It's not. Here are just a few of the challenges you face:
  • Timing and Order
    • The Firewall needs to be deployed first, naturally. Otherwise you cannot provision it.
    • The Firewall Provisioner configures all of the interfaces, routes, zones, policies and rules. This is done on a management network - separate from the service plane, or traffic plane as it is often referred to.
    • With a properly configured firewall in place, the Gateway can make outbound calls to the Cloud.
    • Once the Gateway is up and cloud-connected, the server can begin listening for and responding to requests.
  • Routing
    • Dynamic routing stacks are difficult to understand, set up and provision; especially BGP. Trying to do all of this "auto-magically" at the push of a button is even more challenging.
    • In our case, we used static routes, and it is important to know how these routes need to be established. 
  • Rules
    • Indeed, it is sometimes difficult to understand why traffic is not flowing properly, or at all. Often it is due to routing. It can also be due to not having proper rules configured.
  • Programmatic APIs
    • Making API calls is software development. So the code has to be written. It has to be tested. And re-tested. And re-tested.
When you deploy a network push-button, the visual ease with which it all happens starts to make people take it for granted, and NOT fully appreciate all of the complexity and under-the-hood inner workings that make it all happen.

Most of the network troubleshooting (or all of it perhaps) had to do with missing or incorrect routes, or missing rules. It can be a chicken-or-egg problem trying to figure out which one is the culprit.

In this particular CPE, it runs:
  • an OS (Linux)
  • two docker containers - that contain OpenStack nodes (Controller node and Compute node)
  • a FastPath Data Switch - that maps physical ports to virtual ports 

The "magic" is in the automation and integration. The data switch is pre-provisioned and custom developed for performance. The integration with OpenStack allows virtual ports and virtual switch services to be created / removed as new Virtualized Network Functions (VNFs) are initialized or terminated on demand.

Then, on top of this, you have the VNFs themselves; in this case here, the Palo Alto Firewall, the Edge Gateway behind this firewall, and a Server on a trusted network behind the gateway.

Traffic from the server will run through the firewall before it gets to the Gateway, so that the firewall can inspect that traffic before the gateway encrypts it for outbound transport.

Then, the Gateway will encrypt and send the traffic back through the firewall, and out the door, on a virtual circuit (or multiple virtual circuits if multi-pathing is used), to the complementary gateway on the other side (think East to West across two corporate campus networks here).


Thursday, June 6, 2019

The Network Problem From Hell - Fixed - Circuitous Routing



Life is easy when you use a single network interface adaptor.  But when you start using multiple adaptors, you start running into complexities because packets can start taking multiple paths. 

One particular thing most network engineers want to avoid, is situations where a packet leaves through door #1 (e.g. NIC 1), and arrives through door #2.  To fix this, though, requires some more advanced network techniques and tricks (separate routing tables per NIC, and corresponding rules to direct packets to use those separate routing tables).

So, I had this problem where an OpenStack-managed virtual machine stopped working because it could not reach OpenStack itself, which was running on the SAME machine that the virtual machine was running on. It was driving me insane.

I thought the problem might be iptables on the host machine. I disabled those. Nope. 

I thought the problem might be OpenVSwitch. I moved the cable to a new NIC, and changed the bridge the virtual machine was using. Nope.

Compounding the problem, was that the OpenStack Host could ping the virtual machine. But the virtual machine could not ping the host. Why would it work one way, and not the other?

The Virtual Machine could ping the internet. It could ping the IP of the OpenStack router. It could ping the router that the host was connected to.

OpenStack uses Linux IP Namespaces, and in our case was using the Neutron OpenVSwitch Agent. An examination of these showed that the networking seemed to be configured just as it showed up in the Horizon Dashboard "Network Topology" visual interface.

One thing that is worth mentioning is that the bridge mappings for provider networks are in the ml2_conf.ini file and the openvswitch_agent.ini file. But the EXTERNAL OpenStack networks use a bridge based on a parameter setting in the l3_agent.ini file! So if the l3_agent.ini file has a bridge setting of, say, "br-ex" for external networks, and you don't have that bridge correspondingly configured in the other files, OpenStack will give you a message when you create the external network that it cannot reach the external network. We did run into this when trying to create different external networks on different bridges to solve the problem.

At wits' end, I finally called over one of the more advanced networking guys in the company, and we began troubleshooting it using tcpdump. We finally realized that when the VM pinged the OpenStack host, the ICMP request packets were arriving on the expected NIC (em1 below), but no responses were going out on em1. When we changed tcpdump to use "any" interface, we saw no responses at all. Clearly the host was dropping the packets. But iptables was flushed! WHO/HOW were the packets getting dropped? (to be honest, we still aren't sure about this - more research required on that). But - we did figure out that the problem was a "circuitous routing" problem.

We figured maybe reverse path filtering was causing the issue. So we disabled that in the kernel. Didn't fix it.  

Finally, we realized that what was happening is that the VM sends all of its packets through the external network bridge, which was attached to a 172.22.0.0/24 network, and the packet went to the router, which routed it to its 172.20.0.0/24 port, and then to the host machine. But because the host machine had TWO NICs on BOTH those networks, the host machine did not send replies back the same way they came in. It sent the replies to its em2 NIC, which was bridged to br-compute. And it was HERE that the packets were getting dropped. Since that NIC is managed by OpenVSwitch, we believe a loop-prevention flow rule in OpenVSwitch, or perhaps Spanning Tree Protocol, caused the packets to get dropped.

Illustration of the Circuitous Routing Issue & How it was solved

The final solution was to put in a host route, so that any packet to that particular VM would be sent outside of the host, upstream to the router, and back in through the appropriate 172.22.0.0/24 port on the host/bridge, to the OpenStack Router, where it would be NATd back to the 192.168.100.19 IP of the virtual machine. 

Firewalls and Routing. These two seem to be where most problems occur in networking.

Tuesday, May 28, 2019

Smart Home - Zigbee versus Z-Wave



This is a GREAT blog to get yourself initialized on the two protocols and some distinctions between them.

z-wave-vs-zigbee-home-automation

Palo Alto Firewall VM Series - Integration and Evaluation - Part III

Okay, this is just a short post to discuss where we are in the integration process.
  1. I have a Python script that generates XML. In this script, you pass in parameters, and then I use the ETree library in Python to generate the XML.
  2. I have some bash scripts that take the XML files, and invoke the Python XML API Wrapper, which in turn does the legwork to send the data to the API Server on the Firewall.
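To give a flavor of step 1, here is a minimal ElementTree sketch. The element shape (an address entry with an ip-netmask) follows the general style of PAN-OS config XML, but treat the exact tag names as illustrative rather than gospel:

```python
# Generating a small PAN-OS-flavored XML fragment with ElementTree.
# The tag names here are illustrative of the general shape, not copied from my script.
import xml.etree.ElementTree as ET

def build_address_element(name, cidr):
    """Return an XML string for an address object with the given name and CIDR."""
    entry = ET.Element("entry", {"name": name})
    ET.SubElement(entry, "ip-netmask").text = cidr
    return ET.tostring(entry, encoding="unicode")

print(build_address_element("trusted-lan", "192.168.124.0/24"))
# → <entry name="trusted-lan"><ip-netmask>192.168.124.0/24</ip-netmask></entry>
```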
Normally one might create the Management Profile, Zones and Security Policies first. And THEN add or assign interfaces, routers, routes on those routers, etc.

This is the basic process I am following thus far:
  1. Create Management Profile
  2. Load Interface(s) - the management profile in #1 is included.
  3. Create Zone(s)
  4. Create Security Policies - the interfaces included
  5. Assign interface to default router
  6. Load Static Route on default router - include interface
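For context, each of the steps above ultimately lands on the device as an XML API call of type=config with action=set, an xpath, and an element payload. A sketch of building such a request URL - the host, xpath, element and key below are placeholders, not my actual lab values:

```python
# Building a PAN-OS XML API "set" request URL.
# The xpath and element are illustrative placeholders; real xpaths depend on
# the device's config tree, and the key comes from an earlier keygen call.
from urllib.parse import urlencode

def build_set_url(host, xpath, element, api_key):
    query = urlencode({
        "type": "config",    # operate on the configuration
        "action": "set",     # create/merge the element at the xpath
        "xpath": xpath,
        "element": element,
        "key": api_key,
    })
    return f"https://{host}/api/?{query}"

url = build_set_url(
    "172.22.0.10",
    "/config/devices/entry/vsys/entry/zone",
    "<entry name='Trusted-L3'/>",
    "MY-API-KEY",
)
```

Wrapping this in one small client (instead of one script that emits XML and another that posts it) is exactly the tightening-up I mention below.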
Seems to be working okay, although the process needs to be tightened up a bit so that you are not using one Python program to generate the XML, and another to call the API. 

But it's good enough to load and test and see if I can get a firewall operational.

Friday, May 17, 2019

Palo Alto Firewall VM Series - Integration and Evaluation - Part II


After a couple of days of playing around with the Palo Alto VM-Series Firewall (running the VM on a KVM / LibvirtD virtualization platform on a CentOS7 host), I felt I was comfortable enough with it to explore the API.

I asked a Palo Alto engineer how they bootstrap these things. He told me they use CloudInit and use a boot.xml file to change the default password. From there, they use their management platform, Panorama, to push configurations to the devices.

I don't happen to have Panorama anywhere. And I presume like everything else, it needs licenses. So, I started looking at the facilities to interface/integrate with the device; meaning APIs.

There are actually several APIs:

  • Command Line Interface (CLI)
  • WildFire API
  • AutoFocus API
  • PAN-OS Licensing API
  • Panorama XML API (requires Panorama of course)
  • Pan-OS XML API

I located, downloaded and glanced through the XML API Guide, which actually does do a nice job of getting you acquainted with the API. There is nothing really unusual. You need to authenticate, get a token (they call it a key), and with that key you can go to work (I won't cover details of the API here).

Next it was time to examine the API firsthand. Is it running? Do I need a license? I used Postman for this. I don't know if there are other better tools for picking at APIs, but I think Postman is definitely one of the most popular tools. Making add/modify changes is always risky when you are learning a new API, so it always makes sense to start with some "get" calls so you can understand the structure of the data. So, I was able to hit the VM on standard SSL port 443, get back a key, and with the key, run a few get commands based on examples in the API Guide. The API works, it appears!
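That first key-fetching call is just an HTTPS GET against /api/ with type=keygen, per the API Guide. A sketch of building it - the host and credentials are placeholders:

```python
# Building the PAN-OS XML API keygen request URL (trades credentials for a key).
# Host and credentials are placeholders, not my lab values.
from urllib.parse import urlencode

def build_keygen_url(host, user, password):
    query = urlencode({"type": "keygen", "user": user, "password": password})
    return f"https://{host}/api/?{query}"

print(build_keygen_url("172.22.0.10", "admin", "admin"))
# → https://172.22.0.10/api/?type=keygen&user=admin&password=admin
```

If the call succeeds, the response body is XML with the key inside it; every subsequent call carries that key as a query parameter.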

One noteworthy comment is that the API would not work until I turned off certificate validation in Postman's settings!

Next, I considered starting to write some Python code as a client, but as Palo Alto is a pretty popular firewall from a large company, there had to be some folks who have broken ground on that already, right? A quick google search for a Python API client turned up a project from a guy named Kevin Steves, who has clients for ALL of the APIs in Python. It is on GitHub with a free use license.

https://github.com/PaloAltoNetworks/pandevice/

After cloning this, I noticed you can run setup. I elected not to run setup, and instead just invoked the API directly. I had to use the panxapi.py python file. Examining the source code, you can supply an exhaustive list of options to the main() module of the Python file, which will parse those and invoke accordingly.

Immediately, however, I ran into the same certificate validation error I experienced with PostMan. But in PostMan I could just go into settings and disable certificate validation. Figuring out how to do this with the API was more difficult. Eventually, I found an issue recorded on the project that discusses this same problem, which can be found at this link:  Certificate Validation Issue

The issue discusses versions of Python on CentOS that do certificate checking. Rather than fool with upgrading Python, one poster pointed out that you can, in fact, disable certificate checking in Python by setting an environment variable: "export PYTHONHTTPSVERIFY=0". Bingo. That's all I need right now to experiment with the API.
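For what it's worth, that environment variable works (as I understand it, via PEP 493) by flipping Python's default HTTPS context to an unverified one. You can do the same thing explicitly and per-connection instead of globally - a sketch:

```python
# What PYTHONHTTPSVERIFY=0 effectively gives you, done explicitly instead.
import ssl

ctx = ssl._create_unverified_context()   # underscore-prefixed, but documented for this purpose
print(ctx.check_hostname)                # → False: hostname checking is off
print(ctx.verify_mode == ssl.CERT_NONE)  # → True: the certificate chain is not validated
```

Passing that context to, e.g., urllib.request.urlopen(url, context=ctx) skips certificate checks for just that connection - fine for poking at a lab firewall, not something for production.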

Tuesday, May 14, 2019

Palo Alto Firewall VM Series - Integration and Evaluation - Part I

This week, I am evaluating the Palo Alto VM-Series Firewall.

I will ramble a bit about what I am doing, learning, etc.

First off, this VM-Series Firewall can run as a qcow2 image on KVM, it can load as an OVF onto the VMWare vSphere ESXi platform, and I have seen some evidence of people loading it on VirtualBox also. I am using it on a KVM (libvirtd) host.

The main thing about a virtual firewall appliance is how it plumbs into the virtual networking, using virtual adaptors, or host NICs.

On the VM I just installed, I set up 4 Adaptors.

4 NICs on the Palo Alto VM
If we assume 4 NICs on the virtual machine, the very first NIC is designated as the management NIC.  What is confusing about this, is that you might expect this NIC to show up in the list of interfaces. It doesn't. You have to "know" that the first NIC is a management NIC that does NOT show up in the list of Interfaces.

If we look at the screenshot below, you will see a Management IP Address on a 172.22.0.0/24 network. This is shown on the "Dashboard" tab of the Palo Alto user interface.

The Management Interface connects to a VM NIC that does not show up in the list of Interfaces

Yet, if we look at the list of Interfaces (Network tab), we will see that this Management Interface (first NIC in the KVM list of adaptors) does not exist.


I would have to go back and see how well this is documented, for I admittedly dive in without reading documentation sometimes. But it was NOT very intuitive. I would prefer for the interfaces to "line up" with the VM adaptors, and if one is a Management Interface, perhaps it is greyed out, or managed in a unique way.

I understand why Palo Alto did this, for the Management Interface is generally considered a unique pipe into the product - used for administration of the device itself and generally is not considered part of a traffic plane interface. But it did make it difficult at first for me because I did not know which bridge the first Interface was indeed connected to - the br0 (which has a 172.22.0.0/24 network on it), or br1 (which has a 172.21.0.0/24 network on it).

Palo Alto, like FirewallD, is a zone-based firewall. So while you may have a tendency to get fixated on your Interfaces (trying to get them to light up green) initially, the first thing you really SHOULD do is a bit of forethought and planning, and create your Zones.

I created two zones (currently):

  • Trusted L3
  • UnTrusted L3
Two Zones Initially Created - L3 Trusted, and L3 Untrusted
The Untrusted Zone contains the interface Ethernet1/1, which is connected to a host adaptor via a bridge. For this reason I considered this interface to be untrusted, as per my thinking, it would connect up to my Router much like a Firewall at the edge of a premise might connect to an ISP.

The Trusted Zone contains two interfaces Ethernet1/2 and Ethernet1/3. 

Ethernet1/2 is mapped to an adaptor on the virtual machine that is connected to the "default" KVM network, which has a CIDR of 192.168.122.0/24. But - this is a NAT network! Packets that leave the KVM host are Source NAT'ed. How this works with the Firewall, I don't know yet - I have not tested extensively with this "type" of network.

Ethernet1/3 is mapped to an adaptor on the virtual machine that is connected to an Internal KVM network which has a CIDR of 192.168.124.0/24. This network, though is NOT a NAT network. It is a Routed Network. A Routed Network is routed between KVM Internal networks, but generally is not reachable outside the KVM host, because KVM creates iptables rules that drop inbound packets coming from any host aside of the KVM host itself (I tested this - pings from another host cannot reach the 192.168.122.0/24 network because they get dropped on a FORWARD chain rule). I suppose theoretically, if you hacked the iptables rules properly on the KVM host, these KVM internal networks could be reachable. Maybe there is a facility for this designed to accommodate strange circumventions like these, but messing with the iptables rules, and especially the order of such rules, are prone to issues.

So in summary, it does appear that the Palo Alto Firewall VM-Series, on KVM, will work with KVM Internal networks in a respectful way. You would just want to classify these as "Trusted" networks when it comes to zones and security policies.

Wednesday, May 8, 2019

Berkeley Packet Filtering - replacement for iptables - AND nftables?

I came across this blog, from Jonathan Corbet, dated Feb 19th, 2018.

BPF Comes to Firewalls, by Jonathan Corbet

I found this rather fascinating, since I was aware that nftables seemed pre-ordained to be the successor to iptables. I had even purchased and read Steven Suehring's Linux Firewalls book, which covers both iptables and nftables.

At the end of the day, I only see iptables and firewalls based on iptables (e.g. FirewallD) being used. I have not encountered any nftables firewalls yet.

And the other noted point is that nftables IS in the current version of the Linux Kernel. The BPF-based firewall (bpfilter) is not.

But, can BPF come into Linux distributions alongside nftables soon, and wind up replacing nftables?

That is the question.

Another interesting blog post addressing the impetus of BPF, is this one:

why-is-the-kernel-community-replacing-iptables


Thursday, April 18, 2019

Kubernetes Networking - Multus

It's been a while since I have posted on here. What have I been up to?

Interestingly enough, I have had to reverse-engineer a Kubernetes project. I was initially involved with this, but got pulled off of it, and the project had grown immensely in its layers, complexity and sophistication in my absence.  The chief developer on it left, so I had to work with a colleague to try and get the solution working, deployable and tested.

Right off the bat, the issue was related to Kubernetes Networking. That was easy to see.

The project uses Multus to create multi-homed pods (pods with multiple network interface adaptors).

By default, a Kubernetes pod gets only a single network interface (i.e. eth0). If you need two interfaces or more, there is a project called Multus (sponsored by Intel) that accommodates this requirement.

Multus is not a simple thing to understand. Think about it. You have Linux running on baremetal hosts. You have KVM virtual machines running on those hosts (virtualized networking). You have Kubernetes, and its Container Network Interface (CNI) plugins that supply a networking fabric amongst pods (Flannel, Weave, Calico, et al.). And now, on top of that, you have Multus.

Multus is technically a CNI plugin itself, but a meta-plugin: it does not "replace" Flannel or Weave, but instead inserts itself between Kubernetes and Flannel or Weave, delegating to them much like a proxy or a broker would.
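To make the delegation idea concrete, here is a small sketch (the helper function is hypothetical, not part of Multus) that inspects a CNI config the way a meta-plugin would before handing it off: it pulls out which real plugin should receive the config, and how addresses will be assigned.

```python
import json

# Hypothetical helper, not part of Multus: examine a CNI config and report
# which delegate plugin and IPAM mode it names.
def describe_cni_config(raw: str) -> str:
    conf = json.loads(raw)
    plugin = conf.get("type", "unknown")             # the delegate CNI plugin
    ipam = conf.get("ipam", {}).get("type", "none")  # address-assignment mode
    return f"delegate={plugin} ipam={ipam}"

# A trimmed-down version of the macvlan config discussed below:
macvlan_conf = """{
  "cniVersion": "0.3.1",
  "type": "macvlan",
  "master": "eth1",
  "mode": "bridge",
  "ipam": { "type": "host-local", "subnet": "10.10.20.0/24" }
}"""

print(describe_cni_config(macvlan_conf))  # delegate=macvlan ipam=host-local
```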

This article here has some good diagrams and exhibits that show this:

https://neuvector.com/network-security/advanced-kubernetes-networking/

[ I am skipping some important information about the Multus Daemonset here - and how that all works. But may come back to it. ]

One issue we ran into, is that we had two macvlan (Layer 2) configurations.

One used static host networking configuration:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth1",
      "mode": "bridge",
      "ipam": {
        "type": "host-local",
        "subnet": "10.10.20.0/24",
        "rangeStart": "10.10.20.1",
        "rangeEnd": "10.10.20.254",
        "routes": [
          { "dst": "0.0.0.0/0" }
        ],
        "gateway": ""
      }
    }'

while the other used DHCP.

{
   "cniVersion": "0.3.1",
   "name": "macvlan-conf-2",
   "type": "macvlan",
   "master": "eth1",
   "mode": "bridge",
   "ipam": {
             "type": "dhcp",
             "routes": [ { "dst": "192.168.0.0/16", "gw": "192.168.0.1" } ]
           },
   "dns": {
             "nameservers": [ "4.4.4.4", "8.8.8.8" ]
          }
}

The DHCP directive is interesting, because it will NOT work unless you have ANOTHER piece, a CNI DHCP daemon (the cni-dhcp plugin), deployed into Kubernetes so that it runs on each Kubernetes node that receives containers using this configuration. This took me a WHILE to understand. I didn't even know about the plugin, its existence, or anything like that.
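For reference, the dhcp IPAM type only issues requests; the actual lease negotiation is handled by a separate daemon shipped with the standard containernetworking plugins. A minimal sketch of running it by hand on a node (the binary path assumes the conventional install location; in Kubernetes, a DaemonSet typically wraps this):

```shell
# Start the CNI DHCP daemon on the node:
/opt/cni/bin/dhcp daemon

# It listens on a unix socket that the dhcp IPAM plugin talks to:
#   /run/cni/dhcp.sock
```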

We were running into an issue where the DHCP Multus pods (those that used this macvlan-conf-2) were stuck in an Initializing state. After considerable debugging, I figured out the issue was with DHCP.

Once I realized the plugin existed, I knew the issue had to be either with the plugin (which requests leases) or with the upstream DHCP server (which responds). In the end, it turned out that the upstream DHCP server was returning routes that the dhcp plugin could not handle. By removing these routes, and letting the upstream DHCP server worry only about IP assignment, the pods came up successfully.
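For completeness, pods select these network attachments through an annotation; a minimal sketch (the pod name and image are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: multi-homed-sample          # hypothetical pod name
  annotations:
    # Comma-separated list of NetworkAttachmentDefinitions to attach,
    # in addition to the default cluster network on eth0:
    k8s.v1.cni.cncf.io/networks: macvlan-conf, macvlan-conf-2
spec:
  containers:
  - name: sample
    image: busybox                  # hypothetical image
    command: ["sleep", "3600"]
```

The pod then comes up with eth0 on the default CNI network, plus net1 and net2 wired up per the two macvlan configurations above.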

Monday, March 4, 2019

Artificial Intelligence and Deep Learning - Tensorflow

This is a tool that someone told me about, and it could be a good way to get hands-on with AI, should the spirit move you to do so.

Tensorflow

FPGA

FPGA stands for Field Programmable Gate Array.

Per the Wikipedia definition, an FPGA is "an integrated circuit designed to be configured by a customer or a designer after manufacturing – hence the term field-programmable."

https://en.wikipedia.org/wiki/Field-programmable_gate_array

Wednesday, February 13, 2019

Hairpin NAT

A lot of folks don't understand Hairpin NAT: what it is, why it exists, or the specific use cases in which it applies.

This is an awesome site that explains it nicely - although you have to read the very very last paragraph to get to the bottom of it:

Hairpin NAT Explained
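As a rough illustration (the addresses are hypothetical, and real rule sets vary), the classic iptables form of hairpin NAT adds an extra SNAT on top of the usual DNAT:

```shell
# 203.0.113.10 = public IP, 192.168.1.10 = internal web server.

# Normal DNAT: traffic to the public IP is redirected to the internal server.
iptables -t nat -A PREROUTING -d 203.0.113.10 -p tcp --dport 80 \
  -j DNAT --to-destination 192.168.1.10

# Hairpin piece: when a LAN client (192.168.1.0/24) uses the public IP, the
# server would otherwise reply directly to the client, which would then drop
# the reply because the source address doesn't match what it connected to.
# Masquerading forces replies back through the router.
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -d 192.168.1.10 \
  -p tcp --dport 80 -j MASQUERADE
```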

Friday, February 1, 2019

NOSQL databases - are we taking a step backwards?

One of the solutions I am looking at happens to be utilizing Cassandra, a NOSQL database project from the Apache Foundation.

I am pretty deep with SQL databases, but not so much with NOSQL databases. I may have done a couple of remark-based blog posts on the topic of NOSQL databases in the past, but I really have not looked into them in any depth.

However, in noticing a java process running and realizing it was Cassandra, I went to the Cassandra website and started to take a closer look. When I went to the site and clicked:

  • Documentation
    • Architecture
      • Overview
I wound up getting a TODO page. Sheez. That's absolutely unacceptable and ridiculous.

So, if I want more introductory information, I will probably have to blog surf.

But, I did find this very interesting Quora page, entitled: What are the pros and cons of the Cassandra database? It can be found at this link: What-are-the-pros-and-cons-of-using-the-Cassandra-database?

This reminds me of the old Object Oriented database days, when products like Versant hit the scene. Speedy databases that made it easy to get your data IN, but when it came to getting it OUT, it was an absolute nightmare.

Aggregate functions (SUM, AVG, etc.) are limited (recent Cassandra versions do support some built-in aggregates, but in practice they operate within a single partition), and there are no table joins. It uses a query language called CQL that looks somewhat like SQL, but that resemblance can cause confusion, because it does not naturally support ANSI-SQL concepts.
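To make that concrete, here is a sketch against a hypothetical users table showing where SQL habits break down:

```sql
-- CQL reads like SQL for simple key lookups:
SELECT name, email FROM users WHERE user_id = 1234;

-- But the SQL resemblance ends quickly:
-- SELECT u.name, o.total FROM users u JOIN orders o ON ...;
--   -> no JOINs in CQL at all
-- SELECT name FROM users WHERE email = 'a@b.com';
--   -> rejected unless email is indexed, or you tack on ALLOW FILTERING
```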

Makes me wonder. Are we taking a big step backwards with these kinds of databases becoming so pervasive?
