Grasping Technology

Thursday, November 23, 2017

Ubiquiti Edge Router ER-X - Impressive

I just love this router.

The icon on the top that shows colorized ethernet plugs (colorization related to status). Cool.
It was sooooo easy to configure it.
It has a shell that takes you into a full Linux userland. It looked like a custom compile of Ubuntu to me, but EdgeOS is actually a Vyatta fork built on Debian, which is still impressive. It does not feel like a BusyBox-only or otherwise slimmed-down quasi-Linux.
One cool feature is that it can support link aggregation. I am not using that feature, but it's cool.
Has excellent support for IPv6.

It can also switch ports on the router itself, so you don't need a separate L2 switch to go with it.

So for example, you can set up:

eth0 as the management port
eth1 as the WAN port, and
eth2, eth3 and eth4 are switched so that anything plugged into these is on the same network (you define the network).
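As a rough sketch, the layout above could be turned into EdgeOS-style "set" commands like this. The two subnets are made-up placeholders, and the command strings follow the EdgeOS CLI as I understand it (with switch0 as the built-in switch), so check them against your firmware before pasting anything into a router.

```python
# Hypothetical addresses; only the interface roles come from the layout above.
MGMT_ADDR = "192.168.1.1/24"   # assumed management subnet on eth0
LAN_ADDR = "192.168.2.1/24"    # assumed LAN subnet on the switched ports


def edgeos_commands(mgmt="eth0", wan="eth1", switched=("eth2", "eth3", "eth4")):
    """Return EdgeOS-style 'set' commands for the port layout."""
    cmds = [
        f"set interfaces ethernet {mgmt} description Management",
        f"set interfaces ethernet {mgmt} address {MGMT_ADDR}",
        f"set interfaces ethernet {wan} description WAN",
        f"set interfaces ethernet {wan} address dhcp",
        f"set interfaces switch switch0 address {LAN_ADDR}",
    ]
    # Each switched port joins the same hardware switch, hence the same network.
    cmds += [
        f"set interfaces switch switch0 switch-port interface {port}"
        for port in switched
    ]
    return cmds


if __name__ == "__main__":
    print("\n".join(edgeos_commands()))
```

The function only builds strings; nothing is sent to the router.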

I don't currently have this router configured as a router that actually learns routes. In other words, I am not running BGP, RIP or OSPF on it. I don't really need to learn networks, nor advertise networks, dynamically. All I need is to get out to the internet.

I have it set up with a hairpin NAT, and the firewall rules configured on it are rather trivial at the moment but designed to protect ingress through iptables rules.

This is truly a "power user" routing device, and it can fit into the palm of your hand; it is no bigger than a Raspberry Pi device.

This router also comes with some interesting Wizards that allow you to configure the router for certain use cases, like the WAN+LAN wizard.

So I have not done anything in-depth, but I spent an hour messing around with this device and I'm pretty impressed with it.

Security - Antivirus specifically

My McAfee just expired on this computer and I am now getting a bunch of intrusive "buy me" pop-ups. I have never thought McAfee to be top of the line when it comes to Anti-Virus, but the question is, is anybody really stopping viruses these days?

I have started to get smarter about Security. I went to RSA in 2016, and I bought a book on Exploits. This is a very, very hardcore book, and I have not managed to get through it all yet. It requires Assembler and C programming, and teaches you how hackers actually exploit code. I think once I finish this it will be awesome knowledge, and I am about halfway through it. I got pulled off of this due to the longer hours at work playing with virtualization and orchestration.

So - I am not current on malware. So I spent some time looking around this morning, reading anti-virus reviews.

It does not appear that there is much out there in the way of OpenSource AV. ClamAV looks like the only thing actively maintained. This is a bit of a surprise.

There are some free packages out there, but I am sure they probably nag you incessantly to buy or upgrade. The big question is this: Can you really trust FREE?

I also see some interesting Cloud-based packages out there that are working from outside your network. This would have been an absolute no-no for me in earlier times, but considering the danger of today's malware, maybe this kind of approach is worth re-examining, if good results are coming from it. One such company is Crystal Security.

I see some products like VoodooShield. And some new ones I had not previously encountered like GlarySoft Malware Hunter.

Of course, Kaspersky, ESET - these guys always get good reviews.

It is probably good to stay up to speed on this stuff. To take an hour here and there and stay current.

OpenBaton Fault Management and AutoScaling

It has been a while since I have taken any notes or blogged anything. That doesn't mean I haven't been doing anything, though. 😎

Over the last month or so I have been testing some of the more advanced features of OpenBaton.
- Fault Management
- Auto Scaling
- Network Slicing

These have taken time to test. I was informed by the development team that the "release" code was not suitable for the kind of rigorous testing I planned to do, and that I needed to use the development branch.

This led me down a road of having to familiarize myself with the "git" software management utility. I know git has been around for a while and silently crept in as almost a de-facto standard for code repository and source management. In many shops it has replaced the classic stalwarts ClearCase, CVS, SVN and other software that has actually been in use for decades. Even in my own company's shop, they brought in a "git guy", and of course since that is the recipe he cooks, we now use that. But up to this point, I had not really had a need to do more than "git clone". Now - I am having to work with different branches, and as I said, this took some time. Git is fairly simple if you do simple things, but it is far more complex "under the hood" than it looks, especially if you are doing non-simple things with it. I could do a post just on git alone. I'm not an authority on it, but have picked up a few things - including opinions - on it (some favorable, some not).

The first thing I tested was Fault Management (FM). Fault Management is essentially the ability to identify faults and trigger actions around those faults. The actions can be an attempt to heal - or it can be an attempt to Scale, or it can be an attempt to fail-over based on a configured redundancy mechanism. The ETSI standard descriptors allow you to specify all of this. The interesting thing about FM in a virtualized context is that it gets into the "philosophy" of whether it makes sense to spend effort healing something, as opposed to just killing it and re-instantiating it. This is called the "Cattle vs Pets" argument. I think there ARE cases where you need to fix and heal VMs (Pets), but in most cases I think VMs can be treated as Cattle. When VMs are treated as Pets, the nodes are generally going to be important (i.e. they manage something important, as in a control plane or signaling plane element), and cannot just be taken down and re-instantiated due to load or function.
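To make the Cattle vs Pets decision concrete, here is a minimal sketch of a fault policy. It is not Open Baton's API or the ETSI descriptor format; the Vnf class, the Action names and the "overload" fault label are all made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    HEAL = "heal"
    SCALE = "scale"
    FAILOVER = "failover"
    REINSTANTIATE = "reinstantiate"


@dataclass
class Vnf:
    name: str
    is_pet: bool               # True for control or signaling plane nodes
    has_standby: bool = False  # True if a redundant instance is configured


def choose_action(vnf: Vnf, fault: str) -> Action:
    """Pick a reaction to a fault (hypothetical policy, not a standard)."""
    if fault == "overload":
        return Action.SCALE          # more capacity, nothing is broken
    if vnf.has_standby:
        return Action.FAILOVER       # use the configured redundancy first
    if vnf.is_pet:
        return Action.HEAL           # too important to just kill
    return Action.REINSTANTIATE      # cattle: kill it and start a new one
```

For example, choose_action(Vnf("worker", is_pet=False), "crash") comes back as Action.REINSTANTIATE, while the same crash on a Pet without a standby comes back as Action.HEAL.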

I then tested AutoScaling - or, using a better term, Elasticity. This allows a virtualized network to expand and contract based on real-time utilization. This feature took me a while to get working due to having to compile different modules of code from different moving-target git branches over and over until I could finally get some code that wanted to work, with slight modifications and patches. When I finally got this working, it was super cool to see this work. I could do a separate post on this feature alone. After I got it working I wound up helping some other guys in a German network integration company get the feature working.
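The core of elasticity is a small decision made over and over: compare utilization to thresholds and grow or shrink within limits. A minimal sketch, assuming CPU utilization as the metric and thresholds I picked out of thin air (none of these numbers come from Open Baton):

```python
def desired_instances(current, cpu_util,
                      scale_out_at=0.80, scale_in_at=0.30,
                      min_n=1, max_n=5):
    """Return the instance count to aim for, one step at a time.

    cpu_util is the average utilization across instances, from 0.0 to 1.0.
    All thresholds and limits are assumed values for illustration.
    """
    if cpu_util > scale_out_at:
        target = current + 1      # expand under load
    elif cpu_util < scale_in_at:
        target = current - 1      # contract when idle
    else:
        target = current          # inside the comfort band, do nothing
    # Never go below the minimum or above the maximum.
    return max(min_n, min(max_n, target))
```

The gap between the two thresholds matters: if they sit too close together, the network flaps between scaling out and scaling in.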

Network Slicing has been more difficult to get to work. That is probably a separate post altogether, related to and intertwined with topics such as QoS.

Thursday, September 21, 2017

OpenStack - Two Compute Nodes

Getting two Compute nodes to work was fairly straightforward.

You basically just install openstack-nova-compute, and your Neutron network plugin (linuxbridge-agent in my case).

The only question I had was whether two Compute Nodes can belong to the same OpenStack Region.

Thank goodness I found a ppt where a guy made it clear that one could run a slew of nodes in a single region (he had multiples in Region 1, and Region 2).

At one point, I decided I would install the OpenVSwitch on this second Compute Node. I'll probably write a second post on that. It did not appear to me that you could mix and match OpenVSwitch and LinuxBridge on different Compute Nodes (at least not easily?). This is because the Neutron L3 Agent config file has a driver field and only seems to accept one mode or the other. I could be wrong about this; more testing necessary. But I backed OpenVSwitch out and enabled LinuxBridge-Agent. Things seem to be working very well with the Linux Bridge Agent.

The Linux Bridge Agent creates Layer 2 Tap interfaces and puts these interfaces on a bridge. If you are using the VXLAN protocol, it will manage those interfaces as well.
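One way to see what the agent has built is to look at which interfaces sit on a bridge. This is a small sketch that reads the Linux sysfs tree (each bridge has a brif directory listing its ports). The bridge name in the usage note is hypothetical, and the "tap" and "vxlan-" prefixes are the naming I have seen the agent use, so treat them as assumptions.

```python
import os


def bridge_ports(bridge, sysfs_root="/sys/class/net"):
    """List the interfaces attached to a Linux bridge, sorted by name."""
    brif = os.path.join(sysfs_root, bridge, "brif")
    if not os.path.isdir(brif):
        return []                 # not a bridge, or it does not exist
    return sorted(os.listdir(brif))


def split_ports(ports):
    """Separate tap interfaces (VM and DHCP ports) from VXLAN interfaces."""
    taps = [p for p in ports if p.startswith("tap")]
    vxlans = [p for p in ports if p.startswith("vxlan-")]
    return taps, vxlans
```

On a compute node you would call something like split_ports(bridge_ports("brq1234abcd-ef")), substituting a real bridge name from your own box.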

OpenVSwitch

Today I added a 2nd Compute Node (KVM).

I thought I would use OpenVSwitch on it.

This took me down a deep rabbit hole, as OpenVSwitch is a complex little bugger.

I installed the OpenVSwitch package, then the driver agent (on Compute Node). I wanted it to run in a Layer 2 mode because I had LinuxBridge Agent running on the first Compute Node and the Controller.

After setting OpenVSwitch up on the 2nd Compute node, I realized my external NIC was a bridge, so I tried to use veth pairs to make it work. Nope. As it turns out, the Controller (and L3 agent) seems to use drivers for OpenVSwitch OR LinuxBridge (not both). It appears that it is all or nothing and you cannot mix and match between LinuxBridgeAgent and OpenVSwitchAgent.

I backed it out and used / installed LinuxBridgeAgent.

OpenStack Functional Demo

Originally, with one CentOS 7 server (32 GB RAM) at my disposal that was set up to run Ansible and LibvirtD, I installed OpenStack on a single box.

I put the Controller in a VM and used the host as the Nova Compute Node.

I had all sorts of issues initially. Keystone and Glance were fairly straightforward. I did not have DNS, so I used IP addresses for most URLs, which is a double-edged sword. The complexity in OpenStack is with Nova (virtualization management) and Neutron (networking).

I did not create a "Network Node". I used only a Controller Node and a Compute Node. What one would normally put on a Network Node runs on the Controller Node (L3 Agent, DHCP Agent, Metadata Agent).

One issue was that libguestfs was not working. I finally removed it from the box, only to realize that there was a yum dependency with the openstack-nova-compute package. So I installed nova compute using an rpm with the --nodeps flag.

Getting linuxbridge agent to work took some fiddling. One issue is that it was not clear if I needed to run LinuxBridgeAgent on the Controller. The instructions make it seem that it is only for the Compute Node. Well, not so. Neutron creates a tap for every dhcp agent, and every port. ON THE CONTROLLER if that is where you run those services. So you install it in both places.

The Neutron configuration file...is about 10,000 lines long, leaving many opportunities for misconfiguration (by omission, incorrect assumption/interpretation, or just plain typos). It took a while to sleuth out how OpenStack uses Nova, Neutron and the l3 agent and linuxbridge agent to create bridges, vnets and taps (ports). But - confusing again - is whether you need to configure all parameters exactly the same on both boxes, or if some are ignored on one node or the other. I was not impressed with these old style ini and config files. Nightmares of complexity.

Another major challenge I had was the external network. I failed to realize (until I did network debugging) that packets that leave the confines of OpenStack need to find their way back into OpenStack. This means that VMs sitting outside OpenStack need specific routes to the internal OpenStack networks via the external gateway port on the OpenStack router.
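In practice the return path boils down to one static route per internal network, pointing at the router's external gateway address. A minimal sketch that builds the iproute2 commands, with made-up addresses in the usage note (nothing here is from my actual setup):

```python
import ipaddress


def return_routes(tenant_cidrs, router_gateway_ip):
    """Build 'ip route add' commands for a host outside OpenStack.

    tenant_cidrs: internal OpenStack networks the host must reach.
    router_gateway_ip: the OpenStack router's address on the external network.
    """
    gateway = ipaddress.ip_address(router_gateway_ip)
    commands = []
    for cidr in tenant_cidrs:
        network = ipaddress.ip_network(cidr)
        if gateway in network:
            # The gateway must live on the external side, not inside the tenant net.
            raise ValueError(f"{gateway} is inside {network}")
        commands.append(f"ip route add {network} via {gateway}")
    return commands
```

For example, return_routes(["10.0.0.0/24"], "192.168.1.50") gives a single command, "ip route add 10.0.0.0/24 via 192.168.1.50", to run on the outside host.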

Another confusing thing is that OpenStack runs network namespaces (separate and distinct network stacks) so that overlapping IP ranges do not collide (by default - the way Neutron is configured). Knowing how to navigate namespaces was a new topic for me, and namespaces make it harder to debug connectivity issues.
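The two commands that got me around were "ip netns list" and "ip netns exec". Here is a small sketch that sorts the namespaces Neutron creates by their name prefix (qrouter- for routers, qdhcp- for DHCP) and builds the command to run something inside one. The sample IDs in the usage note are invented.

```python
def parse_netns(output):
    """Group namespace names from 'ip netns list' output by Neutron prefix."""
    groups = {"router": [], "dhcp": [], "other": []}
    for line in output.splitlines():
        if not line.strip():
            continue
        name = line.split()[0]    # drop a trailing "(id: N)" if present
        if name.startswith("qrouter-"):
            groups["router"].append(name)
        elif name.startswith("qdhcp-"):
            groups["dhcp"].append(name)
        else:
            groups["other"].append(name)
    return groups


def netns_exec(namespace, *command):
    """Build the argument list to run a command inside a namespace."""
    return ["ip", "netns", "exec", namespace, *command]
```

So netns_exec("qrouter-1111", "ping", "-c", "1", "10.0.0.5") gives you an argument list you can hand to subprocess.run as root, which pings from inside the router's own network stack instead of the host's.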

Finally, when I worked all of this out, I realized that the deployment of VMs was taking up almost 100% CPU. This led me down a rabbit hole to discover that I needed to use the kvm virt_type, and a CPU mode of host-passthrough to calm the box down.

Once I got this done, I could deploy efficiently.
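For reference, both settings live in the [libvirt] section of nova.conf on the Compute Node. A minimal sketch that renders just that section with Python's configparser (I edited the file by hand; generating it like this is only for illustration):

```python
import configparser
import io


def libvirt_section(virt_type="kvm", cpu_mode="host-passthrough"):
    """Render the [libvirt] section of nova.conf as text."""
    config = configparser.ConfigParser()
    config["libvirt"] = {
        "virt_type": virt_type,   # use hardware-accelerated KVM, not plain qemu
        "cpu_mode": cpu_mode,     # expose the host CPU model to the guests
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(libvirt_section())
```

The nova-compute service has to be restarted before the change takes effect.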

Another thing (maybe this should be its own post) is the notion of setting ports that you can use on deployment (instead of saying "deploy to this network", you can say "use this port on this network" - which has its own IP and port assignment). Because you can attach multiple subnets to a single network, I figured I could create ports for nodes that I wanted to reside on that subnet. And I COULD! But - the ETSI MANO standards have not caught up with this kind of cardinality / association (per my testing anyway) so it only works if you use OpenStack GUI to deploy. Therefore, having a "one subnet to one network" rule is simpler and will work better for most situations I think.
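To show what "use this port on this network" means, here is a sketch of the request body for creating such a port through the Neutron API, pinned to one subnet with a fixed IP. The IDs and addresses are placeholders, and the subnet check is my own addition, not something Neutron needs from you.

```python
import ipaddress


def port_request(network_id, subnet_id, subnet_cidr, ip_address):
    """Build a Neutron 'create port' body for a fixed IP on one subnet."""
    if ipaddress.ip_address(ip_address) not in ipaddress.ip_network(subnet_cidr):
        raise ValueError(f"{ip_address} is not in {subnet_cidr}")
    return {
        "port": {
            "network_id": network_id,
            # fixed_ips ties the port to a specific subnet of the network
            "fixed_ips": [{"subnet_id": subnet_id, "ip_address": ip_address}],
        }
    }


if __name__ == "__main__":
    # Placeholder IDs; a real deployment would use UUIDs from Neutron.
    print(port_request("net-1", "subnet-b", "10.0.2.0/24", "10.0.2.15"))
```

The VM is then booted against the port rather than against the network, which is exactly the association the MANO descriptors did not let me express.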

In the end, I was able to do everything smoothly with OpenStack: save Images, create Flavors and Networks, and Deploy. But it all has to be configured "just so".

Sunday, September 17, 2017

Service Orchestration and Automation with Open Baton

I haven't blogged in a while, but I've been busy working on Cloud Automation technology of late.

Originally, some guys in the company did an evaluation between Puppet, Chef and Ansible for automating the deployment of virtual machines into the Cloud (they hosted their own cloud and did not rely on the commercial cloud providers we see today).

It took me a while, but I finally had the time to examine their stuff, and before long I was hacking the scripts for my own purposes, so that I could build different versions of our SD-WAN solution, and different topologies of this solution (e.g. we had an L2 solution, an L3 solution, an L3 solution with Routing, et al). I fell in love with Ansible. I could spin a virtual network up in a matter of minutes, and I could start with raw virtual machines (Linux - CentOS) that would download the packages and install them (with yum installer), install and configure the software, etc. I could probably write a book on the topic of Ansible alone. But - I took someone else's hard work, and ran with the ball and it is always easier to do that than start from scratch yourself.

Then - I was asked to get a prototype of ETSI Mano working.

Years back, when I was at Nokia Networks, we examined Service Orchestration, but back then there were no standards and it was a HUGE integration clusterf$k to get that kind of technology working. We tried it with BEA and JNetX, and message queues. It was a mess.

This time, I read through the standards, and indeed, it looked to me like we HAVE standards drafted up. But - do we have any working solutions? I looked at a solution called OpenBaton, which is open source, out of Berlin, Germany. I put it on a box, went through the tutorials, and it seemed to "kinda sorta" work. So I was able to get this working with a stub "dummy" module that doesn't do anything.

Originally, I put OpenBaton on one virtual machine. It is designed to run on Ubuntu 14.04 (at least that is what the developers tested it on). So, not being heavily familiar with Ubuntu, I installed 14.04 on a virtual machine on a KVM Host (32 GB RAM, 8-core CPU and 1 TB disk) and Ubuntu immediately upgraded itself to 16.04. This created some problems right away. One HUGE issue is that all of the software is written in Java, and they stated that they wanted JDK 1.7. But - guess what? Oracle had just deprecated 1.7 that very week, and took the 1.7 JDK link down, which broke all of Open Baton's scripts. Don't ask me how I got around this...it was very difficult. I installed OpenJDK 1.7, and then "faked things" so that Open Baton's scripts would believe Oracle JDK was on the box. I wound up having to download many packages from GitHub and compile them myself. I also wound up having to hack and manipulate the systemd unit files so that the services would start up properly.

Initially, I only installed the Orchestrator (NFVO), and the Generic VNFM (Virtual Network Function Manager) modules. But, to really vet the technology out, OpenBaton needs a "real" system to talk to. So, in a 2nd Virtual Machine on the KVM host, I installed an OpenStack Controller on CentOS 7, and it ran alongside the OpenBaton Virtual Machine. On the KVM Host, I installed the Nova Compute module, which is responsible for interacting with the KVM host and launching the virtual machines.

I got it to launch machines, but that gets boring quick. I wanted to examine the ability to run scripts and configure the VMs dynamically, and have the VMs inter-communicate. I then learned that OpenBaton - though called an Orchestrator - cannot actually pull any of this off without using an EMS (Element Management System). And Open Baton uses Zabbix for this. So, I had to install a Zabbix Server and a Zabbix Plugin - and I installed these on the Open Baton Virtual Machine, thinking that I would alleviate issues if I put them all together on the same box (more on this later).

In the end, I am able to get Open Baton to launch VMs consistently, but I get a TON of timeout errors. As I debug things, I realize that threads and message queues are timing out because the process of deploying and configuring the VMs is so CPU and Disk intensive that the VMs just get overwhelmed and Open Baton gets impatient waiting for things to happen.

I run top (as well as htop and other tools) and realize that I need to take a step back if I am going to take a step forward. I need to get another box - a second box - and distribute some load, and move some things out of these virtual machines.

Okay, that's it for now. I will update more on the next post.