I haven't blogged in a while, but I've been busy working on Cloud Automation technology of late.
Originally, some guys in the company did an evaluation between Puppet, Chef and Ansible for automating the deployment of virtual machines into the Cloud (they hosted their own cloud and did not rely on the commercial cloud providers we see today).
It took me a while, but I finally had the time to examine their stuff, and before long I was hacking the scripts for my own purposes so that I could build different versions of our SD-WAN solution, and different topologies of it (e.g. we had an L2 solution, an L3 solution, an L3 solution with routing, etc.). I fell in love with Ansible. I could spin a virtual network up in a matter of minutes, starting with raw virtual machines (Linux - CentOS) that would download the packages (with yum), install and configure the software, and so on. I could probably write a book on the topic of Ansible alone. But I took someone else's hard work and ran with the ball, and it is always easier to do that than to start from scratch yourself.
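To give a flavor of the kind of play involved, here is a minimal sketch (all names, variables, and file paths are hypothetical, not our actual playbooks): install packages on raw CentOS VMs with yum, push a templated config, and restart the service.

```yaml
# Hypothetical sketch of a play like the ones described above.
- hosts: sdwan_nodes
  become: yes
  tasks:
    - name: Install the solution's packages with yum
      yum:
        name: "{{ sdwan_packages }}"
        state: present

    - name: Push the node configuration from a template
      template:
        src: sdwan.conf.j2
        dest: /etc/sdwan/sdwan.conf
      notify: restart sdwan

  handlers:
    - name: restart sdwan
      service:
        name: sdwan
        state: restarted
```

The nice property is idempotence: you can re-run the play against the same VMs and only the drifted pieces get touched.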
Then - I was asked to get a prototype of ETSI Mano working.
Years back, when I was at Nokia Networks, we examined Service Orchestration, but back then there were no standards and it was a HUGE integration clusterf$k to get that kind of technology working. We tried it with BEA and JNetX, and message queues. It was a mess.
This time, I read through the standards, and indeed, it looked to me like we HAVE standards drafted up. But - do we have any working solutions? I looked at a solution called OpenBaton, which is open source, out of Berlin, Germany. I put it on a box, went through the tutorials, and it seemed to "kinda sorta" work. So I was able to get this working with a stub "dummy" module that doesn't do anything.
Originally, I put OpenBaton on one virtual machine. It is designed to run on Ubuntu 14.04 (at least, that is what the developers tested it on). So, not being heavily familiar with Ubuntu, I installed 14.04 on a virtual machine on a KVM host (32 GB RAM, 8-core CPU, and 1 TB disk), and Ubuntu immediately upgraded it to 16.04. This created some problems right away. One HUGE issue is that all of the software is written in Java, and the developers stated that they wanted JDK 1.7. But guess what? Oracle had retired 1.7 that very week and taken the public 1.7 JDK download link down, which broke all of OpenBaton's scripts. Don't ask me how I got around this...it was very difficult. I installed OpenJDK 1.7, and then "faked things" so that OpenBaton's scripts would believe the Oracle JDK was on the box. I wound up having to download many packages from GitHub and compile them myself. I also had to hack and manipulate the SystemD unit files so that the services would start up properly.
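The SystemD fixes were mostly drop-in overrides along these lines (the unit name and JDK path here are hypothetical, just to illustrate the shape of the change):

```ini
# /etc/systemd/system/openbaton-nfvo.service.d/override.conf
# (hypothetical unit name) -- point the service at the OpenJDK install
# and make sure its dependencies are up before it starts.
[Unit]
After=network-online.target rabbitmq-server.service

[Service]
Environment="JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk"
Restart=on-failure
RestartSec=10
```

A `systemctl daemon-reload` after editing, and the services came up in the right order.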
Initially, I only installed the Orchestrator (NFVO) and the Generic VNFM (Virtual Network Function Manager) modules. But to really vet the technology out, OpenBaton needs a "real" system to talk to. So, in a second virtual machine on the KVM host, I installed an OpenStack controller on CentOS 7, and it ran alongside the OpenBaton virtual machine. On the KVM host itself, I installed the Nova compute service, which is responsible for interacting with the hypervisor and launching the virtual machines.
I got it to launch machines, but that gets boring quickly. I wanted to examine the ability to run scripts and configure the VMs dynamically, and to have the VMs inter-communicate. I then learned that OpenBaton - though called an Orchestrator - cannot actually pull any of this off without an EMS (Element Management System), and OpenBaton uses Zabbix for this. So I had to install a Zabbix server and the Zabbix plugin, and I installed these on the OpenBaton virtual machine, thinking that putting them all on the same box would alleviate issues (more on this later).
In the end, I am able to get OpenBaton to launch VMs consistently, but I get a TON of timeout errors. As I debug things, I realize that threads and message queues are timing out because deploying and configuring the VMs is so CPU- and disk-intensive that the VMs get overwhelmed and OpenBaton gets impatient waiting for things to happen.
I run top (as well as htop and other tools) and realize that I need to take a step back if I am going to take a step forward. I need to get a second box, distribute some load, and move some things out of these virtual machines.
Okay, that's it for now. I will update more on the next post.
Sunday, September 17, 2017
Wednesday, July 26, 2017
NetFlow with Ntop
I had heard that Ntop supports Netflow on Linux.
I found a link / blog where someone else has played with this package for the same or similar purposes. Let me share that here:
https://devops.profitbricks.com/tutorials/install-ntopng-network-traffic-monitoring-tool-on-centos-7/
I downloaded the Ntop package, and immediately it barked about the fact that I did not have kernel headers on the system.
This is bad, in my mind.
What box, running out in the field, would have kernel headers installed on it? That would be bad security practice, because it would mean the box has a lot of stuff on it that it probably shouldn't have: specifically, compilers and the rest of a build toolchain.
I also noticed that the package runs with a license code. There is a limited license it can run under, which is the default configuration. But I'm not sure I like having software, at least for this purpose, that is dependent on licensing. I did not study whether it is a time-expiring key license, or whether it calls out to a remote server to authenticate the license.
I kind of stopped there. I did not play with it any further. I may come back to it, and if I do I will update this accordingly.
Saturday, July 22, 2017
NetFlow with nfcapd and fprobe
I spent some time researching and using NetFlow this week (about a day).
Basically, you download the nfdump package, which has the collector (nfcapd), and a GUI (nfsen) and a command line tool called nfdump.
You run the collector, which listens on a standard or specified port, and "something" (e.g. a router) that knows how to export flows sends them to the collector, which writes NetFlow-formatted files. Then you can use nfdump or nfsen to view these flows.
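As a sketch of the workflow (port number, directory, and interface name are just examples, not anything specific to my setup):

```shell
# Start the collector as a daemon on UDP port 9995, writing
# capture files into /var/netflow.
nfcapd -D -p 9995 -l /var/netflow

# Stand in for the router: export flows seen on eth0 to the collector.
fprobe -i eth0 localhost:9995

# Later, read everything collected and show TCP flows in extended format.
nfdump -R /var/netflow -o extended 'proto tcp'
```

That three-piece split - exporter, collector, reader - is really the whole nfdump architecture in miniature.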
There are multiple versions of NetFlow - from version 5 all the way up to 9 (see the NetFlow Wikipedia article). The later versions provide additional data (or extensions, as they refer to them).
The tricky part in testing this is to mimic or simulate a router. To do this:
fprobe is a tool you can install to generate flows. But it does not appear to be installable with the yum package manager, so you need to download the source and compile it, or download and install an rpm.
fprobe-ulog is another tool, but it runs over iptables and requires iptables rules to work. I was surprised to see that yum COULD find and install this program, but not fprobe.
There are a few other tools as well, but these were the two I tried out.
Both of these worked, although there is not a lot of documentation or forum discussion on the fprobe-ulog approach. I wound up using fprobe.
There is the question of what defines and constitutes a network flow; Wikipedia covers this. I think that if you have a bunch of UDP traffic, it is harder for NetFlow to stitch the traffic together into a flow for hindsight analysis, since there is no session setup or teardown to observe. TCP, of course, is straightforward.
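The core idea of a flow is just aggregation on the classic 5-tuple. A toy illustration (the packet records here are made up, and real collectors also apply TCP flags and idle timeouts to decide when a flow ends):

```python
from collections import defaultdict

def group_flows(packets):
    """Group (src_ip, dst_ip, src_port, dst_port, proto, nbytes) records
    into flows keyed by the 5-tuple, summing bytes and counting packets.
    For TCP, SYN/FIN would delimit flows; for UDP a collector has to fall
    back to inactivity timeouts, which is why UDP flows are fuzzier."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src, dst, sport, dport, proto, nbytes in packets:
        key = (src, dst, sport, dport, proto)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += nbytes
    return dict(flows)

packets = [
    ("10.0.0.1", "10.0.0.2", 40000, 80, "tcp", 1500),
    ("10.0.0.1", "10.0.0.2", 40000, 80, "tcp", 1500),
    ("10.0.0.3", "10.0.0.2", 53000, 53, "udp", 80),
]
flows = group_flows(packets)
# Two packets collapse into one TCP flow; the UDP packet is its own flow.
```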
System Tap
I spent some time reading the Beginner's Guide to System Tap.
https://sourceware.org/systemtap/SystemTap_Beginners_Guide/
I learned the basics of writing, reading and compiling / running the system tap scripts.
I also enjoyed running the sample System Tap scripts that are mentioned specifically in Chapter 5 of this guide.
I wound up downloading a bunch of these; especially the networking ones.
Writing these efficiently would take some practice. But there is a good Reference Guide that can make the process of writing the scripts easier. The question is, could there be a use case for writing one of these scripts that someone hasn't already thought up, and written?
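For flavor, here is a small script in the spirit of the Chapter 5 networking examples (a sketch I put together from the reference material, not one of the guide's scripts verbatim): count TCP sends per process for ten seconds, then print a summary.

```
#!/usr/bin/stap
# Tally calls to tcp_sendmsg by process name for 10 seconds.
global sends

probe kernel.function("tcp_sendmsg") {
    sends[execname()]++
}

probe timer.s(10) {
    exit()
}

probe end {
    # "sends-" iterates the aggregate sorted by value, descending.
    foreach (name in sends-) {
        printf("%-20s %d\n", name, sends[name])
    }
}
```

Run with `stap script.stp` (which needs the kernel debuginfo packages installed - the same headers issue that came up with Ntop).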
Wednesday, July 19, 2017
Ansible Part II
I've had more time to play with Ansible, in the background. A little bit at least. Baby steps.
I use it now to deploy SD-WAN networks, which have different types of KVM-based network elements that need to be configured differently on individual virtual machines.
I enhanced it a bit to deploy virtual-machine based routers (Quagga), as I was building a number of routing scenarios on the same KVM host.
I have made some changes to make Ansible work more to my liking:
1. Every VM gets a management adaptor that connects to a default network.
2. The default network is a NAT network that has its own subnet mask and ip range.
3. I assign each VM an IP on this management network in the hosts file on the KVM host.
The Ansible launch-vm script uses the getent utility to figure out which IP address the VM has by its name, which is defined in the inventory file.
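The lookup itself is a one-liner (VM name hypothetical):

```shell
# Resolve the management IP that the KVM host's hosts file assigns to a VM.
# getent consults the same NSS sources as the resolver, so /etc/hosts works.
getent hosts sdwan-node-1 | awk '{ print $1 }'
```

That keeps the IP assignments in one place (the hosts file) instead of scattered through playbook variables.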
Because the adaptor type I like to use is Realtek, I had to change guestfish in the launch-vm script to use the adaptor name ens3. I also had to change it to use an external DNS server, because the lack of a DNS server was causing serious issues with the playbooks not running correctly, especially when they needed to locate a host by name (e.g. to do a yum install).
This Ansible setup has turned out to be very convenient. I can deploy VMs lickety-split now, freeing up the time I would normally spend tweaking and configuring individual VM instances.
I'm thinking of writing my own Ansible module for Quagga set up and configuration. That might be a project I get into.
Before I do that, I may enhance the playbooks a bit, adding some "when" clauses and things like that. So far everything I have done has been pretty vanilla.
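The kind of "when" clauses I have in mind look like this (task names and group names hypothetical):

```yaml
# Hypothetical conditional tasks: pick behavior by inventory group
# and by facts Ansible gathers about the host.
- name: Install Quagga only on hosts in the routers group
  yum:
    name: quagga
    state: present
  when: "'routers' in group_names"

- name: Skip the management-network setup on the KVM host itself
  template:
    src: mgmt-ifcfg.j2
    dest: /etc/sysconfig/network-scripts/ifcfg-ens3
  when: ansible_virtualization_role == "guest"
```

`group_names` and `ansible_virtualization_role` come from the inventory and from fact gathering, so the same playbook can serve every node type.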
Wednesday, July 5, 2017
Quagga Routing - OSPF and BGP
For the last month or two, I have been "learning by doing" routing, using Quagga.
Quagga uses an abstraction layer called Zebra that sits (architecturally not literally) on top of the various routing protocols that it supports (OSPF, BGP, RIP, et al).
I designed two geographically-separated clusters of OSPF routers - area 0.0.0.1 - and then joined them with a "backbone" of two OSPF routers in area 0.0.0.0. I hosted these on virtual machines running on a KVM host.
From there, I modified the architecture to use BGP to another colleague's KVM virtual machine host that ran BGP.
Some things we learned from this exercise:
1. We learned how to configure the routers using vtysh,
2. We had to make firewall accommodations for OSPF (which uses multicast) and BGP.
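The vtysh configuration on a border router looked roughly like this (addresses, area numbers, and AS numbers here are hypothetical, just to show the shape):

```
! Advertise the local area networks into OSPF, with one interface
! in the backbone area.
router ospf
 network 10.1.0.0/16 area 0.0.0.1
 network 10.0.0.0/24 area 0.0.0.0
!
! eBGP peering toward the colleague's KVM host.
router bgp 65001
 neighbor 192.0.2.2 remote-as 65002
 network 10.1.0.0/16
```

On the firewall side, that meant permitting IP protocol 89 for OSPF (including the 224.0.0.5 and 224.0.0.6 multicast groups) and TCP port 179 for the BGP sessions.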
We also used an in-house code project that tunnels Quagga. I have not examined the source for this, but that worked also, and required us to make specific firewall changes to allow tunx interfaces.
Percona XtraDB Cluster High Availability Training
Tuesday 5/23 through 5/26 I attended Percona Cluster High Availability training.
The instructor was very knowledgeable and experienced.
One of the topics we covered was the "pluggable", or module-based, architecture that allows different storage engines to be used with the MySQL database. He mentioned InnoDB, and how Percona's XtraDB is based on the InnoDB engine.
We also covered tools and utilities, not only from MySQL, but 3rd Parties, including Percona (Percona Toolkit).
We spent some time on Administration, such as Backup and Recovery.
We then moved on into Galera replication, and used VirtualBox images to create and manage clusters of 3 databases.
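The Galera side boils down to a handful of wsrep settings in my.cnf. A minimal sketch, with hypothetical node names and addresses (library path and SST method vary by version and platform):

```ini
[mysqld]
# Galera replication plugin and cluster membership.
wsrep_provider=/usr/lib64/galera3/libgalera_smm.so
wsrep_cluster_name=training-cluster
wsrep_cluster_address=gcomm://192.168.56.101,192.168.56.102,192.168.56.103
wsrep_node_name=node1
wsrep_node_address=192.168.56.101
wsrep_sst_method=xtrabackup-v2

# Galera requires row-based binlogging and InnoDB.
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
```

The first node is bootstrapped (with an empty `gcomm://`, or a bootstrap command), and the others state-transfer from it when they join.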
I won't reproduce a full week of rather complex training topics and details in this blog, but it was good training. I will need to go back and revisit / review this information so that it doesn't go stale on me.