Friday, November 15, 2019

SaltStack


I had heard of Puppet. I had heard of Chef. And I knew Ansible quite well because someone I know looked at all three (Puppet, Chef and Ansible) and chose Ansible for our organization.

I had never heard of Salt.

Until now.

Mirantis uses Salt to manage OpenStack infrastructure.

Having some familiarity with Ansible, it made sense to type into the search engine:
"ansible vs salt".

Well, sure enough. Someone has done a comparison.

Ansible vs Salt

What I see a number of people doing with Salt is running remote commands on nodes that they otherwise might not have access to. But recently, I have started looking more into Salt, and it appears to be architected quite similarly to Ansible, and is also quite powerful.

One of the features I have recently played around with is "Salt Grains". You can get all kinds of "grains of information" from a host with Salt Grains. In my case, I am calling Salt and telling it to give me all of the grains for all of the hosts in JSON format - and then I parse the JSON and make a CSV spreadsheet. Pretty cool.
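For reference, this is roughly the kind of invocation I mean (the target glob and output file name are just placeholders):

# pull every grain from every minion, as JSON
salt '*' grains.items --out=json > saltgraininfo.json

# or pull just the specific grains you care about
salt '*' grains.item serialnumber os --out=json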

There's a lot more. Like Salt States (roughly the equivalent of Ansible Playbooks, I think?). There are also Salt Pillars.

They use the "salt" theme pretty well in naming all of their extensions.

This link, called Salt in Ten Minutes, gives a pretty good overview.

Salt in Ten Minutes

This link, below, is quite handy for figuring out how to target your minions using regular expressions.
https://docs.saltstack.com/en/latest/topics/targeting/globbing.html#regular-expressions
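For example, something like this should work (the hostnames here are made up):

# -E tells salt to treat the target as a PCRE regular expression
salt -E 'web[0-9]+\.example\.com' test.ping

# plain globbing covers a lot of cases too
salt 'web*' grains.item os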

Analyzing Dropped Packets


I recently saw an alert, which created a ticket, for "over 100 dropped packets in 10s".

I thought this was interesting:

  • What KIND of packets?
  • What if the drops are intentional drops?
  • Is 100 dropped packets a LOT? 
    • How often is this happening?

Lots of questions.

This got me to take a quick look at a number of different Linux hosts, to see what the drop situation looked like on certain interfaces.

On one sample of hosts, I noticed that most of the drops were Rx packets.
On most of the other hosts, the drops were Tx packets.

In looking at netstat -s, you can get an amazing picture of exactly why packets are being dropped on a Linux host. It could be related to congestion control, like a socket buffer overrun (applications cannot read fast enough due to high CPU perhaps). Or, it could be dropped because it was supposed to be dropped - maybe there is a checksum error, windowing error, or a packet that should never have arrived in the first place.
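A couple of starting points I find useful (eth0 is just a placeholder interface name):

# interface-level drop counters
ip -s link show eth0

# protocol-level reasons for drops: prunes, overruns, discards
netstat -s | grep -i -E 'drop|prune|overrun|discard'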

One Network Engineer mentioned to me that some packets are dropped due to Packet Cloning, or Packet Redundancy, features. These features were enabled so that far-end routers and switches that lost a packet (for one reason or another) close to the destination didn't have to truck it all the way back to the source for a re-send. But when this feature is used, you can get a lot of dropped packets due to "de-duping", which could create false positives. Juniper has a document that describes their Packet Redundancy, or Packet Protection, feature - worth a blog post of its own.

This link below is also interesting when it comes to finding out how to debug dropped packets.
https://community.pivotal.io/s/article/Network-Troubleshooting-Guide

Here is another interesting link on the same topic.
https://jvns.ca/blog/2017/09/05/finding-out-where-packets-are-being-dropped/

Layer 2 Networking Configuration in Linux

I have not had a tremendous amount of exposure to Layer 2 Networking in Linux, or in general.

The SD-WAN product at my previous startup company has a Layer 2 Gateway that essentially would allow corporations to join LAN segments over a wide area network. So people sitting in an office in, say, Seattle, could be "theoretically" sitting next to some colleagues sitting in an office in, say, Atlanta. All on the same LAN segment. How the product did this is a separate discussion, since it involved taking Ethernet Frames and transporting / tunneling them across the internet (albeit in a very creative and very fast way due to link aggregation, UDP acceleration, multiple channels for delivering the tunneled packets, et al).

I only scratched the surface in terms of understanding the nuances of L2 with this. For example, I learned quite a bit about Bridging (from a Linux perspective). I learned a bit about Spanning Tree Protocol as well, and BPDUs.

I had heard about protocols like LLDP (Link Layer Discovery Protocol), and LACP (Link Aggregation Control Protocol), but since I was not dealing with commercial switches and things, I had no need for enabling, configuring, tuning or analyzing these protocols.

But - I am in an environment now, where these things start to matter a bit more. We run OpenStack hosts that connect to large Juniper switches. These Linux servers are using Link Aggregation and Bonding, and as such, are configured to use LACP to send PDUs to the switches.
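For illustration, a minimal CentOS 7 bonding configuration that speaks LACP looks something like this (device names and option values here are assumptions for the sketch, not our production config):

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=802.3ad miimon=100 lacp_rate=fast"
BOOTPROTO=none
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0 (one slave; repeat per member NIC)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes

Mode 802.3ad is the bonding mode that actually sends LACPDUs to the switch; the other bonding modes do not.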

LLDP is also enabled. With LLDP, devices advertise their information to directly-connected peers/neighbors. I found a good link that describes how to enable LLDP on Linux.
https://community.mellanox.com/s/article/howto-enable-lldp-on-linux-servers-for-link-discovery
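As a rough sketch of what that article walks through (eth0 is a placeholder, and this assumes the lldpad agent from the CentOS/RHEL repos):

yum install -y lldpad
systemctl enable lldpad && systemctl start lldpad

# enable sending and receiving LLDP frames on the interface
lldptool set-lldp -i eth0 adminStatus=rxtx

# advertise the system name TLV to the neighbor
lldptool -T -i eth0 -V sysName enableTx=yes

# show what the directly-connected neighbor (e.g. the Juniper switch) advertises
lldptool -t -n -i eth0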

This Juniper document does a pretty good job of discussing LACP.
Understanding LAG and LACP

Friday, November 8, 2019

Some handy sed commands for formatting a large concatenated json file

More on Salt later in a separate post, but I am using Salt to pull grains from a number of servers so that I can extract the grain items I need (in my case, serial number, operating system and so forth).

I ran into an issue where Salt concatenates the output into the file as a bunch of contiguous blocks of JSON.

When you try to load this into a json parser, it fails.

So in researching how to split this file, I ran into one particularly clever person on the web who said, "don't SPLIT the file! just make it an array of json elements".

I wonder if this guy knew how much wisdom he was dispensing.

So - I needed some sed to "prepare" this file.

And here it is:

#!/bin/bash

# This script will take the huge json file from Salt and make it something we can parse by
# making each json block an array element.

# Step 1 - add commas between blocks
sed -i 's/^}/},/g' saltgraininfo.json

# Step 2 - remove the trailing comma from the last line (couldn't fold this into the Step 1 sed)
sed -i '$s/,//g' saltgraininfo.json

# Step 3 - put a bracket in beginning of the file
sed -i '1s/^/[\n/' saltgraininfo.json

# Step 4 - put a bracket at the end of the file
sed -i '$s/}/}\n]/g' saltgraininfo.json
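
For clarity, the script effectively turns input shaped like this (minion names are made up):

{
    "minion1": { ...grains... }
}
{
    "minion2": { ...grains... }
}

into this, which any JSON parser will happily treat as a single array:

[
{
    "minion1": { ...grains... }
},
{
    "minion2": { ...grains... }
}
]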

After I did this, I used Python and did a json.load on the file, and voilà! It loads!

On to the parsing now....
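Here is a minimal sketch of that load-and-parse step (the grain names and file names are assumptions; adjust to whatever grains you actually want):

#!/bin/bash
# Load the repaired JSON array and emit a CSV of selected grains.
python - <<'EOF'
import csv
import json

with open("saltgraininfo.json") as f:
    blocks = json.load(f)  # now a list of {minion_id: {grains}} dicts

with open("saltgrains.csv", "w") as out:
    writer = csv.writer(out)
    writer.writerow(["host", "serialnumber", "os", "osrelease"])
    for block in blocks:
        for host, grains in block.items():
            writer.writerow([host, grains.get("serialnumber"),
                             grains.get("os"), grains.get("osrelease")])
EOF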

Friday, October 11, 2019

Beware: Swapping a NIC Card Out Changes the MAC Address

I observed an issue this week where a flapping NIC was replaced by a Dell technician.

When the Dell technician swapped out the NIC card (and left), the interfaces on the card would not come up and go into operation.

They were ABOUT to call Dell back in and swap out the motherboard, when I decided to wander over and take a look and get involved.

It is always important to remember that when you change out a NIC card, the MAC address CHANGES!

And you never know where that previous MAC address might have been used! Here are just a few ways a MAC address might be used:

  • an upstream DHCP server might be assigning an IP address based on MAC address
  • firewalls might be using the MAC address in certain rules and policies
  • interfaces in the OS (Linux in particular - especially CentOS) might not come up with a new MAC address
    • CentOS 7 has an HWADDR directive in the interface configuration files
    • scripts in rc.local or udev may be using the MAC address to do certain things
      • configure certain interfaces to bridges or bonds

In this particular case, a udev script was renaming a specific interface - based on MAC address - and assigning it to a NIC teaming configuration (bond). A couple of examples of the kind of thing to look for are sketched below.
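Both of these are hypothetical sketches (the MAC address and names below are made up):

# /etc/sysconfig/network-scripts/ifcfg-eth0 on CentOS 7 - this line must be
# updated after a NIC swap, or the interface will not come up
HWADDR=aa:bb:cc:dd:ee:ff

# /etc/udev/rules.d/70-persistent-net.rules - a rule renaming an interface by
# MAC address, similar in spirit to the one that bit us here
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="aa:bb:cc:dd:ee:ff", NAME="bond0slave0"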

Don't just swap NICs out! Figure out who might be paying attention to MAC addresses before swapping! It can pay dividends.

Friday, September 27, 2019

Vector Packet Processing - Part IV - Testing and Verification

As I work through the documentation on fd.io, it discusses Installation, and then there is a section called "Running VPP".

The first error I encountered in this documentation had to do with running the VPP Shell. The documentation said to run the following command: "sudo vppctl -s run/vpp/cli-vpp.sock"

On a CentOS 7 installation, the socket file is actually called "cli.sock", not "cli-vpp.sock". So in correcting this, indeed, I see a CLI shell, which I will show further down.
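In other words (socket path assumed from the default CentOS 7 install):

# what the docs say:
sudo vppctl -s run/vpp/cli-vpp.sock

# what actually works:
sudo vppctl -s /run/vpp/cli.sock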

So there is a CLI to master here, and to be any kind of guru, one will need to learn it. It does look like a more or less "standardized" CLI, with syntax that includes the familiar "show", "set", etc. I ran the "help" command to get a dump of commands, which showed a hefty number of sub-commands to potentially learn.

I decided to run a fairly simple "show interface" command, to see what that would produce. And, here is the result of that.

"show interface" results from VPP shell CLI - all interfaces down
So the CLI sees 4 Gigabit Ethernet interfaces, all in a state of "down". 

This server has two dual-port NIC cards, so it makes sense to me that there would be two found on GigabitEthernet1. Why there is only a single interface found on GigabitEthernet3, I need to look into (seems there should also be two of these). The local0 interface, I presume, is a NIC that is on the motherboard (I could see people confusing local0 with a loopback). 

If you proceed with the fd.io documentation, it actually instructs you to set up a veth pair - not the actual physical NICs on the box - and create interfaces that way, enable them, and then do some tracing. It probably makes sense to do that before trying to bring these Gigabit Ethernet NICs up and test those. Why? Well, for one reason, you could knock your connectivity to the server out, which would be bad. So let's leave our physical NICs alone for the time being.

So next step, we will run the veth steps and the tracing steps from the fd.io website.
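For reference, a rough sketch of those veth steps (interface names and the IP address are placeholders, in the spirit of the fd.io examples):

# create a veth pair in Linux; VPP will grab one end, Linux keeps the other
sudo ip link add name vpp1out type veth peer name vpp1host
sudo ip link set dev vpp1out up
sudo ip link set dev vpp1host up

# attach the vpp1out end to VPP as a host-interface, then bring it up
sudo vppctl create host-interface name vpp1out
sudo vppctl set interface state host-vpp1out up
sudo vppctl set interface ip address host-vpp1out 10.10.1.1/24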

Then, after that, I noticed there is a VPP Testing site on GitHub.

https://github.com/FDio/vpp/tree/master/test

It is written in Python, so you could run the repo's Makefile targets and, hopefully, run these tests easily.

Vector Packet Processing - Part III - Ensuring a Supported PCI Device

Okay - today I took a quick look into why the "Unsupported PCI Device" errors were popping up when I started the VPP service.

It turns out that the Realtek network adaptors on that server are, in fact, not supported! Duh. This has nothing to do with VPP itself. It has to do with the underlying Data Plane Development Kit (DPDK), on top of which VPP sits (in other words, VPP uses DPDK libraries).

The DPDK site lists the adaptors that are supported on this page of their website, entitled "Supported Hardware".
http://core.dpdk.org/supported/

Sure enough, no long-in-the-tooth Realtek NICs are listed here.
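If you want to check what a box has before going any further, something like this works (the dpdk-devbind.py script ships with DPDK; its install path varies):

# what NICs does this server actually have?
lspci -nn | grep -i ethernet

# which devices can DPDK see and bind?
dpdk-devbind.py --status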

So what would you do (on that server) to test and experiment with VPP?

  1. Well, you could swap out the adaptors. If you do that, you better make sure you think about static IP assignments based on MAC address, because all of your MACs will change.
  2. You could use a virtual adaptor that is supported.
  3. Or, you could simply find another server.

I chose the third option. And this server is using Intel adaptors that ARE supported.

VPP Startup with Supported Adaptors
Next, I ran the "vppctl list plugins" command, which dumped out a ton of .so (shared object) files. 

These files are shared libraries, essentially. Rather than linking stuff into all of the binaries (making them larger), a shared object or shared library accommodates multiple binaries using the code (they get their own local data segments but share a pointer to the code segment - as I understand it). 
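You can poke at this yourself; something like the following should illustrate it (the plugin path is an assumption based on a typical CentOS 7 package layout):

# the VPP plugins are just .so files on disk
ls /usr/lib/vpp_plugins/*.so | head -5

# and the main binary itself links against shared libraries the same way
ldd /usr/bin/vpp | head -10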

So - it looks like we have a working VPP service on this server. Yay. What next? Well, here are a couple of possibilities:

1. OpenStack has a Neutron VPP driver. That could be interesting to look into, and see what it's all about, and how well it works.

2. Maybe there are some ways of using or testing VPP in a standalone way. For example, some test clients. 

I think I will look into number 2 first. At this point, I am only interested in functional testing here. I am not doing any kind of performance bake-offs. Not even sure I have the environment and tools for that right now. We're just learning here.