Friday, September 14, 2018

Kubernetes - Firewall Rules Causing the Kubernetes API Server to Crash


We hired a guy in here who knows a lot about Docker and Apache Mesos. He also has some Kubernetes expertise (a lot more expertise than I have).

I was showing him an "annoying phenomenon" in which I would repeatedly get "Connection Refused" errors printing in a loop in Syslog, on port 6443 (which is the Kubernetes api-server).

We did a TON of debugging on this, and I'm STILL not clear we have pinpointed this issue, but I think the issue has "something" to do with FirewallD and iptables.

What we wound up doing that SEEMS to have fixed the issue, is this:

1. Build a BRAND SPANKING NEW CentOS 7 Virtual Machine (from ISO)

2. Reinstall Packages from Scratch

3. Install a set of Firewall Rules


It turns out that the firewall rules in this architecture are rather complex. Docker puts in a set of Firewall Rules, Kubernetes puts in its own set of rules, and then on top of that there are some rules I see being added that are *not* added by default.

For the Master:
port 6443/tcp
port 2379-2380/tcp
port 10250/tcp
port 10251/tcp
port 10252/tcp
port 10255/tcp

For the Worker Nodes:
port 10250/tcp
port 10255/tcp
port 30000-32767/tcp
port 6783/tcp

Getting familiar with what uses what ports and why is an important part of understanding this kind of technology. 6443 is obviously the api-server. The other ports, honestly, I need to look up and get a better understanding of.

Now in FirewallD, you can NOT put these rules in the direct.xml file. I did that, thinking that was the way to go, and they did not work (I have not debugged why). I had to put each rule in with:
firewall-cmd --permanent --add-port=XXXX/tcp (and then do a firewall-cmd --reload at the end so they apply).

Putting the rules in this way puts the rules into the default zone, which is public with FirewallD. I would imagine if you monkeyed around with your zones, you could easily break these rules and they wouldn't work anymore. So Firewalling with this technology is nothing to take lightly.

No comments:

Fixing Clustering and Disk Issues on an N+1 Morpheus CMP Cluster

I had performed an upgrade on Morpheus which I thought was fairly successful. I had some issues doing this upgrade on CentOS 7 because it wa...