
Tuesday, March 1, 2022

VMware Clustered File Systems - VMFS5 vs VMFS6

 

 A nice table that describes the differences between VMware's VMFS5 and the newer VMFS6.

Source: http://www.vmwarearena.com/difference-between-vmfs-5-vmfs-6/


For the difference between 512n and 512e: 512n drives use native 512-byte physical sectors, while 512e drives use 4K physical sectors but emulate 512-byte logical sectors for compatibility.


VMFSsparse:

VMFSsparse is a virtual disk format used when a VM snapshot is taken or when linked clones are created from the VM. VMFSsparse is implemented on top of VMFS, and I/Os issued to a snapshot VM are processed by the VMFSsparse layer. VMFSsparse is essentially a redo-log that grows from empty (immediately after a VM snapshot is taken) to the size of its base VMDK (when the entire VMDK is re-written with new data after the snapshot is taken). This redo-log is just another file in the VMFS namespace, and upon snapshot creation the base VMDK attached to the VM is changed to the newly created sparse VMDK.

SEsparse (space efficient):

SEsparse is a newer virtual disk format that is similar to VMFSsparse (redo-logs), with some enhancements and new functionality. One difference between SEsparse and VMFSsparse is the block size: 4KB for SEsparse compared to 512 bytes for VMFSsparse. Most of the performance aspects of VMFSsparse discussed above - impact of I/O type, snapshot depth, physical location of data, base VMDK type, etc. - apply to the SEsparse format as well.
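The grow-from-empty behavior described above can be mimicked on any Linux box with an ordinary sparse file - logically full-sized, but consuming disk space only for blocks that have actually been written. This is just a loose analogy, not VMware tooling:

```shell
# Create a "redo log" that is logically 1 GiB but uses (almost) no disk.
truncate -s 1G redo.img
du -h --apparent-size redo.img   # logical size: 1.0G
du -h redo.img                   # actual usage: ~0

# Write 1 MiB of "changed blocks" (256 x 4 KiB, SEsparse-style grain size).
dd if=/dev/urandom of=redo.img bs=4K count=256 conv=notrunc,fsync

du -h redo.img                   # usage grows only by what was written
```

The file stays the same logical size throughout; only the allocated blocks grow as data is written, which is essentially what happens to a sparse VMDK as the guest re-writes its base disk.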

Monday, October 18, 2021

HAProxy - Aggravating Problem I Have Not Solved

I have never really blogged about proxies. I don't have a lot of proxy experience and don't consider myself a guru with proxies, load balancers, etc.

But more and more often, solutions come in that require load distribution across an N+1 (active-active) cluster. HAProxy is supposed to be a rather lightweight and simple approach, especially in situations where the mission is not totally critical, or the load is not especially high.

I originally set HAProxy up to distribute load to a Cloudify cluster, and Cloudify provided the HAProxy configuration that they had tested in their lab and knew worked well. Later, I set HAProxy up to load balance our Morpheus cluster. Initially it was working fine.

Or, so it seemed. Later, I noticed errors. The first thing you generally do when you see errors is tell HAProxy to use one node (not two or three), so that you can reduce troubleshooting complexity and examine the logs on just one back-end node. In doing this, I quickly figured out that if I told HAProxy to use one back-end node, things worked fine. When I told HAProxy to use two or more back-end nodes, things didn't work.

So that's where it all started.

The Problem
Below is a picture of what we are doing with HAProxy. Web access comes in on the northbound side of the picture, and web access is not the problem we are having. The problem is that VMs deployed by Morpheus onto various internal networks "phone home", and they phone home on a different network interface.

This works fine with a single back-end enabled. But if you enable more than one back-end in HAProxy, Morpheus fails to fully transition the state of the VM to "running".


HAProxy Flow
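For what it's worth, the works-with-one-backend, fails-with-two pattern is often a symptom of missing session persistence. Purely as an illustrative sketch - the backend name, node IPs, and ports below are made up, not our actual Morpheus configuration - stickiness in HAProxy can look like this:

```
backend morpheus_nodes
    mode http
    balance roundrobin
    # pin each client to the node that first served it
    cookie SERVERID insert indirect nocache
    server node1 10.0.0.11:443 check ssl verify none cookie n1
    server node2 10.0.0.12:443 check ssl verify none cookie n2
```

Cookie insertion only works when HAProxy is operating in HTTP mode and can inspect the traffic; for pure TCP (or TLS-passthrough) frontends, "balance source" is the usual fallback for affinity.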

In testing this out a bit and dumping traffic, we initially noticed something interesting: the source IP coming into each Morpheus node was not the HAProxy VIP - it was the HAProxy node's interface IP address. We wound up addressing this by telling Keepalived to delete and re-create the routes with the VIP as the source IP - but only when it had control of the VIP. In the end, while this made traffic analysis (tcpdump on the Morpheus nodes) clearer about the traffic flow, it did not solve the actual issue.
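For reference, here is a sketch of how that kind of Keepalived hook can be wired up. The VIP, device, and back-end network below are placeholders, and Keepalived's notify_master / notify_backup options would be pointed at this script so it runs on VRRP state changes:

```shell
#!/bin/sh
# vip-src.sh - called by keepalived (notify_master / notify_backup).
# Rewrites the route to the back-end network so outbound traffic is
# sourced from the VIP, but only while this node owns the VIP.
VIP=10.0.0.10            # placeholder VIP
NET=10.20.0.0/24         # placeholder back-end network
DEV=eth0                 # placeholder interface

if [ "$1" = "master" ]; then
    # we own the VIP: prefer it as the source address for this network
    ip route replace "$NET" dev "$DEV" src "$VIP"
else
    # we lost the VIP: fall back to the interface address
    ip route replace "$NET" dev "$DEV"
fi
```

The "src" hint on an iproute2 route is what makes the kernel prefer the VIP as the source address for locally originated traffic toward that network.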

I STILL don't know why it works with one back-end and not two or more. I have had proxy experts in our organization come in and look, and they think HAProxy is doing its job properly and that the issue is in the back-end clustering. The vendor, however, is telling us the issue is with HAProxy.

Our next step may be to configure a different load balancer, which should definitely rule things out. I know Squid is quite popular, but these tools do have a learning curve, and I have zero experience with Squid. If we wind up going with another one, I think we may use a NetScaler load balancer.

I should mention that the HAProxy configuration is not the simplest, and as a result of working through it, I have increased my general knowledge of load balancing.


Monday, October 15, 2018

Kubernetes Part VI - Helm Package Manager

This past week, a colleague introduced me to something called Helm, which is to Kubernetes sort of what "pip" is to Python. It manages Kubernetes packages (it is a Kubernetes package manager).

The reason this was introduced:
We found SEVERAL GitHub repos with Prometheus metrics in them, and they were not at all consistent.
  • Kubernetes had one
  • There was another one at a stefanprod project
  • There was yet a third called "incubator"
My colleague, through relentless research, figured out (or decided) that the one installed through Helm was the best one.

This meant I had to understand what Helm is. Helm is divided into a client (Helm), a server (Tiller), and the packages you install (Charts). I guess it's a maritime-themed concept, although I don't know why they can't just call a package a package (copyright reasons, maybe?).
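Mapping those terms to commands - this is Helm 2-era syntax, where the helm client talks to an in-cluster Tiller:

```shell
helm init              # install the Tiller server into the cluster
helm repo update       # refresh chart repository indexes
helm search prometheus # look for charts matching a keyword
helm list              # show releases (charts installed into the cluster)
```

These obviously need a working cluster and kubeconfig behind them; they are just the basic vocabulary of the tool.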

So - I installed Helm on my Kubernetes master, and that went smoothly enough. I also downloaded a bunch of charts from the stable branch on GitHub (Prometheus is one of these). These all sit in a /stable directory (after the git clone, ./charts/stable).

When I came back and wanted to pick up where I left off, I wasn't sure whether I had installed Prometheus or not. So I ran "helm list", and got the following error:

Error: configmaps is forbidden: User "system:serviceaccount:kube-system:default" cannot list configmaps in the namespace "kube-system"

Yikes. For a newbie, this looked scary. Fortunately Google had a fix for this on a StackOverflow page.

I had to run these commands:
# create a service account for Tiller to run under
kubectl create serviceaccount --namespace kube-system tiller
# grant that service account cluster-admin rights
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
# point the existing Tiller deployment at the new service account
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
# re-initialize Helm against the Tiller service account
helm init --service-account tiller --upgrade
 
These seemed to work. The "helm list" command showed no results, though, so I guess I do need to install the Prometheus package (sorry... chart) with Helm after all.
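In Helm 2 syntax, that install would be something like the following (the release name "prometheus" is arbitrary - Helm generates a random one if you omit --name):

```shell
helm install --name prometheus stable/prometheus   # install the chart as a release
helm list                                          # the release should now appear
```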

But, more importantly, I really need to take some time to understand what we just ran above - the service account, the cluster role binding, the deployment patch, et al.

 
 


