Thursday, August 13, 2020

NIC Teaming vs Active-Active NIC Bonding - differences - which is better?

I just went down a path of discovery trying to fully understand the differences between Bonding and NIC Teaming.

Bonding, of course, has a concept of "bonding modes" that allow you to use NICs together for a failover purpose in active-standby mode, and even active-active failover. When using these modes, however, the focus is not gluing the NICs together to achieve linear increases in bandwidth (i.e. 10G + 10G = 20G). To get into true link aggregation, you need to use different bonding modes that are specifically for that purpose. I will include a link that discusses in detail the bonding modes in Linux:

Linux Bonding Modes 

So what is the difference, between using Bonding Mode 4 (LACP Link Aggregation), or Bonding Mode 6 (Adaptive Load Balancing), and NIC Teaming? 

I found a great link that covers the performance differences between the two.

https://www.redhat.com/en/blog/if-you-bonding-you-will-love-teaming

At the end of the day, it comes down to the drivers and how well they're written of course. But for Red Hat 7 at least, we can see the following.

The performance is essentially "six of one half dozen of another" on RHEL 7, on a smaller machine with a 10G Fiber interface. But if you look carefully, while NIC Teaming provides you gains on smaller packet sizes, as packet sizes start to get large (64KB or higher), the Bonding starts to give you some gains.

I'll include the link and the screen snapshot. Keep in mind, I did not run this benchmark myself here. I am citing an external source for this information, found at the link below:

Friday, August 7, 2020

LACP not working on ESXi HOST - unless you use a Distributed vSwitch

Today we were trying to configure two NICs on an ESXi host in an Active-Active state, such that they would participate in a LAG using LACP with one NIC connected to one TOR (Top of Rack) Switch and the other connected to another separate TOR switch.

It didn't work.

There was no way to "bond" the two NICs (as you would typically do in Linux). ESXi only supported NIC Teaming. Perhaps only the most advanced networking folks realize, that NIC Teaming is not the same as NIC Bonding (we won't get into the weeds on that). And NIC Teaming and NIC Bonding are not the same as Link Aggregation.

So after configuring NIC Teaming, and enabling the second NIC on vSwitch0, poof! Lost connectivity.

Why? Well, ESXi runs Cisco Discovery Protocol (CDP). Not LACP, which the switch requires. So without LACP, there is no effective LAG, and the switch gets confused.

Finally, we read that in order to use LACP, you needed to use vDS - vmWare Distributed Switch. 

Huh? Another product? To do something we could do on a Linux box with no problems whatsoever?

Turns out, that to run vDS, you need to run vCenter Server. So they put the Distributed Switch on vCenter Server?

Doesn't that come at a performance cost? Just so they can charge licensing?

I was not impressed that I needed to use vCenter Server just to put 2 NICs on a box on a Link Aggregation Group.

SLAs using Zabbix in a VMware Environment

 Zabbix 7 introduced some better support for SLAs. It also had better support for VMware. VMware, of course now owned by BroadSoft, has prio...