Showing posts with label NSX-T.

Friday, August 18, 2023

Recovering a Corrupted NSX-T Manager

If your NSX-T Manager is running as a cluster of VMs and one of them is corrupted, there is a good chance they all are, particularly if the root cause was storage connectivity. Or maybe it is just the one. If you don't have a backup to restore, the steps below can be used to attempt a file system repair. Mileage varies when repairing file systems, so there is no guarantee this will work, but the process is worth attempting nonetheless.

1.  Connect to the console of the appliance.

2.  Reboot the system.

3.  When the GRUB boot menu appears, quickly press the left SHIFT or ESC key. If you wait too long and the boot sequence does not pause, you must reboot the system again.

4.  Keep the cursor on the Ubuntu selection.

5.  Press e to edit the selected option.

6.  Enter the user name (root) and the GRUB password for root (not the same as the appliance's root user password): "VMware1" before release 3.2, and "NSX@VM!WaR10" for 3.2 and beyond.

7.  Search for the line starting with "linux", which contains the boot command.

8.  Remove all options after root= (starting from UUID) and add "rw single init=/bin/bash".

9.  Press Ctrl-X to boot.

10.  When the log messages stop, press Enter. You will see the prompt root@(none):/#.

11.  Run the following commands to repair the file systems:

  • e2fsck -y /dev/sda1
  • e2fsck -y /dev/sda2
  • e2fsck -y /dev/sda3
  • e2fsck -y /dev/mapper/nsx-config
  • e2fsck -y /dev/mapper/nsx-image
  • e2fsck -y /dev/mapper/nsx-var+log
  • e2fsck -y /dev/mapper/nsx-repository
  • e2fsck -y /dev/mapper/nsx-secondary
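
If you prefer, the same repairs can be run as a small loop from the emergency shell. This is just a sketch, assuming the device names listed above (the device layout can differ between NSX-T versions):

#!/bin/bash
# Repair each NSX-T Manager file system in turn; -y answers yes to every prompt.
for dev in /dev/sda1 /dev/sda2 /dev/sda3 \
           /dev/mapper/nsx-config /dev/mapper/nsx-image \
           /dev/mapper/nsx-var+log /dev/mapper/nsx-repository \
           /dev/mapper/nsx-secondary; do
    e2fsck -y "$dev"
done

When the repairs complete, reboot the appliance and verify the cluster with get cluster status from the NSX CLI.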

Friday, January 13, 2023

Debugging Dropped Packets on NSX-T E-NVDS

 

Inside the hypervisor, the servers have the following physical NICs:

  • vmnic0 – 1G NIC, Intel X550 – unused
  • vmnic1 – 1G NIC, Intel X550 – unused
  • vmnic2 – 10G NIC, SFP+, Intel XL710, driver version 2.1.5.0, FW version 8.50, link state up
  • vmnic3 – 10G NIC, SFP+, Intel XL710, driver version 2.1.5.0, FW version 8.50, link state up
  • vmnic4 – 10G NIC, SFP+, Intel XL710, driver version 2.1.5.0, FW version 8.50, link state up
  • vmnic5 – 10G NIC, SFP+, Intel XL710, driver version 2.1.5.0, FW version 8.50, link state up

The NICs connect physically to the upstream switches (Aristas), and virtually to the virtual switches discussed just below:

 

Inside the hypervisor (Host 5 in this specific case):

Distributed vSwitch

                Physical NIC side: vmnic2 and vmnic4

                Virtual side: vmk0 (VLAN 3850) and vmk1 (VLAN 3853)

 

NSX-T Switch (E-NVDS)

                Physical NIC side: vmnic3 and vmnic5 → these are the NICs that get hit when we run the load tests

                Virtual side: 50+ individual segments that VMs connect to and get assigned a port on

 

Now, in my previous email, I dumped the stats for the physical NIC – meaning, from the "NIC itself" from the perspective of the ESXi operating system.

 

But it is also wise to take a look at the stats of the physical NIC from the perspective of the virtual switch! Remember, vmnic5 is a port on the virtual switch!

 

So first, we need to figure out which port we need to look at:
net-stats -l

PortNum          Type SubType SwitchName       MACAddress         ClientName

2214592527          4       0 DvsPortset-0     40:a6:b7:51:56:e9  vmnic3

2214592529          4       0 DvsPortset-0     40:a6:b7:51:1b:9d  vmnic5 → here we go: port 2214592529 on switch DvsPortset-0 is the port of interest

67108885            3       0 DvsPortset-0     00:50:56:65:96:e4  vmk10

67108886            3       0 DvsPortset-0     00:50:56:65:80:84  vmk11

67108887            3       0 DvsPortset-0     00:50:56:66:58:98  vmk50

67108888            0       0 DvsPortset-0     02:50:56:56:44:52  vdr-vdrPort

67108889            5       9 DvsPortset-0     00:50:56:8a:09:15  DEV-ISC1-Vanilla3a.eth0

67108890            5       9 DvsPortset-0     00:50:56:8a:aa:3f  DEV-ISC1-Vanilla3a.eth1

67108891            5       9 DvsPortset-0     00:50:56:8a:9d:b1  DEV-ISC1-Vanilla3a.eth2

67108892            5       9 DvsPortset-0     00:50:56:8a:d9:65  DEV-ISC1-Vanilla3a.eth3

67108893            5       9 DvsPortset-0     00:50:56:8a:fc:75  DEV-ISC1-Vanilla3b.eth0

67108894            5       9 DvsPortset-0     00:50:56:8a:7d:cd  DEV-ISC1-Vanilla3b.eth1

67108895            5       9 DvsPortset-0     00:50:56:8a:d4:d8  DEV-ISC1-Vanilla3b.eth2

67108896            5       9 DvsPortset-0     00:50:56:8a:67:6f  DEV-ISC1-Vanilla3b.eth3

67108901            5       9 DvsPortset-0     00:50:56:8a:32:1c  DEV-MSC1-Vanilla3b.eth0

67108902            5       9 DvsPortset-0     00:50:56:8a:e6:2b  DEV-MSC1-Vanilla3b.eth1

67108903            5       9 DvsPortset-0     00:50:56:8a:cc:eb  DEV-MSC1-Vanilla3b.eth2

67108904            5       9 DvsPortset-0     00:50:56:8a:7a:83  DEV-MSC1-Vanilla3b.eth3

67108905            5       9 DvsPortset-0     00:50:56:8a:63:55  DEV-MSC1-Vanilla3a.eth3

67108906            5       9 DvsPortset-0     00:50:56:8a:40:9c  DEV-MSC1-Vanilla3a.eth2

67108907            5       9 DvsPortset-0     00:50:56:8a:57:8f  DEV-MSC1-Vanilla3a.eth1

67108908            5       9 DvsPortset-0     00:50:56:8a:5b:6d  DEV-MSC1-Vanilla3a.eth0
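
A note on reading this output: Type 4 ports are physical uplinks, Type 3 are vmkernel interfaces, and Type 5 are VM vNICs (this matches the clientType decoding in the port status dump further down).

With the port number in hand, we can pull that port's counters from the vsish shell. A quick sketch of the navigation:

vsish                                             # enter the interactive vsish shell
cd /net/portsets/DvsPortset-0/ports/2214592529/   # the vmnic5 port found above
cat stats                                         # dump the packet counters, as shown below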

 

/net/portsets/DvsPortset-0/ports/2214592529/> cat stats

packet stats {

   pktsTx:10109633317

   pktsTxMulticast:291909

   pktsTxBroadcast:244088

   pktsRx:10547989949 → total packets RECEIVED on vmnic5's port on the virtual switch

   pktsRxMulticast:243731083

   pktsRxBroadcast:141910804

   droppedTx:228

   droppedRx:439933 → this is a lot more than the 3,717 Rx Missed errors, and probably accounts for why MetaSwitch sees more drops than we saw up to this point!

}
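
For scale, 439,933 dropped receives against 10,547,989,949 received works out to roughly 0.004% loss. That is tiny as a percentage, but for real-time RTP media even fractional loss is enough to show up in a load test.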

 

So, we have TWO things to examine here.

  • Is Receive Side Scaling configured properly, and is it working?
    • We configured it, but we need to make sure it is actually working, and working properly.
    • We don't see all of the Rx queues getting packets; each Rx queue should be getting its own CPU (see the vsish sketch after this list).
  • Once packets get into the ring buffer and are passed through to the VM (the poll mode driver picks the packets up off the ring), they hit the virtual switch.
    • And the switch is dropping some packets.
    • Virtual switches are software. As such, they need to be tuned to keep up with what legacy hardware switches can do.
      • The NSX-T switch is a powerful switch, but it is also a newer virtual switch, more bleeding edge in terms of technology.
      • I wonder whether we are running the latest and greatest version of this switch, and whether that could help us here.
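
One way to verify the RSS side is to look at the per-queue structures that vsish exposes for the pNIC. A sketch, assuming the usual /net/pNics paths (the exact queue layout varies by driver and version):

vsish -e ls /net/pNics/vmnic5/rxqueues/queues/   # one node per allocated Rx queue
vsish -e get /net/pNics/vmnic5/rxqueues/info     # NetQueue/RSS summary for the NIC
vsish -e get /net/pNics/vmnic5/stats             # per-NIC counters as ESXi sees them

If only one queue's counters are moving, RSS is configured but not actually spreading the load across CPUs.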

 

Now, I looked even deeper into the E-NVDS switch. I went into the vsish shell and started examining any and all statistics captured by that networking stack.

 

Since we are concerned with receives, I looked at the InputStats specifically. I noticed there are several filters, which I presume are tied to a VMware packet filtering flow, analogous to Netfilter in Linux, or perhaps the Berkeley Packet Filter. But I have no documentation on this and can't find any, so I did my best to "back into" what I was seeing.

 

I see the following filters that packets can traverse. Traceflow is presumably packet capture, but beyond that I am not sure:

  • ens-slowpath-input
  • traceflow-Uplink-Input:0x43110ae01630
  • vdl2-uplink-in:0x431e78801dd0
  • UplinkDoSwLRO@vmkernel#nover
  • VdrUplinkInput

 

If we go down into the filters and print out the stats, most of them seem to line up (started = passed, etc.), except this one, which has drops in it:

/net/portsets/DvsPortset-0/ports/2214592529/inputFilters/vdl2-uplink-in/> cat stats
packet stats {
   pktsIn:31879020
   pktsOut:24269629
   pktsDropped:7609391
}

/net/portsets/DvsPortset-0/ports/2214592527/inputFilters/vdl2-uplink-in/> cat stats
packet stats {
   pktsIn:24817038
   pktsOut:17952829
   pktsDropped:6864209
}
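
As a sanity check, these counters are internally consistent: pktsIn - pktsOut = pktsDropped on both ports (31,879,020 - 24,269,629 = 7,609,391, and 24,817,038 - 17,952,829 = 6,864,209). That puts the drop rate at this filter at roughly 24% and 28% respectively.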
 

That seems like a lot of dropped packets to me (a LOT more than those Rx Missed errors), so this looks like something we need to work with VMware on, because if I understand these stats properly, they suggest an issue on the virtual switch more than on the adaptor itself.

 

Another thing I saw while poking around was this interesting-looking WRONG_VNIC passthrough status on vmnic3 and vmnic5, the two NICs being used in this test. I think we should ask VMware about this and run it down as well.

 

/net/portsets/DvsPortset-0/ports/2214592527/> cat status
port {
   port index:15
   vnic index:0xffffffff
   portCfg:
   dvPortId:4dfdff37-e435-4ba4-bbff-56f36bcc0779
   clientName:vmnic3
   clientType: 4 -> Physical NIC
   clientSubType: 0 -> NONE
   world leader:0
   flags: 0x460a3 -> IN_USE ENABLED UPLINK DVS_PORT DISPATCH_STATS_IN DISPATCH_STATS_OUT DISPATCH_STATS CONNECTED
   Impl customized blocked flags:0x00000000
   Passthru status: 0x1 -> WRONG_VNIC
   fixed Hw Id:40:a6:b7:51:56:e9:
   ethFRP:frame routing {
      requested:filter {
         flags:0x00000000
         unicastAddr:00:00:00:00:00:00:
         numMulticastAddresses:0
         multicastAddresses:
         LADRF:[0]: 0x0
         [1]: 0x0
      }
      accepted:filter {
         flags:0x00000000
         unicastAddr:00:00:00:00:00:00:
         numMulticastAddresses:0
         multicastAddresses:
         LADRF:[0]: 0x0
         [1]: 0x0
      }
   }
   filter supported features: 0 -> NONE
   filter properties: 0 -> NONE
   rx mode: 0 -> INLINE
   tune mode: 2 -> invalid
   fastpath switch ID:0x00000000
   fastpath port ID:0x00000004
}

Tuesday, January 10, 2023

VMware NSX-T Testing - Dropped Packets

We have been doing some performance testing with a voice system.

In almost all cases, these tests are failing. They are failing for two reasons:

  1. Rx Missed counters on the physical adaptors of the hypervisors used to send the test traffic (a sketch for reading these counters follows this list). These adaptors are connected to the E-NVDS virtual switch on one side, and to an upstream Arista data center switch on the other.
  2. Dropped Packets - mostly media (RTP UDP), with less than 2% of the drops being RTCP traffic (TCP).
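
For reference, the Rx Missed counter mentioned in (1) can be read per NIC from the ESXi shell. A minimal example, with vmnic5 standing in for whichever uplink carries the test traffic:

esxcli network nic stats get -n vmnic5   # output includes a "Receive missed errors" line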

Recently, I used the "Performance Best Practices for VMware vSphere 7.0" guide as a source of ideas for reducing the dropped packets we are seeing.

We attempted several things that were mentioned in this document:

 

  • ESXi NIC - enable Receive Side Scaling (RSS)
    • Actually, to be technical, we enabled DRSS (Default Queue RSS) rather than the NetQ RSS that the i40en driver also supports for this Intel X710 adaptor.
  • LatencySensitivity=High - and we checked the "Reserve all Memory" checkbox
  • Interrupt Coalescing (a sketch of these settings follows this list)
    • Disabling it, to see what effect disabling it had
    • Setting it from its rate-based scheme (the default, rbc) to static, with 64 packets per interrupt
We didn't really see any noticeable improvement from the Receive Side Scaling or the Latency Sensitivity settings, which was actually a surprise. We did see some minor improvement from interrupt coalescing when we set it to static.

Thursday, April 14, 2022

IP MAC Discovery on NSX-T

We had a deployment where two customer VMs were deployed as an Active-Standby cluster, and the failover wasn't working when they tested it.

I had already deployed a fully working pair of Active-Standby virtual machines using Keepalived, so I knew that VRRP worked. Now, I am not sure the customer is using VRRP per se, but the concept of Active-Standby failover remains constant whether or not both of us were using strictly RFC-compliant VRRP.

So what was the difference between these customer VMs and our VMs?

Well, the difference was that I was running my VMs on VLAN-backed network segments that were jacked into (legacy) vCenter / ESXi Distributed Port Groups. The customer's VMs were jacked into NSX-T virtual switches (overlay segments).

So after re-verifying my VRRP failover (which worked flawlessly in both multicast and unicast peering configurations), the problem seemed to be traced back to NSX-T.

Was it MAC spoofing? Was it a firewall? NSX-T does run an overlay firewall! And these firewalls operate at the segment level, but also at the Transport Zone (Tier 1 router) level. Sure enough, we realized that the Tier 1 firewall was dropping packets on failover attempts.

After much testing, it was concluded that it was related to TOFU on the IP Discovery Switching Profile.

From this VMware link, we get some insight on this:

Understanding IP Discovery Switching Profile

By default, the discovery methods ARP snooping and ND snooping operate in a mode called trust on first use (TOFU). In TOFU mode, when an address is discovered and added to the realized bindings list, that binding remains in the realized list forever. TOFU applies to the first 'n' unique <IP, MAC, VLAN> bindings discovered using ARP/ND snooping, where 'n' is the binding limit that you can configure. You can disable TOFU for ARP/ND snooping. The methods will then operate in trust on every use (TOEU) mode. In TOEU mode, when an address is discovered, it is added to the realized bindings list and when it is deleted or expired, it is removed from the realized bindings list. DHCP snooping and VM Tools always operate in TOEU mode.

So guess what? After disabling this profile (effectively disabling TOFU mode), TOEU mode kicked in and, lo and behold, the customer's failover started working.

Monday, October 4, 2021

The first Accelerated VNF on our NFV platform

I haven't posted anything since April, but that isn't because I haven't been busy.

We have our new NFV platform up and running, and it is NOT on OpenStack. It is NOT on VMware VIO. It is also NOT on VMware Telco Cloud!

We are using ESXi, vCenter, NSX-T for the SD-WAN, and Morpheus as the Cloud Management solution. Morpheus has a lot of different integrations, and a great user interface that gives tenants a single place to log in and self-manage their resources.

The diagram below depicts what this looks like from a Reference Architecture perspective.

The OSS, which is not covered in the diagram, is a combination of Zabbix and vROps, both working in tandem to ensure that the clustered hosts and management functions are behaving properly.

The platform is optimized with E-NVDS, commonly referred to as Enhanced Datapath, which requires special DPDK drivers to be loaded on the ESXi hosts as well as additional configuration on the hypervisors to ensure that the E-NVDS is set up properly (separate upcoming post).

Now that the platform is up and running, it is time to start discussing workload types. There are a number of Workload Categories that I tend to use:

  1. Enterprise Workloads - Enterprise Applications, 3-Tier Architectures, etc.
  2. Telecommunications Workloads
    • Control Plane Workloads
    • Data Plane Workloads

Control Plane workloads have more tolerance for latency and lighter system resource requirements than Data Plane workloads do.

Why? Because Control Plane workloads are typically TCP-based, frequently use RESTful APIs, and tend to be more periodic in their behavior (periodic updates). Most of the time, when you see issues related to the Control Plane, they involve back-hauling a lot of measurements and statistics (telemetry data). But generally speaking, this data in and of itself does not have stringent requirements.

From a VM perspective, there are a few key things you need to do to ensure your VNF behaves as a true VNF and not as a standard workload VM. These include:

  • Setting Latency Sensitivity to High, which turns off interrupts and ensures that poll mode drivers are used.
  • Enabling Huge Pages on the VM by going into VM Advanced Settings and adding the parameter: sched.mem.lpage.enable1GHugePage = TRUE

Note: Another setting worth checking, although we did not actually set this parameter ourselves, is: sched.mem.pin = TRUE

Note: Another setting, sched.mem.maxmemctl, ensures that ballooning is turned off. We do NOT have this setting, but it was mentioned to us, and we are researching it.
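
Pulled together, the VM advanced settings discussed above look like this in the .vmx file. The last two lines are the ones we have not committed to ourselves, so treat them as candidates rather than recommendations:

sched.cpu.latencySensitivity = "high"       # Latency Sensitivity = High (needs full reservations)
sched.mem.lpage.enable1GHugePage = "TRUE"   # back the VM with 1GB huge pages
sched.mem.pin = "TRUE"                      # mentioned to us, not set by us: pin guest memory
sched.mem.maxmemctl = "0"                   # under research: caps ballooning at 0 MB, disabling it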

One issue we continually ran into was a vCenter alert called Virtual Machine Memory Usage, displayed in vCenter as a red banner with "Acknowledge and Reset to Green" links. The VM was in fact running, but vCenter seemed to have issues with it. The latest change we made, which seems to have fixed this error, was to check the "Reserve all guest memory (All locked)" checkbox.

This checkbox to reserve all guest memory seemed intimidating at first, because the concern was that the VM could reserve all of the memory on the host. That is NOT what this setting does!!! What it does is allow the VM to reserve all of its own configured memory up front - but just the VM memory that is specified (i.e. 24G). If the VM has HugePages enabled, it makes sense that one would want the entire allotment of VM memory reserved up front and contiguous. When we enabled this, our vCenter alerts disappeared.

Lastly, we decided to change DRS to Manual in VM Overrides. To find this setting among the huge number of settings hidden in vCenter, go to the Cluster (not the Host, not the VM, not the Datacenter); the option for VM Overrides is there, and you have four options:

  • None
  • Manual
  • Partial
  • Full

The thinking here is that VMs with complex settings may not play well with vMotion. I will be doing more research on DRS for VNFs before considering setting this (back) to Partial or Full.

Monday, April 26, 2021

Migrating from OpenStack to VMWare

It has been a while since I have posted anything. The latest news is that my employer has decided to replace OpenStack with VMware. We had to do a side-by-side POC (Proof of Concept) comparing VMware with Contrail as the network SDN against VMware with NSX-T as the network SDN.

In the end, they chose NSX-T, and the architecture looked like this:

  • Cloudify - Cloud Management and Orchestration
  • VMware vCenter Cloud (Private Cloud)
  • NSX-T - (I will probably blog more on NSX-T later)

So, I have had to go through a series of bootcamp trainings on the following:

  • NSX-T
  • VeloCloud

I will make some brief posts about these, as well as some other things, going forward.

 
