
Friday, January 13, 2023

Debugging Dropped Packets on NSX-T E-NVDS

 

Inside the hypervisor, the servers have the following physical NICs (the driver, firmware, and link details can be confirmed with the esxcli commands shown after this list):

  • vmnic0 – 1G nic, Intel X550 – Unused
  • vmnic1 – 1G nic, Intel X550 - Unused
  • vmnic2 - 10G nic, SFP+, Intel XL710, driver version 2.1.5.0 FW version 8.50, link state up
  • vmnic3 - 10G nic, SFP+, Intel XL710, driver version 2.1.5.0 FW version 8.50, link state up
  • vmnic4 - 10G nic, SFP+, Intel XL710, driver version 2.1.5.0 FW version 8.50, link state up
  • vmnic5 - 10G nic, SFP+, Intel XL710, driver version 2.1.5.0 FW version 8.50, link state up
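For reference, the driver, firmware, and link state above can be pulled from the ESXi shell with the standard esxcli commands (the exact output fields vary slightly by release):

esxcli network nic list
esxcli network nic get -n vmnic2    # shows driver and firmware versions for a single NIC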

The NICs connect to the upstream switches (Aristas), and on the virtual side they serve as uplinks for the virtual switches (discussed right below):

 

Inside Hypervisor (Host 5 in this specific case):

Distributed vSwitch

                Physical Nic Side: vmnic2 and vmnic4

                Virtual Side: vmk0 (VLAN 3850) and vmk1 (VLAN 3853)

 

NSX-T Switch (E-NVDS)

                Physical NIC side: vmnic3 and vmnic5 → these are the NICs that get hit when we run the load tests

                Virtual Side: 50+ individual segments that VMs connect to, and get assigned a port

 

Now, in my previous email, I dumped the stats for the physical NIC – meaning, the stats of the “NIC itself” as seen from the ESXi operating system.
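(Those NIC-level stats – including the Rx Missed counter referenced further down – come straight from the adapter via esxcli; the exact counter names vary by driver and release:)

esxcli network nic stats get -n vmnic5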

 

But it is also wise to take a look at the stats of the physical NIC from the perspective of the virtual switch! Remember, vmnic5 is a port on the virtual switch!

 

So first, we need to figure out what port we need to look at:
net-stats -l

PortNum          Type SubType SwitchName       MACAddress         ClientName

2214592527          4       0 DvsPortset-0     40:a6:b7:51:56:e9  vmnic3

2214592529          4       0 DvsPortset-0     40:a6:b7:51:1b:9d  vmnic5 → here we go, port 2214592529 on switch DvsPortset-0 is the port of interest

67108885            3       0 DvsPortset-0     00:50:56:65:96:e4  vmk10

67108886            3       0 DvsPortset-0     00:50:56:65:80:84  vmk11

67108887            3       0 DvsPortset-0     00:50:56:66:58:98  vmk50

67108888            0       0 DvsPortset-0     02:50:56:56:44:52  vdr-vdrPort

67108889            5       9 DvsPortset-0     00:50:56:8a:09:15  DEV-ISC1-Vanilla3a.eth0

67108890            5       9 DvsPortset-0     00:50:56:8a:aa:3f  DEV-ISC1-Vanilla3a.eth1

67108891            5       9 DvsPortset-0     00:50:56:8a:9d:b1  DEV-ISC1-Vanilla3a.eth2

67108892            5       9 DvsPortset-0     00:50:56:8a:d9:65  DEV-ISC1-Vanilla3a.eth3

67108893            5       9 DvsPortset-0     00:50:56:8a:fc:75  DEV-ISC1-Vanilla3b.eth0

67108894            5       9 DvsPortset-0     00:50:56:8a:7d:cd  DEV-ISC1-Vanilla3b.eth1

67108895            5       9 DvsPortset-0     00:50:56:8a:d4:d8  DEV-ISC1-Vanilla3b.eth2

67108896            5       9 DvsPortset-0     00:50:56:8a:67:6f  DEV-ISC1-Vanilla3b.eth3

67108901            5       9 DvsPortset-0     00:50:56:8a:32:1c  DEV-MSC1-Vanilla3b.eth0

67108902            5       9 DvsPortset-0     00:50:56:8a:e6:2b  DEV-MSC1-Vanilla3b.eth1

67108903            5       9 DvsPortset-0     00:50:56:8a:cc:eb  DEV-MSC1-Vanilla3b.eth2

67108904            5       9 DvsPortset-0     00:50:56:8a:7a:83  DEV-MSC1-Vanilla3b.eth3

67108905            5       9 DvsPortset-0     00:50:56:8a:63:55  DEV-MSC1-Vanilla3a.eth3

67108906            5       9 DvsPortset-0     00:50:56:8a:40:9c  DEV-MSC1-Vanilla3a.eth2

67108907            5       9 DvsPortset-0     00:50:56:8a:57:8f  DEV-MSC1-Vanilla3a.eth1

67108908            5       9 DvsPortset-0     00:50:56:8a:5b:6d  DEV-MSC1-Vanilla3a.eth0

 

/net/portsets/DvsPortset-0/ports/2214592529/> cat stats

packet stats {

   pktsTx:10109633317

   pktsTxMulticast:291909

   pktsTxBroadcast:244088

   pktsRx:10547989949 → total packets RECEIVED on vmnic5's port on the virtual switch

   pktsRxMulticast:243731083

   pktsRxBroadcast:141910804

   droppedTx:228

   droppedRx:439933 → This is a lot more than the 3,717 Rx Missed errors, and probably accounts for why MetaSwitch sees more drops than we saw up to this point!

}
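To watch that droppedRx counter move during a load test, the same stats node can be read non-interactively and re-run periodically – this is just the one-shot equivalent of the cat above:

vsish -e get /net/portsets/DvsPortset-0/ports/2214592529/stats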

 

So – we have TWO things now to examine here.

  • Is Receive Side Scaling (RSS) configured properly and actually working? 
    • We configured it, but we need to make sure it is working, and working properly (see the checks sketched after this list).
    • We don't see all of the queues getting packets. Each Rx queue should be getting its own CPU.
  • Once packets get into the ring buffer and the poll mode driver picks them up off the ring, they hit the virtual switch on their way to the VM.
    • And the switch is dropping some packets.
    • Virtual switches are software. As such, they need to be tuned to keep up with what dedicated hardware switches can do.
      • The NSX-T switch is a powerful switch, but it is also a newer virtual switch, more bleeding edge in terms of technology. 
      • I wonder if we are running the latest and greatest version of this switch, and whether that could help us here.
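A rough way to check the RSS question is sketched below – this assumes the XL710s are running the i40en native driver, and the module parameter name and vsish queue paths are from memory and vary by driver and release, so treat it as a starting point rather than a recipe. The idea is to confirm RSS is actually enabled on the driver module, and then see whether more than one hardware Rx queue is populated on the uplink:

esxcli system module parameters list -m i40en | grep -i rss
vsish -e get /net/pNics/vmnic5/rxqueues/info
vsish -e ls /net/pNics/vmnic5/rxqueues/queues/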

 

Now, I looked even deeper into the E-NVDS switch. I went into the vsish shell and started examining any and all statistics captured by that networking stack.

 

Since we are concerned with receives, I looked at the InputStats specifically. I noticed there are several filters – which, I presume, are tied to a VMware packet filtering flow, analogous to Netfilter in Linux, or perhaps the Berkeley Packet Filter. But I have no documentation whatsoever on this, and can't find any, so I did my best to “back into” what I was seeing.

 

I see the following input filters that packets can traverse (a vsish one-liner for pulling these up yourself follows the list) – traceflow is presumably related to packet capture/Traceflow, but beyond that I'm not sure what each one does.

  • ens-slowpath-input
  • traceflow-Uplink-Input:0x43110ae01630
  • vdl2-uplink-in:0x431e78801dd0
  • UplinkDoSwLRO@vmkernel#nover
  • VdrUplinkInput
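The filter chain and its per-filter stats can also be pulled with vsish in its non-interactive form – these are the same paths shown in the interactive session below:

vsish -e ls /net/portsets/DvsPortset-0/ports/2214592529/inputFilters/
vsish -e get /net/portsets/DvsPortset-0/ports/2214592529/inputFilters/vdl2-uplink-in/stats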

 

If we go down into the filters and print the stats out, most of the stats seem to line up (started = passed, etc.), except for this one, which has drops in it:

/net/portsets/DvsPortset-0/ports/2214592529/inputFilters/vdl2-uplink-in/> cat stats
packet stats {
   pktsIn:31879020
   pktsOut:24269629
   pktsDropped:7609391
}

/net/portsets/DvsPortset-0/ports/2214592527/inputFilters/vdl2-uplink-in/> cat stats
packet stats {
   pktsIn:24817038
   pktsOut:17952829
   pktsDropped:6864209
}
 

That seems like a lot of dropped packets to me (a LOT more than those Rx Missed errors), so this looks like something we need to work with VMware on. If I understand these stats properly, this suggests an issue on the virtual switch rather than on the adapter itself.

 

Another thing I noticed while poking around was an interesting-looking WRONG_VNIC passthrough status on vmnic3 and vmnic5, the two NICs being used in this test. I think we should ask VMware about this and run it down as well.

 

/net/portsets/DvsPortset-0/ports/2214592527/> cat status
port {
   port index:15
   vnic index:0xffffffff
   portCfg:
   dvPortId:4dfdff37-e435-4ba4-bbff-56f36bcc0779
   clientName:vmnic3
   clientType: 4 -> Physical NIC
   clientSubType: 0 -> NONE
   world leader:0
   flags: 0x460a3 -> IN_USE ENABLED UPLINK DVS_PORT DISPATCH_STATS_IN DISPATCH_STATS_OUT DISPATCH_STATS CONNECTED
   Impl customized blocked flags:0x00000000
   Passthru status: 0x1 -> WRONG_VNIC
   fixed Hw Id:40:a6:b7:51:56:e9:
   ethFRP:frame routing {
      requested:filter {
         flags:0x00000000
         unicastAddr:00:00:00:00:00:00:
         numMulticastAddresses:0
         multicastAddresses:
         LADRF:[0]: 0x0
         [1]: 0x0
      }
      accepted:filter {
         flags:0x00000000
         unicastAddr:00:00:00:00:00:00:
         numMulticastAddresses:0
         multicastAddresses:
         LADRF:[0]: 0x0
         [1]: 0x0
      }
   }
   filter supported features: 0 -> NONE
   filter properties: 0 -> NONE
   rx mode: 0 -> INLINE
   tune mode: 2 -> invalid
   fastpath switch ID:0x00000000
   fastpath port ID:0x00000004
}

Monday, October 4, 2021

The first Accelerated VNF on our NFV platform

 I haven't posted anything since April but that isn't because I haven't been busy.

We have our new NFV Platform up and running, and it is NOT on OpenStack. It is NOT on VMware VIO. It is also NOT on VMware Telco Cloud!

We are using ESXi, vCenter, NSX-T for the SD-WAN, and Morpheus as a Cloud Management solution. Morpheus has a lot of different integrations, and a great user interface that gives tenants a place to log in and call home and self-manage their resources.

The diagram below depicts what this looks like from a Reference Architecture perspective.

The OSS, which is not covered in the diagram, is a combination of Zabbix and VROPS, both working in tandem to ensure that the clustered hosts and management functions are behaving properly.

The platform is optimized with E-NVDS, commonly referred to as Enhanced Datapath, which requires special DPDK drivers to be loaded on the ESXi hosts, as well as additional configuration on the hypervisors to ensure that the E-NVDS is set up properly (separate upcoming post).
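A quick sanity check is to confirm an ENS-capable driver module is present and loaded on each host. The module name is vendor-specific – i40en_ens for Intel 700-series NICs is an example/assumption here, and on newer releases ENS support may be built into the standard driver:

esxcli system module list | grep -i ens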

Now that the platform is up and running, it is time to start discussing workload types. There are a number of Workload Categories that I tend to use:

  1. Enterprise Workloads - Enterprise Applications, 3-Tier Architectures, etc.
  2. Telecommunications Workloads
    • Control Plane Workloads
    • Data Plane Workloads

Control Plane workloads have more tolerance for latency and for contention on system resources than Data Plane workloads do. 

Why? Because Control Plane workloads are typically TCP-based, frequently use (RESTful) APIs, and tend to be more periodic in their behavior (periodic updates). Most of the time, when you see issues related to the Control Plane, they are related to back-hauling a lot of measurements and statistics (telemetry data). But generally speaking, this data in and of itself does not have stringent requirements.

From a VM perspective, there are a few key things you need to do to ensure your VNF behaves as a true VNF and not as a standard workload VM. These include:

  • Setting Latency Sensitivity to High, which turns off interrupts and ensures that poll mode drivers are used.
  • Enabling Huge Pages on the VM by going into VM Advanced Settings and adding the parameter: sched.mem.lpage.enable1GHugePage = TRUE

Note: Another setting worth checking, although we did not actually set this parameter ourselves, is: sched.mem.pin = TRUE

Note: Another setting, sched.mem.maxmemctl, limits how much memory ballooning can reclaim from the VM (and can be used to effectively turn ballooning off). We do NOT have this setting, but it was mentioned to us, and we are researching it.
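Taken together, the advanced settings discussed above look roughly like this in the VM's configuration (.vmx). This is a sketch only: sched.cpu.latencySensitivity is, to the best of my knowledge, the key that the Latency Sensitivity UI setting maps to, and the maxmemctl value of 0 illustrates disabling ballooning – it is not something we have applied ourselves:

sched.cpu.latencySensitivity = "high"
sched.mem.lpage.enable1GHugePage = "TRUE"
sched.mem.pin = "TRUE"
sched.mem.maxmemctl = "0"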

One issue we seemed to continually run into was a vCenter alert called Virtual Machine Memory Usage, displayed in vCenter as a red banner with "Acknowledge and Reset to Green" links. The VM was in fact running, but vCenter seemed to have issues with it. The latest change we made that seems to have fixed this error was to check the "Reserve all guest memory (All locked)" option checkbox.

This checkbox to reserve all guest memory seemed intimidating at first, because the concern was that the VM could reserve all memory on the host. That is NOT what this setting does!!! What it does is allow the VM to reserve all of its own memory up front – but just the VM memory that is specified (i.e. 24G). If the VM has HugePages enabled, it makes sense that one would want the entire allotment of VM memory to be reserved up front and be contiguous. When we enabled this, our vCenter alerts disappeared.

Lastly, we decided to change DRS to Manual in VM Overrides. To find this setting amongst the huge number of settings hidden in vCenter, go to the Cluster (not the Host, not the VM, not the Datacenter); the option for VM Overrides is there, and you have four options:

  • None
  • Manual
  • Partial
  • Full

The thinking here is that VMs with complex settings may not play well with vMotion. I will be doing more research on DRS for VNFs before considering setting this (back) to Partial or Full.
