Tuesday, January 17, 2023

Trying to get RSS (Receive Side Scaling) to work on an Intel X710 NIC

 

Cisco M5 server, with 6 nics on it. The first two are 1G nics that are unused. 

The last 4 are:

  • vmnic2 - 10G nic, Intel XL710, driver version 2.1.5.0 FW version 8.50, link state up
  • vmnic3 - 10G nic, Intel XL710, driver version 2.1.5.0 FW version 8.50, link state up
  • vmnic4 - 10G nic, Intel XL710, driver version 2.1.5.0 FW version 8.50, link state up
  • vmnic5 - 10G nic, Intel XL710, driver version 2.1.5.0 FW version 8.50, link state up

Worth mentioning:

  • vmnic2 and vmnic4 are uplinks on a standard Distributed Switch (virtual switch).
  • vmnic3 and vmnic5 are connected to an N-VDS virtual switch (used with NSX-T) and don't have uplinks.

In ESXi (VMWare Hypervisor, v7.0), we have set the RSS values accordingly:

UPDATED: how we set the RSS Values!

First, make sure that the RSS parameter is unset, because DRSS and RSS should not be set together.

> esxcli system module parameters set -m i40en -p RSS=""

Next, set the DRSS parameter. We are setting 4 Rx queues per relevant vmnic.

> esxcli system module parameters set -m i40en -p DRSS=4,4,4,4
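
One note (standard ESXi behavior as far as I know, not anything specific to this host): module parameter changes are only picked up when the driver reloads, which in practice means putting the host into maintenance mode and rebooting it:

> esxcli system maintenanceMode set --enable true
> reboot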

Now we list the parameters to ensure they took correctly:

> esxcli system module parameters list -m i40en
Name           Type          Value    Description
-------------  ------------  -------  -----------
DRSS           array of int           Enable/disable the DefQueue RSS(default = 0 )
EEE            array of int           Energy Efficient Ethernet feature (EEE): 0 = disable, 1 = enable, (default = 1)
LLDP           array of int           Link Layer Discovery Protocol (LLDP) agent: 0 = disable, 1 = enable, (default = 1)
RSS            array of int  4,4,4,4  Enable/disable the NetQueue RSS( default = 1 )
RxITR          int                    Default RX interrupt interval (0..0xFFF), in microseconds (default = 50)
TxITR          int                    Default TX interrupt interval (0..0xFFF), in microseconds, (default = 100)
VMDQ           array of int           Number of Virtual Machine Device Queues: 0/1 = disable, 2-16 enable (default =8)
max_vfs        array of int           Maximum number of VFs to be enabled (0..128)
trust_all_vfs  array of int           Always set all VFs to trusted mode 0 = disable (default), other = enable
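
One thing worth double-checking here (this is my recollection of esxcli behavior, not something shown in the output above): each run of esxcli system module parameters set replaces the module's whole parameter string, so anything not named in -p falls back to its default. If two parameters both need explicit values, they go in one quoted, space-separated -p string, something like the illustrative lines below (the RSS=0,0,0,0 value is an assumption about how to express "disabled" per port for this driver, so verify it before using it):

> esxcli system module parameters set -m i40en -p "DRSS=4,4,4,4 RSS=0,0,0,0"
> esxcli system module parameters list -m i40en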

But, we are seeing this when we look at the individual adaptors in the ESXi kernel:

> vsish -e get /net/pNics/vmnic3/rxqueues/info
rx queues info {
   # queues supported:1
   # rss engines supported:0
   # filters supported:0
   # active filters:0
   # filters moved by load balancer:0
   RX filter classes: 0 -> No matching defined enum value found.
   Rx Queue features: 0 -> NONE
}

Nics 3 and 5, connected to the N-VDS virtual switch, report only a single supported Rx queue, even though the kernel module is configured properly.
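
A quick way to compare all four 10G ports side by side from the ESXi shell (a small sketch, assuming the vmnic2-vmnic5 numbering above):

for n in 2 3 4 5; do
   echo "=== vmnic$n ==="
   vsish -e get /net/pNics/vmnic$n/rxqueues/info | grep -E 'queues supported|rss engines'
done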

> vsish -e get /net/pNics/vmnic2/rxqueues/info
rx queues info {
   # queues supported:9
   # rss engines supported:1
   # filters supported:512
   # active filters:0
   # filters moved by load balancer:0
   RX filter classes: 0x1f -> MAC VLAN VLAN_MAC VXLAN Geneve
   Rx Queue features: 0x482 -> Pair Dynamic GenericRSS
}

But Nics 2 and 4, which are connected to the standard distributed switch, have 9 Rx Queues configured properly.

Is this related to the virtual switch we are connecting to (meaning we need to be looking at VMWare)? Or is this somehow related to the i40en driver being used (in which case we need to go to the server vendor, or to Intel, who makes the XL710 nic)?
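
Either way, before opening a case it is worth capturing exactly what driver and firmware each port reports, since that is the first thing any of the vendors will ask for (a quick sketch using standard esxcli commands):

> esxcli network nic list
> esxcli network nic get -n vmnic3
> esxcli network nic get -n vmnic5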

Friday, January 13, 2023

Debugging Dropped Packets on NSX-T E-NVDS

 

Inside the hypervisor, the servers have physical nics as follows:

  • vmnic0 – 1G nic, Intel X550 – Unused
  • vmnic1 – 1G nic, Intel X550 - Unused
  • vmnic2 - 10G nic, SFP+, Intel XL710, driver version 2.1.5.0 FW version 8.50, link state up
  • vmnic3 - 10G nic, SFP+, Intel XL710, driver version 2.1.5.0 FW version 8.50, link state up
  • vmnic4 - 10G nic, SFP+, Intel XL710, driver version 2.1.5.0 FW version 8.50, link state up
  • vmnic5 - 10G nic, SFP+, Intel XL710, driver version 2.1.5.0 FW version 8.50, link state up

The nics connect physically to the upstream switches (Aristas), and virtually to the virtual switches (discussed right below):

 

Inside Hypervisor (Host 5 in this specific case):

Distributed vSwitch

                Physical Nic Side: vmnic2 and vmnic4

                Virtual Side: vmk0 (VLAN 3850) and vmk1 (VLAN 3853)

 

NSX-T Switch (E-NVDS)

                Physical NIC side: vmnic3 and vmnic5 → these are the nics that get hit when we run the load tests

                Virtual Side: 50+ individual segments that VMs connect to, and get assigned a port

 

Now, in my previous email, I dumped the stats for the physical NIC – meaning, from the "NIC itself" as seen by the ESXi operating system.

 

But it is also wise to take a look at the stats of the physical nic from the perspective of the virtual switch! Remember, vmnic5 is a port on the virtual switch!

 

So first, we need to figure out which port we need to look at:

> net-stats -l

PortNum          Type SubType SwitchName       MACAddress         ClientName
2214592527          4       0 DvsPortset-0     40:a6:b7:51:56:e9  vmnic3
2214592529          4       0 DvsPortset-0     40:a6:b7:51:1b:9d  vmnic5 → here we go: port 2214592529 on switch DvsPortset-0 is the port of interest
67108885            3       0 DvsPortset-0     00:50:56:65:96:e4  vmk10
67108886            3       0 DvsPortset-0     00:50:56:65:80:84  vmk11
67108887            3       0 DvsPortset-0     00:50:56:66:58:98  vmk50
67108888            0       0 DvsPortset-0     02:50:56:56:44:52  vdr-vdrPort
67108889            5       9 DvsPortset-0     00:50:56:8a:09:15  DEV-ISC1-Vanilla3a.eth0
67108890            5       9 DvsPortset-0     00:50:56:8a:aa:3f  DEV-ISC1-Vanilla3a.eth1
67108891            5       9 DvsPortset-0     00:50:56:8a:9d:b1  DEV-ISC1-Vanilla3a.eth2
67108892            5       9 DvsPortset-0     00:50:56:8a:d9:65  DEV-ISC1-Vanilla3a.eth3
67108893            5       9 DvsPortset-0     00:50:56:8a:fc:75  DEV-ISC1-Vanilla3b.eth0
67108894            5       9 DvsPortset-0     00:50:56:8a:7d:cd  DEV-ISC1-Vanilla3b.eth1
67108895            5       9 DvsPortset-0     00:50:56:8a:d4:d8  DEV-ISC1-Vanilla3b.eth2
67108896            5       9 DvsPortset-0     00:50:56:8a:67:6f  DEV-ISC1-Vanilla3b.eth3
67108901            5       9 DvsPortset-0     00:50:56:8a:32:1c  DEV-MSC1-Vanilla3b.eth0
67108902            5       9 DvsPortset-0     00:50:56:8a:e6:2b  DEV-MSC1-Vanilla3b.eth1
67108903            5       9 DvsPortset-0     00:50:56:8a:cc:eb  DEV-MSC1-Vanilla3b.eth2
67108904            5       9 DvsPortset-0     00:50:56:8a:7a:83  DEV-MSC1-Vanilla3b.eth3
67108905            5       9 DvsPortset-0     00:50:56:8a:63:55  DEV-MSC1-Vanilla3a.eth3
67108906            5       9 DvsPortset-0     00:50:56:8a:40:9c  DEV-MSC1-Vanilla3a.eth2
67108907            5       9 DvsPortset-0     00:50:56:8a:57:8f  DEV-MSC1-Vanilla3a.eth1
67108908            5       9 DvsPortset-0     00:50:56:8a:5b:6d  DEV-MSC1-Vanilla3a.eth0
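
A quicker way to pull just those two uplink rows out of that listing (a one-liner sketch):

> net-stats -l | grep -E 'vmnic[35]'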

 

/net/portsets/DvsPortset-0/ports/2214592529/> cat stats
packet stats {
   pktsTx:10109633317
   pktsTxMulticast:291909
   pktsTxBroadcast:244088
   pktsRx:10547989949 → total packets RECEIVED on vmnic5's port on the virtual switch
   pktsRxMulticast:243731083
   pktsRxBroadcast:141910804
   droppedTx:228
   droppedRx:439933 → This is a lot more than the 3,717 Rx Missed errors, and probably accounts for why MetaSwitch sees more drops than we saw up to this point!
}
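
These are cumulative counters, so a useful sanity check (a small sketch, using the same port ID as above) is to sample droppedRx twice during a load test and confirm the drops are still accumulating rather than being historical:

> vsish -e get /net/portsets/DvsPortset-0/ports/2214592529/stats | grep droppedRx
> sleep 60
> vsish -e get /net/portsets/DvsPortset-0/ports/2214592529/stats | grep droppedRx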

 

So – we have TWO things now to examine here.

  • Is the Receive Side Scaling configured properly and working? 
    • We configured it, but…we need to make sure it is working and working properly.
    • We don't see all of the queues getting packets. Each Rx queue should be getting its own CPU (see the sketch after this list).
  • Once packets get into the Ring Buffer and are passed through to the VM (the poll-mode driver picks the packets up off the Ring), they hit the virtual switch.
    • And the switch is dropping some packets.
    • Virtual switches are software. As such, they need to be tuned to stretch their capability to keep up with what legacy hardware switches can do.
      • The NSX-T switch is a powerful switch, but is also a newer virtual switch, more bleeding edge in terms of technology. 
      • I wonder if we are running the latest and greatest version of this switch, and if that could help us here.
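
On that first point, here is roughly how I plan to check whether the individual Rx queues are actually seeing traffic (a sketch; the node names under /rxqueues/ vary by driver and build, so the idea is to ls the tree and drill into whatever per-queue nodes are actually there):

> vsish -e ls /net/pNics/vmnic5/rxqueues/
> vsish -e get /net/pNics/vmnic5/rxqueues/info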

 

Now, I looked even deeper into the E-NVDS switch. I went into the vsish shell and started examining any and all statistics that are captured by that networking stack.

 

Since we are concerned with receives, I looked at the InputStats specifically. I noticed there are several filters – which I presume are tied to a VMWare packet-filtering flow, analogous to Netfilter in Linux or perhaps the Berkeley Packet Filter. But I have no documentation whatsoever on this, and can't find any, so I did my best to "back into" what I was seeing.

 

I see the following filters that packets can traverse – traceflow might be packet capture, but beyond that I'm not sure (a sketch for dumping their stats follows this list):

  • ens-slowpath-input
  • traceflow-Uplink-Input:0x43110ae01630
  • vdl2-uplink-in:0x431e78801dd0
  • UplinkDoSwLRO@vmkernel#nover
  • VdrUplinkInput
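
To dump the stats for each of those filters without walking the tree by hand, something like this works from the ESXi shell (a sketch; it assumes the inputFilters/ node layout shown in the paths below):

PORT=/net/portsets/DvsPortset-0/ports/2214592529
for f in $(vsish -e ls $PORT/inputFilters/); do
   echo "=== ${f%/} ==="
   vsish -e get "$PORT/inputFilters/${f%/}/stats"
done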

 

If we go down into the filters and print the stats out, most of the stats seem to line up (started=passed, etc.) except the vdl2-uplink-in filter, which has drops in it (shown here for both uplink ports):

/net/portsets/DvsPortset-0/ports/2214592529/inputFilters/vdl2-uplink-in/> cat stats
packet stats {
   pktsIn:31879020
   pktsOut:24269629
   pktsDropped:7609391
}

/net/portsets/DvsPortset-0/ports/2214592527/inputFilters/vdl2-uplink-in/> cat stats
packet stats {
   pktsIn:24817038
   pktsOut:17952829
   pktsDropped:6864209
}
 

In both cases pktsIn equals pktsOut plus pktsDropped, so roughly 24% of the packets entering this filter on vmnic5's port are being dropped there (and roughly 28% on vmnic3's). That seems like a lot of dropped packets to me (a LOT more than those Rx Missed errors), so this looks like something we need to work with VMWare on, because if I understand these stats properly, it suggests an issue on the virtual switch more than on the adaptor itself.

 

Another thing I saw while poking around was this interesting-looking WRONG_VNIC passthrough status on vmnic3 and vmnic5, the two nics being used in the test here. I think we should ask VMWare about this and run it down also.

 

/net/portsets/DvsPortset-0/ports/2214592527/> cat status
port {
   port index:15
   vnic index:0xffffffff
   portCfg:
   dvPortId:4dfdff37-e435-4ba4-bbff-56f36bcc0779
   clientName:vmnic3
   clientType: 4 -> Physical NIC
   clientSubType: 0 -> NONE
   world leader:0
   flags: 0x460a3 -> IN_USE ENABLED UPLINK DVS_PORT DISPATCH_STATS_IN DISPATCH_STATS_OUT DISPATCH_STATS CONNECTED
   Impl customized blocked flags:0x00000000
   Passthru status: 0x1 -> WRONG_VNIC
   fixed Hw Id:40:a6:b7:51:56:e9:
   ethFRP:frame routing {
      requested:filter {
         flags:0x00000000
         unicastAddr:00:00:00:00:00:00:
         numMulticastAddresses:0
         multicastAddresses:
         LADRF:[0]: 0x0
         [1]: 0x0
      }
      accepted:filter {
         flags:0x00000000
         unicastAddr:00:00:00:00:00:00:
         numMulticastAddresses:0
         multicastAddresses:
         LADRF:[0]: 0x0
         [1]: 0x0
      }
   }
   filter supported features: 0 -> NONE
   filter properties: 0 -> NONE
   rx mode: 0 -> INLINE
   tune mode: 2 -> invalid
   fastpath switch ID:0x00000000
   fastpath port ID:0x00000004
}
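
That dump is vmnic3's port (2214592527); here is the same check against vmnic5's port (2214592529), just to have it on record (a quick sketch):

> vsish -e get /net/portsets/DvsPortset-0/ports/2214592529/status | grep -i passthru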

Tuesday, January 10, 2023

VMWare NSX-T Testing - Dropped Packets

We have been doing some performance testing with a voice system.

In almost all cases, these tests are failing. They are failing for two reasons:

  1. Rx Missed counters on the physical adaptors of the hypervisors that are used to send the test traffic. These adaptors are connected to the E-NVDS virtual switch on one side, and to an upstream Arista data center switch on the other (a sketch for reading these counters follows this list).
  2. Dropped Packets - mostly media (RTP UDP), with less than 2% of the drops being RTCP traffic (TCP).
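
For reference, the Rx Missed counters in item 1 come straight off the physical adaptor and can be read from the ESXi shell (a sketch; the exact field names vary a bit between drivers):

> esxcli network nic stats get -n vmnic5
> vsish -e get /net/pNics/vmnic5/stats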

Lately, I used the "Performance Best Practices for VMWare vSphere 7.0" guide as a method for trying to reduce the dropped packets we were seeing.

We attempted several things that were mentioned in this document:

 

  • ESXi NIC - enable Receive Side Scaling (RSS)
    • Actually, to be technical, we enabled DRSS (Default Queue RSS) rather than the RSS (NetQ RSS) which the i40en driver also supports for this Intel X710 adaptor.
  • LatencySensitivity=High - and we checked the "Reserve all Memory" checkbox
  • Interrupt Coalescing
    • Disabling it, to see what effect that had
    • Setting it from its rate-based scheme (the default, rbc) to static with 64 packets per interrupt

We didn't really see any noticeable improvement from the Receive Side Scaling or the Latency Sensitivity settings, which was a surprise, actually. We did see perhaps some minor improvement on the interrupt coalescing when we set it to static.
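
For reference, here is roughly how that interrupt coalescing change is expressed for a VMXNET3 vNIC. It is an advanced setting on the VM (the parameter names are from the vSphere performance guidance as I recall it, and "ethernet0" is whichever vNIC index applies, so treat this as a sketch to verify rather than gospel):

ethernet0.coalescingScheme = "static"
ethernet0.coalescingParams = "64"

Setting coalescingScheme to "disabled" would be the way to turn coalescing off entirely, and "rbc" is the rate-based default mentioned above.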
