
Friday, November 15, 2019

High Packet Loss in the Tx of TAP Interfaces



I was seeing some bond interfaces with high drop counts, but these were all Rx drops.

I noticed that the tap interfaces on OpenStack compute hosts - which were hooked to OpenContrail's vRouter - had drops on the Tx.

So, in trying to understand why we would be dropping packets on Tap interfaces, I did some poking around and found this link.

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/ovs-dpdk_end_to_end_troubleshooting_guide/high_packet_loss_in_the_tx_queue_of_the_instance_s_tap_interface

An excerpt from this article:
"TX drops occur because of interference between the instance’s vCPU and other processes on the hypervisor. The TX queue of the tap interface is a buffer that can store packets for a short while in case that the instance cannot pick up the packets. This would happen if the instance’s CPU is prevented from running (or freezes) for a long enough time."

The article goes on to elaborate on diagnosis, and on how to fix the problem by adjusting the Tx queue length.
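
To check where a given tap interface stands before touching anything, the Tx drop counter and the current Tx queue length can be read straight from sysfs. This is only a minimal sketch in Python: the function name tap_tx_stats and the sysfs_root parameter are my own (the parameter exists so the function can be pointed at a test directory), and it assumes a Linux host that exposes the usual /sys/class/net/<iface>/ files.

```python
from pathlib import Path


def tap_tx_stats(iface, sysfs_root="/sys/class/net"):
    """Return (tx_dropped, tx_queue_len) for one interface, read from sysfs.

    tap_tx_stats and sysfs_root are illustrative names, not part of any tool.
    """
    base = Path(sysfs_root) / iface
    # Cumulative count of packets dropped on transmit since the interface came up.
    tx_dropped = int((base / "statistics" / "tx_dropped").read_text().strip())
    # Current Tx queue length, in packets.
    tx_queue_len = int((base / "tx_queue_len").read_text().strip())
    return tx_dropped, tx_queue_len


# Example (interface name is hypothetical):
#   dropped, qlen = tap_tx_stats("tap0")
#   print(f"tap0: tx_dropped={dropped} tx_queue_len={qlen}")
```

The queue length itself can be changed with ip link set dev <iface> txqueuelen <n>. What value to use depends on the environment, and I am not suggesting one here.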

Analyzing Dropped Packets


I recently saw an alert, which created a ticket, for "over 100 dropped packets in 10s".

I thought this was interesting:

  • What KIND of packets?
  • What if the drops are intentional drops?
  • Is 100 dropped packets a LOT? 
    • How often is this happening?
Lots of questions.
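
One way to put "100 dropped packets in 10s" into context is to compare two counter snapshots and express the drops as a rate and as a fraction of the packets seen in the same window. The sketch below is an assumption about how one might do that, not how the alert was actually computed: the function name, the dict keys, and the choice to divide by the packet counter are all mine.

```python
def drop_summary(before, after, interval_s):
    """Summarize drops between two counter snapshots.

    before/after are dicts with cumulative 'packets' and 'dropped' counters
    (hypothetical shape). interval_s is the time between the snapshots.
    """
    dropped = after["dropped"] - before["dropped"]
    packets = after["packets"] - before["packets"]
    return {
        "dropped": dropped,
        "drops_per_s": dropped / interval_s,
        # Assumption: ratio is drops relative to packets counted in the window.
        "drop_ratio": dropped / packets if packets else None,
    }


# Example: 100 drops in 10s while 2,000,000 packets went through.
summary = drop_summary(
    {"packets": 1_000_000, "dropped": 50},
    {"packets": 3_000_000, "dropped": 150},
    interval_s=10,
)
```

On a busy interface, 100 drops in 10 seconds can be a tiny fraction of the traffic. On a quiet one, the same number could be most of it.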

This got me into taking a quick look on a number of different Linux hosts, to see what the drop situation looked like on certain interfaces.

I noticed that on one sample of hosts, most drops were Rx packets.
On most hosts, though, the drops were Tx packets.
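
A quick way to get this Rx-versus-Tx view across every interface on a host is to read the drop columns out of /proc/net/dev. A minimal sketch, assuming the standard Linux layout of that file (two header lines, then one line per interface with eight receive counters followed by eight transmit counters). The function name is my own.

```python
def parse_proc_net_dev(text):
    """Map interface name -> (rx_drop, tx_drop) from /proc/net/dev content."""
    drops = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        fields = counters.split()
        if len(fields) < 16:
            continue
        # Receive: bytes packets errs drop ...  -> drop is index 3
        # Transmit starts at index 8            -> drop is index 11
        drops[name.strip()] = (int(fields[3]), int(fields[11]))
    return drops


# Example on a live host:
#   with open("/proc/net/dev") as f:
#       for iface, (rx, tx) in parse_proc_net_dev(f.read()).items():
#           print(f"{iface}: rx_drop={rx} tx_drop={tx}")
```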

Looking at netstat -s, you can get an amazing picture of exactly why packets are being dropped on a Linux host. A drop could be related to congestion, like a socket buffer overrun (applications cannot read fast enough, perhaps due to high CPU). Or the packet could have been dropped because it was supposed to be dropped - maybe there is a checksum error, a windowing error, or a packet that should never have arrived in the first place.
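
As far as I know, netstat -s takes its numbers from /proc/net/snmp and /proc/net/netstat, which store each protocol as a line of counter names followed by a line of values. The sketch below parses that layout so individual counters (buffer errors, checksum errors) can be pulled out by name. The function name is mine, and the counters shown in the example comment are ones I would expect on a typical kernel, not a guaranteed list.

```python
def parse_proc_counters(text):
    """Parse the name-line/value-line pairs used by /proc/net/snmp and /proc/net/netstat."""
    counters = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Lines come in pairs: "Udp: NameA NameB ..." then "Udp: 1 2 ..."
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        proto_check, numbers = values.split(":", 1)
        if proto != proto_check:
            continue  # unexpected layout, skip this pair
        counters[proto] = dict(zip(names.split(), (int(n) for n in numbers.split())))
    return counters


# Example on a live host (counter names may vary by kernel version):
#   with open("/proc/net/snmp") as f:
#       stats = parse_proc_counters(f.read())
#   print(stats["Udp"].get("RcvbufErrors"), stats["Udp"].get("InCsumErrors"))
```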

One network engineer mentioned to me that some packets are dropped due to Packet Cloning, or Packet Redundancy, features. These features are enabled so that when a far-end router or switch loses a packet (for one reason or another) close to the destination, the packet does not have to be requested all the way back from the source for a re-send. But when this feature is used, you can get a lot of dropped packets due to "de-duping". This could create a false positive. Juniper has a document that describes their Packet Redundancy, or Packet Protection, feature:


Interesting. Worth mentioning, or blogging about.

This link below is also interesting when it comes to finding out how to debug dropped packets.
https://community.pivotal.io/s/article/Network-Troubleshooting-Guide

Here is another interesting link on same topic.
https://jvns.ca/blog/2017/09/05/finding-out-where-packets-are-being-dropped/

SLAs using Zabbix in a VMware Environment

Zabbix 7 introduced some better support for SLAs. It also had better support for VMware. VMware, of course now owned by Broadcom, has prio...