Friday, November 15, 2019

Analyzing Dropped Packets


I recently saw an alert, which created a ticket, for "over 100 dropped packets in 10s".

I thought this was interesting:

  • What KIND of packets?
  • What if the drops are intentional?
  • Is 100 dropped packets a LOT? 
    • How often is this happening?
Lots of questions.
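One way to put a raw count like "100 drops in 10s" in context is to compare it with the traffic seen in the same window. The sketch below is only an illustration: the Snapshot structure and the drop_stats function are hypothetical names, and it assumes the packet counter does not already include the dropped packets (this can vary by driver).

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One reading of an interface's counters (hypothetical structure)."""
    timestamp: float  # seconds since some reference point
    packets: int      # cumulative packets handled so far
    dropped: int      # cumulative packets dropped so far


def drop_stats(before: Snapshot, after: Snapshot) -> dict:
    """Return drops per second and drop percentage between two snapshots."""
    elapsed = after.timestamp - before.timestamp
    if elapsed <= 0:
        raise ValueError("snapshots must be in chronological order")

    dropped = after.dropped - before.dropped
    packets = after.packets - before.packets

    # Assumption: 'packets' excludes drops, so the total seen is the sum.
    total = packets + dropped
    percent = (dropped / total * 100.0) if total > 0 else 0.0

    return {
        "drops_per_second": dropped / elapsed,
        "drop_percent": percent,
    }


if __name__ == "__main__":
    quiet = drop_stats(Snapshot(0, 0, 0), Snapshot(10, 900, 100))
    busy = drop_stats(Snapshot(0, 0, 0), Snapshot(10, 1_000_000, 100))
    print(quiet)
    print(busy)
```

The same 100 drops in 10 seconds is a 10% loss on a link that only handled 900 other packets, but roughly 0.01% on a link that handled a million, which is why a fixed threshold on the raw count can be misleading.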

This got me to take a quick look at a number of different Linux hosts, to see what the drop situation looked like on certain interfaces.

I noticed that on one sample of hosts, most drops were Rx packets.
On most hosts, though, the drops were Tx packets.
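A quick way to make this kind of Rx versus Tx comparison on a host is to read the per-interface counters that Linux exposes under /sys/class/net/<interface>/statistics/. The following is a minimal sketch, not the exact method used for the survey above; the function name read_drop_counters is made up for illustration.

```python
from pathlib import Path


def read_drop_counters(sysfs_root: str = "/sys/class/net") -> dict:
    """Map each interface name to its rx_dropped and tx_dropped counters."""
    counters = {}
    for iface_dir in sorted(Path(sysfs_root).iterdir()):
        stats_dir = iface_dir / "statistics"
        try:
            rx = int((stats_dir / "rx_dropped").read_text().strip())
            tx = int((stats_dir / "tx_dropped").read_text().strip())
        except (OSError, ValueError):
            # Not an interface directory (or unreadable); skip it.
            continue
        counters[iface_dir.name] = {"rx_dropped": rx, "tx_dropped": tx}
    return counters


if __name__ == "__main__":
    for name, stats in read_drop_counters().items():
        print(f"{name}: rx_dropped={stats['rx_dropped']} "
              f"tx_dropped={stats['tx_dropped']}")
```

These counters are cumulative since the interface came up, so two readings taken some seconds apart are needed to see how fast drops are actually accumulating.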

Looking at netstat -s, you can get a remarkably detailed picture of why packets are being dropped on a Linux host. A drop could be related to congestion, like a socket buffer overrun (perhaps the application cannot read fast enough because of high CPU load). Or the packet could be dropped because it was supposed to be dropped: maybe there is a checksum error, a windowing error, or a packet that should never have arrived in the first place.
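Since netstat -s prints a long list of counters, it can help to filter it down to the non-zero ones that look drop-related. The sketch below is one rough way to do that. The keyword list and the function name find_drop_counters are my own assumptions, and the parser only handles the two common line shapes ("<count> <description>" and "<Name>: <count>"), so treat it as a starting point rather than a complete parser.

```python
import re
import subprocess

# Illustrative keyword list (an assumption, not exhaustive).
DROP_KEYWORDS = ("drop", "overrun", "pruned", "collapsed",
                 "error", "bad", "discard")

COUNT_FIRST = re.compile(r"^\s+(\d+)\s+(.+)$")     # "    12 packets pruned ..."
KEY_VALUE = re.compile(r"^\s+(\S+):\s+(\d+)\s*$")  # "    InCsumErrors: 3"


def find_drop_counters(netstat_output: str) -> list:
    """Return (section, count, description) for non-zero drop-like counters."""
    results = []
    section = ""
    for line in netstat_output.splitlines():
        if line and not line[0].isspace():
            # Unindented lines such as "Tcp:" or "Udp:" start a new section.
            section = line.strip().rstrip(":")
            continue
        match = COUNT_FIRST.match(line)
        if match:
            count = int(match.group(1))
            description = match.group(2).strip()
        else:
            match = KEY_VALUE.match(line)
            if not match:
                continue  # line shape not handled by this sketch
            description = match.group(1)
            count = int(match.group(2))
        lowered = description.lower()
        if count > 0 and any(word in lowered for word in DROP_KEYWORDS):
            results.append((section, count, description))
    return results


if __name__ == "__main__":
    output = subprocess.run(["netstat", "-s"], capture_output=True,
                            text=True, check=True).stdout
    for section, count, description in find_drop_counters(output):
        print(f"[{section}] {count} {description}")
```

A counter like "packets pruned from receive queue because of socket buffer overrun" points at an application that is not reading fast enough, while checksum or "bad segment" counters point at packets that were dropped on purpose.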

One network engineer mentioned to me that some packets are dropped because of Packet Cloning, or Packet Redundancy, features. These features are enabled so that when a far-end router or switch close to the destination loses a packet (for one reason or another), it does not have to go all the way back to the source for a re-send. But when this feature is used, you can get a lot of dropped packets due to "de-duping". This could create a false positive. Juniper has a document that describes their Packet Redundancy, or Packet Protection, feature:


Interesting. Worth mentioning, or blogging about.

The link below is also useful when it comes to finding out how to debug dropped packets.
https://community.pivotal.io/s/article/Network-Troubleshooting-Guide

Here is another interesting link on the same topic.
https://jvns.ca/blog/2017/09/05/finding-out-where-packets-are-being-dropped/
