To make a disaster, you need a perfect storm of mistakes. A single mistake is insufficient, but a stack of sub-optimal configuration choices will give you rampant failure and poor network quality. For today’s mistake we have massive and erratic packet duplication leading to random drops. We checked for STP loops, but eventually found it was caused by a combination of:
- A large VLAN in a data center with several IP ranges sharing the same physical (layer 2) segment. Not a problem, you would think.
- Unicast flooding in the switch: a design feature.
- Packet forwarding on a single machine (acting as a router). Also a feature.
- Packet bridging on that same machine (acting as a bridge). Also a feature.
When a switch wants to send a packet, it sends it out of the port on which the destination MAC address was last seen. If the MAC address has not been seen recently, or has just timed out of the CAM table (which maps MAC addresses to ports), then it has two choices:
- Drop the packet
- Broadcast the packet
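Broadcasting (sending the packet out of every other port in the VLAN) is the smart choice, and is known as unicast flooding. Generally the packet solicits a reply, and the reply lets the switch learn which port the MAC address lives on, so the flooding is temporary.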
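The lookup-or-flood decision can be sketched as a toy learning switch. This is only an illustration under simple assumptions (one VLAN, no ageing timer, made-up port numbers and MAC strings); the class and method names are hypothetical and do not describe any vendor's implementation.

```python
class LearningSwitch:
    """Toy model of a learning switch on a single VLAN (illustrative only)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.cam = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        # Learn (or refresh) where the sender lives.
        self.cam[src_mac] = in_port
        out_port = self.cam.get(dst_mac)
        if out_port is None:
            # Unknown destination: unicast flooding to every other port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is behind the ingress port already
        return [out_port]

    def age_out(self, mac):
        """Simulate a CAM entry timing out."""
        self.cam.pop(mac, None)


sw = LearningSwitch([1, 2, 3, 4])
print(sw.receive(1, "aa", "bb"))  # "bb" unknown: flooded to ports 2, 3, 4
print(sw.receive(2, "bb", "aa"))  # the reply teaches the switch where "bb" is
print(sw.receive(1, "aa", "bb"))  # now sent to port 2 only
sw.age_out("bb")
print(sw.receive(1, "aa", "bb"))  # entry timed out: flooded again
```

The point of the sketch is that flooding stops only once the destination itself transmits something; a destination that stays silent keeps every frame addressed to it flooding.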
In today’s disaster, the packet bridging implementation seems to rely on switching the network card into promiscuous mode. Now what happens when a unicast-flooded packet arrives at that machine, destined for a subnet that is not local to it:
- Because the network interface card is in promiscuous mode to pick up bridged traffic, the packet is not dropped.
- Since the destination of the packet is not local, it is considered for routing.
- The machine happens to have a route to the destination so it retransmits the packet to the gateway – the switch again.
- The packet is received by the switch for routing, and it figures out that it must send it to a particular MAC address … for which it still does not have an entry in its CAM table.
- The switch does unicast flooding in the hope of getting the packet to its destination.
- After about 20 retransmissions, a packet is dropped, and things go quiet. If you’re lucky (ha ha) the recipient machine manages to transmit a reply, and the CAM table is updated. Unicast flooding stops, and the duplication stops, until the next crazy packet hits the system.
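The feedback loop in the steps above can be sketched as a small counting model. It assumes, purely for illustration, that each trip through the forwarding machine and the switch uses up one unit of a fixed budget (the post only observes "about 20 retransmissions" and does not say what enforces the limit), and that a bystander port receives one copy per flood. The function and parameter names are hypothetical.

```python
def copies_seen(hop_budget, reply_after=None):
    """Count copies of one packet seen on a bystander port.

    hop_budget:  assumed number of forwarding passes before the packet dies.
    reply_after: pass number at which the real destination transmits and the
                 switch learns its MAC address (None means it never does).
    """
    copies = 0
    passes = 0
    while passes < hop_budget:
        if reply_after is not None and passes >= reply_after:
            break  # CAM entry exists: the packet goes only to the destination
        copies += 1  # switch floods: the bystander and the forwarder get a copy
        passes += 1  # the forwarder routes its copy back to the switch
    return copies


print(copies_seen(20))                 # destination stays silent
print(copies_seen(20, reply_after=3))  # destination replies after 3 passes
```

Under these assumptions a silent destination yields the full budget of duplicates, while an early reply cuts the loop short, which matches the erratic, bursty duplication described above.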
The “router” system which delivered this kind of behaviour to us is some kind of Microsoft Windows box running some kind of virtualisation software – maybe Hyper-V or worse. It has been unplugged, and will return with a firewall protecting it from the big nasty switch, or with some other fixes.
To detect this condition, you need to look for an abundance of unicast packets addressed to other machines, which a switch port should normally not deliver to you. You can do the following on an otherwise quiet box (that’s what the “port not 22” is about) and look for duplicates:
    # tcpdump -i eth0 port not 22 and not multicast -c 100 -en
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
    10:10:00.990296 1c:17:d3:xx:xx:c2 > 00:21:5e:xx:xx:e5, ethertype IPv4 (0x0800), length 98: xxx.xx.xx7.118 > xxx.xx.x0.40: ICMP echo reply, id 40490, seq 8510, length 64
    10:10:00.990573 1c:17:d3:xx:xx:c2 > 00:21:5e:xx:xx:e5, ethertype IPv4 (0x0800), length 98: xxx.xx.xx7.118 > xxx.xx.x0.40: ICMP echo reply, id 40490, seq 8510, length 64
    ...and a lot more duplicates than that...
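Spotting duplicates by eye gets tedious, so here is a minimal sketch of a counter for captures like the one above. It assumes the one-line-per-packet format that `tcpdump -en` prints, drops the leading timestamp, and counts identical remainders. The function name is hypothetical, and identical summary lines are only a hint: legitimate retransmissions can look the same.

```python
from collections import Counter


def find_duplicates(lines, min_count=2):
    """Return (packet summary, count) pairs that occur at least min_count times."""
    counts = Counter()
    for line in lines:
        parts = line.strip().split(" ", 1)
        # Skip banner lines and anything that lacks a leading HH:MM:SS.usec stamp.
        if len(parts) != 2 or parts[0].count(":") != 2:
            continue
        counts[parts[1]] += 1
    return [(pkt, n) for pkt, n in counts.most_common() if n >= min_count]


# The two masked lines from the capture above, plus a banner line.
sample = [
    "listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes",
    "10:10:00.990296 1c:17:d3:xx:xx:c2 > 00:21:5e:xx:xx:e5, ethertype IPv4 (0x0800), "
    "length 98: xxx.xx.xx7.118 > xxx.xx.x0.40: ICMP echo reply, id 40490, seq 8510, length 64",
    "10:10:00.990573 1c:17:d3:xx:xx:c2 > 00:21:5e:xx:xx:e5, ethertype IPv4 (0x0800), "
    "length 98: xxx.xx.xx7.118 > xxx.xx.x0.40: ICMP echo reply, id 40490, seq 8510, length 64",
]
for packet, count in find_duplicates(sample):
    print(count, packet)
```

On a healthy switched network the list should be close to empty on a quiet box; a long list of echo replies or other unicast traffic between third parties is the symptom described here.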
Fixing it was a matter of removing the most culpable element of the chain (anything running Microsoft).