Netgear STP bug (or something)

I found a fun bug today. We have a stack of Netgear switches in our office, and we keep getting disconnected at odd times. I had already found one switch that did not have STP enabled and turned it on, and I had rearranged the physical links so that the topology was slightly more star and slightly less daisy chain. But one link stubbornly refused to be moved: whenever I moved it, the network went down. That was not cool. On my second time around, I had a look at the actual STP packets the switches were sending out:

I got this downstairs:

06:27:15.376168 STP 802.1w, Rapid STP, Flags [Learn, Forward, Agreement], bridge-id 8000.20:e5:2a:52:46:c8.801c, length 43
        message-age 0.00s, max-age 20.00s, hello-time 2.00s, forwarding-delay 15.00s
        root-id 8000.20:e5:2a:52:46:c8, root-pathcost 0, port-role Designated

I got this upstairs:

06:23:58.609601 STP 802.1w, Rapid STP, Flags [Topology change, Learn, Forward, Agreement], bridge-id 8000.e0:91:f5:ba:fb:ad.8005, length 43
        message-age 0.00s, max-age 20.00s, hello-time 2.00s, forwarding-delay 15.00s
        root-id 8000.e0:91:f5:ba:fb:ad, root-pathcost 0, port-role Designated

And this was odd, because the two networks are connected, so they should have the same root bridge (root-id). Instead, each side believes it is the best and greatest STP root. When I tried to move a link to our “core” switch, that link was disconnected. It was rather frustrating.
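
For reference, the standard root election compares bridge IDs by priority first and MAC address second, and the lowest value wins. Here is a minimal sketch of that rule applied to the two root-ids captured above. The function names `parse_bridge_id` and `elect_root` are my own and purely illustrative; this is not what the switch firmware actually runs.

```python
def parse_bridge_id(text):
    """Split a tcpdump-style ID such as '8000.20:e5:2a:52:46:c8'
    into a (priority, mac) tuple of integers."""
    priority_hex, mac = text.split(".", 1)
    return int(priority_hex, 16), int(mac.replace(":", ""), 16)


def elect_root(bridge_ids):
    """Return the ID that should win: lowest priority, then lowest MAC."""
    return min(bridge_ids, key=parse_bridge_id)


downstairs = "8000.20:e5:2a:52:46:c8"
upstairs = "8000.e0:91:f5:ba:fb:ad"

# Both priorities are 0x8000, so the MAC address breaks the tie.
print(elect_root([downstairs, upstairs]))
```

With equal priorities the tie goes to the lower MAC address, so by the book the downstairs switch should have been root for both sides even on default settings.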

I (eventually) changed the STP priority for the core switch from the default 32768 to 16384, and voilà, it worked. Suddenly both sides of the network agree on what the STP root-id is:

06:33:57.396606 STP 802.1w, Rapid STP, Flags [Learn, Forward, Agreement], bridge-id 4000.20:e5:2a:52:46:c8.801c, length 43
        message-age 0.00s, max-age 20.00s, hello-time 2.00s, forwarding-delay 15.00s
        root-id 4000.20:e5:2a:52:46:c8, root-pathcost 0, port-role Designated
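
The priority shows up in the capture as the hex prefix of the ID: 32768 is 0x8000 and 16384 is 0x4000. A small sketch of that conversion follows, assuming the usual RSTP rule that the bridge priority is a multiple of 4096 between 0 and 61440. The helper name `priority_to_hex` is made up for this example.

```python
def priority_to_hex(priority):
    """Render a bridge priority the way it appears in the capture,
    e.g. 32768 -> '8000'. Rejects values that are not a multiple
    of 4096 in the range 0..61440."""
    if priority % 4096 != 0 or not 0 <= priority <= 61440:
        raise ValueError("priority must be a multiple of 4096 in 0..61440")
    return format(priority, "04x")


print(priority_to_hex(32768))  # the default, seen as the 8000 prefix
print(priority_to_hex(16384))  # the new value, seen as the 4000 prefix
```

Since the priority is compared before the MAC address, any switch set to 16384 beats every switch still on the default 32768, whatever the MAC addresses are.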

And now I moved the link to get a more sane topology, and that worked too.

I speculate that this happened because the Netgear RSTP implementation does some improper comparison of the MAC addresses, so that multiple devices each end up with what they believe is the best score.

The moral of the story is: don’t rely on the defaults to work. Always set the STP Bridge Priority: Switching -> Advanced -> CST Configuration -> Bridge Priority -> your number here.
