[Voyage-linux] ICMP dest unreachable broadcast storm on wlan0

Fri Sep 29 07:02:43 HKT 2006

Andrew,

It's certainly possible.  I had DHCP problems with Netgear routers when 
they had been offline long enough that they all wanted an address at once, 
though that went away when I upgraded to a newer DHCP server.  In this 
case, the problem is either at or behind this one client, so it's a little 
harder to tell at this point.  I plan to replace the CB3 in AP mode with a 
WRAP soon, so I will have better visibility into that 20-customer segment.

Thanks for the comments.

Edwin

Andrew Niemantsverdriet wrote:
> Does this client happen to have an older Linksys or D-Link SOHO router? We
> have seen this (although not with Voyage) with people that have older
> routers. It is a known problem, and a new router or firmware upgrade has
> fixed it. It has been a while since I have had to deal with this problem,
> so the details are a bit fuzzy, but basically the router just starts
> freaking out for some reason and then it is a chain reaction where all the
> other routers freak out as well. Not sure if this is your exact problem,
> but it is something we had to deal with once.
>
>   
>> The "boot everyone off and add back one at a time" approach seems to
>> have gotten things working again though it has done so without
>> specifically identifying the culprit, i.e. *all* customers' radios have
>> now re-associated and the storm has not reappeared.
>>
>> My procedure was as follows:
>>
>> iwlist wlan0 scan|grep Address|tr -s ' ' ' '|cut -d' ' -f6
>>
>> returns the list of associated clients so the command:
>>
>> for mac in `iwlist wlan0 scan|grep Address|tr -s ' ' ' '|cut -d' ' -f6`;
>> do iwpriv wlan0 kickmac $mac;
>> done
>>
>> kicks everyone off.
>>
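A slightly more defensive take on the kick loop above, offered only as a sketch: rather than relying on the MAC landing in field 6 of the iwlist output, it matches the address pattern itself and removes duplicates. The names extract_macs and kick_all and the DRY_RUN variable are invented here for illustration; kickmac is the same iwpriv call used above.

```shell
# Print each unique MAC address found on stdin, one per line, upper-cased.
# Assumes colon-separated hex addresses, as iwlist prints them.
extract_macs() {
    grep -oiE '([0-9a-f]{2}:){5}[0-9a-f]{2}' | tr 'a-f' 'A-F' | sort -u
}

# Kick every MAC read from stdin off the given interface.
# With DRY_RUN=1 the commands are printed instead of executed.
kick_all() {
    iface="$1"
    while read -r mac; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "iwpriv $iface kickmac $mac"
        else
            iwpriv "$iface" kickmac "$mac"
        fi
    done
}
```

To preview what would be kicked, something like "iwlist wlan0 scan | extract_macs | DRY_RUN=1 kick_all wlan0" should do; drop DRY_RUN=1 to actually kick.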
>> Using the maccmd tools: (http://madwifi.org/wiki/UserDocs/iwpriv)
>>
>> iwpriv wlan0 maccmd 3   (flush the MAC list)
>> iwpriv wlan0 maccmd 1   (whitelist: only listed MACs may associate)
>>
>> Run the "for" loop above and voila! No associated clients (and no
>> broadcast storm either).
>>
>> iwpriv wlan0 addmac <mac address> (cut and paste each MAC in turn from
>> the scan command above, run in a different window)
>>
>> all the while running tcpdump -i wlan0 proto ICMP in yet another window
>> to see which client, once added back in, causes the storm to reappear.
>>
>> If/when it reappears, run
>>
>> iwpriv wlan0 delmac <mac address you just added>
>>
>> followed by the kickmac for loop to see if the problem goes away (it
>> should).
>>
>> Then keep adding the remaining MAC addresses until everyone is back up.
>>
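The add-back steps above could be rehearsed as a loop, roughly as sketched here. The names readd_one_at_a_time and icmp_storm and the IWPRIV and STORM_IFACE variables are made up for illustration, and the threshold of 100 ICMP packets in 5 seconds is an arbitrary assumption; addmac, delmac and kickmac are the same iwpriv calls used above.

```shell
# Command used to talk to the driver; set IWPRIV=echo to rehearse safely.
IWPRIV="${IWPRIV:-iwpriv}"

# Example probe (assumed threshold): succeeds if more than 100 ICMP
# packets are seen on the interface within 5 seconds. Clients may need
# longer than that to re-associate, so treat the window as a placeholder.
icmp_storm() {
    n=$(timeout 5 tcpdump -i "${STORM_IFACE:-wlan0}" -n -l icmp 2>/dev/null | wc -l)
    [ "$n" -gt 100 ]
}

# readd_one_at_a_time IFACE PROBE...
# Reads MACs from stdin and re-admits them one at a time. After each
# addmac the probe command is run with the MAC as its last argument;
# if it succeeds (storm seen), the MAC is removed again and reported.
readd_one_at_a_time() {
    iface="$1"
    shift
    while read -r mac; do
        "$IWPRIV" "$iface" addmac "$mac" >/dev/null
        if "$@" "$mac" </dev/null; then
            "$IWPRIV" "$iface" delmac "$mac" >/dev/null
            "$IWPRIV" "$iface" kickmac "$mac" >/dev/null
            echo "suspect: $mac"
        fi
    done
}
```

With the MAC list saved to a file, "readd_one_at_a_time wlan0 icmp_storm < macs.txt" would print a "suspect:" line for each client whose return coincides with the storm.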
>> The real puzzle remains, however: while kicking all the clients
>> off and adding them back one at a time allowed me to finally isolate the
>> source of the traffic, I did actually add the offending client back a few
>> minutes later and the problem did *not* reappear.  Why?  This particular
>> client actually has another AP behind it with about 20 clients, so it was
>> important to get the connection there back up.
>>
>> I'd like to find out why, of course, but in any case, I've learned a
>> method to get things back on line and hope it might help someone else as
>> well.
>>
>> Edwin
>>
>> Edwin Whitelaw wrote:
>>     
>>> The DoS ICMP storm has now been going for 4+ hrs so I've had the
>>> opportunity to collect data and try different things.
>>>
>>> 1) Tried several versions of the Prism firmware, including 1.7.4 and
>>> 1.8.4.  I have been running 1.8.2 for almost a year w/o problems.
>>> Iptraf shows the incoming rate at ~150kbs and it was interesting that
>>> 1.8.4 slowed that to ~10kbs but functionally still did not allow
>>> association from my clients.  Broadcast storm aside, I have not found
>>> 1.8.4 to work, hence the 1.8.2 currently installed.
>>>
>>> 2) iwlist wlan0 scan shows everyone there and with good signal.
>>>
>>> 3)  tcpdump -i wlan0 proto ICMP shows around 650 packets/second
>>>
>>> 4) Here are some snippets from iptraf, then tcpdump, and finally
>>> iptraf in LAN Station monitor mode
>>>
>>> ICMP dest unrch (proto) (56 bytes) from 0.0.0.0 to 172.16.8.1 on wlan0
>>> ICMP dest unrch (proto) (56 bytes) from 0.0.0.0 to 172.16.8.1 on wlan0
>>> ICMP dest unrch (proto) (56 bytes) from 0.0.0.0 to 172.16.8.1 on wlan0
>>> ICMP dest unrch (proto) (56 bytes) from 0.0.0.0 to 172.16.8.1 on wlan0
>>> ICMP dest unrch (proto) (56 bytes) from 0.0.0.0 to 172.16.8.1 on wlan0
>>> ICMP dest unrch (proto) (56 bytes) from 0.0.0.0 to 172.16.8.1 on wlan0
>>> ICMP dest unrch (proto) (56 bytes) from 0.0.0.0 to 172.16.8.1 on wlan0
>>> ICMP dest unrch (proto) (56 bytes) from 0.0.0.0 to 172.16.8.1 on wlan0
>>> ICMP dest unrch (proto) (56 bytes) from 0.0.0.0 to 172.16.8.1 on wlan0
>>>
>>> tcpdump -i wlan0 proto ICMP
>>>
>>> 12:10:32.286597 IP 0.0.0.0 > 172.16.8.1: ICMP OSPF-ALL.MCAST.NET protocol 89 unreachable, length 36
>>> 12:10:32.286679 IP 0.0.0.0 > 172.16.8.1: ICMP OSPF-ALL.MCAST.NET protocol 89 unreachable, length 36
>>> 12:10:32.287238 IP 0.0.0.0 > 172.16.8.1: ICMP OSPF-ALL.MCAST.NET protocol 89 unreachable, length 36
>>> 12:10:32.287317 IP 0.0.0.0 > 172.16.8.1: ICMP OSPF-ALL.MCAST.NET protocol 89 unreachable, length 36
>>> 12:10:32.288810 IP 0.0.0.0 > 172.16.8.1: ICMP OSPF-ALL.MCAST.NET protocol 89 unreachable, length 36
>>> 12:10:32.288886 IP 0.0.0.0 > 172.16.8.1: ICMP OSPF-ALL.MCAST.NET protocol 89 unreachable, length 36
>>> 12:10:32.293616 IP 0.0.0.0 > 172.16.8.1: ICMP OSPF-ALL.MCAST.NET protocol 89 unreachable, length 36
>>> 12:10:32.293711 IP 0.0.0.0 > 172.16.8.1: ICMP OSPF-ALL.MCAST.NET protocol 89 unreachable, length 36
>>> 12:10:32.294479 IP 0.0.0.0 > 172.16.8.1: ICMP OSPF-ALL.MCAST.NET protocol 89 unreachable, length 36
>>> 12:10:32.294613 IP 0.0.0.0 > 172.16.8.1: ICMP OSPF-ALL.MCAST.NET protocol 89 unreachable, length 36
>>> 12:10:32.296556 IP 0.0.0.0 > 172.16.8.1: ICMP OSPF-ALL.MCAST.NET protocol 89 unreachable, length 36
>>>
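To put a number on the rate from a capture like the one above instead of eyeballing it, a small awk filter along these lines may help. It is only a sketch: pkt_rate is a made-up name, and it assumes tcpdump's default HH:MM:SS.microseconds timestamp at the start of each packet line and a capture that does not cross midnight.

```shell
# Read tcpdump text output on stdin and print the average packets/second
# between the first and last timestamped line. Lines that do not start
# with a timestamp (wrapped continuations, for example) are ignored.
pkt_rate() {
    awk '
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+$/ {
            split($1, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
            if (n == 0) first = now
            last = now
            n++
        }
        END {
            if (n > 1 && last > first)
                printf "%.0f\n", (n - 1) / (last - first)
            else
                print "n/a"
        }'
}
```

For a ten-second sample, something like "timeout 10 tcpdump -i wlan0 -n -l icmp | pkt_rate" should work, given a timeout utility on the box.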
>>> Note:  I have both shut down quagga/ospfd and also disabled it and
>>> rebooted without affecting the broadcast storm, so I do not believe it is
>>> my own ospf process at fault.
>>>
>>> tcpdump with -v (verbose)
>>>
>>> 12:40:15.750062 IP (tos 0x0, ttl 122, id 49021, offset 0, flags
>>> [none], proto: ICMP (1), length: 56) 0.0.0.0 > 172.16.8.1: ICMP
>>> OSPF-ALL.MCAST.NET protocol 89 unreachable, length 36
>>> IP (tos 0xc0, ttl 1, id 49021, offset 0, flags [none], proto: OSPF
>>> (89), length: 64) 172.16.8.1 > OSPF-ALL.MCAST.NET: [|ospf]
>>> 12:40:15.751592 IP (tos 0x0, ttl 123, id 49021, offset 0, flags
>>> [none], proto: ICMP (1), length: 56) 0.0.0.0 > 172.16.8.1: ICMP
>>> OSPF-ALL.MCAST.NET protocol 89 unreachable, length 36
>>> IP (tos 0xc0, ttl 1, id 49021, offset 0, flags [none], proto: OSPF
>>> (89), length: 64) 172.16.8.1 > OSPF-ALL.MCAST.NET: [|ospf]
>>>
>>> iptraf LAN station monitor, sorted by byte count.  (Sorry, the
>>> formatting got scrogged.)  The important entries are the first one,
>>> the broadcast MAC address of all f's, and the next one, which is the
>>> MAC of the wlan0 radio itself.
>>>
>>> IPTraf
>>> HW addr (on wlan0)  PktsIn  IP In  BytesIn  InRate  PktsOut  IP Out  BytesOut  OutRate
>>> ffffffffffff        113097      0  6339844   147.6        0       0         0      0.0
>>> 00026f42bca9          9771      0   583293     8.4     1088       0    220688      0.4
>>> 00026f3ab2f3           102      0    57119     0.0       55       0      2641      0.0
>>> 00026f3aaf4c           148      0    48995     0.0      153       0     19278      0.0
>>> 00026f40e125           127      0    24565     0.0     1818       0    113522      1.4
>>>
>>> 5) Neither
>>>
>>> ebtables -A INPUT -i wlan0 -s FF:FF:FF:FF:FF:FF  -j DROP
>>>
>>> nor
>>>
>>> iptables -A INPUT -p ICMP -s 0/0 -d 172.16.8.1 -j DROP
>>>
>>> has any effect.
>>>
>>> 6) Tcpdump on the 5GHz uplink interface shows no problems.
>>>
>>> 7) I have disconnected the antenna cable from wlan0 to the omni and
>>> all ICMP traffic ceases, so it would appear to be externally
>>> generated.  Reconnecting immediately brings the problem back.
>>>
>>> As you read this, I will be using iwpriv maccmd tools to knock off
>>> each radio in an attempt to identify/isolate the source.  I suppose
>>> it's also possible it could be coming from a non-customer in the area?
>>>
>>> Can anyone think of what or how a wireless client would be generating
>>> the traffic shown?
>>>
>>> Thanks,
>>>
>>> Edwin
>>>
>>>
>>> Punky Tse wrote:
>>>       
>>>> Hi Edwin,
>>>>
>>>> - Is your box exposed to the Internet, so that anyone can reach it?
>>>>         
>>> The 5GHz PtP link does have a real IP but the wlan0 side (where the
>>> problem exists) is all private.  Sniffing the 5GHz side shows no
>>> problems.
>>>       
>>>> - Did you check /var/log/ to see if anything strange is happening?
>>>>         
>>> I have my logging redirected to a normal PC.  Tailing that
>>> /var/log/messages shows only normal DHCP traffic.
>>> /var/log/daemon is pretty full with
>>> Sep 24 20:03:01 rinerwrap zebra[2189]: netlink_parse_info:
>>> netlink-listen type RTM_NEWLINK(16), seq=0, pid=0
>>> Sep 24 20:03:01 rinerwrap zebra[2189]: netlink_link_change: ignoring
>>> IFLA_WIRELESS message
>>>
>>> entries but they pre-date the storm and look no different now.  There
>>> is also some fairly normal-looking OSPF traffic from the other parts
>>> of my net, but note that the storm exists even when all quagga/ospf
>>> processes on this box are shut down.
>>>
>>>       
>>>> - Did you run vmstat to see the CPU consumption?  If you can tell
>>>> whether the CPU is being consumed in the system or by a userland
>>>> program, it could help.
>>>>         
>>> riner:~# vmstat 2
>>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>> r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>>> 0  0      0  86836   2184  24036    0    0     5     0 1072   49  3 17 80  0
>>> 0  0      0  86836   2184  24036    0    0     0     0 1199   24  0 17 83  0
>>> 0  0      0  86836   2184  24036    0    0     0     0 1184   14  1 17 83  0
>>> 0  0      0  86800   2184  24036    0    0     0     0 1196   19  0 19 81  0
>>> 0  0      0  86800   2184  24036    0    0     0     0 1124   13  1 14 85  0
>>> 0  0      0  86800   2184  24036    0    0     0     0 1086   13  0 16 84  0
>>> 0  0      0  86800   2184  24036    0    0     0     0 1103   21  0 16 83  0
>>>
>>> top similarly shows no runaway process
>>>
>>> riner:/var/log# w
>>> 12:55:47 up  1:12,  2 users,  load average: 0.00, 0.00, 0.00
>>>
>>>
>>>       
>>>> Punky
>>>>
>>>> Edwin Whitelaw wrote:
>>>>         
>>>>> Running Voyage 0.2 fully updated on WRAP 2C.  Two radios, one 5GHz
>>>>> (SR5) for backhaul and an NL2611 for the AP.  Firmware on the AP
>>>>> radio is
>>>>>
>>>>> wifi0: NIC: id=0x8013 v1.0.0
>>>>> wifi0: PRI: id=0x15 v1.1.1
>>>>> wifi0: STA: id=0x1f v1.8.2
>>>>> wifi0: Intersil Prism2.5 PCI: mem=0xa0000000, irq=9
>>>>> wifi0: registered netdevice wlan0
>>>>>
>>>>> I've only recently started getting occasional (every few days) ICMP
>>>>> dest unreachable broadcast storms that are effectively DoS attacks on
>>>>> the system, though at this point I'm not sure whether it's a
>>>>> rogue/defective hardware issue, misbehaving software or a deliberate
>>>>> attack from an infected customer's site.  Unfortunately, it has been
>>>>> difficult to determine the origin since the source IP address is
>>>>> 0.0.0.0 and the source MAC shows as all "f"s.  Iptables entries to
>>>>> block all ICMP from 0.0.0.0 incoming on wlan0 have no effect.
>>>>>
>>>>> The storms last from just a few minutes to 10s of minutes though if
>>>>> I am not actually at the console when they occur it is difficult to
>>>>> get an exact read on the duration.
>>>>>
>>>>> The clients on this AP are a mix of Engenius CB3s, Tranzeo CPEs
>>>>> (basically the same radio) and a few smartbridges.
>>>>>
>>>>> iptraf shows the storms as ICMP dest unreachable and tcpdump shows
>>>>> ICMP and OSPF as the protocols.  We do run OSPF but I have shut down
>>>>> quagga during one of these storms with no effect, and would expect
>>>>> the storm to stop if OSPF were the cause.
>>>>>
>>>>> Anyone else experiencing this problem or have a suggestion on how to
>>>>> protect against it?  I will try and capture some tcpdump output the
>>>>> next time and regret not having it at this point though to my eyes,
>>>>> it doesn't offer much information beyond this verbal description.
>>>>>
>>>>> Edwin
>>>>>
>>>>>
>>>>>           

-- 
<=+=+=+==+=+=+==+=+=+=+=+=+=+=+=>
Edwin Whitelaw, P.E.
New River Valley Unwired, LLC
2200 Lonesome Dove Dr
Christiansburg, VA 24073
540-239-0318