NetBSD Problem Report #6475
Received: (qmail 11608 invoked from network); 21 Nov 1998 00:49:31 -0000
Message-Id: <199811210048.QAA00445@Cuisinart.Stanford.EDU>
Date: Fri, 20 Nov 1998 16:48:05 -0800 (PST)
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
Reply-To: jonathan@netbsd.org
To: gnats-bugs@gnats.netbsd.org
Subject: NetBSD's MI le driver does not report giant Ethernet packets
X-Send-Pr-Version: 3.95
>Number: 6475
>Category: kern
>Synopsis: NetBSD's MI le driver does not report giant Ethernet packets
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: suspended
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Nov 20 16:50:01 +0000 1998
>Closed-Date:
>Last-Modified: Sat Jan 05 08:34:28 +0000 2002
>Originator: Jonathan Stone
>Release: 19981114
>Organization:
>Environment:
System: NetBSD Cuisinart.Stanford.EDU 1.3H NetBSD 1.3H (EGCS-HAIFA) #15: Sun Nov 15 17:05:22 PST 1998 jonathan@Cuisinart.Stanford.EDU:/cuisinart/compile/EGCS-HAIFA pmax
>Description:
NetBSD's MI le driver does not report giant Ethernet packets
as giant Ethernet packets.
>How-To-Repeat:
Connect two machines -- say, NetBSD/pmax and a Sparc running
SunOS 4.1.3 -- to a broadcast 10Mbit ethernet segment.
Arrange for a wayward machine to send a spray of giant packets
on an Ethernet segment (someone else's machine did this to me today).
Compare the logs from NetBSD-current and Solaris.
On the netbsd/pmax machine, I get:
le0: dropping chained buffer
le0: dropping chained buffer
le0: dropping chained buffer
[repeated ad infinitum -- so much so that I lost console access and rebooted]
On the SunOS 4.1.3_U1 sparc I get:
le0: Receive: giant packet from 0:40:33:a0:4c:7c
le0: Receive: STP in rmd cleared
le0: Receive: giant packet from 55:55:55:55:55:5
le0: Receive: STP in rmd cleared
le0: Receive: giant packet from 55:55:55:55:55:5
le0: Receive: STP in rmd cleared
le0: Receive: giant packet from 55:55:55:55:55:5
Assuming this problem really is due to giant-packet errors, the SunOS
messages seem much more informative.
>Fix:
I don't have a LANCE databook. But from looking at the SunOS messages,
it looks like some chip revisions do not set STP and OFLO in quite the
way that am7990.c's input-error handler expects.
>Release-Note:
>Audit-Trail:
From: Robert Elz <kre@munnari.OZ.AU>
To: jonathan@DSG.Stanford.EDU
Cc: gnats-bugs@gnats.netbsd.org
Subject: Re: kern/6475:
Date: Sat, 21 Nov 1998 23:23:04 +1100
Date: Fri, 20 Nov 1998 16:48:05 -0800 (PST)
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
Message-ID: <199811210048.QAA00445@Cuisinart.Stanford.EDU>
| I dont have a LANCE databook. But from looking at the SunOS messages,
| it looks like some chip revisions do not set STP and OFLO in quite the
| way that am7990.c's input-error handler expects.
I doubt it is quite that exotic ... I had never looked at the NetBSD
lance driver before, but I have programmed lances quite a bit in the
past (and I do have a databook, if that was needed, which it doesn't
seem to be here).
OFLO is irrelevant here, if that's happening you have real problems with
your hardware bus design (it indicates the lance wasn't able to get enough
DMA cycles). It looks as if the chances of ever seeing an overflow reported
are pretty close to 0 though, so if it is happening, you'll never know (maybe
this is by design). I'm not sure I understand the rationale behind the
way OFLO is tested in the code.
The "dropping chained buffer" message is printed whenever a buffer without
both STP and ENP set is encountered - the lance sets STP in a buffer that
contains the start of a packet, and ENP in a buffer that contains the end
of a packet - the test implies that NetBSD wants entire packets to fit in
single buffers.
If you get a giant packet (bigger than a buffer - I assume that the buffers
used here are > 1516 bytes, though I haven't looked), then the first of the
buffers it occupies will contain STP and not ENP, the last will contain just
ENP without STP, and any others needed (for truly HUGE packets) will contain
neither.
The way that the NetBSD test is written, each of those buffers (at least 2
per big packet) will cause a message to be written (and if the burst fills
all of the waiting buffers before any are freed, you'd also get "receive
buffer error").
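A small model may make the buffer accounting concrete. This is a sketch under stated assumptions, not the am7990.c code: the RMD1 bit values are assumed to follow the LANCE databook layout, and lance_chain() and count_chained_warnings() are hypothetical names. It reproduces the STP/ENP pattern described above and counts how many messages a per-buffer test prints for a single frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LANCE RMD1 status bits; values assumed from the databook layout. */
#define RMD1_STP 0x02   /* buffer holds the start of a packet */
#define RMD1_ENP 0x01   /* buffer holds the end of a packet */

/*
 * Hypothetical model: record the STP/ENP bits the chip would set for a
 * frame of pktlen bytes spread over buffers of buflen bytes.  Returns the
 * number of buffers used, or 0 if more than maxbufs would be needed.
 */
static size_t
lance_chain(size_t pktlen, size_t buflen, uint8_t *rmd1, size_t maxbufs)
{
	size_t nbufs, i;

	assert(pktlen > 0 && buflen > 0);
	nbufs = (pktlen + buflen - 1) / buflen;   /* round up */
	if (nbufs > maxbufs)
		return 0;
	for (i = 0; i < nbufs; i++) {
		rmd1[i] = 0;
		if (i == 0)
			rmd1[i] |= RMD1_STP;      /* first buffer of the frame */
		if (i == nbufs - 1)
			rmd1[i] |= RMD1_ENP;      /* last buffer of the frame */
	}
	return nbufs;
}

/*
 * Count the messages a per-buffer test would print if it rejects every
 * buffer that lacks either STP or ENP.
 */
static size_t
count_chained_warnings(const uint8_t *rmd1, size_t nbufs)
{
	const uint8_t both = RMD1_STP | RMD1_ENP;
	size_t i, warnings = 0;

	for (i = 0; i < nbufs; i++)
		if ((rmd1[i] & both) != both)
			warnings++;   /* one "dropping chained buffer" each */
	return warnings;
}
```

With an assumed 1536-byte buffer, a normal 1500-byte frame fits in one buffer and produces no message, while a 4000-byte giant occupies three buffers (STP only, neither, ENP only) and produces three.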
That's probably overkill - it will almost certainly be good enough to
log the message on the buffer that has STP set (unless your lance is terminally
broken, the first buffer in one of these sequences will have that set).
If that is done, then you also know the packet header is in the buffer, and
it (might) make sense to log the source address of the packet. The problem
with doing that is that these giant packets can be caused by what amounts
to line noise, in which case the "address" is gibberish. To detect the
difference you need to check for CRC errors, and (of course) CRC errors can
only be detected in the last buffer of the giant packet, which isn't the one
with STP set (or the addresses in it). Similarly, this whole sequence of
buffers should be counted as just 1 error, not one per buffer.
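The once-per-packet accounting suggested above can be sketched as a tiny state machine: copy the source address when the STP buffer arrives, and decide at the ENP buffer, where the CRC bit is meaningful, whether that address is worth reporting. The structure and function names are hypothetical, and the bit values again assume the databook RMD1 layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed LANCE RMD1 bits (databook layout); names are illustrative. */
#define RMD1_CRC 0x08   /* CRC error, only meaningful when ENP is set */
#define RMD1_STP 0x02
#define RMD1_ENP 0x01

/* Hypothetical per-interface state for tracking one giant-packet chain. */
struct giant_state {
	int           in_chain; /* saw STP without ENP, waiting for the end */
	uint8_t       src[6];   /* source address copied from the STP buffer */
	unsigned long giants;   /* giant packets counted, one per chain */
	unsigned long garbled;  /* chains that ended with a CRC error */
};

/*
 * Feed one receive buffer.  Returns 1 when a giant chain has just ended
 * with a good CRC, meaning st->src is worth logging; 0 otherwise.
 * When STP is set, buf must hold at least the 12 address bytes.
 */
static int
giant_step(struct giant_state *st, uint8_t rmd1, const uint8_t *buf,
    size_t len)
{
	int stp = (rmd1 & RMD1_STP) != 0;
	int enp = (rmd1 & RMD1_ENP) != 0;

	if (stp && enp) {               /* ordinary single-buffer packet */
		st->in_chain = 0;
		return 0;
	}
	if (stp) {                      /* first buffer: header lives here */
		assert(len >= 12);
		memcpy(st->src, buf + 6, sizeof st->src);
		st->in_chain = 1;
		return 0;
	}
	if (!st->in_chain || !enp)      /* stray or interior buffer */
		return 0;
	st->in_chain = 0;               /* last buffer: CRC status known now */
	st->giants++;
	if (rmd1 & RMD1_CRC) {
		st->garbled++;          /* likely noise; address untrustworthy */
		return 0;
	}
	return 1;
}
```

Each chain bumps giants exactly once, no matter how many buffers it spans, and a CRC failure is counted separately instead of producing a bogus address.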
Whether the text of the message is "dropping chained buffer" or "giant packet
ignored" or anything similar doesn't really matter, they all mean the same.
kre
ps: I have a very busy ethernet (too busy), and I frequently get the
"excessive collisions, tdr %d" message (the tdr reported is random, the
net isn't broken, just very busy). I don't mind the occasional one of
those, but getting them in bursts when things are really congested gets a
bit annoying. Most probably these kinds of messages ought to be rate
controlled, or perhaps this is one that could be moved inside LEDEBUG,
just as are reports of missed packets, checksum errors, etc.
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
To: Robert Elz <kre@munnari.OZ.AU>
Cc: gnats-bugs@gnats.netbsd.org
Subject: kern/6475:
Date: Sat, 21 Nov 1998 13:07:19 -0800
hi Robert,
The real problem is the swath of messages, but *also* not getting the
MAC address of the bad-guy sending the giant packets.
From (very rusty) memory, I think the required logic is something like:
/* an OFLO indicates a hardware-silo overflow, which may
   drop the ENP marker, so ignore a missing ENP in that case */
if (OFLO not set) {
	if (STP not set) {
		warn about missing STP (chained buffer overrun)
	} else if (ENP not set) {
		/* frame with STP but not ENP: it must be a giant */
		printf("giant packet from: %02x:%02x:%02x:%02x:%02x:%02x\n",
		    <mac-address in frame>);
	}
}
assuming the driver is set up to never need chained buffers.
I believe ours is.
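The same logic written as compilable C, as a sketch: the RMD1_* values assume the databook bit layout, rx_check() is a hypothetical name, and the message is formatted into a caller-supplied buffer rather than printed so the result is easy to inspect. As in the outline, a set OFLO bit suppresses the STP/ENP checks.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed LANCE RMD1 bits (databook layout); names are illustrative. */
#define RMD1_OFLO 0x10  /* silo overflow */
#define RMD1_STP  0x02  /* start of packet */
#define RMD1_ENP  0x01  /* end of packet */

enum rx_verdict { RX_OK, RX_NO_STP, RX_GIANT, RX_OVERFLOW };

/*
 * Check one receive buffer for a driver whose buffers always hold a
 * full-size frame.  For a giant, the source address (bytes 6..11 of the
 * Ethernet header) is formatted into msg.
 */
static enum rx_verdict
rx_check(uint8_t rmd1, const uint8_t *frame, size_t len, char *msg,
    size_t msglen)
{
	if (rmd1 & RMD1_OFLO)
		return RX_OVERFLOW;     /* ENP may have been lost; don't guess */
	if (!(rmd1 & RMD1_STP))
		return RX_NO_STP;       /* chained buffer overrun */
	if (!(rmd1 & RMD1_ENP)) {       /* STP without ENP: a giant */
		assert(len >= 12);
		snprintf(msg, msglen,
		    "giant packet from %02x:%02x:%02x:%02x:%02x:%02x",
		    frame[6], frame[7], frame[8], frame[9], frame[10], frame[11]);
		return RX_GIANT;
	}
	return RX_OK;
}
```

Note that this reports the address from the first buffer without knowing the CRC status, which is exactly the trade-off raised in the previous message: for noise-induced giants the address may be gibberish.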
Responsible-Changed-From-To: kern-bug-people->fair
Responsible-Changed-By: fair
Responsible-Changed-When: Thu Mar 18 14:41:53 PST 1999
Responsible-Changed-Why:
Gonna be poking into the Am7990 (le) driver stuff anyway...
State-Changed-From-To: open->analyzed
State-Changed-By: fair
State-Changed-When: Mon Dec 20 18:47:08 PST 1999
State-Changed-Why:
Examination of the drivers leads me to believe that reporting the
ethernet MAC address of the station that transmitted the giant
packet would be messy.
Things are split into system-specific glue code, the am7990.c code
for the chip, and then an additional set of lance.c routines for
stuff that's common to all AMD ethernet controllers, most specifically,
reading the packet out of the receive buffer and into an mbuf for
processing. When a receive interrupt goes off, the chip-specific
code reads the status stuff, decides whether to process the
packet or not, and if so, calls "lance_read" to grab it and hand it
off to protocol-specific layers.
So, reporting the MAC address for the "chained buffer overflow"
error means writing a routine to go dive into the packet and grab
that specific datum out (it can probably go in lance.c since the
packets are being read in there). This is not really my forte, so
I'll settle for adjusting the documentation to explain that "dropping
chained buffer" really means "packet too big".
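A routine that dives into the packet needs only the six bytes after the destination address. The sketch below assumes, hypothetically, that buffer memory is reached through a glue-supplied copy routine, so it would also work when the buffers are not directly addressable by the CPU; copyfrombuf_fn, rxbuf_srcaddr() and copy_from_host() are invented names, not the lance.c interface.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical glue hook: copy `len` bytes at offset `off` of the
 * receive-buffer memory described by ctx into dst.
 */
typedef void (*copyfrombuf_fn)(void *ctx, void *dst, size_t off, size_t len);

#define MAC_LEN     6
#define MAC_SRC_OFF 6   /* source address follows the destination */

/*
 * Hypothetical helper: format the source MAC of the frame that starts at
 * buffer offset boff as "xx:xx:xx:xx:xx:xx".  out must hold 18 bytes.
 */
static void
rxbuf_srcaddr(copyfrombuf_fn copy, void *ctx, size_t boff, char out[18])
{
	uint8_t a[MAC_LEN];

	assert(copy != NULL);
	copy(ctx, a, boff + MAC_SRC_OFF, sizeof a);
	snprintf(out, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
	    a[0], a[1], a[2], a[3], a[4], a[5]);
}

/* Example glue routine for buffers that sit in ordinary host memory. */
static void
copy_from_host(void *ctx, void *dst, size_t off, size_t len)
{
	memcpy(dst, (const uint8_t *)ctx + off, len);
}
```

Going through the copy hook keeps the chip-specific code free of any knowledge about where the buffers live, which matches the glue/chip/common split described above.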
Responsible-Changed-From-To: fair->kern-bug-people
Responsible-Changed-By: fair
Responsible-Changed-When: Mon Dec 20 23:50:47 PST 1999
Responsible-Changed-Why:
I've more carefully documented the state of this problem in a revised le(4)
manual page, but I'm not up to hacking on the driver to fetch the MAC addr
out for the diagnostic - either it should be taken up by someone else, or
the submitter can decide that what I've done is enough.
State-Changed-From-To: analyzed->feedback
State-Changed-By: fair
State-Changed-When: Tue Feb 8 13:13:13 PST 2000
State-Changed-Why:
Is documentation of the diagnostics enough?
State-Changed-From-To: feedback->closed
State-Changed-By: fair
State-Changed-When: Tue Jun 27 12:32:47 PDT 2000
State-Changed-Why:
closed for lack of feedback; "silent indicates assent."
State-Changed-From-To: closed->feedback
State-Changed-By: fair
State-Changed-When: Tue Jun 27 13:39:58 PDT 2000
State-Changed-Why:
reopened by request - we'll wait for feedback for another quarter (3 mo).
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
To: fair@netbsd.org
Cc: jonathan@DSG.Stanford
Subject: Re: kern/6475
Date: Tue, 27 Jun 2000 13:41:09 -0700
In message <20000627193409.13560.qmail@mail.netbsd.org> fair@netbsd.org writes:
>Synopsis: NetBSD's MI le driver does not report giant Ethernet packets
>
>State-Changed-From-To: feedback->closed
>State-Changed-By: fair
>State-Changed-When: Tue Jun 27 12:32:47 PDT 2000
>State-Changed-Why:
>closed for lack of feedback; "silent indicates assent."
No, it doesn't indicate assent; it indicates overload :).
Now I have to figure out how to re-open the PR.
Unfortunately I'm no longer living at the residence where the old
Cisco AGS/AGS+ chassis and noisy DEMPRs caused lots and lots and lots
of giant frames...
State-Changed-From-To: feedback->suspended
State-Changed-By: fair
State-Changed-When: Sat Jan 5 00:30:27 PST 2002
State-Changed-Why:
We don't need to be poking Jonathan for feedback on this one every month.
The API for Ethernet drivers needs to be adjusted to allow MAC addresses
to be grabbed and reported for various error conditions detected by the MI
code - ideally for *all* Ethernet drivers, not just le(4). However, until
some brave soul volunteers to rototill the amount of code involved, this
problem sits only partially resolved.
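An MI interface of the kind described could be as small as one formatting hook that drivers call with whatever prefix of the frame they still hold. Everything here (enum ether_rxerr, ether_rxerr_format()) is a hypothetical sketch; it includes the source address only when at least the 12 address bytes are available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical MI receive-error reasons a driver could report. */
enum ether_rxerr {
	ETHER_RXERR_GIANT,
	ETHER_RXERR_CRC,
	ETHER_RXERR_FRAMING
};

static const char *const ether_rxerr_names[] = {
	[ETHER_RXERR_GIANT]   = "giant packet",
	[ETHER_RXERR_CRC]     = "CRC error",
	[ETHER_RXERR_FRAMING] = "framing error",
};

/*
 * Hypothetical MI hook: format a diagnostic for interface ifname.  The
 * source address is shown only if hdr holds at least the 12 address
 * bytes.  Returns snprintf's result (the length of the full message).
 */
static int
ether_rxerr_format(char *out, size_t outlen, const char *ifname,
    enum ether_rxerr why, const uint8_t *hdr, size_t hdrlen)
{
	assert((size_t)why <
	    sizeof ether_rxerr_names / sizeof ether_rxerr_names[0]);
	if (hdr == NULL || hdrlen < 12)
		return snprintf(out, outlen, "%s: %s (source unknown)",
		    ifname, ether_rxerr_names[why]);
	return snprintf(out, outlen,
	    "%s: %s from %02x:%02x:%02x:%02x:%02x:%02x",
	    ifname, ether_rxerr_names[why],
	    hdr[6], hdr[7], hdr[8], hdr[9], hdr[10], hdr[11]);
}
```

Keeping the formatting in one MI place means every driver gets the same wording, and rate control could be added there once rather than per driver.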
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.