NetBSD Problem Report #6475

Received: (qmail 11608 invoked from network); 21 Nov 1998 00:49:31 -0000
Message-Id: <199811210048.QAA00445@Cuisinart.Stanford.EDU>
Date: Fri, 20 Nov 1998 16:48:05 -0800 (PST)
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
Reply-To: jonathan@netbsd.org
To: gnats-bugs@gnats.netbsd.org
Subject: NetBSD's MI le driver does not report giant Ethernet packets
X-Send-Pr-Version: 3.95

>Number:         6475
>Category:       kern
>Synopsis:       NetBSD's MI le driver does not report giant Ethernet packets
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          suspended
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Nov 20 16:50:01 +0000 1998
>Closed-Date:    
>Last-Modified:  Sat Jan 05 08:34:28 +0000 2002
>Originator:     Jonathan Stone
>Release:        19981114
>Organization:

>Environment:

System: NetBSD Cuisinart.Stanford.EDU 1.3H NetBSD 1.3H (EGCS-HAIFA) #15: Sun Nov 15 17:05:22 PST 1998 jonathan@Cuisinart.Stanford.EDU:/cuisinart/compile/EGCS-HAIFA pmax


>Description:
	NetBSD's MI le driver does not report giant Ethernet packets
	as giant Ethernet packets.

>How-To-Repeat:

Connect two machines -- say, NetBSD/pmax and a SPARC running
SunOS 4.1.3 -- to a broadcast 10Mbit Ethernet segment.
Arrange for a wayward machine to send a spray of giant packets
on that segment (someone else's machine did this to me today).

Compare the logs from NetBSD-current and SunOS.

On the NetBSD/pmax machine, I get:
	le0: dropping chained buffer
	le0: dropping chained buffer
	le0: dropping chained buffer
[repeated ad infinitum -- so much so that I lost console access and rebooted]

On the SunOS 4.1.3_U1 sparc I get:
	le0: Receive: giant packet from 0:40:33:a0:4c:7c
	le0: Receive: STP in rmd cleared
	le0: Receive: giant packet from 55:55:55:55:55:5
	le0: Receive: STP in rmd cleared
	le0: Receive: giant packet from 55:55:55:55:55:5
	le0: Receive: STP in rmd cleared
	le0: Receive: giant packet from 55:55:55:55:55:5

Assuming this problem really is due to giant-packet errors, the SunOS
message seems much more informative.

>Fix:

I don't have a LANCE databook. But from looking at the SunOS messages,
it looks like some chip revisions do not set STP and OFLO in quite the
way that am7990.c's input-error handler expects.
>Release-Note:
>Audit-Trail:

From: Robert Elz <kre@munnari.OZ.AU>
To: jonathan@DSG.Stanford.EDU
Cc: gnats-bugs@gnats.netbsd.org
Subject: Re: kern/6475: 
Date: Sat, 21 Nov 1998 23:23:04 +1100

     Date:        Fri, 20 Nov 1998 16:48:05 -0800 (PST)
     From:        Jonathan Stone <jonathan@DSG.Stanford.EDU>
     Message-ID:  <199811210048.QAA00445@Cuisinart.Stanford.EDU>

   | I don't have a LANCE databook. But from looking at the SunOS messages,
   | it looks like some chip revisions do not set STP and OFLO in quite the
   | way that am7990.c's input-error handler expects.

 I doubt it is quite that exotic ... I had never looked at the NetBSD
 lance driver before, but I have programmed lances quite a bit in the
 past (and I do have a databook, if that was needed, which it doesn't
 seem to be here).

 OFLO is irrelevant here; if that's happening you have real problems with
 your hardware bus design (it indicates the lance wasn't able to get enough
 DMA cycles).   It looks as if the chances of ever seeing an overflow reported
 are pretty close to 0 though, so if it is happening, you'll never know (maybe
 this is by design).   I'm not sure I understand the rationale behind the
 way OFLO is tested in the code.

 The "dropping chained buffer" message is printed whenever a buffer without
 both STP and ENP set is encountered - the lance sets STP in a buffer that
 contains the start of a packet, and ENP in a buffer that contains the end
 of a packet - the test implies that NetBSD wants entire packets to fit in
 single buffers.

 If you get a giant packet (bigger than a buffer - I assume that the buffers
 used here are > 1516 bytes, though I haven't looked) then the first of those
 buffers will contain STP and not ENP, the last will contain just ENP, without
 STP, and any others needed (for truly HUGE packets) will contain neither.
 The way that the NetBSD test is written, each of those buffers (at least 2
 per big packet) will cause a message to be written (and if the burst fills
 all of the waiting buffers before any are freed, you'd also get "receive
 buffer error").

 That's probably overkill - it will almost certainly be good enough to
 log the message on the buffer that has STP set (unless your lance is terminally
 broken, the first buffer in one of these sequences will have that set).
 If that is done, then you also know the packet header is in the buffer, and
 it (might) make sense to log the source address of the packet - the problem
 with doing that is that these giant packets can be caused by what amounts
 to line noise, the "address" is gibberish - to detect the difference you
 need to check for CRC errors, and (of course) CRC errors can only be detected
 in the last buffer of the giant packet, which isn't the one with the STP
 set (or the addresses in it).   Similarly, this whole sequence of buffers
 should be counted as just 1 error, not one per buffer.
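
The counting rule described above (one error per giant packet, not one per buffer) can be sketched in user-space C. This is only an illustration: the flag names, the struct, and the function are hypothetical, and the bit values should be checked against the driver's own register header. A buffer that arrives with no STP before it is tallied separately, to cover the "terminally broken" case.

```c
#include <stddef.h>

/* Illustrative flag bits for this sketch only (assumed, not taken from the driver). */
#define RX_STP 0x02u   /* buffer holds the start of a packet */
#define RX_ENP 0x01u   /* buffer holds the end of a packet */

struct rx_stats {
    unsigned good;     /* packets that fit in a single buffer */
    unsigned giants;   /* oversized packets, counted once each */
    unsigned orphans;  /* buffer runs that never showed an STP */
};

/*
 * Scan n receive-buffer flag words in ring order.
 * STP and ENP together: a complete packet.
 * STP alone: start of a giant packet, counted once.
 * Anything else: a continuation or tail buffer, which is skipped
 * silently if the start of its packet has already been counted.
 */
static struct rx_stats
scan_rx_flags(const unsigned *flags, size_t n)
{
    struct rx_stats st = { 0, 0, 0 };
    int in_giant = 0;

    for (size_t i = 0; i < n; i++) {
        int stp = (flags[i] & RX_STP) != 0;
        int enp = (flags[i] & RX_ENP) != 0;

        if (stp && enp) {
            st.good++;
            in_giant = 0;
        } else if (stp) {
            st.giants++;
            in_giant = 1;
        } else {
            if (!in_giant) {
                st.orphans++;   /* count the whole run once */
                in_giant = 1;
            }
            if (enp)
                in_giant = 0;   /* tail buffer ends the run */
        }
    }
    return st;
}
```

With flag words for a normal packet, a three-buffer giant, and another normal packet, this sketch reports two good packets and a single giant.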

 Whether the text of the message is "dropping chained buffer" or "giant packet
 ignored" or anything similar doesn't really matter, they all mean the same.

 kre

 ps: I have a very busy ethernet (too busy), and I frequently get the
 "excessive collisions, tdr %d" message (the tdr reported is random, the
 net isn't broken, just very busy).   I don't mind the occasional one of
 those, but getting them in bursts when things are really congested gets a
 bit annoying.   Most probably these kinds of messages ought to be rate
 controlled, or perhaps this is one that could be moved inside LEDEBUG,
 just as are reports of missed packets, checksum errors, etc.
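
One way to rate control a diagnostic like this is a per-message limiter that lets one message through per interval and remembers how many it dropped. The following is a minimal user-space sketch with hypothetical names; the caller supplies the current time in seconds, and nothing here is taken from the existing driver.

```c
#include <stdbool.h>

/* Hypothetical limiter state: at most one message per interval. */
struct log_limiter {
    long interval;        /* minimum seconds between messages */
    long last;            /* time the last message was allowed */
    bool primed;          /* false until the first message is allowed */
    unsigned suppressed;  /* messages dropped since the last allowed one */
};

/*
 * Return true if a message may be logged at time `now`.
 * When it returns true, *dropped is set to the number of messages
 * suppressed since the previous allowed message, so the caller can
 * append something like "(N similar messages suppressed)".
 */
static bool
log_allowed(struct log_limiter *lim, long now, unsigned *dropped)
{
    if (!lim->primed || now - lim->last >= lim->interval) {
        *dropped = lim->suppressed;
        lim->suppressed = 0;
        lim->last = now;
        lim->primed = true;
        return true;
    }
    lim->suppressed++;
    return false;
}
```

A burst of collisions then produces one line per interval instead of one line per event, while the suppressed count keeps the total visible.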


From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
To: Robert Elz <kre@munnari.OZ.AU>
Cc: gnats-bugs@gnats.netbsd.org
Subject: kern/6475: 
Date: Sat, 21 Nov 1998 13:07:19 -0800

 hi Robert,

 The real problem is the swath of messages, but *also* not getting the
 MAC address of the bad-guy sending the giant packets.

 From (very rusty) memory, I think the required logic is something like:


 	/* OFLO indicates a hardware-silo overflow, which may
 	   drop the ENP marker, so ignore a missing ENP in that case */
 	if (OFLO not set) {
 		if (STP not set) {
 			warn about missing STP (chained buffer overrun)
 		} else if (ENP not set) {
 			/* frame with STP but not ENP: they must be giants */
 			printf("giant packet from: %02x:%02x:%02x:%02x:%02x:%02x\n",
 			    <mac-address in frame>);
 		}
 	}

 assuming the driver is set up to never need chained buffers.
 I believe ours is.
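
The decision logic in the pseudocode above can be written as a small classifier over the receive-status bits. This is a sketch only: the enum, the function names, and the message strings are hypothetical, and the bit values are illustrative and should be checked against the driver's register header.

```c
#include <stddef.h>

/* Illustrative receive-status bits (assumed values for this sketch). */
#define RX_OFLO 0x10u  /* silo overflow */
#define RX_STP  0x02u  /* start of packet */
#define RX_ENP  0x01u  /* end of packet */

enum rx_verdict {
    RX_OK,           /* STP and ENP both set: a complete packet */
    RX_OFLO_SKIP,    /* overflow: ENP may be missing, so do not judge */
    RX_MISSING_STP,  /* continuation buffer (chained buffer overrun) */
    RX_GIANT         /* STP without ENP: packet larger than one buffer */
};

/* Classify one receive buffer from its status flags. */
static enum rx_verdict
classify_rx(unsigned flags)
{
    if (flags & RX_OFLO)
        return RX_OFLO_SKIP;
    if (!(flags & RX_STP))
        return RX_MISSING_STP;
    if (!(flags & RX_ENP))
        return RX_GIANT;
    return RX_OK;
}

/* Text for each verdict; NULL means nothing should be logged. */
static const char *
rx_verdict_text(enum rx_verdict v)
{
    switch (v) {
    case RX_MISSING_STP:
        return "dropping chained buffer";
    case RX_GIANT:
        return "giant packet";
    case RX_OK:
    case RX_OFLO_SKIP:
        break;
    }
    return NULL;
}
```

Only the RX_GIANT case is guaranteed to have the packet header in the buffer, so that is the one place a source address could be appended to the message.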
Responsible-Changed-From-To: kern-bug-people->fair 
Responsible-Changed-By: fair 
Responsible-Changed-When: Thu Mar 18 14:41:53 PST 1999 
Responsible-Changed-Why:  
Gonna be poking into the Am7990 (le) driver stuff anyway... 
State-Changed-From-To: open->analyzed 
State-Changed-By: fair 
State-Changed-When: Mon Dec 20 18:47:08 PST 1999 
State-Changed-Why:  
Examination of the drivers leads me to believe that reporting the
ethernet MAC address of the station that transmitted the giant
packet would be messy.

Things are split into system-specific glue code, the am7990.c
code for the chip, and then an additional set of lance.c routines for
stuff that's common to all AMD Ethernet controllers, most specifically,
reading the packet out of the receive buffer and into an mbuf for
processing. When a receive interrupt goes off, the chip-specific
code reads the status stuff, and decides whether to process the
packet or not, and if so, calls "lance_read" to grab it and hand it
off to protocol-specific layers.

So, reporting the MAC address for the "chained buffer overflow"
error means writing a routine to go dive into the packet and grab
that specific datum out (it can probably go in lance.c since the
packets are being read in there). This is not really my forte, so
I'll settle for adjusting the documentation to explain that "dropping
chained buffer" really means "packet too big".
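
For whoever picks this up, the routine in question is small once the first bytes of the frame are in ordinary memory. In an Ethernet header the destination address occupies bytes 0-5 and the source address bytes 6-11. The sketch below uses hypothetical names and assumes the header has already been copied out of the receive buffer into a contiguous byte array; the real driver may need to go through its own buffer-copy routine first.

```c
#include <stddef.h>
#include <stdio.h>

#define FRAME_SRC_OFFSET 6  /* source address follows the 6-byte destination */
#define MAC_LEN          6

/*
 * Format the source MAC address of an Ethernet frame into `out`.
 * Returns 0 on success, or -1 if the frame is too short to hold a
 * source address or `out` is too small for the formatted string.
 */
static int
format_src_mac(const unsigned char *frame, size_t len, char *out, size_t outlen)
{
    if (len < FRAME_SRC_OFFSET + MAC_LEN)
        return -1;

    const unsigned char *s = frame + FRAME_SRC_OFFSET;
    int n = snprintf(out, outlen, "%02x:%02x:%02x:%02x:%02x:%02x",
        s[0], s[1], s[2], s[3], s[4], s[5]);

    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```

As noted earlier in this PR, an address taken from a giant packet may be gibberish if the packet is really line noise, so any diagnostic built on this should be treated as a hint rather than proof.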

Responsible-Changed-From-To: fair->kern-bug-people 
Responsible-Changed-By: fair 
Responsible-Changed-When: Mon Dec 20 23:50:47 PST 1999 
Responsible-Changed-Why:  
I've more carefully documented the state of this problem in a revised le(4) 
manual page, but I'm not up to hacking on the driver to fetch the MAC addr 
out for the diagnostic - either it should be taken up by someone else, or 
the submitter can decide that what I've done is enough. 
State-Changed-From-To: analyzed->feedback 
State-Changed-By: fair 
State-Changed-When: Tue Feb 8 13:13:13 PST 2000 
State-Changed-Why:  
Is documentation of the diagnostics enough? 
State-Changed-From-To: feedback->closed 
State-Changed-By: fair 
State-Changed-When: Tue Jun 27 12:32:47 PDT 2000 
State-Changed-Why:  
closed for lack of feedback; "silent indicates assent." 
State-Changed-From-To: closed->feedback 
State-Changed-By: fair 
State-Changed-When: Tue Jun 27 13:39:58 PDT 2000 
State-Changed-Why:  
reopened by request - we'll wait for feedback for another quarter (3 mo). 

From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
To: fair@netbsd.org
Cc: jonathan@DSG.Stanford
Subject: Re: kern/6475 
Date: Tue, 27 Jun 2000 13:41:09 -0700

 In message <20000627193409.13560.qmail@mail.netbsd.org> fair@netbsd.org writes:
 >Synopsis: NetBSD's MI le driver does not report giant Ethernet packets
 >
 >State-Changed-From-To: feedback->closed
 >State-Changed-By: fair
 >State-Changed-When: Tue Jun 27 12:32:47 PDT 2000
 >State-Changed-Why: 
 >closed for lack of feedback; "silent indicates assent."


 No, it doesn't indicate assent; it indicates overload :).

 Now I have to figure out how to re-open the PR.
 Unfortunately I'm no longer living at the residence where the old
 Cisco AGS/AGS+ chassis and noisy DEMPRs caused lots and lots and lots
 of giant frames...
State-Changed-From-To: feedback->suspended 
State-Changed-By: fair 
State-Changed-When: Sat Jan 5 00:30:27 PST 2002 
State-Changed-Why:  
We don't need to be poking Jonathan for feedback on this one every month. 
The API for Ethernet drivers needs to be adjusted to allow MAC addresses 
to be grabbed and reported for various error conditions detected by the MI 
code - ideally for *all* Ethernet drivers, not just le(4). However, until 
some brave soul volunteers to rototill the amount of code involved, this 
problem sits only partially resolved. 
>Unformatted: