NetBSD Problem Report #18414

Received: (qmail 24150 invoked by uid 605); 25 Sep 2002 08:26:57 -0000
Message-Id: <20020925082656.8592511122@narn.netbsd.org>
Date: Wed, 25 Sep 2002 01:26:56 -0700 (PDT)
From: rauch@math.rice.edu
Sender: gnats-bugs-owner@netbsd.org
Reply-To: rkr@olib.org
To: gnats-bugs@gnats.netbsd.org
Subject: tlp driver can "collect" ~10 packets in 1.6 before sending them
X-Send-Pr-Version: www-1.0

>Number:         18414
>Category:       kern
>Synopsis:       tlp driver can "collect" ~10 packets in 1.6 before sending them
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Sep 25 08:27:00 +0000 2002
>Closed-Date:    
>Last-Modified:  Mon Mar 10 08:15:03 +0000 2008
>Originator:     Richard Rauch
>Release:        1.6 kernel (1.5 userland)
>Organization:
n/a
>Environment:
NetBSD hermes 1.6 NetBSD 1.6 (hermes) #0: Mon Sep 23 16:14:11 CDT 2002     root@prometheus:/usr/src/sys/arch/i386/compile/hermes i386

(Also happens with GENERIC; the main difference of the custom kernel
is that I increased the SYSV shared memory pages to run Ogle.)
>Description:
First, I have two machines with tlp based ethernet cards.
One (fully 1.6 installed) is working normally, and using an older
ethernet card (still a Tulip clone, but an older one).  The other
(1.6 kernel, 1.5.2 userland) is not in a state where I'm ready to
upgrade the whole system yet.  It's using a newer SoHo-ware card,
and is the one having the problems.  (I assume that the 1.5.x userland
wouldn't cause the problem.  But I need/want the 1.6 kernel for
other reasons.)

After booting, network interaction is very poor with my tlp-using
PCI 10/100 card.  The initial appearance is that the network card is
simply not working.  But if you do a "ping", say, and let it sit,
then after a few seconds, you'll see a burst of "ping" packets
sent/received all at once (round-trip times separated by ~1 second,
because the packets were (so far as ping is concerned) sent 1 second
apart).  With nothing else doing network activity, it takes 9 or 10
"ping" packets to cause the interface to actually send the packets.
(So every ~10 seconds, ping will show one with ~9000 ms, one with ~8000
ms, ... one with ~1000ms, and one with more realistic roundtrip times
in the 0-to-1 range.  I don't know how the 10th packet's roundtrip
time compares to normal behavior, but it's at least within an order of
magnitude or so of the normal time.)

Sorry, I don't have a sample to show of the ping behavior.  Using
workaround (a) (see below), I can eventually kick it into a working
state, and I'm reluctant to reboot it just now and fiddle with it
long enough to make it work again.  (^&  (Maybe this weekend I can
put aside time to have this machine down like that.  Right now, I
am trying to use it.  Email me if more info is required and I'll try
to collect it when I can afford the downtime.)

Doing an "ifconfig" on the interface will cause the ~10 packet queue to
flush even if underfull.

Without *something* to fill/flush the queue, the packets appear to
remain enqueued forever.

dmesg for the card reads:

 /~~~

tlp0 at pci0 dev 13 function 0: Macronix MX98715AEC-x Ethernet, pass 2.5
tlp0: interrupting at irq 10
tlp0: Ethernet address 00:80:c6:f9:bc:35
tlp0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

 \___

/etc/ifconfig.tlp0 reads:

 /~~~

media 100baseTX
inet hermes netmask 255.255.255.0

 \___

>How-To-Repeat:
Use this ethernet card in this machine with 1.6 kernel (I assume
userland doesn't matter).  Boot.  Try to use the network.

>Fix:
I don't know a proper fix at this time.

I can offer some workarounds that may help others who encounter this
problem, though:

(a) ifconfig down/up cycling the card seems to eventually kick it
out of this problem, and then it works normally.  (Continuing to
ifconfig cycle it may restore the problem; I haven't gone there.)
I do NOT see a pattern to this; I may have to ifconfig cycle the
interface many times before it starts working.  But once it works, I
can leave it alone and the network just works.

(b) ping -f, or similar, can send lots of small packets and hence
reduce the latency on the packets considerably.  (Highly interactive
stuff, especially with small packets, may still suffer profoundly,
but at least the queue won't stall forever.)

(c) Similar to (b), one could do "ifconfig tlp0 >/dev/null" in a tight
loop, since the ifconfig probe seems to flush the queue.

(d) Get a different card.  (^&

>Release-Note:
>Audit-Trail:

From: John Kohl <jtk@kolvir.arlington.ma.us>
To: rauch@math.rice.edu
Cc: netbsd-bugs@netbsd.org
Subject: Re: kern/18414: tlp driver can "collect" ~10 packets in 1.6 before sending them
Date: Sun, 29 Sep 2002 23:05:30 -0400 (EDT)

 I had this problem with a SOHOware card, and figured out the problem was
 that I was explicitly choosing media rather than letting the card do its
 own media negotiation with the switch/hub.  (I had to force it in 1.5.x
 because auto-negotiation wasn't working right.)

 If your ifconfig scripts select ifmedia, try not doing that.

 tlp0 at pci0 dev 13 function 0: Macronix MX98715AEC-x Ethernet, pass 2.5
 tlp0: broken MicroWire interface detected; setting SROM size to 1Kb
 tlp0: interrupting at irq 9
 tlp0: Ethernet address 00:80:c6:f2:de:02
 tlp0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

 -- 
 ==John Kohl <jtk@kolvir.arlington.ma.us>, <john_kohl@alum.mit.edu>
 Home page: <http://john.kohl.home.attbi.com/>

From: Richard Rauch <rauch@rice.edu>
To: John Kohl <jtk@kolvir.arlington.ma.us>
Cc: <netbsd-bugs@netbsd.org>
Subject: Re: kern/18414: tlp driver can "collect" ~10 packets in 1.6 before
 sending them
Date: Mon, 30 Sep 2002 01:34:11 -0500 (CDT)

 > I had this problem with a SOHOware card, and figured out the problem was
 > that I was explicitly choosing media rather than letting the card do its

 Indeed, this is exactly the case that I had.


 > own media negotiation with the switch/hub.  (I had to force it in 1.5.x
 > because auto-negotiation wasn't working right.)
 >
 > If your ifconfig scripts select ifmedia, try not doing that.

 This works around the bug, and (from two boots) seems to be reliable.

 I am puzzled that repeated "ifconfig tlp0 down && ifconfig tlp0 up"
 eventually brings the driver in line, though.  (Not the first time, but
 eventually...)



 Thanks!


   ``I probably don't know what I'm talking about.'' --rauch@math.rice.edu

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/18414: tlp driver can "collect" ~10 packets in 1.6 before sending them
Date: Mon, 10 Mar 2008 08:14:35 +0000

 Oh, no.

 I had this problem years ago, tracked it down, wrote up what was going
 on... and then forgot about it, so it never got fixed. Later on when I
 found the writeup again, I searched the PR database without finding
 anything, concluded nobody else had the same kind of weirdly mangled
 tulip I did (which isn't entirely implausible) and let it slide.
 Several times, actually. I just found it this time completely by
 accident.

 Ironically, the writeup is dated the same day as this PR. Here goes:

    I built a current kernel again for my home box last night, and the
    tulip driver still doesn't go.

    It produces the curious symptom that, if ping is running, once
    every ten seconds a burst of packets gets through.

    The card is an 82C115 pass 2.5, and, as far as I can tell, probes
    and attaches correctly.

    What seems to be happening is that the link autonegotiation never
    completes. In tlp_2114x_nway_status (around line 5250 of tulip.c,
    version 1.119), it checks to see if autonegotiation is enabled, and
    if so, does this:

 		if ((siastat & SIASTAT_ANS) != SIASTAT_ANS_FLPGOOD) {
 			/* Erg, still trying, I guess... */
 			mii->mii_media_active |= IFM_NONE;
 			return;
 		}

    My card seems to always remain "still trying" forever. This means that
    it never gets to the code that raises IFM_ACTIVE in mii_media_status,
    which correspondingly means that TULIPF_LINK_UP never gets raised
    either, which causes tlp_start to abort near the top.

    (It sends occasionally anyway, because the test that causes tlp_start
    to abort only operates if the send queue is less than ten packets
    long. The driver then transmits even though it thinks it doesn't have
    link, which may or may not be a problem in its own right.)

    Turning off the autonegotiation bit (SIATXRX_ANE) during card
    initialization doesn't appear to work, or maybe I did it wrong, but
    bypassing the entire autonegotiation branch of code in
    tlp_2114x_nway_status seems to make everything run more or less ok.

    This is presumably not the correct solution, but hopefully someone who
    actually knows the hardware can take over from this point. (I'd be
    happy to run further experiments.)


 I do still have (and use) the card and can post the hack I've been
 carrying for the last five years if anyone wants it.

 -- 
 David A. Holland
 dholland@netbsd.org

>Unformatted:
 I (John Kohl) found this problem too, and "fixed" it by letting the card auto-
 select its media.  When I manually chose media, it screwed up.  (Is that a
 hub/switch problem or a driver/card problem?)

 As for the packet queuing, I correlated it to a delay in transmits until
 a receive interrupt/packet processing.
