NetBSD Problem Report #18051

Received: (qmail 9900 invoked by uid 605); 23 Aug 2002 17:45:57 -0000
Message-Id: <200208231745.g7NHjsJ00909@raeburn.org>
Date: Fri, 23 Aug 2002 13:45:54 -0400 (EDT)
From: raeburn@raeburn.org
Sender: gnats-bugs-owner@netbsd.org
Reply-To: raeburn@raeburn.org
To: gnats-bugs@gnats.netbsd.org
Cc: raeburn@raeburn.org
Subject: tlp doesn't configure media, faults on apm reset
X-Send-Pr-Version: 3.95

>Number:         18051
>Category:       kern
>Synopsis:       tlp doesn't configure media, faults on apm reset
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    martin
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Aug 23 17:46:00 +0000 2002
>Closed-Date:    
>Last-Modified:  Wed May 07 22:18:46 +0000 2003
>Originator:     Ken Raeburn <raeburn@raeburn.org>
>Release:        1.6 branch as of a few days ago
>Organization:
	not much
>Environment:

NetBSD 1.6 branch with kernel based on GENERIC 1.491.4.2
Architecture: i386
Machine: i386
>Description:

Following the lead of the GENERIC config, I updated my machine's
config to use the tulip driver rather than the de driver for my 4-port
card.

#de*	at pci? dev ? function ?	# DEC 21x4x-based Ethernet
tlp*	at pci? dev ? function ?	# DECchip 21x4x (and clones) Ethernet

I brought up my new kernel in single-user mode (so the fact that the
remaining user-land code is much older shouldn't matter much), and
walked away.

During boot, all four ports were identified as "DECchip 21143 Ethernet
pass 4.1".  All four also reported IRQ and ethernet address, and then
these messages:

tlp0: OUI 0x1000e8 model 0x0001 rev 0 at tlp0 phy 1 not configured
tlp0: unable to configure MII
tlp0: no media found!

But ports 0 and 2 are both connected.  The old de driver would report
100baseTX for one, and 10baseT for the other.

After a little while, the APM code kicked in, and when I came back, I
found the machine at the ddb prompt.  A stack trace showed:

	tlp_21142_reset+0x18
	tlp_reset
	tlp_stop
	tlp_power
	dopowerhooks
	...

In tlp_21142_reset is this code:

void
tlp_21142_reset(sc)
	struct tulip_softc *sc;
{
	struct ifmedia_entry *ife = sc->sc_mii.mii_media.ifm_cur;
	struct tulip_21x4x_media *tm = ife->ifm_aux;
	const u_int8_t *cp;
	int i;

	cp = &sc->sc_srom[tm->tm_reset_offset];
	for (i = 0; i < tm->tm_reset_length; i++, cp += 2) {

The instruction at +0x18 is the read of tm->tm_reset_offset, but ddb
indicates it's reading through a null pointer.  A printf statement
inserted confirms that tm is null at this point.


>How-To-Repeat:

I'm not sure why the "no media found" report comes up, or if it's
critical to reproducing the crash; perhaps bringing up a machine with
a tulip card unplugged from the net would be enough.  Then wait for
APM to shut it down.

>Fix:


Check for media info being a null pointer.
Drop network device if it has no media?
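
The first suggestion can be sketched in userland as follows.  This is
only a model, not the driver: the structures are cut-down stand-ins that
keep just the fields quoted above, sc_cur is a hypothetical shortcut for
sc->sc_mii.mii_media.ifm_cur, and model_21142_reset and its return
convention are made up for illustration.  It shows the reset path
returning early when no media entry (or no per-media data) was ever set
up, instead of reading through the null pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-ins for the kernel structures (assumption: only the
 * fields quoted in the report are modelled). */
struct tulip_21x4x_media {
	uint8_t tm_reset_offset;
	uint8_t tm_reset_length;
};

struct ifmedia_entry {
	void *ifm_aux;
};

struct tulip_softc {
	struct ifmedia_entry *sc_cur;	/* hypothetical: models sc_mii.mii_media.ifm_cur */
	const uint8_t *sc_srom;
};

/*
 * Model of the reset path with the null check added.
 * Returns the number of SROM reset words visited, or -1 when there is
 * no media information to work from.
 */
int
model_21142_reset(struct tulip_softc *sc)
{
	struct ifmedia_entry *ife;
	struct tulip_21x4x_media *tm;
	const uint8_t *cp;
	int i, words = 0;

	assert(sc != NULL);

	ife = sc->sc_cur;
	if (ife == NULL || ife->ifm_aux == NULL)
		return -1;	/* no media was configured: nothing to reset */

	tm = ife->ifm_aux;
	cp = &sc->sc_srom[tm->tm_reset_offset];
	for (i = 0; i < tm->tm_reset_length; i++, cp += 2) {
		/* The real driver acts on each 16-bit SROM word here;
		 * the model only reads and counts them. */
		(void)(cp[0] | (cp[1] << 8));
		words++;
	}
	return words;
}
```

With the check in place, a port that never found media makes the power
hook a no-op rather than a fault.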
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback 
State-Changed-By: bouyer 
State-Changed-When: Sat Aug 24 07:48:41 PDT 2002 
State-Changed-Why:  
Please try with the PHY drivers configured in your kernel. 

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: raeburn@raeburn.org
Cc: gnats-bugs@gnats.netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/18051: tlp doesn't configure media, faults on apm reset
Date: Sat, 24 Aug 2002 16:48:23 +0200

 You haven't configured any PHYs in your kernel.
 I think this PHY is either an nsphy or an nsphyter, so add at least:
 nsphy*  at mii? phy ?                   # NS83840 PHYs
 nsphyter* at mii? phy ?                 # NS83843 PHYs 

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
 --

From: Ken Raeburn <raeburn@raeburn.org>
To: stephen@degler.net, chuq@chuq.com, bouyer@antioche.eu.org
Cc: gnats-bugs@gnats.netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/18051: tlp doesn't configure media, faults on apm reset
Date: Sun, 25 Aug 2002 23:54:17 -0400

 Thanks for all the responses.

 Manuel Bouyer's suggestion of adding the "nsphyter*" device to my
 config file seems to have fixed the problem.  Which leads to the
 question of whether this should be required (in which case config
 should have complained) or not (in which case the driver should cope).

 With that entry added, the kernel seems pretty happy with my DEC
 4-port card.  (Sorry, don't remember the exact model, but I'm pretty
 sure it was sold as a DEC card.)

     # dmesg | egrep 'ppb0|pci1|tlp|phy' 
     ppb0 at pci0 dev 19 function 0: Digital Equipment DECchip 21152 PCI-PCI Bridge (rev. 0x03)
     pci1 at ppb0 bus 1
     pci1: i/o space, memory space enabled
     tlp0 at pci1 dev 4 function 0: DECchip 21143 Ethernet, pass 4.1
     tlp0: interrupting at irq 11
     tlp0: Ethernet address 00:80:c8:f8:72:54
     nsphyter0 at tlp0 phy 1: DP83843 10/100 media interface, rev. 0
     nsphyter0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
     tlp1 at pci1 dev 5 function 0: DECchip 21143 Ethernet, pass 4.1
     tlp1: interrupting at irq 11
     tlp1: Ethernet address 00:80:c8:f8:72:55
     nsphyter1 at tlp1 phy 1: DP83843 10/100 media interface, rev. 0
     nsphyter1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
     tlp2 at pci1 dev 6 function 0: DECchip 21143 Ethernet, pass 4.1
     tlp2: interrupting at irq 11
     tlp2: Ethernet address 00:80:c8:f8:72:56
     nsphyter2 at tlp2 phy 1: DP83843 10/100 media interface, rev. 0
     nsphyter2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
     tlp3 at pci1 dev 7 function 0: DECchip 21143 Ethernet, pass 4.1
     tlp3: interrupting at irq 11
     tlp3: Ethernet address 00:80:c8:f8:72:57
     nsphyter3 at tlp3 phy 1: DP83843 10/100 media interface, rev. 0
     nsphyter3: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

 All the devices appear to configure properly; pcictl shows these
 devices:

     000:19:0: Digital Equipment DECchip 21152 PCI-PCI Bridge (PCI bridge, revision 0x03)

     001:04:0: Digital Equipment DECchip 21142/21143 10/100 Ethernet (ethernet network, revision 0x41)
     001:05:0: Digital Equipment DECchip 21142/21143 10/100 Ethernet (ethernet network, revision 0x41)
     001:06:0: Digital Equipment DECchip 21142/21143 10/100 Ethernet (ethernet network, revision 0x41)
     001:07:0: Digital Equipment DECchip 21142/21143 10/100 Ethernet (ethernet network, revision 0x41)

 And ifconfig shows the correct media for everything:

     tlp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
             address: 00:80:c8:f8:72:54
             media: Ethernet autoselect (100baseTX)
             status: active
             inet [...]
     tlp1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
             address: 00:80:c8:f8:72:55
             media: Ethernet autoselect (none)
             status: no carrier
     tlp2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
             address: 00:80:c8:f8:72:56
             media: Ethernet autoselect (10baseT)
             status: active
             inet [...]
     tlp3: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
             address: 00:80:c8:f8:72:57
             media: Ethernet autoselect (none)
             status: no carrier
     [...]

 As far as I can tell, everything looks good now, aside from the
 occasional "transmit underrun" messages; but if not having any PHY
 entries with a tulip device is an error, config should let me know
 that.  And if it isn't, the kernel shouldn't crash if I do it.

 Ken

From: Chuck Silvers <chuq@chuq.com>
To: Ken Raeburn <raeburn@raeburn.org>
Cc: stephen@degler.net, bouyer@antioche.eu.org, gnats-bugs@gnats.netbsd.org,
  netbsd-bugs@netbsd.org
Subject: Re: kern/18051: tlp doesn't configure media, faults on apm reset
Date: Sun, 25 Aug 2002 22:22:31 -0700

 On Sun, Aug 25, 2002 at 11:54:17PM -0400, Ken Raeburn wrote:
 > As far as I can tell, everything looks good now, aside from the
 > occasional "transmit underrun" messages; but if not having any PHY
 > entries with a tulip device is an error, config should let me know
 > that.

 some tlp cards don't use the chip's MII interface, so not having
 any PHYs configured can be ok.


 > And if it isn't, the kernel shouldn't crash if I do it.

 that's a good point.  just disabling the interface would suffice.

 -Chuck

From: Ken Raeburn <raeburn@raeburn.org>
To: gnats-bugs@gnats.netbsd.org
Cc:  
Subject: Re: kern/18051: tlp doesn't configure media, faults on apm reset
Date: Tue, 05 Nov 2002 20:37:53 -0500

 Unless "configure the PHYs you need or your kernel will crash on you"
 is the intended resolution for this issue, I don't think this PR
 should still be in "feedback" state.  I think the right answer is to
 have the interfaces disabled when PHYs are needed and not configured.
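
 As a rough illustration of that behaviour, here is a small userland
 model.  Nothing in it is taken from the tlp driver: struct model_softc,
 its fields, and the two functions are all hypothetical names.  The
 phy_required flag is there because, as noted earlier in this thread,
 some tlp cards don't use the MII at all, so a missing PHY is only a
 problem on boards that need one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-interface state; not the real tulip_softc. */
struct model_softc {
	bool phy_required;	/* board routes its media through the MII */
	int  nphys;		/* number of PHY drivers that attached */
	bool enabled;		/* interface usable after attach */
	int  resets;		/* how many times the reset path ran */
};

/* At attach time, disable the interface if a needed PHY is missing. */
void
model_attach(struct model_softc *sc)
{
	assert(sc != NULL);
	sc->enabled = !(sc->phy_required && sc->nphys == 0);
}

/* Power hook: skip disabled interfaces instead of resetting them. */
void
model_power(struct model_softc *sc)
{
	assert(sc != NULL);
	if (!sc->enabled)
		return;
	sc->resets++;
}
```

 In this model a disabled interface is never handed to the reset path,
 so the APM power hook cannot reach the code that faulted.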

From: Ken Raeburn <raeburn@raeburn.org>
To: gnats-bugs@gnats.netbsd.org
Cc:  
Subject: Re: kern/18051: tlp doesn't configure media, faults on apm reset
Date: Wed, 07 May 2003 17:45:18 -0400

 I keep getting automatic reminders about this PR I filed waiting for
 my feedback, but I've given my feedback....

 Ken
State-Changed-From-To: feedback->open 
State-Changed-By: martin 
State-Changed-When: Wed May 7 22:17:43 UTC 2003 
State-Changed-Why:  
The submitter is right, there is still a problem left to solve. 


Responsible-Changed-From-To: kern-bug-people->martin 
Responsible-Changed-By: martin 
Responsible-Changed-When: Wed May 7 22:17:43 UTC 2003 
Responsible-Changed-Why:  
I can reproduce the problem and will look at it. 
>Unformatted:
