NetBSD Problem Report #55067

From www@netbsd.org  Thu Mar 12 21:04:01 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id A827F1A9213
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 12 Mar 2020 21:04:01 +0000 (UTC)
Message-Id: <20200312210400.82D9E1A924B@mollari.NetBSD.org>
Date: Thu, 12 Mar 2020 21:04:00 +0000 (UTC)
From: thorpej@me.com
Reply-To: thorpej@me.com
To: gnats-bugs@NetBSD.org
Subject: bge(4) does not work on MIPS systems (it panics)
X-Send-Pr-Version: www-1.0

>Number:         55067
>Category:       port-mips
>Synopsis:       bge(4) does not work on MIPS systems (it panics)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    thorpej
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Mar 12 21:05:00 +0000 2020
>Closed-Date:    Fri Mar 13 03:51:06 +0000 2020
>Last-Modified:  Fri Mar 13 07:40:01 +0000 2020
>Originator:     Jason Thorpe
>Release:        NetBSD 9.99.48
>Organization:
Riscy Business
>Environment:
NetBSD 9.99.48 cobalt mipsel
>Description:
The bge(4) driver does not function correctly on MIPS platforms.  The first symptom is the following panic:

Waiting for duplicate address detection to finish...
Starting dhcpcd.
[  23.6599208] panic: _bus_dmamap_sync: bad length
[  23.6713571] cpu0: Begin traceback...
[  23.6713571] pid -2086495680 not found
[  23.6801778] cpu0: End traceback...
[  23.6801778] kernel: breakpoint trap
Stopped in pid 119.1 (dhcpcd) at        netbsd:cpu_Debugger+0x4:        jr      ra
                bdslot: nop
db> bt
0x83a29a68: cpu_Debugger+4 (3,8000,c,80657c40) ra 803cd228 sz 0
0x83a29a68: vpanic+15c (3,8000,c,80657c40) ra 803cd2bc sz 48
0x83a29a98: panic+24 (3,83edfc90,10998,0) ra 8000dd9c sz 32
0x83a29ab8: _bus_dmamap_sync+4bc (3,83edfc90,10998,0) ra 800726ec sz 72
0x83a29b00: bge_intr+730 (3,83edfc90,10998,0) ra 8000bef8 sz 104
0x83a29b68: icu_intr+130 (3,83edfc90,10998,0) ra 8000c8ac sz 64
0x83a29ba8: cpu_intr+17c (3,83edfc90,10998,0) ra 800119cc sz 56
0x83a29be0: mips3_kern_intr+cc (0,fb00,0,80657c40) ra 80064708 sz 192
0x83a29ca0: bge_ioctl+f8 (0,fb00,0,80657c40) ra 804750f8 sz 48
0x83a29cd0: doifioctl+9a4 (0,80906910,83e3f6e8,80657c40) ra 803dd8d4 sz 304
0x83a29e00: sys_ioctl+2d4 (0,80906910,83e3f6e8,80657c40) ra 8001f1dc sz 200
0x83a29ec8: syscall+28c (0,80906910,83e3f6e8,80657c40) ra 80011e60 sz 128
0x83a29f48: mips3_systemcall+e0 (0,80906910,83e3f6e8,80657c40) ra 7dd7fe40 sz 0
PC 0x7dd7fe40: not in kernel space
0x83a29f48: 0+7dd7fe40 (0,80906910,83e3f6e8,80657c40) ra 0 sz 0
User-level: pid 119.1
db> 

There may be other problems.
>How-To-Repeat:
Attempt to use bge(4) on MIPS.
>Fix:
N/A

>Release-Note:

>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55067: bge(4) does not work on MIPS systems (it panics)
Date: Thu, 12 Mar 2020 22:36:51 -0000 (UTC)

 thorpej@me.com writes:

 >[  23.6599208] panic: _bus_dmamap_sync: bad length

 >0x83a29ab8: _bus_dmamap_sync+4bc (3,83edfc90,10998,0) ra 800726ec sz 72

 mips _bus_dmamap_sync erroneously traps len == 0. It should be gracefully
 allowing that value like other archs.

 -- 
 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

Responsible-Changed-From-To: kern-bug-people->thorpej
Responsible-Changed-By: thorpej@NetBSD.org
Responsible-Changed-When: Thu, 12 Mar 2020 23:15:47 +0000
Responsible-Changed-Why:
Take.


State-Changed-From-To: open->analyzed
State-Changed-By: thorpej@NetBSD.org
State-Changed-When: Thu, 12 Mar 2020 23:15:47 +0000
State-Changed-Why:
Have a fix.


From: Jason Thorpe <thorpej@me.com>
To: gnats-bugs@netbsd.org
Cc: netbsd-bugs@netbsd.org
Subject: Re: kern/55067: bge(4) does not work on MIPS systems (it panics)
Date: Thu, 12 Mar 2020 16:25:52 -0700

 > On Mar 12, 2020, at 3:40 PM, Michael van Elst <mlelstv@serpens.de> wrote:

 > mips _bus_dmamap_sync erroneously traps len == 0. It should be gracefully
 > allowing that value like other archs.

 Yah, it stops crashing once I fix that.  But it still doesn't work.
 Although, I think THAT problem is an issue with my test set-up.  I'm
 using it in a 32-bit slot, but I suspect the firmware on the board
 thinks it got plugged into a 64-bit slot; I noticed that my "gsip" board
 also thinks it's connected to a 64-bit slot on this system.

 I believe that ACK64# and REQ64# are supposed to be left floating on a
 32-bit slot (I need to re-read the spec to be sure).  I'm guessing that
 they're being pulled in some direction they're not supposed to be on the
 Qube2 mainboard, causing the card to mis-detect.  That will cause half
 of the data path to march off a cliff.  Luckily, to make these cards
 physically fit the machine, I have to use a couple of risers in a sort
 of Rube Goldberg arrangement, so if I'm right about what's supposed to
 happen with ACK64# and REQ64#, I'll simply cut those traces on the
 risers.

 -- thorpej

From: Jason Thorpe <thorpej@me.com>
To: gnats-bugs@netbsd.org
Cc: netbsd-bugs@netbsd.org
Subject: Re: kern/55067: bge(4) does not work on MIPS systems (it panics)
Date: Thu, 12 Mar 2020 20:28:03 -0700

 > On Mar 12, 2020, at 4:25 PM, Jason Thorpe <thorpej@me.com> wrote:
 >
 > Yah, it stops crashing once I fix that.  But it still doesn't work.
 > Although, I think THAT problem is an issue with my test set-up.  I'm
 > using it in a 32-bit slot, but I suspect the firmware on the board
 > thinks it got plugged into a 64-bit slot; I noticed that my "gsip" board
 > also thinks it's connected to a 64-bit slot on this system.
 >
 > I believe that ACK64# and REQ64# are supposed to be left floating on a
 > 32-bit slot (I need to re-read the spec to be sure).  I'm guessing that
 > they're being pulled in some direction they're not supposed to be on the
 > Qube2 mainboard, causing the card to mis-detect.  That will cause half
 > of the data path to march off a cliff.  Luckily, to make these cards
 > physically fit the machine, I have to use a couple of risers in a sort
 > of Rube Goldberg arrangement, so if I'm right about what's supposed to
 > happen with ACK64# and REQ64#, I'll simply cut those traces on the
 > risers.

 Well, I recalled incorrectly about 64-bit slot detection.

 ACK64# and REQ64# are supposed to be pulled high by the system board on
 32-bit slots according to PCI 2.1... but I need to dig up an older
 version of the spec to see if those pins were floating / NC prior to PCI
 2.x.

 However, it may simply be the case that these boards don't work in
 32-bit slots ... I tried a D-Link DGE-550T (64-bit card, "stge") and it
 worked perfectly.

 -- thorpej

State-Changed-From-To: analyzed->closed
State-Changed-By: thorpej@NetBSD.org
State-Changed-When: Fri, 13 Mar 2020 03:51:06 +0000
State-Changed-Why:
Module Name:	src
Committed By:	thorpej
Date:		Fri Mar 13 03:49:39 UTC 2020

Modified Files:
	src/sys/arch/mips/mips: bus_dma.c

Log Message:
Allow len == 0 in bus_dmamap_sync().

XXX pullup-9


To generate a diff of this commit:
cvs rdiff -u -r1.38 -r1.39 src/sys/arch/mips/mips/bus_dma.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

No longer panics.  It still doesn't work, but that is a separate
problem.
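
The behaviour described in the log message above ("Allow len == 0") can be
illustrated with a small user-space sketch.  This is not the code from
src/sys/arch/mips/mips/bus_dma.c; the struct, enum, and function names
below are hypothetical stand-ins, and the "cache operation" is just a
counter.  It only shows the idea that a zero-length sync returns early as
a no-op instead of being treated as an error:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a DMA map; NOT the kernel's bus_dmamap_t. */
struct sketch_dmamap {
	size_t   mapsize;	/* total number of mapped bytes */
	unsigned nsyncs;	/* cache operations actually performed */
};

/* Hypothetical result codes, used only by this sketch. */
enum sketch_sync_result {
	SKETCH_SYNC_NOOP,	/* len == 0: nothing to do */
	SKETCH_SYNC_DONE,	/* a (simulated) cache operation was done */
	SKETCH_SYNC_BAD_RANGE	/* offset/len fall outside the map */
};

static enum sketch_sync_result
sketch_dmamap_sync(struct sketch_dmamap *map, size_t offset, size_t len)
{
	assert(map != NULL);

	/* A range outside the map is still a genuine caller bug. */
	if (offset > map->mapsize || len > map->mapsize - offset)
		return SKETCH_SYNC_BAD_RANGE;

	/* A zero-length range is valid; there is simply no work to do. */
	if (len == 0)
		return SKETCH_SYNC_NOOP;

	/* Stand-in for the real cache writeback/invalidate work. */
	map->nsyncs++;
	return SKETCH_SYNC_DONE;
}
```

With a 2048-byte map, a call with len 0 returns SKETCH_SYNC_NOOP and
leaves nsyncs untouched, while a call with len 64 performs one simulated
operation.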


From: Nick Hudson <nick.hudson@gmx.co.uk>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, thorpej@me.com
Cc: 
Subject: Re: kern/55067: bge(4) does not work on MIPS systems (it panics)
Date: Fri, 13 Mar 2020 06:58:27 +0000

 On 12/03/2020 22:40, Michael van Elst wrote:
 > The following reply was made to PR kern/55067; it has been noted by GNATS.
 >
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc:
 > Subject: Re: kern/55067: bge(4) does not work on MIPS systems (it panics)
 > Date: Thu, 12 Mar 2020 22:36:51 -0000 (UTC)
 >
 >   thorpej@me.com writes:
 >
 >   >[  23.6599208] panic: _bus_dmamap_sync: bad length
 >
 >   >0x83a29ab8: _bus_dmamap_sync+4bc (3,83edfc90,10998,0) ra 800726ec sz 72
 >
 >   mips _bus_dmamap_sync erroneously traps len == 0. It should be gracefully
 >   allowing that value like other archs.

 I said the same on source-changes-d, but here goes again.

 The check for len !=0 in arm bus_dma.c has caught several bugs over the
 years.

 I plan on leaving it there.

 Nick

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55067: bge(4) does not work on MIPS systems (it panics)
Date: Fri, 13 Mar 2020 07:38:14 -0000 (UTC)

 nick.hudson@gmx.co.uk (Nick Hudson) writes:

 >On 12/03/2020 22:40, Michael van Elst wrote:
 >>   mips _bus_dmamap_sync erroneously traps len == 0. It should be gracefully
 >>   allowing that value like other archs.

 >I said the same on source-changes-d, but here goes again.

 >The check for len !=0 in arm bus_dma.c has caught several bugs over the
 >years.

 >I plan on leaving it there.


 If the check is supposed to catch bugs it should be a) a message, not a panic,
 b) only for DIAGNOSTIC and c) maintained for all archs and not your lone
 decision.


 -- 
 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

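The alternative raised in the last message (report a zero-length sync with
a message rather than a panic, and only in DIAGNOSTIC builds) could look
roughly like the sketch below.  It is a user-space illustration under
stated assumptions: SKETCH_DIAGNOSTIC stands in for the kernel's
DIAGNOSTIC option, fprintf() stands in for a kernel printf, and every
identifier is hypothetical rather than taken from any NetBSD source file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's DIAGNOSTIC option. */
#ifndef SKETCH_DIAGNOSTIC
#define SKETCH_DIAGNOSTIC 1
#endif

/* Number of zero-length syncs reported so far (sketch-only counter). */
static unsigned sketch_zero_len_warnings;

/*
 * Returns 1 if there is work to do, 0 if the caller should return
 * early.  A zero length never aborts; in diagnostic builds it is
 * reported so that suspicious callers can still be found.
 */
static int
sketch_check_sync_len(const char *caller, size_t len)
{
	assert(caller != NULL);

	if (len != 0)
		return 1;

#if SKETCH_DIAGNOSTIC
	sketch_zero_len_warnings++;
	fprintf(stderr, "%s: zero-length sync\n", caller);
#endif
	return 0;
}
```

A sync routine would call sketch_check_sync_len() first and return
immediately when it yields 0, so the driver keeps running and the
diagnostic message identifies the caller.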
>Unformatted:

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.