NetBSD Problem Report #22646
Received: (qmail 14959 invoked by uid 605); 30 Aug 2003 23:03:20 -0000
Message-Id:
Date: Sat, 30 Aug 2003 16:03:15 -0700 (PDT)
From: gson@gson.org (Andreas Gustafsson)
Sender: gnats-bugs-owner@NetBSD.org
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@gnats.netbsd.org
Subject: Panic in ohci_add_done()
X-Send-Pr-Version: 3.95
>Number: 22646
>Category: kern
>Synopsis: Panic in ohci_add_done()
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: skrll
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Aug 30 23:04:00 +0000 2003
>Closed-Date: Tue Jan 05 20:01:28 +0000 2021
>Last-Modified: Tue Jan 05 20:20:01 +0000 2021
>Originator: Andreas Gustafsson
>Release: NetBSD-current as of Aug 22, 2003
>Organization:
>Environment:
System: NetBSD guam.araneus.fi 1.6W NetBSD 1.6W (GUAM) #0: Fri Aug 22 22:32:46 PDT 2003 gson@guava.araneus.fi:/usr/src/sys.guam/arch/i386/compile/GUAM i386
Architecture: i386
Machine: i386
>Description:
When I plug a Nikon CoolPix 4500 camera into a USB port of a Toshiba
Libretto L2 laptop running NetBSD-current and run the command "gphoto2
--port usb --debug --capture-image" twice (using gphoto2 2.1.1 from pkgsrc)
the kernel panics during the second run:
kernel: page fpanic: ohci_add_done: addr 0xf000e2c0 not found
Stopped at netbsd:cpu_Debugger+0x4: leave
db>
The backtrace is as follows: (copied by hand)
cpu_Debugger+0x4
panic+0xb8
ohci_add_done+0x66
ohci_intr1+0x164
ohci_intr+0x28
Xintr_legacy7+0xa4
This is quite repeatable.
>How-To-Repeat:
See above.
>Fix:
Unknown.
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback
State-Changed-By: mycroft
State-Changed-When: Tue Jun 29 02:55:34 UTC 2004
State-Changed-Why:
Can you retest this with -current?
Responsible-Changed-From-To: kern-bug-people->mycroft
Responsible-Changed-By: mycroft
Responsible-Changed-When: Tue Jun 29 02:55:34 UTC 2004
Responsible-Changed-Why:
.
From: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@gnats.netbsd.org
Cc: mycroft@netbsd.org, kern-bug-people@netbsd.org
Subject: Re: kern/22646
Date: Wed, 30 Jun 2004 02:51:08 -0700 (PDT)
mycroft@netbsd.org writes:
> Synopsis: Panic in ohci_add_done()
>
> State-Changed-From-To: open->feedback
> State-Changed-By: mycroft
> State-Changed-When: Tue Jun 29 02:55:34 UTC 2004
> State-Changed-Why:
> Can you retest this with -current?
I just did, with a GENERIC kernel built from yesterday's -current source.
It still panics.
--
Andreas Gustafsson, gson@gson.org
From: gson@gson.org (Andreas Gustafsson)
To: "Charles M. Hannum" <abuse@spamalicious.com>
Cc: gnats-bugs@gnats.netbsd.org, kern-bug-people@netbsd.org
Subject: Re: kern/22646
Date: Wed, 30 Jun 2004 03:44:29 -0700 (PDT)
Charles M. Hannum writes:
> Could you please send *complete* dmesg output from your machine?
See below. This is from today's test. The reason the camera is shown
both as a umass device and a ugen device is that it was accidentally
set to "mass storage" mode when I first plugged it in, so I switched
it to PTP mode (as required by gphoto2) and plugged it in again.
NetBSD 2.0F (GENERICNOATH) #0: Wed Jun 30 01:04:24 EEST 2004
gson@guam.araneus.fi:/usr/src/sys/arch/i386/compile/GENERICNOATH
total memory = 366 MB
avail memory = 351 MB
BIOS32 rev. 0 found at 0xfb5bb
mainbus0 (root)
cpu0 at mainbus0: (uniprocessor)
cpu0: Transmeta Crusoe (586-class), 597.78 MHz, id 0x543
cpu0: Processor revision 1.3.1.3
cpu0: Code Morphing Software Rev: 4.2.5-8-148
cpu0: 20010503 11:00 official release 4.2.5#1
cpu0: LongRun mode: 1 <600MHz 1600mV 100%>
cpu0: features 80893f<FPU,VME,DE,PSE,TSC,MSR,CX8,SEP>
cpu0: features 80893f<CMOV,MMX>
cpu0: "Transmeta(tm) Crusoe(tm) Processor TM5600"
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: Transmeta Corp LongRun Northbridge (rev. 0x01)
Transmeta Corp SDRAM Controller (RAM memory) at pci0 dev 0 function 1 not configured
Transmeta Corp BIOS Scratchpad (RAM memory) at pci0 dev 0 function 2 not configured
vga1 at pci0 dev 4 function 0: S3 Savage/IX+MV (rev. 0x13)
wsdisplay0 at vga1 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
autri0 at pci0 dev 6 function 0: Acer Labs M5451 AC-Link Controller Audio Device (rev. 0x01)
autri0: can't map memory space
pcib0 at pci0 dev 7 function 0
pcib0: Acer Labs M1543 PCI-ISA Bridge (rev. 0x00)
fxp0 at pci0 dev 14 function 0: i82559 Ethernet, rev 8
fxp0: interrupting at irq 10
fxp0: Ethernet address 00:00:39:2b:65:99
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
aceride0 at pci0 dev 16 function 0
aceride0: Acer Labs M5229 UDMA IDE Controller (rev. 0xc3)
aceride0: bus-master DMA support present
aceride0: primary channel wired to compatibility mode
aceride0: primary channel interrupting at irq 14
atabus0 at aceride0 channel 0
aceride0: secondary channel wired to compatibility mode
aceride0: secondary channel ignored (disabled)
Acer Labs M7101 Power Management Controller (miscellaneous bridge) at pci0 dev 17 function 0 not configured
cbb0 at pci0 dev 18 function 0: Toshiba ToPIC100 CardBus-PCI Bridge (rev. 0x32)
cbb0: NOT USED because of unconfigured interrupt
ohci0 at pci0 dev 20 function 0: Acer Labs M5237 USB Host Controller (rev. 0x03)
ohci0: interrupting at irq 7
ohci0: OHCI version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
uhub0 at usb0
uhub0: Acer Labs OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
isa0 at pcib0
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
isapnp0: no ISA Plug 'n Play devices found
Kernelized RAIDframe activated
wd0 at atabus0 drive 0: <TOSHIBA MK1016GAP>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 9590 MB, 19485 cyl, 16 head, 63 sec, 512 bytes/sect x 19640880 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 4 (Ultra/66)
wd0(aceride0:0:0): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA data transfers)
boot device: wd0
root on wd0a dumps on wd0b
root file system type: ffs
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
umass0 at uhub0 port 1 configuration 1 interface 0
umass0: NIKON Nikon Digital Camera E4500, rev 1.10/1.00, addr 2
umass0: using SCSI over Bulk-Only
scsibus0 at umass0: 2 targets, 1 lun per target
sd0 at scsibus0 target 0 lun 0: <NIKON, NIKON DSC E4500, 1.00> disk removable
sd0: fabricating a geometry
sd0: 246 MB, 246 cyl, 64 head, 32 sec, 512 bytes/sect x 503808 sectors
umass0: at uhub0 port 1 (addr 2) disconnected
sd0 detached
scsibus0 detached
umass0 detached
ugen0 at uhub0 port 1
ugen0: NIKON Nikon Digital Camera E4500_PTP, rev 1.10/1.00, addr 2
ugen0: at uhub0 port 1 (addr 2) disconnected
ugen0 detached
ugen0 at uhub0 port 1
ugen0: NIKON Nikon Digital Camera E4500_PTP, rev 1.10/1.00, addr 2
panic: ohci_add_done: addr 0xf000e2c0 not found
> Also, does it consistently output this:
>
> kernel: page fpanic: ohci_add_done: addr 0xf000e2c0 not found
> Stopped at netbsd:cpu_Debugger+0x4: leave
> db>
I don't know about the "kernel: page f" part, but as you can see from
the dmesg output above, the message "panic: ohci_add_done: addr
0xf000e2c0 not found" is still exactly the same as when I first
filed the PR.
> And are you using a USB keyboard?
No. The camera is the only USB device in use.
I have also seen the same panic with a different machine and USB device,
but unfortunately I don't remember the details.
--
Andreas Gustafsson, gson@gson.org
From: "Charles M. Hannum" <abuse@spamalicious.com>
To: gson@gson.org (Andreas Gustafsson)
Cc: gnats-bugs@gnats.netbsd.org, kern-bug-people@netbsd.org
Subject: Re: kern/22646
Date: Wed, 30 Jun 2004 10:00:48 +0000
On Wednesday 30 June 2004 09:51, Andreas Gustafsson wrote:
> mycroft@netbsd.org writes:
> > Synopsis: Panic in ohci_add_done()
> >
> > State-Changed-From-To: open->feedback
> > State-Changed-By: mycroft
> > State-Changed-When: Tue Jun 29 02:55:34 UTC 2004
> > State-Changed-Why:
> > Can you retest this with -current?
>
> I just did, with a GENERIC kernel built from yesterday's -current source.
> It still panics.
Could you please send *complete* dmesg output from your machine?
Also, does it consistently output this:
kernel: page fpanic: ohci_add_done: addr 0xf000e2c0 not found
Stopped at netbsd:cpu_Debugger+0x4: leave
db>
And are you using a USB keyboard?
From: "Charles M. Hannum" <abuse@spamalicious.com>
To: gson@gson.org (Andreas Gustafsson)
Cc: gnats-bugs@gnats.netbsd.org, kern-bug-people@netbsd.org
Subject: Re: kern/22646
Date: Wed, 30 Jun 2004 11:28:05 +0000
On Wednesday 30 June 2004 10:44, Andreas Gustafsson wrote:
> I don't know about the "kernel: page f" part, but as you can see from
> the dmesg output above, the message "panic: ohci_add_done: addr
> 0xf000e2c0 not found" is still exactly the same as when I first
> filed the PR.
Hm, I guess this particular fruit doesn't hang quite as low as I thought it
did. :-)
Could you try it again after setting ohcidebug to 15 in DDB, and send me the
output? (You will need to increase MSGBUFSIZE. 1MB should be sufficient.)
From: gson@gson.org (Andreas Gustafsson)
To: "Charles M. Hannum" <abuse@spamalicious.com>
Cc: gnats-bugs@gnats.netbsd.org, kern-bug-people@netbsd.org
Subject: Re: kern/22646
Date: Wed, 30 Jun 2004 08:42:41 -0700 (PDT)
Charles M. Hannum writes:
> Could you try it again after setting ohcidebug to 15 in DDB, and send me the
> output? (You will need to increase MSGBUFSIZE. 1MB should be sufficient.)
Done. The output is in <http://www.gson.org/netbsd/bugs/22646/dmesg.ohcidebug>.
--
Andreas Gustafsson, gson@gson.org
From: "Charles M. Hannum" <abuse@spamalicious.com>
To: gson@gson.org (Andreas Gustafsson)
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/22646
Date: Wed, 30 Jun 2004 23:25:38 +0000
[Whoops, resending so it goes into the PR...]
Oh, I see what's happening here.
When we abort a transfer due to a timeout, we're not proactively cleaning out
the done queue. Then we're reusing the same TDs. The chip is thus creating
a completely non-sensical done queue which causes us to walk off into space.
I'll have to think a little bit about what the best solution to this is. I'll
get back to you.
State-Changed-From-To: feedback->analyzed
State-Changed-By: mycroft
State-Changed-When: Wed Jun 30 23:27:54 UTC 2004
State-Changed-Why:
.
From: "Charles M. Hannum" <abuse@spamalicious.com>
To: gson@gson.org (Andreas Gustafsson)
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/22646
Date: Wed, 30 Jun 2004 23:53:33 +0000
I'm thinking that something like the following will work -- it just defers the
cleaning of the done queue to ohci_softintr(), which ensures that the call
forced by ohci_abort_xfer() will clean it out.
I have NOT tested this yet, though.
From: "Charles M. Hannum" <abuse@spamalicious.com>
To: gson@gson.org (Andreas Gustafsson)
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/22646
Date: Wed, 30 Jun 2004 23:55:37 +0000
--Boundary-00=_5L14Ay34rDDx1KK
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
I'm thinking that something like the following will work -- it just defers the
cleaning of the done queue to ohci_softintr(), which ensures that the call
forced by ohci_abort_xfer() will clean it out.
I have NOT tested this yet, though.
[This time *with* the patch...]
--Boundary-00=_5L14Ay34rDDx1KK
Content-Type: text/x-diff;
charset="iso-8859-1";
name="ohci.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
filename="ohci.diff"
Index: ohci.c
===================================================================
RCS file: /cvsroot/src/sys/dev/usb/ohci.c,v
retrieving revision 1.149
diff -u -r1.149 ohci.c
--- ohci.c 29 Jun 2004 03:56:04 -0000 1.149
+++ ohci.c 30 Jun 2004 23:51:04 -0000
@@ -1121,7 +1121,6 @@
intrs = OHCI_WDH;
if (done & OHCI_DONE_INTRS)
intrs |= OREAD4(sc, OHCI_INTERRUPT_STATUS);
- sc->sc_hcca->hcca_done_head = 0;
} else
intrs = OREAD4(sc, OHCI_INTERRUPT_STATUS) & ~OHCI_WDH;
@@ -1151,7 +1150,9 @@
eintrs &= ~OHCI_SO;
}
if (eintrs & OHCI_WDH) {
- ohci_add_done(sc, done &~ OHCI_DONE_INTRS);
+ /* Disable the interrupt until the softintr routine cleans out
+ the queue. */
+ OWRITE4(sc, OHCI_INTERRUPT_DISABLE, OHCI_WDH);
usb_schedsoftintr(&sc->sc_bus);
eintrs &= ~OHCI_WDH;
}
@@ -1282,17 +1283,24 @@
struct ohci_pipe *opipe;
int len, cc, s;
int i, j, actlen, iframes, uedir;
+ ohci_physaddr_t done;
DPRINTFN(10,("ohci_softintr: enter\n"));
sc->sc_bus.intr_context++;
s = splhardusb();
+ done = le32toh(sc->sc_hcca->hcca_done_head);
+ sc->sc_hcca->hcca_done_head = htole32(done & OHCI_DONE_INTRS);
+ OWRITE4(sc, OHCI_INTERRUPT_ENABLE, OHCI_WDH);
+ splx(s);
+
+ ohci_add_done(sc, done &~ OHCI_DONE_INTRS);
+
sdone = sc->sc_sdone;
sc->sc_sdone = NULL;
sidone = sc->sc_sidone;
sc->sc_sidone = NULL;
- splx(s);
DPRINTFN(10,("ohci_softintr: sdone=%p sidone=%p\n", sdone, sidone));
@@ -2239,7 +2247,7 @@
}
/* Zap headp register if hardware pointed inside the xfer. */
if (hit) {
- DPRINTFN(1,("ohci_abort_xfer: set hd=0x08%x, tl=0x%08x\n",
+ DPRINTFN(1,("ohci_abort_xfer: set hd=0x%08x, tl=0x%08x\n",
(int)p->physaddr, (int)le32toh(sed->ed.ed_tailp)));
sed->ed.ed_headp = htole32(p->physaddr); /* unlink TDs */
} else {
--Boundary-00=_5L14Ay34rDDx1KK--
From: "Charles M. Hannum" <abuse@spamalicious.com>
To: gson@gson.org (Andreas Gustafsson)
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/22646
Date: Thu, 1 Jul 2004 01:18:30 +0000
--Boundary-00=_mZ24Av5+Ih8dIzn
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Here's a patch that works on my Mac cube. Could you try it and see if it
prevents the panic? (I doubt that your device will actually *work* if it's
getting timeouts, but that's a different issue.)
--Boundary-00=_mZ24Av5+Ih8dIzn
Content-Type: text/x-diff;
charset="iso-8859-1";
name="ohci3.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
filename="ohci3.diff"
Index: ohci.c
===================================================================
RCS file: /cvsroot/src/sys/dev/usb/ohci.c,v
retrieving revision 1.149
diff -u -r1.149 ohci.c
--- ohci.c 29 Jun 2004 03:56:04 -0000 1.149
+++ ohci.c 1 Jul 2004 01:14:26 -0000
@@ -142,7 +142,6 @@
Static void ohci_poll(struct usbd_bus *);
Static void ohci_softintr(void *);
Static void ohci_waitintr(ohci_softc_t *, usbd_xfer_handle);
-Static void ohci_add_done(ohci_softc_t *, ohci_physaddr_t);
Static void ohci_rhsc(ohci_softc_t *, usbd_xfer_handle);
Static usbd_status ohci_device_request(usbd_xfer_handle xfer);
@@ -1102,7 +1101,9 @@
ohci_intr1(ohci_softc_t *sc)
{
u_int32_t intrs, eintrs;
+#if 0
ohci_physaddr_t done;
+#endif
DPRINTFN(14,("ohci_intr1: enter\n"));
@@ -1114,6 +1115,7 @@
return (0);
}
+#if 0
intrs = 0;
done = le32toh(sc->sc_hcca->hcca_done_head);
if (done != 0) {
@@ -1121,15 +1123,16 @@
intrs = OHCI_WDH;
if (done & OHCI_DONE_INTRS)
intrs |= OREAD4(sc, OHCI_INTERRUPT_STATUS);
- sc->sc_hcca->hcca_done_head = 0;
} else
intrs = OREAD4(sc, OHCI_INTERRUPT_STATUS) & ~OHCI_WDH;
+#else
+ intrs = OREAD4(sc, OHCI_INTERRUPT_STATUS);
+#endif
if (!intrs)
return (0);
- intrs &= ~OHCI_MIE;
- OWRITE4(sc, OHCI_INTERRUPT_STATUS, intrs); /* Acknowledge */
+ OWRITE4(sc, OHCI_INTERRUPT_STATUS, intrs & ~(OHCI_MIE|OHCI_WDH)); /* Acknowledge */
eintrs = intrs & sc->sc_eintrs;
if (!eintrs)
return (0);
@@ -1151,8 +1154,8 @@
eintrs &= ~OHCI_SO;
}
if (eintrs & OHCI_WDH) {
- ohci_add_done(sc, done &~ OHCI_DONE_INTRS);
usb_schedsoftintr(&sc->sc_bus);
+ OWRITE4(sc, OHCI_INTERRUPT_DISABLE, OHCI_WDH);
eintrs &= ~OHCI_WDH;
}
if (eintrs & OHCI_RD) {
@@ -1236,10 +1239,27 @@
#endif
void
-ohci_add_done(ohci_softc_t *sc, ohci_physaddr_t done)
+ohci_softintr(void *v)
{
- ohci_soft_itd_t *sitd, *sidone, **ip;
- ohci_soft_td_t *std, *sdone, **p;
+ ohci_softc_t *sc = v;
+ ohci_soft_itd_t *sitd, *sidone, *sitdnext;
+ ohci_soft_td_t *std, *sdone, *stdnext;
+ usbd_xfer_handle xfer;
+ struct ohci_pipe *opipe;
+ int len, cc, s;
+ int i, j, actlen, iframes, uedir;
+ ohci_physaddr_t done;
+
+ DPRINTFN(10,("ohci_softintr: enter\n"));
+
+ sc->sc_bus.intr_context++;
+
+ s = splhardusb();
+ done = le32toh(sc->sc_hcca->hcca_done_head);
+ sc->sc_hcca->hcca_done_head = htole32(done & OHCI_DONE_INTRS);
+ OWRITE4(sc, OHCI_INTERRUPT_STATUS, OHCI_WDH);
+ OWRITE4(sc, OHCI_INTERRUPT_ENABLE, OHCI_WDH);
+ splx(s);
/* Reverse the done list. */
for (sdone = NULL, sidone = NULL; done != 0; ) {
@@ -1259,41 +1279,9 @@
DPRINTFN(5,("add ITD %p\n", sitd));
continue;
}
- panic("ohci_add_done: addr 0x%08lx not found", (u_long)done);
+ panic("ohci_softintr: addr 0x%08lx not found", (u_long)done);
}
- /* sdone & sidone now hold the done lists. */
- /* Put them on the already processed lists. */
- for (p = &sc->sc_sdone; *p != NULL; p = &(*p)->dnext)
- ;
- *p = sdone;
- for (ip = &sc->sc_sidone; *ip != NULL; ip = &(*ip)->dnext)
- ;
- *ip = sidone;
-}
-
-void
-ohci_softintr(void *v)
-{
- ohci_softc_t *sc = v;
- ohci_soft_itd_t *sitd, *sidone, *sitdnext;
- ohci_soft_td_t *std, *sdone, *stdnext;
- usbd_xfer_handle xfer;
- struct ohci_pipe *opipe;
- int len, cc, s;
- int i, j, actlen, iframes, uedir;
-
- DPRINTFN(10,("ohci_softintr: enter\n"));
-
- sc->sc_bus.intr_context++;
-
- s = splhardusb();
- sdone = sc->sc_sdone;
- sc->sc_sdone = NULL;
- sidone = sc->sc_sidone;
- sc->sc_sidone = NULL;
- splx(s);
-
DPRINTFN(10,("ohci_softintr: sdone=%p sidone=%p\n", sdone, sidone));
#ifdef OHCI_DEBUG
@@ -2239,7 +2227,7 @@
}
/* Zap headp register if hardware pointed inside the xfer. */
if (hit) {
- DPRINTFN(1,("ohci_abort_xfer: set hd=0x08%x, tl=0x%08x\n",
+ DPRINTFN(1,("ohci_abort_xfer: set hd=0x%08x, tl=0x%08x\n",
(int)p->physaddr, (int)le32toh(sed->ed.ed_tailp)));
sed->ed.ed_headp = htole32(p->physaddr); /* unlink TDs */
} else {
--Boundary-00=_mZ24Av5+Ih8dIzn--
From: gson@gson.org (Andreas Gustafsson)
To: "Charles M. Hannum" <abuse@spamalicious.com>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/22646
Date: Thu, 1 Jul 2004 05:41:19 -0700 (PDT)
Charles M. Hannum writes:
> Here's a patch that works on my Mac cube. Could you try it and see if it
> prevents the panic?
I'm afraid it went from bad to worse - it now panics as soon
as I plug in the camera (without having to run gphoto2), or if
it is plugged in when booting, it panics during the boot.
The following dmesg output shows the latter case.
NetBSD 2.0F (OHCITEST) #1: Thu Jul 1 12:52:20 EEST 2004
gson@guam.araneus.fi:/usr/src/sys/arch/i386/compile/OHCITEST
total memory = 366 MB
avail memory = 350 MB
BIOS32 rev. 0 found at 0xfb5bb
mainbus0 (root)
cpu0 at mainbus0: (uniprocessor)
cpu0: Transmeta Crusoe (586-class), 597.80 MHz, id 0x543
cpu0: Processor revision 1.3.1.3
cpu0: Code Morphing Software Rev: 4.2.5-8-148
cpu0: 20010503 11:00 official release 4.2.5#1
cpu0: LongRun mode: 1 <600MHz 1600mV 100%>
cpu0: features 80893f<FPU,VME,DE,PSE,TSC,MSR,CX8,SEP>
cpu0: features 80893f<CMOV,MMX>
cpu0: "Transmeta(tm) Crusoe(tm) Processor TM5600"
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: Transmeta Corp LongRun Northbridge (rev. 0x01)
Transmeta Corp SDRAM Controller (RAM memory) at pci0 dev 0 function 1 not configured
Transmeta Corp BIOS Scratchpad (RAM memory) at pci0 dev 0 function 2 not configured
vga1 at pci0 dev 4 function 0: S3 Savage/IX+MV (rev. 0x13)
wsdisplay0 at vga1 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
autri0 at pci0 dev 6 function 0: Acer Labs M5451 AC-Link Controller Audio Device (rev. 0x01)
autri0: can't map memory space
pcib0 at pci0 dev 7 function 0
pcib0: Acer Labs M1543 PCI-ISA Bridge (rev. 0x00)
fxp0 at pci0 dev 14 function 0: i82559 Ethernet, rev 8
fxp0: interrupting at irq 10
fxp0: Ethernet address 00:00:39:2b:65:99
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
aceride0 at pci0 dev 16 function 0
aceride0: Acer Labs M5229 UDMA IDE Controller (rev. 0xc3)
aceride0: bus-master DMA support present
aceride0: primary channel wired to compatibility mode
aceride0: primary channel interrupting at irq 14
atabus0 at aceride0 channel 0
aceride0: secondary channel wired to compatibility mode
aceride0: secondary channel ignored (disabled)
Acer Labs M7101 Power Management Controller (miscellaneous bridge) at pci0 dev 17 function 0 not configured
cbb0 at pci0 dev 18 function 0: Toshiba ToPIC100 CardBus-PCI Bridge (rev. 0x32)
cbb0: NOT USED because of unconfigured interrupt
ohci0 at pci0 dev 20 function 0: Acer Labs M5237 USB Host Controller (rev. 0x03)
ohci0: interrupting at irq 7
ohci_init: start
ohci0: OHCI version 1.0, legacy support
ohci_alloc_sed: allocating chunk
ohci_init: SMM active, request owner change
ohci0: resetting
ohci_dumpregs: rev=0x00000110 control=0x000000bf command=0x00000000
intrstat=0x00000004 intre=0x8000004b intrd=0x8000004b
hcca=0x019f6000 percur=0x00000000 ctrlhd=0x019f7fe0
ctrlcur=0x00000000 bulkhd=0x019f7fc0 bulkcur=0x00000000
done=0x00000000 fmival=0xa7782edf fmrem=0x800008a7
fmnum=0x00000017 perst=0x00002a2f lsthrs=0x00000628
desca=0x01000902 descb=0x001e0000 stat=0x00000000
port1=0x00000000 port2=0x00000000
HCCA: frame_number=0x001e done_head=0x00000000
usb0 at ohci0: USB revision 1.0
ohci_open: pipe=0xc0fd2700, addr=0, endpt=0 (0)
ohci_root_ctrl_control type=0x00 request=05
ohci_root_ctrl_control type=0x80 request=06
ohci_root_ctrl_control wValue=0x0100
ohci_root_ctrl_control type=0x80 request=06
ohci_root_ctrl_control wValue=0x0100
ohci_root_ctrl_control type=0x80 request=06
ohci_root_ctrl_control wValue=0x0300
ohci_root_ctrl_control type=0x80 request=06
ohci_root_ctrl_control wValue=0x0301
ohci_root_ctrl_control type=0x80 request=06
ohci_root_ctrl_control wValue=0x0301
ohci_root_ctrl_control type=0x80 request=06
ohci_root_ctrl_control wValue=0x0302
ohci_root_ctrl_control type=0x80 request=06
ohci_root_ctrl_control wValue=0x0302
uhub0 at usb0ohci_root_ctrl_control type=0x80 request=06
ohci_root_ctrl_control wValue=0x0301
ohci_root_ctrl_control type=0x80 request=06
ohci_root_ctrl_control wValue=0x0301
ohci_root_ctrl_control type=0x80 request=06
ohci_root_ctrl_control wValue=0x0302
ohci_root_ctrl_control type=0x80 request=06
ohci_root_ctrl_control wValue=0x0302
uhub0: Acer Labs OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
ohci_root_ctrl_control type=0x80 request=06
ohci_root_ctrl_control wValue=0x0200
ohci_root_ctrl_control type=0x80 request=06
ohci_root_ctrl_control wValue=0x0200
ohci_root_ctrl_control type=0x00 request=09
ohci_root_ctrl_control type=0xa0 request=06
uhub0: 2 ports with 2 removable, self powered
ohci_open: pipe=0xc0fd2580, addr=1, endpt=129 (1)
ohci_root_ctrl_control type=0x02 request=01
ohci_root_ctrl_control type=0x23 request=03
ohci_root_ctrl_transfer: set port power 1
ohci_root_ctrl_control type=0x23 request=03
ohci_root_ctrl_transfer: set port power 2
isa0 at pcib0
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
isapnp0: no ISA Plug 'n Play devices found
ohci_intr1: enter
ohci_intr: sc=0xc1113000 intrs=0x44(0x40) eintrs=0x40
ohci_rhsc: sc=0xc1113000 xfer=0xc0fd2680 hstatus=0x00000000
ohci_rhsc: change=0x04
ohci_rhsc_able: on=0
Kernelized RAIDframe activated
ohci_root_ctrl_control type=0xa3 request=00
ohci_root_ctrl_transfer: get port status i=1
ohci_root_ctrl_transfer: port status=0x0100
ohci_root_ctrl_control type=0xa3 request=00
ohci_root_ctrl_transfer: get port status i=2
ohci_root_ctrl_transfer: port status=0x10101
ohci_root_ctrl_control type=0x23 request=01
ohci_root_ctrl_control: UR_CLEAR_PORT_FEATURE port=2 feature=16
ohci_rhsc_able: on=1
ohci_rhsc_able: on=1
ohci_root_ctrl_control type=0x23 request=03
ohci_root_ctrl_transfer: reset port 2
ohci_intr1: enter
ohci_intr: sc=0xc1113000 intrs=0x44(0x40) eintrs=0x40
ohci_rhsc: sc=0xc1113000 xfer=0xc0fd2680 hstatus=0x00000000
ohci_rhsc: change=0x04
ohci_rhsc_able: on=0
ohci port 2 reset, status = 0x100103
ohci_root_ctrl_control type=0xa3 request=00
ohci_root_ctrl_transfer: get port status i=2
ohci_root_ctrl_transfer: port status=0x100103
ohci_root_ctrl_control type=0x23 request=01
ohci_root_ctrl_control: UR_CLEAR_PORT_FEATURE port=2 feature=20
ohci_rhsc_able: on=1
ohci_root_ctrl_control type=0xa3 request=00
ohci_root_ctrl_transfer: get port status i=2
ohci_root_ctrl_transfer: port status=0x0103
ohci_open: pipe=0xc1132600, addr=0, endpt=0 (1)
ohci_alloc_std: allocating chunk
ohci_add_ed: sed=0xc86817a0 head=0xc8681fe0
ohci_device_control type=0x00, request=0x05, wValue=0x0002, wIndex=0x0000 len=0, addr=0, endpt=0
ohci_device_request:
ED(0xc86817a0) at 0x019f77a0: addr=0 endpt=0 maxp=8 flags=80000
tailp=0x01a6c7d0 headflags=1a6c7d0 headp=0x01a6c7d0 nexted=0x00000000
TD(0xc8eb07d0) at 01a6c7d0: f2e00000<SETTOGGLE> delay=7 ec=0 cc=15
cbp=0x019f8f80 nexttd=0x01a6c7a0 be=0x019f8f87
TD(0xc8eb07a0) at 01a6c7a0: f3300000<IN,TOG1,SETTOGGLE> delay=1 ec=0 cc=15
cbp=0x00000000 nexttd=0x01a6c770 be=0x00000000
TD(0xc8eb0770) at 01a6c770: 0 delay=0 ec=0 cc=0
cbp=0x00000000 nexttd=0x00000000 be=0x00000000
ohci_intr1: enter
ohci_intr: sc=0xc1113000 intrs=0x6(0x2) eintrs=0x2
ohci_softintr: enter
panic: ohci_softintr: addr 0x01a6c7a1 not found
syncing disks... ohci_intr1: enter
ohci_intr: sc=0xc1113000 intrs=0x44(0x40) eintrs=0x40
ohci_rhsc: sc=0xc1113000 xfer=0xc0fd2680 hstatus=0x00000000
ohci_rhsc: change=0x04
ohci_rhsc_able: on=0
done
ohci_shutdown: stopping the HC
rebooting...
--
Andreas Gustafsson, gson@gson.org
From: "Charles M. Hannum" <abuse@spamalicious.com>
To: gson@gson.org (Andreas Gustafsson)
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/22646
Date: Thu, 1 Jul 2004 16:45:42 +0000
On Thursday 01 July 2004 12:42, Andreas Gustafsson wrote:
> The following reply was made to PR kern/22646; it has been noted by GNATS.
>
> From: gson@gson.org (Andreas Gustafsson)
> To: "Charles M. Hannum" <abuse@spamalicious.com>
> Cc: gnats-bugs@netbsd.org
> Subject: Re: kern/22646
> Date: Thu, 1 Jul 2004 05:41:19 -0700 (PDT)
>
> Charles M. Hannum writes:
> > Here's a patch that works on my Mac cube. Could you try it and see if
> > it prevents the panic?
>
> I'm afraid it went from bad to worse - it now panics as soon
> as I plug in the camera (without having to run gphoto2), or if
> it is plugged in when booting, it panics during the boot.
> The following dmesg output shows the latter case.
Whoops. I forgot to add:
done &= ~OHCI_DONE_INTRS;
in ohci_softintr(). Can you add that and try it again?
From: gson@gson.org (Andreas Gustafsson)
To: "Charles M. Hannum" <abuse@spamalicious.com>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/22646
Date: Sat, 3 Jul 2004 02:57:45 -0700 (PDT)
Charles M. Hannum writes:
> Whoops. I forgot to add:
>
> done &= ~OHCI_DONE_INTRS;
>
> in ohci_softintr(). Can you add that and try it again?
I added it immediately before the comment "/* Reverse the done list. */";
I hope that was the right place.
With that change, it's back to crashing after running gphoto2 twice.
It now gets a page fault trap in usb_transfer_complete(). The
following was transcribed from the screen by hand after a crash:
...
ohci_intr: sc=0xc1113000 intrs=0x6(0x2) eintrs=0x2
kernel: page fault trap, code=0
Stopped in pid 147.1 (syslogd) at netbsd: usb_transfer_complete+0xa6:
movl 0x48(%eax), eax
db> t
usbd_transfer_complete(...)...+0xa6
ohci_softintr(...)...+0x3be
...
The complete dmesg output is in
<http://www.gson.org/netbsd/bugs/22646/dmesg.ohcidebug2>. Note that
the "panic: ohci_softintr: addr 0xf000e2c0 not found" near the end of
the output happened only after I typed "reboot" at the db> prompt; the
actual crash was at the uvm_fault line a few lines earlier.
--
Andreas Gustafsson, gson@gson.org
Responsible-Changed-From-To: mycroft->kern-bug-people
Responsible-Changed-By: wiz@netbsd.org
Responsible-Changed-When: Sun, 03 Sep 2006 01:15:03 +0000
Responsible-Changed-Why:
Back to role account, mycroft doesn't have commit access any longer.
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/22646: Panic in ohci_add_done()
Date: Sat, 4 Dec 2010 13:34:00 +0200
I tried to reproduce kern/22646 in -current using the releng
201012022200Z i386 snapshot, an ALI PCI USB card (EHCI/OHCI),
the same camera as in the original bug report, and gphoto2 from
current pkgsrc.
When I run "gphoto2 --port usb --debug --capture-image" twice in a row
as in the original bug report, I no longer get an instant panic during
the second run, but instead gphoto2 process hangs. It takes two
control-C presses to get a shell prompt back. If I then run the
command a third time, I get a crash, but a different one than before:
now it's a kernel trap in usb_transfer_complete() rather than a panic
in ohci_add_done(). Screnshot at
http://www.gson.org/netbsd/bugs/22646/trap.jpg
I'm not sure if this should still be considered the same bug, but in
any case, NetBSD's USB stack is still broken. I suspect it's still
an ohci issue, because I have been unable to trigger a similar crash
with uhci.
--
Andreas Gustafsson, gson@gson.org
Responsible-Changed-From-To: kern-bug-people->skrll
Responsible-Changed-By: skrll@NetBSD.org
Responsible-Changed-When: Mon, 25 Mar 2013 07:57:40 +0000
Responsible-Changed-Why:
I'm looking at ohci atm
State-Changed-From-To: analyzed->feedback
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Sat, 23 Apr 2016 13:04:37 +0000
State-Changed-Why:
Does this problem still exist in 7.99.28 (nick-nhusb merged)
From: Andreas Gustafsson <gson@gson.org>
To: skrll@NetBSD.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/22646 (Panic in ohci_add_done())
Date: Sat, 23 Apr 2016 20:41:20 +0300
skrll@NetBSD.org wrote:
> Does this problem still exist in 7.99.28 (nick-nhusb merged)
I repeated the test from the PR using the following configuration:
- Desktop with Asus P5B-V motherboard
- amd64 GENERIC kernel built from 2016.04.23.13.14.52 sources
- Userland built from 2016.04.10.18.27.56 sources
- /dev/usb9 through /dev/usb15 added using MAKEDEV
- Nikon CoolPix 4500 in PTP mode connected to ALI EHCI/OHCI PCI card
- gphoto2-2.5.8
- libgphoto2-2.5.8nb1
- libusb-0.1.12nb4 + patch to increase number of supported USB busses to 16
- gphoto2 command: gphoto2 --port usb --debug --capture-image >log 2>&1
It took a few runs of starting gphoto2, seeing it hang, and killing it
after 5-30 seconds by hitting control-C twice, but the end result was
that the machine locked up, instead of panicing as it had done in
earlier tests.
Breaking into ddb on the serial console shows:
db{0}> bt
breakpoint() at netbsd:breakpoint+0x5
comintr() at netbsd:comintr+0x59a
Xintr_ioapic_edge9() at netbsd:Xintr_ioapic_edge9+0xea
--- interrupt ---
ohci_softintr() at netbsd:ohci_softintr+0x454
usb_soft_intr() at netbsd:usb_soft_intr+0x1f
softint_dispatch() at netbsd:softint_dispatch+0xd3
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe8044f7eff0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
0:
db{0}> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
170 1 3 3 0 fffffe80aa9ee8e0 cron tstile
783 1 3 1 0 fffffe80a6503500 cron tstile
574 1 3 3 0 fffffe80a6503920 cron wait
171 1 3 1 0 fffffe80ab6ec0c0 cron wait
686 1 3 1 0 fffffe80ab6ec4e0 gphoto2 usbxfer
146 1 3 1 80 fffffe804642d780 sh wait
429 1 3 1 80 fffffe8046848420 sh wait
104 1 3 1 80 fffffe804642d360 sshd select
486 1 3 1 80 fffffe80adb2a860 sshd select
636 1 3 1 0 fffffe804642dba0 sh biolock
455 1 3 2 80 fffffe80464a3760 getty ttyraw
625 1 3 3 80 fffffe80460d5320 getty ttyraw
643 1 3 0 80 fffffe80464a3b80 getty ttyraw
600 1 3 1 80 fffffe8045e87620 login wait
516 1 3 1 0 fffffe80ae0ad460 cron tstile
613 1 3 0 80 fffffe80adb2a020 inetd kqueue
565 1 3 3 80 fffffe80ae0ad040 qmgr kqueue
511 1 3 3 80 fffffe80adb2a440 pickup kqueue
571 1 3 3 80 fffffe80aa9ee0a0 master kqueue
349 1 3 0 80 fffffe80ae0ad880 powerd kqueue
182 1 3 1 0 fffffe8046848000 syslogd biowait
1 1 3 1 80 fffffe8045facae0 init wait
0 87 3 1 200 fffffe804600c720 physiod physiod
0 86 3 2 200 fffffe80460d5740 aiodoned aiodoned
0 85 3 3 200 fffffe80460d5b60 ioflush genput
0 84 3 2 200 fffffe804600c300 pgdaemon pgdaemon
0 81 3 2 200 fffffe804600cb40 atapibus0 sccomp
0 78 3 2 200 fffffe8045e87200 usb12 usbevt
0 77 3 2 200 fffffe8045e49a80 usb13 usbevt
0 76 3 2 200 fffffe8045ec8640 usb11 usbevt
0 75 3 2 200 fffffe8045e49660 usb9 tstile
0 74 3 2 200 fffffe8045e87a40 usb6 usbevt
0 73 3 2 200 fffffe8045ec8220 usb10 usbevt
0 72 3 2 200 fffffe8045f9f2e0 usb8 usbevt
0 71 3 3 200 fffffe8045f9f700 usb3 usbevt
0 70 3 1 200 fffffe8045f9fb20 usb2 usbevt
0 69 3 3 200 fffffe80460462c0 usb5 usbevt
0 68 3 1 200 fffffe80460466e0 usb7 usbevt
0 67 3 1 200 fffffe8046046b00 usb0 usbevt
0 66 3 3 200 fffffe8045fac2a0 usb1 usbevt
0 65 3 2 200 fffffe8045fac6c0 usb4 usbevt
0 64 3 2 200 fffffe8045f0e280 unpgc unpgc
0 63 3 2 200 fffffe8045f0e6a0 vmem_rehash vmem_rehash
0 62 3 0 200 fffffe8045f0eac0 coretemp3 coretemp3
0 61 3 1 200 fffffe8045ecc260 coretemp2 coretemp2
0 60 3 2 200 fffffe8045ecc680 coretemp1 coretemp1
0 59 3 3 200 fffffe8045eccaa0 coretemp0 coretemp0
0 58 3 2 200 fffffe8045e49240 aibs0 aibs0
0 49 3 1 200 fffffe8045e661e0 atabus6 atath
0 48 3 0 200 fffffe8045e66600 atabus5 atath
0 47 3 1 200 fffffe8045e66a20 iic0 iicintr
0 46 3 2 200 fffffe8045a231c0 atabus4 atath
0 45 3 3 200 fffffe8045a235e0 atabus3 atath
0 44 3 2 280 fffffe8045a23a00 fw0probe ieee1394
0 43 3 0 200 fffffe80457641a0 atabus2 atath
0 42 3 1 200 fffffe80457645c0 atabus1 atath
0 41 3 3 200 fffffe80457649e0 atabus0 atath
0 40 3 0 200 fffffe80456bf180 usbtask-dr usbtsk
0 39 2 0 200 fffffe80456bf5a0 usbtask-hc
0 38 3 1 200 fffffe80456bf9c0 intelfb intelfb
0 37 3 0 200 fffffe8045030160 i915 i915
0 36 3 3 200 fffffe8045030580 xcall/3 xcall
0 35 1 3 200 fffffe80450309a0 softser/3
0 34 1 3 200 fffffe804502e140 softclk/3
0 33 1 3 200 fffffe804502e560 softbio/3
0 32 1 3 200 fffffe804502e980 softnet/3
0 > 31 7 3 201 fffffe804502d120 idle/3
0 30 3 2 200 fffffe804502d540 xcall/2 xcall
0 29 1 2 200 fffffe804502d960 softser/2
0 28 1 2 200 fffffe804502b100 softclk/2
0 27 1 2 200 fffffe804502b520 softbio/2
0 26 1 2 200 fffffe804502b940 softnet/2
0 > 25 7 2 201 fffffe8044fe30e0 idle/2
0 24 3 1 200 fffffe8044fe3500 xcall/1 xcall
0 23 1 1 200 fffffe8044fe3920 softser/1
0 22 1 1 200 fffffe8044fe10c0 softclk/1
0 21 1 1 200 fffffe8044fe14e0 softbio/1
0 20 1 1 200 fffffe8044fe1900 softnet/1
0 > 19 7 1 201 fffffe80ae71f0a0 idle/1
0 18 3 3 200 fffffe80ae71f4c0 lnxsyswq lnxsyswq
0 17 3 0 200 fffffe80ae71f8e0 cryptoret crypto_w
0 16 3 0 200 fffffe80ae727080 sysmon smtaskq
0 15 3 2 200 fffffe80ae7274a0 pmfsuspend pmfsuspend
0 14 3 1 200 fffffe80ae7278c0 pmfevent pmfevent
0 13 3 0 200 fffffe80aed41060 sopendfree sopendfr
0 12 3 0 200 fffffe80aed41480 nfssilly nfssilly
0 11 3 3 200 fffffe80aed418a0 cachegc cachegc
0 10 3 1 200 fffffe80af34c040 vrele vrele
0 9 3 3 200 fffffe80af34c460 vdrain vdrain
0 8 3 0 200 fffffe80af34c880 modunload mod_unld
0 7 3 0 200 fffffe80af35a020 xcall/0 xcall
0 6 1 0 200 fffffe80af35a440 softser/0
0 5 1 0 200 fffffe80af35a860 softclk/0
0 4 1 0 200 fffffe80af35d000 softbio/0
0 > 3 7 0 200 fffffe80af35d420 softnet/0
0 > 2 7 0 201 fffffe80af35d840 idle/0
0 1 3 3 200 ffffffff812605e0 swapper uvm
db{0}>
--
Andreas Gustafsson, gson@gson.org
From: Nick Hudson <skrll@netbsd.org>
To: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
Andreas Gustafsson <gson@gson.org>
Cc:
Subject: Re: kern/22646 (Panic in ohci_add_done())
Date: Sat, 23 Apr 2016 19:35:41 +0100
On 04/23/16 18:45, Andreas Gustafsson wrote:
> The following reply was made to PR kern/22646; it has been noted by GNATS.
>
> From: Andreas Gustafsson <gson@gson.org>
> To: skrll@NetBSD.org
> Cc: gnats-bugs@NetBSD.org
> Subject: Re: kern/22646 (Panic in ohci_add_done())
> Date: Sat, 23 Apr 2016 20:41:20 +0300
>
> skrll@NetBSD.org wrote:
> > Does this problem still exist in 7.99.28 (nick-nhusb merged)
>
> I repeated the test from the PR using the following configuration:
>
> - Desktop with Asus P5B-V motherboard
> - amd64 GENERIC kernel built from 2016.04.23.13.14.52 sources
> - Userland built from 2016.04.10.18.27.56 sources
> - /dev/usb9 through /dev/usb15 added using MAKEDEV
> - Nikon CoolPix 4500 in PTP mode connected to ALI EHCI/OHCI PCI card
> - gphoto2-2.5.8
> - libgphoto2-2.5.8nb1
> - libusb-0.1.12nb4 + patch to increase number of supported USB busses to 16
> - gphoto2 command: gphoto2 --port usb --debug --capture-image >log 2>&1
>
> It took a few runs of starting gphoto2, seeing it hang,
Can you generate usbhist logs with a OHCI_DEBUG kernel and
sysctl -w hw.ohci.debug=10
You can extract the logs using
vmstat -u usbhist
> and killing it
> after 5-30 seconds by hitting control-C twice, but the end result was
> that the machine locked up, instead of panicing as it had done in
> earlier tests.
>
> Breaking into ddb on the serial console shows:
>
> db{0}> bt
> breakpoint() at netbsd:breakpoint+0x5
> comintr() at netbsd:comintr+0x59a
> Xintr_ioapic_edge9() at netbsd:Xintr_ioapic_edge9+0xea
> --- interrupt ---
> ohci_softintr() at netbsd:ohci_softintr+0x454
> usb_soft_intr() at netbsd:usb_soft_intr+0x1f
> softint_dispatch() at netbsd:softint_dispatch+0xd3
> DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe8044f7eff0
> Xsoftintr() at netbsd:Xsoftintr+0x4f
> --- interrupt ---
> 0:
> db{0}> ps
ps/w output as well please
> PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
> 170 1 3 3 0 fffffe80aa9ee8e0 cron tstile
> 783 1 3 1 0 fffffe80a6503500 cron tstile
> 574 1 3 3 0 fffffe80a6503920 cron wait
> 171 1 3 1 0 fffffe80ab6ec0c0 cron wait
> 686 1 3 1 0 fffffe80ab6ec4e0 gphoto2 usbxfer
bt/a fffffe80ab6ec4e0
> 146 1 3 1 80 fffffe804642d780 sh wait
> 429 1 3 1 80 fffffe8046848420 sh wait
> 104 1 3 1 80 fffffe804642d360 sshd select
> 486 1 3 1 80 fffffe80adb2a860 sshd select
> 636 1 3 1 0 fffffe804642dba0 sh biolock
> 455 1 3 2 80 fffffe80464a3760 getty ttyraw
> 625 1 3 3 80 fffffe80460d5320 getty ttyraw
> 643 1 3 0 80 fffffe80464a3b80 getty ttyraw
> 600 1 3 1 80 fffffe8045e87620 login wait
> 516 1 3 1 0 fffffe80ae0ad460 cron tstile
> 613 1 3 0 80 fffffe80adb2a020 inetd kqueue
> 565 1 3 3 80 fffffe80ae0ad040 qmgr kqueue
> 511 1 3 3 80 fffffe80adb2a440 pickup kqueue
> 571 1 3 3 80 fffffe80aa9ee0a0 master kqueue
> 349 1 3 0 80 fffffe80ae0ad880 powerd kqueue
> 182 1 3 1 0 fffffe8046848000 syslogd biowait
> 1 1 3 1 80 fffffe8045facae0 init wait
> 0 87 3 1 200 fffffe804600c720 physiod physiod
> 0 86 3 2 200 fffffe80460d5740 aiodoned aiodoned
> 0 85 3 3 200 fffffe80460d5b60 ioflush genput
> 0 84 3 2 200 fffffe804600c300 pgdaemon pgdaemon
> 0 81 3 2 200 fffffe804600cb40 atapibus0 sccomp
> 0 78 3 2 200 fffffe8045e87200 usb12 usbevt
> 0 77 3 2 200 fffffe8045e49a80 usb13 usbevt
> 0 76 3 2 200 fffffe8045ec8640 usb11 usbevt
> 0 75 3 2 200 fffffe8045e49660 usb9 tstile
bt/a fffffe8045e49660
(in fact backtrace from all the "tstile"d lwps is probably best)
Thanks,
Nick
From: Andreas Gustafsson <gson@gson.org>
To: Nick Hudson <skrll@netbsd.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/22646 (Panic in ohci_add_done())
Date: Sun, 24 Apr 2016 12:42:20 +0300
Nick Hudson wrote:
> Can you generate usbhist logs with a OHCI_DEBUG kernel and
>
> sysctl -w hw.ohci.debug=10
>
> You can extract the logs using
>
> vmstat -u usbhist
For clarity, I'm providing the information you requested in the order
it was produced rather than the order it was requested, so the usbhist
data will be at the end. First, a backtrace from the hung OCHI_DEBUG
kernel:
db{0}> bt
breakpoint() at netbsd:breakpoint+0x5
comintr() at netbsd:comintr+0x59a
Xintr_ioapic_edge9() at netbsd:Xintr_ioapic_edge9+0xea
--- interrupt ---
bus_space_read_4() at netbsd:bus_space_read_4+0xa
bintime() at netbsd:bintime+0x11
microtime() at netbsd:microtime+0x15
ohci_softintr() at netbsd:ohci_softintr+0x7a8
usb_soft_intr() at netbsd:usb_soft_intr+0x1f
softint_dispatch() at netbsd:softint_dispatch+0xd3
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe8044f4eff0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
> ps/w output as well please
db{0}> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
275 1 3 2 1000000 fffffe8046788420 gphoto2 usbxfer
74 1 3 2 80 fffffe80465ed360 sh wait
41 1 3 2 80 fffffe80465edba0 sh wait
612 1 3 3 80 fffffe8046363760 getty ttyraw
622 1 3 0 80 fffffe8046162320 getty ttyraw
573 1 3 1 80 fffffe8046363b80 getty ttyraw
623 1 3 2 80 fffffe8045e57200 login wait
601 1 3 1 80 fffffe80ab2864c0 cron nanoslp
582 1 3 1 80 fffffe80ab485900 inetd kqueue
561 1 3 0 80 fffffe80ad69b060 qmgr kqueue
566 1 3 2 80 fffffe80ad69b480 pickup kqueue
575 1 3 2 80 fffffe80ab2860a0 master kqueue
357 1 3 0 80 fffffe80ada2d040 sshd select
343 1 3 1 80 fffffe80ada2d880 powerd kqueue
180 1 2 2 0 fffffe8046788000 syslogd
1 1 3 2 80 fffffe8045f3aae0 init wait
0 87 3 2 200 fffffe8045fdc720 physiod physiod
0 86 3 3 200 fffffe8046162740 aiodoned aiodoned
0 85 3 3 200 fffffe8046162b60 ioflush syncer
0 84 3 3 200 fffffe8045fdc300 pgdaemon pgdaemon
0 81 3 3 200 fffffe8045fdcb40 atapibus0 sccomp
0 78 3 3 200 fffffe8045e19a80 usb12 usbevt
0 77 3 3 200 fffffe8045e98a60 usb13 usbevt
0 76 3 3 200 fffffe8045e98220 usb11 usbevt
0 75 3 3 200 fffffe8045e57620 usb8 usbevt
0 74 3 3 200 fffffe8045e57a40 usb10 usbevt
0 73 3 3 200 fffffe8045e98640 usb9 tstile
0 72 3 3 200 fffffe8045edb2e0 usb4 usbevt
0 71 3 1 200 fffffe8045edb700 usb5 usbevt
0 70 3 1 200 fffffe8045edbb20 usb6 usbevt
0 69 3 1 200 fffffe8045f0d2c0 usb7 usbevt
0 68 3 0 200 fffffe8045f0d6e0 usb3 usbevt
0 67 3 1 200 fffffe8045f0db00 usb1 usbevt
0 66 3 0 200 fffffe8045f3a2a0 usb0 usbevt
0 65 3 0 200 fffffe8045f3a6c0 usb2 usbevt
0 64 3 3 200 fffffe8045ede280 unpgc unpgc
0 63 3 3 200 fffffe8045ede6a0 vmem_rehash vmem_rehash
0 62 3 3 200 fffffe8045edeac0 coretemp3 coretemp3
0 61 3 2 200 fffffe8045e9c260 coretemp2 coretemp2
0 60 3 0 200 fffffe8045e9c680 coretemp1 coretemp1
0 59 3 1 200 fffffe8045e9caa0 coretemp0 coretemp0
0 58 3 2 200 fffffe8045e19240 aibs0 aibs0
0 49 3 2 200 fffffe8045e361e0 atabus6 atath
0 48 3 1 200 fffffe8045e36600 atabus5 atath
0 47 3 0 200 fffffe8045e36a20 iic0 iicintr
0 46 3 3 200 fffffe80459f31c0 atabus4 atath
0 45 3 0 200 fffffe80459f35e0 atabus3 atath
0 44 3 1 280 fffffe80459f3a00 fw0probe ieee1394
0 43 3 1 200 fffffe80457341a0 atabus2 atath
0 42 3 0 200 fffffe80457345c0 atabus1 atath
0 41 3 2 200 fffffe80457349e0 atabus0 atath
0 40 3 0 200 fffffe804564f180 usbtask-dr usbtsk
0 39 2 0 200 fffffe804564f5a0 usbtask-hc
0 38 3 0 200 fffffe804564f9c0 intelfb intelfb
0 37 3 0 200 fffffe8045040160 i915 i915
0 36 3 3 200 fffffe8045040580 xcall/3 xcall
0 35 1 3 200 fffffe80450409a0 softser/3
0 34 1 3 200 fffffe8044fbe140 softclk/3
0 33 1 3 200 fffffe8044fbe560 softbio/3
0 32 1 3 200 fffffe8044fbe980 softnet/3
0 > 31 7 3 201 fffffe8044ffd120 idle/3
0 30 3 2 200 fffffe8044ffd540 xcall/2 xcall
0 29 1 2 200 fffffe8044ffd960 softser/2
0 28 1 2 200 fffffe8044ffb100 softclk/2
0 27 1 2 200 fffffe8044ffb520 softbio/2
0 26 1 2 200 fffffe8044ffb940 softnet/2
0 > 25 7 2 201 fffffe8044fb30e0 idle/2
0 24 3 1 200 fffffe8044fb3500 xcall/1 xcall
0 23 1 1 200 fffffe8044fb3920 softser/1
0 22 1 1 200 fffffe8044fb10c0 softclk/1
0 21 1 1 200 fffffe8044fb14e0 softbio/1
0 20 1 1 200 fffffe8044fb1900 softnet/1
0 > 19 7 1 201 fffffe80ae63c0a0 idle/1
0 18 3 0 200 fffffe80ae63c4c0 lnxsyswq lnxsyswq
0 17 3 0 200 fffffe80ae63c8e0 cryptoret crypto_w
0 16 3 0 200 fffffe80ae639080 sysmon smtaskq
0 15 3 0 200 fffffe80ae6394a0 pmfsuspend pmfsuspend
0 14 3 0 200 fffffe80ae6398c0 pmfevent pmfevent
0 13 3 0 200 fffffe80aec53060 sopendfree sopendfr
0 12 3 0 200 fffffe80aec53480 nfssilly nfssilly
0 11 3 0 200 fffffe80aec538a0 cachegc cachegc
0 10 3 0 200 fffffe80af260040 vrele vrele
0 9 3 0 200 fffffe80af260460 vdrain vdrain
0 8 3 0 200 fffffe80af260880 modunload mod_unld
0 7 3 0 200 fffffe80af265020 xcall/0 xcall
0 6 1 0 200 fffffe80af265440 softser/0
0 5 1 0 200 fffffe80af265860 softclk/0
0 4 1 0 200 fffffe80af272000 softbio/0
0 > 3 7 0 200 fffffe80af272420 softnet/0
0 > 2 7 0 201 fffffe80af272840 idle/0
0 1 3 2 200 ffffffff812779e0 swapper uvm
db{0}> ps/w
PID LID COMMAND EMUL PRI WAIT-MSG WAIT-CHANNEL
275 1 gphoto2 netbsd 43 usbxfer fffffe8045f8b720
74 1 sh netbsd 43 wait fffffe804662b800
41 1 sh netbsd 43 wait fffffe804662bd30
612 1 getty netbsd 43 ttyraw fffffe80ae406950
622 1 getty netbsd 43 ttyraw fffffe804658ca50
573 1 getty netbsd 43 ttyraw fffffe80ae413a10
623 1 login netbsd 43 wait fffffe8045eb2a90
601 1 cron netbsd 43 nanoslp fffffe80ab2864c0
582 1 inetd netbsd 42 kqueue fffffe80ade71770
561 1 qmgr netbsd 43 kqueue fffffe80aba07280
566 1 pickup netbsd 43 kqueue fffffe80ade714f0
575 1 master netbsd 43 kqueue fffffe80ade71450
357 1 sshd netbsd 43 select fffffe80af4d6b80
343 1 powerd netbsd 43 kqueue fffffe80ae3996c8
180 1 syslogd netbsd 43 0
1 1 init netbsd 43 wait fffffe8045eb2d28
0 87 system netbsd 123 physiod fffffe8045ff6c08
0 86 system netbsd 125 aiodoned fffffe8045ff6d08
0 85 system netbsd 124 syncer fffffe8046162b60
0 84 system netbsd 126 pgdaemon ffffffff817cacb4
0 81 system netbsd 96 sccomp fffffe80457265b8
0 78 system netbsd 96 usbevt ffff8000051c9450
0 77 system netbsd 96 usbevt fffffe804580e490
0 76 system netbsd 96 usbevt ffff8000051c7450
0 75 system netbsd 96 usbevt ffff8000051c1450
0 74 system netbsd 96 usbevt fffffe80457c8490
0 73 system netbsd 96 tstile ffff8000051c34b8
0 72 system netbsd 96 usbevt ffff8000051b9450
0 71 system netbsd 96 usbevt ffff8000051bb450
0 70 system netbsd 96 usbevt fffffe80457c2490
0 69 system netbsd 96 usbevt ffff8000051bf450
0 68 system netbsd 96 usbevt ffff8000051b7450
0 67 system netbsd 96 usbevt ffff8000051b1450
0 66 system netbsd 96 usbevt ffff8000051af450
0 65 system netbsd 96 usbevt fffffe8045675490
0 64 system netbsd 96 unpgc ffffffff8183e520
0 63 system netbsd 125 vmem_rehash fffffe8045e21948
0 62 system netbsd 43 coretemp3 fffffe80459d2188
0 61 system netbsd 43 coretemp2 fffffe80459d2248
0 60 system netbsd 43 coretemp1 fffffe80459d2348
0 59 system netbsd 43 coretemp0 fffffe80459d2408
0 58 system netbsd 43 aibs0 fffffe80459d2508
0 49 system netbsd 96 atath fffffe8045df75f0
0 48 system netbsd 96 atath fffffe8045df7368
0 47 system netbsd 96 iicintr fffffe80459d25a0
0 46 system netbsd 96 atath fffffe8045a3c5f0
0 45 system netbsd 96 atath fffffe8045a3c368
0 44 system netbsd 96 ieee1394 ffff8000061fc870
0 43 system netbsd 96 atath fffffe80457265f0
0 42 system netbsd 96 atath ffff8000061a68a0
0 41 system netbsd 96 atath ffff8000061a6398
0 40 system netbsd 96 usbtsk ffffffff81304ed8
0 39 system netbsd 96 0
0 38 system netbsd 43 intelfb fffffe8045332cc8
0 37 system netbsd 43 i915 fffffe80af26a248
0 36 system netbsd 127 xcall fffffe8044fda0d8
0 35 system netbsd 223 0
0 34 system netbsd 220 0
0 33 system netbsd 221 0
0 32 system netbsd 222 0
0 > 31 system netbsd 0 0
0 30 system netbsd 127 xcall fffffe8044fb80d8
0 29 system netbsd 223 0
0 28 system netbsd 220 0
0 27 system netbsd 221 0
0 26 system netbsd 222 0
0 > 25 system netbsd 0 0
0 24 system netbsd 127 xcall fffffe80ae62b0d8
0 23 system netbsd 223 0
0 22 system netbsd 220 0
0 21 system netbsd 221 0
0 20 system netbsd 222 0
0 > 19 system netbsd 0 0
0 18 system netbsd 43 lnxsyswq fffffe80af26abc8
0 17 system netbsd 96 crypto_w ffffffff817acfe0
0 16 system netbsd 96 smtaskq ffffffff817ad880
0 15 system netbsd 43 pmfsuspend fffffe80af26acc8
0 14 system netbsd 43 pmfevent fffffe80af26ad88
0 13 system netbsd 96 sopendfr ffffffff8183e490
0 12 system netbsd 43 nfssilly fffffe80af26ae88
0 11 system netbsd 125 cachegc fffffe80aec538a0
0 10 system netbsd 125 vrele ffffffff812bb700
0 9 system netbsd 125 vdrain ffffffff812bb7c0
0 8 system netbsd 125 mod_unld ffffffff818301b0
0 7 system netbsd 127 xcall ffffffff811f65d8
0 6 system netbsd 223 0
0 5 system netbsd 220 0
0 4 system netbsd 221 0
0 > 3 system netbsd 222 0
0 > 2 system netbsd 0 0
0 1 system netbsd 125 uvm ffffffff812779e0
db{0}>
> bt/a fffffe80ab6ec4e0
In this run, gphoto2 is lwp fffffe8046788420:
db{0}> bt/a fffffe8046788420
trace: pid 275 lid 1 at 0xfffffe80463bc7d0
sleepq_block() at netbsd:sleepq_block+0x97
cv_wait() at netbsd:cv_wait+0xfb
usbd_transfer() at netbsd:usbd_transfer+0x795
usbd_do_request_flags() at netbsd:usbd_do_request_flags+0xd9
usbd_get_string_desc() at netbsd:usbd_get_string_desc+0x83
usbd_get_string0() at netbsd:usbd_get_string0+0x175
usbd_devinfo_vp() at netbsd:usbd_devinfo_vp+0x12e
usbd_fill_deviceinfo() at netbsd:usbd_fill_deviceinfo+0x64
usbioctl() at netbsd:usbioctl+0x479
cdev_ioctl() at netbsd:cdev_ioctl+0x88
VOP_IOCTL() at netbsd:VOP_IOCTL+0x3b
vn_ioctl() at netbsd:vn_ioctl+0xa6
sys_ioctl() at netbsd:sys_ioctl+0x101
syscall() at netbsd:syscall+0x15b
--- syscall (number 54) ---
7a96850e88fa:
db{0}>
> > bt/a fffffe8045e49660
In this run, usb9 is lwp fffffe8045e98640:
db{0}> bt/a fffffe8045e98640
trace: pid 0 lid 73 at 0xfffffe8045facda0
sleepq_block() at netbsd:sleepq_block+0x97
turnstile_block() at netbsd:turnstile_block+0x35c
mutex_vector_enter() at netbsd:mutex_vector_enter+0x29c
cv_timedwait() at netbsd:cv_timedwait+0x10e
usb_event_thread() at netbsd:usb_event_thread+0x9d
> (in fact backtrace from all the "tstile"d lwps is probably best)
That was the only lwp in tstile in this run.
To get the usbhist data, I did:
db{0}> reboot 0x100
[...reboot and savecore...]
# gunzip /var/crash/netbsd.3.*
# vmstat -M /var/crash/netbsd.3.core -u usbhist >usbhist.txt
The file usbhist.txt is too large to include here, but you can
download it from
http://www.gson.org/netbsd/bugs/22646/usbhist.txt
--
Andreas Gustafsson, gson@gson.org
From: Nick Hudson <skrll@netbsd.org>
To: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
Andreas Gustafsson <gson@gson.org>
Cc:
Subject: Re: kern/22646 (Panic in ohci_add_done())
Date: Sun, 24 Apr 2016 13:05:28 +0100
This is a multi-part message in MIME format.
--------------070409060704070203000102
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
On 04/24/16 10:45, Andreas Gustafsson wrote:
> The file usbhist.txt is too large to include here, but you can
> download it from
>
> http://www.gson.org/netbsd/bugs/22646/usbhist.txt
it's looping around a weird done list.
Can you increase USBHIST_SIZE and set usbdebug to 5 with the attached
patch applied?
Full dmesg showing the bus with the camera (and other devices) as well
please.
Thanks,
NIck
--------------070409060704070203000102
Content-Type: text/x-patch;
name="ohci.c.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
filename="ohci.c.diff"
Index: sys/dev/usb/ohci.c
===================================================================
RCS file: /cvsroot/src/sys/dev/usb/ohci.c,v
retrieving revision 1.254.2.72
diff -u -p -r1.254.2.72 ohci.c
--- sys/dev/usb/ohci.c 16 Apr 2016 16:02:42 -0000 1.254.2.72
+++ sys/dev/usb/ohci.c 24 Apr 2016 12:05:06 -0000
@@ -1381,6 +1381,11 @@ ohci_softintr(void *v)
int len, cc;
int i, j, actlen, iframes, uedir;
ohci_physaddr_t done;
+#ifdef OHCI_DEBUG
+ extern int usbdebug;
+ ohci_physaddr_t seen[2] = {0, 0};
+ int s = 0;
+#endif
KASSERT(sc->sc_bus.ub_usepolling || mutex_owned(&sc->sc_lock));
@@ -1402,22 +1407,38 @@ ohci_softintr(void *v)
for (sdone = NULL, sidone = NULL; done != 0; ) {
std = ohci_hash_find_td(sc, done);
if (std != NULL) {
+ seen[(s++) & 1] = done;
usb_syncmem(&std->dma, std->offs, sizeof(std->td),
BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD);
std->dnext = sdone;
done = O32TOH(std->td.td_nexttd);
sdone = std;
DPRINTFN(10, "add TD %p", std, 0, 0, 0);
+#ifdef OHCI_DEBUG
+ if (done == seen[0] || done == seen[1]) {
+ usbdebug = 0;
+ ohcidebug = 0;
+ break;
+ }
+#endif
continue;
}
sitd = ohci_hash_find_itd(sc, done);
if (sitd != NULL) {
+ seen[(s++) & 1] = done;
usb_syncmem(&sitd->dma, sitd->offs, sizeof(sitd->itd),
BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD);
sitd->dnext = sidone;
done = O32TOH(sitd->itd.itd_nextitd);
sidone = sitd;
DPRINTFN(5, "add ITD %p", sitd, 0, 0, 0);
+#ifdef OHCI_DEBUG
+ if (done == seen[0] || done == seen[1]) {
+ usbdebug = 0;
+ ohcidebug = 0;
+ break;
+ }
+#endif
continue;
}
DPRINTFN(10, "addr %p not found", done, 0, 0, 0);
--------------070409060704070203000102--
From: Andreas Gustafsson <gson@gson.org>
To: Nick Hudson <skrll@netbsd.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/22646 (Panic in ohci_add_done())
Date: Sun, 24 Apr 2016 15:29:08 +0300
Nick Hudson wrote:
> Can you increase USBHIST_SIZE
To how much? 500000?
> and set usbdebug to 5
Like this?
sysctl -w hw.ohci.debug=10
sysctl -w hw.usb.debug=5
Also, if you would like me to run some gdb commands against the kernel
core file (have symbols), or make the core and kernel available for
download, just let me know.
--
Andreas Gustafsson, gson@gson.org
From: Nick Hudson <nick.hudson@gmx.co.uk>
To: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
Andreas Gustafsson <gson@gson.org>
Cc:
Subject: Re: kern/22646 (Panic in ohci_add_done())
Date: Sun, 24 Apr 2016 12:42:01 +0100
On 24-Apr-16 1:30 PM, Andreas Gustafsson wrote:
> The following reply was made to PR kern/22646; it has been noted by GNATS.
>
> From: Andreas Gustafsson <gson@gson.org>
> To: Nick Hudson <skrll@netbsd.org>
> Cc: gnats-bugs@NetBSD.org
> Subject: Re: kern/22646 (Panic in ohci_add_done())
> Date: Sun, 24 Apr 2016 15:29:08 +0300
>
> Nick Hudson wrote:
> > Can you increase USBHIST_SIZE
>
> To how much? 500000?
That should do it. :)
>
> > and set usbdebug to 5
>
> Like this?
>
> sysctl -w hw.ohci.debug=10
> sysctl -w hw.usb.debug=5
yeah.
>
> Also, if you would like me to run some gdb commands against the kernel
> core file (have symbols), or make the core and kernel available for
> download, just let me know.
The bus lock is being held forever as ohci_softintr gets stuck in an
infinite loop.
Nick
From: Andreas Gustafsson <gson@gson.org>
To: Nick Hudson <nick.hudson@gmx.co.uk>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/22646 (Panic in ohci_add_done())
Date: Sun, 24 Apr 2016 18:54:06 +0300
Nick Hudson wrote:
> > > Can you increase USBHIST_SIZE
> > To how much? 500000?
> That should do it. :)
Actually, it didn't: with USBHIST_SIZE=500000, the kernel failed to
boot. I reduced it to 100000, and it booted again.
With your patch, the system no longer locks up, presumably due to the
added "break" statements. Instead, after a few runs of gphoto2 that
hang and need to be manually interrupted by a pair of control C:s as
before, all subsequent runs of gphoto2 exit by themselves in less than
a second. In the above usbhist2 output, each gphoto2 run hung and was
interrupted by C-c C-c, except for the last one:
# sh cam.sh
^C^C
# sh cam.sh
^C^C
# sh cam.sh
# vmstat -u >usbhist2.txt
The output from "vmstat -u usbhist" is at
http://www.gson.org/netbsd/bugs/22646/usbhist2.txt
> Full dmesg showing the bus with the camera (and other devices) as well
> please.
http://www.gson.org/netbsd/bugs/22646/dmesg.2016.txt
--
Andreas Gustafsson, gson@gson.org
From: Nick Hudson <nick.hudson@gmx.co.uk>
To: Andreas Gustafsson <gson@gson.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/22646 (Panic in ohci_add_done())
Date: Mon, 25 Apr 2016 08:30:46 +0100
This is a multi-part message in MIME format.
--------------020300030403000900070201
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
On 04/24/16 16:54, Andreas Gustafsson wrote:
> Nick Hudson wrote:
>>>> Can you increase USBHIST_SIZE
>>> To how much? 500000?
>> That should do it. :)
> Actually, it didn't: with USBHIST_SIZE=500000, the kernel failed to
> boot. I reduced it to 100000, and it booted again.
>
> With your patch, the system no longer locks up, presumably due to the
> added "break" statements.
Right.
Can you do the test again with the modified patch attached and also
provide UGEN_DEBUG/ugendebug=10 output?
vmstat -u usbhist as well
Thanks,
Nick
--------------020300030403000900070201
Content-Type: text/x-patch;
name="ohci.c.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
filename="ohci.c.diff"
Index: sys/dev/usb/ohci.c
===================================================================
RCS file: /cvsroot/src/sys/dev/usb/ohci.c,v
retrieving revision 1.254.2.72
diff -u -p -r1.254.2.72 ohci.c
--- sys/dev/usb/ohci.c 16 Apr 2016 16:02:42 -0000 1.254.2.72
+++ sys/dev/usb/ohci.c 25 Apr 2016 07:30:28 -0000
@@ -1381,6 +1381,12 @@ ohci_softintr(void *v)
int len, cc;
int i, j, actlen, iframes, uedir;
ohci_physaddr_t done;
+#ifdef OHCI_DEBUG
+ extern int usbdebug;
+ ohci_physaddr_t seen[2] = {0, 0};
+ bool debugdone = false;
+ int s = 0;
+#endif
KASSERT(sc->sc_bus.ub_usepolling || mutex_owned(&sc->sc_lock));
@@ -1402,22 +1408,40 @@ ohci_softintr(void *v)
for (sdone = NULL, sidone = NULL; done != 0; ) {
std = ohci_hash_find_td(sc, done);
if (std != NULL) {
+#ifdef OHCI_DEBUG
+ seen[(s++) & 1] = done;
+#endif
usb_syncmem(&std->dma, std->offs, sizeof(std->td),
BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD);
std->dnext = sdone;
done = O32TOH(std->td.td_nexttd);
sdone = std;
DPRINTFN(10, "add TD %p", std, 0, 0, 0);
+#ifdef OHCI_DEBUG
+ if (done != 0 && (done == seen[0] || done == seen[1])) {
+ debugdone = true;
+ break;
+ }
+#endif
continue;
}
sitd = ohci_hash_find_itd(sc, done);
if (sitd != NULL) {
+#ifdef OHCI_DEBUG
+ seen[(s++) & 1] = done;
+#endif
usb_syncmem(&sitd->dma, sitd->offs, sizeof(sitd->itd),
BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD);
sitd->dnext = sidone;
done = O32TOH(sitd->itd.itd_nextitd);
sidone = sitd;
DPRINTFN(5, "add ITD %p", sitd, 0, 0, 0);
+#ifdef OHCI_DEBUG
+ if (done != 0 && (done == seen[0] || done == seen[1])) {
+ debugdone = true;
+ break;
+ }
+#endif
continue;
}
DPRINTFN(10, "addr %p not found", done, 0, 0, 0);
@@ -1584,6 +1608,12 @@ ohci_softintr(void *v)
usb_transfer_complete(xfer);
}
}
+#ifdef OHCI_DEBUG
+ if (debugdone) {
+ usbdebug = 0;
+ ohcidebug = 0;
+ }
+#endif
if (sc->sc_softwake) {
sc->sc_softwake = 0;
--------------020300030403000900070201--
From: Andreas Gustafsson <gson@gson.org>
To: Nick Hudson <nick.hudson@gmx.co.uk>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/22646 (Panic in ohci_add_done())
Date: Mon, 25 Apr 2016 12:07:49 +0300
Nick Hudson wrote:
> Can you do the test again with the modified patch attached and also
> provide UGEN_DEBUG/ugendebug=10 output?
Doing just that, I was unable to reproduce the previous behavior;
gphoto2 just kept hanging and never got into the non-hanging state.
I figured this was because of all those ugen debug printfs going to the
9600 bps serial console and slowing things down, so I redirected the
ugen debug logs to the message buffer only by changing the lines
#define DPRINTF(x) if (ugendebug) printf x
#define DPRINTFN(n,x) if (ugendebug>(n)) printf x
to
#define DPRINTF(x) if (ugendebug) printf_tolog x
#define DPRINTFN(n,x) if (ugendebug>(n)) printf_tolog x
With that change, I was again able to get the behavior where gphoto2
stops hanging after a few runs. The following logs are again from a
number of hanging gphoto2 runs followed by a single non-hanging one.
http://www.gson.org/netbsd/bugs/22646/dmesg3.txt
> vmstat -u usbhist as well
http://www.gson.org/netbsd/bugs/22646/usbhist3.txt
--
Andreas Gustafsson, gson@gson.org
From: Nick Hudson <skrll@netbsd.org>
To: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
Andreas Gustafsson <gson@gson.org>
Cc:
Subject: Re: kern/22646 (Panic in ohci_add_done())
Date: Tue, 26 Apr 2016 14:04:57 +0100
This is a multi-part message in MIME format.
--------------090601000904070803000502
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Does this patch (without the ohci.c patch) help?
Not sure why your camera doesn't like begin asked for strings at the
moment and ohci should do a better good of not getting confused.
Nick
--------------090601000904070803000502
Content-Type: text/x-patch;
name="usb.c.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
filename="usb.c.diff"
Index: sys/dev/usb/usb.c
===================================================================
RCS file: /cvsroot/src/sys/dev/usb/usb.c,v
retrieving revision 1.163
diff -u -p -r1.163 usb.c
--- sys/dev/usb/usb.c 26 Apr 2016 12:58:48 -0000 1.163
+++ sys/dev/usb/usb.c 26 Apr 2016 13:01:18 -0000
@@ -787,7 +787,7 @@ usbioctl(dev_t devt, u_long cmd, void *d
error = ENXIO;
goto fail;
}
- usbd_fill_deviceinfo(dev, di, 1);
+ usbd_fill_deviceinfo(dev, di, false);
break;
}
--------------090601000904070803000502--
From: Andreas Gustafsson <gson@gson.org>
To: skrll@NetBSD.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/22646 (Panic in ohci_add_done())
Date: Tue, 26 Apr 2016 17:16:50 +0300
Nick Hudson wrote:
> Does this patch (without the ohci.c patch) help?
No, it does not. It's back to locking up the system, with this
backtrace:
db{0}> bt
breakpoint() at netbsd:breakpoint+0x5
comintr() at netbsd:comintr+0x59a
Xintr_ioapic_edge9() at netbsd:Xintr_ioapic_edge9+0xea
--- interrupt ---
bus_space_read_4() at netbsd:bus_space_read_4+0xa
bintime() at netbsd:bintime+0x11
microtime() at netbsd:microtime+0x15
ohci_softintr() at netbsd:ohci_softintr+0x7a8
usb_soft_intr() at netbsd:usb_soft_intr+0x1f
softint_dispatch() at netbsd:softint_dispatch+0xd3
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe8044f2dff0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
I have a saved kernel core dump, but its usbhist is just lots of lines
like these:
1461679403.982811 ohci_softintr#67@0: add TD 0xffff8000334ffe20
1461679403.982811 ohci_softintr#67@0: add TD 0xffff8000334ffd60
1461679403.982812 ohci_softintr#67@0: add TD 0xffff8000334ffe20
1461679403.928422 ohci_softintr#67@0: add TD 0xffff8000334ffd60
> Not sure why your camera doesn't like begin asked for strings at the
> moment and ohci should do a better good of not getting confused.
Just to be clear, this PR is all about ohci getting confused. My goal
is not to get the camera to work, but to get NetBSD to not crash. The
camera just happens to be the only way I know of triggering those
crashes.
--
Andreas Gustafsson, gson@gson.org
State-Changed-From-To: feedback->open
State-Changed-By: gson@NetBSD.org
State-Changed-When: Mon, 09 May 2016 16:20:08 +0000
State-Changed-Why:
Feedback provided.
State-Changed-From-To: open->analyzed
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Mon, 09 May 2016 16:34:05 +0000
State-Changed-Why:
When we abort a transfer due, we're not proactively cleaning out the done
queue. Then we're reusing the same TDs. The chip can create a loop in the
done queue which means we never exit the softint and the bus lock is never
released, etc.
That is, "What Charles said"
From: "Nick Hudson" <skrll@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/22646 CVS commit: [nick-nhusb] src/sys/dev/usb
Date: Mon, 30 May 2016 06:46:50 +0000
Module Name: src
Committed By: skrll
Date: Mon May 30 06:46:50 UTC 2016
Modified Files:
src/sys/dev/usb [nick-nhusb]: ohci.c ohcivar.h
Log Message:
Restructure the abort code for TD based transfers (ctrl, bulk, intr).
In PR/22646 some TDs can be on the done queue when the abort start and,
if this is the case, they need to processed after the WDH interrupt.
Instead of waiting for WDH we release TDs that have been touched by the
HC and replace them with new ones. Once WDH happens the floating TDs
will be returned to the free list.
To generate a diff of this commit:
cvs rdiff -u -r1.254.2.75 -r1.254.2.76 src/sys/dev/usb/ohci.c
cvs rdiff -u -r1.55.6.14 -r1.55.6.15 src/sys/dev/usb/ohcivar.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Nick Hudson" <skrll@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/22646 CVS commit: src/sys/dev/usb
Date: Wed, 9 Dec 2020 07:10:01 +0000
Module Name: src
Committed By: skrll
Date: Wed Dec 9 07:10:01 UTC 2020
Modified Files:
src/sys/dev/usb: ohci.c ohcivar.h
Log Message:
Restructure the abort code for TD based transfers (ctrl, bulk, intr).
In PR kern/22646 some TDs can be on the done queue when the abort start
and, if this is the case, they need to processed after the WDH interrupt.
Instead of waiting for WDH we release TDs that have been touched by the
HC and replace them with new ones. Once WDH happens the floating TDs
will be returned to the free list.
Also addresses the issue seen in PR kern/55835
Thanks to both Andreas Gustafsson and Edgar Fuß for testing. Apologies to
Andreas Gustafsson for not committing this to HEAD for 4y6m.w
To generate a diff of this commit:
cvs rdiff -u -r1.310 -r1.311 src/sys/dev/usb/ohci.c
cvs rdiff -u -r1.61 -r1.62 src/sys/dev/usb/ohcivar.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: analyzed->pending-pullups
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Sat, 12 Dec 2020 16:59:50 +0000
State-Changed-Why:
[pullup-8 #1637]
[pullup-9 #1150]
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/22646 CVS commit: [netbsd-9] src/sys/dev/usb
Date: Sat, 12 Dec 2020 20:29:11 +0000
Module Name: src
Committed By: martin
Date: Sat Dec 12 20:29:11 UTC 2020
Modified Files:
src/sys/dev/usb [netbsd-9]: ohci.c ohcivar.h
Log Message:
Pull up following revision(s) (requested by skrll in ticket #1150):
sys/dev/usb/ohci.c: revision 1.311
sys/dev/usb/ohci.c: revision 1.312
sys/dev/usb/ohcivar.h: revision 1.62
Restructure the abort code for TD based transfers (ctrl, bulk, intr).
In PR kern/22646 some TDs can be on the done queue when the abort start
and, if this is the case, they need to processed after the WDH interrupt.
Instead of waiting for WDH we release TDs that have been touched by the
HC and replace them with new ones. Once WDH happens the floating TDs
will be returned to the free list.
Also addresses the issue seen in PR kern/55835
Thanks to both Andreas Gustafsson and Edgar Fu=C3=9F for testing. Apologi=
es to
Andreas Gustafsson for not committing this to HEAD for 4y6m.w
Remove leading space that crept in in the last change
To generate a diff of this commit:
cvs rdiff -u -r1.289.4.5 -r1.289.4.6 src/sys/dev/usb/ohci.c
cvs rdiff -u -r1.60 -r1.60.6.1 src/sys/dev/usb/ohcivar.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/22646 CVS commit: [netbsd-8] src/sys/dev/usb
Date: Sat, 12 Dec 2020 20:30:57 +0000
Module Name: src
Committed By: martin
Date: Sat Dec 12 20:30:57 UTC 2020
Modified Files:
src/sys/dev/usb [netbsd-8]: ohci.c ohcivar.h
Log Message:
Pull up following revision(s) (requested by skrll in ticket #1637):
sys/dev/usb/ohci.c: revision 1.311
sys/dev/usb/ohci.c: revision 1.312
sys/dev/usb/ohcivar.h: revision 1.62
Restructure the abort code for TD based transfers (ctrl, bulk, intr).
In PR kern/22646 some TDs can be on the done queue when the abort start
and, if this is the case, they need to processed after the WDH interrupt.
Instead of waiting for WDH we release TDs that have been touched by the
HC and replace them with new ones. Once WDH happens the floating TDs
will be returned to the free list.
Also addresses the issue seen in PR kern/55835
Thanks to both Andreas Gustafsson and Edgar Fu=C3=9F for testing. Apologi=
es to
Andreas Gustafsson for not committing this to HEAD for 4y6m.w
Remove leading space that crept in in the last change
To generate a diff of this commit:
cvs rdiff -u -r1.273.6.6 -r1.273.6.7 src/sys/dev/usb/ohci.c
cvs rdiff -u -r1.58.10.1 -r1.58.10.2 src/sys/dev/usb/ohcivar.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->feedback
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Fri, 18 Dec 2020 14:15:53 +0000
State-Changed-Why:
ok to close?
State-Changed-From-To: feedback->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Tue, 05 Jan 2021 20:01:28 +0000
State-Changed-Why:
Panic no longer happens.
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@netbsd.org
Cc: skrll@netbsd.org
Subject: Re: kern/22646 (Panic in ohci_add_done())
Date: Tue, 5 Jan 2021 21:58:39 +0200
On Dec 18, skrll@NetBSD.org wrote:
> ok to close?
Yes - I am no longer able to make the NetBSD kernel panic or hang
using the combination of gphoto2, a Nikon CoolPix 4500, and OHCI.
It also does not panic or hang with UHCI, EHCI or XHCI.
That doesn't mean gphoto2 actually works - it still doesn't, but
that's a different bug.
--
Andreas Gustafsson, gson@gson.org
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.