NetBSD Problem Report #53598
From martin@aprisoft.de Wed Sep 12 11:37:25 2018
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 9665A7A111
for <gnats-bugs@gnats.NetBSD.org>; Wed, 12 Sep 2018 11:37:25 +0000 (UTC)
Message-Id: <20180912113715.03F7F5CC906@emmas.aprisoft.de>
Date: Wed, 12 Sep 2018 13:37:15 +0200 (CEST)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: assertion failure in usb detach
X-Send-Pr-Version: 3.95
>Number: 53598
>Category: kern
>Synopsis: assertion failure in usb detach
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: mrg
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Sep 12 11:40:00 +0000 2018
>Closed-Date: Fri Aug 23 07:37:39 +0000 2019
>Last-Modified: Fri Aug 23 07:37:39 +0000 2019
>Originator: Martin Husemann
>Release: NetBSD 8.99.25
>Organization:
The NetBSD Foundation, Inc.
>Environment:
(not this actual kernel...)
System: NetBSD seven-days-to-the-wolves.aprisoft.de 8.99.25 NetBSD 8.99.25 (GENERIC) #236: Tue Sep 4 08:48:16 CEST 2018 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
Trying to reboot a GENERIC kernel from the 201809111830Z daily build
reproducable crashes for me like this (manual transscript):
panic: kernel diagnostic assertion "cv_is_valid(cv)" failed: file ...kern/kern_condvar.c line 151
vpanic
ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
cv_destroy() at netbsd:cv_destroy+0x55
usb_detach() at netbsd:usb_detach+0xf9
...
>How-To-Repeat:
s/a
>Fix:
n/a
>Release-Note:
>Audit-Trail:
From: Martin Husemann <martin@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/53598: assertion failure in usb detach
Date: Wed, 12 Sep 2018 14:25:42 +0000
Problem is that this (VirtualBox) xhci only partly attached, USB revision
is 0 and we never make it to usb_doattach(), but set sc_dying instead.
When we get to usb_detach, the softint never had been established
nor the cv initialized.
Maybe we should define several values for sc_dying, representing how
far during initalization we got? Something like: sc_dying = 3 if we
did not initialize anything, sc_dying = 2 if softint is valid but
cv is not, sc_dying = 1 if we have been fully functional but are now
winding down.
Martin
From: "matthew green" <mrg@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53598 CVS commit: src/sys/dev
Date: Tue, 18 Sep 2018 05:24:10 +0000
Module Name: src
Committed By: mrg
Date: Tue Sep 18 05:24:10 UTC 2018
Modified Files:
src/sys/dev/pci: xhci_pci.c
src/sys/dev/usb: usb.c
Log Message:
deal with partial attach failures in usb_attach vs usb_detach aka PR 53598.
- make sure xhci's sc->sc_ios is NULL if failure happens.
- rearrange usb_attach() / usb_doattach() to make it simpler to clean up.
- move usb_async_intr softint into usb_once_init(). previously, each USB
controller would start a new one, and leave the old one leaked.
- handle controller interrupts without a bus attached
To generate a diff of this commit:
cvs rdiff -u -r1.13 -r1.14 src/sys/dev/pci/xhci_pci.c
cvs rdiff -u -r1.172 -r1.173 src/sys/dev/usb/usb.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Responsible-Changed-From-To: kern-bug-people->mrg
Responsible-Changed-By: mrg@NetBSD.org
Responsible-Changed-When: Tue, 18 Sep 2018 05:39:11 +0000
Responsible-Changed-Why:
i wrote a patch.
State-Changed-From-To: open->feedback
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Tue, 18 Sep 2018 05:39:11 +0000
State-Changed-Why:
can you test -current? i forced a detach failure at the top of
usb_attach() like you described, and i was able to boot, reboot
and it worked. thanks.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53598 CVS commit: [netbsd-8] src
Date: Thu, 27 Sep 2018 14:52:27 +0000
Module Name: src
Committed By: martin
Date: Thu Sep 27 14:52:27 UTC 2018
Modified Files:
src/share/man/man4 [netbsd-8]: usb.4
src/sys/ddb [netbsd-8]: db_command.c db_output.c
src/sys/dev/pci [netbsd-8]: xhci_pci.c
src/sys/dev/usb [netbsd-8]: ehci.c ohci.c uhci.c uhcivar.h uhub.c usb.c
usb_subr.c usbdi.c xhci.c
src/sys/external/bsd/dwc2 [netbsd-8]: dwc2.c
src/sys/kern [netbsd-8]: subr_userconf.c
Log Message:
Pull up following revision(s) (requested by mrg in ticket #1037):
sys/dev/usb/uhub.c: revision 1.139
sys/external/bsd/dwc2/dwc2.c: revision 1.55
sys/ddb/db_output.c: revision 1.34
sys/ddb/db_command.c: revision 1.160
sys/dev/usb/ehci.c: revision 1.264
sys/dev/usb/xhci.c: revision 1.99
sys/dev/usb/ehci.c: revision 1.265
sys/kern/subr_userconf.c: revision 1.27
sys/dev/usb/ehcivar.h: revision 1.46
sys/dev/usb/ohci.c: revision 1.287
sys/dev/usb/uhci.c: revision 1.284
sys/dev/usb/usbdi.c: revision 1.178
sys/dev/usb/usb.c: revision 1.172
sys/dev/pci/xhci_pci.c: revision 1.14
sys/dev/usb/usb.c: revision 1.173
sys/dev/usb/usb.c: revision 1.174
share/man/man4/usb.4: revision 1.110
sys/ddb/db_command.c: revision 1.159
sys/dev/usb/usb_subr.c: revision 1.227
sys/dev/usb/uhcivar.h: revision 1.56
(all via patch)
consolidate the handling of polling across HC drivers, and generic USB:
- don't take mutexes if polling
- normalise the code across all drivers
- add some not yet code to block discovery to/from polling
- minor CSE
- adjust comment for usbd_set_polling() to reality now i properly
understand what it is used for and why.
this, with a hack to make RB_ASKNAME to wait 5 seconds allows boot -a
work with USB keyboards. there are still multiple issues remaining:
- discovery and polling need to be mutually exclusive
- attachment of ukbd and wskbd is not handled by config_pending, and
the 5 second delay isn't going to always be enough.
call cnpollc(1) and cnpollc(0) around cngetc().
(christos has a good idea to add a function that does all 3,
and we should switch all the callers in this sequence to use
it (and fix the MD ones missing it still). not all can, as
eg, line-grabbing functions can use cngetsn(), which only
calls cnpollc() twice.)
When this file is used when not building the kernel (eg: /usr/sbin/crash)
make cnpollc() go away.
reorder some struct members to remove holes.
add config_pending usage to uhub and general USB device attachment.
- call config_pending_incr() and config_pending_decr() around attaching
devices against "usbdevif" attribute.
uhub:
- convert sc_explorepending and sc_running to bool. add new sc_first_explore.
- call config_pending_incr() at the start of uhub_attach(). dropped in
uhub_explore(), if this is the first explore.
implement a gross hack to fix "boot -a" on systems with usb keyboards on
systems with ehci handover to uhci (and maybe ohci), and fix a similar
problem for "boot -s".
there is effort to ensure that all devices attached via USB are probed
before RB_ASKNAME or RB_SINGLE attempts to ask any questions on the console,
and largely this works, often by chance, today, for USB disks and root.
i've recently pushed this more into uhub and general USB device attachment
as well, and kept a config_pending reference across the first explore of
a bus. these fix many issues with directly attached hubs.
however, on systems where devices connected to ehci ports are handed over
to a companion uhci or ohci port, it may not be the first, or even second,
bus explore that finds the device finally before attachment, and at this
point all config_pending references are dropped.
there is no direct communication between drivers, the potentials are
looked up but their device_t is only used for generic things like the name,
so informing the correct companion to expect a device and deal with the
config_pending references is not possible without some fairly ugly layer
violations or multi-level callbacks (eg, we have "ehci0", and usually an
the relevant companion, eg, "uhci2", but it is the uhub that uhci2 has
attached that will deal with the device attachment.)
with the above fixes to generic USB code, the disown happens during the
first explore. the hack works by, at this point, checking if (a) root
is not mounted, (b) single user or ask name are set, and (c) if the hack
as not been triggered already. if all 3 conditions are true, then a
config_pending_incr() is called and a callback is triggered for (default)
5 seconds to call config_pending_decr(). ehci detach pauses waiting for
this callback if scheduled.
this allows enough time for the uhub and the ukbd/wskbd to attach before
the RK_ASKROOT prompts appear. testing shows it takes between 1.5 and
2 seconds for the keyboard to appear after the disown occurs.
Index: dev/usb/ehcivar.c
- new sc_compcallout, sc_compcallout, sc_complock, and a state for th
handover hack.
Index: dev/usb/ehci.c
ehci_init():
- use aprint_normal_dev() instead of manual device_xname().
- initialise sc_compcallout, sc_compcallout, sc_complock, and sc_comp_state.
ehci_detach():
- if there are companion controllers, tear own the above, including waiting
if there is a callback scheduled.
ehci_disown_callback():
- new callout to call config_pending_decr() in the the future.
schedule this ca
ehci_disown_sched_callback():
- if booting to single user or asking names, call config_pending_incr() and
schedule the callout above, default 5 second delay.
ehci_disown():
- if disowning a port call ehci_disown_sched_callback().
deal with partial attach failures in usb_attach vs usb_detach aka PR 53598.
- make sure xhci's sc->sc_ios is NULL if failure happens.
- rearrange usb_attach() / usb_doattach() to make it simpler to clean up.
- move usb_async_intr softint into usb_once_init(). previously, each USB
controller would start a new one, and leave the old one leaked.
- handle controller interrupts without a bus attached
remove usb(4)'s "flags 1" code. it has been dead for a while,
as it runs during the interrupts part of configuration now,
and all the devices try attach as early as possible, including
any root or boot required disk or keyboard device, which is
what this flag was for.
To generate a diff of this commit:
cvs rdiff -u -r1.105 -r1.105.4.1 src/share/man/man4/usb.4
cvs rdiff -u -r1.148.8.4 -r1.148.8.5 src/sys/ddb/db_command.c
cvs rdiff -u -r1.33 -r1.33.32.1 src/sys/ddb/db_output.c
cvs rdiff -u -r1.8 -r1.8.6.1 src/sys/dev/pci/xhci_pci.c
cvs rdiff -u -r1.254.8.4 -r1.254.8.5 src/sys/dev/usb/ehci.c
cvs rdiff -u -r1.273.6.3 -r1.273.6.4 src/sys/dev/usb/ohci.c
cvs rdiff -u -r1.275.2.4 -r1.275.2.5 src/sys/dev/usb/uhci.c
cvs rdiff -u -r1.53.10.1 -r1.53.10.2 src/sys/dev/usb/uhcivar.h
cvs rdiff -u -r1.136.2.1 -r1.136.2.2 src/sys/dev/usb/uhub.c
cvs rdiff -u -r1.165.6.3 -r1.165.6.4 src/sys/dev/usb/usb.c
cvs rdiff -u -r1.220.2.4 -r1.220.2.5 src/sys/dev/usb/usb_subr.c
cvs rdiff -u -r1.173.2.2 -r1.173.2.3 src/sys/dev/usb/usbdi.c
cvs rdiff -u -r1.72.2.7 -r1.72.2.8 src/sys/dev/usb/xhci.c
cvs rdiff -u -r1.46.2.2 -r1.46.2.3 src/sys/external/bsd/dwc2/dwc2.c
cvs rdiff -u -r1.26 -r1.26.22.1 src/sys/kern/subr_userconf.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: feedback->closed
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Fri, 23 Aug 2019 07:37:39 +0000
State-Changed-Why:
feedback timeout. submitter applied fixes to netbsd-8 branch 11 months ago.. ;)
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.