NetBSD Problem Report #53019

From paul@whooppee.com  Mon Feb 12 23:44:02 2018
Return-Path: <paul@whooppee.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 6FBC57A1EA
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 12 Feb 2018 23:44:02 +0000 (UTC)
Message-Id: <20180212234359.66D6316E44@speedy.whooppee.com>
Date: Tue, 13 Feb 2018 07:43:59 +0800 (+08)
From: paul@whooppee.com
Reply-To: paul@whooppee.com
To: gnats-bugs@NetBSD.org
Subject: xhci-connected keyboard with LOCKDEBUG kernel causes panic
X-Send-Pr-Version: 3.95

>Number:         53019
>Category:       kern
>Synopsis:       xhci-connected keyboard with LOCKDEBUG kernel causes panic
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Feb 12 23:45:00 +0000 2018
>Closed-Date:    Wed Jun 09 16:10:29 +0000 2021
>Last-Modified:  Wed Jun 09 16:10:29 +0000 2021
>Originator:     Paul Goyette
>Release:        NetBSD 8.99.12
>Organization:
+------------------+--------------------------+----------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+
>Environment:


System: NetBSD speedy.whooppee.com 8.99.12 NetBSD 8.99.12 (SPEEDY 2018-02-12 00:00:12 UTC) #3: Mon Feb 12 06:57:12 UTC 2018 paul@speedy.whooppee.com:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/SPEEDY amd64
Architecture: x86_64
Machine: amd64
>Description:
With a LOCKDEBUG kernel and a USB keyboard attached via an xhci USB-3 port,
typing a character at the DDB(4) prompt causes a kernel panic.

It appears that the xhci_device_intr_start() code is trying to obtain
a spin mutex while another spin mutex is already held (perhaps in the
xhci_poll() routine?).

Here's the console output from the LOCKDEBUG panic - all transcribed by
hand, but hopefully without too many typos!

Mutex error: mutex_vector_enter,523: spin lock held
lock address: 0xffffe410e9d1d9a0   type: spin
initialized:  0xffffffff802bac06
shared holds:                  0   exclusive:                  1
shares wanted:                 0   exclusive:                  0
current CPU:                  11   last held:                 11
curlwp:       0xffffe41fc09ad2c0   last held: 0xffffe41fc09ad2c0
last locked*: 0xffffffff802b81de   unlocked:  0xffffffff80291179
owner field:  0x0000000000010600   wait/spin:                0/1
panic: LOCKDEBUG: Mutex error: mutex_vector_enter,523: spin lock held

And the backtrace is

vpanic+0x140
snprintf
lockdebug_more
mutex_enter+0x69d
xhci_device_intr_start+0x125
usbd_start_next+0x65
xhci_soft_intr+0x49b
xhci_poll+0x37
ukbd_cngetc+0x19
cngetc+0x34
db_readline+0x65
db_read_line+0x15
db_command_loop+0x84
db_trap+0xe3
kbd_trap+0xe2
trap (number 4)

(This is then followed by the original backtrace which caused ddb(4)
to be entered in the first place.)
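
One plausible reading of the lock dump and backtrace above: "current CPU"
and "last held" are both 11, and curlwp matches the last holder, so the
same context that already holds the spin mutex is trying to enter it
again. The sketch below is a user-space toy model of that pattern, not
the NetBSD code. Every name in it (toy_spin_mutex, toy_mutex_enter,
toy_poll, toy_intr_start) is hypothetical, and it only assumes that a
polling routine holds the lock while calling a start routine that takes
the same lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a kernel spin mutex; NOT the NetBSD kmutex_t. */
struct toy_spin_mutex {
	const char *name;
	bool held;
	int owner_cpu;		/* meaningful only while held */
};

/* Number of self-deadlock attempts detected so far. */
static int toy_lock_errors;

/*
 * Acquire the lock for the given CPU.  Returns false if that CPU
 * already holds it.  A real spin mutex would spin forever in that
 * case; a debugging kernel reports the error instead.
 */
static bool
toy_mutex_enter(struct toy_spin_mutex *m, int cpu)
{
	if (m->held && m->owner_cpu == cpu) {
		fprintf(stderr, "toy lock error: %s: spin lock held (cpu %d)\n",
		    m->name, cpu);
		toy_lock_errors++;
		return false;
	}
	m->held = true;
	m->owner_cpu = cpu;
	return true;
}

static void
toy_mutex_exit(struct toy_spin_mutex *m)
{
	assert(m->held);
	m->held = false;
}

/* Hypothetical transfer-start routine that takes the lock itself. */
static bool
toy_intr_start(struct toy_spin_mutex *m, int cpu)
{
	if (!toy_mutex_enter(m, cpu))
		return false;
	/* ... queue the next transfer here ... */
	toy_mutex_exit(m);
	return true;
}

/*
 * Hypothetical polling routine: holds the lock while processing
 * completions, which in turn restarts the transfer.
 */
static bool
toy_poll(struct toy_spin_mutex *m, int cpu)
{
	bool ok;

	if (!toy_mutex_enter(m, cpu))
		return false;
	ok = toy_intr_start(m, cpu);	/* re-enters the lock we hold */
	toy_mutex_exit(m);
	return ok;
}
```

Calling toy_intr_start() on its own succeeds, while calling it through
toy_poll() trips the check. In this model the usual ways out are to have
the start routine skip the lock when the caller already owns it, or to
drop the lock before restarting the transfer. Which of these, if either,
applies to the real driver is not established by this report.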

>How-To-Repeat:
See above.  Boot a LOCKDEBUG kernel, and enter ddb(4) (via some
pre-existing bug - have not tried to enter via cnmagic key-combo).

Type a character and watch it go boom.

>Fix:


>Release-Note:

>Audit-Trail:
From: David Holland <dholland-gnats@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53019: xhci-connected keyboard with LOCKDEBUG kernel causes
 panic
Date: Mon, 12 Mar 2018 00:18:31 +0000

 Not sent to gnats (gnats@ is the administrator address; use gnats-bugs@)

    ------

 From: Paul Goyette <paul@whooppee.com>
 To: gnats@netbsd.org
 Subject: Re: kern/53019
 Date: Tue, 13 Feb 2018 16:05:34 +0800 (+08)


 	#  addr2line -e /netbsd.gdb 0xffffffff802bac06
 	/build/netbsd-local/src_ro/sys/dev/usb/xhci.c:1154
 	#

 Looks like it is in xhci_init() right before setting the erst variable.
 So it is likely sc->sc_intr_lock ...

From: "David H. Gutteridge" <david@gutteridge.ca>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53019
Date: Tue, 18 Sep 2018 22:09:49 -0400

 Hi,

 I've filed what I believe is a related bug as kern/52944. mrg@ has
 made some changes in -current that may fix this and asked for a
 re-test, so I thought I'd mention that here, too.

 Regards,

 Dave


From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53019
Date: Wed, 19 Sep 2018 16:13:21 +0800 (+08)

 I attempted to retest kern/53019 but unfortunately a -current kernel
 does not work on my hardware set-up.  I'm suspecting it is related to
 my video card (GTX 1050-Ti).

State-Changed-From-To: open->closed
State-Changed-By: pgoyette@NetBSD.org
State-Changed-When: Wed, 09 Jun 2021 16:10:29 +0000
State-Changed-Why:
This seems to have been fixed with one or more of the somewhat-recent
changes to xhci code.  At least, I can no longer reproduce.


>Unformatted:
