NetBSD Problem Report #57270

From bouyer@antioche.eu.org  Wed Mar 15 11:40:00 2023
Return-Path: <bouyer@antioche.eu.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 022781A9239
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 15 Mar 2023 11:40:00 +0000 (UTC)
Message-Id: <20230315113958.C796B2E9170@sandettie.soc.lip6.fr>
Date: Wed, 15 Mar 2023 12:39:58 +0100 (MET)
From: bouyer@antioche.eu.org
Reply-To: bouyer@antioche.eu.org
To: gnats-bugs@NetBSD.org
Subject: panic (KASSERT) in usb
X-Send-Pr-Version: 3.95

>Number:         57270
>Category:       kern
>Synopsis:       panic (KASSERT) in usb
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          analyzed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Mar 15 11:40:00 +0000 2023
>Closed-Date:    
>Last-Modified:  Wed Mar 15 12:12:07 +0000 2023
>Originator:     Manuel Bouyer
>Release:        NetBSD 10.0_BETA
>Organization:
>Environment:
System: NetBSD sandettie.soc.lip6.fr 10.0_BETA NetBSD 10.0_BETA (GENERIC_CAN) #4: Mon Mar 13 15:30:21 MET 2023 bouyer@armandeche.soc.lip6.fr:/local/armandeche1/tmp/build/amd64/obj/local/armandeche2/netbsd-10/src/sys/arch/amd64/compile/GENERIC_CAN amd64
Architecture: x86_64
Machine: amd64
>Description:
	I'm using a USB device to program some microcontroller.
	The device powers the target from USB.
	While it works fine when using my desktop, it fails to program
	the target when using my laptop, most probably because the laptop's USB
	port doesn't provide enough power (using an external PSU for the
	target works around this issue).

	The programmer appears a a ugen device and a userland program
	drives it.
	When the target programming fails, the application driving the
	programmer says "Resetting PICkit device..." (which, I guess, causes
	the device to disconnect from USB) and just after that,
	the kernel panics:
System panicked: kernel diagnostic assertion "pipe->up_abortlwp == NULL" failed: file "/local/armandeche2/netbsd-10/src/sys/dev/usb/usbdi.c", line 1037 pipe->up_abortlwp=0xffff85745ccae640
crash> tr
__kernel_end() at 0
kern_reboot() at sys_reboot
vpanic() at vpanic+0x18d
kern_assert() at __x86_indirect_thunk_rax
usbd_ar_pipe() at usbd_ar_pipe+0x223
usbd_abort_pipe() at usbd_abort_pipe+0x25
ugen_do_close() at ugen_do_close+0x8c
ugenclose() at ugenclose+0x5f
cdev_close() at cdev_close+0x92
spec_close() at spec_close+0x22a
VOP_CLOSE() at VOP_CLOSE+0x44
vn_close() at vn_close+0x35
closef() at closef+0x60
fd_close() at fd_close+0x138
sys_close() at sys_close+0x22
syscall() at syscall+0x196
--- syscall (number 6) ---


>How-To-Repeat:
	run a pickit3 with PK2CMD on a computer with limited USB power.
>Fix:

>Release-Note:

>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: bouyer@antioche.eu.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/57270: panic (KASSERT) in usb
Date: Wed, 15 Mar 2023 12:04:51 +0000

 > Date: Wed, 15 Mar 2023 12:39:58 +0100 (MET)
 > From: bouyer@antioche.eu.org
 >=20
 > System panicked: kernel diagnostic assertion "pipe->up_abortlwp =3D=3D NU=
 LL" failed: file "/local/armandeche2/netbsd-10/src/sys/dev/usb/usbdi.c", li=
 ne 1037 pipe->up_abortlwp=3D0xffff85745ccae640
 > crash> tr
 > __kernel_end() at 0
 > kern_reboot() at sys_reboot
 > vpanic() at vpanic+0x18d
 > kern_assert() at __x86_indirect_thunk_rax
 > usbd_ar_pipe() at usbd_ar_pipe+0x223
 > usbd_abort_pipe() at usbd_abort_pipe+0x25
 > ugen_do_close() at ugen_do_close+0x8c
 > ugenclose() at ugenclose+0x5f
 > cdev_close() at cdev_close+0x92

 The problem is that ugenclose and ugen_detach raced to abort the pipe.
 Aborting one pipe in two threads concurrently is forbidden.

 The correct approach here is:

 1. Split ugenclose into ugencancel and ugenclose.

    =3D> ugencancel will abort the pipes to wake all pending I/O and make
       it fail promptly.

       (ugen_detach also does cv_broadcast to wake all pending I/O, but
       I think the pipe abort can be made to suffice.)

    =3D> ugenclose is guaranteed that all I/O (open, read, write, ioctl)=20
       has completed, so it can safely free the resources.

 2. Remove the abort/drain/refcnt logic in ugen_detach.

    =3D> vdevgone will take care of it by revoking all the open
       instances, which will force ugencancel and ugenclose to complete
       on all of them before returning.

    (All the I/O refcnt logic should be able to be ripped out once ugen
    uses cancel/close.)

 I started drafting a change to do this last year but got sidetracked
 by per-endpoint locking and the ugen/ugenif dual use, so I focussed on
 dealing with ucom and uhid instead, which were a little simpler and
 higher-priority for my needs at the time.  Perhaps your device will
 serve as a helpful test case for this race!

State-Changed-From-To: open->analyzed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 15 Mar 2023 12:12:07 +0000
State-Changed-Why:
problem has been analyzed


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.