NetBSD Problem Report #49022

From www@NetBSD.org  Sat Jul 19 18:29:31 2014
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 024E7A6557
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 19 Jul 2014 18:29:31 +0000 (UTC)
Message-Id: <20140719182929.A4E25A6558@mollari.NetBSD.org>
Date: Sat, 19 Jul 2014 18:29:29 +0000 (UTC)
From: n54@gmx.com
Reply-To: n54@gmx.com
To: gnats-bugs@NetBSD.org
Subject: Locking issues in interaction of TTY and USB subsystems (exposed with uhso(4))
X-Send-Pr-Version: www-1.0

>Number:         49022
>Category:       kern
>Synopsis:       Locking issues in interaction of TTY and USB subsystems (exposed with uhso(4))
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          dead
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jul 19 18:30:00 +0000 2014
>Closed-Date:    Sat Jul 11 09:54:48 +0000 2015
>Last-Modified:  Sat Jul 11 09:54:48 +0000 2015
>Originator:     Kamil Rytarowski
>Release:        current 6.99.47
>Organization:
>Environment:
NetBSD compaq 6.99.47 NetBSD 6.99.47 (NO_DRM) #2: Fri Jul 18 09:54:57 CEST 2014 kamil@compaq:/home/kamil/NetBSD-current/src/sys/arch/amd64/compile/NO_DRM amd64
>Description:
Here is my original mail to current-users [1]:

uhso(4) is a kernel driver for Option modems. It utilizes inter alia USB and TTY subsystems in the kernel. I would like to thank to the NetBSD crew for providing a driver for my modem!

Unfortunatelly, I've got issues with the driver with LOCKDEBUG turned on (expensive locking checks/support), and these issues result in kernel panics.

Grzegorz found at least two bugs and told me that the driver has to be discussed with authors and most likely parly rewritten.

I issue: spin lock held when taking mutex

File: src/sys/kern/subr_devsw.c
int
cdev_write(dev_t dev, struct uio *uio, int flag)
{
const struct cdevsw *d;
int rv, mpflag;

if ((d = cdevsw_lookup(dev)) == NULL)
return ENXIO;

DEV_LOCK(d);
rv = (*d->d_write)(dev, uio, flag);
DEV_UNLOCK(d);

return rv;
}

DEV_LOCK() is a macro defined at the same file src/sys/kern/subr_devsw.c:
/*
* Device access methods.
*/

#define DEV_LOCK(d) \
if ((mpflag = (d->d_flag & D_MPSAFE)) == 0) { \
KERNEL_LOCK(1, NULL); \
}

#define DEV_UNLOCK(d) \
if (mpflag == 0) { \
KERNEL_UNLOCK_ONE(NULL); \
}

And uhso_tty_write() at src/sys/dev/usb/uhso.c takes mutex.

Backtrace:
#0 0xffffffff805b619f in cpu_reboot (howto=260, bootstr=<optimized out>)
at ../../../../arch/amd64/amd64/machdep.c:671
671 dumpsys();
#0 0xffffffff805b619f in cpu_reboot (howto=260, bootstr=<optimized out>)
at ../../../../arch/amd64/amd64/machdep.c:671
#1 0xffffffff807791fd in vpanic (
fmt=0xffffffff80bf982b "LOCKDEBUG: %s error: %s: %s",
ap=0xfffffe8007246950) at ../../../../kern/subr_prf.c:284
#2 0xffffffff807792b1 in panic (fmt=<unavailable>)
at ../../../../kern/subr_prf.c:200
#3 0xffffffff80770501 in lockdebug_abort1 (ld=0xffff800005989658, s=6,
func=0xffffffff80b40220 "lockdebug_barrier",
msg=0xffffffff80bf98de "spin lock held", dopanic=true)
at ../../../../kern/subr_lockdebug.c:783
#4 0xffffffff80539b1c in mutex_vector_enter (mtx=0xfffffe8006d1c010)
at ../../../../kern/kern_mutex.c:487
#5 0xffffffff802aacc2 in ehci_device_bulk_transfer (xfer=0xfffffe8006d69a10)
at ../../../../dev/usb/ehci.c:3547
#6 0xffffffff808473dc in usbd_transfer (xfer=0xfffffe8006d69a10)
at ../../../../dev/usb/usbdi.c:300
#7 0xffffffff80814fe2 in uhso_bulk_write (hp=0xfffffe80be282288)
at ../../../../dev/usb/uhso.c:1187
#8 0xffffffff80816bc6 in uhso_tty_start (tp=0xfffffe80be28c6c8)
at ../../../../dev/usb/uhso.c:1820
#9 0xffffffff807cb840 in ttstart (tp=0xfffffe80be28c6c8)
at ../../../../kern/tty.c:1680
#10 ttwrite (tp=0xfffffe80be28c6c8, uio=0xfffffe8007246de0,
flag=<optimized out>) at ../../../../kern/tty.c:2196
#11 0xffffffff80814af9 in uhso_tty_write (dev=<optimized out>,
uio=0xfffffe8007246de0, flag=16) at ../../../../dev/usb/uhso.c:1633
#12 0xffffffff80767bde in cdev_write (dev=50184, uio=0xfffffe8007246de0,
flag=16) at ../../../../kern/subr_devsw.c:886
#13 0xffffffff8074ea1c in spec_write (v=0xfffffe8007246d30)
at ../../../../miscfs/specfs/spec_vnops.c:790
#14 0xffffffff808a7fa3 in VOP_WRITE (vp=0xfffffe80bd811730,
uio=<optimized out>, ioflag=<optimized out>, cred=<optimized out>)
at ../../../../kern/vnode_if.c:430
#15 0xffffffff8089201e in vn_write (fp=<optimized out>,
offset=0xfffffe80b5b3ad40, uio=0xfffffe8007246de0,
cred=0xfffffe80b6c79c00, flags=1) at ../../../../kern/vfs_vnops.c:588
#16 0xffffffff80788206 in dofilewrite (fd=1, fp=0xfffffe80b5b3ad40,
buf=0x7f7fffffd3af, nbyte=1, offset=<optimized out>,
flags=<optimized out>, retval=0xfffffe8007246eb0)
at ../../../../kern/sys_generic.c:355
#17 0xffffffff80788333 in sys_write (l=<optimized out>,
uap=0xfffffe8007246f00, retval=0xfffffe8007246eb0)
at ../../../../kern/sys_generic.c:323
#18 0xffffffff80793a86 in sy_call (rval=0xfffffe8007246eb0,
uap=0xfffffe8007246f00, l=0xfffffe80b8e60ba0, sy=0xffffffff80ec0580)
at ../../../../sys/syscallvar.h:61
#19 sy_invoke (code=4, rval=0xfffffe8007246eb0, uap=0xfffffe8007246f00,
l=0xfffffe80b8e60ba0, sy=0xffffffff80ec0580)
at ../../../../sys/syscallvar.h:85
#20 syscall (frame=0xfffffe8007246f00)
at ../../../../arch/x86/x86/syscall.c:156
#21 0xffffffff80100691 in ?? ()
#22 0x0000000000000001 in ?? ()
#23 0x00007f7fffffd3af in ?? ()
#24 0x0000000000000001 in ?? ()
#25 0x00007f7ff743952a in ?? ()
#26 0x0000000000000000 in ?? ()


II issue is similar, because adding D_MPSAFE (for test purposes, not a real fix) to d_flag a spin lock is held in ttwrite.

Testing change of d_flag to 'D_TTY | D_MPSAFE' was done at src/sys/dev/usb/uhso.c

const struct cdevsw uhso_cdevsw = {
.d_open = uhso_tty_open,
.d_close = uhso_tty_close,
.d_read = uhso_tty_read,
.d_write = uhso_tty_write,
.d_ioctl = uhso_tty_ioctl,
.d_stop = uhso_tty_stop,
.d_tty = uhso_tty_tty,
.d_poll = uhso_tty_poll,
.d_mmap = nommap,
.d_kqfilter = ttykqfilter,
.d_flag = D_TTY,
};




I asked the author (Iain Hibbert) [2], and here is my answer to his question:
On Saturday, July 19, 2014 at 7:11 PM, "Iain Hibbert" wrote:

> Is this with LOCKDEBUG and no other changes, or with LOCKDEBUG and
> the addition of the MPSAFE flag?

Both, D_MPSAFE was an attempt to walk-around the first mentioned
panic, but we ran into another (mentioned) issue with spin lock held
when taking a mutex.

> there is USB locking and TTY locking which must interact and I am
> not sure what work has been done to make drivers MPSAFE in general.

Thank-you for your confirmantion that it's a design issue in
interaction with (at least) USB and TTY locking.


[1] http://mail-index.netbsd.org/current-users/2014/07/19/msg025295.html
[2] http://mail-index.netbsd.org/current-users/2014/07/19/msg025302.html reply for http://mail-index.netbsd.org/current-users/2014/07/19/msg025300.html
>How-To-Repeat:
1. plug a USB modem

2. # pppd call your-operator

get a core / debug with ddb.onpanic
>Fix:
Temporarily walk-around to make it work (hide bug): Disable kernel locking-correctness asserts (LOCKDEBUG).

>Release-Note:

>Audit-Trail:
From: Nick Hudson <skrll@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: n54@gmx.com, kern-bug-people@netbsd.org, gnats-admin@netbsd.org, 
 netbsd-bugs@netbsd.org
Subject: Re: kern/49022: Locking issues in interaction of TTY and USB subsystems
 (exposed with uhso(4))
Date: Sun, 20 Jul 2014 12:19:14 +0100

 On 07/19/14 19:30, n54@gmx.com wrote:
 > #3 0xffffffff80770501 in lockdebug_abort1 (ld=0xffff800005989658, s=6,
 > func=0xffffffff80b40220 "lockdebug_barrier",
 > msg=0xffffffff80bf98de "spin lock held", dopanic=true)
 > at ../../../../kern/subr_lockdebug.c:783
 > #4 0xffffffff80539b1c in mutex_vector_enter (mtx=0xfffffe8006d1c010)
 > at ../../../../kern/kern_mutex.c:487
 > #5 0xffffffff802aacc2 in ehci_device_bulk_transfer (xfer=0xfffffe8006d69a10)
 > at ../../../../dev/usb/ehci.c:3547
 > #6 0xffffffff808473dc in usbd_transfer (xfer=0xfffffe8006d69a10)
 > at ../../../../dev/usb/usbdi.c:300
 > #7 0xffffffff80814fe2 in uhso_bulk_write (hp=0xfffffe80be282288)
 > at ../../../../dev/usb/uhso.c:1187
 > #8 0xffffffff80816bc6 in uhso_tty_start (tp=0xfffffe80be28c6c8)
 > at ../../../../dev/usb/uhso.c:1820
 > #9 0xffffffff807cb840 in ttstart (tp=0xfffffe80be28c6c8)
 > at ../../../../kern/tty.c:1680
 > #10 ttwrite (tp=0xfffffe80be28c6c8, uio=0xfffffe8007246de0,
 > flag=<optimized out>) at ../../../../kern/tty.c:2196

 tty_lock is held here...

     2196         ttstart(tp);
     2197         mutex_spin_exit(&tty_lock);

 http://nxr.netbsd.org/xref/src/sys/kern/tty.c#2196

 Nick

State-Changed-From-To: open->dead
State-Changed-By: kamil@NetBSD.org
State-Changed-When: Sat, 11 Jul 2015 11:54:48 +0200
State-Changed-Why:
The originator no longer has the hardware.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.