NetBSD Problem Report #46462

From www@NetBSD.org  Thu May 17 22:33:27 2012
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id D745F63B89C
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 17 May 2012 22:33:27 +0000 (UTC)
Message-Id: <20120517223326.D6D8663B882@www.NetBSD.org>
Date: Thu, 17 May 2012 22:33:26 +0000 (UTC)
From: cyber@netbsd.org
Reply-To: cyber@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: Hang trying to run a disklabel on an inserted USB flash drive
X-Send-Pr-Version: www-1.0

>Number:         46462
>Category:       kern
>Synopsis:       Hang trying to run a disklabel on an inserted USB flash drive
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          feedback
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu May 17 22:35:00 +0000 2012
>Closed-Date:    
>Last-Modified:  Sat May 27 12:21:19 +0000 2023
>Originator:     Erik Berls
>Release:        
>Organization:
NetBSD
>Environment:
amd64
>Description:
Plugging in a usb memory stick got me a hung proc.
These details are for rmind@


db> tr/t 0t3707
trace: pid 3707 lid 1 at 0xffffa00048ebc860
sleepq_block() at netbsd:sleepq_block+0xdd
turnstile_block() at netbsd:turnstile_block+0x2a3
mutex_vector_enter() at netbsd:mutex_vector_enter+0xf7
sdopen() at netbsd:sdopen+0x99
spec_open() at netbsd:spec_open+0x15e
VOP_OPEN() at netbsd:VOP_OPEN+0x29
vn_open() at netbsd:vn_open+0x13d
sys_open() at netbsd:sys_open+0xeb
syscall() at netbsd:syscall+0xb4
db>

db> ps/l
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
23700    1 3   0        84   ffffa00048245bc0             pickup kqueue
4394     1 3   0        84   ffffa00049fe9be0                csh ttyraw
3707     1 3   0   1000004   ffffa00048e1f7e0          disklabel tstile
3516     1 3   0        84   ffffa00048e0b7c0               tcsh pause
2981     1 3   0        84   ffffa00048e0b3e0               sshd select
3229     1 3   0        84   ffffa00048e0bba0               sshd netio
3222     1 3   0        84   ffffa00048f8dbc0             mountd select
1015     5 3   0        84   ffffa000488d5420              slave nfsd
              4 3   0        84   ffffa000488d5be0              slave nfsd
              3 3   0        84   ffffa00048e1f020              slave nfsd
              2 3   0        84   ffffa00048e1fbc0              slave nfsd
              1 3   0        84   ffffa000488d5800             master select
740      1 3   0        84   ffffa00048ccd000          ssh-agent select
639      1 3   0        84   ffffa00048982000               qmgr kqueue
558      1 3   0        84   ffffa00048192000              login wait
630      1 3   0        84   ffffa00048a14be0               cron nanoslp
618      1 3   0        80   ffffa00048a02400              inetd kqueue
601      1 3   0        84   ffffa00048a02bc0             master kqueue
372      1 3   0        84   ffffa000489827c0               sshd select
373     18 3   0        84   ffffa00048c76420               java parked
             17 3   0        80   ffffa00048c76800               java select
             16 3   0        84   ffffa00048c76be0               java parked
             15 3   0        80   ffffa00048c0c020               java select
             14 3   0        84   ffffa00048c0c400               java parked
             13 5   0         4   ffffa00048c0c7e0               java
             11 3   0        84   ffffa00048b3b000               java parked
             10 3   0        84   ffffa00048b3b3e0               java parked
              9 3   0        80   ffffa00048b3b7c0               java parked
              8 3   0        84   ffffa00048b3bba0               java parked
              7 3   0        84   ffffa00048a14040               java parked
              6 3   0        80   ffffa00048a14420               java parked
              5 3   0        80   ffffa00048a14800               java parked
              4 3   0        80   ffffa00048332be0               java parked
              3 3   0        84   ffffa00048a02020               java parked
              2 3   0        80   ffffa00048a027e0               java select
              1 3   0        80   ffffa000489823e0               java parked
297      1 3   0        84   ffffa00048982ba0              mdnsd select
303      1 3   0        84   ffffa00048912040   mDNSResponderPos select
275      2 3   0        84   ffffa00048895ba0        xenbackendd netio
              1 3   0        84   ffffa00048332800        xenbackendd parked
262      2 3   0        84   ffffa00048895000        xenconsoled netio
              1 3   0        84   ffffa0004890cbc0        xenconsoled select
254     63 5   0         4   ffffa00049bd7400          python2.6
              7 3   0        84   ffffa00048912420          python2.6 select
              6 3   0        80   ffffa00048912800          python2.6 select
              5 3   0        84   ffffa00048912be0          python2.6 netio
              4 3   0        84   ffffa0004890c020          python2.6 parked
              3 3   0        84   ffffa0004890c400          python2.6 select
              2 3   0        80   ffffa0004890c7e0          python2.6 socket
              1 3   0        80   ffffa000488953e0          python2.6 select
251      1 3   0        80   ffffa000488957c0          python2.6 wait
235      1 3   0        84   ffffa00048332420          xenstored select
163      1 3   0        84   ffffa000482457e0            rpcbind select
76       1 2   0         4   ffffa00048245400            syslogd
1        1 3   0        84   ffffa0004545f7e0               init wait
0       59 3   0       204   ffffa00049c5b000           scsibus0 biowait
             57 5   0       204   ffffa0004544f040           (zombie)
             49 3   0       204   ffffa00048245020            physiod physiod
             48 3   0       204   ffffa000481923e0           aiodoned aiodoned
             47 3   0       204   ffffa000481927c0            ioflush syncer
             46 3   0       204   ffffa00048192ba0           pgdaemon pgdaemon
             45 3   0       204   ffffa00048188040           xbdbackd xbdbackd
             44 3   0       204   ffffa00048188420          cryptoret crypto_wait
             43 3   0       204   ffffa0004545b7e0              ipmi0 ipmi0
             42 3   0       204   ffffa0004545b020               ipmi ipmi_poll

             40 3   0       204   ffffa0004545eba0               usb7 usbevt
             39 3   0       204   ffffa0004545d040               usb6 usbevt
             38 3   0       204   ffffa0004545d420               usb5 usbevt
             37 3   0       204   ffffa0004545b400               usb4 usbevt
             36 3   0       204   ffffa0004545d800               usb0 usbevt
             35 3   0       204   ffffa0004545dbe0               usb3 usbevt
             34 3   0       204   ffffa00048188800               usb2 usbevt
             33 3   0       204   ffffa00048188be0         usbtask-dr usbtsk
             32 3   0       204   ffffa0004545f020         usbtask-hc usbtsk
             31 3   0       204   ffffa0004545f400               usb1 usbevt
             30 3   0       204   ffffa0004545fbc0              unpgc unpgc
             29 3   0       204   ffffa0004545e000        vmem_rehash vmem_rehash
             28 3   0       204   ffffa0004545e3e0             xenbus rdst
             27 3   0       204   ffffa0004545e7c0           xenwatch evtsq
             18 3   0       204   ffffa0004545bbc0            atabus3 atath
             17 3   0       204   ffffa00045459000            atabus2 atath
             16 3   0       204   ffffa000454593e0            atabus1 atath
             15 3   0       204   ffffa000454597c0            atabus0 atath
             14 3   0       284   ffffa00045459ba0          fw0_probe -
             12 3   0       204   ffffa0004544f420             sysmon smtaskq
             11 3   0       204   ffffa0004544f800           pmfevent pmfevent
             10 3   0       204   ffffa0004544fbe0           nfssilly nfssilly
              9 3   0       204   ffffa0004544c020            cachegc cachegc
              8 3   0       204   ffffa0004544c400              vrele vrele
              7 3   0       204   ffffa0004544c7e0            xcall/0 xcall
              6 1   0       204   ffffa0004544cbc0          softser/0
              5 1   0       204   ffffa0004544a000          softclk/0
              4 1   0       204   ffffa0004544a3e0          softbio/0
              3 1   0       204   ffffa0004544a7c0          softnet/0
          >   2 7   0       205   ffffa0004544aba0             idle/0
              1 3   0       204   ffffffff80b5a220            swapper schedule
db>

db> tr/t 0.59
trace: pid 0 lid 59 at 0xffffa00049c1b770
sleepq_block() at netbsd:sleepq_block+0xdd
cv_wait() at netbsd:cv_wait+0xf1
biowait() at netbsd:biowait+0x4f
read_sector() at netbsd:read_sector+0x23
scan_mbr() at netbsd:scan_mbr+0x40
readdisklabel() at netbsd:readdisklabel+0xfb
sdopen() at netbsd:sdopen+0x26a
spec_open() at netbsd:spec_open+0x29a
VOP_OPEN() at netbsd:VOP_OPEN+0x29
dkwedge_discover() at netbsd:dkwedge_discover+0xee
sdattach() at netbsd:sdattach+0x206
config_attach_loc() at netbsd:config_attach_loc+0x15a
scsi_probe_bus() at netbsd:scsi_probe_bus+0x4aa
scsibus_config() at netbsd:scsibus_config+0x5d
scsipi_completion_thread() at netbsd:scsipi_completion_thread+0x23
>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:
From: Mindaugas Rasiukevicius <rmind@netbsd.org>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: cyber@netbsd.org
Subject: Re: kern/46462: Hang trying to run a disklabel on an inserted USB
 flash drive
Date: Sat, 19 May 2012 20:29:03 +0100

 cyber@netbsd.org wrote:
 > >Number:         46462
 > >Category:       kern
 > >Synopsis:       Hang trying to run a disklabel on an inserted USB flash
 > >drive Confidential:   no
 > >Severity:       serious
 > >Priority:       medium
 > >Responsible:    kern-bug-people
 > >State:          open
 > >Class:          sw-bug
 > >Submitter-Id:   net
 > >Arrival-Date:   Thu May 17 22:35:00 +0000 2012
 > >Originator:     Erik Berls
 > <...>

 OK, this thread is likely a problem:

 > db> tr/t 0.59
 > trace: pid 0 lid 59 at 0xffffa00049c1b770
 > sleepq_block() at netbsd:sleepq_block+0xdd
 > cv_wait() at netbsd:cv_wait+0xf1
 > biowait() at netbsd:biowait+0x4f
 > read_sector() at netbsd:read_sector+0x23
 > scan_mbr() at netbsd:scan_mbr+0x40
 > readdisklabel() at netbsd:readdisklabel+0xfb
 > sdopen() at netbsd:sdopen+0x26a
 > spec_open() at netbsd:spec_open+0x29a
 > VOP_OPEN() at netbsd:VOP_OPEN+0x29
 > dkwedge_discover() at netbsd:dkwedge_discover+0xee
 > sdattach() at netbsd:sdattach+0x206
 > config_attach_loc() at netbsd:config_attach_loc+0x15a
 > scsi_probe_bus() at netbsd:scsi_probe_bus+0x4aa
 > scsibus_config() at netbsd:scsibus_config+0x5d
 > scsipi_completion_thread() at netbsd:scsipi_completion_thread+0x23

 - scsipi_completion_thread()
 	This asynchronous completion thread handles the event, which is
 	in this case attaching of a new device.  Callbacks are performed
 	under splbio().

 -- [skipping] .. sdopen()
 	Acquiring sd->sc_dk.dk_openlock here.  We are still at IPL_BIO.

 --- [skipping] .. read_sector() -> disk_read_sectors()
 	Invoking sdstrategy() and entering biowait().  However, since
 	we are still at IPL_BIO, biodone() does not happen - thread has
 	deadlocked against interrupt with sd->sc_dk.dk_openlock held.

 > db> tr/t 0t3707
 > trace: pid 3707 lid 1 at 0xffffa00048ebc860
 > sleepq_block() at netbsd:sleepq_block+0xdd
 > turnstile_block() at netbsd:turnstile_block+0x2a3
 > mutex_vector_enter() at netbsd:mutex_vector_enter+0xf7
 > sdopen() at netbsd:sdopen+0x99
 > spec_open() at netbsd:spec_open+0x15e
 > VOP_OPEN() at netbsd:VOP_OPEN+0x29
 > vn_open() at netbsd:vn_open+0x13d
 > sys_open() at netbsd:sys_open+0xeb
 > syscall() at netbsd:syscall+0xb4
 > db>

 This thread merely tries to acquire sd->sc_dk.dk_openlock and waits in
 the turnstile for deadlocked scsipi_completion_thread().

 Fix: rework code to remove top-level splbio() in scsipi_completion_thread().
 Long term fix: make scsipi subsystem MP-safe.

 -- 
 Mindaugas

State-Changed-From-To: open->analyzed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Sat, 19 May 2012 19:32:14 +0000
State-Changed-Why:


From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Mindaugas Rasiukevicius <rmind@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org, cyber@NetBSD.org
Subject: Re: kern/46462: Hang trying to run a disklabel on an inserted USB
 flash drive
Date: Mon, 21 May 2012 14:45:47 +0200

 On Sat, May 19, 2012 at 08:29:03PM +0100, Mindaugas Rasiukevicius wrote:
 > OK, this thread is likely a problem:
 > 
 > > db> tr/t 0.59
 > > trace: pid 0 lid 59 at 0xffffa00049c1b770
 > > sleepq_block() at netbsd:sleepq_block+0xdd
 > > cv_wait() at netbsd:cv_wait+0xf1
 > > biowait() at netbsd:biowait+0x4f
 > > read_sector() at netbsd:read_sector+0x23
 > > scan_mbr() at netbsd:scan_mbr+0x40
 > > readdisklabel() at netbsd:readdisklabel+0xfb
 > > sdopen() at netbsd:sdopen+0x26a
 > > spec_open() at netbsd:spec_open+0x29a
 > > VOP_OPEN() at netbsd:VOP_OPEN+0x29
 > > dkwedge_discover() at netbsd:dkwedge_discover+0xee
 > > sdattach() at netbsd:sdattach+0x206
 > > config_attach_loc() at netbsd:config_attach_loc+0x15a
 > > scsi_probe_bus() at netbsd:scsi_probe_bus+0x4aa
 > > scsibus_config() at netbsd:scsibus_config+0x5d
 > > scsipi_completion_thread() at netbsd:scsipi_completion_thread+0x23
 > 
 > - scsipi_completion_thread()
 > 	This asynchronous completion thread handles the event, which is
 > 	in this case attaching of a new device.  Callbacks are performed
 > 	under splbio().
 > 
 > -- [skipping] .. sdopen()
 > 	Acquiring sd->sc_dk.dk_openlock here.  We are still at IPL_BIO.
 > 
 > --- [skipping] .. read_sector() -> disk_read_sectors()
 > 	Invoking sdstrategy() and entering biowait().  However, since
 > 	we are still at IPL_BIO, biodone() does not happen - thread has
 > 	deadlocked against interrupt with sd->sc_dk.dk_openlock held.

 I don't understand how you get to this conclusion. biowait() will cause the
 thread to sleep, and IPL_BIO interrupts will be accepted again while
 it sleeps. Otherwise, all IPL_VM and lower interrupts would be blocked,
 which would probably cause not only disklabel to hang, but most probably
 the complete system.

 I guess the problem is more that the USB I/O doesn't completes for some
 reason.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Sat, 14 Jul 2018 19:00:39 +0000
Responsible-Changed-Why:
Found this reference in PR kern/52783, keeping on me just in case it turns
out being similar underlying problem.


Responsible-Changed-From-To: jdolecek->kern-bug-people
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Fri, 10 Aug 2018 22:49:34 +0000
Responsible-Changed-Why:
kern/52783 turned out to be wd(4) specific problem, so returning back


State-Changed-From-To: analyzed->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sat, 27 May 2023 12:21:19 +0000
State-Changed-Why:
scsipi has been made MP-safe and umass has had lots of error branch fixes,
is this still reproducible?


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.