NetBSD Problem Report #42377

From htodd@kerry.i8u.org  Wed Nov 25 14:55:57 2009
Return-Path: <htodd@kerry.i8u.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 8673C63B8B4
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 25 Nov 2009 14:55:57 +0000 (UTC)
Message-Id: <200911251455.nAPEturq014526@kerry.i8u.org>
Date: Wed, 25 Nov 2009 06:55:56 -0800 (PST)
From: htodd@twofifty.com
Reply-To: htodd@twofifty.com
To: gnats-bugs@gnats.NetBSD.org
Subject: netbsd-5 i386 system hangs on file access after changes of around 11/13
X-Send-Pr-Version: 3.95

>Number:         42377
>Category:       kern
>Synopsis:       netbsd-5 i386 system hangs on file access after changes of around 11/13
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    bouyer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Nov 25 15:00:01 +0000 2009
>Closed-Date:    Sat Nov 28 19:11:29 +0000 2009
>Last-Modified:  Sat Nov 28 19:11:29 +0000 2009
>Originator:     H. Todd Fujinaka
>Release:        NetBSD 5.0_STABLE
>Organization:
None
>Environment:


System: NetBSD kerry.i8u.org 5.0_STABLE NetBSD 5.0_STABLE (KERRY) #1191: Mon Nov 23 09:44:39 PST 2009 htodd@kerry.i8u.org:/home/obj/sys/arch/i386/compile.i386/KERRY i386
Architecture: i386
Machine: i386
>Description:
My system now hangs during builds of "world" and also on reboot. Similar problems are experienced in amd64-current. On reboot the system seems to "hang" on unmounting disks. Debugger information (screenshots of the debugger) is located at http://www.i8u.org/~htodd/logpix.tar.bz2.

wd0 at atabus2 drive 0: <WDC WD2500KS-00MJB0>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 232 GB, 484521 cyl, 16 head, 63 sec, 512 bytes/sect x 488397168 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(piixide1:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)

# /dev/rwd0d:
type: unknown
disk: NetBSD
label: 
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 484521
total sectors: 488397168
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0 

16 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a:   1024128        63     4.2BSD   1024  8192     0  # (Cyl.      0*-   1016*)
 b:   8192016   1024191       swap                     # (Cyl.   1016*-   9143*)
 c: 488397105        63     unused      0     0        # (Cyl.      0*- 484520)
 d: 488397168         0     unused      0     0        # (Cyl.      0 - 484520)
 e:  12288528   9216207     4.2BSD   2048 16384     0  # (Cyl.   9143*-  21334*)
 f:   6144768  21504735     4.2BSD   2048 16384     0  # (Cyl.  21334*-  27430*)
 g: 460747665  27649503     4.2BSD   2048 16384     0  # (Cyl.  27430*- 484520)
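
The label above can be cross-checked arithmetically. The following is a minimal sketch, not part of the original report: the struct and function names are hypothetical, and the numbers are copied from the disklabel output. It checks that cylinders x tracks x sectors equals the total sector count, and that partitions a, b, e, f and g follow each other without gaps up to the end of the disk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type; one row of the partition table above. */
struct part {
    char     letter;
    uint64_t size;    /* in sectors */
    uint64_t offset;  /* in sectors */
};

/* Data partitions copied from the label (c and d cover the whole disk). */
static const struct part report_parts[] = {
    { 'a',   1024128ULL,       63ULL },
    { 'b',   8192016ULL,  1024191ULL },
    { 'e',  12288528ULL,  9216207ULL },
    { 'f',   6144768ULL, 21504735ULL },
    { 'g', 460747665ULL, 27649503ULL },
};

/* Total sectors implied by the CHS geometry. */
static uint64_t geometry_sectors(uint64_t cyls, uint64_t heads, uint64_t spt)
{
    return cyls * heads * spt;
}

/*
 * Return the first sector past the last partition if every partition
 * starts exactly where the previous one ends, or 0 on a gap or overlap.
 */
static uint64_t contiguous_end(const struct part *p, size_t n)
{
    assert(n > 0);
    uint64_t next = p[0].offset;
    for (size_t i = 0; i < n; i++) {
        if (p[i].offset != next)
            return 0;
        next = p[i].offset + p[i].size;
    }
    return next;
}
```

Calling geometry_sectors(484521, 16, 63) and contiguous_end(report_parts, 5) should both give the "total sectors" value from the label, so the layout itself looks consistent and is unlikely to be related to the hang.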



>How-To-Repeat:
Just start a build, have it hang, then try to reboot.

>Fix:
I'll start reverting things.


>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->bouyer
Responsible-Changed-By: bouyer@NetBSD.org
Responsible-Changed-When: Fri, 27 Nov 2009 19:10:15 +0000
Responsible-Changed-Why:
I've reproduced it and have a patch


State-Changed-From-To: open->feedback
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Fri, 27 Nov 2009 19:10:15 +0000
State-Changed-Why:
Hi,
could you please try the patch I posted in
http://mail-index.netbsd.org/tech-kern/2009/11/27/msg006546.html


From: Hisashi T Fujinaka <htodd@twofifty.com>
To: gnats-bugs@NetBSD.org
Cc: bouyer@NetBSD.org, kern-bug-people@NetBSD.org, netbsd-bugs@NetBSD.org,
        gnats-admin@NetBSD.org, bouyer@NetBSD.org
Subject: Re: kern/42377 (netbsd-5 i386 system hangs on file access after
 changes of around 11/13)
Date: Fri, 27 Nov 2009 23:53:31 -0800 (PST)

 The patch allows my build to finish, fixing my main complaint.

From: Manuel Bouyer <bouyer@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/42377 CVS commit: src/sys/kern
Date: Sat, 28 Nov 2009 10:10:18 +0000

 Module Name:	src
 Committed By:	bouyer
 Date:		Sat Nov 28 10:10:18 UTC 2009

 Modified Files:
 	src/sys/kern: vfs_subr.c

 Log Message:
 Previous did cause a deadlock with layered FS: the vrele thread
 can sleep on the vnode lock, while vget is sleeping on the
 VI_INACTNOW flag (or the vget caller is looping on vget returning failure
 because of the VI_INACTNOW flag). With layered FSes, the upper and lower
 vnodes share the same lock, so the vget() caller above can be already
 holding the vnode lock.

 Fix by dropping VI_INACTNOW before sleeping on the vnode lock in
 vrelel(), and check the ref count again once we have the lock. If the
 vnode has more than one reference, don't VOP_INACTIVE it.
 Fix PR kern/42318 and PR kern/42377
 patch tested by Hisashi T Fujinaka, Joachim König, Stephen Borrill and
 Matthias Scheler.


 To generate a diff of this commit:
 cvs rdiff -u -r1.391 -r1.392 src/sys/kern/vfs_subr.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
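
 The ordering described in the log message above can be illustrated with
 a small user-space model. This is only a sketch under stated
 assumptions: it is not the code from vfs_subr.c, it is single-threaded,
 and every name in it (model_vnode, model_release, gain_reference) is
 hypothetical. The callback stands in for whatever another thread does
 while the releasing thread would be asleep on the vnode lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names come from vfs_subr.c. */
struct model_vnode {
    int  usecount;        /* number of references held */
    bool inactnow;        /* stands in for the VI_INACTNOW flag */
    bool locked;          /* stands in for the vnode lock */
    int  inactive_calls;  /* how often the VOP_INACTIVE stand-in ran */
};

/* Runs at the point where the releasing thread would sleep on the lock. */
typedef void (*while_blocked_fn)(struct model_vnode *);

/* A vget()-like caller taking a new reference during that window. */
static void gain_reference(struct model_vnode *vp)
{
    assert(!vp->inactnow);  /* flag was dropped, so nothing to wait on */
    vp->usecount++;
}

/* Release one reference, following the ordering in the log message. */
static void model_release(struct model_vnode *vp, while_blocked_fn while_blocked)
{
    assert(vp->usecount > 0);
    if (vp->usecount > 1) {     /* not the last reference: just drop it */
        vp->usecount--;
        return;
    }
    vp->inactnow = true;        /* last reference: deactivation pending */

    /* Step 1: drop the flag before blocking on the lock. */
    vp->inactnow = false;
    if (while_blocked != NULL)
        while_blocked(vp);
    vp->locked = true;          /* lock acquired */

    /* Step 2: re-check the reference count now that the lock is held. */
    if (vp->usecount > 1) {
        vp->usecount--;         /* still in use: skip deactivation */
    } else {
        vp->inactnow = true;
        vp->inactive_calls++;   /* stands in for VOP_INACTIVE() */
        vp->inactnow = false;
        vp->usecount--;
    }
    vp->locked = false;
}
```

 With no callback the model deactivates the vnode once; with
 gain_reference as the callback the count is found to be 2 after the
 lock is taken, so deactivation is skipped and one reference remains.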

From: Stephen Borrill <sborrill@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/42377 CVS commit: [netbsd-5] src/sys/kern
Date: Sat, 28 Nov 2009 18:59:12 +0000

 Module Name:	src
 Committed By:	sborrill
 Date:		Sat Nov 28 18:59:11 UTC 2009

 Modified Files:
 	src/sys/kern [netbsd-5]: vfs_subr.c

 Log Message:
 Pull up the following revision(s) (requested by bouyer in ticket #1171):
 	sys/kern/vfs_subr.c:	revision 1.392

 Previous caused a deadlock with layered FS: the vrele thread can sleep on
 the vnode lock, while vget is sleeping on the VI_INACTNOW flag (or the vget
 caller is looping on vget returning failure because of the VI_INACTNOW
 flag). With layered FSes, the upper and lower vnodes share the same lock, so
 the vget() caller above can be already holding the vnode lock.

 Fix by dropping VI_INACTNOW before sleeping on the vnode lock in
 vrelel(), and check the ref count again once we have the lock. If the
 vnode has more than one reference, don't VOP_INACTIVE it.
 Fix PR kern/42318 and PR kern/42377


 To generate a diff of this commit:
 cvs rdiff -u -r1.357.4.7 -r1.357.4.8 src/sys/kern/vfs_subr.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: feedback->closed
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Sat, 28 Nov 2009 19:11:29 +0000
State-Changed-Why:
Patch committed and pulled up


>Unformatted:

Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.