NetBSD Problem Report #42318
From louis@maat.zabrico.com Sat Nov 14 21:22:55 2009
Return-Path: <louis@maat.zabrico.com>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id 845D163B844
for <gnats-bugs@gnats.NetBSD.org>; Sat, 14 Nov 2009 21:22:55 +0000 (UTC)
Message-Id: <200911142122.nAELMqql015136@maat.zabrico.com>
Date: Sat, 14 Nov 2009 16:22:52 -0500 (EST)
From: louis@zabrico.com
Reply-To: louis@zabrico.com
To: gnats-bugs@gnats.NetBSD.org
Subject: chroot or pkg_comp causes a hang on netbsd-5
X-Send-Pr-Version: 3.95
>Number: 42318
>Category: kern
>Synopsis: chroot or pkg_comp causes a hang on netbsd-5
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: bouyer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Nov 14 21:25:00 +0000 2009
>Closed-Date: Sat Nov 28 19:10:51 +0000 2009
>Last-Modified: Sat Nov 28 19:15:03 +0000 2009
>Originator: Louis Guillaume
>Release: NetBSD 5.0_STABLE - sources from Nov. 11, 2009
>Organization:
>Environment:
System: NetBSD maat.zabrico.com 5.0_STABLE NetBSD 5.0_STABLE (GENERIC) #9: Thu Nov 12 22:02:50 EST 2009 louis@maat.zabrico.com:/usr/obj/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
After upgrading my NetBSD-5.0_STABLE i386 system, I attempted to update a
pkg_comp chroot environment to upgrade my packages. I unpacked the latest
kernel and the base, comp, etc and text sets from the recent build (build.sh
release). Then I ran postinstall to clean up, and everything seemed to be
working fine.
I entered the chroot environment with "sudo pkg_comp chroot" and then
began to run pkg_rolling-replace as usual. This time the process hung while
running "pkg_chk -uq" to figure out what was out of date. From a different
session, I was able to kill the pkg_comp process, figuring this was all a
fluke of some kind.
Then I chrooted again. This time, simply executing "pkg_chk -uq" hung the
whole system! I could not get an ssh session, and the console accepted the
username at the login prompt but froze before prompting for the password.
I pressed Ctrl-Alt-Esc to break into the debugger, and it printed:
fatal breakpoint trap in supervisor mode
/netbsd: trap type 1 code 0 eip c05788dc cs 8 eflags 202 cr2 cd4e2000 ilevel 6
syslogd: restart
After the "restart" message it tried to sync the disks, printed a series
of `2's and hung. I had to cold reboot it.
This was repeatable each time I attempted to use pkg_comp for just about
anything other than going into the chroot itself.
Suspecting something was wrong with pkg_comp, I decided to re-create my
environment, so I started fresh with "pkg_comp makeroot". Things seemed
to be working quite well.
But I just found a hang when unmounting the pkg_comp filesystems (which
are null-mounted). An interrupt will not kill the process, and it does not
respond to a regular kill. It did respond to Ctrl-Z (susp), but when I
try "kill %1" I get "/bin/ksh: kill: %1: No such process".
That may just be because I used "sudo"; I don't know.
Root was able to kill the chroot process with no issue, but the "umount"
command is still running, unkillable even with signal 9 (SIGKILL).
The system hasn't hung again, but this is the same kind of behavior I saw
just before the earlier hangs.
>How-To-Repeat:
Not sure, but try running pkg_comp (or perhaps other chroot operations)
on netbsd-5 built from sources dated Nov. 11th or later.
>Fix:
Unknown
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->bouyer
Responsible-Changed-By: bouyer@NetBSD.org
Responsible-Changed-When: Sat, 28 Nov 2009 00:09:21 +0000
Responsible-Changed-Why:
Probably the same nullfs issue as kern/42377
From: Manuel Bouyer <bouyer@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/42318 CVS commit: src/sys/kern
Date: Sat, 28 Nov 2009 10:10:18 +0000
Module Name: src
Committed By: bouyer
Date: Sat Nov 28 10:10:18 UTC 2009
Modified Files:
src/sys/kern: vfs_subr.c
Log Message:
Previous did cause a deadlock with layered FS: the vrele thread
can sleep on the vnode lock, while vget is sleeping on the
VI_INACTNOW flag (or the vget caller is looping on vget returning failure
because of the VI_INACTNOW flag). With layered FSes, the upper and lower
vnodes share the same lock, so the vget() caller above can be already
holding the vnode lock.
Fix by dropping VI_INACTNOW before sleeping on the vnode lock in
vrelel(), and check the ref count again once we have the lock. If the
vnode has more than one reference, don't VOP_INACTIVE it.
Fix PR kern/42318 and PR kern/42377
patch tested by Hisashi T Fujinaka, Joachim König, Stephen Borrill and
Matthias Scheler.
To generate a diff of this commit:
cvs rdiff -u -r1.391 -r1.392 src/sys/kern/vfs_subr.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Louis Guillaume <louis@zabrico.com>
To: gnats-bugs@NetBSD.org
Cc: bouyer@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: PR/42318 CVS commit: src/sys/kern
Date: Sat, 28 Nov 2009 12:02:28 -0500
Hi,
Can we please have this pulled up to the netbsd-5 branch? Not sure if
there was more work done here; the revision I have in my copy is
1.357.4.7. Are there changes in other files or can I just grab v. 1.391?
Thanks!
Louis
From: Stephen Borrill <sborrill@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/42318 CVS commit: [netbsd-5] src/sys/kern
Date: Sat, 28 Nov 2009 18:59:12 +0000
Module Name: src
Committed By: sborrill
Date: Sat Nov 28 18:59:11 UTC 2009
Modified Files:
src/sys/kern [netbsd-5]: vfs_subr.c
Log Message:
Pull up the following revision(s) (requested by bouyer in ticket #1171):
sys/kern/vfs_subr.c: revision 1.392
Previous caused a deadlock with layered FS: the vrele thread can sleep on
the vnode lock, while vget is sleeping on the VI_INACTNOW flag (or the vget
caller is looping on vget returning failure because of the VI_INACTNOW
flag). With layered FSes, the upper and lower vnodes share the same lock, so
the vget() caller above can be already holding the vnode lock.
Fix by dropping VI_INACTNOW before sleeping on the vnode lock in
vrelel(), and check the ref count again once we have the lock. If the
vnode has more than one reference, don't VOP_INACTIVE it.
Fix PR kern/42318 and PR kern/42377
To generate a diff of this commit:
cvs rdiff -u -r1.357.4.7 -r1.357.4.8 src/sys/kern/vfs_subr.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Sat, 28 Nov 2009 19:10:51 +0000
State-Changed-Why:
Patch committed and pulled up to netbsd-5
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Louis Guillaume <louis@zabrico.com>
Cc: gnats-bugs@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: PR/42318 CVS commit: src/sys/kern
Date: Sat, 28 Nov 2009 20:10:09 +0100
On Sat, Nov 28, 2009 at 12:02:28PM -0500, Louis Guillaume wrote:
> Hi,
>
> Can we please have this pulled up to the netbsd-5 branch? Not sure if
> there was more work done here; the revision I have in my copy is
> 1.357.4.7. Are there changes in other files or can I just grab v. 1.391?
The pullup has just been done.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.