NetBSD Problem Report #32090

From Manuel.Bouyer@lip6.fr  Wed Nov 16 13:57:15 2005
Return-Path: <Manuel.Bouyer@lip6.fr>
Received: from isis.lip6.fr (isis.lip6.fr [132.227.60.2])
	by narn.netbsd.org (Postfix) with ESMTP id 58D7763B8CA
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 16 Nov 2005 13:57:14 +0000 (UTC)
Message-Id: <200511161357.jAGDvBuT024892@pop.lip6.fr>
Date: Wed, 16 Nov 2005 14:57:11 +0100 (CET)
From: Manuel.Bouyer@lip6.fr (Manuel Bouyer)
Reply-To:
To: gnats-bugs@netbsd.org
Subject: panic after "vnode: table is full" with layerfs
X-Send-Pr-Version: 3.95

>Number:         32090
>Category:       kern
>Synopsis:       panic after "vnode: table is full" with layerfs
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Nov 16 13:58:00 +0000 2005
>Last-Modified:  Sun Apr 16 14:45:36 +0000 2006
>Originator:     Manuel Bouyer
>Release:        NetBSD 2.1_STABLE
>Organization:
UPMC/LIP6/ASIM

>Environment:
System: NetBSD pop.lip6.fr 2.1_STABLE NetBSD 2.1_STABLE (GENERIC.MPDEBUG) #1: Wed Nov 2 13:30:15 CET 2005 bouyer@pop.lip6.fr:/local/pop1/bouyer/tmp/i386/obj/local/pop1/bouyer/netbsd-2/src/sys/arch/i386/compile/GENERIC.MPDEBUG i386
Architecture: i386
Machine: i386
>Description:
	This SMP box is doing pkgsrc bulk builds (in 2 chroots: one with
	1.6.2 binaries, one with 2.0).
	Today the system panicked after a stream of
vnode: table is full - increase kern.maxvnodes or NVNODE
	(which may or may not be the cause of the problem):
uvm_fault(0xcdc87004, 0x35a0000, 0, 1) -> 0xe

db{0}> tr
_simple_lock(35a0200,c0755bc0,5f1,8,0) at netbsd:_simple_lock+0x41
nfs_invaldircache(cf3bb884,1,ce165cbc,c0376aeb,cf3bb90c) at netbsd:nfs_invaldircache+0x52
nfs_inactive(ce165cd4,0,ce165cec,c03c67b1,c05f4f20) at netbsd:nfs_inactive+0x17d
VOP_INACTIVE(cf3bb884,cdc5b00c,54f,cf3bb8d4,c252e520) at netbsd:VOP_INACTIVE+0x28
vrele(cf3bb884,c07b1fa0,36a,c23acdc4,cdcafc30) at netbsd:vrele+0x124
layer_reclaim(ce165d54,c075e460,0,cdcafc30,c05f4f60) at netbsd:layer_reclaim+0x91
VOP_RECLAIM(cdcafc30,cdc5b00c,11,c07bdb40,cf3bb90c) at netbsd:VOP_RECLAIM+0x28
vclean(cdcafc30,8,cdc5b00c,2000,cdc5b00c) at netbsd:vclean+0x128
vgonel(cdcafc30,cdc5b00c,6bb,c07bdb40,cdcafc30) at netbsd:vgonel+0x56
vgone(cdcafc30,0,0,c07b3558,c05f4fa0) at netbsd:vgone+0x34
layer_inactive(ce165e24,0,ce165e3c,c03c67b1,c05f4f20) at netbsd:layer_inactive+0x3b
VOP_INACTIVE(cdcafc30,cdc5b00c,54f,cdcafc80,0) at netbsd:VOP_INACTIVE+0x28
vrele(cdcafc30,ce165ed0,ce165eac,c03c7510,cdcafc30) at netbsd:vrele+0x124
layer_remove(ce165e94,cebc2970,cdc5b00c,c2431600,c05f4ce0) at netbsd:layer_remove+0x42
VOP_REMOVE(cebc2970,cdcafc30,ce165ef8,2,0) at netbsd:VOP_REMOVE+0x2e
sys_unlink(ccb5ace4,ce165f64,ce165f5c,a,0) at netbsd:sys_unlink+0xfb
syscall_plain() at netbsd:syscall_plain+0x17e
--- syscall (number 10) ---
0x48217ee7:
db{0}> mach cpu 1
using CPU 1
db{0}> tr
acquire(c0828b40,cd6c5efc,400000,0,600) at netbsd:acquire+0x49
_lockmgr(c0828b40,400002,0,c075e460,54f) at netbsd:_lockmgr+0x4c0
_kernel_proc_lock(cdcc6ef8,cd6c5f64,4,6,cdcc6ef8) at netbsd:_kernel_proc_lock+0x39
syscall_plain() at netbsd:syscall_plain+0x16f
--- syscall (number 6) ---
0x48052a47:

I have a core dump.

>How-To-Repeat:
	see above
>Fix:
	unknown

>Release-Note:

>Audit-Trail:
From: Chuck Silvers <chuq@chuq.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/32090: uvm_fault() after "vnode: table is full"
Date: Fri, 18 Nov 2005 08:41:28 -0800

 there are several other PRs about crashes after "vnode: table is full",
 and all of them also involve a layered file system:

 28705	2.0 kernel panic in layer_unlock with "vnode: table is full"
 31979	panic: lockmgr: release of unlocked lock (layerfs)
 29670	"release of unlocked lock" panic with null fs

 the last one added a tangentially-related improvement (falling back to
 reclaiming a vnode from the hold list in the case that there are vnodes
 on the free list but they're all busy).  this makes it much less likely
 that this problem would be seen, but it doesn't fix the bug responsible
 for the crash.  this change should be pulled up to the 2.x branches,
 though.

 in this dump, the vnode being processed by nfs_inactive() is actually an
 FFS vnode, so most likely it has been reused while this thread was sleeping.
 the vnode is not locked, even though it should be while VOP_INACTIVE() is
 still in progress.  vrele() puts the vnode on the freelist before calling
 VOP_INACTIVE(), so the vnode being incorrectly unlocked while this thread
 was sleeping would also allow the vnode to be reused too early like this.

 there are some related fixes in revs 1.19 and 1.21 of layer_vnops.c:

 ----------------------------
 revision 1.21
 date: 2004/06/16 17:59:53;  author: wrstuden;  state: Exp;  lines: +5 -5
 Make sure we actually locked the parent vnode before we clear
 PDIRUNLOCK. The whole reason we have the flag is to note (rare)
 cases where we are supposed to have the parent directory locked
 but don't. Permits error handling code to know what to do with
 the parent vnode (vrele() vs vput()).
 ----------------------------
 ...
 ----------------------------
 revision 1.19
 date: 2004/06/16 12:37:01;  author: yamt;  state: Exp;  lines: +14 -3
 missing error recover from layer_node_create failure.
 ----------------------------

 both of these changes should also be pulled up to 2.x.
 it doesn't look to me that these changes fix the problem in these PRs, though.
 without these changes, the symptom would have been vnodes left locked when
 they should have been unlocked, which is the opposite of what we're seeing
 here.

 -Chuck

>Unformatted:
