NetBSD Problem Report #32090
From Manuel.Bouyer@lip6.fr Wed Nov 16 13:57:15 2005
Return-Path: <Manuel.Bouyer@lip6.fr>
Received: from isis.lip6.fr (isis.lip6.fr [132.227.60.2])
by narn.netbsd.org (Postfix) with ESMTP id 58D7763B8CA
for <gnats-bugs@gnats.NetBSD.org>; Wed, 16 Nov 2005 13:57:14 +0000 (UTC)
Message-Id: <200511161357.jAGDvBuT024892@pop.lip6.fr>
Date: Wed, 16 Nov 2005 14:57:11 +0100 (CET)
From: Manuel.Bouyer@lip6.fr (Manuel Bouyer)
Reply-To:
To: gnats-bugs@netbsd.org
Subject: panic after "vnode: table is full" with layerfs
X-Send-Pr-Version: 3.95
>Number: 32090
>Category: kern
>Synopsis: panic after "vnode: table is full" with layerfs
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Nov 16 13:58:00 +0000 2005
>Last-Modified: Sun Apr 16 14:45:36 +0000 2006
>Originator: Manuel Bouyer
>Release: NetBSD 2.1_STABLE
>Organization:
UPMC/LIP6/ASIM
>Environment:
System: NetBSD pop.lip6.fr 2.1_STABLE NetBSD 2.1_STABLE (GENERIC.MPDEBUG) #1: Wed Nov 2 13:30:15 CET 2005 bouyer@pop.lip6.fr:/local/pop1/bouyer/tmp/i386/obj/local/pop1/bouyer/netbsd-2/src/sys/arch/i386/compile/GENERIC.MPDEBUG i386
Architecture: i386
Machine: i386
>Description:
This SMP box is doing pkgsrc bulk builds (in 2 chroots: one with
1.6.2 binaries, one with 2.0).
Today the system panicked after a stream of
vnode: table is full - increase kern.maxvnodes or NVNODE
(which may or may not be the cause of the problem):
uvm_fault(0xcdc87004, 0x35a0000, 0, 1) -> 0xe
db{0}> tr
_simple_lock(35a0200,c0755bc0,5f1,8,0) at netbsd:_simple_lock+0x41
nfs_invaldircache(cf3bb884,1,ce165cbc,c0376aeb,cf3bb90c) at netbsd:nfs_invaldircache+0x52
nfs_inactive(ce165cd4,0,ce165cec,c03c67b1,c05f4f20) at netbsd:nfs_inactive+0x17d
VOP_INACTIVE(cf3bb884,cdc5b00c,54f,cf3bb8d4,c252e520) at netbsd:VOP_INACTIVE+0x28
vrele(cf3bb884,c07b1fa0,36a,c23acdc4,cdcafc30) at netbsd:vrele+0x124
layer_reclaim(ce165d54,c075e460,0,cdcafc30,c05f4f60) at netbsd:layer_reclaim+0x91
VOP_RECLAIM(cdcafc30,cdc5b00c,11,c07bdb40,cf3bb90c) at netbsd:VOP_RECLAIM+0x28
vclean(cdcafc30,8,cdc5b00c,2000,cdc5b00c) at netbsd:vclean+0x128
vgonel(cdcafc30,cdc5b00c,6bb,c07bdb40,cdcafc30) at netbsd:vgonel+0x56
vgone(cdcafc30,0,0,c07b3558,c05f4fa0) at netbsd:vgone+0x34
layer_inactive(ce165e24,0,ce165e3c,c03c67b1,c05f4f20) at netbsd:layer_inactive+0x3b
VOP_INACTIVE(cdcafc30,cdc5b00c,54f,cdcafc80,0) at netbsd:VOP_INACTIVE+0x28
vrele(cdcafc30,ce165ed0,ce165eac,c03c7510,cdcafc30) at netbsd:vrele+0x124
layer_remove(ce165e94,cebc2970,cdc5b00c,c2431600,c05f4ce0) at netbsd:layer_remove+0x42
VOP_REMOVE(cebc2970,cdcafc30,ce165ef8,2,0) at netbsd:VOP_REMOVE+0x2e
sys_unlink(ccb5ace4,ce165f64,ce165f5c,a,0) at netbsd:sys_unlink+0xfb
syscall_plain() at netbsd:syscall_plain+0x17e
--- syscall (number 10) ---
0x48217ee7:
db{0}> mach cpu 1
using CPU 1
db{0}> tr
acquire(c0828b40,cd6c5efc,400000,0,600) at netbsd:acquire+0x49
_lockmgr(c0828b40,400002,0,c075e460,54f) at netbsd:_lockmgr+0x4c0
_kernel_proc_lock(cdcc6ef8,cd6c5f64,4,6,cdcc6ef8) at netbsd:_kernel_proc_lock+0x39
syscall_plain() at netbsd:syscall_plain+0x16f
--- syscall (number 6) ---
0x48052a47:
I have a core dump.
>How-To-Repeat:
see above
>Fix:
unknown
>Release-Note:
>Audit-Trail:
From: Chuck Silvers <chuq@chuq.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/32090: uvm_fault() after "vnode: table is full"
Date: Fri, 18 Nov 2005 08:41:28 -0800
there are several other PRs about crashes after "vnode: table is full",
and all of them also involve a layered file system:
28705 2.0 kernel panic in layer_unlock with "vnode: table is full"
31979 panic: lockmgr: release of unlocked lock (layerfs)
29670 "release of unlocked lock" panic with null fs
the last one added a tangentially-related improvement (falling back to
reclaiming a vnode from the hold list in the case that there are vnodes
on the free list but they're all busy). this makes it much less likely
that this problem would be seen, but it doesn't fix the bug responsible
for the crash. this change should be pulled up to the 2.x branches,
though.
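
To illustrate the fallback described above, here is a minimal user-space sketch in C. It is only a toy model, not the kernel's vnode allocator: the names fl_vnode, fl_pick, fl_getvnode and fl_demo are hypothetical, and "busy" stands in for whatever makes a free-list vnode unusable at that moment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a vnode; this is NOT the kernel's struct vnode. */
struct fl_vnode {
	int id;
	bool busy;		/* cannot be reclaimed right now */
	struct fl_vnode *next;
};

/* Return the first non-busy vnode on the list, or NULL if there is none. */
const struct fl_vnode *
fl_pick(const struct fl_vnode *list)
{
	for (const struct fl_vnode *vp = list; vp != NULL; vp = vp->next)
		if (!vp->busy)
			return vp;
	return NULL;
}

/*
 * Try the free list first.  If it holds vnodes but all of them are busy,
 * fall back to the hold list instead of giving up.  A NULL result
 * corresponds to the "table is full" case.
 */
const struct fl_vnode *
fl_getvnode(const struct fl_vnode *freelist, const struct fl_vnode *holdlist)
{
	const struct fl_vnode *vp = fl_pick(freelist);

	if (vp == NULL)
		vp = fl_pick(holdlist);
	assert(vp == NULL || !vp->busy);
	return vp;
}

/* Build a two-entry free list and a one-entry hold list; return the id
 * of the vnode chosen, or -1 if none could be reclaimed. */
int
fl_demo(bool free_busy, bool hold_busy)
{
	struct fl_vnode held = { 3, hold_busy, NULL };
	struct fl_vnode f2 = { 2, free_busy, NULL };
	struct fl_vnode f1 = { 1, free_busy, &f2 };
	const struct fl_vnode *vp = fl_getvnode(&f1, &held);

	return vp != NULL ? vp->id : -1;
}
```

In this model fl_demo(true, false) picks the hold-list vnode, while fl_demo(true, true) still fails, which matches the point that the fallback makes the failure rarer without removing it.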
in this dump, the vnode being processed by nfs_inactive() is actually an
FFS vnode, so most likely it has been reused while this thread was sleeping.
the vnode is not locked, even though it should be while VOP_INACTIVE() is
still in process. vrele() puts the vnode on the freelist before calling
VOP_INACTIVE(), so the vnode being incorrectly unlocked while this thread
was sleeping would also allow the vnode to be reused too early like this.
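
The ordering problem described above can be sketched as a small deterministic model in C. This is an assumption-laden illustration, not kernel code: race_vnode, race_try_recycle, race_vrele_last and race_demo are hypothetical names, and the "other thread" is simulated by a direct call at the point where the inactive routine would sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum race_tag { RACE_NFS, RACE_FFS };

/* Toy vnode: only the fields needed to show the race. */
struct race_vnode {
	enum race_tag tag;	/* which file system currently owns it */
	bool locked;
	bool on_freelist;
};

/* Toy allocator: a vnode may be recycled only if it is on the free
 * list and nobody holds its lock. */
bool
race_try_recycle(struct race_vnode *vp, enum race_tag newtag)
{
	assert(vp != NULL);
	if (!vp->on_freelist || vp->locked)
		return false;
	vp->tag = newtag;
	vp->on_freelist = false;
	return true;
}

/*
 * Model of dropping the last reference: the vnode goes on the free list
 * BEFORE the inactive routine runs.  lock_kept says whether the vnode
 * lock is still held while the inactive routine sleeps.  Returns the tag
 * the inactive routine sees when it wakes up.
 */
enum race_tag
race_vrele_last(struct race_vnode *vp, bool lock_kept)
{
	assert(vp != NULL);
	vp->locked = true;
	vp->on_freelist = true;		/* freelisted before inactive */

	/* The inactive routine starts and goes to sleep here. */
	vp->locked = lock_kept;

	/* Meanwhile another thread wants a vnode for an FFS file. */
	(void)race_try_recycle(vp, RACE_FFS);

	/* The inactive routine wakes up and keeps using vp. */
	return vp->tag;
}

/* Run the model on a fresh NFS vnode. */
enum race_tag
race_demo(bool lock_kept)
{
	struct race_vnode v = { RACE_NFS, false, false };

	return race_vrele_last(&v, lock_kept);
}
```

With the lock kept, the inactive routine still sees an NFS vnode. With the lock dropped, it wakes up holding a vnode that now belongs to FFS, which is the situation seen in the dump.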
there are some related fixes in revs 1.19 and 1.21 of layer_vnops.c:
----------------------------
revision 1.21
date: 2004/06/16 17:59:53; author: wrstuden; state: Exp; lines: +5 -5
Make sure we actually locked the parent vnode before we clear
PDIRUNLOCK. The whole reason we have the flag is to note (rare)
cases where we are supposed to have the parent directory locked
but don't. Permits error handling code to know what to do with
the parent vnode (vrele() vs vput()).
----------------------------
...
----------------------------
revision 1.19
date: 2004/06/16 12:37:01; author: yamt; state: Exp; lines: +14 -3
missing error recover from layer_node_create failure.
----------------------------
both of these changes should also be pulled up to 2.x.
it doesn't look to me that these changes fix the problem in these PRs, though.
without these changes, the symptom would have been vnodes left locked when
they should have been unlocked, which is the opposite of what we're seeing
here.
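
The vrele() vs vput() choice mentioned in the rev 1.21 log can be illustrated with a toy model in C. Everything here is hypothetical (pd_parent, pd_vrele, pd_vput, pd_release_parent, pd_cleanup_ok, and the PD_PDIRUNLOCK value); the only thing assumed about the real routines is that vput() unlocks and releases while vrele() only releases.

```c
#include <assert.h>
#include <stdbool.h>

#define PD_PDIRUNLOCK 0x1	/* hypothetical value: parent is NOT locked */

/* Toy parent directory vnode. */
struct pd_parent {
	int refs;
	bool locked;
	int bad_unlocks;	/* counts "release of unlocked lock" events */
};

/* Drop a reference only. */
void
pd_vrele(struct pd_parent *dvp)
{
	assert(dvp->refs > 0);
	dvp->refs--;
}

/* Unlock, then drop a reference. */
void
pd_vput(struct pd_parent *dvp)
{
	assert(dvp->refs > 0);
	if (!dvp->locked)
		dvp->bad_unlocks++;	/* a real kernel would panic here */
	dvp->locked = false;
	dvp->refs--;
}

/* Error path: the flag tells us which release routine is correct. */
void
pd_release_parent(struct pd_parent *dvp, int flags)
{
	if (flags & PD_PDIRUNLOCK)
		pd_vrele(dvp);
	else
		pd_vput(dvp);
}

/* True if the parent ends up unreferenced, unlocked, and was never
 * unlocked while already unlocked. */
bool
pd_cleanup_ok(bool parent_locked, int flags)
{
	struct pd_parent dvp = { 1, parent_locked, 0 };

	pd_release_parent(&dvp, flags);
	return dvp.refs == 0 && !dvp.locked && dvp.bad_unlocks == 0;
}
```

In this model a flag that says "unlocked" while the parent is really locked leaves the vnode locked (the symptom described above for kernels without these changes), and the reverse mismatch produces an unlock of an unlocked lock.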
-Chuck
>Unformatted: