NetBSD Problem Report #44809

From martin@aprisoft.de  Thu Mar 31 07:44:53 2011
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id EDC1163BC20
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 31 Mar 2011 07:44:52 +0000 (UTC)
Message-Id: <20110331074445.10B08AF580E@emmas.aprisoft.de>
Date: Thu, 31 Mar 2011 09:44:45 +0200 (CEST)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@gnats.NetBSD.org
Subject: strange vclean() crash
X-Send-Pr-Version: 3.95

>Number:         44809
>Category:       kern
>Synopsis:       strange vclean() crash
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Mar 31 07:45:00 +0000 2011
>Closed-Date:    Wed Oct 12 13:00:06 +0000 2016
>Last-Modified:  Wed Oct 12 13:00:06 +0000 2016
>Originator:     Martin Husemann
>Release:        NetBSD 5.99.48
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD after-hours.aprisoft.de 5.99.48 NetBSD 5.99.48 (MODULAR) #36: Wed Mar 30 11:57:17 CEST 2011 martin@after-hours.aprisoft.de:/usr/src/sys/arch/sparc64/compile/MODULAR sparc64
Architecture: sparc64
Machine: sparc64
>Description:

Every now and then this machine crashes, and it always looks like the same
way:

trap type 0x34: cpu 0, pc=15176e4 npc=15176e8 pstate=0x820006<PRIV,IE>
kernel trap 34: mem address not aligned
Stopped in pid 10502.1 (find) at        netbsd:VOP_LOCK+0x64:   jmpl            [%g1 + %g0], %o7
db{0}> bt
vclean(f242b40, 8, 0, 0, 96, 0) at netbsd:vclean+0xa8
getcleanvnode(f242b40, 0, f277400, 15, 15, 18cd400) at netbsd:getcleanvnode+0x15c
getnewvnode(1, d6e6030, d5e0830, 114d55f0, d1f0454, 0) at netbsd:getnewvnode+0x74
ffs_vget(d6e6030, 3bcd72, 114d5730, 4, 4000, d1f04f0) at netbsd:ffs_vget+0x20
ufs_lookup(0, 2c4, 300, 3fff, 2, 2) at netbsd:ufs_lookup+0x740
VOP_LOOKUP(17eb4a60, 114d5b40, 114d5b68, 179af10, badcafe, 0) at netbsd:VOP_LOOKUP+0xac
do_lookup(f277400, 114d5b20, 10, 10, 0, 114d58e8) at netbsd:do_lookup+0x48c
namei(114d5b20, 114d5b98, badcafe, 114d5b20, badcafe, badcafe) at netbsd:namei+0x14c
do_sys_stat(0, 0, 114d5c68, badcafe, badcafe, badcafe) at netbsd:do_sys_stat+0x38
sys___lstat50(f277400, 114d5dc0, 114d5e00, 4074f6a0, 4093f160, 4093f138) at netbsd:sys___lstat50+0x10
syscall_plain(114d5ed0, 114d5f50, 40744a88, 24f, 40744a88, c00) at netbsd:syscall_plain+0x138
?(40a02880, 40a028b0, 0, 1, 0, 40a203a0) at 0x1008f58
db{0}> show vnode 0xf242b40
OBJECT 0xf242b40: locked=0, pgops=0x170e708, npages=0, refs=-2147483647

VNODE flags 0x1010<MPSAFE,XLOCK>
mp 0x0 numoutput 0 size 0xffffffffffffffff writesize 0xffffffffffffffff
data 0x10073910 writecount 0 holdcnt 0
tag VT_MFS(3) type VBLK(3) mount 0x0 typedata 0x100a3cd0
v_lock 0xf242c48

The crash happens here:

netbsd:VOP_LOCK+0x5c:   ldx             [%i0 + 0x98], %g2
netbsd:VOP_LOCK+0x60:   ldx             [%g2 + 0xf8], %g1
netbsd:VOP_LOCK+0x64:   jmpl            [%g1 + %g0], %o7
netbsd:VOP_LOCK+0x68:   add             %fp, 0x7d7, %o0

%i0 is clearly bogus:
i0          0x2000

so we end up with garbage:
g1          0x39d77614b250ef8d
g2          0xe78ee10


In source terms, this is at:
(gdb) list *(VOP_LOCK+0x64)
0x15176e4 is in VOP_LOCK (../../../../kern/vnode_if.c:1103).
1098            a.a_desc = VDESC(vop_lock);
1099            a.a_vp = vp;
1100            a.a_flags = flags;
1101            mpsafe = (vp->v_vflag & VV_MPSAFE);
1102            if (!mpsafe) { KERNEL_LOCK(1, curlwp); }
1103            error = (VCALL(vp, VOFFSET(vop_lock), &a));
1104            if (!mpsafe) { KERNEL_UNLOCK_ONE(curlwp); }
1105            return error;
1106    }

and called from:

(gdb) list *(vclean+0xa8)
0x1502968 is in vclean (../../../../kern/vfs_subr.c:1854).
1849            vp->v_iflag &= ~(VI_TEXT|VI_EXECMAP);
1850            active = (vp->v_usecount & VC_MASK) > 1;
1851    
1852            /* XXXAD should not lock vnode under layer */
1853            mutex_exit(&vp->v_interlock);
1854            VOP_LOCK(vp, LK_EXCLUSIVE);
1855    
1856            /*
1857             * Clean out any cached data associated with the vnode.
1858             * If purging an active vnode, it must be closed and

These are all the mounts involved:

/dev/sd0a on / type ffs (log, local)
kernfs on /kern type kernfs (local)
ptyfs on /dev/pts type ptyfs (local)
procfs on /proc type procfs (local)

Any ideas on what to examine when it happens next time?

>How-To-Repeat:

No idea; it just happens "sometimes" for me.

>Fix:

>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 09 Oct 2016 22:05:55 +0000
State-Changed-Why:
Has this been seen since the vnode cache rewrite?


State-Changed-From-To: feedback->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Wed, 12 Oct 2016 13:00:06 +0000
State-Changed-Why:
Have not seen this in a long time, assume it is fixed.


>Unformatted:
Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.