NetBSD Problem Report #45854

From martin@aprisoft.de  Wed Jan 18 08:31:52 2012
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 30E4463BF06
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 18 Jan 2012 08:31:52 +0000 (UTC)
Message-Id: <20120118083143.52113AF580F@emmas.aprisoft.de>
Date: Wed, 18 Jan 2012 09:31:43 +0100 (CET)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@gnats.NetBSD.org
Subject: vnode clean list corruption?
X-Send-Pr-Version: 3.95

>Number:         45854
>Category:       kern
>Synopsis:       vnode clean list corruption?
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jan 18 08:35:00 +0000 2012
>Last-Modified:  Wed Nov 07 23:40:03 +0000 2012
>Originator:     Martin Husemann
>Release:        NetBSD 5.99.60
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD after-hours.aprisoft.de 5.99.60 NetBSD 5.99.60 (MODULAR) #97: Tue Jan 17 10:51:06 CET 2012 martin@after-hours.aprisoft.de:/usr/src/sys/arch/sparc64/compile/MODULAR sparc64
Architecture: sparc64
Machine: sparc64
>Description:

I get a (more or less reproducible, but varying in details) crash during
vnode cleaning; it seems the vnode is not quite correct:

trap type 0x10: cpu 0, pc=122d4ae4 npc=ffffffffdbff1708 pstate=0x820006<PRIV,IE>
kernel trap 10: illegal instruction                                            
Stopped in pid 0.9 (system) at  122d4ae4:       illtrap         11adcb64
db{0}> bt                                                               
vinvalbuf(12192a10, 1, ffffffffffffffff, 39d4000, 0, 0) at netbsd:vinvalbuf+0x58
vclean(12192a10, 8, 0, 0, 17b63c8, 17b6150) at netbsd:vclean+0x1f8
cleanvnode(17b6510, 17b6180, 1739f08, 1739f40, 17b6150, 1885000) at netbsd:cleanvnode+0x13c
vdrain_thread(39d4000, 39d4000, 0, 1884b08, 7d, e0018000) at netbsd:vdrain_thread+0x90
lwp_trampoline(f0059840, 19afaa0, 1124c8, 114400, 114000, 111ba8) at netbsd:lwp_trampoline+0x8
db{0}> show vnode 0x12192a10
OBJECT 0x12192a10: locked=0, pgops=0x172d898, npages=0, refs=-2147483647

VNODE flags 0x1010<MPSAFE,XLOCK>
mp 0x0 numoutput 0 size 0xffffffffffffffff writesize 0xffffffffffffffff
data 0x3baa900 writecount 0 holdcnt 0                                  
tag VT_MFS(3) type VBLK(3) mount 0x0 typedata 0xd4791c0
v_lock 0x12192b20                                      
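
A note for reading the dump above: the refs value is easier to interpret as
an unsigned 32-bit number. The sketch below assumes the use count carries a
"being cleaned" flag in its top bit (VC_XLOCK, with VC_MASK selecting the
count) and that an all-ones size means "not set". Both constants are written
from memory of sys/vnode.h and should be checked against the tree in
question; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, modelled on sys/vnode.h of that era; verify before relying on them. */
#define VC_XLOCK 0x80000000u	/* assumed: flag bit stored in the use count */
#define VC_MASK  0x7fffffffu	/* assumed: mask selecting the reference count */

/* Return true if the flag bit is set in a use count printed as a signed int. */
static bool usecount_xlocked(int32_t refs)
{
	return ((uint32_t)refs & VC_XLOCK) != 0;
}

/* Return the reference count with the flag bit stripped. */
static uint32_t usecount_refs(int32_t refs)
{
	return (uint32_t)refs & VC_MASK;
}

/* Return true if a size field holds the all-ones "not set" pattern. */
static bool size_not_set(uint64_t size)
{
	return size == UINT64_MAX;
}

/* Apply the helpers to the values printed by "show vnode" in this report. */
static void check_reported_values(void)
{
	assert(usecount_xlocked(-2147483647));
	assert(usecount_refs(-2147483647) == 1);
	assert(size_not_set(0xffffffffffffffffULL));
}
```

Under those assumptions, refs=-2147483647 is 0x80000001, i.e. the flag bit
plus a count of 1, which would agree with the XLOCK flag shown in the VNODE
flags line.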


In a previous instance I saw a bogus puffs vnode instead. This time I
modunload'ed puffs after the test runs. Not sure how VT_MFS
comes into play here, I only have ffs, nfs, ptyfs, procfs and kernfs in
use (but see below, during a test run various other things could have been
used).

>How-To-Repeat:

Not sure; it seems to happen reliably overnight if I leave the machine running
after doing a full atf test run.

>Fix:
n/a

>Release-Note:

>Audit-Trail:
From: Nat Sloss <nathanialsloss@yahoo.com.au>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/45854
Date: Sat, 7 Jul 2012 15:02:02 +1000

 Hi.

 I think I'm having the same trouble on 6.99.7 i386.

 It happens at random for me.  It happened whilst copying, sometimes after a 
 long while (copying 120GB), sometimes after a short while (copying 50MB).

 This is the back trace:

 uvm_fault(0xc0c297a0, 0, 1) -> 0xe
 fatal page fault in supervisor mode
 trap type 6 code 0 eip c0326575 cs 8 eflags 210213 cr2 2c ilevel 0
 db{2}> bt
 genfs_do_putpages(c6112f24,0,0,0,0,201b,0,c6112f24,0,201b) at netbsd:genfs_do_putpages+0x43a
 genfs_putpages(dbaf3c18,0,0,0,1,0,c4214aa0,c4214aa0,0,c0ad059c) at netbsd:genfs_putpages+0x3f
 VOP_PUTPAGES(c6112f24,0,0,0,0,201b,c0bf63c0,c852fec0,8,c4214aa0) at netbsd:VOP_PUTPAGES+0x68
 vinvalbuf(c6112f24,1,ffffffff,c4214aa0,0,0,0,c421ef8c,0,0) at netbsd:vinvalbuf+0x62
 vclean(c6112f24,8,c4214aa0,c4214aa0,c4214aa0,c08552aa,dbaf3d2c,c085530d,c0bf63c0,c0bf63c0) at netbsd:vclean+0x200
 cleanvnode(c0bf63c0,c0bf63c0,64,c4214aa0,c08552aa,0,0,c0100321,c4214aa0,db7000) at netbsd:cleanvnode+0x9a
 vdrain_thread(c4214aa0,db7000,dbf000,0,c0100307,0,0,0,0,0) at netbsd:vdrain_thread+0x63

 It seems to read a null vnode, maybe.  This occurs on an ffs v2 
 filesystem.

 Just a thought: I don't know where the null values are coming from, but would 
 it be possible to avoid adding null vnodes to the free list, or to remove 
 null entries from the free list?
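
 The idea of refusing null entries at insertion time can be sketched as
 follows. This is a minimal illustration only: struct node, struct freelist
 and both functions are hypothetical stand-ins, not the kernel's vnode or
 free list code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's vnode or free list types. */
struct node {
	struct node *next;
	struct node *prev;
	bool on_list;
};

struct freelist {
	struct node *head;
	struct node *tail;
};

/* Append n to the tail; refuse NULL and nodes that are already queued. */
static bool freelist_insert(struct freelist *fl, struct node *n)
{
	assert(fl != NULL);
	if (n == NULL || n->on_list)
		return false;
	n->next = NULL;
	n->prev = fl->tail;
	if (fl->tail != NULL)
		fl->tail->next = n;
	else
		fl->head = n;
	fl->tail = n;
	n->on_list = true;
	return true;
}

/* Detach and return the head, or NULL when the list is empty. */
static struct node *freelist_pop(struct freelist *fl)
{
	assert(fl != NULL);
	struct node *n = fl->head;
	if (n == NULL)
		return NULL;
	fl->head = n->next;
	if (fl->head != NULL)
		fl->head->prev = NULL;
	else
		fl->tail = NULL;
	n->next = n->prev = NULL;
	n->on_list = false;
	return n;
}
```

 In an intrusive list like this one the links live inside the node itself, so
 there is no slot that could hold a null entry. A null pointer seen by a
 consumer would therefore more likely come from a damaged link or a reused
 node than from a null insertion, and a guard of this kind would probably
 hide the symptom rather than fix the cause.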

 Regards,

 Nat.

From: Nat Sloss <nathanialsloss@yahoo.com.au>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/45854
Date: Thu, 8 Nov 2012 10:27:10 +1100

 On Sat, 7 July 2012 15:02:02 you wrote:
 > Hi.
 >
 > I think I'm having the same trouble on 6.99.7 i386.
 >
 > It happens at random for me.  It happened whilst copying, sometimes after a
 > long while (copying 120GB), sometimes after a short while (copying 50MB).
 >
 > This is the back trace:
 >
 > uvm_fault(0xc0c297a0, 0, 1) -> 0xe ...

 Hi.

 I've since found out that my new computer has had motherboard and memory 
 failures (I'll have to wait five weeks for warranty replacements) and this is 
 the cause of my crash related to kern/45854.

 So please ignore my previous addition to this PR.

 Regards,

 Nat.

>Unformatted:
