NetBSD Problem Report #49530

From martin@duskware.de  Sat Jan  3 14:10:33 2015
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id A2E01A654D
	for <gnats-bugs@gnats.NetBSD.org>; Sat,  3 Jan 2015 14:10:33 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: kernel crash with corrupt stack/invalid backtrace
X-Send-Pr-Version: 3.95

>Number:         49530
>Category:       kern
>Synopsis:       kernel crash with corrupt stack/invalid backtrace
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jan 03 14:15:00 +0000 2015
>Last-Modified:  Tue Feb 10 00:25:00 +0000 2015
>Originator:     Martin Husemann
>Release:        NetBSD 7.99.4
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD thirdstage.duskware.de 7.99.4 NetBSD 7.99.4 (MODULAR) #243: Fri Jan 2 18:46:49 CET 2015 martin@thirdstage.duskware.de:/usr/src/sys/arch/sparc64/compile/MODULAR sparc64
Architecture: sparc64
Machine: sparc64
>Description:

I got this twice after recently updating my machine: once overnight
(probably during the nightly runs; the machine was otherwise idle), and again
during a "cvs up" (machine otherwise idle).

The machine drops into ddb, but the stack backtrace on both CPUs is empty.
The register window dump, however, is interesting:

text_access_fault: pc=0 va=0
kernel trap 64: +fast instruction access MMU miss
Stopped in pid 0.9 (system) at  0:      undefined
db{0}> mach stack                                
Window 0 frame64 0x1b027ba80 locals, ins:
103af9660 2ac8 0 0 17e73a0 17ac220 17ac1e8 17bf678
1052fb8f0 20002 10 4 104754008 1749ee0 1b027b351=sp 156d73c=pc:netbsd:vn_lock+0x7c
Window 1 frame64 0x1b027bb50 locals, ins:
18310f0 17ac1e8 17ac220 17ac220 1816908 17ac220 17ac1e8 2014000
1052fb8f0 20002 103b558e0 1ce4000 104754008 0 1b027b401=sp 1569624=pc:netbsd:vclean+0x84
Window 2 frame64 0x1b027bc00 locals, ins:
0 1c16400 1c98000 1 4 17ba020 0 6        
1052fb8f0 103b558e0 103b558e0 0 103b558e0 1 1b027b4c1=sp 156ad1c=pc:netbsd:cleanvnode+0x13c
Window 3 frame64 0x1b027bcc0 locals, ins:
17ac220 17ac1e8 1831558 18313d8 0 17ac220 17ac1e8 17bf678
0 1831288 1c95400 1c95580 1c95580 1052fb8f0 1b027b571=sp 156aee4=pc:netbsd:vdrain_thread+0x84
Window 4 frame64 0x1b027bd70 locals, ins:
103b44420 1b0278000 deaddead 1b027bed0 1c6f868 1b0270000 e0048000 e0048000
1c95500 1831598 1c0a154 1c95600 1c0a140 1c955c0 1b027b621=sp 1011fb8=pc:netbsd:lwp_trampoline+0x8
Window 5 frame64 0x1b027be20 locals, ins:
156ae60 103b558e0 103b558e0 1fff 28 30 3 2014000
f005eab8 113668 116000 1 0 fedc9c48 fedc9651=sp 100950=pc:100950
Window 6 frame64 0xfedc9e50 locals, ins:                        
fff72020 10f8 0 f0068080 10000 1001f59e0 0 7fff001927c
0 0 0 0 f005eab8 0 fedc9701=sp 0=pc:0                 
Window 7 frame64 0xfedc9f00 locals, ins:
0 0 0 0 0 0 0 0                         
0 0 0 0 0 0 0=sp 0=pc:0
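
A note on reading the dump: the sp values are odd because the 64-bit SPARC
ABI biases stack pointers by 2047. Adding the bias to the sp shown in one
window gives the frame64 address of the next window (for example
0x1b027b351 + 0x7ff = 0x1b027bb50). A minimal sketch of that arithmetic; the
helper names are illustrative and not taken from ddb:

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit SPARC ABI: %sp and %fp hold the frame address minus 2047. */
#define STACK_BIAS 2047

/* Turn a biased %sp/%fp value into the real address of the frame. */
static uint64_t
unbias(uint64_t biased_sp)
{
	return biased_sp + STACK_BIAS;
}

/* Frames are 16-byte aligned, so a biased stack pointer is always odd. */
static int
looks_biased(uint64_t sp)
{
	return (sp & 1) != 0;
}
```

Applying unbias() to each sp in turn walks the chain of frames by hand when
the ddb backtrace itself comes up empty.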



>How-To-Repeat:
n/a

>Fix:
n/a

>Audit-Trail:
From: Martin Husemann <martin@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/49530: kernel crash with corrupt stack/invalid backtrace
Date: Sat, 3 Jan 2015 14:23:35 +0000

 Forgot to add mount info:

 /dev/sd0a on / type ffs (log, local)
 kernfs on /kern type kernfs (local)
 ptyfs on /dev/pts type ptyfs (local)
 tmpfs on /tmp type tmpfs (local)
 tmpfs on /var/shm type tmpfs (local)
 procfs on /proc type procfs (local)

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/49530: kernel crash with corrupt stack/invalid backtrace
Date: Sat, 10 Jan 2015 11:30:41 +0100

 And again a crash from the vdrain thread:

 db{0}> mach stack
 Window 0 frame64 0x1b027ba80 locals, ins:
 103b458c0 1b0278000 ffffffffffffffff 1b027bed0 1c6f988 1b018a000 e0048000 e0048000
 1052190a0 20002 10 4 1048bb008 174b7a0 1b027b351=sp 156eefc=pc:netbsd:vn_lock+0x7c
 Window 1 frame64 0x1b027bb50 locals, ins:
 1832b50 17adac0 17adaf8 17adaf8 1818338 17adaf8 17adac0 0
 1052190a0 20002 103b558e0 1ce4400 1048bb008 0 1b027b401=sp 156ade4=pc:netbsd:vclean+0x84
 Window 2 frame64 0x1b027bc00 locals, ins:
 0 0 0 0 17e8d68 17adaf8 17adac0 17c0f80  
 1052190a0 103b558e0 103b558e0 0 103b558e0 1 1b027b4c1=sp 156c4dc=pc:netbsd:cleanvnode+0x13c
 Window 3 frame64 0x1b027bcc0 locals, ins:
 17adaf8 17adac0 1832fb8 1832e38 0 17adaf8 17adac0 17c0f80
 0 1832ce8 1c95400 1c95740 1c95740 1052190a0 1b027b571=sp 156c6a4=pc:netbsd:vdrain_thread+0x84
 Window 4 frame64 0x1b027bd70 locals, ins:
 103b44420 1b0278000 deaddead 1b027bed0 1c6f988 1b0270000 e0048000 e0048000
 1c956c0 1832ff8 1c0a1fc 1c957c0 1c0a1e8 1c95780 1b027b621=sp 1011fb8=pc:netbsd:lwp_trampoline+0x8

 Sounds to me like a somehow corrupted vnode on the cleanlist?

 The crash happens in vn_lock() when calling VOP_LOCK(), probably the

 	error = (VCALL(vp, VOFFSET(vop_lock), &a));

 going wrong. Is there an easy way to assert that the vop_lock points to
 a kernel text address?
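
One way such a check could look is a plain range test on the function
pointer before the indirect call. This is a userland sketch only: the
text_range structure and in_text() are hypothetical names, and in a real
kernel the bounds would have to come from whatever symbols the linker
provides for the text segment, with the test wrapped in KASSERT().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the kernel text segment. */
struct text_range {
	uint64_t start;	/* first byte of text */
	uint64_t end;	/* one past the last byte of text */
};

/* Return nonzero if addr lies inside [start, end). */
static int
in_text(const struct text_range *r, uint64_t addr)
{
	return addr >= r->start && addr < r->end;
}
```

With made-up bounds of 0x1000000..0x1800000, the pc 0x156d73c from the dump
would pass and the faulting pc of 0 would fail.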

 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/49530: kernel crash with corrupt stack/invalid backtrace
Date: Sat, 10 Jan 2015 11:56:03 +0100

 Could this just be a NULL vnode pointer used with VCALL?

 Martin

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/49530: kernel crash with corrupt stack/invalid backtrace
Date: Tue, 10 Feb 2015 00:22:42 +0000

 On Sat, Jan 10, 2015 at 11:00:01AM +0000, Martin Husemann wrote:
  >  Could this just be a NULL vnode pointer used with VCALL?

 If it were a null vnode you'd get a fault reading the ops table before
 it jumped anywhere.

 (Is it expected that jumping to null loses the stack backtrace? That
 seems pretty feeble of ddb.)

 However, it does look like it jumped to null, so a reasonable
 conclusion is that it got a null function pointer out of the ops
 table... is it feasible to figure out the vnode address from the
 window dump and do "show vnode" on it? It is probably in a register
 but you probably need to disassemble the indirect call logic to figure
 out which one.
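
The distinction drawn above can be sketched with a simplified model of the
indirect call. Everything here is a stand-in (model_vnode, classify_dispatch
and the enum are hypothetical, not kernel code); it only shows at which step
each kind of bad pointer would be noticed.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*vop_t)(void *);

/* Simplified stand-in for struct vnode: only the ops-vector pointer. */
struct model_vnode {
	vop_t *v_op;
};

enum dispatch_result {
	FAULT_READING_VNODE,	/* vp == NULL: data fault loading v_op */
	FAULT_READING_TABLE,	/* v_op == NULL: data fault loading the slot */
	JUMP_TO_NULL,		/* slot == NULL: the call itself lands on pc=0 */
	DISPATCH_OK
};

/* Dummy operation used to fill a valid slot. */
static int
dummy_lock(void *v)
{
	(void)v;
	return 0;
}

/* Report where a call through vp->v_op[slot] would go wrong, if at all. */
static enum dispatch_result
classify_dispatch(const struct model_vnode *vp, size_t slot)
{
	if (vp == NULL)
		return FAULT_READING_VNODE;
	if (vp->v_op == NULL)
		return FAULT_READING_TABLE;
	if (vp->v_op[slot] == NULL)
		return JUMP_TO_NULL;
	return DISPATCH_OK;
}
```

Only the JUMP_TO_NULL case matches a text_access_fault with pc=0 and va=0;
the first two would show up as data access faults instead.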

 There are also only a few vnode ops tables (especially since you
 probably aren't using any of the obscure fses) so you might try having
 it check vn->v_op and print or bail if it's not one of the tables
 belonging to one of the fses you're using.
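
A rough sketch of such a check, written as a userland model: the two
model_*_ops arrays are hypothetical placeholders for the real per-filesystem
ops vectors, which would be listed in their place for the file systems
mounted on the machine.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*vop_fn_t)(void *);

/* Hypothetical stand-ins for the per-filesystem ops vectors. */
static vop_fn_t model_ffs_ops[1];
static vop_fn_t model_tmpfs_ops[1];

/* The tables we expect to see on this machine. */
static vop_fn_t *const known_ops[] = { model_ffs_ops, model_tmpfs_ops };

/* Return nonzero if v_op is one of the expected ops vectors. */
static int
is_known_ops(const vop_fn_t *v_op)
{
	size_t i;

	for (i = 0; i < sizeof(known_ops) / sizeof(known_ops[0]); i++) {
		if (known_ops[i] == v_op)
			return 1;
	}
	return 0;
}
```

In the kernel the result would be printed or asserted before the call, so a
vnode with an unexpected table is caught while its address is still known.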

 Another wild guess: assert in vn_lock that the vnode isn't a marker
 vnode. ((vn->v_iflag & VI_MARKER) == 0)
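
As a self-contained illustration of that test (the flag value and the
structure are placeholders, not the definitions from <sys/vnode.h>):

```c
#include <assert.h>

/* Placeholder bit; the real VI_MARKER value comes from <sys/vnode.h>. */
#define MODEL_VI_MARKER 0x8000

/* Simplified stand-in for struct vnode: only the internal flags word. */
struct marker_vnode {
	int v_iflag;
};

/* Return nonzero if the vnode is a list marker rather than a real vnode. */
static int
is_marker(const struct marker_vnode *vp)
{
	return (vp->v_iflag & MODEL_VI_MARKER) != 0;
}
```

In vn_lock() this would presumably take the form of a KASSERT on the negated
condition, placed ahead of the VOP_LOCK() call.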

 -- 
 David A. Holland
 dholland@netbsd.org

Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.