NetBSD Problem Report #51614

From www@NetBSD.org  Tue Nov  8 22:35:07 2016
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 423827A16D
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  8 Nov 2016 22:35:07 +0000 (UTC)
Message-Id: <20161108223505.CC7807A2C8@mollari.NetBSD.org>
Date: Tue,  8 Nov 2016 22:35:05 +0000 (UTC)
From: jdbaker@mylinuxisp.com
Reply-To: jdbaker@consolidated.net
To: gnats-bugs@NetBSD.org
Subject: vdrain/cache trap panics on NetBSD/amd64-7.0_STABLE
X-Send-Pr-Version: www-1.0

>Number:         51614
>Notify-List:    jdbaker@consolidated.net
>Category:       kern
>Synopsis:       vdrain/cache trap panics on NetBSD/amd64-7.0_STABLE
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Nov 08 22:40:00 +0000 2016
>Closed-Date:    Thu Jul 22 23:35:32 +0000 2021
>Last-Modified:  Thu Jul 22 23:40:01 +0000 2021
>Originator:     John D. Baker
>Release:        NetBSD/amd64-7.0_STABLE
>Organization:
>Environment:
NetBSD yggdrasil.technoskunk.fur 7.0_STABLE NetBSD 7.0_STABLE (YGGDRASIL) #45: Fri Oct 14 10:10:14 CDT 2016  sysop@yggdrasil.technoskunk.fur:/r0/build/netbsd-7/obj/amd64/sys/arch/amd64/compile/YGGDRASIL amd64

>Description:
My post to netbsd-users@:

Every so often, my file server panics and reboots--which it did
just a few hours ago.

The system runs a RAIDframe RAID-R set across eight 1TB SATA disks with
a single filesystem.  It also monitors a USB-attached UPS via 'apcupsd',
and serves as a slave DNS server and an NTP server.

The saved core reports, via 'crash':

$ crash -N netbsd.6 -M netbsd.6.core
Crash version 7.0_STABLE, image version 7.0_STABLE.
System panicked: trap
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NAGR() at 0
_KERNEL_OPT_NAGR() at 0
vpanic() at vpanic+0x145
snprintf() at snprintf
startlwp() at startlwp
calltrap() at calltrap+0x11
cache_purge1() at cache_purge1+0x10f
vclean() at vclean+0xa8
cleanvnode() at cleanvnode+0xd0
vdrain_thread() at vdrain_thread+0x58
crash>

The previous occasion (29 July 2016) showed:

$ crash -N netbsd.5 -M netbsd.5.core 
Crash version 7.0_STABLE, image version 7.0_STABLE.
System panicked: trap
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NAGR() at 0
_KERNEL_OPT_NAGR() at 0
vpanic() at vpanic+0x145
snprintf() at snprintf
startlwp() at startlwp
calltrap() at calltrap+0x11
cache_reclaim() at cache_reclaim+0x201
cache_thread() at cache_thread+0x15

Before that, (25 Dec 2015):

$ crash -N netbsd.4 -M netbsd.4.core 
Crash version 7.0_STABLE, image version 7.0_STABLE.
System panicked: trap
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NAGR() at 0
_KERNEL_OPT_NAGR() at 0
vpanic() at vpanic+0x145
snprintf() at snprintf
startlwp() at startlwp
calltrap() at calltrap+0x11
uvm_pagefree() at uvm_pagefree+0xd4
genfs_do_putpages() at genfs_do_putpages+0xce0
VOP_PUTPAGES() at VOP_PUTPAGES+0x3a
uvm_pageout() at uvm_pageout+0x2f1

and before that (7 Nov 2015):

$ crash -N netbsd.3 -M netbsd.3.core 
Crash version 7.0_STABLE, image version 7.0_STABLE.
System panicked: trap
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NAGR() at 0
_KERNEL_OPT_NAGR() at 0
vpanic() at vpanic+0x145
snprintf() at snprintf
startlwp() at startlwp
calltrap() at calltrap+0x11
ufsquota_free() at ufsquota_free+0x15
ufs_reclaim() at ufs_reclaim+0xaf
ffs_reclaim() at ffs_reclaim+0xa1
VOP_RECLAIM() at VOP_RECLAIM+0x2f
vclean() at vclean+0xa6
cleanvnode() at cleanvnode+0xb8
vdrain_thread() at vdrain_thread+0x58

And the earliest I have saved (6 Nov 2015):

$ crash -N netbsd.2 -M netbsd.2.core 
Crash version 7.0_STABLE, image version 7.0_STABLE.
System panicked: trap
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NAGR() at 0
_KERNEL_OPT_NAGR() at 0
vpanic() at vpanic+0x145
snprintf() at snprintf
startlwp() at startlwp
calltrap() at calltrap+0x11
VOP_PUTPAGES() at VOP_PUTPAGES+0x3a
uvm_pageout() at uvm_pageout+0x2f1
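Grouped by the kernel thread at the outermost frame, the five saved dumps above fall into three groups: vnode draining, namecache reclaim, and the pagedaemon. The following is a minimal sketch, not part of the original report, that tabulates the frames transcribed from the crash(8) output above (innermost first, starting at the frame just after calltrap()) and groups them by that thread. The helper name group_by_thread is hypothetical.

```python
# Hypothetical helper, not part of the original report.
# Frames transcribed from the crash(8) backtraces above, innermost first,
# starting at the frame just after calltrap().
DUMPS = {
    "netbsd.6": ["cache_purge1", "vclean", "cleanvnode", "vdrain_thread"],
    "netbsd.5": ["cache_reclaim", "cache_thread"],
    "netbsd.4": ["uvm_pagefree", "genfs_do_putpages", "VOP_PUTPAGES",
                 "uvm_pageout"],
    "netbsd.3": ["ufsquota_free", "ufs_reclaim", "ffs_reclaim",
                 "VOP_RECLAIM", "vclean", "cleanvnode", "vdrain_thread"],
    "netbsd.2": ["VOP_PUTPAGES", "uvm_pageout"],
}


def group_by_thread(dumps):
    """Map each outermost frame (the kernel thread's entry point) to the
    (core name, trapping frame) pairs recorded under that thread."""
    groups = {}
    for core, frames in dumps.items():
        groups.setdefault(frames[-1], []).append((core, frames[0]))
    return groups


if __name__ == "__main__":
    for thread, hits in group_by_thread(DUMPS).items():
        print(thread)
        for core, frame in hits:
            print(f"  {core}: trapped in {frame}()")
```

Run against these transcriptions, it places netbsd.6 and netbsd.3 under vdrain_thread, netbsd.5 under cache_thread, and netbsd.4 and netbsd.2 under uvm_pageout.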

Greg Oster reports:

One of my machines started doing
something similar to your last panic:

uvm_fault(0xffffffff81041020, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff809c29bd cs 8 rflags 10202 cr2 12 ilevel
0 rsp fffffe810ed60b00 curlwp 0xfffffe823ce98780 pid 0.95 lowest kstack
0xfffffe810ed5e2c0 panic: trap
cpu6: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
uvm_pagefree() at netbsd:uvm_pagefree+0xd4
genfs_do_putpages() at netbsd:genfs_do_putpages+0xce0
VOP_PUTPAGES() at netbsd:VOP_PUTPAGES+0x3a
uvm_pageout() at netbsd:uvm_pageout+0x2f1
cpu6: End traceback...
uvm_fault(0xfffffe813bd35e70, 0x0, 2) -> e
fatal page fault in supervisor mode
trap type 6 code 2 rip ffffffff805ac769 cs 8 rflags 10202 cr2 84 ilevel
8 rsp fffffe806abeed98 curlwp 0xfffffe80a5198b60 pid 1766.1 lowest
kstack 0xfffffe806abec2c0

dumping to dev 18,1 (offset=18259895, size=2092553):

but I haven't investigated what's up yet...  (it crashed Oct 13,
and then on Nov 7 and Nov 8...)
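The "trap type ... code ... cr2 ..." lines in Greg Oster's console output can be read using the x86 page-fault error-code bits. Below is a minimal sketch. It assumes, without confirmation from the report, that NetBSD/amd64 prints code and cr2 in hexadecimal and that trap type 6 is T_PAGEFLT. decode_trap is a hypothetical helper. Under those assumptions, both faults are supervisor-mode accesses to non-present pages at very low addresses (0x12 and 0x84), which is the usual signature of a dereference through a NULL pointer at a small structure offset. The first fault is a read and the second a write, matching the last argument of the two uvm_fault() lines (1 and 2, assumed here to be VM_PROT_READ and VM_PROT_WRITE).

```python
import re

# x86 page-fault error-code bits (Intel SDM, "Page-Fault Exception").
PF_PRESENT = 0x1  # 0: page not present, 1: protection violation
PF_WRITE = 0x2    # 0: read access, 1: write access
PF_USER = 0x4     # 0: supervisor mode, 1: user mode

# Assumption: "code" and "cr2" are printed in hex; trap type 6 is T_PAGEFLT.
T_PAGEFLT = 6
TRAP_LINE = re.compile(r"trap type (\d+) code ([0-9a-f]+) .*? cr2 ([0-9a-f]+)")


def decode_trap(line):
    """Hypothetical helper: decode one NetBSD/amd64 trap summary line."""
    m = TRAP_LINE.search(line)
    if m is None:
        raise ValueError("no trap summary found")
    trap_type = int(m.group(1))
    code = int(m.group(2), 16)
    cr2 = int(m.group(3), 16)  # CR2 holds the faulting linear address
    return {
        "page_fault": trap_type == T_PAGEFLT,
        "access": "write" if code & PF_WRITE else "read",
        "cause": "protection violation" if code & PF_PRESENT
                 else "page not present",
        "mode": "user" if code & PF_USER else "supervisor",
        "fault_addr": cr2,
        # Heuristic: an address inside the first page suggests a NULL
        # pointer plus a small structure-member offset.
        "near_null": cr2 < 0x1000,
    }


if __name__ == "__main__":
    for line in (
        "trap type 6 code 0 rip ffffffff809c29bd cs 8 rflags 10202 cr2 12 ilevel",
        "trap type 6 code 2 rip ffffffff805ac769 cs 8 rflags 10202 cr2 84 ilevel",
    ):
        print(decode_trap(line))
```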

>How-To-Repeat:
See above.  The panics seem to occur at random, with no apparent
trigger.  I am not sure whether there was any unusual activity on the
machine at the time.
>Fix:

>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Thu, 22 Jul 2021 23:35:32 +0000
State-Changed-Why:
many bugs in the vnode lifecycle code have been fixed since 2016


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: jdbaker@consolidated.net
Subject: Re: kern/51614 (vdrain/cache trap panics on NetBSD/amd64-7.0_STABLE)
Date: Thu, 22 Jul 2021 23:37:07 +0000

 On Thu, Jul 22, 2021 at 11:35:33PM +0000, dholland@NetBSD.org wrote:
  > Synopsis: vdrain/cache trap panics on NetBSD/amd64-7.0_STABLE
  > 
  > State-Changed-From-To: open->closed
  > State-Changed-By: dholland@NetBSD.org
  > State-Changed-When: Thu, 22 Jul 2021 23:35:32 +0000
  > State-Changed-Why:
  > many bugs in the vnode lifecycle code have been fixed since 2016

 (That is: this is almost certainly fixed, and if not, it will likely
 manifest differently, so unless you've seen it recently we should
 assume it's history. If you _have_ seen it recently, let us/me
 know...)

 -- 
 David A. Holland
 dholland@netbsd.org

>Unformatted:
