NetBSD Problem Report #39548
From simonb@thistledown.com.au Mon Sep 15 01:25:22 2008
Return-Path: <simonb@thistledown.com.au>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id 94E6F63B842
for <gnats-bugs@gnats.NetBSD.org>; Mon, 15 Sep 2008 01:25:22 +0000 (UTC)
Message-Id: <20080915012520.469BDAFD04@thoreau.thistledown.com.au>
Date: Mon, 15 Sep 2008 11:25:20 +1000 (EST)
From: Simon Burge <simonb@NetBSD.org>
Reply-To: Simon Burge <simonb@NetBSD.org>
To: gnats-bugs@gnats.NetBSD.org
Subject: kernel debugging assertion "(vp->v_flag & VONWORKLST)" failed
X-Send-Pr-Version: 3.95
>Number: 39548
>Category: kern
>Synopsis: kernel debugging assertion "(vp->v_flag & VONWORKLST)" failed
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: kern-bug-people
>State: analyzed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Sep 15 01:30:00 +0000 2008
>Closed-Date:
>Last-Modified: Wed May 06 23:15:03 +0000 2009
>Originator: Simon Burge
>Release: NetBSD 4.0_STABLE (sources from netbsd-4 branch on Jul 15 2008)
>Organization:
>Environment:
System: NetBSD thoreau 4.0_STABLE NetBSD 4.0_STABLE (THOREAU) #6: Tue Jul 15 00:48:08 EST 2008 simonb@thoreau:/usr/obj/sys/arch/i386/compile/THOREAU i386
Architecture: i386
Machine: i386
>Description:
I got a
panic: kernel debugging assertion "(vp->v_flag & VONWORKLST)" failed: file "./sys/miscfs/genfs/genfs_vnops.c", line 1283
panic just after starting to move some large files from one
filesystem to another. I was in X at the time, but do have a
crash dump. The backtrace is:
#17 0xc045978c in panic (
fmt=0xc099211c "kernel %sassertion \"%s\" failed: file \"%s\", line %d")
at ./sys/kern/subr_prf.c:235
#18 0xc075a8a6 in __assert (t=0xc08f1884 "debugging ",
f=0xc095eb88 "./sys/miscfs/genfs/genfs_vnops.c", l=1283,
e=0xc09053c6 "(vp->v_flag & VONWORKLST)")
at ../.././sys/lib/libkern/__assert.c:45
#19 0xc0499f6f in genfs_do_putpages (vp=0xd4c3d938, startoff=259100672,
endoff=259104768, flags=9, busypg=0x0)
at ./sys/miscfs/genfs/genfs_vnops.c:1283
#20 0xc0499fe9 in genfs_putpages (v=0xcfe81b60)
at ./sys/miscfs/genfs/genfs_vnops.c:1055
#21 0xc0496ccc in VOP_PUTPAGES (vp=0xd4c3d938, offlo=0, offhi=0, flags=9)
at ./sys/kern/vnode_if.c:1592
#22 0xc03f3cc5 in uvn_put (uobj=0xd4c3d938, offlo=259100672, offhi=259104768,
flags=9) at ./sys/uvm/uvm_vnode.c:273
#23 0xc03ef9b3 in uvm_pageout (arg=0xcf418c1c) at ./sys/uvm/uvm_pdaemon.c:732
#24 0xc01002e1 in proc_trampoline ()
The earlier frames are for a secondary "panic: wdc_exec_command:
polled command not done" panic that happened after the dump (and
then it proceeded to dump again!).
This is with a UP kernel on a machine with 2GB of RAM, a mix of
raidframe and standalone disks (swap is to a RF mirror). The
kernel has DIAGNOSTIC, LOCKDEBUG, DEBUG.
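As a reading aid for the backtrace above, the startoff/endoff pair passed to genfs_do_putpages can be turned into a page count and page index. This is a minimal user-space sketch, not kernel code: the 4096-byte page size is the i386 value, treating the range as half-open is an assumption, and the names SKETCH_PAGE_SIZE, range_pages and page_index are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4096-byte pages, as on i386. */
#define SKETCH_PAGE_SIZE 4096

/*
 * Number of whole pages in the byte range [startoff, endoff).
 * Treating the range as half-open is an assumption of this sketch.
 */
static int64_t
range_pages(int64_t startoff, int64_t endoff)
{
	/* Both ends are expected to be page-aligned. */
	assert(startoff % SKETCH_PAGE_SIZE == 0);
	assert(endoff % SKETCH_PAGE_SIZE == 0);
	return (endoff - startoff) / SKETCH_PAGE_SIZE;
}

/* Zero-based index of the page containing byte offset off. */
static int64_t
page_index(int64_t off)
{
	return off / SKETCH_PAGE_SIZE;
}
```

With the values from frame #19, range_pages(259100672, 259104768) gives 1 and page_index(259100672) gives 63257, so the pagedaemon was asking for a single page of the file to be written out.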
>How-To-Repeat:
Not sure - only observed once so far.
>Fix:
None given.
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->analyzed
State-Changed-By: pooka@NetBSD.org
State-Changed-When: Wed, 17 Sep 2008 18:58:39 +0300
State-Changed-Why:
some sort of analysis attempted.
I'd like to re-prioritize this as "low", since it won't likely affect
people running without DEBUG.
From: Antti Kantee <pooka@cs.hut.fi>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/39548: kernel debugging assertion "(vp->v_flag & VONWORKLST)" failed
Date: Wed, 17 Sep 2008 18:57:24 +0300
On Mon Sep 15 2008 at 01:30:00 +0000, Simon Burge wrote:
> System: NetBSD thoreau 4.0_STABLE NetBSD 4.0_STABLE (THOREAU) #6: Tue Jul 15 00:48:08 EST 2008 simonb@thoreau:/usr/obj/sys/arch/i386/compile/THOREAU i386
>
> panic: kernel debugging assertion "(vp->v_flag & VONWORKLST)" failed: file "./sys/miscfs/genfs/genfs_vnops.c", line 1283
>
> panic just after starting to move some large files from one
> filesystem to another. I was in X at the time, but do have a
> crash dump. The backtrace is:
>
> #17 0xc045978c in panic (
> fmt=0xc099211c "kernel %sassertion \"%s\" failed: file \"%s\", line %d")
> at ./sys/kern/subr_prf.c:235
> #18 0xc075a8a6 in __assert (t=0xc08f1884 "debugging ",
> f=0xc095eb88 "./sys/miscfs/genfs/genfs_vnops.c", l=1283,
> e=0xc09053c6 "(vp->v_flag & VONWORKLST)")
> at ../.././sys/lib/libkern/__assert.c:45
> #19 0xc0499f6f in genfs_do_putpages (vp=0xd4c3d938, startoff=259100672,
> endoff=259104768, flags=9, busypg=0x0)
> at ./sys/miscfs/genfs/genfs_vnops.c:1283
> #20 0xc0499fe9 in genfs_putpages (v=0xcfe81b60)
> at ./sys/miscfs/genfs/genfs_vnops.c:1055
> #21 0xc0496ccc in VOP_PUTPAGES (vp=0xd4c3d938, offlo=0, offhi=0, flags=9)
> at ./sys/kern/vnode_if.c:1592
> #22 0xc03f3cc5 in uvn_put (uobj=0xd4c3d938, offlo=259100672, offhi=259104768,
> flags=9) at ./sys/uvm/uvm_vnode.c:273
> #23 0xc03ef9b3 in uvm_pageout (arg=0xcf418c1c) at ./sys/uvm/uvm_pdaemon.c:732
> #24 0xc01002e1 in proc_trampoline ()
>
> The earlier frames are for a secondary "panic: wdc_exec_command:
> polled command not done" panic that happened after the dump (and
> then it proceeded to dump again!).
>
> This is with a UP kernel on a machine with 2GB of RAM, a mix of
> raidframe and standalone disks (swap is to a RF mirror). The
> kernel has DIAGNOSTIC, LOCKDEBUG, DEBUG.
Looks like a bad case of race condition between moving the vnode on
and off the worklist and a conflict between the syncer and pagedaemon.
I think you need to get memory exhausted and the pagedaemon interested
in the situation and a case of very bad luck.
Since this code has changed much for -current, I'd say a) sacrifice
more chicken (poulet de bresse preferred) b) run without DEBUG or c)
remove the KDASSERT and compile a new kernel.
But, you could still print the vnode and append to the PR. Also,
applying meditation and trying to figure out what exactly protects
VONWORKLST can't hurt.
From: Simon Burge <simonb@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, Simon Burge <simonb@NetBSD.org>
Subject: Re: kern/39548: kernel debugging assertion "(vp->v_flag & VONWORKLST)" failed
Date: Thu, 18 Sep 2008 15:16:19 +1000
Antti Kantee wrote:
> Looks like a bad case of race condition between moving the vnode on
> and off the worklist and a conflict between the syncer and pagedaemon.
> I think you need to get memory exhausted and the pagedaemon interested
> in the situation and a case of very bad luck.
>
> Since this code has changed much for -current, I'd say a) sacrifice
> more chicken (poulet de bresse preferred) b) run without DEBUG or c)
> remove the KDASSERT and compile a new kernel.
I might run with c) for now. If that KDASSERT is subject to a race,
should we just remove it on netbsd-4? And is it still potentially a
problem with -current too? I guess some meditation might apply here...
> But, you could still print the vnode and append to the PR.
(gdb) print *(struct vnode *)0xd4c3d938
$3 = {v_uobj = {vmobjlock = {lock_data = 1,
lock_file = 0xc08ff85b "./sys/uvm/uvm_pdaemon.c",
unlock_file = 0xc095eb88 "./sys/miscfs/genfs/genfs_vnops.c",
lock_line = 406, unlock_line = 1464, list = {tqe_next = 0x0,
tqe_prev = 0xc09a6f70}, lock_holder = 0}, pgops = 0xc09a484c, memq = {
tqh_first = 0xc30c0a68, tqh_last = 0xc216c938}, uo_npages = 191,
uo_refs = 1}, v_size = 367638528, v_flag = 65664, v_numoutput = 0,
v_writecount = 0, v_holdcnt = 4, v_mount = 0xc37c7000, v_op = 0xc39d4000,
v_freelist = {tqe_next = 0xda0aa590, tqe_prev = 0xde71d6dc}, v_mntvnodes = {
tqe_next = 0xd2010d04, tqe_prev = 0xdfb2c33c}, v_cleanblkhd = {
lh_first = 0xd3ea85dc}, v_dirtyblkhd = {lh_first = 0x0},
v_synclist_slot = 31, v_synclist = {tqe_next = 0xdd5be3b0,
tqe_prev = 0xc3753ef8}, v_dnclist = {lh_first = 0x0}, v_nclist = {
lh_first = 0x0}, v_un = {vu_mountedhere = 0xe9540408,
vu_socket = 0xe9540408, vu_specinfo = 0xe9540408,
vu_fifoinfo = 0xe9540408, vu_ractx = 0xe9540408}, v_lease = 0x0,
v_type = VREG, v_tag = VT_UFS, v_lock = {lk_interlock = {lock_data = 0,
lock_file = 0xc0901702 "./sys/kern/kern_lock.c",
unlock_file = 0xc0901702 "./sys/kern/kern_lock.c", lock_line = 568,
unlock_line = 920, list = {tqe_next = 0x0, tqe_prev = 0x0},
lock_holder = 4294967295}, lk_flags = 0, lk_sharecount = 0,
lk_exclusivecount = 0, lk_recurselevel = 0, lk_waitcount = 0,
lk_wmesg = 0xc0904d6e "vnlock", lk_un = {lk_un_sleep = {
lk_sleep_lockholder = -1, lk_sleep_locklwp = 0, lk_sleep_prio = 20,
lk_sleep_timo = 0, lk_newlock = 0x0}, lk_un_spin = {
lk_spin_cpu = 4294967295, lk_spin_list = {tqe_next = 0x0,
tqe_prev = 0x14}}},
lk_lock_file = 0xc095eb88 "./sys/miscfs/genfs/genfs_vnops.c",
lk_unlock_file = 0xc095eb88 "./sys/miscfs/genfs/genfs_vnops.c",
lk_lock_line = 309, lk_unlock_line = 325}, v_vnlock = 0xd4c3d9c4,
v_data = 0xd977978c, v_klist = {slh_first = 0x0}}
Simon.
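The v_flag value in the vnode dump above is printed in decimal, which makes it awkward to compare against the flag definitions. A small sketch of the decoding follows. 65664 is 0x10080, so only bits 0x10000 and 0x0080 are set. The mask value 0x4000 for VONWORKLST is an assumption here and should be checked against sys/sys/vnode.h in the netbsd-4 tree; SKETCH_VONWORKLST and flag_is_set are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed value of VONWORKLST. Verify against sys/sys/vnode.h
 * on the netbsd-4 branch before relying on it.
 */
#define SKETCH_VONWORKLST 0x4000u

/* Return nonzero if any bit of mask is set in v_flag. */
static int
flag_is_set(uint32_t v_flag, uint32_t mask)
{
	return (v_flag & mask) != 0;
}
```

Under that assumption, flag_is_set(65664u, SKETCH_VONWORKLST) is 0, which agrees with the failed KDASSERT: at the time of the dump the vnode was not marked as being on the syncer worklist.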
From: Simon Burge <simonb@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
gnats-admin@netbsd.org, pooka@NetBSD.org
Subject: Re: kern/39548 (kernel debugging assertion "(vp->v_flag & VONWORKLST)" failed)
Date: Thu, 18 Sep 2008 15:20:36 +1000
pooka@NetBSD.org wrote:
> Synopsis: kernel debugging assertion "(vp->v_flag & VONWORKLST)" failed
>
> State-Changed-From-To: open->analyzed
> State-Changed-By: pooka@NetBSD.org
> State-Changed-When: Wed, 17 Sep 2008 18:58:39 +0300
> State-Changed-Why:
> some sort of analysis attempted.
>
> I'd like to re-prioritize this as "low", since it won't likely affect
> people running without DEBUG.
Sounds OK, I've done this.
Thanks,
Simon.
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/39548: kernel debugging assertion "(vp->v_flag & VONWORKLST)" failed
Date: Wed, 6 May 2009 23:12:44 +0000
see also 41157.
--
David A. Holland
dholland@netbsd.org
>Unformatted: