NetBSD Problem Report #40361
From oster@scrooge.localdomain Sat Jan 10 20:51:27 2009
Return-Path: <oster@scrooge.localdomain>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id 2D0C963B8BA
for <gnats-bugs@gnats.NetBSD.org>; Sat, 10 Jan 2009 20:51:27 +0000 (UTC)
Message-Id: <20090110195116.4BC2C113D94@scrooge.localdomain>
Date: Sat, 10 Jan 2009 13:51:15 -0600 (CST)
From: oster@netbsd.org
Reply-To: oster@netbsd.org
To: gnats-bugs@gnats.NetBSD.org
Subject: WAPBL issues with ffs_fsync in -current
X-Send-Pr-Version: 3.95
>Number: 40361
>Category: kern
>Synopsis: WAPBL locking panic in -current
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: ad
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jan 10 20:55:00 +0000 2009
>Closed-Date: Mon Feb 23 02:06:20 +0000 2009
>Last-Modified: Tue Feb 24 04:15:12 +0000 2009
>Originator: Greg Oster
>Release: NetBSD 5.99.6
>Organization:
>Environment:
System: NetBSD cow 5.99.6 NetBSD 5.99.6 (XEN3_DOMU.RF) #0: Sat Jan 10 12:37:48 CST 2009 oster@quad:/u1/builds/build120/src/obj/amd64/u1/builds/build120/src/sys/arch/amd64/compile/XEN3_DOMU.RF amd64
and
System: NetBSD thog 5.99.6 NetBSD 5.99.6 (RAIDFRAME.ddb) #0: Fri Jan 9 21:25:28 CST 2009 oster@quad:/u1/devel/current/src/sys/arch/i386/compile/RAIDFRAME.ddb i386
Architecture: i386/amd64/xen
Machine: i386/amd64/xen
>Description:
With a DIAGNOSTIC kernel and a non-logging /, create and mount a
WAPBL-enabled filesystem. Touch a file on said filesystem, and
call 'sync'. Observe the following panic:
panic: kernel diagnostic assertion "rw_write_held(&wl->wl_rwlock)" failed: file "/u1/builds/build120/src/sys/kern/vfs_wapbl.c", line 1580
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80137d7d cs e030 rflags 246 cr2 7f7ffdb55430 cpl 0 rsp ffffa00046f2f670
Stopped in pid 406.1 (sync) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
cpu_Debugger() at netbsd:cpu_Debugger+0x9
panic() at netbsd:panic+0x279
__kernassert() at netbsd:__kernassert+0x46
wapbl_jlock_assert() at netbsd:wapbl_jlock_assert+0x45
wapbl_add_buf() at netbsd:wapbl_add_buf+0x8a
bdwrite() at netbsd:bdwrite+0x18c
bwrite() at netbsd:bwrite+0x14d
vn_bwrite() at netbsd:vn_bwrite+0x21
VOP_BWRITE() at netbsd:VOP_BWRITE+0x60
bawrite() at netbsd:bawrite+0x61
ffs_full_fsync() at netbsd:ffs_full_fsync+0x4dc
ffs_fsync() at netbsd:ffs_fsync+0x92
VOP_FSYNC() at netbsd:VOP_FSYNC+0x7f
ffs_sync() at netbsd:ffs_sync+0x2c2
VFS_SYNC() at netbsd:VFS_SYNC+0x2c
sys_sync() at netbsd:sys_sync+0xab
sy_call() at 0xffffffff803fe659
syscall() at netbsd:syscall+0x1ba
ds 0xf650
es 0xdc08
fs 0x82
gs 0x1e84
rdi 0
rsi 0xd
rbp 0xffffa00046f2f670
rbx 0x500d50
rdx 0xffffffff80a25000
rcx 0xffffffff80a11000
rax 0x1
r8 0xffffffff806b3dc0 cpu_info_primary
r9 0x7f7fffffffe0
r10 0xffffa00046f2f5c0
r11 0xe033
r12 0x7f7fffffffe0
r13 0x7f7ffdffa000
r14 0xffffa00046f197c0
r15 0xffffa00046ebe010
rip 0xffffffff80137d7d breakpoint+0x5
cs 0xe030
rflags 0x246
rsp 0xffffa00046f2f670
ss 0xe02b
netbsd:breakpoint+0x5: leave
db> tr
breakpoint() at netbsd:breakpoint+0x5
cpu_Debugger() at netbsd:cpu_Debugger+0x9
panic() at netbsd:panic+0x279
__kernassert() at netbsd:__kernassert+0x46
wapbl_jlock_assert() at netbsd:wapbl_jlock_assert+0x45
wapbl_add_buf() at netbsd:wapbl_add_buf+0x8a
bdwrite() at netbsd:bdwrite+0x18c
bwrite() at netbsd:bwrite+0x14d
vn_bwrite() at netbsd:vn_bwrite+0x21
VOP_BWRITE() at netbsd:VOP_BWRITE+0x60
bawrite() at netbsd:bawrite+0x61
ffs_full_fsync() at netbsd:ffs_full_fsync+0x4dc
ffs_fsync() at netbsd:ffs_fsync+0x92
VOP_FSYNC() at netbsd:VOP_FSYNC+0x7f
ffs_sync() at netbsd:ffs_sync+0x2c2
VFS_SYNC() at netbsd:VFS_SYNC+0x2c
sys_sync() at netbsd:sys_sync+0xab
sy_call() at 0xffffffff803fe659
syscall() at netbsd:syscall+0x1ba
db>
I see this both in a -current amd64 XEN3 DOMU, and in a -current i386
native install.
In ffs_full_fsync we are in the "this isn't a filesystem with
logging" mode, yet when we get to bdwrite() it somehow thinks the vp
belongs to a filesystem with logging. At that point we call
wapbl_add_buf(), do a DIAGNOSTIC check to make sure that the locks are
fine, and then keel over.
>How-To-Repeat:
1) Have / be a non-logging filesystem.
2) Have /mnt be a logging filesystem (e.g. 'mount -o log /dev/wd1g /mnt')
3) As root: cd /mnt ; touch foo ; sync
4) *boom*
Note that if / is a logging filesystem as well, then there are no
issues with /mnt in that case.
>Fix:
Unknown. I note that 5.0_BETA does not have this issue.
Additional details available on request, as I can easily replicate
this panic.
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->ad
Responsible-Changed-By: ad@NetBSD.org
Responsible-Changed-When: Wed, 04 Feb 2009 08:45:06 +0000
Responsible-Changed-Why:
take
State-Changed-From-To: open->feedback
State-Changed-By: ad@NetBSD.org
State-Changed-When: Sun, 22 Feb 2009 20:12:34 +0000
State-Changed-Why:
fixed?
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/40361 CVS commit: src/sys
Date: Sun, 22 Feb 2009 20:10:25 +0000 (UTC)
Module Name: src
Committed By: ad
Date: Sun Feb 22 20:10:25 UTC 2009
Modified Files:
src/sys/kern: vfs_wapbl.c
src/sys/miscfs/syncfs: sync_subr.c sync_vnops.c
src/sys/ufs/ffs: ffs_alloc.c ffs_vfsops.c ffs_vnops.c
Log Message:
PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc
- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.
- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.
- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Presently they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.
- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.
- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.
- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
To generate a diff of this commit:
cvs rdiff -r1.22 -r1.23 src/sys/kern/vfs_wapbl.c
cvs rdiff -r1.35 -r1.36 src/sys/miscfs/syncfs/sync_subr.c
cvs rdiff -r1.25 -r1.26 src/sys/miscfs/syncfs/sync_vnops.c
cvs rdiff -r1.120 -r1.121 src/sys/ufs/ffs/ffs_alloc.c
cvs rdiff -r1.241 -r1.242 src/sys/ufs/ffs/ffs_vfsops.c
cvs rdiff -r1.109 -r1.110 src/sys/ufs/ffs/ffs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: feedback->closed
State-Changed-By: oster@NetBSD.org
State-Changed-When: Mon, 23 Feb 2009 02:06:20 +0000
State-Changed-Why:
Confirmed fixed in both test cases. Thanks!
From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/40361 CVS commit: [netbsd-5] src/sys
Date: Tue, 24 Feb 2009 04:13:35 +0000 (UTC)
Module Name: src
Committed By: snj
Date: Tue Feb 24 04:13:35 UTC 2009
Modified Files:
src/sys/kern [netbsd-5]: vfs_wapbl.c
src/sys/miscfs/syncfs [netbsd-5]: sync_subr.c sync_vnops.c
src/sys/ufs/ffs [netbsd-5]: ffs_alloc.c ffs_vfsops.c ffs_vnops.c
Log Message:
Pull up following revision(s) (requested by ad in ticket #490):
sys/kern/vfs_wapbl.c: revision 1.23
sys/miscfs/syncfs/sync_subr.c: revision 1.36
sys/miscfs/syncfs/sync_vnops.c: revision 1.26
sys/ufs/ffs/ffs_alloc.c: revision 1.121
sys/ufs/ffs/ffs_vfsops.c: revision 1.242
sys/ufs/ffs/ffs_vnops.c: revision 1.110
PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc
- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.
- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.
- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Presently they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.
- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.
- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.
- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
To generate a diff of this commit:
cvs rdiff -r1.3 -r1.3.8.1 src/sys/kern/vfs_wapbl.c
cvs rdiff -r1.34 -r1.34.20.1 src/sys/miscfs/syncfs/sync_subr.c
cvs rdiff -r1.25 -r1.25.10.1 src/sys/miscfs/syncfs/sync_vnops.c
cvs rdiff -r1.113 -r1.113.4.1 src/sys/ufs/ffs/ffs_alloc.c
cvs rdiff -r1.239 -r1.239.2.1 src/sys/ufs/ffs/ffs_vfsops.c
cvs rdiff -r1.104.4.5 -r1.104.4.6 src/sys/ufs/ffs/ffs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.