NetBSD Problem Report #40948
From njoly@lanfeust.sis.pasteur.fr Tue Mar 3 20:50:07 2009
Return-Path: <njoly@lanfeust.sis.pasteur.fr>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id 549AE63BD3B
for <gnats-bugs@gnats.NetBSD.org>; Tue, 3 Mar 2009 20:50:07 +0000 (UTC)
Message-Id: <20090303205005.6DE4CDC9B9@lanfeust.sis.pasteur.fr>
Date: Tue, 3 Mar 2009 21:50:05 +0100 (CET)
From: njoly@pasteur.fr
Reply-To: njoly@pasteur.fr
To: gnats-bugs@gnats.NetBSD.org
Subject: ffs+log lock error panic
X-Send-Pr-Version: 3.95
>Number: 40948
>Category: kern
>Synopsis: ffs+log lock error panic
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Mar 03 20:55:00 +0000 2009
>Closed-Date: Fri Apr 03 14:59:51 +0000 2009
>Last-Modified: Mon Apr 06 14:10:03 +0000 2009
>Originator: Nicolas Joly
>Release: NetBSD 5.99.7
>Organization:
Institut Pasteur
>Environment:
System: NetBSD lanfeust.sis.pasteur.fr 5.99.7 NetBSD 5.99.7 (LANFEUST) #0: Tue Mar 3 12:23:05 CET 2009 njoly@lanfeust.sis.pasteur.fr:/local/src/NetBSD/obj.amd64/sys/arch/amd64/compile/LANFEUST amd64
Architecture: x86_64
Machine: amd64
>Description:
I recently switched to ffs+log filesystems from the now unsupported
ffs+softdep. Since then I have encountered some lock error panics while running
compat Linux binaries from the Linux Test Project (LTP). I was able
to reproduce it with the following code, where 2 unprivileged processes
concurrently access the same file.
#include <sys/types.h>
#include <sys/wait.h>
#include <err.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Empty handler: SIGALRM only needs to interrupt pause(). */
static void handler(int sig) { return; }

int main() {
  int res, fd;
  pid_t kid1, kid2;
  sigset_t set;
  struct sigaction sa;

  sigemptyset(&set);
  sa.sa_handler = handler;
  sa.sa_mask = set;
  sa.sa_flags = 0;
  res = sigaction(SIGALRM, &sa, NULL);
  if (res == -1)
    err(1, "sigaction failed");

  /* First child: create, unlink and close the file in a loop. */
  kid1 = fork();
  if (kid1 == -1)
    err(1, "fork failed");
  if (kid1 == 0) {
    while (1) {
      fd = open("rename.test1", O_WRONLY|O_CREAT|O_TRUNC, 0666);
      unlink("rename.test1");
      close(fd); }
    return 0; }

  /* Second child: rename the same file in a loop. */
  kid2 = fork();
  if (kid2 == -1)
    err(1, "fork failed");
  if (kid2 == 0) {
    while (1) {
      rename("rename.test1", "rename.test2"); }
    return 0; }

  /* Parent: let the children race for 10 seconds, then stop them. */
  alarm(10);
  pause();
  kill(kid1, SIGTERM);
  kill(kid2, SIGTERM);
  waitpid(-1, NULL, 0);
  waitpid(-1, NULL, 0);
  return 0; }
panic: lock error
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80238315 cs 8 rflags 246 cr2 7f7ffdd20004 cpl 0 rsp ffff800049be4730
Stopped in pid 363.1 (rename) at netbsd:breakpoint+0x5: leave
db{1}> mach cpu 0
using CPU 0
db{1}> bt
rw_vector_enter() at netbsd:rw_vector_enter+0x148
vlockmgr() at netbsd:vlockmgr+0xf6
VOP_LOCK() at netbsd:VOP_LOCK+0x64
vn_lock() at netbsd:vn_lock+0xd9
namei() at netbsd:namei+0x174
vn_open() at netbsd:vn_open+0x9e
sys_open() at netbsd:sys_open+0xeb
syscall() at netbsd:syscall+0xb6
db{1}> mach cpu 1
using CPU 1
db{1}> bt
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x249
lockdebug_abort() at netbsd:lockdebug_abort+0x42
rw_vector_enter() at netbsd:rw_vector_enter+0x2ea
vlockmgr() at netbsd:vlockmgr+0xf6
VOP_LOCK() at netbsd:VOP_LOCK+0x64
vn_lock() at netbsd:vn_lock+0xd9
namei() at netbsd:namei+0x174
do_sys_rename() at netbsd:do_sys_rename+0x6f
syscall() at netbsd:syscall+0xb6
Unfortunately, a LOCKDEBUG kernel does not give me any useful insight:
panic: LOCKDEBUG
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80238315 cs 8 rflags 246 cr2 7f7ffdd20004 cpl 0 rsp ffff800049cd3ae0
Stopped in pid 504.1 (rename) at netbsd:breakpoint+0x5: leave
db{0}> bt
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x249
lockdebug_abort1() at netbsd:lockdebug_abort1+0xd3
syscall() at netbsd:syscall+0x12f
db{0}> mach cpu 1
using CPU 1
db{0}> bt
x86_stihlt() at netbsd:x86_stihlt+0x6
idle_loop() at netbsd:idle_loop+0x198
Bad frame pointer: 0xffff80004745aba0
>How-To-Repeat:
Run the provided testcase on a ffs+log filesystem.
>Fix:
n/a
>Release-Note:
>Audit-Trail:
From: Antti Kantee <pooka@cs.hut.fi>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/40948
Date: Wed, 1 Apr 2009 19:44:19 +0300
This seems to be an "easy" locking error. I'll look deeper soon (tomorrow
or so).
From: Antti Kantee <pooka@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/40948 CVS commit: src/sys/ufs/ufs
Date: Thu, 2 Apr 2009 11:33:04 +0000
Module Name: src
Committed By: pooka
Date: Thu Apr 2 11:33:04 UTC 2009
Modified Files:
src/sys/ufs/ufs: ufs_wapbl.c
Log Message:
Release tdvp in an appropriate VOP_RENAME error branch to avoid
panic described in PR kern/40948.
As usual, all the error branches in rename live based on an unholy
amalgamation of prayer and the blood of cute, furry and tasty
quadrupeds, so I won't even attempt to audit the rest.
And this wapbl rename really really needs to be merged with the
standard rename. That should be a fun PhD thesis topic ....
To generate a diff of this commit:
cvs rdiff -u -r1.5 -r1.6 src/sys/ufs/ufs/ufs_wapbl.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
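
The commit message above describes an error branch that returned
without releasing tdvp. As a rough userland illustration (this is not
the actual ufs_wapbl.c code), the sketch below models that pattern with
a toy lock flag. The names toy_vnode, toy_lock, toy_unlock and
toy_rename, and the source_vanished/forget_release parameters, are
hypothetical and exist only for this example. It assumes the convention
that the rename operation receives tdvp locked and is responsible for
releasing it on every return path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vnode; only the lock state is modelled. */
struct toy_vnode {
    bool locked;
};

static void
toy_lock(struct toy_vnode *vp)
{
    assert(!vp->locked);    /* a second lock attempt would trip here */
    vp->locked = true;
}

static void
toy_unlock(struct toy_vnode *vp)
{
    assert(vp->locked);
    vp->locked = false;
}

/*
 * Toy rename: the caller passes tdvp locked, and the callee must hand it
 * back unlocked on every path. source_vanished simulates the source name
 * being unlinked by another process; forget_release selects the faulty
 * error branch that returns without unlocking.
 */
static int
toy_rename(struct toy_vnode *tdvp, bool source_vanished, bool forget_release)
{
    if (source_vanished) {
        if (!forget_release)
            toy_unlock(tdvp);   /* the release the error branch needs */
        return ENOENT;
    }

    /* ... the actual rename work would happen here ... */
    toy_unlock(tdvp);
    return 0;
}
```

With forget_release set, the directory stays locked after the failed
rename, so the next toy_lock() on it trips the assertion. This loosely
mirrors a later lookup in the same directory running into a stale lock.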
State-Changed-From-To: open->feedback
State-Changed-By: pooka@NetBSD.org
State-Changed-When: Thu, 02 Apr 2009 14:36:08 +0300
State-Changed-Why:
try rev 1.6 of ufs_wapbl.c
From: Antti Kantee <pooka@cs.hut.fi>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/40948
Date: Thu, 2 Apr 2009 14:46:17 +0300
--J2SCkAp4GZ/dPZZf
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Attached is a program which makes repeating the problem a little easier.
You can repeat it with a rump_ffs mount too, but I was quickly reminded
that I should fix kern/41051 first ;)
compile with:
cc -Wall crash.c -lukfs -lrumpfs_ffs -lrumpfs_ufs -lrumpvfs -lrump -lrumpuser -lpthread
--J2SCkAp4GZ/dPZZf
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="crash.c"
#include <sys/types.h>
#include <sys/mount.h>

#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>

#include <rump/rump.h>
#include <rump/rump_syscalls.h>
#include <rump/ukfs.h>

#include <ufs/ufs/ufsmount.h>

#define IMAGE "/home/pooka/img/ffs4.img"

void *
w1(void *arg)
{
	int fd;

	for (;;) {
		fd = rump_sys_open("/rename.test1", O_WRONLY|O_CREAT|O_TRUNC, 0666);
		rump_sys_unlink("/rename.test1");
		rump_sys_close(fd);
	}

	return NULL;
}

int main() {
	struct ufs_args args;
	struct ukfs *fs;
	pthread_t pt;
	int fail = 0, succ = 0;

	memset(&args, 0, sizeof(args));
	args.fspec = IMAGE;

	ukfs_init();
	fs = ukfs_mount(MOUNT_FFS, IMAGE, UKFS_DEFAULTMP, MNT_LOG,&args,sizeof(args));
	if (fs == NULL)
		err(1, "ukfs_mount");

	pthread_create(&pt, NULL, w1, fs);
	while (1) {
		int rv;

		rv = rump_sys_rename("/rename.test1", "/rename.test2");
		if (rv == 0) {
			if (succ++ % 10000 == 0)
				printf("success\n");
		} else {
			if (fail++ % 10000 == 0)
				printf("fail\n");
		}
	}

	return 0; }
--J2SCkAp4GZ/dPZZf--
From: Nicolas Joly <njoly@pasteur.fr>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
gnats-admin@netbsd.org, pooka@NetBSD.org, njoly@pasteur.fr
Subject: Re: kern/40948 (ffs+log lock error panic)
Date: Thu, 2 Apr 2009 15:01:18 +0200
On Thu, Apr 02, 2009 at 11:36:09AM +0000, Antti Kantee wrote:
> Synopsis: ffs+log lock error panic
>
> State-Changed-From-To: open->feedback
> State-Changed-By: pooka@NetBSD.org
> State-Changed-When: Thu, 02 Apr 2009 14:36:08 +0300
> State-Changed-Why:
> try rev 1.6 of ufs_wapbl.c
That fixed it, thanks.
Both the testcase and the original Linux binary from the LTP project work ... on
wapbl ffs.
But since I switched /tmp to tmpfs recently, I was bitten there too
:-(
njoly@lanfeust [~]> cd /tmp
njoly@lanfeust [/tmp]> df -h .
Filesystem Size Used Avail %Cap Mounted on
tmpfs 12G 4.0K 12G 0% /tmp
njoly@lanfeust [/tmp]> ~/emul/netbsd/todo/rename
[...PANIC...]
panic: kernel diagnostic assertion "de->td_node == fnode" failed: file "/local/src/NetBSD/src/sys/fs/tmpfs/tmpfs_vnops.c", line 832
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80238765 cs 8 rflags 246 cr2 7f7ffdd201c4 cpl 0 rsp ffff800049b87880
Stopped in pid 600.1 (rename) at netbsd:breakpoint+0x5: leave
db{0}> mach cpu 0
using CPU 0
db{0}> bt
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x289
__kernassert() at netbsd:__kernassert+0x2d
tmpfs_rename() at netbsd:tmpfs_rename+0xa1c
VOP_RENAME() at netbsd:VOP_RENAME+0x75
do_sys_rename() at netbsd:do_sys_rename+0x59a
syscall() at netbsd:syscall+0xc2
db{0}> mach cpu 1
using CPU 1
db{0}> bt
x86_stihlt() at netbsd:x86_stihlt+0x6
idle_loop() at netbsd:idle_loop+0x18e
Bad frame pointer: 0xffff80004745fba0
Same testcase, under the same conditions, but on a tmpfs mount. Do you want
to have a look at it, or should I submit another PR?
--
Nicolas Joly
Biological Software and Databanks.
Institut Pasteur, Paris.
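
The failed assertion quoted above, "de->td_node == fnode", says that
the directory entry found for the source name must still refer to the
node that was looked up earlier. The toy sketch below only illustrates
that check. The types toy_node and toy_dirent and the function
entry_still_matches are hypothetical; only the field name td_node and
the variable name fnode come from the assertion text. Whether the panic
really stems from the entry being replaced between lookup and rename is
an assumption, not something this report confirms.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a file node and a directory entry. */
struct toy_node {
    int id;
};

struct toy_dirent {
    struct toy_node *td_node;   /* node this entry currently names */
};

/*
 * The condition the kernel assertion expresses: the entry re-read during
 * rename must point at the same node that the earlier lookup returned.
 */
static bool
entry_still_matches(const struct toy_dirent *de, const struct toy_node *fnode)
{
    return de != NULL && de->td_node == fnode;
}
```

If another process unlinks and recreates the name between the lookup
and the rename, the entry names a different node and the check returns
false. A diagnostic kernel would panic at that point rather than
return an error.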
State-Changed-From-To: feedback->closed
State-Changed-By: pooka@NetBSD.org
State-Changed-When: Fri, 03 Apr 2009 17:59:51 +0300
State-Changed-Why:
confirmed fixed
From: Antti Kantee <pooka@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/40948 CVS commit: src/sys/ufs/ufs
Date: Mon, 6 Apr 2009 14:09:57 +0000
Module Name: src
Committed By: pooka
Date: Mon Apr 6 14:09:57 UTC 2009
Modified Files:
src/sys/ufs/ufs: ufs_wapbl.c
Log Message:
Fix reference leak in fix for PR kern/40948.
Pointed out by David Holland.
To generate a diff of this commit:
cvs rdiff -u -r1.6 -r1.7 src/sys/ufs/ufs/ufs_wapbl.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
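
The follow-up commit above mentions a reference leak. The report does
not show the diff, so the sketch below only illustrates the general
difference between unlocking a vnode and unlocking it while also
dropping the reference (what NetBSD's vput() does, as opposed to a
plain unlock). All names here (toy_ref_vnode, toy_ref_unlock,
toy_ref_rele, toy_ref_put) are hypothetical and not taken from
ufs_wapbl.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vnode model: one lock flag and one use count. */
struct toy_ref_vnode {
    bool locked;
    int usecount;
};

/* Drop the lock only; the caller's reference is still held. */
static void
toy_ref_unlock(struct toy_ref_vnode *vp)
{
    assert(vp->locked);
    vp->locked = false;
}

/* Drop one reference without touching the lock. */
static void
toy_ref_rele(struct toy_ref_vnode *vp)
{
    assert(vp->usecount > 0);
    vp->usecount--;
}

/* Unlock and release together, analogous to vput(). */
static void
toy_ref_put(struct toy_ref_vnode *vp)
{
    toy_ref_unlock(vp);
    toy_ref_rele(vp);
}
```

An error branch that calls only toy_ref_unlock() cures the stale lock
but leaves usecount at 1. In this toy model that is the kind of leak a
later fix would close by using toy_ref_put() instead.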
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.