NetBSD Problem Report #40948

From njoly@lanfeust.sis.pasteur.fr  Tue Mar  3 20:50:07 2009
Return-Path: <njoly@lanfeust.sis.pasteur.fr>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 549AE63BD3B
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  3 Mar 2009 20:50:07 +0000 (UTC)
Message-Id: <20090303205005.6DE4CDC9B9@lanfeust.sis.pasteur.fr>
Date: Tue,  3 Mar 2009 21:50:05 +0100 (CET)
From: njoly@pasteur.fr
Reply-To: njoly@pasteur.fr
To: gnats-bugs@gnats.NetBSD.org
Subject: ffs+log lock error panic
X-Send-Pr-Version: 3.95

>Number:         40948
>Category:       kern
>Synopsis:       ffs+log lock error panic
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Mar 03 20:55:00 +0000 2009
>Closed-Date:    Fri Apr 03 14:59:51 +0000 2009
>Last-Modified:  Mon Apr 06 14:10:03 +0000 2009
>Originator:     Nicolas Joly
>Release:        NetBSD 5.99.7
>Organization:
Institut Pasteur
>Environment:
System: NetBSD lanfeust.sis.pasteur.fr 5.99.7 NetBSD 5.99.7 (LANFEUST) #0: Tue Mar 3 12:23:05 CET 2009 njoly@lanfeust.sis.pasteur.fr:/local/src/NetBSD/obj.amd64/sys/arch/amd64/compile/LANFEUST amd64
Architecture: x86_64
Machine: amd64
>Description:
I recently switched to ffs+log filesystems from the now unsupported
ffs+softdep. Since then I have encountered some lock error panics while running
compat Linux binaries from the Linux Test Project (LTP). I was able
to reproduce it with the following code, where 2 unprivileged processes try
to access the same file concurrently.

#include <sys/types.h>
#include <sys/wait.h>

#include <err.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static void handler(int sig) { return; }

int main() {
  int res, fd;
  pid_t kid1, kid2;
  sigset_t set;
  struct sigaction sa;

  sigemptyset(&set);
  sa.sa_handler = handler;
  sa.sa_mask = set;
  sa.sa_flags = 0;
  res = sigaction(SIGALRM, &sa, NULL);
  if (res == -1)
    err(1, "sigaction failed");

  kid1 = fork();
  if (kid1 == -1)
    err(1, "fork failed");
  if (kid1 == 0) {
    while (1) {
      fd = open("rename.test1", O_WRONLY|O_CREAT|O_TRUNC, 0666);
      unlink("rename.test1");
      close(fd); }
    return 0; }

  kid2 = fork();
  if (kid2 == -1)
    err(1, "fork failed");
  if (kid2 == 0) {
    while (1) {
      rename("rename.test1", "rename.test2"); }
    return 0; }

  alarm(10);
  pause();

  kill(kid1, SIGTERM);
  kill(kid2, SIGTERM);
  waitpid(-1, NULL, 0);
  waitpid(-1, NULL, 0);

  return 0; }


panic: lock error
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80238315 cs 8 rflags 246 cr2  7f7ffdd20004 cpl 0 rsp ffff800049be4730
Stopped in pid 363.1 (rename) at        netbsd:breakpoint+0x5:  leave
db{1}> mach cpu 0
using CPU 0
db{1}> bt
rw_vector_enter() at netbsd:rw_vector_enter+0x148
vlockmgr() at netbsd:vlockmgr+0xf6
VOP_LOCK() at netbsd:VOP_LOCK+0x64
vn_lock() at netbsd:vn_lock+0xd9
namei() at netbsd:namei+0x174
vn_open() at netbsd:vn_open+0x9e
sys_open() at netbsd:sys_open+0xeb
syscall() at netbsd:syscall+0xb6
db{1}> mach cpu 1
using CPU 1
db{1}> bt
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x249
lockdebug_abort() at netbsd:lockdebug_abort+0x42
rw_vector_enter() at netbsd:rw_vector_enter+0x2ea
vlockmgr() at netbsd:vlockmgr+0xf6
VOP_LOCK() at netbsd:VOP_LOCK+0x64
vn_lock() at netbsd:vn_lock+0xd9
namei() at netbsd:namei+0x174
do_sys_rename() at netbsd:do_sys_rename+0x6f
syscall() at netbsd:syscall+0xb6

Unfortunately, a LOCKDEBUG kernel does not give me any useful insight:

panic: LOCKDEBUG
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80238315 cs 8 rflags 246 cr2  7f7ffdd20004 cpl 0 rsp ffff800049cd3ae0
Stopped in pid 504.1 (rename) at        netbsd:breakpoint+0x5:  leave
db{0}> bt
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x249
lockdebug_abort1() at netbsd:lockdebug_abort1+0xd3
syscall() at netbsd:syscall+0x12f
db{0}> mach cpu 1
using CPU 1
db{0}> bt
x86_stihlt() at netbsd:x86_stihlt+0x6
idle_loop() at netbsd:idle_loop+0x198
Bad frame pointer: 0xffff80004745aba0

>How-To-Repeat:
Run the provided testcase on a ffs+log filesystem.
>Fix:
n/a

>Release-Note:

>Audit-Trail:
From: Antti Kantee <pooka@cs.hut.fi>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/40948
Date: Wed, 1 Apr 2009 19:44:19 +0300

 this seems to be an "easy" locking error.  i'll look deeper soon (tomorrow
 or so).

From: Antti Kantee <pooka@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/40948 CVS commit: src/sys/ufs/ufs
Date: Thu, 2 Apr 2009 11:33:04 +0000

 Module Name:	src
 Committed By:	pooka
 Date:		Thu Apr  2 11:33:04 UTC 2009

 Modified Files:
 	src/sys/ufs/ufs: ufs_wapbl.c

 Log Message:
 Release tdvp in an appropriate VOP_RENAME error branch to avoid
 panic described in PR kern/40948.

 As usual, all the error branches in rename live based on an unholy
 amalgamation of prayer and the blood of cute, furry and tasty
 quadrupeds, so I won't even attempt to audit the rest.

 And this wapbl rename really really needs to be merged with the
 standard rename.  That should be a fun PhD thesis topic ....


 To generate a diff of this commit:
 cvs rdiff -u -r1.5 -r1.6 src/sys/ufs/ufs/ufs_wapbl.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
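
 The class of bug named in the log message above, an error branch that
 returns while a lock taken earlier is still held, can be sketched in
 userland C. This is only an illustration, not the ufs_wapbl.c change:
 the names dir_lock, rename_buggy, rename_fixed and dir_lock_is_held are
 hypothetical, and a pthread mutex stands in for the vnode lock on tdvp.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the lock on the target directory. */
pthread_mutex_t dir_lock = PTHREAD_MUTEX_INITIALIZER;

/* Buggy shape: the error branch returns with dir_lock still held. */
int rename_buggy(int source_vanished)
{
	pthread_mutex_lock(&dir_lock);
	if (source_vanished)
		return ENOENT;		/* lock is never released */
	pthread_mutex_unlock(&dir_lock);
	return 0;
}

/* Fixed shape: every exit path goes through the same unlock. */
int rename_fixed(int source_vanished)
{
	int error = 0;

	pthread_mutex_lock(&dir_lock);
	if (source_vanished) {
		error = ENOENT;
		goto out;
	}
	/* ... the real work would happen here ... */
out:
	pthread_mutex_unlock(&dir_lock);
	return error;
}

/* Returns 1 if dir_lock is currently held, 0 otherwise. */
int dir_lock_is_held(void)
{
	if (pthread_mutex_trylock(&dir_lock) == 0) {
		pthread_mutex_unlock(&dir_lock);
		return 0;
	}
	return 1;
}

/* Exercise both shapes and confirm which one leaves the lock behind. */
void self_check(void)
{
	assert(rename_fixed(1) == ENOENT);
	assert(!dir_lock_is_held());
	assert(rename_buggy(1) == ENOENT);
	assert(dir_lock_is_held());	/* leaked by the buggy error branch */
	pthread_mutex_unlock(&dir_lock);	/* clean up after the demo */
}
```

 Calling self_check() from a main() linked with -lpthread runs both
 variants. The single exit label keeps the unlock in one place, which is
 one common way to make such error branches easier to audit.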

State-Changed-From-To: open->feedback
State-Changed-By: pooka@NetBSD.org
State-Changed-When: Thu, 02 Apr 2009 14:36:08 +0300
State-Changed-Why:
try rev 1.6 of ufs_wapbl.c


From: Antti Kantee <pooka@cs.hut.fi>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/40948
Date: Thu, 2 Apr 2009 14:46:17 +0300

 --J2SCkAp4GZ/dPZZf
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 Attached is a program which makes repeating the problem a little easier.
 You can repeat it with a rump_ffs mount too, but I was quickly reminded
 that I should fix kern/41051 first ;)

 compile with:
 cc -Wall crash.c -lukfs -lrumpfs_ffs -lrumpfs_ufs -lrumpvfs -lrump -lrumpuser -lpthread

 --J2SCkAp4GZ/dPZZf
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="crash.c"

 #include <sys/types.h>
 #include <sys/mount.h>

 #include <err.h>
 #include <errno.h>
 #include <fcntl.h>
 #include <pthread.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <string.h>

 #include <rump/rump.h>
 #include <rump/rump_syscalls.h>
 #include <rump/ukfs.h>

 #include <ufs/ufs/ufsmount.h>

 #define IMAGE "/home/pooka/img/ffs4.img"

 void *
 w1(void *arg)
 {
   int fd;

   for (;;) {
     fd = rump_sys_open("/rename.test1", O_WRONLY|O_CREAT|O_TRUNC, 0666);
     rump_sys_unlink("/rename.test1");
     rump_sys_close(fd);
   }
   return NULL;
 }

 int main() {
   struct ufs_args args;
   struct ukfs *fs;
   pthread_t pt;
   int fail = 0, succ = 0;

   memset(&args, 0, sizeof(args));
   args.fspec = IMAGE;

   ukfs_init();
   fs = ukfs_mount(MOUNT_FFS, IMAGE, UKFS_DEFAULTMP, MNT_LOG,&args,sizeof(args));
   if (fs == NULL)
     err(1, "ukfs_mount");

   pthread_create(&pt, NULL, w1, fs);

   while (1) {
     int rv;
     rv = rump_sys_rename("/rename.test1", "/rename.test2");
     if (rv == 0) {
       if (succ++ % 10000 == 0)
         printf("success\n");
     } else {
       if (fail++ % 10000 == 0)
         printf("fail\n");
     }
   }

   return 0; }

 --J2SCkAp4GZ/dPZZf--

From: Nicolas Joly <njoly@pasteur.fr>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
	gnats-admin@netbsd.org, pooka@NetBSD.org, njoly@pasteur.fr
Subject: Re: kern/40948 (ffs+log lock error panic)
Date: Thu, 2 Apr 2009 15:01:18 +0200

 On Thu, Apr 02, 2009 at 11:36:09AM +0000, Antti Kantee wrote:
 > Synopsis: ffs+log lock error panic
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: pooka@NetBSD.org
 > State-Changed-When: Thu, 02 Apr 2009 14:36:08 +0300
 > State-Changed-Why:
 > try rev 1.6 of ufs_wapbl.c

 That fixed it, thanks.

 Both the testcase and the original Linux binary from the LTP project work ... on
 wapbl ffs.

 But since I switched /tmp to tmpfs recently, I was bitten there too
 :-(

 njoly@lanfeust [~]> cd /tmp 
 njoly@lanfeust [/tmp]> df -h .
 Filesystem        Size       Used      Avail %Cap Mounted on
 tmpfs              12G       4.0K        12G   0% /tmp
 njoly@lanfeust [/tmp]> ~/emul/netbsd/todo/rename 
 [...PANIC...]

 panic: kernel diagnostic assertion "de->td_node == fnode" failed: file "/local/src/NetBSD/src/sys/fs/tmpfs/tmpfs_vnops.c", line 832
 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 rip ffffffff80238765 cs 8 rflags 246 cr2  7f7ffdd201c4 cpl 0 rsp ffff800049b87880
 Stopped in pid 600.1 (rename) at        netbsd:breakpoint+0x5:  leave
 db{0}> mach cpu 0
 using CPU 0
 db{0}> bt
 breakpoint() at netbsd:breakpoint+0x5
 panic() at netbsd:panic+0x289
 __kernassert() at netbsd:__kernassert+0x2d
 tmpfs_rename() at netbsd:tmpfs_rename+0xa1c
 VOP_RENAME() at netbsd:VOP_RENAME+0x75
 do_sys_rename() at netbsd:do_sys_rename+0x59a
 syscall() at netbsd:syscall+0xc2
 db{0}> mach cpu 1
 using CPU 1
 db{0}> bt
 x86_stihlt() at netbsd:x86_stihlt+0x6
 idle_loop() at netbsd:idle_loop+0x18e
 Bad frame pointer: 0xffff80004745fba0

 Same testcase, under the same conditions but on a tmpfs mount. Do you want
 to have a look at it or should I submit another PR?

 -- 
 Nicolas Joly

 Biological Software and Databanks.
 Institut Pasteur, Paris.

State-Changed-From-To: feedback->closed
State-Changed-By: pooka@NetBSD.org
State-Changed-When: Fri, 03 Apr 2009 17:59:51 +0300
State-Changed-Why:
confirmed fixed


From: Antti Kantee <pooka@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/40948 CVS commit: src/sys/ufs/ufs
Date: Mon, 6 Apr 2009 14:09:57 +0000

 Module Name:	src
 Committed By:	pooka
 Date:		Mon Apr  6 14:09:57 UTC 2009

 Modified Files:
 	src/sys/ufs/ufs: ufs_wapbl.c

 Log Message:
 Fix reference leak in fix for PR kern/40948.
 Pointed out by David Holland.


 To generate a diff of this commit:
 cvs rdiff -u -r1.6 -r1.7 src/sys/ufs/ufs/ufs_wapbl.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
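
 A reference leak of the general kind named in the log message above can
 be sketched as follows. This is an assumption about the shape of such
 bugs, not a reconstruction of the rev 1.7 change: struct node and the
 node_get, node_unlock, node_put, op_leaky and op_balanced names are all
 hypothetical. The point is that an error branch must undo both the lock
 and the reference it took.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical object carrying both a lock and a reference count. */
struct node {
	int locked;	/* 1 while held by the caller */
	int refcnt;	/* number of outstanding references */
};

/* Take a reference and lock the node, as a lookup might. */
void node_get(struct node *n)
{
	n->refcnt++;
	n->locked = 1;
}

/* Drop only the lock; the reference stays. */
void node_unlock(struct node *n)
{
	n->locked = 0;
}

/* Drop the lock and the reference together. */
void node_put(struct node *n)
{
	n->locked = 0;
	n->refcnt--;
}

/* Leaky shape: the error branch unlocks but keeps the reference. */
int op_leaky(struct node *n, int fail)
{
	node_get(n);
	if (fail) {
		node_unlock(n);
		return EINVAL;
	}
	node_put(n);
	return 0;
}

/* Balanced shape: success and failure both drop lock and reference. */
int op_balanced(struct node *n, int fail)
{
	node_get(n);
	node_put(n);
	return fail ? EINVAL : 0;
}

/* Compare the reference counts left behind by each error branch. */
void self_check(void)
{
	struct node a = { 0, 0 }, b = { 0, 0 };

	assert(op_leaky(&a, 1) == EINVAL);
	assert(a.locked == 0 && a.refcnt == 1);	/* reference leaked */
	assert(op_balanced(&b, 1) == EINVAL);
	assert(b.locked == 0 && b.refcnt == 0);
}
```

 In the leaky variant the lock state looks correct after the error, so
 the problem only shows up when the object can never be freed.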

>Unformatted:
