NetBSD Problem Report #42093

From njoly@lanfeust.sis.pasteur.fr  Sat Sep 19 09:18:24 2009
Return-Path: <njoly@lanfeust.sis.pasteur.fr>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 085BD63BC1D
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 19 Sep 2009 09:18:24 +0000 (UTC)
Message-Id: <20090919091820.F04EFDC9B9@lanfeust.sis.pasteur.fr>
Date: Sat, 19 Sep 2009 11:18:20 +0200 (CEST)
From: njoly@pasteur.fr
Reply-To: njoly@pasteur.fr
To: gnats-bugs@gnats.NetBSD.org
Subject: rump rename race testcase crashes
X-Send-Pr-Version: 3.95

>Number:         42093
>Category:       kern
>Synopsis:       rump rename race testcase crashes
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    lib-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Sep 19 09:20:00 +0000 2009
>Closed-Date:    Mon Jul 18 15:56:16 +0000 2011
>Last-Modified:  Mon Jul 18 15:56:16 +0000 2011
>Originator:     Nicolas Joly
>Release:        NetBSD 5.99.18
>Organization:
Insitut Pasteur
>Environment:
System: NetBSD lanfeust.sis.pasteur.fr 5.99.18 NetBSD 5.99.18 (LANFEUST) #0: Sat Sep 19 01:00:25 CEST 2009 njoly@lanfeust.sis.pasteur.fr:/local/src/NetBSD/obj.amd64/sys/arch/amd64/compile/LANFEUST amd64
Architecture: x86_64
Machine: amd64
>Description:
Noticed that ffs/t_renamerace ATF testcase sometimes crash. Trying to debug it,
i made the following rump small code which do crash too.

The crashes do only appear when logging is enabled (MNT_LOG), and are not so uncommom.
On 10 runs; i get 6 success, 3 crashes with `Resource deadlock avoided' and 1
with `segmentation fault'.

njoly@lanfeust [rump/ffs]> make ffs_rename
cc -g -O0 -Wall -Werror   -lrumpfs_ffs -lrumpvfs -o ffs_rename ffs_rename.c 

I currently do see 2 different symptoms when running it:

1) 
njoly@lanfeust [rump/ffs]> ./ffs_rename 
panic: rumpuser fatal failure 11 (Resource deadlock avoided)
zsh: abort (core dumped)  ./ffs_rename
njoly@lanfeust [rump/ffs]> gdb ffs_rename ffs_rename.core 
GNU gdb 6.5
[...]
Core was generated by `ffs_rename'.
Program terminated with signal 6, Aborted.
#0  0x00007f7ffd7e0c0a in _lwp_kill () from /usr/lib/libc.so.12
(gdb) bt
#0  0x00007f7ffd7e0c0a in _lwp_kill () from /usr/lib/libc.so.12
#1  0x00007f7ffd7e057e in abort () at /local/src/NetBSD/src/lib/libc/stdlib/abort.c:74
#2  0x00007f7ffd103662 in rumpuser_rw_enter (rw=0x7f7ffb2011a0, iswrite=<value optimized out>)
    at /local/src/NetBSD/src/lib/librumpuser/../../sys/rump/librump/rumpuser/rumpuser_pth.c:353
#3  0x00007f7ffda2ab6a in vlockmgr (vl=0x7f7ffb20f2a8, flags=0)
    at /local/src/NetBSD/src/lib/librumpvfs/../../sys/rump/../kern/vfs_subr.c:2832
#4  0x00007f7ffd538d29 in VOP_LOCK (vp=0x7f7ffb20f1a0, flags=6)
    at /local/src/NetBSD/src/lib/librump/../../sys/rump/../kern/vnode_if.c:1380
#5  0x00007f7ffda24c38 in vn_lock (vp=0x7f7ffb20f1a0, flags=2)
    at /local/src/NetBSD/src/lib/librumpvfs/../../sys/rump/../kern/vfs_vnops.c:765
#6  0x00007f7ffda2ec4c in vget (vp=0x7f7ffb20f1a0, flags=65538)
    at /local/src/NetBSD/src/lib/librumpvfs/../../sys/rump/../kern/vfs_subr.c:1310
#7  0x00007f7ffdc26bee in ufs_ihashget (dev=65024, inum=5, flags=2)
    at /local/src/NetBSD/src/sys/rump/fs/lib/libffs/../../../../ufs/ufs/ufs_ihash.c:153
#8  0x00007f7ffdc0eeaf in ffs_vget (mp=0x7f7ffc6b3000, ino=5, vpp=0x7f7ffb7ff8b0)
    at /local/src/NetBSD/src/sys/rump/fs/lib/libffs/../../../../ufs/ffs/ffs_vfsops.c:1717
#9  0x00007f7ffdc0c07a in ufs_lookup (v=<value optimized out>)
    at /local/src/NetBSD/src/sys/rump/fs/lib/libffs/../../../../ufs/ufs/ufs_lookup.c:541
#10 0x00007f7ffd539c71 in VOP_LOOKUP (dvp=0x7f7ffc6ad400, vpp=0x6, cnp=<value optimized out>)
    at /local/src/NetBSD/src/lib/librump/../../sys/rump/../kern/vnode_if.c:130
#11 0x00007f7ffdc17599 in wapbl_ufs_rename (v=<value optimized out>)
    at /local/src/NetBSD/src/sys/rump/fs/lib/libffs/../../../../ufs/ufs/ufs_wapbl.c:358
#12 0x00007f7ffdc14403 in ufs_rename (v=<value optimized out>)
    at /local/src/NetBSD/src/sys/rump/fs/lib/libffs/../../../../ufs/ufs/ufs_vnops.c:976
#13 0x00007f7ffd539074 in VOP_RENAME (fdvp=0x7f7ffc6ad400, fvp=0x6, fcnp=<value optimized out>, 
    tdvp=0x7f7ffd7e0c0a, tvp=0x1, tcnp=0x8080808080808080)
    at /local/src/NetBSD/src/lib/librump/../../sys/rump/../kern/vnode_if.c:998
#14 0x00007f7ffda25ef6 in do_sys_rename (from=<value optimized out>, to=0x4011ba "/mnt/rename2.test", 
    seg=UIO_USERSPACE, retain=0)
    at /local/src/NetBSD/src/lib/librumpvfs/../../sys/rump/../kern/vfs_syscalls.c:3395
#15 0x00007f7ffd552a94 in rump_sys_rename (from=<value optimized out>, to=<value optimized out>)
    at /local/src/NetBSD/src/lib/librump/../../sys/rump/librump/rumpkern/rump_syscalls.c:878
#16 0x0000000000400e86 in func2 (arg=0x0) at ffs_rename.c:37
#17 0x00007f7ffd30a812 in pthread__create_tramp (cookie=<value optimized out>)
    at /local/src/NetBSD/src/lib/libpthread/pthread.c:476
#18 0x00007f7ffd76e690 in ___lwp_park50 () from /usr/lib/libc.so.12
#19 0x00007f7ffb800000 in ?? ()
#20 0x0000000111110001 in ?? ()
#21 0x0000000033330003 in ?? ()
#22 0x0000000000000000 in ?? ()

2)
njoly@lanfeust [rump/ffs]> ./ffs_rename
zsh: segmentation fault (core dumped)  ./ffs_rename
njoly@lanfeust [rump/ffs]> gdb ffs_rename ffs_rename.core
GNU gdb 6.5
[...]
Core was generated by `ffs_rename'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f7ffdc0b198 in ufs_makedirentry (ip=0x7f7ffb416100, 
    cnp=0x7f7ffb7ffb70, newdirp=0x7f7ffb51a040)
    at /local/src/NetBSD/src/sys/rump/fs/lib/libffs/../../../../ufs/ufs/ufs_lookup.c:735
735             if (FSFMT(ITOV(ip)))
(gdb) bt
#0  0x00007f7ffdc0b198 in ufs_makedirentry (ip=0x7f7ffb416100, 
    cnp=0x7f7ffb7ffb70, newdirp=0x7f7ffb51a040)
    at /local/src/NetBSD/src/sys/rump/fs/lib/libffs/../../../../ufs/ufs/ufs_lookup.c:735
#1  0x00007f7ffdc17f3b in wapbl_ufs_rename (v=<value optimized out>)
    at /local/src/NetBSD/src/sys/rump/fs/lib/libffs/../../../../ufs/ufs/ufs_wapbl.c:443
#2  0x00007f7ffdc14403 in ufs_rename (v=<value optimized out>)
    at /local/src/NetBSD/src/sys/rump/fs/lib/libffs/../../../../ufs/ufs/ufs_vnops.c:976
#3  0x00007f7ffd539074 in VOP_RENAME (fdvp=0x7f7ffc6ad400, fvp=0x7f7ffb501811, 
    fcnp=<value optimized out>, tdvp=0x0, tvp=0x4, tcnp=0xe2)
    at /local/src/NetBSD/src/lib/librump/../../sys/rump/../kern/vnode_if.c:998
#4  0x00007f7ffda25ef6 in do_sys_rename (from=<value optimized out>, 
    to=0x4011ba "/mnt/rename2.test", seg=UIO_USERSPACE, retain=0)
    at /local/src/NetBSD/src/lib/librumpvfs/../../sys/rump/../kern/vfs_syscalls.c:3395
#5  0x00007f7ffd552a94 in rump_sys_rename (from=<value optimized out>, 
    to=<value optimized out>)
    at /local/src/NetBSD/src/lib/librump/../../sys/rump/librump/rumpkern/rump_syscalls.c:878
#6  0x0000000000400e86 in func2 (arg=0x0) at ffs_rename.c:37
#7  0x00007f7ffd30a812 in pthread__create_tramp (cookie=<value optimized out>)
    at /local/src/NetBSD/src/lib/libpthread/pthread.c:476
#8  0x00007f7ffd76e690 in ___lwp_park50 () from /usr/lib/libc.so.12
#9  0x00007f7ffb800000 in ?? ()
#10 0x0000000111110001 in ?? ()
#11 0x0000000033330003 in ?? ()
#12 0x0000000000000000 in ()
(gdb) p ip
$1 = (struct inode *) 0x7f7ffb416100
(gdb) p ip->i_vnode
$2 = (struct vnode *) 0x7f7ffb40f070
(gdb) p ip->i_vnode->v_mount
$3 = (struct mount *) 0x0
(gdb) p ip->i_vnode->v_mount->mnt_iflag
Cannot access memory at address 0x64


njoly@lanfeust [rump/ffs]> cat ffs_rename.c

#include <rump/rump_syscalls.h>
#include <rump/rump.h>

#include <ufs/ufs/ufsmount.h>

#include <err.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define IMAGE "image.ffs"
#define DEVICE "/dev/device.ffs"
#define MNTDIR "/mnt"
#define FILE1 MNTDIR "/rename1.test"
#define FILE2 MNTDIR "/rename2.test"


static int quit = 0;


static void *func1(void *arg) {
  int fd;

  while (quit != 1) {
    fd = rump_sys_open(FILE1, O_WRONLY|O_CREAT|O_TRUNC, 0666);
    rump_sys_unlink(FILE1);
    rump_sys_close(fd); }

  return NULL; }

static void *func2(void *arg) {

  while (quit != 1) {
    rump_sys_rename(FILE1, FILE2); }

  return NULL; }


int main() {
  int res;
  pthread_t th1, th2;
  struct ufs_args args;

  (void)unlink(IMAGE);
  res = system("newfs -F -s 10000 " IMAGE " >/dev/null");
  if (res == -1)
    err(1, "system failed");

  res = rump_init();
  if (res != 0)
    err(1, "rump_init failed");

  res = rump_etfs_register(DEVICE, IMAGE, RUMP_ETFS_BLK);
  if (res != 0)
    err(1, "rump_etfs_register failed");

  res = rump_sys_mkdir(MNTDIR, 0777);
  if (res != 0)
    err(1, "rump_sys_mkdir failed");

  memset(&args, 0x0, sizeof(args));
  args.fspec = DEVICE;

  res = rump_sys_mount(MOUNT_FFS, MNTDIR, MNT_LOG, &args, sizeof(args));
  if (res == -1)
    err(1, "rump_sys_mount failed");

  res = pthread_create(&th1, NULL, func1, NULL);
  if (res != 0)
    errx(1, "pthread_create failed");
  res = pthread_create(&th2, NULL, func2, NULL);
  if (res != 0)
    errx(1, "pthread_create failed");

  sleep(10);
  quit = 1;

  res = pthread_join(th2, NULL);
  if (res != 0)
    errx(1, "pthread_join failed");
  res = pthread_join(th1, NULL);
  if (res != 0)
    errx(1, "pthread_join failed");

  res = rump_sys_unmount(MNTDIR, 0);
  if (res == -1)
    err(1, "rump_sys_unmount failed");

  res = rump_etfs_remove(DEVICE);
  if (res == -1)
    err(1, "rump_etfs_remove failed");

  res = unlink(IMAGE);
  if (res == -1)
    err(1, "unlink failed");

  return 0; }

>How-To-Repeat:
Compile the provided testcase, run it multiple times and see it fail.
>Fix:

>Release-Note:

>Audit-Trail:
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: lib-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: lib/42093: rump rename race testcase crashes
Date: Sat, 19 Sep 2009 23:17:36 +0200

 On Sat, Sep 19, 2009 at 09:20:01AM +0000, njoly@pasteur.fr wrote:
 > System: NetBSD lanfeust.sis.pasteur.fr 5.99.18 NetBSD 5.99.18 (LANFEUST) #0: Sat Sep 19 01:00:25 CEST 2009 njoly@lanfeust.sis.pasteur.fr:/local/src/NetBSD/obj.amd64/sys/arch/amd64/compile/LANFEUST amd64
 > Architecture: x86_64
 > Machine: amd64
 > >Description:
 > Noticed that ffs/t_renamerace ATF testcase sometimes crash. Trying to debug it,
 > i made the following rump small code which do crash too.
 > 
 > The crashes do only appear when logging is enabled (MNT_LOG), and are not so uncommom.
 > On 10 runs; i get 6 success, 3 crashes with `Resource deadlock avoided' and 1
 > with `segmentation fault'.

 Could it be related to kern/41417, which have been investigated to be
 a deadlock in WAPBL's copy of rename ?

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Antti Kantee <pooka@cs.hut.fi>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/42093
Date: Sun, 20 Sep 2009 11:58:50 +0300

 It might or might not be related to kern/41417.  I think there are
 actually two bugs here per the two different failure modes reported:

   a) an error branch locking error, where a vnode is not unlocked
      properly
   b) a reference couting bug which causes use-after-free
      (the test case vnodes are recycled immediately due to them
      having a 0 link count)

 There might one or more cases of "a", and 41417 might be one of them.

 [i changed the PR from lib to kern]

From: "David A. Holland" <dholland@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/42093 CVS commit: src/sys/ufs/ufs
Date: Sun, 17 Jul 2011 22:07:59 +0000

 Module Name:	src
 Committed By:	dholland
 Date:		Sun Jul 17 22:07:59 UTC 2011

 Modified Files:
 	src/sys/ufs/ufs: ufs_extern.h ufs_lookup.c ufs_wapbl.c

 Log Message:
 Provide correct locking for ufs_wapbl_rename. Note that this does not
 fix the non-wapbl rename; that will be coming soon. This patch also
 leaves a lot of the older locking-related code around in #if 0 blocks,
 and there's a lot of leftover redundant logic. All that will be going
 away later.

 Relates to at least these PRs:

   PR kern/24887
   PR kern/41417
   PR kern/42093
   PR kern/43626

 and possibly others.


 To generate a diff of this commit:
 cvs rdiff -u -r1.65 -r1.66 src/sys/ufs/ufs/ufs_extern.h
 cvs rdiff -u -r1.110 -r1.111 src/sys/ufs/ufs/ufs_lookup.c
 cvs rdiff -u -r1.16 -r1.17 src/sys/ufs/ufs/ufs_wapbl.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 18 Jul 2011 06:48:05 +0000
State-Changed-Why:
fixed, please test


From: Nicolas Joly <njoly@pasteur.fr>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/42093 (rump rename race testcase crashes)
Date: Mon, 18 Jul 2011 17:19:02 +0200

 On Mon, Jul 18, 2011 at 06:48:09AM +0000, dholland@NetBSD.org wrote:
 > Synopsis: rump rename race testcase crashes
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: dholland@NetBSD.org
 > State-Changed-When: Mon, 18 Jul 2011 06:48:05 +0000
 > State-Changed-Why:
 > fixed, please test

 I tested it again, and cannot make it fail anymore (with or without
 wapbl). Works fine for more than 100s, where it died previously within
 a few seconds.

 Great job ! Thanks.

 -- 
 Nicolas Joly

 Projects and Developments in Bioinformatics
 Institut Pasteur, Paris.

State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 18 Jul 2011 15:56:16 +0000
State-Changed-Why:
confirmed, thanks :-)


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.