NetBSD Problem Report #13666

Received: (qmail 1257 invoked from network); 9 Aug 2001 06:36:59 -0000
Message-Id: <200108090641.f796f6416843@chuq.com>
Date: Wed, 8 Aug 2001 23:41:06 -0700 (PDT)
From: Chuck Silvers <chuq@chuq.com>
Reply-To: Chuck Silvers <chuq@chuq.com>
To: gnats-bugs@gnats.netbsd.org
Subject: NFS loopback mount deadlock
X-Send-Pr-Version: 3.95

>Number:         13666
>Category:       kern
>Synopsis:       NFS loopback mount deadlock
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Aug 09 06:37:00 +0000 2001
>Closed-Date:    
>Last-Modified:  Fri Nov 03 16:15:01 +0000 2006
>Originator:     Chuck Silvers
>Release:        NetBSD-current Sat Aug  4 00:24:07 PDT 2001
>Organization:
me
>Environment:
System: NetBSD spathi.chuq.com 1.5T NetBSD 1.5T (SPATHI) #1: Mon Mar 26 23:51:38 PST 2001 chs@spathi.chuq.com:/home/chs/netbsd/src/ubc.spathi/sys/arch/i386/compile/SPATHI i386



>Description:
	Removing a device node over NFS when the client and the server
	are the same machine (a loopback mount) causes a deadlock.


>How-To-Repeat:
8 pc:~ # mount pc:/usr/chs/netbsd /build
9 pc:~ # ls -l /usr/chs/netbsd/destdir/dev/rsd0a
crw-r-----  1 root  operator  13, 0 Aug  6 07:12 /usr/chs/netbsd/destdir/dev/rsd0a
10 pc:~ # rm /build/destdir/dev/rsd0a
<hang>

db> t/t 0t136
trace: pid 136 at 0xcaf918dc
bpendtsleep(cb05f4f4,14,c02c19c0,0,cb05f4f4) at bpendtsleep
lockmgr(cb05f4f4,10002,cb05f488,caf91970,c018070b) at lockmgr+0x522
genfs_lock(caf91964) at genfs_lock+0x16
vn_lock(cb05f488,10002,caff9ecc,cb05f488,caf919b8) at vn_lock+0x6f
vget(cb05f488,10002) at vget+0xa1
checkalias(cb05f520,d00,c066bc00,caff9ecc,c066bc00) at checkalias+0x7c
ufs_vinit(c066bc00,c05ec100,c05ec000,caf91a2c,c36bff08) at ufs_vinit+0x80
ffs_vget(c066bc00,1d2e,caf91abc,caf91d90,cade8000) at ffs_vget+0x21f
ufs_lookup(caf91b2c,caf91d6c,caf91d90,c06b5440,100) at ufs_lookup+0x914
lookup(caf91d6c,0,5,caf91c34,0) at lookup+0x26b
nfs_namei(caf91d6c,caf91c64,5,c05eca00,c06b5900) at nfs_namei+0x620
nfsrv_remove(c06cb000,c05eca00,cadf71cc,caf91e08,0) at nfsrv_remove+0x434
nfssvc_nfsd(caf91e68,804b3a0,cadf71cc,caf91f78,cadf71cc) at nfssvc_nfsd+0x533
sys_nfssvc(cadf71cc,caf91f80,caf91f78) at sys_nfssvc+0x6d3
syscall_plain(bfbf001f,1f,bfbf001f,1f,bfbfdce4) at syscall_plain+0x98
db> t/t 0t233
trace: pid 233 at 0xcb05bbf0
bpendtsleep(c06d5a8c,18,c02c1580,0,0) at bpendtsleep
sbwait(c06d5a8c,0,c06d5a48,c06f0240,0) at sbwait+0x33
soreceive(c06d5a48,cb05bd1c,cb05bccc,cb05bd20,0) at soreceive+0x2b2
nfs_receive(c06f0240,cb05bd1c,cb05bd20,c06b5700,c06f0240) at nfs_receive+0x432
nfs_reply(c06f0240,c06b5000,c06b5038,c06b5000,ca3b9020) at nfs_reply+0x52
nfs_request(cb05f2c0,c06b5000,c,cafc739c,c0666500) at nfs_request+0x3db
nfs_removerpc(cb05f2c0,cade8813,5,c0666500,cafc739c) at nfs_removerpc+0x5c3
nfs_remove(cb05bef0,cb05bf78,cafc739c,c032c9f4,cb05bef0) at nfs_remove+0xde
sys_unlink(cafc739c,cb05bf80,cb05bf78) at sys_unlink+0x12c
syscall_plain(1f,1f,1f,1f,bfbfd918) at syscall_plain+0x98



>Fix:
	not provided.
>Release-Note:
>Audit-Trail:
From: "Julio M. Merino Vidal" <jmmv84@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/13666
Date: Fri, 3 Nov 2006 17:10:47 +0100

 I've just found more problems with using special files over an NFS
 loopback mount.  Consider the following test case:

 # mount localhost:/foo /mnt/remote
 # cd /mnt/remote
 # mknod zero c 2 12
 # dd if=zero of=test bs=1k count=1
 < file system stalled >

 The above 'dd' command hangs in the close(2) call; I've diagnosed
 this by writing a very simple test program that opens a device, reads
 from it and then closes it.  Note that the lockup does not happen if
 there is no read between the open and the close (more below).
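
 The test program itself was not attached to this report.  A minimal
 sketch of an open/read/close sequence of that kind might look like the
 following; the function name open_read_close and the 1024-byte buffer
 are assumptions made for illustration, not the original code.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the report): open `path` read-only,
 * read up to `nbytes` (capped at the buffer size) and close it again.
 * Returns the number of bytes read, or -1 if open, read or close fails.
 * On the affected loopback mount, close() is where the hang was seen.
 */
static long
open_read_close(const char *path, size_t nbytes)
{
	char buf[1024];
	size_t want = nbytes < sizeof(buf) ? nbytes : sizeof(buf);
	ssize_t got;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return -1;

	/* The read is what makes the access time need an update. */
	got = read(fd, buf, want);
	if (got == -1) {
		(void)close(fd);
		return -1;
	}

	if (close(fd) == -1)
		return -1;

	return (long)got;
}
```

 Calling open_read_close("/mnt/remote/zero", 1024) from a small main()
 would exercise the same path as the 'dd' command above.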

 The problem is that nfsspec_close calls VOP_SETATTR(vp, ...) on the
 underlying file system with vp *locked*.  Then, when the remote file
 system (running on the same machine!) tries to allocate a new vnode
 for the special file, it calls checkalias, and that routine blocks
 while scanning the list of special-file vnodes because one of them is
 locked.  The lockup happens at vfs_subr.c:1113 (revision 1.275).
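
 The circular wait described above can be pictured with a small
 user-space model.  This is only a sketch under simplifying assumptions:
 the two actors, the waits_for table and the function names
 build_loopback_waits and has_wait_cycle are hypothetical, and it is
 assumed that the closing process sleeps until the server answers.  No
 kernel code is involved.

```c
#include <stdbool.h>

/* Hypothetical actors in the loopback scenario. */
enum actor {
	CLIENT_CLOSE,	/* process in nfsspec_close; may hold the vnode lock */
	NFSD_SERVER,	/* nfsd on the same host serving the attribute update */
	NACTORS,
	NOBODY = -1	/* marker: actor is not blocked on anyone */
};

/*
 * Fill waits_for[] for the scenario: the client always waits for the
 * server's answer (assumption); the server waits for the client only
 * if the client still holds the lock that checkalias needs.
 */
static void
build_loopback_waits(int waits_for[NACTORS], bool client_holds_vnode_lock)
{
	waits_for[CLIENT_CLOSE] = NFSD_SERVER;
	waits_for[NFSD_SERVER] = client_holds_vnode_lock ? CLIENT_CLOSE : NOBODY;
}

/* Return true if following waits_for[] from `start` leads back to it. */
static bool
has_wait_cycle(const int waits_for[], int n, int start)
{
	int cur = start;

	for (int steps = 0; steps < n; steps++) {
		cur = waits_for[cur];
		if (cur == NOBODY)
			return false;
		if (cur == start)
			return true;
	}
	return false;
}
```

 With client_holds_vnode_lock set, each actor waits on the other and
 has_wait_cycle reports a cycle; with it cleared, the server can finish
 and no cycle exists.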

 I've tried commenting out the VOP_SETATTR call, and that fixes this
 specific problem.  (This is why the lockup does not happen when the
 device is not read before the close: the access time does not need to
 be updated.)  But then the server locks up again when removing the
 special file.

 Similar problems arise when dealing with FIFOs.

 How to properly fix this, I don't know.  But the issue seems to be
 quite serious.

 (For future reference, kern/8151 and kern/30401 report similar issues.)

 -- 
 Julio M. Merino Vidal <jmmv84@gmail.com>
 The Julipedia - http://julipedia.blogspot.com/

>Unformatted:
