NetBSD Problem Report #36572

From martin@aprisoft.de  Thu Jun 28 08:36:01 2007
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 7D97963B882
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 28 Jun 2007 08:36:01 +0000 (UTC)
Message-Id: <20070628083558.82479AF5824@emmas.aprisoft.de>
Date: Thu, 28 Jun 2007 10:35:58 +0200 (CEST)
From: martin@aprisoft.de
Reply-To: martin@aprisoft.de
To: gnats-bugs@NetBSD.org
Subject: panic on NFS unmount
X-Send-Pr-Version: 3.95

>Number:         36572
>Category:       kern
>Synopsis:       panic on NFS unmount
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          analyzed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jun 28 08:40:00 +0000 2007
>Closed-Date:    
>Last-Modified:  Mon Aug 06 11:55:59 +0000 2007
>Originator:     Martin Husemann
>Release:        NetBSD 4.99.21
>Organization:
>Environment:
System: NetBSD nelly.aprisoft.de 4.99.21 NetBSD 4.99.21 (NELLY) #5: Wed Jun 27 05:09:38 CEST 2007 martin@emmas.aprisoft.de:/nelly/usr/src/sys/arch/sparc64/compile/NELLY sparc64
Architecture: sparc64
Machine: sparc64
>Description:

When rebooting a diskless machine after quite some NFS usage I hit this panic:

panic: nfs_inactive: vp=0xd6dd9e0 error=0

nfs_inactive() + 0x13c
VOP_INACTIVE()
vclean()
vgonel()
vflush()
nfs_unmount()
dounmount()
vfs_unmountall()

It has a big XXX comment:

0x1047d5c is in nfs_inactive (../../../../nfs/nfs_node.c:272).
267                      */
268     
269                     error = vn_lock(sp->s_dvp, LK_EXCLUSIVE | LK_CANRECURSE);
270                     if (error || sp->s_dvp->v_data == NULL) {
271                             /* XXX should recover */
272                             panic("%s: vp=%p error=%d", __func__, sp->s_dvp, error);
273                     }
274                     nfs_removeit(sp);
275                     kauth_cred_free(sp->s_cred);
276                     vput(sp->s_dvp);

What is this test for? Is it to protect vput()? But nfs_unlock() already deals
with v_data == 0. Or has this vnode been reclaimed already (how could I tell
from ddb?)

Might this be related to PR kern/36424 (wishful thinking)?

>How-To-Repeat:

I used the / on NFS for a day (including a build.sh run) and then rebooted.

>Fix:
no idea, sorry.

>Release-Note:

>Audit-Trail:
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/36572: panic on NFS unmount
Date: Thu, 28 Jun 2007 20:38:40 +0900 (JST)

 > When rebooting a diskless machine after quite some NFS usage I hit this panic:
 > 
 > panic: nfs_inactive: vp=0xd6dd9e0 error=0
 > 
 > nfs_inactive() + 0x13c
 > VOP_INACTIVE()
 > vclean()
 > vgonel()
 > vflush()
 > nfs_unmount()
 > dounmount()
 > vfs_unmountall()
 > 
 > It has a big XXX comment:
 > 
 > 0x1047d5c is in nfs_inactive (../../../../nfs/nfs_node.c:272).
 > 267                      */
 > 268     
 > 269                     error = vn_lock(sp->s_dvp, LK_EXCLUSIVE | LK_CANRECURSE);
 > 270                     if (error || sp->s_dvp->v_data == NULL) {
 > 271                             /* XXX should recover */
 > 272                             panic("%s: vp=%p error=%d", __func__, sp->s_dvp, error);
 > 273                     }
 > 274                     nfs_removeit(sp);
 > 275                     kauth_cred_free(sp->s_cred);
 > 276                     vput(sp->s_dvp);
 > 
 > What is this test for? Is it to protect vput()? But nfs_unlock() already deals
 > with v_data == 0. Or has this vnode been reclaimed already (how could I tell
 > from ddb?)

 sh vnode 0xd6dd9e0

 if dvp has been revoked (v_data == NULL), nfs_removeit() doesn't work.
 it probably isn't critical enough to warrant a panic, but it should be
 fixed eventually.

 > Might this be related to PR kern/36424 (wishful thinking)?

 i guess it's unrelated.

 YAMAMOTO Takashi

From: Antti Kantee <pooka@cs.hut.fi>
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/36572: panic on NFS unmount
Date: Thu, 28 Jun 2007 14:53:59 +0300

 On Thu Jun 28 2007 at 20:38:40 +0900, YAMAMOTO Takashi wrote:
 > > 0x1047d5c is in nfs_inactive (../../../../nfs/nfs_node.c:272).
 > > 267                      */
 > > 268     
 > > 269                     error = vn_lock(sp->s_dvp, LK_EXCLUSIVE | LK_CANRECURSE);
 > > 270                     if (error || sp->s_dvp->v_data == NULL) {
 > > 271                             /* XXX should recover */
 > > 272                             panic("%s: vp=%p error=%d", __func__, sp->s_dvp, error);
 > > 273                     }
 > > 274                     nfs_removeit(sp);
 > > 275                     kauth_cred_free(sp->s_cred);
 > > 276                     vput(sp->s_dvp);
 > > 
 > > What is this test for? Is it to protect vput()? But nfs_unlock() already deals
 > > with v_data == 0. Or has this vnode been reclaimed already (how could I tell
 > > from ddb?)
 > 
 > sh vnode 0xd6dd9e0
 > 
 > if dvp has been revoked (v_data == NULL), nfs_removeit() doesn't work.
 > probably it isn't critical enough to panic, but it should be fixed eventually.

 And it can happen only in unmount(MNT_FORCE), since otherwise sillyrename
 holds a reference.  This might actually be fairly easy to duplicate by
 opening a file, removing it and doing unmount -f.

 Maybe we should record a dependency to sillyrenamed nodes in the
 parent and force flushing of those before flushing the parent node.

 I'd tend to not care except that otherwise we leave .nfs-files hanging
 around.

 -- 
 Antti Kantee <pooka@iki.fi>                     Of course he runs NetBSD
 http://www.iki.fi/pooka/                          http://www.NetBSD.org/
     "la qualité la plus indispensable du cuisinier est l'exactitude"
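
The ordering problem described in this message can be modelled in userspace.
The following is only a toy sketch under stated assumptions, not kernel code:
struct toy_vnode and the toy_* functions are invented for illustration, and
"cleaning" is reduced to clearing v_data. It shows a forced flush in list
order failing to remove the sillyrenamed file when the parent is cleaned
first, and a flush that handles sillyrenamed nodes before anything else.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a vnode; only the fields needed for the illustration. */
struct toy_vnode {
	const char *name;
	void *v_data;                /* NULL once the node has been cleaned */
	struct toy_vnode *silly_dvp; /* parent dir if sillyrenamed, else NULL */
	int removed_ok;              /* 1 if the .nfs file could be removed */
};

/* Clean one node; a sillyrenamed node needs a live parent to be removed. */
static void
toy_clean(struct toy_vnode *vp)
{
	assert(vp != NULL);
	if (vp->v_data == NULL)
		return;              /* already cleaned */
	if (vp->silly_dvp != NULL)
		vp->removed_ok = (vp->silly_dvp->v_data != NULL);
	vp->v_data = NULL;
}

/* Forced flush in plain list order: a parent may go before its child. */
void
toy_flush_in_order(struct toy_vnode **list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		toy_clean(list[i]);
}

/* Flush sillyrenamed nodes first, then everything that is left. */
void
toy_flush_silly_first(struct toy_vnode **list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (list[i]->silly_dvp != NULL)
			toy_clean(list[i]);
	for (size_t i = 0; i < n; i++)
		toy_clean(list[i]);
}
```

With a list that holds the directory before the file, toy_flush_in_order()
leaves removed_ok at 0 for the file, while toy_flush_silly_first() sets it
to 1. A real implementation would also have to deal with locking and with
revoke(2), which this model ignores.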

From: Martin Husemann <martin@duskware.de>
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/36572: panic on NFS unmount
Date: Thu, 28 Jun 2007 14:35:53 +0200

 On Thu, Jun 28, 2007 at 08:38:40PM +0900, YAMAMOTO Takashi wrote:
 > sh vnode 0xd6dd9e0

 I'll dig deeper next time - I have failed to reproduce it so far.

 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/36572: panic on NFS unmount
Date: Thu, 28 Jun 2007 14:36:52 +0200

 On Thu, Jun 28, 2007 at 11:55:02AM +0000, Antti Kantee wrote:
 >  holds a reference.  This might actually be fairly easy to duplicate by
 >  opening a file, removing it and doing unmount -f.

 That did not reproduce it for me.

 Martin

From: Antti Kantee <pooka@cs.hut.fi>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
	martin@aprisoft.de
Subject: Re: kern/36572: panic on NFS unmount
Date: Thu, 28 Jun 2007 15:44:13 +0300

 On Thu Jun 28 2007 at 12:40:04 +0000, Martin Husemann wrote:
 > The following reply was made to PR kern/36572; it has been noted by GNATS.
 > 
 > From: Martin Husemann <martin@duskware.de>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: kern/36572: panic on NFS unmount
 > Date: Thu, 28 Jun 2007 14:36:52 +0200
 > 
 >  On Thu, Jun 28, 2007 at 11:55:02AM +0000, Antti Kantee wrote:
 >  >  holds a reference.  This might actually be fairly easy to duplicate by
 >  >  opening a file, removing it and doing unmount -f.
 >  
 >  That did not reproduce it for me.

 You obviously need some luck in the vnode list order so that the parent
 is cleaned before the child.

 .. or my theory is wrong.

 -- 
 Antti Kantee <pooka@iki.fi>                     Of course he runs NetBSD
 http://www.iki.fi/pooka/                          http://www.NetBSD.org/
     "la qualité la plus indispensable du cuisinier est l'exactitude"

From: Matthias Drochner <M.Drochner@fz-juelich.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
	netbsd-bugs@NetBSD.org
Subject: Re: kern/36572: panic on NFS unmount 
Date: Thu, 28 Jun 2007 14:51:14 +0200

 This reminds me of the old sillyrename problem. I didn't
 find a PR, but Google found
 http://osdir.com/ml/os.netbsd.devel.kernel/2003-04/msg00318.html

 The problem is still present, at least it was some weeks ago.

 best regards
 Matthias



From: Antti Kantee <pooka@cs.hut.fi>
To: gnats-bugs@netbsd.org, netbsd-bugs@netbsd.org, martin@aprisoft.de
Cc: 
Subject: Re: kern/36572: panic on NFS unmount
Date: Thu, 28 Jun 2007 15:58:51 +0300

 On Thu Jun 28 2007 at 12:45:03 +0000, Antti Kantee wrote:
 >  >  On Thu, Jun 28, 2007 at 11:55:02AM +0000, Antti Kantee wrote:
 >  >  >  holds a reference.  This might actually be fairly easy to duplicate by
 >  >  >  opening a file, removing it and doing unmount -f.
 >  >  
 >  >  That did not reproduce it for me.
 >  
 >  You obviously need some luck in the vnode list order so that the parent
 >  is cleaned before the child.
 >  
 >  .. or my theory is wrong.

 I started thinking about this and concluded it should happen always.

 And sure enough, touch foo ; sleep 10 < foo & rm foo ; umount -f
 made it panic for me.

 -- 
 Antti Kantee <pooka@iki.fi>                     Of course he runs NetBSD
 http://www.iki.fi/pooka/                          http://www.NetBSD.org/
     "la qualité la plus indispensable du cuisinier est l'exactitude"
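
The shell one-liner above can also be written as a small C helper, which may
be handy in a regression test. This is a minimal sketch: the function name
hold_unlinked_file() and its parameters are invented here, and the forced
unmount still has to be issued separately while the descriptor is held open.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Create `path`, unlink it while it is still open, and keep the
 * descriptor for `seconds`.  On an NFS mount the client is expected to
 * sillyrename the file instead of removing it while it is open.
 * Returns 0 on success, -1 on error.
 */
int
hold_unlinked_file(const char *path, unsigned int seconds)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_CREAT | O_RDWR, 0600);
	if (fd == -1) {
		perror("open");
		return -1;
	}
	if (unlink(path) == -1) {
		perror("unlink");
		close(fd);
		return -1;
	}
	/* Window in which to run the forced unmount from another shell. */
	sleep(seconds);
	return close(fd);
}
```

Calling hold_unlinked_file("foo", 10) from a main() whose working directory
is on the NFS mount gives roughly the same window as the "sleep 10 < foo"
in the one-liner.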

From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: pooka@cs.hut.fi
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/36572: panic on NFS unmount
Date: Thu, 28 Jun 2007 22:00:50 +0900 (JST)

 > And it can happen only in unmount(MNT_FORCE), since otherwise sillyrename
 > holds a reference.  This might actually be fairly easy to duplicate by
 > opening a file, removing it and doing unmount -f.
 > 
 > Maybe we should record a dependency to sillyrenamed nodes in the
 > parent and force flushing of those before flushing the parent node.
 > 
 > I'd tend to not care except that otherwise we leave .nfs-files hanging
 > around.

 right.  (and there is revoke(2) as well.)

 another easy way to fix it would be to save the filehandle of the parent
 rather than its vnode pointer.

 YAMAMOTO Takashi
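
The filehandle idea can be illustrated with a userspace model. This is a
sketch under assumptions, not the kernel's data structures: struct fh_dir,
struct fh_silly, FH_MAXSIZE and fh_silly_init() are all hypothetical names
and sizes. The point is only that a copied filehandle stays valid after the
parent's vnode has been reclaimed, whereas a saved vnode pointer does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FH_MAXSIZE 64	/* arbitrary size chosen for the illustration */

/* Toy parent directory; v_data is NULL once it has been reclaimed. */
struct fh_dir {
	void *v_data;
	unsigned char fh[FH_MAXSIZE];
	size_t fhlen;
};

/* Sillyrename record holding a copy of the parent's filehandle. */
struct fh_silly {
	unsigned char s_dfh[FH_MAXSIZE];
	size_t s_dfhlen;
	char s_name[32];
};

/* Copy the parent's filehandle and the temporary name into the record. */
int
fh_silly_init(struct fh_silly *sp, const struct fh_dir *dvp, const char *name)
{
	size_t namelen;

	assert(sp != NULL && dvp != NULL && name != NULL);
	namelen = strlen(name);
	if (dvp->fhlen > sizeof(sp->s_dfh) || namelen >= sizeof(sp->s_name))
		return -1;
	memcpy(sp->s_dfh, dvp->fh, dvp->fhlen);
	sp->s_dfhlen = dvp->fhlen;
	memcpy(sp->s_name, name, namelen + 1);
	return 0;
}
```

After fh_silly_init() the record no longer refers to the fh_dir at all, so
clearing the directory's v_data, as a forced unmount or revoke(2) would do
to the real vnode, leaves the saved filehandle and name intact.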

From: YAMAMOTO Takashi <yamt@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: PR/36572 CVS commit: src/sys/nfs
Date: Mon,  6 Aug 2007 11:55:08 +0000 (UTC)

 Module Name:	src
 Committed By:	yamt
 Date:		Mon Aug  6 11:55:08 UTC 2007

 Modified Files:
 	src/sys/nfs: nfs_node.c

 Log Message:
 nfs_inactive: turn a panic into a printf for now, as it isn't critical.
 PR/36572 from Martin Husemann.


 To generate a diff of this commit:
 cvs rdiff -r1.94 -r1.95 src/sys/nfs/nfs_node.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
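
For readers without the tree at hand, the shape of "turn a panic into a
printf" can be sketched in a userspace model. This is not the committed
diff (see the cvs rdiff command above for that): struct model_vnode,
struct model_silly and model_inactive_silly() are invented names, and
skipping the remove step when the parent is gone is an assumption of this
model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-ins; only the fields the check looks at. */
struct model_vnode {
	void *v_data;	/* NULL once the vnode has been reclaimed */
};

struct model_silly {
	struct model_vnode *s_dvp;	/* parent directory */
};

/*
 * Returns 1 if the remove step would be carried out, 0 if it is skipped
 * because locking failed or the parent has been reclaimed.
 */
int
model_inactive_silly(struct model_silly *sp, int lock_error)
{
	assert(sp != NULL && sp->s_dvp != NULL);
	if (lock_error || sp->s_dvp->v_data == NULL) {
		/* Formerly a panic; report the condition and carry on. */
		fprintf(stderr, "%s: vp=%p error=%d\n", __func__,
		    (void *)sp->s_dvp, lock_error);
		return 0;
	}
	/* The real code would remove the sillyrenamed file here. */
	return 1;
}
```

In this model the cost of carrying on is that the sillyrenamed file is not
removed, which matches the earlier remark in this thread about .nfs files
being left behind.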

State-Changed-From-To: open->analyzed
State-Changed-By: yamt@netbsd.org
State-Changed-When: Mon, 06 Aug 2007 11:55:59 +0000
State-Changed-Why:
The problem is well understood and has been worked around.


>Unformatted:

Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.