NetBSD Problem Report #36572
From martin@aprisoft.de Thu Jun 28 08:36:01 2007
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id 7D97963B882
for <gnats-bugs@gnats.NetBSD.org>; Thu, 28 Jun 2007 08:36:01 +0000 (UTC)
Message-Id: <20070628083558.82479AF5824@emmas.aprisoft.de>
Date: Thu, 28 Jun 2007 10:35:58 +0200 (CEST)
From: martin@aprisoft.de
Reply-To: martin@aprisoft.de
To: gnats-bugs@NetBSD.org
Subject: panic on NFS unmount
X-Send-Pr-Version: 3.95
>Number: 36572
>Category: kern
>Synopsis: panic on NFS unmount
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: analyzed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Jun 28 08:40:00 +0000 2007
>Closed-Date:
>Last-Modified: Mon Aug 06 11:55:59 +0000 2007
>Originator: Martin Husemann
>Release: NetBSD 4.99.21
>Organization:
>Environment:
System: NetBSD nelly.aprisoft.de 4.99.21 NetBSD 4.99.21 (NELLY) #5: Wed Jun 27 05:09:38 CEST 2007 martin@emmas.aprisoft.de:/nelly/usr/src/sys/arch/sparc64/compile/NELLY sparc64
Architecture: sparc64
Machine: sparc64
>Description:
When rebooting a diskless machine after quite some NFS usage I hit this panic:
panic: nfs_inactive: vp=0xd6dd9e0 error=0
nfs_inactive() + 0x13c
VOP_INACTIVE()
vclean()
vgonel()
vflush()
nfs_unmount()
dounmount()
vfs_unmountall()
It has a big XXX comment:
0x1047d5c is in nfs_inactive (../../../../nfs/nfs_node.c:272).
267 */
268
269 error = vn_lock(sp->s_dvp, LK_EXCLUSIVE | LK_CANRECURSE);
270 if (error || sp->s_dvp->v_data == NULL) {
271 /* XXX should recover */
272 panic("%s: vp=%p error=%d", __func__, sp->s_dvp, error);
273 }
274 nfs_removeit(sp);
275 kauth_cred_free(sp->s_cred);
276 vput(sp->s_dvp);
What is this test for? Is it to protect vput()? But nfs_unlock() already deals
with v_data == 0. Or has this vnode been reclaimed already (how could I tell
from ddb?)
Might this be related to PR kern/36424 (wishful thinking)?
>How-To-Repeat:
I used the / on NFS for a day (including a build.sh run) and then rebooted.
>Fix:
no idea, sorry.
>Release-Note:
>Audit-Trail:
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/36572: panic on NFS unmount
Date: Thu, 28 Jun 2007 20:38:40 +0900 (JST)
> When rebooting a diskless machine after quite some NFS usage I hit this panic:
>
> panic: nfs_inactive: vp=0xd6dd9e0 error=0
>
> nfs_inactive() + 0x13c
> VOP_INACTIVE()
> vclean()
> vgonel()
> vflush()
> nfs_unmount()
> dounmount()
> vfs_unmountall()
>
> It has a big XXX comment:
>
> 0x1047d5c is in nfs_inactive (../../../../nfs/nfs_node.c:272).
> 267 */
> 268
> 269 error = vn_lock(sp->s_dvp, LK_EXCLUSIVE | LK_CANRECURSE);
> 270 if (error || sp->s_dvp->v_data == NULL) {
> 271 /* XXX should recover */
> 272 panic("%s: vp=%p error=%d", __func__, sp->s_dvp, error);
> 273 }
> 274 nfs_removeit(sp);
> 275 kauth_cred_free(sp->s_cred);
> 276 vput(sp->s_dvp);
>
> What is this test for? Is it to protect vput()? But nfs_unlock() already deals
> with v_data == 0. Or has this vnode been reclaimed already (how could I tell
> from ddb?)
sh vnode 0xd6dd9e0
if dvp has been revoked (v_data == NULL), nfs_removeit() doesn't work.
it probably isn't critical enough to panic, but it should be fixed eventually.
> Might this be related to PR kern/36424 (wishful thinking)?
i guess it's unrelated.
YAMAMOTO Takashi
From: Antti Kantee <pooka@cs.hut.fi>
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/36572: panic on NFS unmount
Date: Thu, 28 Jun 2007 14:53:59 +0300
On Thu Jun 28 2007 at 20:38:40 +0900, YAMAMOTO Takashi wrote:
> > 0x1047d5c is in nfs_inactive (../../../../nfs/nfs_node.c:272).
> > 267 */
> > 268
> > 269 error = vn_lock(sp->s_dvp, LK_EXCLUSIVE | LK_CANRECURSE);
> > 270 if (error || sp->s_dvp->v_data == NULL) {
> > 271 /* XXX should recover */
> > 272 panic("%s: vp=%p error=%d", __func__, sp->s_dvp, error);
> > 273 }
> > 274 nfs_removeit(sp);
> > 275 kauth_cred_free(sp->s_cred);
> > 276 vput(sp->s_dvp);
> >
> > What is this test for? Is it to protect vput()? But nfs_unlock() already deals
> > with v_data == 0. Or has this vnode been reclaimed already (how could I tell
> > from ddb?)
>
> sh vnode 0xd6dd9e0
>
> if dvp has been revoked (v_data == NULL), nfs_removeit() doesn't work.
> it probably isn't critical enough to panic, but it should be fixed eventually.
And it can happen only in unmount(MNT_FORCE), since otherwise sillyrename
holds a reference. This might actually be fairly easy to duplicate by
opening a file, removing it and doing unmount -f.
Maybe we should record a dependency to sillyrenamed nodes in the
parent and force flushing of those before flushing the parent node.
I'd tend to not care except that otherwise we leave .nfs-files hanging
around.
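
The ordering idea in the paragraph above can be sketched as a small userland model. Everything here (struct model_node, model_add_silly(), model_flush(), MAX_SILLY) is hypothetical and only illustrates "flush the sillyrenamed children before the parent"; it is not the kernel's vnode code and not a proposed patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SILLY 8	/* assumption: small fixed bound, for the sketch only */

/* Hypothetical stand-in for a vnode; not the kernel's struct vnode. */
struct model_node {
	const char *name;
	int flushed;				/* set once the node is torn down */
	struct model_node *silly[MAX_SILLY];	/* sillyrenamed children that need us */
	size_t nsilly;
};

/* Record that child needs parent to stay usable until child is flushed. */
static int
model_add_silly(struct model_node *parent, struct model_node *child)
{
	assert(parent != NULL && child != NULL);
	if (parent->nsilly == MAX_SILLY)
		return -1;
	parent->silly[parent->nsilly++] = child;
	return 0;
}

/*
 * Flush np, but flush its recorded dependents first.  The flush order is
 * appended to order[]; the caller must size it for every node.  Assumes
 * the dependency graph has no cycles.
 */
static void
model_flush(struct model_node *np, const char **order, size_t *norder)
{
	size_t i;

	if (np->flushed)
		return;
	for (i = 0; i < np->nsilly; i++)
		model_flush(np->silly[i], order, norder);
	np->flushed = 1;
	order[(*norder)++] = np->name;
}
```

With this in place, the walk may visit the parent first and the child is still torn down while the parent is usable; a later visit to the child is a no-op.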
--
Antti Kantee <pooka@iki.fi> Of course he runs NetBSD
http://www.iki.fi/pooka/ http://www.NetBSD.org/
"la qualité la plus indispensable du cuisinier est l'exactitude"
From: Martin Husemann <martin@duskware.de>
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/36572: panic on NFS unmount
Date: Thu, 28 Jun 2007 14:35:53 +0200
On Thu, Jun 28, 2007 at 08:38:40PM +0900, YAMAMOTO Takashi wrote:
> sh vnode 0xd6dd9e0
I'll dig deeper next time - I have failed to reproduce it so far.
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/36572: panic on NFS unmount
Date: Thu, 28 Jun 2007 14:36:52 +0200
On Thu, Jun 28, 2007 at 11:55:02AM +0000, Antti Kantee wrote:
> holds a reference. This might actually be fairly easy to duplicate by
> opening a file, removing it and doing unmount -f.
That did not reproduce it for me.
Martin
From: Antti Kantee <pooka@cs.hut.fi>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
martin@aprisoft.de
Subject: Re: kern/36572: panic on NFS unmount
Date: Thu, 28 Jun 2007 15:44:13 +0300
On Thu Jun 28 2007 at 12:40:04 +0000, Martin Husemann wrote:
> On Thu, Jun 28, 2007 at 11:55:02AM +0000, Antti Kantee wrote:
> > holds a reference. This might actually be fairly easy to duplicate by
> > opening a file, removing it and doing unmount -f.
>
> That did not reproduce it for me.
You obviously need some luck in the vnode list order so that the parent
is cleaned before the child.
.. or my theory is wrong.
From: Matthias Drochner <M.Drochner@fz-juelich.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org
Subject: Re: kern/36572: panic on NFS unmount
Date: Thu, 28 Jun 2007 14:51:14 +0200
This reminds me of the old sillyrename problem. I didn't
find a PR, but Google found
http://osdir.com/ml/os.netbsd.devel.kernel/2003-04/msg00318.html
The problem is still present, or at least it was some weeks ago.
best regards
Matthias
From: Antti Kantee <pooka@cs.hut.fi>
To: gnats-bugs@netbsd.org, netbsd-bugs@netbsd.org, martin@aprisoft.de
Cc:
Subject: Re: kern/36572: panic on NFS unmount
Date: Thu, 28 Jun 2007 15:58:51 +0300
On Thu Jun 28 2007 at 12:45:03 +0000, Antti Kantee wrote:
> > On Thu, Jun 28, 2007 at 11:55:02AM +0000, Antti Kantee wrote:
> > > holds a reference. This might actually be fairly easy to duplicate by
> > > opening a file, removing it and doing unmount -f.
> >
> > That did not reproduce it for me.
>
> You obviously need some luck in the vnode list order so that the parent
> is cleaned before the child.
>
> .. or my theory is wrong.
I started thinking about this and concluded it should happen always.
And sure enough, touch foo ; sleep 10 < foo & rm foo ; umount -f
made it panic for me.
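
The first half of that recipe (create a file, keep it open, remove its name) might look roughly like this in C. This is only a sketch: hold_unlinked() is a made-up helper name, and the forced unmount itself is not part of it and would still have to be issued separately while the descriptor is open.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Create path (if needed), keep it open, and remove its name.
 * Returns the still-open descriptor, or -1 on failure.
 * On an NFS client, removing a file that is still open is the case
 * that leads to a sillyrename (the .nfs files mentioned in this thread).
 */
static int
hold_unlinked(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd == -1)
		return -1;
	if (unlink(path) == -1) {
		(void)close(fd);
		return -1;
	}
	return fd;
}
```

A caller would keep the returned descriptor open (for example by sleeping) while the forced unmount is attempted, then close it.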
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: pooka@cs.hut.fi
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/36572: panic on NFS unmount
Date: Thu, 28 Jun 2007 22:00:50 +0900 (JST)
> And it can happen only in unmount(MNT_FORCE), since otherwise sillyrename
> holds a reference. This might actually be fairly easy to duplicate by
> opening a file, removing it and doing unmount -f.
>
> Maybe we should record a dependency to sillyrenamed nodes in the
> parent and force flushing of those before flushing the parent node.
>
> I'd tend to not care except that otherwise we leave .nfs-files hanging
> around.
right. (and there is revoke(2) as well.)
another easy way to fix it would be to save the filehandle of the parent
rather than its vnode pointer.
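
A rough userland illustration of that suggestion: the record keeps its own copy of the parent's file handle bytes, so it no longer depends on the parent vnode staying valid. The names (struct model_sillyfh, model_sillyfh_init(), MODEL_FHSIZE) are hypothetical and this is not the layout of the kernel's struct sillyrename; the 64-byte bound is the NFSv3 maximum handle size and is used here as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_FHSIZE 64	/* assumption: NFSv3 maximum file handle size */

/* Hypothetical record for a sillyrenamed file. */
struct model_sillyfh {
	unsigned char dir_fh[MODEL_FHSIZE];	/* private copy of the parent's handle */
	size_t dir_fhlen;
	char name[32];				/* temporary .nfs name */
};

/* Copy the parent's handle and the temporary name into the record. */
static int
model_sillyfh_init(struct model_sillyfh *sp, const void *fh, size_t fhlen,
    const char *name)
{
	assert(sp != NULL && fh != NULL && name != NULL);
	if (fhlen > MODEL_FHSIZE || strlen(name) >= sizeof(sp->name))
		return -1;
	memcpy(sp->dir_fh, fh, fhlen);
	sp->dir_fhlen = fhlen;
	strcpy(sp->name, name);	/* length checked above */
	return 0;
}
```

Because the handle is copied, the remove can in principle still be addressed to the server after the parent's in-memory state is gone, which is the property the suggestion is after.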
YAMAMOTO Takashi
From: YAMAMOTO Takashi <yamt@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: PR/36572 CVS commit: src/sys/nfs
Date: Mon, 6 Aug 2007 11:55:08 +0000 (UTC)
Module Name: src
Committed By: yamt
Date: Mon Aug 6 11:55:08 UTC 2007
Modified Files:
src/sys/nfs: nfs_node.c
Log Message:
nfs_inactive: turn a panic into a printf for now, as it isn't critical.
PR/36572 from Martin Husemann.
To generate a diff of this commit:
cvs rdiff -r1.94 -r1.95 src/sys/nfs/nfs_node.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
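
The effect described in the log message can be modelled in userland as follows. This is not the committed diff (see the cvs rdiff command above for that); struct model_vnode, struct model_silly and model_inactive() are hypothetical stand-ins, and the lock result is passed in as a plain integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real types live in the NetBSD kernel. */
struct model_vnode {
	void *v_data;			/* NULL once the vnode has been reclaimed */
};

struct model_silly {
	struct model_vnode *s_dvp;	/* parent directory */
	int removed;			/* set when the remove would be issued */
};

/*
 * Returns 1 if the sillyrenamed file would be removed, 0 if the step is
 * skipped because the parent could not be locked or is already reclaimed.
 */
static int
model_inactive(struct model_silly *sp, int lock_error)
{
	assert(sp != NULL && sp->s_dvp != NULL);
	if (lock_error != 0 || sp->s_dvp->v_data == NULL) {
		/* Warn instead of panicking; the .nfs file is left behind. */
		fprintf(stderr, "%s: vp=%p error=%d\n", __func__,
		    (void *)sp->s_dvp, lock_error);
		return 0;
	}
	sp->removed = 1;		/* stands in for nfs_removeit(sp) */
	return 1;
}
```

The trade-off is the one already noted in the thread: the system survives the forced unmount, but the sillyrenamed file stays on the server.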
State-Changed-From-To: open->analyzed
State-Changed-By: yamt@netbsd.org
State-Changed-When: Mon, 06 Aug 2007 11:55:59 +0000
State-Changed-Why:
the problem is well understood and has been worked around.
>Unformatted: