NetBSD Problem Report #41154
From Manuel.Bouyer@lip6.fr Sun Apr 5 21:01:00 2009
Return-Path: <Manuel.Bouyer@lip6.fr>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id 77A6B63B8A5
for <gnats-bugs@gnats.NetBSD.org>; Sun, 5 Apr 2009 21:01:00 +0000 (UTC)
Message-Id: <200904052100.n35L0ux2001651@horn.soc.lip6.fr>
Date: Sun, 5 Apr 2009 23:00:56 +0200 (MEST)
From: Manuel Bouyer <Manuel.Bouyer@lip6.fr>
Reply-To: Manuel.Bouyer@lip6.fr
To: gnats-bugs@gnats.NetBSD.org
Subject: possible races in NFS server code ?
X-Send-Pr-Version: 3.95
>Number: 41154
>Category: kern
>Synopsis: possible races in NFS server code ?
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Apr 05 21:05:00 +0000 2009
>Last-Modified: Wed Apr 08 20:20:05 +0000 2009
>Originator: Manuel Bouyer
>Release: NetBSD 5.0_RC3
>Organization:
>Environment:
System: NetBSD horn 5.0_RC3 NetBSD 5.0_RC3 (DISCODEBUG) #34: Sun Apr 5 21:37:10 MEST 2009 bouyer@disco:/home/bouyer/src-5/src/sys/arch/i386/compile/DISCODEBUG i386
Architecture: i386
Machine: i386
>Description:
While working on the NFS server code I noticed that several
places use members of struct nfssvc_sock in a way that is not
obviously thread-safe.
There appears to be no locking to prevent a nfssvc_sock that has been
checked as SLP_VALID to become invalid while in use.
Some places recheck SLP_VALID where appropriate but some don't.
Especially I suspect there's a possible use after free of
ns_nam in nfssvc_nfsd(), as nothing seems to prevent nfsrv_zapsock()
from freeing it while nd_nam points to it. More generally the tail
of nfssvc_nfsd() doens't look safe.
If concurent execution of nfssvc_addsock() and nfsrv_zapsock()
possible ? If so nfssvc_addsock() would need to take the ns_lock,
and revisit the way SLP_VALID is handled in these 2 functions.
Last I suspect a race between sys_nfssvc() and nfsrv_zapsock()
could cause a corruption of the UID hash lists. More generally
what prevents concurent updates of the ns_uidlruhead ?
>How-To-Repeat:
code inspection
>Fix:
>Audit-Trail:
From: Andrew Doran <ad@netbsd.org>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/41154: possible races in NFS server code ?
Date: Wed, 8 Apr 2009 20:08:42 +0000
On Wed, Apr 08, 2009 at 09:23:33PM +0200, Manuel Bouyer wrote:
> @@ -889,9 +887,12 @@
> closef(fp);
Calling closef() while holding nfsd_lock makes me uncomfortable. In this
case we know it is a socket, so I think it should be OK, but there could be
unexpected deadlocks. Better to free it afterwards I think..
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/41154: possible races in NFS server code ?
Date: Wed, 8 Apr 2009 20:17:08 +0000
On Wed, Apr 08, 2009 at 08:10:04PM +0000, Andrew Doran wrote:
> The following reply was made to PR kern/41154; it has been noted by GNATS.
>
> From: Andrew Doran <ad@netbsd.org>
> To: Manuel Bouyer <bouyer@antioche.eu.org>
> Cc: gnats-bugs@netbsd.org
> Subject: Re: kern/41154: possible races in NFS server code ?
> Date: Wed, 8 Apr 2009 20:08:42 +0000
>
> On Wed, Apr 08, 2009 at 09:23:33PM +0200, Manuel Bouyer wrote:
>
> > @@ -889,9 +887,12 @@
> > closef(fp);
>
> Calling closef() while holding nfsd_lock makes me uncomfortable. In this
> case we know it is a socket, so I think it should be OK, but there could be
> unexpected deadlocks. Better to free it afterwards I think..
>
Actually it looks like there *is* a problem with doing this.
socket upcall -> nfsrv_soupcall -> nfsrv_wakenfsd -> mutex_enter(&nfsd_lock)
So this defines the current lock order as socket::so_lock -> nfsd_lock.
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Andrew Doran <ad@netbsd.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/41154: possible races in NFS server code ?
Date: Wed, 8 Apr 2009 22:17:41 +0200
On Wed, Apr 08, 2009 at 08:08:42PM +0000, Andrew Doran wrote:
> On Wed, Apr 08, 2009 at 09:23:33PM +0200, Manuel Bouyer wrote:
>
> > @@ -889,9 +887,12 @@
> > closef(fp);
>
> Calling closef() while holding nfsd_lock makes me uncomfortable. In this
> case we know it is a socket, so I think it should be OK, but there could be
> unexpected deadlocks. Better to free it afterwards I think..
In my patch it'a already after nfsd_lock has been released; it's released
just after TAILQ_REMOVE(&nfssvc_sockhead ...
We used to release the lock just before testing SLP_VALID and reaquire it just
after, my patch just remove this mutex_exit/mutex_enter around a
test. What may have confused you is that because of this there is also
a mutex_exit() in the other branch of the test. But the only thing that
happen locked which was not before is the SLP_VALID test.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.