NetBSD Problem Report #41154

From Manuel.Bouyer@lip6.fr  Sun Apr  5 21:01:00 2009
Return-Path: <Manuel.Bouyer@lip6.fr>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 77A6B63B8A5
	for <gnats-bugs@gnats.NetBSD.org>; Sun,  5 Apr 2009 21:01:00 +0000 (UTC)
Message-Id: <200904052100.n35L0ux2001651@horn.soc.lip6.fr>
Date: Sun, 5 Apr 2009 23:00:56 +0200 (MEST)
From: Manuel Bouyer <Manuel.Bouyer@lip6.fr>
Reply-To: Manuel.Bouyer@lip6.fr
To: gnats-bugs@gnats.NetBSD.org
Subject: possible races in NFS server code ?
X-Send-Pr-Version: 3.95

>Number:         41154
>Category:       kern
>Synopsis:       possible races in NFS server code ?
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Apr 05 21:05:00 +0000 2009
>Last-Modified:  Wed Apr 08 20:20:05 +0000 2009
>Originator:     Manuel Bouyer
>Release:        NetBSD 5.0_RC3
>Organization:
>Environment:
System: NetBSD horn 5.0_RC3 NetBSD 5.0_RC3 (DISCODEBUG) #34: Sun Apr 5 21:37:10 MEST 2009 bouyer@disco:/home/bouyer/src-5/src/sys/arch/i386/compile/DISCODEBUG i386
Architecture: i386
Machine: i386
>Description:
	While working on the NFS server code I noticed that several
	places use members of struct nfssvc_sock in a way that is not
	obviously thread-safe.
	There appears to be no locking to prevent a nfssvc_sock that has been
	checked as SLP_VALID to become invalid while in use.
	Some places recheck SLP_VALID where appropriate but some don't.
	Especially I suspect there's a possible use after free of
	ns_nam in nfssvc_nfsd(), as nothing seems to prevent nfsrv_zapsock()
	from freeing it while nd_nam points to it. More generally the tail
	of nfssvc_nfsd() doens't look safe.

	If concurent execution of nfssvc_addsock() and nfsrv_zapsock()
	possible ? If so nfssvc_addsock() would need to take the ns_lock,
	and revisit the way SLP_VALID is handled in these 2 functions.

	Last I suspect a race between sys_nfssvc() and nfsrv_zapsock()
	could cause a corruption of the UID hash lists. More generally
	what prevents concurent updates of the ns_uidlruhead ?

>How-To-Repeat:
	code inspection
>Fix:

>Audit-Trail:
From: Andrew Doran <ad@netbsd.org>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/41154: possible races in NFS server code ?
Date: Wed, 8 Apr 2009 20:08:42 +0000

 On Wed, Apr 08, 2009 at 09:23:33PM +0200, Manuel Bouyer wrote:

 > @@ -889,9 +887,12 @@
 >  			closef(fp);

 Calling closef() while holding nfsd_lock makes me uncomfortable. In this
 case we know it is a socket, so I think it should be OK, but there could be
 unexpected deadlocks. Better to free it afterwards I think..

From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/41154: possible races in NFS server code ?
Date: Wed, 8 Apr 2009 20:17:08 +0000

 On Wed, Apr 08, 2009 at 08:10:04PM +0000, Andrew Doran wrote:

 > The following reply was made to PR kern/41154; it has been noted by GNATS.
 > 
 > From: Andrew Doran <ad@netbsd.org>
 > To: Manuel Bouyer <bouyer@antioche.eu.org>
 > Cc: gnats-bugs@netbsd.org
 > Subject: Re: kern/41154: possible races in NFS server code ?
 > Date: Wed, 8 Apr 2009 20:08:42 +0000
 > 
 >  On Wed, Apr 08, 2009 at 09:23:33PM +0200, Manuel Bouyer wrote:
 >  
 >  > @@ -889,9 +887,12 @@
 >  >  			closef(fp);
 >  
 >  Calling closef() while holding nfsd_lock makes me uncomfortable. In this
 >  case we know it is a socket, so I think it should be OK, but there could be
 >  unexpected deadlocks. Better to free it afterwards I think..
 >  

 Actually it looks like there *is* a problem with doing this.

 socket upcall -> nfsrv_soupcall -> nfsrv_wakenfsd -> mutex_enter(&nfsd_lock)

 So this defines the current lock order as socket::so_lock -> nfsd_lock.

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Andrew Doran <ad@netbsd.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/41154: possible races in NFS server code ?
Date: Wed, 8 Apr 2009 22:17:41 +0200

 On Wed, Apr 08, 2009 at 08:08:42PM +0000, Andrew Doran wrote:
 > On Wed, Apr 08, 2009 at 09:23:33PM +0200, Manuel Bouyer wrote:
 > 
 > > @@ -889,9 +887,12 @@
 > >  			closef(fp);
 > 
 > Calling closef() while holding nfsd_lock makes me uncomfortable. In this
 > case we know it is a socket, so I think it should be OK, but there could be
 > unexpected deadlocks. Better to free it afterwards I think..

 In my patch it'a already after nfsd_lock has been released; it's released
 just after TAILQ_REMOVE(&nfssvc_sockhead ...
 We used to release the lock just before testing SLP_VALID and reaquire it just
 after, my patch just remove this mutex_exit/mutex_enter around a
 test. What may have confused you is that because of this there is also
 a mutex_exit() in the other branch of the test. But the only thing that
 happen locked which was not before is the SLP_VALID test.


 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --
Home
PR Database Search
(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.