NetBSD Problem Report #41225

From www@NetBSD.org  Wed Apr 15 22:06:05 2009
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id C0A0863C1F9
	for <gnats-bugs@gnats.netbsd.org>; Wed, 15 Apr 2009 22:06:04 +0000 (UTC)
Message-Id: <20090415220600.6C64863B8A5@www.NetBSD.org>
Date: Wed, 15 Apr 2009 22:06:00 +0000 (UTC)
From: ad@netbsd.org
Reply-To: ad@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: sys_mqueue.c mq->mq_notify_proc can disappear
X-Send-Pr-Version: www-1.0

>Number:         41225
>Category:       kern
>Synopsis:       sys_mqueue.c mq->mq_notify_proc can disappear
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Apr 15 22:10:01 +0000 2009
>Last-Modified:  Sun Oct 06 16:11:06 +0000 2019
>Originator:     Andrew Doran
>Release:        5.0
>Organization:
The NetBSD Project
>Environment:
n/a
>Description:
Message queues are tied to file descriptors and so can live across
fork().

mq_notify() does: mq->mq_notify_proc = l->l_proc;

It looks like this process can exit while the mqueue still persists.

We can later try to send a signal to this process.
>How-To-Repeat:
Code inspection.

>Fix:
This problem occurs quite a few times in the kernel, esp. in device
drivers doing async I/O. :-(

Quick and dirty fix:

Change mq_notify() to store a pid_t and use p_find() when sending
signals. mqueue can send signals to unrelated processes after
the proc exits.

More reliable but ugly and slow:

Register an exithook for every mqueue that has mq_notify_proc set.
This is mostly ugly because at the moment it requires taking exec_lock
which is pretty heavyweight.

Maybe a better solution:

In addition to a pid, store a generation number per struct proc.
Every time a pid is allocated, increment the generation number.
Add a p_find_gen() or similar that looks for a pid and also
compares the generation numbers.
This number could be a 64-bit system global (under proc_lock) or
we could have one for each pid table slot. I don't personally see
a big disadvantage to having a global. ??

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->rmind
Responsible-Changed-By: rmind@NetBSD.org
Responsible-Changed-When: Wed, 15 Apr 2009 23:22:05 +0000
Responsible-Changed-Why:
Mine.


From: David Laight <david@l8s.co.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/41225: sys_mqueue.c mq->mq_notify_proc can disappear
Date: Thu, 16 Apr 2009 07:38:23 +0100

 On Wed, Apr 15, 2009 at 10:10:01PM +0000, ad@netbsd.org wrote:
 > >Synopsis:       sys_mqueue.c mq->mq_notify_proc can disappear
 > 
 > Quick and dirty fix:
 > 
 > Change mq_notify() to store a pid_t and use p_find() when sending
 > signals. mqueue can send signals to unrelated processes after
 > the proc exits.

 Can the message queue hold a reference count againt the proc table slot ?
 This would stop the slot being reused.

 I guess it should also be possibly to determine that the process had died
 when it tries to send a new signal - removing the need for the exithook.

 Just don't do the fubar recently added to linux where kernel code
 can no longer send a signal to a pid!

 	David

 -- 
 David Laight: david@l8s.co.uk

From: Mindaugas Rasiukevicius <rmind@netbsd.org>
To: ad@netbsd.org
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org
Subject: Re: kern/41225: sys_mqueue.c mq->mq_notify_proc can disappear
Date: Sat, 25 Apr 2009 17:59:20 +0100

 > >Number:         41225
 > >Category:       kern
 > >Synopsis:       sys_mqueue.c mq->mq_notify_proc can disappear
 >
 > ...
 >
 > >Description:
 > Message queues are tied to file descriptors and so can live across
 > fork().
 > 
 > mq_notify() does: mq->mq_notify_proc = l->l_proc;
 > 
 > It looks like this process can exit while the mqueue still persists.

 When process is closing the descriptor (by exit or directly), the
 mq_notify_proc is checked by mq_close_fop():

 http://nxr.netbsd.org/source/xref/sys/kern/sys_mqueue.c#304

 Therefore it seems to be a small race condition, with window here:

 http://nxr.netbsd.org/source/xref/sys/kern/sys_mqueue.c#739

 Holding proc_lock before release of mq->mq_mtx should fix the problem:

 http://www.netbsd.org/~rmind/mq_kpsignal.diff

 Looks good?

 > Maybe a better solution:
 > 
 > In addition to a pid, store a generation number per struct proc.
 > Every time a pid is allocated, increment the generation number.
 > Add a p_find_gen() or similar that looks for a pid and also
 > compares the generation numbers.
 > This number could be a 64-bit system global (under proc_lock) or
 > we could have one for each pid table slot. I don't personally see
 > a big disadvantage to having a global. ??

 That would be a general way to ensure uniqueness. Are there any other
 users (or other usefulness) of such interface?

 -- 
 Best regards,
 Mindaugas

Responsible-Changed-From-To: rmind->kern-bug-people
Responsible-Changed-By: rmind@NetBSD.org
Responsible-Changed-When: Sun, 06 Oct 2019 16:11:06 +0000
Responsible-Changed-Why:


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.