NetBSD Problem Report #47069

From julio+host-mini-jmmv@meroh.net  Sat Oct 13 20:35:36 2012
Return-Path: <julio+host-mini-jmmv@meroh.net>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 0304663E5BF
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 13 Oct 2012 20:35:35 +0000 (UTC)
Message-Id: <20121013203443.00E13161AA6@mini.meroh.net>
Date: Sat, 13 Oct 2012 16:34:42 -0400 (EDT)
From: julio+host-mini-jmmv@meroh.net
Reply-To: julio+host-mini-jmmv@meroh.net
To: gnats-bugs@gnats.NetBSD.org
Subject: posix_spawn/t_spawnattr locks up the system
X-Send-Pr-Version: 3.95

>Number:         47069
>Category:       kern
>Synopsis:       posix_spawn/t_spawnattr locks up the system
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    martin
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Oct 13 20:40:00 +0000 2012
>Last-Modified:  Fri Oct 19 00:05:02 +0000 2012
>Originator:     julio+host-mini-jmmv@meroh.net
>Release:        NetBSD 6.99.12 -- Sources as of 20121013.
>Organization:

>Environment:


System: NetBSD mini.meroh.net 6.99.12 NetBSD 6.99.12 (MINI) #32: Sat Oct 13 00:10:52 EDT 2012 sysbuild@netbsd.meroh.net:/home/sysbuild/macppc/obj/home/sysbuild/src/sys/arch/macppc/compile/MINI macppc
Architecture: powerpc
Machine: macppc
>Description:
	The lib/libc/gen/posix_spawn/t_spawnattr test program locks my
	-current macppc machine consistently.  Whenever I run this test,
	the t_spawnattr process ends up being listed as state STOP in
	top.  At that time, the system "appears" up in the sense that I
	can continue to use everything that is running (X remains up,
	top gets updated with new data, the machine responds to network
	connections...).  However, any attempt to run a new program causes
	the calling program to hang.

	At first, I thought that access to the disk was being locked, but
	I now think that the problem is that the process table is left
	locked for some reason and thus no new processes can be started.

	Upon inspecting the implementation of posix_spawn, I *suspect* the
	following: the STOP state means that posix_spawn never completed
	correctly, because one of the first things the function does is
	transition the process to STOPped while it's being set up.
	Therefore, I'm guessing that there might be some locking issue with
	the process table.  Note that t_spawn works well, so I'm further
	guessing that this is related to the handling of spawn attributes.

	Lastly, I'm filing this as "kern" rather than "port-macppc" because
	I don't see any obvious MD code in the syscall.  However, it's
	possible that this is still macppc-specific for some reason, in
	which case this should be reclassified.
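
	For reference, a minimal sketch of a posix_spawn call that sets
	spawn attributes, which is the area suspected above.  This is not
	the t_spawnattr source: the helper name spawn_with_attrs, the
	choice of attributes (process group, signal mask, default signals)
	and the /bin/sh child are assumptions made for illustration only.

```c
#include <signal.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper (not part of the NetBSD test suite): spawn
 * "/bin/sh -c true" with a few spawn attributes set.  Returns the
 * child's exit status, or -1 on any failure.
 */
int
spawn_with_attrs(void)
{
	posix_spawnattr_t attr;
	sigset_t mask, def;
	pid_t pid;
	int status, error;
	char *const argv[] = { "sh", "-c", "true", NULL };

	if (posix_spawnattr_init(&attr) != 0)
		return -1;

	/* Put the child in its own process group (0 means "use child's pid"). */
	posix_spawnattr_setpgroup(&attr, 0);

	/* Start the child with SIGUSR1 blocked. */
	sigemptyset(&mask);
	sigaddset(&mask, SIGUSR1);
	posix_spawnattr_setsigmask(&attr, &mask);

	/* Reset SIGUSR2 to its default action in the child. */
	sigemptyset(&def);
	sigaddset(&def, SIGUSR2);
	posix_spawnattr_setsigdefault(&attr, &def);

	/* Attributes only take effect when the matching flag is set. */
	posix_spawnattr_setflags(&attr,
	    POSIX_SPAWN_SETPGROUP | POSIX_SPAWN_SETSIGMASK |
	    POSIX_SPAWN_SETSIGDEF);

	/* posix_spawn returns an error number directly, not via errno. */
	error = posix_spawn(&pid, "/bin/sh", NULL, &attr, argv, environ);
	posix_spawnattr_destroy(&attr);
	if (error != 0)
		return -1;

	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

	Calling spawn_with_attrs() from a small main() should return 0 on
	a healthy system.  Whether this reduced variant triggers the
	lockup described above is untested.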
>How-To-Repeat:
	On a macppc machine running current:

	# cd /usr/tests/lib/libc/gen/posix_spawn
	# atf-run t_spawnattr

	Witness a machine lockup.
>Fix:


>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->martin
Responsible-Changed-By: martin@NetBSD.org
Responsible-Changed-When: Sat, 13 Oct 2012 21:05:34 +0000
Responsible-Changed-Why:
take


From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Sat, 13 Oct 2012 23:06:22 +0200

 Did this work with older kernels?

 Martin

From: Julio Merino <julio@meroh.net>
To: gnats-bugs@netbsd.org
Cc: martin@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Sat, 13 Oct 2012 17:40:45 -0400

 On Sat, Oct 13, 2012 at 5:10 PM, Martin Husemann <martin@duskware.de> wrote:
 > The following reply was made to PR kern/47069; it has been noted by GNATS.
 >
 > From: Martin Husemann <martin@duskware.de>
 > To: gnats-bugs@NetBSD.org
 > Cc:
 > Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
 > Date: Sat, 13 Oct 2012 23:06:22 +0200
 >
 >  Did this work with older kernels?

 How old?  I don't recall any such problems at the beginning of August.

 -- 
 Julio Merino / @jmmv

From: Martin Husemann <martin@duskware.de>
To: Julio Merino <julio@meroh.net>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Sat, 13 Oct 2012 23:44:41 +0200

 Any old would do ;-)
 So, since this code itself did not change in that timeframe, something
 else must be causing it.

 Could you bisect and narrow down the timeframe?

 Martin

From: Julio Merino <julio@meroh.net>
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Sat, 13 Oct 2012 21:31:30 -0400

 On Sat, Oct 13, 2012 at 5:44 PM, Martin Husemann <martin@duskware.de> wrote:
 > Any old would do ;-)
 > So, since this code itself did not change in that timeframe, something
 > else must be causing it.
 >
 > Could you bisect and narrow down the timeframe?

 Meh, it was non-obvious!

 Seems like the problem was introduced on October 2nd between 01:40 and
 01:50 GMT. The modified kernel files were:

 P sys/compat/common/kern_time_50.c
 P sys/compat/ibcs2/ibcs2_misc.c
 P sys/compat/linux/common/linux_time.c
 P sys/compat/linux32/common/linux32_time.c
 P sys/compat/netbsd32/netbsd32_compat_50.c
 P sys/compat/netbsd32/netbsd32_syscall.h
 P sys/compat/netbsd32/netbsd32_syscallargs.h
 P sys/compat/netbsd32/netbsd32_syscalls.c
 P sys/compat/netbsd32/netbsd32_sysent.c
 P sys/compat/netbsd32/netbsd32_time.c
 P sys/compat/netbsd32/syscalls.master
 P sys/kern/init_sysent.c
 P sys/kern/kern_time.c
 P sys/kern/syscalls.c
 P sys/kern/syscalls.master
 P sys/rump/include/rump/rump_syscalls.h
 P sys/rump/librump/rumpkern/rump_syscalls.c
 P sys/sys/param.h
 P sys/sys/syscall.h
 P sys/sys/syscallargs.h
 P sys/sys/timevar.h

 -- 
 Julio Merino / @jmmv

From: Martin Husemann <martin@duskware.de>
To: Julio Merino <julio@meroh.net>
Cc: gnats-bugs@netbsd.org, christos@NetBSD.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Sun, 14 Oct 2012 21:26:05 +0200

 On Sat, Oct 13, 2012 at 09:31:30PM -0400, Julio Merino wrote:
 > Meh, it was non-obvious!
 > 
 > Seems like the problem was introduced on October 2nd between 01:40 and
 > 01:50 GMT. The modified kernel files were:

 The addition of clock_nanosleep(2)?
 Very strange, christos?

 Martin

From: christos@zoulas.com (Christos Zoulas)
To: Martin Husemann <martin@duskware.de>, Julio Merino <julio@meroh.net>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Sun, 14 Oct 2012 15:44:02 -0400

 On Oct 14,  9:26pm, martin@duskware.de (Martin Husemann) wrote:
 -- Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system

 | On Sat, Oct 13, 2012 at 09:31:30PM -0400, Julio Merino wrote:
 | > Meh, it was non-obvious!
 | > 
 | > Seems like the problem was introduced on October 2nd between 01:40 and
 | > 01:50 GMT. The modified kernel files were:
 | 
 | The addition of clock_nanosleep(2)?
 | Very strange, christos?

 I tested it. Could it be the general mutex lossage on powerpc?
 Does the posix_spawn test fail on other platforms?

 christos

From: Martin Husemann <martin@duskware.de>
To: Christos Zoulas <christos@zoulas.com>
Cc: Julio Merino <julio@meroh.net>, gnats-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Sun, 14 Oct 2012 21:47:00 +0200

 On Sun, Oct 14, 2012 at 03:44:02PM -0400, Christos Zoulas wrote:
 > I tested it. Could it be the general mutex lossage on powerpc?
 > Does the posix_spawn test fail on other platforms?

 Not on any I can currently easily test.

 Martin

From: christos@zoulas.com (Christos Zoulas)
To: Martin Husemann <martin@duskware.de>
Cc: Julio Merino <julio@meroh.net>, gnats-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Sun, 14 Oct 2012 16:27:36 -0400

 On Oct 14,  9:47pm, martin@duskware.de (Martin Husemann) wrote:
 -- Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system

 | On Sun, Oct 14, 2012 at 03:44:02PM -0400, Christos Zoulas wrote:
 | > I tested it. Could it be the general mutex lossage on powerpc?
 | > Does the posix_spawn test fail on other platforms?
 | 
 | Not on any I can currently easily test.

 Try defining FULL in kern_mutex.c for powerpc.

 christos

From: Julio Merino <julio@meroh.net>
To: gnats-bugs@netbsd.org
Cc: martin@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Wed, 17 Oct 2012 20:16:33 -0400

 On Sun, Oct 14, 2012 at 4:30 PM, Christos Zoulas <christos@zoulas.com> wrote:
 >  Try defining FULL in kern_mutex.c for powerpc.

 Nope, nothing.  I'm assuming that you meant to just "#define FULL" in
 kern_mutex.c?  I also tried enabling LOCKDEBUG and DIAGNOSTIC.

 However, I noticed one more data point: when the system locks up, one
 of the t_spawnattr processes is in RUN state and the other in STOP
 state.  Both show 0K of memory in the SIZE and RES columns of top.

 -- 
 Julio Merino / @jmmv

From: christos@zoulas.com (Christos Zoulas)
To: Julio Merino <julio@meroh.net>, gnats-bugs@netbsd.org
Cc: martin@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Wed, 17 Oct 2012 20:21:52 -0400

 On Oct 17,  8:16pm, julio@meroh.net (Julio Merino) wrote:
 -- Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system

 | On Sun, Oct 14, 2012 at 4:30 PM, Christos Zoulas <christos@zoulas.com> wrote:
 | >  Try defining FULL in kern_mutex.c for powerpc.
 | 
 | Nope, nothing.  I'm assuming that you meant to just "#define FULL" in
 | kern_mutex.c?  I also tried enabling LOCKDEBUG and DIAGNOSTIC.

 Yes.

 | However, I noticed one more data point: when the system locks up, one
 | of the t_spawnattr processes is in RUN state and the other in STOP
 | state.  Both show 0K of memory in the SIZE and RES columns of top.

 Fine, but the whole problem could be a mutex issue. Let's try this
 one first.

 christos

From: Julio Merino <julio@meroh.net>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@netbsd.org, martin@netbsd.org, gnats-admin@netbsd.org, 
	netbsd-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Thu, 18 Oct 2012 18:29:30 -0400

 On Wed, Oct 17, 2012 at 8:21 PM, Christos Zoulas <christos@zoulas.com> wrote:
 > On Oct 17,  8:16pm, julio@meroh.net (Julio Merino) wrote:
 > -- Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
 >
 > | On Sun, Oct 14, 2012 at 4:30 PM, Christos Zoulas <christos@zoulas.com> wrote:
 > | >  Try defining FULL in kern_mutex.c for powerpc.
 > |
 > | Nope, nothing.  I'm assuming that you meant to just "#define FULL" in
 > | kern_mutex.c?  I also tried enabling LOCKDEBUG and DIAGNOSTIC.
 >
 > Yes.
 >
 > | However, I noticed one more data point: when the system locks up, one
 > | of the t_spawnattr processes is in RUN state and the other in STOP
 > | state.  Both show 0K of memory in the SIZE and RES columns of top.
 >
 > Fine, but the whole problem could be a mutex issue. Let's try this
 > one first.

 By "this one" you meant defining FULL, LOCKDEBUG or DIAGNOSTIC?  I
 already tried them all and none provided any further details :-/

 -- 
 Julio Merino / @jmmv

From: christos@zoulas.com (Christos Zoulas)
To: Julio Merino <julio@meroh.net>
Cc: gnats-bugs@netbsd.org, martin@netbsd.org, gnats-admin@netbsd.org, 
	netbsd-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Thu, 18 Oct 2012 20:00:33 -0400

 On Oct 18,  6:29pm, julio@meroh.net (Julio Merino) wrote:
 -- Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system

 | By "this one" you meant defining FULL, LOCKDEBUG or DIAGNOSTIC?  I
 | already tried them all and none provided any further details :-/

 FULL is not going to produce any output. It would hopefully fix the problem.
 If not, there is something else going on.

 christos

>Unformatted:
