NetBSD Problem Report #47069
From julio+host-mini-jmmv@meroh.net Sat Oct 13 20:35:36 2012
Return-Path: <julio+host-mini-jmmv@meroh.net>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
by www.NetBSD.org (Postfix) with ESMTP id 0304663E5BF
for <gnats-bugs@gnats.NetBSD.org>; Sat, 13 Oct 2012 20:35:35 +0000 (UTC)
Message-Id: <20121013203443.00E13161AA6@mini.meroh.net>
Date: Sat, 13 Oct 2012 16:34:42 -0400 (EDT)
From: julio+host-mini-jmmv@meroh.net
Reply-To: julio+host-mini-jmmv@meroh.net
To: gnats-bugs@gnats.NetBSD.org
Subject: posix_spawn/t_spawnattr locks up the system
X-Send-Pr-Version: 3.95
>Number: 47069
>Category: kern
>Synopsis: posix_spawn/t_spawnattr locks up the system
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: martin
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Oct 13 20:40:00 +0000 2012
>Last-Modified: Fri Oct 19 00:05:02 +0000 2012
>Originator: julio+host-mini-jmmv@meroh.net
>Release: NetBSD 6.99.12 -- Sources as of 20121013.
>Organization:
>Environment:
System: NetBSD mini.meroh.net 6.99.12 NetBSD 6.99.12 (MINI) #32: Sat Oct 13 00:10:52 EDT 2012 sysbuild@netbsd.meroh.net:/home/sysbuild/macppc/obj/home/sysbuild/src/sys/arch/macppc/compile/MINI macppc
Architecture: powerpc
Machine: macppc
>Description:
The lib/libc/gen/posix_spawn/t_spawnattr test program locks my
-current macppc machine consistently. Whenever I run this test,
the t_spawnattr process ends up being listed as state STOP in
top. At that time, the system "appears" up in the sense that I
can continue to use everything that is running (X remains up,
top gets updated with new data, the machine responds to network
connections...). However, any attempt to run a new program leaves
the calling program stuck.
At first, I thought that access to the disk was being locked, but
I now think that the problem is that the process table is left
locked for some reason and thus no new processes can be started.
Upon inspecting the implementation of posix_spawn, I *suspect* the
following: the STOP state means that posix_spawn never completed
correctly, because one of the first things the function does is
transition the process to STOPped while it's being set up.
Therefore, I'm guessing that there might be some locking issue with
the process table. Note that t_spawn works well, so I'm further
guessing that this is related to the attributes manipulation.
Lastly, I'm filing this as "kern" rather than "port-macppc" because
I don't see any obvious MD code in the syscall. However, it's
possible that this is still macppc-specific for some reason, in
which case this should be reclassified.
>How-To-Repeat:
On a macppc machine running current:
# cd /usr/tests/lib/libc/gen/posix_spawn
# atf-run t_spawnattr
Witness a machine lockup.
>Fix:
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->martin
Responsible-Changed-By: martin@NetBSD.org
Responsible-Changed-When: Sat, 13 Oct 2012 21:05:34 +0000
Responsible-Changed-Why:
take
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Sat, 13 Oct 2012 23:06:22 +0200
Did this work with older kernels?
Martin
From: Julio Merino <julio@meroh.net>
To: gnats-bugs@netbsd.org
Cc: martin@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Sat, 13 Oct 2012 17:40:45 -0400
On Sat, Oct 13, 2012 at 5:10 PM, Martin Husemann <martin@duskware.de> wrote:
> The following reply was made to PR kern/47069; it has been noted by GNATS.
>
> From: Martin Husemann <martin@duskware.de>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
> Date: Sat, 13 Oct 2012 23:06:22 +0200
>
> Did this work with older kernels?
How old? I don't recall any such problems at the beginning of August.
--
Julio Merino / @jmmv
From: Martin Husemann <martin@duskware.de>
To: Julio Merino <julio@meroh.net>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Sat, 13 Oct 2012 23:44:41 +0200
Any old would do ;-)
So, since this code itself did not change in that timeframe, something
else must be causing it.
Could you bisect and narrow down the timeframe?
Martin
From: Julio Merino <julio@meroh.net>
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Sat, 13 Oct 2012 21:31:30 -0400
On Sat, Oct 13, 2012 at 5:44 PM, Martin Husemann <martin@duskware.de> wrote:
> Any old would do ;-)
> So, since this code itself did not change in that timeframe, something
> else must be causing it.
>
> Could you bisect and narrow down the timeframe?
Meh, it was non-obvious!
Seems like the problem was introduced on October 2nd between 01:40 and
01:50 GMT. The modified kernel files were:
P sys/compat/common/kern_time_50.c
P sys/compat/ibcs2/ibcs2_misc.c
P sys/compat/linux/common/linux_time.c
P sys/compat/linux32/common/linux32_time.c
P sys/compat/netbsd32/netbsd32_compat_50.c
P sys/compat/netbsd32/netbsd32_syscall.h
P sys/compat/netbsd32/netbsd32_syscallargs.h
P sys/compat/netbsd32/netbsd32_syscalls.c
P sys/compat/netbsd32/netbsd32_sysent.c
P sys/compat/netbsd32/netbsd32_time.c
P sys/compat/netbsd32/syscalls.master
P sys/kern/init_sysent.c
P sys/kern/kern_time.c
P sys/kern/syscalls.c
P sys/kern/syscalls.master
P sys/rump/include/rump/rump_syscalls.h
P sys/rump/librump/rumpkern/rump_syscalls.c
P sys/sys/param.h
P sys/sys/syscall.h
P sys/sys/syscallargs.h
P sys/sys/timevar.h
--
Julio Merino / @jmmv
From: Martin Husemann <martin@duskware.de>
To: Julio Merino <julio@meroh.net>
Cc: gnats-bugs@netbsd.org, christos@NetBSD.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Sun, 14 Oct 2012 21:26:05 +0200
On Sat, Oct 13, 2012 at 09:31:30PM -0400, Julio Merino wrote:
> Meh, it was non-obvious!
>
> Seems like the problem was introduced on October 2nd between 01:40 and
> 01:50 GMT. The modified kernel files were:
The addition of clock_nanosleep(2)?
Very strange, christos?
Martin
From: christos@zoulas.com (Christos Zoulas)
To: Martin Husemann <martin@duskware.de>, Julio Merino <julio@meroh.net>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Sun, 14 Oct 2012 15:44:02 -0400
On Oct 14, 9:26pm, martin@duskware.de (Martin Husemann) wrote:
-- Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
| On Sat, Oct 13, 2012 at 09:31:30PM -0400, Julio Merino wrote:
| > Meh, it was non-obvious!
| >
| > Seems like the problem was introduced on October 2nd between 01:40 and
| > 01:50 GMT. The modified kernel files were:
|
| The addition of clock_nanosleep(2)?
| Very strange, christos?
I tested it. Could it be the general mutex lossage on powerpc?
Does the posix_spawn test fail on other platforms?
christos
From: Martin Husemann <martin@duskware.de>
To: Christos Zoulas <christos@zoulas.com>
Cc: Julio Merino <julio@meroh.net>, gnats-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Sun, 14 Oct 2012 21:47:00 +0200
On Sun, Oct 14, 2012 at 03:44:02PM -0400, Christos Zoulas wrote:
> I tested it. Could it be the general mutex lossage on powerpc?
> Does the posix_spawn test fail on other platforms?
Not on any I can currently easily test.
Martin
From: christos@zoulas.com (Christos Zoulas)
To: Martin Husemann <martin@duskware.de>
Cc: Julio Merino <julio@meroh.net>, gnats-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Sun, 14 Oct 2012 16:27:36 -0400
On Oct 14, 9:47pm, martin@duskware.de (Martin Husemann) wrote:
-- Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
| On Sun, Oct 14, 2012 at 03:44:02PM -0400, Christos Zoulas wrote:
| > I tested it. Could it be the general mutex lossage on powerpc?
| > Does the posix_spawn test fail on other platforms?
|
| Not on any I can currently easily test.
Try defining FULL in kern_mutex.c for powerpc.
christos
From: Julio Merino <julio@meroh.net>
To: gnats-bugs@netbsd.org
Cc: martin@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Wed, 17 Oct 2012 20:16:33 -0400
On Sun, Oct 14, 2012 at 4:30 PM, Christos Zoulas <christos@zoulas.com> wrote:
> Try defining FULL in kern_mutex.c for powerpc.
Nope, nothing. I'm assuming that you meant to just "#define FULL" in
kern_mutex.c? I also tried enabling LOCKDEBUG and DIAGNOSTIC.
However, I noticed one more data point: when the system locks up, one
of the t_spawnattr processes is in RUN state and the other in STOP
state. Both show 0K of memory in the SIZE and RES columns of top.
--
Julio Merino / @jmmv
From: christos@zoulas.com (Christos Zoulas)
To: Julio Merino <julio@meroh.net>, gnats-bugs@netbsd.org
Cc: martin@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Wed, 17 Oct 2012 20:21:52 -0400
On Oct 17, 8:16pm, julio@meroh.net (Julio Merino) wrote:
-- Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
| On Sun, Oct 14, 2012 at 4:30 PM, Christos Zoulas <christos@zoulas.com> wrote:
| > Try defining FULL in kern_mutex.c for powerpc.
|
| Nope, nothing. I'm assuming that you meant to just "#define FULL" in
| kern_mutex.c? I also tried enabling LOCKDEBUG and DIAGNOSTIC.
Yes.
| However, I noticed one more data point: when the system locks up, one
| of the t_spawnattr processes is in RUN state and the other in STOP
| state. Both show 0K of memory in the SIZE and RES columns of top.
Fine, but the whole problem could be a mutex issue. Let's try this
one first.
christos
From: Julio Merino <julio@meroh.net>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@netbsd.org, martin@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Thu, 18 Oct 2012 18:29:30 -0400
On Wed, Oct 17, 2012 at 8:21 PM, Christos Zoulas <christos@zoulas.com> wrote:
> On Oct 17, 8:16pm, julio@meroh.net (Julio Merino) wrote:
> -- Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
>
> | On Sun, Oct 14, 2012 at 4:30 PM, Christos Zoulas <christos@zoulas.com> wrote:
> | > Try defining FULL in kern_mutex.c for powerpc.
> |
> | Nope, nothing. I'm assuming that you meant to just "#define FULL" in
> | kern_mutex.c? I also tried enabling LOCKDEBUG and DIAGNOSTIC.
>
> Yes.
>
> | However, I noticed one more data point: when the system locks up, one
> | of the t_spawnattr processes is in RUN state and the other in STOP
> | state. Both show 0K of memory in the SIZE and RES columns of top.
>
> Fine, but the whole problem could be a mutex issue. Let's try this
> one first.
By "this one" you meant defining FULL, LOCKDEBUG or DIAGNOSTIC? I
already tried them all and none provided any further details :-/
--
Julio Merino / @jmmv
From: christos@zoulas.com (Christos Zoulas)
To: Julio Merino <julio@meroh.net>
Cc: gnats-bugs@netbsd.org, martin@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
Date: Thu, 18 Oct 2012 20:00:33 -0400
On Oct 18, 6:29pm, julio@meroh.net (Julio Merino) wrote:
-- Subject: Re: kern/47069: posix_spawn/t_spawnattr locks up the system
| By "this one" you meant defining FULL, LOCKDEBUG or DIAGNOSTIC? I
| already tried them all and none provided any further details :-/
FULL is not going to produce any output. It would hopefully fix the problem.
If not, there is something else going on.
christos
>Unformatted: