NetBSD Problem Report #46147
From wiz@danbala.tuwien.ac.at Tue Mar 6 09:01:16 2012
Return-Path: <wiz@danbala.tuwien.ac.at>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
by www.NetBSD.org (Postfix) with ESMTP id BA1A863C426
for <gnats-bugs@gnats.NetBSD.org>; Tue, 6 Mar 2012 09:01:16 +0000 (UTC)
Message-Id: <20120306090120.2335B392184@danbala.tuwien.ac.at>
Date: Tue, 6 Mar 2012 10:01:20 +0100 (CET)
From: Thomas Klausner <wiz@NetBSD.org>
Reply-To: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@gnats.NetBSD.org
Subject: mono problem (pthread change result?)
X-Send-Pr-Version: 3.95
>Number: 46147
>Category: lib
>Synopsis: mono problem (pthread change result?)
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: joerg
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Mar 06 09:05:00 +0000 2012
>Closed-Date: Fri Mar 16 17:21:53 +0000 2012
>Last-Modified: Fri Mar 16 17:21:53 +0000 2012
>Originator: Thomas Klausner
>Release: NetBSD 6.99.3
>Organization:
>Environment:
Architecture: x86_64
Machine: amd64
>Description:
On 6.99.3/amd64, I have trouble with a few mono packages that built
fine on 5.99.64.
I've tried rebuilding mono, and this also fails now.
With CFLAGS="-g -O0", I get the following build failure, and the backtrace below it for the resulting core dump:
gmake[5]: Entering directory `/scratch/lang/mono/work/mono-2.10.6/mcs'
/bin/sh .//mkinstalldirs build/deps
mkdir -p -- build/deps
touch build/deps/.stamp
gmake[6]: Entering directory `/scratch/lang/mono/work/mono-2.10.6/mcs'
gmake[6]: gmcs: Command not found
gmake[6]: *** [build/deps/basic-profile-check.exe] Error 127
gmake[6]: Leaving directory `/scratch/lang/mono/work/mono-2.10.6/mcs'
gmake[6]: Entering directory `/scratch/lang/mono/work/mono-2.10.6/mcs'
*** The compiler 'gmcs' doesn't appear to be usable.
*** Trying the 'monolite' directory.
gmake[7]: Entering directory `/scratch/lang/mono/work/mono-2.10.6/mcs'
gmake[8]: Entering directory `/scratch/lang/mono/work/mono-2.10.6/mcs'
[1] Abort trap (core dumped) MONO_PATH=".//cl...
gmake[8]: *** [build/deps/basic-profile-check.exe] Error 134
gmake[8]: Leaving directory `/scratch/lang/mono/work/mono-2.10.6/mcs'
..
# gdb ../mono/mini/mono mono.core
GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /scratch/lang/mono/work/mono-2.10.6/mono/mini/mono...done.
[New process 1]
[New process 8]
[New process 7]
[New process 6]
[New process 5]
[New process 4]
[New process 3]
[New process 2]
Core was generated by `mono'.
Program terminated with signal 6, Aborted.
#0 0x00007f7ff70ed9da in _lwp_kill () from /usr/lib/libc.so.12
(gdb) bt
#0 0x00007f7ff70ed9da in _lwp_kill () from /usr/lib/libc.so.12
#1 0x00007f7ff70ed312 in abort () at /archive/cvs/src/lib/libc/stdlib/abort.c:74
#2 0x00000000004d7de5 in mono_handle_native_sigsegv (signal=11, ctx=0x7f7fffffb3e0) at mini-exceptions.c:2245
#3 0x0000000000420685 in mono_sigsegv_signal_handler (_dummy=11, info=0x7f7fffffb360, context=0x7f7fffffb3e0) at mini.c:5848
#4 <signal handler called>
#5 GC_push_all_eager (bottom=0x7f7fffffb7e8 "�\377\377\177\177", top=0x7f8008000000 <Address 0x7f8008000000 out of bounds>) at mark.c:1468
#6 0x00000000006b3fa8 in GC_push_all_stack (bottom=0x7f7fffffb7e8 "�\377\377\177\177", top=0x7f8008000000 <Address 0x7f8008000000 out of bounds>) at mark.c:1521
#7 0x00000000006bbecd in pthread_push_all_stacks () at pthread_stop_world.c:297
#8 0x00000000006bbf49 in GC_push_all_stacks () at pthread_stop_world.c:332
#9 0x00000000006b71d2 in GC_default_push_other_roots () at os_dep.c:2255
#10 0x00000000006b53ac in GC_push_roots (all=1, cold_gc_frame=0x7f7fffffb8e4 "\177\177") at mark_rts.c:646
#11 0x00000000006b1dd7 in GC_mark_some (cold_gc_frame=0x7f7fffffb8e4 "\177\177") at mark.c:326
#12 0x00000000006abe0a in GC_stopped_mark (stop_func=0x6ab387 <GC_never_stop_func>) at alloc.c:543
#13 0x00000000006ab9eb in GC_try_to_collect_inner (stop_func=0x6ab387 <GC_never_stop_func>) at alloc.c:382
#14 0x00000000006b5d6e in GC_init_inner () at misc.c:807
#15 0x00000000006b596b in GC_init () at misc.c:517
#16 0x0000000000574bbc in mono_gc_base_init () at boehm-gc.c:126
#17 0x0000000000598d30 in mono_init_internal (filename=0x7f7fffffe428 ".//class/lib/monolite/mcs.exe", exe_filename=0x7f7fffffe428 ".//class/lib/monolite/mcs.exe", runtime_version=0x0) at domain.c:1286
#18 0x000000000059a0a1 in mono_init_from_assembly (domain_name=0x7f7fffffe428 ".//class/lib/monolite/mcs.exe", filename=0x7f7fffffe428 ".//class/lib/monolite/mcs.exe") at domain.c:1671
#19 0x000000000042140a in mini_init (filename=0x7f7fffffe428 ".//class/lib/monolite/mcs.exe", runtime_version=0x0) at mini.c:6321
#20 0x00000000004ad44b in mono_main (argc=7, argv=0x7f7fffffbdb8) at driver.c:1746
#21 0x0000000000412d8e in mono_main_with_options (argc=7, argv=0x7f7fffffbdb8) at main.c:66
#22 0x0000000000412dbe in main (argc=7, argv=0x7f7fffffbdb8) at main.c:97
Christos suggested:
pthread stack creation was changed in current.
I think joerg would be interested in looking at it.
Also, I need to limit firefox's stack size to ~300 (from the default 4096) to make it start.
Might this be caused by the same change?
The packages used as dependencies were built in a pbulk update build, mixed with packages
built around Feb 21.
For firefox, I've rebuilt xulrunner and firefox without a change.
>How-To-Repeat:
cd /usr/pkgsrc/lang/mono
make
>Fix:
>Release-Note:
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: lib/46147: mono problem (pthread change result?)
Date: Tue, 6 Mar 2012 11:13:02 +0100
We don't have any ATF tests exercising pthread_attr_getstack() - seems like
it is time to add some...
Martin
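
A minimal sketch of what such a check could exercise, written as plain C
rather than against the ATF API. The names check_getstack_roundtrip(),
probe() and struct range are made up for illustration, and the 1 MiB stack
size is an arbitrary assumption. It sets an explicit stack on an attribute
object, reads it back with pthread_attr_getstack(), and then runs a thread
on that stack to confirm that a local variable really lives inside it.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper type: the expected stack range and the probe result. */
struct range {
	uintptr_t lo, hi;
	int inside;
};

/* Thread body: record whether a local variable lies inside [lo, hi). */
static void *
probe(void *arg)
{
	struct range *r = arg;
	char marker;
	uintptr_t p = (uintptr_t)&marker;

	r->inside = (p >= r->lo && p < r->hi);
	return NULL;
}

/* Returns 0 if the stack reads back unchanged and the thread runs on it. */
static int
check_getstack_roundtrip(void)
{
	const size_t size = 1024 * 1024;	/* assumed: 1 MiB is large enough */
	long page = sysconf(_SC_PAGESIZE);
	void *stack = NULL, *got_addr = NULL;
	size_t got_size = 0;
	pthread_attr_t attr;
	pthread_t t;
	struct range r;
	int ok = 0;

	if (page <= 0 || posix_memalign(&stack, (size_t)page, size) != 0)
		return -1;
	if (pthread_attr_init(&attr) != 0) {
		free(stack);
		return -1;
	}

	if (pthread_attr_setstack(&attr, stack, size) == 0 &&
	    pthread_attr_getstack(&attr, &got_addr, &got_size) == 0 &&
	    got_addr == stack && got_size == size) {
		r.lo = (uintptr_t)stack;
		r.hi = r.lo + size;
		r.inside = 0;
		if (pthread_create(&t, &attr, probe, &r) == 0 &&
		    pthread_join(t, NULL) == 0)
			ok = r.inside;
	}

	pthread_attr_destroy(&attr);
	free(stack);	/* safe only because the thread has been joined */
	return ok ? 0 : -1;
}
```

This only covers a stack supplied by the caller. A real test would also
want to cover threads created with default attributes, which is the case
that matters for this PR.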
From: Thomas Klausner <wiz@NetBSD.org>
To: NetBSD bugtracking <gnats-bugs@NetBSD.org>
Cc: Joerg Sonnenberger <joerg@britannica.bec.de>
Subject: Re: lib/46147: mono problem (pthread change result?)
Date: Tue, 6 Mar 2012 13:03:40 +0100
drochner writes on current-users:
> Yes, I'm getting this now after I did a kernel+userland update.
> (Didn't touch any package.)
>
> Rolled back libpthread to Mon Feb 27, now it works again.
> So there is a regression in libpthread.
This reduces the window by a few days.
Thomas
Responsible-Changed-From-To: lib-bug-people->joerg
Responsible-Changed-By: joerg@NetBSD.org
Responsible-Changed-When: Tue, 06 Mar 2012 13:40:56 +0000
Responsible-Changed-Why:
Investigating what is going wrong
From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Tue, 6 Mar 2012 20:19:52 +0100
If this code ever really worked, it is more by chance than by design.
E.g. it doesn't even use the defined interface for finding the thread
stack nor does it set one explicitly.
Joerg
From: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Tue, 6 Mar 2012 22:37:32 +0100
On Tue, Mar 06, 2012 at 07:20:04PM +0000, Joerg Sonnenberger wrote:
> If this code ever really worked, it is more by chance than by design.
> E.g. it doesn't even use the defined interface for finding the thread
> stack nor does it set one explicitly.
You're talking about mono or firefox here?
Thomas
From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Tue, 6 Mar 2012 16:55:02 -0500
On Tue, 6 Mar 2012 21:40:04 +0000 (UTC)
Thomas Klausner <wiz@NetBSD.org> wrote:
> The following reply was made to PR lib/46147; it has been noted by GNATS.
>
> From: Thomas Klausner <wiz@NetBSD.org>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: lib/46147 (mono problem (pthread change result?))
> Date: Tue, 6 Mar 2012 22:37:32 +0100
>
> On Tue, Mar 06, 2012 at 07:20:04PM +0000, Joerg Sonnenberger wrote:
> > If this code ever really worked, it is more by chance than by design.
> > E.g. it doesn't even use the defined interface for finding the thread
> > stack nor does it set one explicitly.
>
> You're talking about mono or firefox here?
I also wondered: mono, or boehm-gc? :)
Thanks,
--
Matt
From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Tue, 6 Mar 2012 23:11:35 +0100
On Tue, Mar 06, 2012 at 09:40:04PM +0000, Thomas Klausner wrote:
> The following reply was made to PR lib/46147; it has been noted by GNATS.
>
> From: Thomas Klausner <wiz@NetBSD.org>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: lib/46147 (mono problem (pthread change result?))
> Date: Tue, 6 Mar 2012 22:37:32 +0100
>
> On Tue, Mar 06, 2012 at 07:20:04PM +0000, Joerg Sonnenberger wrote:
> > If this code ever really worked, it is more by chance than by design.
> > E.g. it doesn't even use the defined interface for finding the thread
> > stack nor does it set one explicitly.
>
> You're talking about mono or firefox here?
boehm-gc in mono.
Joerg
State-Changed-From-To: open->feedback
State-Changed-By: joerg@NetBSD.org
State-Changed-When: Thu, 08 Mar 2012 23:50:53 +0000
State-Changed-Why:
According to my testing, this works now. Comments about boehm-gc only
working by accident still apply though.
From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Thu, 8 Mar 2012 20:15:24 -0500
On Thu, 8 Mar 2012 23:50:54 +0000 (UTC)
joerg@NetBSD.org wrote:
> Synopsis: mono problem (pthread change result?)
>
> State-Changed-From-To: open->feedback
> State-Changed-By: joerg@NetBSD.org
> State-Changed-When: Thu, 08 Mar 2012 23:50:53 +0000
> State-Changed-Why:
> According to my testing, this works now. Comments about boehm-gc only
> working by accident still apply though.
Do you think that anything using boehm-gc with threads currently faces
the same problem as well? I use ECL with it, built with threads
enabled, and although it works pretty well, it's not 100% stable when I
heavily test multithreaded applications (and that on both Linux and
NetBSD). There has been some work to reimplement ECL's locking system
too though, and that's not considered stable either, but I use a custom
implementation here which is more straightforward.
But I'd very much like to know if boehm-gc itself is fundamentally
broken in that area.
Thanks,
--
Matt
From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Fri, 9 Mar 2012 02:47:57 +0100
On Fri, Mar 09, 2012 at 01:20:04AM +0000, Matthew Mondor wrote:
> > According to my testing, this works now. Comments about boehm-gc only
> > working by accident still apply though.
>
> Do you think that anything using boehm-gc with threads currently faces
> the same problem as well? I use ECL with it, built with threads
> enabled, and although it works pretty well, it's not 100% stable when I
> heavily test multithreaded applications (and that on both Linux and
> NetBSD). There has been some work to reimplement ECL's locking system
> too though, and that's not considered stable either, but I use a custom
> implementation here which is more straightforward.
>
> But I'd very much like to know if boehm-gc itself is fundamentally
> broken in that area.
boehm-gc needs to find all references to objects on the thread stacks.
For that, it should use pthread_attr_get_np and pthread_attr_getstack on
NetBSD. These provide the bottom of the stack and its size; combined
with __builtin_frame_address(0), that precisely identifies the stack
content. At the moment it does some guessing based on the stack
size, assuming an alignment of the stack that isn't guaranteed by the
system.
Joerg
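
A rough sketch of the approach described above, not code taken from
boehm-gc or mono. The names current_stack_bounds() and struct stack_bounds
are hypothetical. It assumes a downward-growing stack, and it assumes that
pthread_getattr_np() is an acceptable stand-in on systems other than NetBSD
(for example glibc).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* exposes pthread_getattr_np() on glibc */
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical result type: the stack mapping and the part in use. */
struct stack_bounds {
	char *base;	/* lowest address of the thread's stack */
	char *top;	/* one past the highest address (base + size) */
	char *cur;	/* frame address inside this helper */
};

/*
 * Fill *out for the calling thread. Returns 0 or an error number.
 * With a downward-growing stack, the live region is [cur, top).
 */
static int
current_stack_bounds(struct stack_bounds *out)
{
	pthread_attr_t attr;
	void *addr = NULL;
	size_t size = 0;
	int rc;

#if defined(__NetBSD__)
	/* pthread_attr_get_np() fills an already initialized attribute. */
	if ((rc = pthread_attr_init(&attr)) != 0)
		return rc;
	if ((rc = pthread_attr_get_np(pthread_self(), &attr)) != 0) {
		pthread_attr_destroy(&attr);
		return rc;
	}
#else
	/* Assumed stand-in elsewhere; it initializes attr itself. */
	if ((rc = pthread_getattr_np(pthread_self(), &attr)) != 0)
		return rc;
#endif
	rc = pthread_attr_getstack(&attr, &addr, &size);
	pthread_attr_destroy(&attr);
	if (rc != 0)
		return rc;

	out->base = addr;
	out->top = (char *)addr + size;
	out->cur = __builtin_frame_address(0);

	/* Sanity check: the current frame must lie inside the reported stack. */
	assert(out->base <= out->cur && out->cur < out->top);
	return 0;
}
```

A collector would have each thread record these bounds for itself and then
scan [cur, top) for that thread, instead of deriving the range from
alignment guesses.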
From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Thu, 8 Mar 2012 20:55:40 -0500
On Fri, 9 Mar 2012 01:50:04 +0000 (UTC)
Joerg Sonnenberger <joerg@britannica.bec.de> wrote:
> The following reply was made to PR lib/46147; it has been noted by GNATS.
>
> From: Joerg Sonnenberger <joerg@britannica.bec.de>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: lib/46147 (mono problem (pthread change result?))
> Date: Fri, 9 Mar 2012 02:47:57 +0100
>
> On Fri, Mar 09, 2012 at 01:20:04AM +0000, Matthew Mondor wrote:
> > > According to my testing, this works now. Comments about boehm-gc only
> > > working by accident still apply though.
> >
> > Do you think that anything using boehm-gc with threads currently faces
> > the same problem as well? I use ECL with it, built with threads
> > enabled, and although it works pretty well, it's not 100% stable when I
> > heavily test multithreaded applications (and that on both Linux and
> > NetBSD). There has been some work to reimplement ECL's locking system
> > too though, and that's not considered stable either, but I use a custom
> > implementation here which is more straightforward.
> >
> > But I'd very much like to know if boehm-gc itself is fundamentally
> > broken in that area.
>
> boehm-gc needs to find all references to objects on the thread stacks.
> For that, it should use pthread_attr_get_np and pthread_attr_getstack on
> NetBSD. This provides the bottom of the stack and the size, in combination
> with __builtin_frame_address(0), that can precisely identify the stack
> content. At the moment it does some guessing based on the stack
> size, assuming alignment of the stack that isn't guaranteed by the
> system.
Thanks for the details,
--
Matt
From: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Sat, 10 Mar 2012 14:39:30 +0100
On Thu, Mar 08, 2012 at 11:50:54PM +0000, joerg@NetBSD.org wrote:
> According to my testing, this works now. Comments about boehm-gc only
> working by accident still apply though.
Thank you!
Firefox works fine now.
mono hangs sometimes, but it's not segfaulting immediately like before, and I had hangs before.
I can't tell yet if their occurrence has increased or not.
So we can probably close this PR.
However, see PR 46165 for further issues that might be caused by the same changes.
Thomas
From: Thomas Klausner <wiz@NetBSD.org>
To: NetBSD bugtracking <gnats-bugs@NetBSD.org>
Cc:
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Sun, 11 Mar 2012 14:05:51 +0100
On Sat, Mar 10, 2012 at 01:40:07PM +0000, Thomas Klausner wrote:
> mono hangs sometimes, but it's not segfaulting immediately like before, and I had hangs before.
> I can't tell yet if their occurrence has increased or not.
In my latest pbulk, mono built successfully, however gtk-sharp didn't:
Making all in generator
gmake[2]: Entering directory `/scratch/x11/gtk-sharp/work/gtk-sharp-2.12.10/generator'
/usr/pkg/bin/mcs /out:gapi_codegen.exe -define:OFF_T_8 ./AliasGen.cs ./BoxedGen.cs ./ByRefGen.cs ./CallbackGen.cs ./ChildProperty.cs ./ClassBase.cs ./ClassGen.cs ./CodeGenerator.cs ./ConstFilenameGen.cs ./ConstStringGen.cs ./Ctor.cs ./EnumGen.cs ./FieldBase.cs ./GenBase.cs ./GenerationInfo.cs ./HandleBase.cs ./IAccessor.cs ./IGeneratable.cs ./IManualMarshaler.cs ./InterfaceGen.cs ./LPGen.cs ./LPUGen.cs ./ManagedCallString.cs ./ManualGen.cs ./MarshalGen.cs ./MethodBase.cs ./MethodBody.cs ./Method.cs ./ObjectField.cs ./ObjectBase.cs ./ObjectGen.cs ./OpaqueGen.cs ./Parameters.cs ./Parser.cs ./Property.cs ./PropertyBase.cs ./ReturnValue.cs ./Signal.cs ./Signature.cs ./SimpleBase.cs ./SimpleGen.cs ./Statistics.cs ./StructBase.cs ./StructField.cs ./StructGen.cs ./SymbolTable.cs ./VirtualMethod.cs ./VMSignature.cs
Stacktrace:
gmake[2]: *** [gapi_codegen.exe] Abort trap (core dumped)
gmake[2]: Leaving directory `/scratch/x11/gtk-sharp/work/gtk-sharp-2.12.10/generator'
gmake[1]: *** [all-recursive] Error 1
So something's still not completely right (I didn't see this error before March).
Thomas
From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Sun, 11 Mar 2012 21:04:10 +0100
On Sun, Mar 11, 2012 at 01:10:05PM +0000, Thomas Klausner wrote:
> In my latest pbulk, mono built successfully, however gtk-sharp didn't:
Works for me. But as I said, the boehm-gc code makes broken assumptions
about the stack, so it can easily fail due to environmental differences.
Joerg
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: lib/46147
Date: Wed, 14 Mar 2012 15:34:54 +0000
This looks like pkg/45828 - any comments on the i386 case?
State-Changed-From-To: feedback->closed
State-Changed-By: wiz@NetBSD.org
State-Changed-When: Fri, 16 Mar 2012 17:21:53 +0000
State-Changed-Why:
In my latest build, gtk-sharp built; mono-addins hung. Retrying it worked.
So this looks like the trouble I had with mono before your changes.
Thank you!
>Unformatted: