NetBSD Problem Report #46147

From wiz@danbala.tuwien.ac.at  Tue Mar  6 09:01:16 2012
Return-Path: <wiz@danbala.tuwien.ac.at>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id BA1A863C426
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  6 Mar 2012 09:01:16 +0000 (UTC)
Message-Id: <20120306090120.2335B392184@danbala.tuwien.ac.at>
Date: Tue,  6 Mar 2012 10:01:20 +0100 (CET)
From: Thomas Klausner <wiz@NetBSD.org>
Reply-To: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@gnats.NetBSD.org
Subject: mono problem (pthread change result?)
X-Send-Pr-Version: 3.95

>Number:         46147
>Category:       lib
>Synopsis:       mono problem (pthread change result?)
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    joerg
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Mar 06 09:05:00 +0000 2012
>Closed-Date:    Fri Mar 16 17:21:53 +0000 2012
>Last-Modified:  Fri Mar 16 17:21:53 +0000 2012
>Originator:     Thomas Klausner
>Release:        NetBSD 6.99.3
>Organization:

>Environment:


Architecture: x86_64
Machine: amd64
>Description:

On 6.99.3/amd64, I have trouble with a few mono packages that built
fine on 5.99.64.

I've tried rebuilding mono, and this also fails now.
With CFLAGS="-g -O0", I get the following build failure, and this backtrace from the resulting core dump:
gmake[5]: Entering directory `/scratch/lang/mono/work/mono-2.10.6/mcs'
/bin/sh .//mkinstalldirs build/deps
mkdir -p -- build/deps
touch build/deps/.stamp
gmake[6]: Entering directory `/scratch/lang/mono/work/mono-2.10.6/mcs'
gmake[6]: gmcs: Command not found
gmake[6]: *** [build/deps/basic-profile-check.exe] Error 127
gmake[6]: Leaving directory `/scratch/lang/mono/work/mono-2.10.6/mcs'
gmake[6]: Entering directory `/scratch/lang/mono/work/mono-2.10.6/mcs'
*** The compiler 'gmcs' doesn't appear to be usable.
*** Trying the 'monolite' directory.
gmake[7]: Entering directory `/scratch/lang/mono/work/mono-2.10.6/mcs'
gmake[8]: Entering directory `/scratch/lang/mono/work/mono-2.10.6/mcs'
[1]   Abort trap (core dumped) MONO_PATH=".//cl...
gmake[8]: *** [build/deps/basic-profile-check.exe] Error 134
gmake[8]: Leaving directory `/scratch/lang/mono/work/mono-2.10.6/mcs'
..
# gdb ../mono/mini/mono mono.core
GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /scratch/lang/mono/work/mono-2.10.6/mono/mini/mono...done.
[New process 1]
[New process 8]
[New process 7]
[New process 6]
[New process 5]
[New process 4]
[New process 3]
[New process 2]
Core was generated by `mono'.
Program terminated with signal 6, Aborted.
#0  0x00007f7ff70ed9da in _lwp_kill () from /usr/lib/libc.so.12
(gdb) bt
#0  0x00007f7ff70ed9da in _lwp_kill () from /usr/lib/libc.so.12
#1  0x00007f7ff70ed312 in abort () at /archive/cvs/src/lib/libc/stdlib/abort.c:74
#2  0x00000000004d7de5 in mono_handle_native_sigsegv (signal=11, ctx=0x7f7fffffb3e0) at mini-exceptions.c:2245
#3  0x0000000000420685 in mono_sigsegv_signal_handler (_dummy=11, info=0x7f7fffffb360, context=0x7f7fffffb3e0) at mini.c:5848
#4  <signal handler called>
#5  GC_push_all_eager (bottom=0x7f7fffffb7e8 "�\377\377\177\177", top=0x7f8008000000 <Address 0x7f8008000000 out of bounds>) at mark.c:1468
#6  0x00000000006b3fa8 in GC_push_all_stack (bottom=0x7f7fffffb7e8 "�\377\377\177\177", top=0x7f8008000000 <Address 0x7f8008000000 out of bounds>) at mark.c:1521
#7  0x00000000006bbecd in pthread_push_all_stacks () at pthread_stop_world.c:297
#8  0x00000000006bbf49 in GC_push_all_stacks () at pthread_stop_world.c:332
#9  0x00000000006b71d2 in GC_default_push_other_roots () at os_dep.c:2255
#10 0x00000000006b53ac in GC_push_roots (all=1, cold_gc_frame=0x7f7fffffb8e4 "\177\177") at mark_rts.c:646
#11 0x00000000006b1dd7 in GC_mark_some (cold_gc_frame=0x7f7fffffb8e4 "\177\177") at mark.c:326
#12 0x00000000006abe0a in GC_stopped_mark (stop_func=0x6ab387 <GC_never_stop_func>) at alloc.c:543
#13 0x00000000006ab9eb in GC_try_to_collect_inner (stop_func=0x6ab387 <GC_never_stop_func>) at alloc.c:382
#14 0x00000000006b5d6e in GC_init_inner () at misc.c:807
#15 0x00000000006b596b in GC_init () at misc.c:517
#16 0x0000000000574bbc in mono_gc_base_init () at boehm-gc.c:126
#17 0x0000000000598d30 in mono_init_internal (filename=0x7f7fffffe428 ".//class/lib/monolite/mcs.exe", exe_filename=0x7f7fffffe428 ".//class/lib/monolite/mcs.exe", runtime_version=0x0) at domain.c:1286
#18 0x000000000059a0a1 in mono_init_from_assembly (domain_name=0x7f7fffffe428 ".//class/lib/monolite/mcs.exe", filename=0x7f7fffffe428 ".//class/lib/monolite/mcs.exe") at domain.c:1671
#19 0x000000000042140a in mini_init (filename=0x7f7fffffe428 ".//class/lib/monolite/mcs.exe", runtime_version=0x0) at mini.c:6321
#20 0x00000000004ad44b in mono_main (argc=7, argv=0x7f7fffffbdb8) at driver.c:1746
#21 0x0000000000412d8e in mono_main_with_options (argc=7, argv=0x7f7fffffbdb8) at main.c:66
#22 0x0000000000412dbe in main (argc=7, argv=0x7f7fffffbdb8) at main.c:97


Christos suggested:
pthread stack creation was changed in current.
I think joerg would be interested in looking at it.


Also, I need to limit firefox's stack size to ~300 (from the default 4096) to make it start.
Might this be caused by the same change?

The packages used as dependencies were built in a pbulk update build, mixed with packages
built around Feb 21.

For firefox, I've rebuilt xulrunner and firefox without a change.
>How-To-Repeat:
cd /usr/pkgsrc/lang/mono
make
>Fix:


>Release-Note:

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: lib/46147: mono problem (pthread change result?)
Date: Tue, 6 Mar 2012 11:13:02 +0100

 We don't have any ATF tests exercising pthread_attr_getstack() - seems like
 it is time to add some...

 Martin

From: Thomas Klausner <wiz@NetBSD.org>
To: NetBSD bugtracking <gnats-bugs@NetBSD.org>
Cc: Joerg Sonnenberger <joerg@britannica.bec.de>
Subject: Re: lib/46147: mono problem (pthread change result?)
Date: Tue, 6 Mar 2012 13:03:40 +0100

 drochner writes on current-users:
 > Yes, I'm getting this now after I did a kernel+userland update.
 > (Didn't touch any package.)
 > 
 > Rolled back libpthread to Mon Feb 27, now it works again.
 > So there is a regression in libpthread.

 This reduces the window by a few days.
  Thomas

Responsible-Changed-From-To: lib-bug-people->joerg
Responsible-Changed-By: joerg@NetBSD.org
Responsible-Changed-When: Tue, 06 Mar 2012 13:40:56 +0000
Responsible-Changed-Why:
Investigating what is going wrong


From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Tue, 6 Mar 2012 20:19:52 +0100

 If this code ever really worked, it is more by chance than by design.
 E.g. it doesn't even use the defined interface for finding the thread
 stack nor does it set one explicitly.

 Joerg

From: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Tue, 6 Mar 2012 22:37:32 +0100

 On Tue, Mar 06, 2012 at 07:20:04PM +0000, Joerg Sonnenberger wrote:
 >  If this code ever really worked, it is more by chance than by design.
 >  E.g. it doesn't even use the defined interface for finding the thread
 >  stack nor does it set one explicitly.

 You're talking about mono or firefox here?
  Thomas

From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Tue, 6 Mar 2012 16:55:02 -0500

 On Tue,  6 Mar 2012 21:40:04 +0000 (UTC)
 Thomas Klausner <wiz@NetBSD.org> wrote:

 > The following reply was made to PR lib/46147; it has been noted by GNATS.
 > 
 > From: Thomas Klausner <wiz@NetBSD.org>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: lib/46147 (mono problem (pthread change result?))
 > Date: Tue, 6 Mar 2012 22:37:32 +0100
 > 
 >  On Tue, Mar 06, 2012 at 07:20:04PM +0000, Joerg Sonnenberger wrote:
 >  >  If this code ever really worked, it is more by chance than by design.
 >  >  E.g. it doesn't even use the defined interface for finding the thread
 >  >  stack nor does it set one explicitly.
 >  
 >  You're talking about mono or firefox here?

 I also wondered: mono, or boehm-gc? :)

 Thanks,
 -- 
 Matt

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Tue, 6 Mar 2012 23:11:35 +0100

 On Tue, Mar 06, 2012 at 09:40:04PM +0000, Thomas Klausner wrote:
 > The following reply was made to PR lib/46147; it has been noted by GNATS.
 > 
 > From: Thomas Klausner <wiz@NetBSD.org>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: lib/46147 (mono problem (pthread change result?))
 > Date: Tue, 6 Mar 2012 22:37:32 +0100
 > 
 >  On Tue, Mar 06, 2012 at 07:20:04PM +0000, Joerg Sonnenberger wrote:
 >  >  If this code ever really worked, it is more by chance than by design.
 >  >  E.g. it doesn't even use the defined interface for finding the thread
 >  >  stack nor does it set one explicitly.
 >  
 >  You're talking about mono or firefox here?

 boehm-gc in mono.

 Joerg

State-Changed-From-To: open->feedback
State-Changed-By: joerg@NetBSD.org
State-Changed-When: Thu, 08 Mar 2012 23:50:53 +0000
State-Changed-Why:
According to my testing, this works now. Comments about boehm-gc only
working by accident still apply though.


From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Thu, 8 Mar 2012 20:15:24 -0500

 On Thu,  8 Mar 2012 23:50:54 +0000 (UTC)
 joerg@NetBSD.org wrote:

 > Synopsis: mono problem (pthread change result?)
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: joerg@NetBSD.org
 > State-Changed-When: Thu, 08 Mar 2012 23:50:53 +0000
 > State-Changed-Why:
 > According to my testing, this works now. Comments about boehm-gc only
 > working by accident still apply though.

 Do you think that anything using boehm-gc with threads currently faces
 the same problem as well?  I use ECL with it, built with threads
 enabled, and although it works pretty well, it's not 100% stable when I
 heavily test multithreaded applications (and that on both Linux and
 NetBSD).  There has been some work to reimplement ECL's locking system
 too though, and that's not considered stable either, but I use a custom
 implementation here which is more straightforward.

 But I'd very much like to know if boehm-gc itself is fundamentally
 broken in that area.

 Thanks,
 -- 
 Matt

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Fri, 9 Mar 2012 02:47:57 +0100

 On Fri, Mar 09, 2012 at 01:20:04AM +0000, Matthew Mondor wrote:
 >  > According to my testing, this works now. Comments about boehm-gc only
 >  > working by accident still apply though.
 >  
 >  Do you think that anything using boehm-gc with threads currently faces
 >  the same problem as well?  I use ECL with it, built with threads
 >  enabled, and although it works pretty well, it's not 100% stable when I
 >  heavily test multithreaded applications (and that on both Linux and
 >  NetBSD).  There has been some work to reimplement ECL's locking system
 >  too though, and that's not considered stable either, but I use a custom
 >  implementation here which is more straightforward.
 >  
 >  But I'd very much like to know if boehm-gc itself is fundamentally
 >  broken in that area.

 boehm-gc needs to find all references to objects on the thread stacks.
 For that, it should use pthread_attr_get_np and pthread_attr_getstack on
 NetBSD. These provide the bottom of the stack and its size; in combination
 with __builtin_frame_address(0), that precisely identifies the stack
 content. At the moment it does some guessing based on the stack
 size, assuming an alignment of the stack that isn't guaranteed by the
 system.

 Joerg

From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Thu, 8 Mar 2012 20:55:40 -0500

 On Fri,  9 Mar 2012 01:50:04 +0000 (UTC)
 Joerg Sonnenberger <joerg@britannica.bec.de> wrote:

 > The following reply was made to PR lib/46147; it has been noted by GNATS.
 > 
 > From: Joerg Sonnenberger <joerg@britannica.bec.de>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: lib/46147 (mono problem (pthread change result?))
 > Date: Fri, 9 Mar 2012 02:47:57 +0100
 > 
 >  On Fri, Mar 09, 2012 at 01:20:04AM +0000, Matthew Mondor wrote:
 >  >  > According to my testing, this works now. Comments about boehm-gc only
 >  >  > working by accident still apply though.
 >  >  
 >  >  Do you think that anything using boehm-gc with threads currently faces
 >  >  the same problem as well?  I use ECL with it, built with threads
 >  >  enabled, and although it works pretty well, it's not 100% stable when I
 >  >  heavily test multithreaded applications (and that on both Linux and
 >  >  NetBSD).  There has been some work to reimplement ECL's locking system
 >  >  too though, and that's not considered stable either, but I use a custom
 >  >  implementation here which is more straightforward.
 >  >  
 >  >  But I'd very much like to know if boehm-gc itself is fundamentally
 >  >  broken in that area.
 >  
 >  boehm-gc needs to find all references to objects on the thread stacks.
 >  For that, it should use pthread_attr_get_np and pthread_attr_getstack on
 >  NetBSD. This provides the bottom of the stack and the size, in combination
 >  with __builtin_frame_address(0), that can precisely identify the stack
 >  content. At the moment it does some guessing based on the stack
 >  size, assuming alignment of the stack that isn't guaranteed by the
 >  system.

 Thanks for the details,
 -- 
 Matt

From: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Sat, 10 Mar 2012 14:39:30 +0100

 On Thu, Mar 08, 2012 at 11:50:54PM +0000, joerg@NetBSD.org wrote:
 > According to my testing, this works now. Comments about boehm-gc only
 > working by accident still apply though.

 Thank you!

 Firefox works fine now.
 mono hangs sometimes, but it's not segfaulting immediately like before, and I had hangs before.
 I can't tell yet if their occurrence has increased or not.

 So we can probably close this PR.
 However, see PR 46165 for further issues that might be caused by the same changes.
  Thomas

From: Thomas Klausner <wiz@NetBSD.org>
To: NetBSD bugtracking <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Sun, 11 Mar 2012 14:05:51 +0100

 On Sat, Mar 10, 2012 at 01:40:07PM +0000, Thomas Klausner wrote:
 >  mono hangs sometimes, but it's not segfaulting immediately like before, and I had hangs before.
 >  I can't tell yet if their occurrence has increased or not.

 In my latest pbulk, mono built successfully, however gtk-sharp didn't:
 Making all in generator
 gmake[2]: Entering directory `/scratch/x11/gtk-sharp/work/gtk-sharp-2.12.10/generator'
 /usr/pkg/bin/mcs /out:gapi_codegen.exe -define:OFF_T_8  ./AliasGen.cs ./BoxedGen.cs ./ByRefGen.cs ./CallbackGen.cs ./ChildProperty.cs ./ClassBase.cs ./ClassGen.cs ./CodeGenerator.cs ./ConstFilenameGen.cs ./ConstStringGen.cs ./Ctor.cs ./EnumGen.cs ./FieldBase.cs ./GenBase.cs ./GenerationInfo.cs ./HandleBase.cs ./IAccessor.cs ./IGeneratable.cs ./IManualMarshaler.cs ./InterfaceGen.cs ./LPGen.cs ./LPUGen.cs ./ManagedCallString.cs ./ManualGen.cs ./MarshalGen.cs ./MethodBase.cs ./MethodBody.cs ./Method.cs ./ObjectField.cs ./ObjectBase.cs ./ObjectGen.cs ./OpaqueGen.cs ./Parameters.cs ./Parser.cs ./Property.cs ./PropertyBase.cs ./ReturnValue.cs ./Signal.cs ./Signature.cs ./SimpleBase.cs ./SimpleGen.cs ./Statistics.cs ./StructBase.cs ./StructField.cs ./StructGen.cs ./SymbolTable.cs ./VirtualMethod.cs ./VMSignature.cs
 Stacktrace:

 gmake[2]: *** [gapi_codegen.exe] Abort trap (core dumped)
 gmake[2]: Leaving directory `/scratch/x11/gtk-sharp/work/gtk-sharp-2.12.10/generator'
 gmake[1]: *** [all-recursive] Error 1

 So something's still not completely right (I didn't see this error before March).
  Thomas

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: lib/46147 (mono problem (pthread change result?))
Date: Sun, 11 Mar 2012 21:04:10 +0100

 On Sun, Mar 11, 2012 at 01:10:05PM +0000, Thomas Klausner wrote:
 >  In my latest pbulk, mono built successfully, however gtk-sharp didn't:

 Works for me. But as I said, the boehm-gc code makes broken assumptions
 about the stack, so it can easily fail due to environmental differences.

 Joerg

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: lib/46147
Date: Wed, 14 Mar 2012 15:34:54 +0000

 This looks like pkg/45828 - any comments on the i386 case?

State-Changed-From-To: feedback->closed
State-Changed-By: wiz@NetBSD.org
State-Changed-When: Fri, 16 Mar 2012 17:21:53 +0000
State-Changed-Why:
In my latest build, gtk-sharp built; mono-addins hung. Retrying it worked.
So this looks like the trouble I had with mono before your changes.
Thank you!


>Unformatted:

