NetBSD Problem Report #35099

From stix@stix.id.au  Thu Nov 23 07:09:49 2006
Return-Path: <stix@stix.id.au>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 8E9F663B8C1
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 23 Nov 2006 07:09:49 +0000 (UTC)
Message-Id: <20061123070943.51D73119C@kitt.stix.org.au>
Date: Thu, 23 Nov 2006 18:09:41 +1100 (EST)
From: stix@stix.id.au
Reply-To: stix@stix.id.au
To: gnats-bugs@NetBSD.org
Subject: pthread programs core on m68k
X-Send-Pr-Version: 3.95

>Number:         35099
>Category:       port-m68k
>Synopsis:       pthread programs core on m68k
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-m68k-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Nov 23 07:10:00 +0000 2006
>Closed-Date:    Wed Mar 28 07:40:30 +0000 2007
>Last-Modified:  Wed Mar 28 07:40:30 +0000 2007
>Originator:     Paul Ripke
>Release:        NetBSD 4.99.4 (-current 20061122ish)
>Organization:
>Environment:


System: NetBSD kitt.stix.org.au 4.99.4 NetBSD 4.99.4 (KITT) #0: Tue Nov 21 22:16:31 EST 2006 stix@zion.stix.org.au:/export/netbsd/current/obj.mac68k/export/netbsd/current/src/sys/arch/mac68k/compile/KITT mac68k

Architecture: m68k
Machine: mac68k
>Description:
Many pthread programs get SIGILL after running for a while. The failure
appears to require more than one LWP (i.e. it does not show up when threads
are only switched in userspace). Since named(8) is now threaded, it
regularly dies with a SIGILL.

>How-To-Repeat:

Using "fblckgen" from http://stix.id.au/wiki/iotools as a simple-ish test
(it only has two threads, for starters):

ksh$ PTHREAD_DEBUGLOG=1 time ./fblckgen -ab 4k -c 0 | cat > /dev/null
time: Command terminated abnormally.
       11.90 real         2.48 user         3.70 sys

The "cat" above is required to get NLWP>1. Unfortunately, gdb itself dumps
core while trying to analyse the core file:

ksh$ gdb fblckgen fblckgen.core 
GNU gdb 5.3nb1
...
Core was generated by `fblckgen'.
Program terminated with signal 4, Illegal instruction.
Reading symbols from /usr/lib/libpthread.so.0...done.
Loaded symbols for /usr/lib/libpthread.so.0
Reading symbols from /usr/lib/libc.so.12...done.
Loaded symbols for /usr/lib/libc.so.12
Reading symbols from /usr/libexec/ld.elf_so...done.
Loaded symbols for /usr/libexec/ld.elf_so
#0  0x049ffbe4 in ?? ()
(gdb) thr app all bt

Thread 3 (Thread 22 ()):
#0  0x04025284 in pthread__locked_switch () from /usr/lib/libpthread.so.0
#1  0x06bffb78 in ?? ()
Memory fault (core dumped)

The debuglog always ends the same (with different addresses):

ksh$ debuglog -k | tail -20
(up 0x4e00000) sigev val 88880020
(up 0x4e00000) switching to 0xffe00000 (uc: U 0xffffb200 pc: 4025284)
(recycle 0xffe00000) recycling 0x4e00000
(up 0x4e00000) type 5 LWP 2 ev 0 intr 1
(fi 0x4e00000) victim 2 0x6a00000(1) lockholder 1
(rl 0x4e00000) entered
(rl 0x4e00000) intqueue 0x6a00000
(rl 0x4e00000) victim 0x6a00000 (uc T 0x6bffb6c) normal spinlocks: 1
(rl 0x4e00000) starting chain 0x6a00000 (uc T 0x6bffb6c pc 4029d08 sp 6bfff6c)
(rl 0x4e00000) returned from chain
(rl 0x4e00000) intqueue 0x6a00000
(rl 0x4e00000) victim 0x6a00000 (uc U 0x6bffb78) normal heldlock: 0x6690 switchto: 0xffe00000 (uc 0xffffb200 pc 4025284)
(rl 0x4e00000) exiting
(up 0x4e00000) sigev val 88880020
(up 0x4e00000) switching to 0xffe00000 (uc: U 0xffffb200 pc: 4025284)
(recycle 0xffe00000) recycling 0x4e00000
(up 0x4e00000) type 2 LWP 3 ev 1 intr 0
(up 0x4e00000) blocker 2 0xffe00000(1)
(up 0x4e00000) switching to 0x6a00000 (uc: U 0x6bffb78 pc: 4025284)
(recycle 0x6a00000) recycling 0x4e00000

Previously, with what was tagged as netbsd-4 (before gcc4, etc.), gdb could
extract the following from the core:

Thread 3 (Thread 22 ()):
#0  0x04023174 in pthread__locked_switch () from /usr/lib/libpthread.so.0
#1  0x06bffb70 in ?? ()
#2  0x040283b2 in pthread_cond_wait () from /usr/lib/libpthread.so.0
#3  0x00003548 in makeBlocks (dummy=0x0) at fblckgen.c:234
#4  0x040296ec in pthread_create () from /usr/lib/libpthread.so.0

Thread 2 (LWP 1):
#0  0x040584c2 in write () from /usr/lib/libc.so.12
#1  0x04022fca in write () from /usr/lib/libpthread.so.0
#2  0x000031be in main (argc=65536, argv=0x0) at fblckgen.c:179

Thread 1 (LWP 2):
#0  0x049ffbe4 in ?? ()
#1  0x040283b2 in pthread_cond_wait () from /usr/lib/libpthread.so.0
#2  0x00003548 in makeBlocks (dummy=0x0) at fblckgen.c:234
#3  0x040296ec in pthread_create () from /usr/lib/libpthread.so.0
#0  0x049ffbe4 in ?? ()

This is odd, since the process only has 2 pthreads, yet gdb reports three
threads. The address 0x049ffbe4 appears to be bogus, and different cores
all feature a similar address.
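
The backtraces above show the general shape of the test: a thread started
with pthread_create() blocks in pthread_cond_wait() (makeBlocks), while
main() blocks in write(). The following is a minimal sketch of that pattern
only. It is not the fblckgen source; struct slot, make_blocks(),
generate_blocks() and the single-slot hand-off are hypothetical names and
design choices made for illustration.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical single-slot hand-off between producer and writer. */
struct slot {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    char           *buf;        /* block being handed over */
    size_t          size;       /* block size in bytes */
    unsigned long   remaining;  /* blocks still to be produced */
    int             full;       /* 1 while buf holds an unwritten block */
    int             stop;       /* set by the writer to end the producer */
};

/* Producer thread: fills the slot, then waits for the writer to drain it. */
static void *make_blocks(void *arg)
{
    struct slot *s = arg;
    int fill = 0;

    pthread_mutex_lock(&s->lock);
    while (s->remaining > 0 && !s->stop) {
        while (s->full && !s->stop)
            pthread_cond_wait(&s->cond, &s->lock);
        if (s->stop)
            break;
        memset(s->buf, fill++ & 0xff, s->size);
        s->full = 1;
        s->remaining--;
        pthread_cond_signal(&s->cond);
    }
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Writer: takes each block and write()s it to fd outside the lock.
 * Returns the number of bytes written, or -1 on error. */
long generate_blocks(int fd, size_t block_size, unsigned long count)
{
    struct slot s;
    pthread_t producer;
    long total = 0;
    unsigned long i;
    char *out = malloc(block_size);

    memset(&s, 0, sizeof s);
    s.buf = malloc(block_size);
    if (s.buf == NULL || out == NULL) {
        free(s.buf);
        free(out);
        return -1;
    }
    s.size = block_size;
    s.remaining = count;
    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.cond, NULL);

    if (pthread_create(&producer, NULL, make_blocks, &s) != 0) {
        total = -1;
    } else {
        for (i = 0; i < count && total >= 0; i++) {
            size_t off = 0;

            pthread_mutex_lock(&s.lock);
            while (!s.full)
                pthread_cond_wait(&s.cond, &s.lock);
            memcpy(out, s.buf, block_size);
            s.full = 0;
            pthread_cond_signal(&s.cond);
            pthread_mutex_unlock(&s.lock);

            /* May block in the kernel, e.g. when fd is a full pipe. */
            while (off < block_size) {
                ssize_t n = write(fd, out + off, block_size - off);
                if (n < 0) {
                    if (errno == EINTR)
                        continue;
                    total = -1;
                    break;
                }
                off += (size_t)n;
            }
            if (total >= 0)
                total += (long)block_size;
        }
        /* Tell the producer to finish, then wait for it. */
        pthread_mutex_lock(&s.lock);
        s.stop = 1;
        pthread_cond_signal(&s.cond);
        pthread_mutex_unlock(&s.lock);
        pthread_join(producer, NULL);
    }

    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.lock);
    free(s.buf);
    free(out);
    return total;
}
```

Calling generate_blocks(STDOUT_FILENO, 4096, n) from main() with the output
piped through cat should exercise the same combination of a blocking
write() and a condition-variable wait. Whether this sketch triggers the
SIGILL has not been tested.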

I believe this problem is already known, but I couldn't find a PR
specifically for this issue.

>Fix:

Unknown.

>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->feedback
State-Changed-By: ad@netbsd.org
State-Changed-When: Wed, 21 Mar 2007 17:40:15 +0000
State-Changed-Why:
Are you willing to try again with -current? I believe the problem may be
fixed (assuming that mac68k is working well).


From: Paul Ripke <stix@stix.id.au>
To: gnats-bugs@NetBSD.org
Cc: port-m68k-maintainer@netbsd.org, netbsd-bugs@netbsd.org,
	gnats-admin@netbsd.org, ad@netbsd.org
Subject: Re: port-m68k/35099 (pthread programs core on m68k)
Date: Wed, 28 Mar 2007 15:03:06 +1000

 On Wed, Mar 21, 2007 at 05:40:17PM +0000, ad@netbsd.org wrote:
 > Are you willing to try again with -current? I believe the problem may be
 > fixed (assuming that mac68k is working well).

 Running -current from about 2007-03-22 05:00 UTC on a Quadra 605.
 I can't see any of the issues I was having with any of my pthread
 programs. I did get errors running "make test" on pkgsrc perl, though.
 I'll have another go at that later.

 I also tested the same vintage -current on sparc; it also behaved
 much better with pthread programs (even host(1) was breaking
 previously). It also passed "make test" perl with flying colours.

 IMHO, this PR can be closed.

 -- 
 Paul Ripke

State-Changed-From-To: feedback->closed
State-Changed-By: martin@netbsd.org
State-Changed-When: Wed, 28 Mar 2007 07:40:30 +0000
State-Changed-Why:
Submitter confirms the problem is fixed


>Unformatted:
