NetBSD Problem Report #46274

From martin@duskware.de  Wed Mar 28 14:14:03 2012
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id D101E63BBEC
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 28 Mar 2012 14:14:03 +0000 (UTC)
Message-Id: <20120328141403.D101E63BBEC@www.NetBSD.org>
Date: Wed, 28 Mar 2012 14:14:03 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@gnats.NetBSD.org
Subject: sparc64 running netbsd 32bit code causes a lot of cores
X-Send-Pr-Version: 3.95

>Number:         46274
>Category:       port-sparc64
>Synopsis:       sparc64 running netbsd 32bit code causes a lot of cores
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    martin
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Mar 28 14:15:00 +0000 2012
>Closed-Date:    Mon Sep 26 20:26:18 +0000 2016
>Last-Modified:  Mon Sep 26 20:26:18 +0000 2016
>Originator:     Martin Husemann
>Release:        NetBSD 6.99.4
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD thirdstage.duskware.de 6.99.4 NetBSD 6.99.4 (MODULAR) #20: Mon Mar 26 12:42:40 CEST 2012 martin@night-porter.duskware.de:/usr/src/sys/arch/sparc64/compile/MODULAR sparc64
Architecture: sparc64
Machine: sparc64
>Description:

When doing a chroot to a sparc install on sparc64-current many programs
die unexpectedly.

An example backtrace (the details differ slightly from crash to crash):

#0  0x40155fa8 in __libc_thr_getspecific_stub (k=0)
    at /usr/src/lib/libc/thread-stub/thread-stub.c:321
#1  0x401dcd98 in choose_arena () at /usr/src/lib/libc/stdlib/jemalloc.c:1573
#2  imalloc (size=508) at /usr/src/lib/libc/stdlib/jemalloc.c:2988
#3  0x401dce14 in malloc (size=508)
    at /usr/src/lib/libc/stdlib/jemalloc.c:3702
#4  0x0001fda0 in ckmalloc ()
#5  0x0001fe74 in stalloc ()
#6  0x000200a0 in growstackblock ()
#7  0x00015e34 in padvance ()
#8  0x00016014 in shellexec ()
#9  0x00014f10 in evalcommand ()
#10 0x000141e4 in evaltree ()
#11 0x0001f6d0 in cmdloop ()
#12 0x0001f954 in main ()

The crash is here:
   0x40155f84 <__libc_thr_getspecific_stub>:    save  %sp, -96, %sp
   0x40155f88 <__libc_thr_getspecific_stub+4>:  sethi  %hi(0xfffff000), %g1
   0x40155f8c <__libc_thr_getspecific_stub+8>:  add  %i0, %i0, %g2
   0x40155f90 <__libc_thr_getspecific_stub+12>: add  %g2, %i0, %g2
   0x40155f94 <__libc_thr_getspecific_stub+16>: or  %g1, 0xbc, %g1
   0x40155f98 <__libc_thr_getspecific_stub+20>: sethi  %hi(0x113000), %l7
   0x40155f9c <__libc_thr_getspecific_stub+24>: call  0x40150ca8 <__sparc_get_pc_thunk.l7>
   0x40155fa0 <__libc_thr_getspecific_stub+28>: add  %l7, 0x204, %l7        ! 0x113204
   0x40155fa4 <__libc_thr_getspecific_stub+32>: sll  %g2, 2, %g2
=> 0x40155fa8 <__libc_thr_getspecific_stub+36>: ld  [ %l7 + %g1 ], %g1
   0x40155fac <__libc_thr_getspecific_stub+40>: ld  [ %g1 + %g2 ], %i0
   0x40155fb0 <__libc_thr_getspecific_stub+44>: ret 
   0x40155fb4 <__libc_thr_getspecific_stub+48>: restore 
   0x40155fb8 <__libc_thr_keydelete_stub>:      save  %sp, -96, %sp
   0x40155fbc <__libc_thr_keydelete_stub+4>:    sethi  %hi(0xfffff000), %g1
   0x40155fc0 <__libc_thr_keydelete_stub+8>:    add  %i0, %i0, %g2

g1             0xfffff0bc       -3908
g2             0x0              0
l7             0x402691a0       1076269472

This all looks sane and very similar to the values we get when running the
same code with a 32bit kernel (which works just fine).
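
For illustration only, the address that the faulting `ld [ %l7 + %g1 ], %g1`
computes can be worked out from the register dump above. The sketch below is
not code from the NetBSD tree and does not claim to identify the root cause;
it only models the addition at two register widths. The helper name
`effective_address` is made up for this example, and it assumes %g1 holds
the zero-extended 32-bit value printed by the debugger.

```python
def effective_address(base: int, offset: int, width_bits: int) -> int:
    """Add base and offset as a register of width_bits would, wrapping on overflow."""
    return (base + offset) & ((1 << width_bits) - 1)

l7 = 0x402691A0  # %l7 from the register dump
g1 = 0xFFFFF0BC  # %g1 from the register dump (-3908 as a signed 32-bit value)

addr32 = effective_address(l7, g1, 32)  # sum wraps at 32 bits
addr64 = effective_address(l7, g1, 64)  # sum keeps the carry into bit 32

print(hex(addr32))  # 0x4026825c
print(hex(addr64))  # 0x14026825c
```

The two results differ only in bit 32, so the same register contents can
name a different address depending on whether the sum is truncated to
32 bits.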

>How-To-Repeat:

   mount $(sparc-disk) /mnt
   chroot /mnt /bin/sh
   cd usr/tests
   atf-run | atf-report

This will stop right away with sh.core files from the shell trying to set
up the atf pipeline.

>Fix:
n/a

>Release-Note:

>Audit-Trail:
From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-sparc64/46274: sparc64 running netbsd 32bit code causes a
 lot of cores
Date: Wed, 28 Mar 2012 17:47:35 -0400

 On Wed, 28 Mar 2012 14:15:01 +0000 (UTC)
 martin@NetBSD.org wrote:

 > #0  0x40155fa8 in __libc_thr_getspecific_stub (k=0)
 >     at /usr/src/lib/libc/thread-stub/thread-stub.c:321

 Looks very much like a problem in the recent TLS support code
 -- 
 Matt

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-sparc64/46274: sparc64 running netbsd 32bit code causes a lot of cores
Date: Thu, 29 Mar 2012 01:35:23 +0200

 I trapped the kernel when sending the signal and verified the faulting
 address. In this case it was 0x140268087 (clearly not mapped anywhere),
 where 0x40268087 would be fine and probably what the process tries to
 access.

 But I am not sure where to catch and fix it (nor where the pc-relative-address
 stub is generated).

 Should locore.s/trap.c truncate the fault address for 32bit processes
 before calling uvm_fault? Should uvm deal? Is the pc-relative addressing
 stub bogus?
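
The truncation asked about here can be sketched as a plain mask. This is a
hypothetical illustration, not kernel code and not a proposed fix: the
helper `truncate_fault_address` and the constant `USER32_MASK` are invented
for this example, and only the two addresses come from the message above.

```python
USER32_MASK = 0xFFFFFFFF  # assumption: a 32-bit process only addresses the low 4 GiB

def truncate_fault_address(addr: int, is_32bit_process: bool) -> int:
    """Hypothetical helper: drop bits above bit 31 for 32-bit processes only."""
    return addr & USER32_MASK if is_32bit_process else addr

reported = 0x140268087  # fault address observed in the kernel

print(hex(truncate_fault_address(reported, True)))   # 0x40268087
print(hex(truncate_fault_address(reported, False)))  # 0x140268087
```

Masking recovers the address the process presumably meant, but it says
nothing about where the extra bit came from.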

 Martin

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: port-sparc64-maintainer@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org, martin@NetBSD.org
Subject: re: port-sparc64/46274: sparc64 running netbsd 32bit code causes a lot of cores
Date: Thu, 29 Mar 2012 14:58:48 +1100

 > From: Martin Husemann <martin@duskware.de>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: port-sparc64/46274: sparc64 running netbsd 32bit code causes a lot of cores
 > Date: Thu, 29 Mar 2012 01:35:23 +0200
 > 
 >  I trapped the kernel when sending the signal and verified the faulting
 >  address. In this case it was 0x140268087 (clearly not mapped anywhere),
 >  where 0x40268087 would be fine and probably what the process tries to
 >  access.

 nice catch.

 >  But I am not sure where to catch and fix it (nor where the pc-relative-address
 >  stub is generated).
 >  
 >  Should locore.s/trap.c truncate the fault address for 32bit processes
 >  before calling uvm_fault? Should uvm deal? Is the pc-relative addressing
 >  stub bogus?

 can you explain this a little more?  where in the code have you
 confirmed this, etc?

 i'd guess that we should mask addresses somewhere, but i'm more
 curious why it isn't happening already.

 i don't think uvm should have to deal with this.  we're not
 feeding it anything valid.  we're feeding it addresses outside
 of VM_*USER addresses.


 .mrg.

From: Martin Husemann <martin@duskware.de>
To: matthew green <mrg@eterna.com.au>
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-sparc64/46274: sparc64 running netbsd 32bit code causes a lot of cores
Date: Thu, 29 Mar 2012 12:12:39 +0200

 On Thu, Mar 29, 2012 at 02:58:48PM +1100, matthew green wrote:
 > i'd guess that we should mask addresses somewhere, but i'm more
 > curious why it isn't happening already.

 No, certainly not "after the deed" (it was too late last night, I guess).
 I will dig further to find where this originates - I guess that somewhere
 we deliver values to userland differently with 64bit kernels, and userland
 calculates the addresses based on that.

 Slightly tricky to find if you can't invoke ktrace (because the invoking
 shell already dies). More kernel debugging needed...

 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-sparc64/46274: sparc64 running netbsd 32bit code causes a lot of cores
Date: Thu, 29 Mar 2012 15:56:47 +0200

 I don't know if this is related, but I'll document it here anyway:

 when trying to boot a full 32bit install with a 64bit kernel, the exec
 of /sbin/init fails with a strange data fault as well:

 root file system type: ffs
 init: copying out path `/sbin/init' 11
 vmcmds 7
 vmcmd[0] = vmcmd_map_pagedvn 0x10000/0x6000 fd@0 prot=05 flags=4
 vmcmd[1] = vmcmd_map_readvn 0x24000/0x698 fd@0x4000 prot=07 flags=4
 vmcmd[2] = vmcmd_map_pagedvn 0x40030000/0x12000 fd@0 prot=05 flags=2
 vmcmd[3] = vmcmd_map_zero 0x12000/0x10000 fd@0 prot=00 flags=1
 vmcmd[4] = vmcmd_map_readvn 0x22000/0xbe0 fd@0x12000 prot=03 flags=1
 vmcmd[5] = vmcmd_map_zero 0xff7fe000/0x600000 fd@0 prot=00 flags=8
 vmcmd[6] = vmcmd_map_zero 0xffdfe000/0x200000 fd@0 prot=03 flags=8
 execve_runproc finished
 panic: System process (pid 1) got sig 11

 Stopped in pid 1.1 (init) at    netbsd:cpu_Debugger+0x4:        nop
 db{1}> bt
 panic(1290078, 1, b, 400, 1000000, 2) at netbsd:panic+0x24
 issignal(3ce9800, 0, 3ce9800, 0, 0, b) at netbsd:issignal+0x3d4
 lwp_userret(3ce9800, 128dba8, 3d01c78, 8, 12a53c8, 127d930) at netbsd:lwp_userret+0x200
 data_access_fault(5ac8fed0, 30, 40032a5c, 140052002, 140052814, 800801) at netbsd:data_access_fault+0x798

 Note that the faulting address (0x40032a5c) should be properly mapped
 by vmcmd[2], backed by a vnode, and should just have been paged in.
 This happens before any syscall from init happens, AFAICT.
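
The claim that the faulting address falls inside vmcmd[2] can be checked
with a small range test. This is a sketch under one assumption: that the
`0x40030000/0x12000` notation in the listing means start address and length
in bytes. The helper `contains` is made up for this example.

```python
def contains(start: int, length: int, addr: int) -> bool:
    """True if addr lies in the half-open range [start, start + length)."""
    return start <= addr < start + length

VMCMD2_START = 0x40030000  # from vmcmd[2], assumed to be the start address
VMCMD2_LEN = 0x12000       # from vmcmd[2], assumed to be the length in bytes
fault_addr = 0x40032A5C    # faulting address from the message above

print(contains(VMCMD2_START, VMCMD2_LEN, fault_addr))  # True
print(hex(fault_addr - VMCMD2_START))                  # 0x2a5c
```

Under that reading the mapping spans 0x40030000 up to, but not including,
0x40042000, and the fault sits 0x2a5c bytes into it.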

 Martin

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: port-sparc64-maintainer@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org, martin@NetBSD.org
Subject: re: port-sparc64/46274: sparc64 running netbsd 32bit code causes a lot of cores
Date: Fri, 30 Mar 2012 10:57:58 +1100

 have you tried SYSCALL_DEBUG?


 .mrg.

From: Martin Husemann <martin@homeworld.netbsd.org>
To: matthew green <mrg@eterna.com.au>
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-sparc64/46274: sparc64 running netbsd 32bit code causes a
 lot of cores
Date: Fri, 30 Mar 2012 08:31:13 +0000

 On Fri, Mar 30, 2012 at 10:57:58AM +1100, matthew green wrote:
 > 
 > 
 > have you tried SYSCALL_DEBUG?

 Yes - so I should rephrase: "unless I did something stupid or SYSCALL_DEBUG
 is missing something, there are no syscalls from the init process before
 this happens".

 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-sparc64/46274: sparc64 running netbsd 32bit code causes a lot of cores
Date: Fri, 30 Mar 2012 11:44:05 +0200

 Side note: a 64bit kernel using the statically linked (32bit) /rescue/init
 gets as far as the chroot environment and runs into the same problems
 then.

 Martin

Responsible-Changed-From-To: port-sparc64-maintainer->martin
Responsible-Changed-By: martin@NetBSD.org
Responsible-Changed-When: Fri, 06 Nov 2015 20:59:32 +0000
Responsible-Changed-Why:
I fixed it


State-Changed-From-To: open->pending-pullups
State-Changed-By: martin@NetBSD.org
State-Changed-When: Fri, 06 Nov 2015 20:59:32 +0000
State-Changed-Why:
Waiting on pullup-7 #1028; pullup-6 #1343


State-Changed-From-To: pending-pullups->closed
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Mon, 26 Sep 2016 20:26:18 +0000
State-Changed-Why:
pulled up recently. thanks!


>Unformatted:
