NetBSD Problem Report #54953
From tsutsui@ceres.dti.ne.jp Mon Feb 10 17:22:55 2020
Return-Path: <tsutsui@ceres.dti.ne.jp>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 0B7291A9213
for <gnats-bugs@gnats.NetBSD.org>; Mon, 10 Feb 2020 17:22:55 +0000 (UTC)
Message-Id: <202002101722.01AHMiBQ023682@ceres.dti.ne.jp>
Date: Tue, 11 Feb 2020 02:22:44 +0900 (JST)
From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Reply-To: tsutsui@ceres.dti.ne.jp
To: gnats-bugs@NetBSD.org
Cc: tsutsui@ceres.dti.ne.jp
Subject: 5.0 binaries on 9.0_RC2 macppc dumps core in jemalloc(3)
X-Send-Pr-Version: 3.95
>Number: 54953
>Category: port-macppc
>Synopsis: 5.0 binaries on 9.0_RC2 macppc dumps core in jemalloc(3)
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: port-macppc-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Feb 10 17:25:00 +0000 2020
>Last-Modified: Wed Feb 12 14:15:01 +0000 2020
>Originator: Izumi Tsutsui
>Release: NetBSD 9.0_RC2
>Organization:
>Environment:
System: NetBSD 9.0_RC2 macppc (GENERIC)
Architecture: powerpc
Machine: macppc (probably affects all powerpc ports?)
>Description:
Many NetBSD/macppc 5.0 binaries (probably those using malloc(3)) get a
kernel trap and dump core on NetBSD/macppc 9.0_RC2.
A 5.99.10 binary (tcsh) also dumps core.
- 6.0 binaries work without problem.
- The 5.99.10 tcsh worked on 8.1.
>How-To-Repeat:
# uname -a
NetBSD lancer 9.0_RC2 NetBSD 9.0_RC2 (GENERIC) #5: Tue Feb 11 00:45:10 JST 2020 tsutsui@mirage:/s/netbsd-9/src/sys/arch/macppc/compile/GENERIC macppc
# file 5.0/bin/test
5.0/bin/test: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV), dynamically linked, interpreter /libexec/ld.elf_so, for NetBSD 5.0, not stripped
# ldd 5.0/bin/test
5.0/bin/test:
-lc.12 => /lib/libc.so.12
# 5.0/bin/test
[ 5360.1396803] trap: pid 766.1 (test): user read DSI trap @ 0x1802d90 by 0xfddaae50 (DSISR 0x40000000, err=14)
[1] Segmentation fault (core dumped) 5.0/bin/test
# gdb 5.0/bin/test test.core
GNU gdb (GDB) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "powerpc--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from 5.0/bin/test...
(No debugging symbols found in 5.0/bin/test)
[New process 1]
Core was generated by `test'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0xfddaae50 in malloc () from /lib/libc.so.12
(gdb) bt
#0 0xfddaae50 in malloc () from /lib/libc.so.12
#1 0xfddff4e4 in __setlocale () from /lib/libc.so.12
#2 0xfdd27698 in __setlocale_mb_len_max_32 () from /lib/libc.so.12
#3 0x018018dc in main ()
(gdb)
---
# file /usr/local/bin/tcsh
/usr/local/bin/tcsh: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV), dynamically linked, interpreter /usr/libexec/ld.elf_so, for NetBSD 5.99.10, stripped
# ldd /usr/local/bin/tcsh
/usr/local/bin/tcsh:
-ltermcap.0 => /usr/lib/libtermcap.so.0
-lc.12 => /usr/lib/libc.so.12
-lcrypt.1 => /usr/lib/libcrypt.so.1
# /usr/local/bin/tcsh
[ 5565.0401313] trap: pid 359.1 (tcsh): user read DSI trap @ 0x18544e0 by 0xfdd77f50 (DSISR 0x40000000, err=14)
[1] Segmentation fault (core dumped) /usr/local/bin/tcsh
# gdb /usr/local/bin/tcsh tcsh.core
Reading symbols from /usr/local/bin/tcsh...
(No debugging symbols found in /usr/local/bin/tcsh)
[New process 1]
Core was generated by `tcsh'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0xfdd77f50 in je_jemalloc_prefork () from /usr/lib/libc.so.12
(gdb) bt
#0 0xfdd77f50 in je_jemalloc_prefork () from /usr/lib/libc.so.12
#1 0xfddead30 in fork () from /usr/lib/libc.so.12
#2 0x01845d6c in ?? ()
#3 0x01804618 in ?? ()
#4 0x018020a4 in ?? ()
#5 0xfdee1d24 in _rtld_start () from /usr/libexec/ld.elf_so
(gdb)
>Fix:
No idea.
>Audit-Trail:
From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-macppc/54953: 5.0 binaries on 9.0_RC2 macppc dumps core in
jemalloc(3)
Date: Tue, 11 Feb 2020 17:37:50 +0300
$ gdb -q netbsd-5.2.3/bin/test
...
Program received signal SIGSEGV, Segmentation fault.
0xfdda2268 in malloc () from /lib/libc.so.12
(gdb) x/5i $pc-16
0xfdda2258 <malloc+88>: lwz r31,0(r4)
0xfdda225c <malloc+92>: add r5,r4,r31
0xfdda2260 <malloc+96>: lwz r6,-12(r5)
0xfdda2264 <malloc+100>: add r31,r6,r2
=> 0xfdda2268 <malloc+104>: lbz r7,0(r31)
(gdb) p/x $r31
$11 = 0x1802ca4
(gdb) x/x $r31
0x1802ca4: Cannot access memory at address 0x1802ca4
(gdb) p/x $r2
$12 = 0x1809ca4
(gdb) p/x $r6
$13 = 0xffff9000
(gdb) p/x $r5
$14 = 0xfdeb7d28
(gdb) p/x $r4
$15 = 0xfdda2248
(gdb) x/x $r4 # uwe: R_PPC_REL32 _GLOBAL_OFFSET_TABLE_ (see below)
0xfdda2248 <malloc+72>: 0x00115ae0
$ powerpc--netbsd-objdump -dr jemalloc.pico | less +/'<malloc>:'
00003e88 <malloc>:
...
# uwe: this is where $r4 above points to
3ed0: 00 00 00 00 .long 0x0
3ed0: R_PPC_REL32 _GLOBAL_OFFSET_TABLE_
...
3ee0: 83 e4 00 00 lwz r31,0(r4)
3ee4: 7c a4 fa 14 add r5,r4,r31
3ee8: 80 c5 00 00 lwz r6,0(r5)
3eea: R_PPC_GOT_TPREL16 je_tsd_tls
3eec: 7f e6 12 14 add r31,r6,r2
3eec: R_PPC_TLS je_tsd_tls
3ef0: 88 ff 00 00 lbz r7,0(r31)
Compiling that file with -S -mregnames I get this asm (prettified a
bit to improve readability):
lwz %r31, 0(%r4)
add %r5, %r4, %r31
lwz %r6, je_tsd_tls@got@tprel(%r5)
add %r31, %r6, je_tsd_tls@tls
lbz %r7, 0(%r31)
-uwe
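The register values in the gdb session above can be checked by hand against the instruction sequence. The C sketch below redoes the two 32-bit additions. The function names are made up for illustration, and reading 0xffff9000 as a negative thread-pointer-relative offset is an interpretation, not something the report states.

```c
#include <stdint.h>

/*
 * add r5,r4,r31: r4 holds the address of the R_PPC_REL32 word and
 * r31 the word stored there, so the sum is the GOT address.
 */
uint32_t
got_base(uint32_t r4, uint32_t rel32)
{
	return r4 + rel32;
}

/*
 * add r31,r6,r2: r6 is the offset loaded from the GOT slot and r2 is
 * whatever currently sits in the thread-pointer register.  The sum
 * wraps modulo 2^32.
 */
uint32_t
tls_addr(uint32_t tprel, uint32_t r2)
{
	return tprel + r2;
}
```

With the values from the session, got_base(0xfdda2248, 0x00115ae0) gives 0xfdeb7d28 ($r5), and tls_addr(0xffff9000, 0x01809ca4) wraps to 0x01802ca4 ($r31), the address the lbz faulted on.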
From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-macppc/54953: 5.0 binaries on 9.0_RC2 macppc dumps core in
jemalloc(3)
Date: Wed, 12 Feb 2020 00:51:17 +0300
I think I see what the problem is. In netbsd-5 we had crt0 (so it's
part of the old binary) that does:
/*
* Initialize the Small Data Area registers.
* _SDA_BASE is defined in the SVR4 ABI for PPC.
* _SDA2_BASE is defined in the E[mbedded] ABI for PPC.
*/
__asm( "lis %r13,_SDA_BASE_@ha;"
"addi %r13,%r13,_SDA_BASE_@l;"
"lis %r2,_SDA2_BASE_@ha;"
"addi %r2,%r2,_SDA2_BASE_@l" );
But now we use %r2 for TLS! So malloc() works fine the first few
times that it's called from the libc init. Then the old binary entry
point is called (old crt0) and it overwrites TLS magic in %r2 with
_SDA2_BASE_. Next time malloc() is called and checks its TLS stuff,
the %r2 contains wrong value and pop goes the weasel.
-uwe
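To make the lis/addi pair concrete: @ha and @l split a 32-bit address into two 16-bit halves so that lis followed by a sign-extending addi rebuilds it. Below is a small C sketch of that arithmetic with hypothetical helper names. It assumes that the 0x1809ca4 seen in $r2 in the earlier gdb session is that binary's _SDA2_BASE_, which the report does not say explicitly.

```c
#include <stdint.h>

/* sym@l: the low 16 bits of the address. */
uint16_t
ppc_lo(uint32_t addr)
{
	return (uint16_t)(addr & 0xffffu);
}

/*
 * sym@ha: the high 16 bits, adjusted by +0x8000 to compensate for
 * addi sign-extending its 16-bit immediate.
 */
uint16_t
ppc_ha(uint32_t addr)
{
	return (uint16_t)((addr + 0x8000u) >> 16);
}

/* Effect of "lis rD,ha; addi rD,rD,lo" on rD. */
uint32_t
ppc_lis_addi(uint16_t ha, uint16_t lo)
{
	/* Sign-extend the low half the way addi does. */
	uint32_t ext = (lo & 0x8000u) ? (0xffff0000u | lo) : lo;

	return ((uint32_t)ha << 16) + ext;
}
```

For 0x01809ca4 this gives @ha = 0x0181 and @l = 0x9ca4, and the pair rebuilds 0x01809ca4 in %r2, replacing whatever thread pointer was there before the old crt0 ran.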
From: Jason Thorpe <thorpej@me.com>
To: gnats-bugs@netbsd.org
Cc: port-macppc-maintainer@netbsd.org,
gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org,
tsutsui@ceres.dti.ne.jp
Subject: Re: port-macppc/54953: 5.0 binaries on 9.0_RC2 macppc dumps core in
jemalloc(3)
Date: Tue, 11 Feb 2020 14:08:57 -0800
> On Feb 11, 2020, at 1:55 PM, Valery Ushakov <uwe@stderr.spb.ru> wrote:
>
> /*
> * Initialize the Small Data Area registers.
> * _SDA_BASE is defined in the SVR4 ABI for PPC.
> * _SDA2_BASE is defined in the E[mbedded] ABI for PPC.
> */
> __asm( "lis %r13,_SDA_BASE_@ha;"
> "addi %r13,%r13,_SDA_BASE_@l;"
> "lis %r2,_SDA2_BASE_@ha;"
> "addi %r2,%r2,_SDA2_BASE_@l" );
>
> But now we use %r2 for TLS! So malloc() works fine the first few
> times that it's called from the libc init. Then the old binary entry
> point is called (old crt0) and it overwrites TLS magic in %r2 with
> _SDA2_BASE_. Next time malloc() is called and checks its TLS stuff,
> the %r2 contains wrong value and pop goes the weasel.
Um, are we no longer adhering to the PPC SVR4 ABI??
-- thorpej
From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc: Jason Thorpe <thorpej@me.com>
Subject: Re: port-macppc/54953: 5.0 binaries on 9.0_RC2 macppc dumps core in
jemalloc(3)
Date: Wed, 12 Feb 2020 01:48:40 +0300
On Tue, Feb 11, 2020 at 22:10:02 +0000, Jason Thorpe wrote:
> Um, are we no longer adhering to the PPC SVR4 ABI??
%r2 = _SDA2_BASE_ is EABI, not SVR4, that one only has
%r13 = _SDA_BASE_ which we still do
-uwe
From: Jason Thorpe <thorpej@me.com>
To: Valery Ushakov <uwe@stderr.spb.ru>
Cc: gnats-bugs@netbsd.org
Subject: Re: port-macppc/54953: 5.0 binaries on 9.0_RC2 macppc dumps core in
jemalloc(3)
Date: Tue, 11 Feb 2020 15:26:45 -0800
> On Feb 11, 2020, at 2:48 PM, Valery Ushakov <uwe@stderr.spb.ru> wrote:
>
> On Tue, Feb 11, 2020 at 22:10:02 +0000, Jason Thorpe wrote:
>
>> Um, are we no longer adhering to the PPC SVR4 ABI??
>
> %r2 = _SDA2_BASE_ is EABI, not SVR4, that one only has
> %r13 = _SDA_BASE_ which we still do
Gah, I mis-read the comment, thanks. So I guess the question is why we
ever bothered with the EABI small data.
-- thorpej
From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc: Jason Thorpe <thorpej@me.com>
Subject: Re: port-macppc/54953: 5.0 binaries on 9.0_RC2 macppc dumps core in
jemalloc(3)
Date: Wed, 12 Feb 2020 12:13:24 +0300
On Tue, Feb 11, 2020 at 15:26:45 -0800, Jason Thorpe wrote:
> Gah, I mis-read the comment, thanks. So I guess the question is why
> we ever bothered with the EABI small data.
A bit late to worry about that now, isn't it? :)
I guess (wildly) this was initially a binutils bug and _SDA2_BASE_ was
always defined, and so we stuck it into r2 in crt0 "b/c it was there"
as one guy famously said.
See e.g. port-macppc/47464 for what is most likely related fallout
from the binutils fixes in that area.
-uwe
From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc: Jason Thorpe <thorpej@me.com>
Subject: Re: port-macppc/54953: 5.0 binaries on 9.0_RC2 macppc dumps core in
jemalloc(3)
Date: Wed, 12 Feb 2020 15:39:20 +0300
Perhaps we can add a hack to ld.so that would check .note.netbsd.ident
and if that is missing or predates 6.0 then it would scan the few
instructions at the entry point (i.e. old crt0's _start) looking for
the lis/la pair of instructions that sets %r2 and just nop them out?
-uwe
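A rough sketch of the scanning step of this idea, as an illustration only and not the actual ld.elf_so change. It assumes the instruction words have already been read into host byte order, that the lis and the la (an extended mnemonic for addi) sit next to each other as in the old crt0 asm, and that the caller has already decided from .note.netbsd.ident that the binary is old enough. The function and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PPC_NOP		0x60000000u	/* ori 0,0,0 */
#define PPC_LIS_R2	0x3c400000u	/* addis r2,0,imm  (lis r2,imm) */
#define PPC_ADDI_R2_R2	0x38420000u	/* addi r2,r2,imm  (la r2,imm(r2)) */
#define PPC_OP_MASK	0xffff0000u	/* everything but the 16-bit immediate */

/*
 * Scan count instruction words for an adjacent "lis r2,X; addi r2,r2,Y"
 * pair and overwrite both with nops.  Returns 1 if a pair was patched,
 * 0 if none was found.
 */
int
nop_out_r2_setup(uint32_t *insn, size_t count)
{
	size_t i;

	for (i = 0; i + 1 < count; i++) {
		if ((insn[i] & PPC_OP_MASK) == PPC_LIS_R2 &&
		    (insn[i + 1] & PPC_OP_MASK) == PPC_ADDI_R2_R2) {
			insn[i] = PPC_NOP;
			insn[i + 1] = PPC_NOP;
			return 1;
		}
	}
	return 0;
}
```

A real implementation would also have to make the text page writable while patching and synchronize the instruction cache afterwards. Neither is shown here.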
From: Jason Thorpe <thorpej@me.com>
To: Valery Ushakov <uwe@stderr.spb.ru>
Cc: gnats-bugs@netbsd.org
Subject: Re: port-macppc/54953: 5.0 binaries on 9.0_RC2 macppc dumps core in
jemalloc(3)
Date: Wed, 12 Feb 2020 06:11:56 -0800
> On Feb 12, 2020, at 4:39 AM, Valery Ushakov <uwe@stderr.spb.ru> wrote:
>
> Perhaps we can add a hack to ld.so that would check .note.netbsd.ident
> and if that is missing or predates 6.0 then it would scan the few
> instructions at the entry point (i.e. old crt0's _start) looking for
> the lis/la pair of instructions that sets %r2 and just nop them out?
Sounds like a reasonable fix to me.
-- thorpej
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.