NetBSD Problem Report #58158

From manu@netbsd.org  Tue Apr 16 15:49:27 2024
Return-Path: <manu@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id B78411A9238
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 16 Apr 2024 15:49:27 +0000 (UTC)
Message-Id: <20240416154926.DD2FF84E77@mail.netbsd.org>
Date: Tue, 16 Apr 2024 15:49:26 +0000 (UTC)
From: manu@netbsd.org
Reply-To: manu@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: TLS pointer gets randomly NULL when running i386 binaries on NetBSD-10.0/amd64
X-Send-Pr-Version: 3.95

>Number:         58158
>Category:       kern
>Synopsis:       TLS pointer gets randomly NULL when running i386 binaries on NetBSD-10.0/amd64
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Apr 16 15:50:00 +0000 2024
>Last-Modified:  Tue Apr 16 16:59:50 +0000 2024
>Originator:     Emmanuel Dreyfus
>Release:        NetBSD 10.0
>Organization:
NetBSD
>Environment:
NetBSD lego 10.0 NetBSD 10.0 (XEN3_DOMU) #0: Thu Mar 28 08:33:33 UTC 2024  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/xen/compile/XEN3_DOMU amd64
>Description:
I cross-build i386 packages on an amd64 machine, using pkg_comp with
-m32 in CFLAGS in /etc/mk.conf.

That worked well on NetBSD-9, but after upgrading to NetBSD-10.0 I get
random SIGSEGVs during the build. They seem to happen in malloc-related
code, and the culprit appears to be a GS segment register that is
unexpectedly 0, so the thread-local storage access through %gs faults.

Here is an example:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0xf4bde597 in tsd_fetch_impl (minimal=false, init=true)
    at /usr/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:270
270     return tsd;

(gdb) bt
#0  0xf4bde597 in tsd_fetch_impl (minimal=false, init=true)
    at /usr/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:270
#1  tsd_fetch ()
    at /usr/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:291
#2  imalloc (dopts=<synthetic pointer>, sopts=<synthetic pointer>)
    at /usr/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2036
#3  malloc (size=size@entry=60)
    at /usr/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2075
#4  0xf4c3e1fd in _citrus_lookup_seq_open (rcl=rcl@entry=0xfff0b898, 
    name=0xf4c6ba6b "/usr/share/nls/nls.alias", ignore_case=0)
    at /usr/src/lib/libc/citrus/citrus_lookup.c:277
#5  0xf4c3e35e in _citrus_lookup_simple (
    name=name@entry=0xf4c6ba6b "/usr/share/nls/nls.alias", 
    key=key@entry=0xf4c6d2d6 <_lc_C_locale_name> "C", 
    linebuf=linebuf@entry=0xfff0b8df "", linebufsize=linebufsize@entry=1024,
    ignore_case=ignore_case@entry=0)
    at /usr/src/lib/libc/citrus/citrus_lookup.c:340
#6  0xf4c30ec6 in __unaliasname (bufsize=1024, buf=0xfff0b8df,  
    alias=0xf4c6d2d6 <_lc_C_locale_name> "C",
    dbname=0xf4c6ba6b "/usr/share/nls/nls.alias")
    at /usr/src/lib/libc/citrus/citrus_aliasname_local.h:36
#7  _catopen_l (name=0xf4c61a6c "libc", oflag=<optimized out>, 
    loc=0xf4cacaa0 <_lc_global_locale>) at /usr/src/lib/libc/nls/catopen.c:108
#8  0xf4c31131 in _catopen (name=name@entry=0xf4c61a6c "libc", oflag=1)
    at /usr/src/lib/libc/compat/../locale/setlocale_local.h:93
#9  0xf4b63625 in __strsignal (num=num@entry=11, buf=0xf4cad0e0 <buf> "", 
    buflen=2048) at /usr/src/lib/libc/string/__strsignal.c:67
#10 0xf4b128e0 in _strsignal (sig=sig@entry=11)
    at /usr/src/lib/libc/string/strsignal.c:55
#11 0x00d82e5d in signal_crash (signo=11)
    at /usr/src/external/gpl3/binutils/usr.bin/gas/../../dist/gas/messages.c:329
#12 <signal handler called>
#13 0xf4bde597 in tsd_fetch_impl (minimal=false, init=true)
    at /usr/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:270
#14 tsd_fetch ()
    at /usr/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:291
#15 imalloc (dopts=<synthetic pointer>, sopts=<synthetic pointer>)
    at /usr/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2036
#16 malloc (size=512)
    at /usr/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2075
#17 0x00dc29d8 in xrealloc ()
#18 0x00daf1cd in build_group_lists (abfd=0xf4926000, sec=0xf4a863cc, 
    inf=0xdecb14 <groups>)
    at /usr/src/external/gpl3/binutils/usr.bin/gas/../../dist/gas/config/obj-elf.c:2448
#19 0xf4daa30b in bfd_map_over_sections (abfd=0xf4926000, 
    operation=0xdaf10b <build_group_lists>, user_storage=0xdecb14 <groups>)
    at /usr/src/external/gpl3/binutils/lib/libbfd/../../dist/bfd/section.c:1362
#20 0x00daeefc in elf_adjust_symtab ()
    at /usr/src/external/gpl3/binutils/usr.bin/gas/../../dist/gas/config/obj-elf.c:2475
#21 0x00d9785c in write_object_file ()
    at /usr/src/external/gpl3/binutils/usr.bin/gas/../../dist/gas/write.c:2428
#22 0x00dc4275 in main (argc=<optimized out>, argv=<optimized out>)
    at /usr/src/external/gpl3/binutils/usr.bin/gas/../../dist/gas/as.c:1403

(gdb) disas 0xf4bde597
   0xf4bde56c <+0>:    push   %ebp
   0xf4bde56d <+1>:    mov    %esp,%ebp
   0xf4bde56f <+3>:    push   %edi
   0xf4bde570 <+4>:    push   %esi
   0xf4bde571 <+5>:    push   %ebx
   0xf4bde572 <+6>:    sub    $0x30,%esp
   0xf4bde575 <+9>:    call   0xf4c5d339 <__x86.get_pc_thunk.bx>
   0xf4bde57a <+14>:   add    $0xcaa86,%ebx
   0xf4bde580 <+20>:   mov    0x8(%ebp),%edi
   0xf4bde583 <+23>:   mov    0x285c(%ebx),%eax
   0xf4bde589 <+29>:   test   %eax,%eax
   0xf4bde58b <+31>:   jne    0xf4bde66f <malloc+259>
   0xf4bde591 <+37>:   mov    -0x2d0(%ebx),%ecx
=> 0xf4bde597 <+43>:   mov    %gs:0x0,%esi
   0xf4bde59e <+50>:   add    %ecx,%esi

(gdb) info reg $gs
gs             0x0                 0


>How-To-Repeat:
 Bulk-build i386 packages on an amd64 machine
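
 A smaller probe than a full bulk build might help narrow this down. The
 following is only a sketch and has not been confirmed to trigger the bug.
 It repeatedly touches a thread-local variable and calls malloc (the path
 seen in the backtrace) while counting how often the %gs selector reads as
 0. The names tls_counter, read_gs_selector and tls_probe are hypothetical
 and do not come from NetBSD sources. The %gs check is only meaningful for
 a binary built with -m32, since native amd64 code addresses TLS through
 %fs and normally runs with %gs set to 0.

```c
/* Sketch of a TLS stress probe; all names here are hypothetical. */
#include <stdlib.h>

/* Thread-local counter: on i386 each access goes through %gs. */
static __thread unsigned long tls_counter;

/* Sink that keeps the compiler from optimizing malloc/free away. */
static void *volatile sink;

/* Return the current %gs selector on x86, or -1 on other CPUs. */
long read_gs_selector(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned short sel;
    __asm__ volatile ("mov %%gs, %0" : "=r" (sel));
    return sel;
#else
    return -1;
#endif
}

/*
 * Run `iterations` rounds of malloc/free plus one TLS update.
 * Stores the final TLS counter in *touched and returns how many
 * rounds observed a %gs selector of 0.
 */
unsigned long tls_probe(unsigned long iterations, unsigned long *touched)
{
    unsigned long zero_gs = 0;

    for (unsigned long i = 0; i < iterations; i++) {
        sink = malloc(64);      /* same entry point as frame #3 above */
        if (sink == NULL)
            break;
        free(sink);
        tls_counter++;          /* TLS access from our own code */
        if (read_gs_selector() == 0)
            zero_gs++;
    }
    *touched = tls_counter;
    return zero_gs;
}
```

 To use it, call tls_probe() from a small main(), build with -m32, and run
 many copies in parallel on the amd64 host. If the problem reproduces, the
 process would most likely die with SIGSEGV inside malloc, as in the trace
 above, before it gets to report a non-zero count.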
>Fix:
None known yet

>Release-Note:

>Audit-Trail:

>Unformatted:
 (spz) I can corroborate: that looks a lot like the train wreck that was
 http://shadow.netbsd.org/pub/pkgsrc/packages/reports/HEAD/NetBSD-10.0-i386/20240412.1302/meta/report.html
 It doesn't seem to be entirely deterministic, though, 
 http://shadow.netbsd.org/pub/pkgsrc/packages/reports/2024Q1/NetBSD-10.0-i386/20240415.0345/meta/report.html
 and the NetBSD-9 i386 built fine on the same kernel.
 Issues like this also didn't happen, through many iterations, with a
 -10 kernel from 20231231.
