NetBSD Problem Report #57536

From www@netbsd.org  Fri Jul 21 23:03:12 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 7AB391A923D
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 21 Jul 2023 23:03:12 +0000 (UTC)
Message-Id: <20230721230310.777AF1A923E@mollari.NetBSD.org>
Date: Fri, 21 Jul 2023 23:03:10 +0000 (UTC)
From: logothesia@disroot.org
Reply-To: logothesia@disroot.org
To: gnats-bugs@NetBSD.org
Subject: nbctfmerge hangs on evbarm during build.sh release
X-Send-Pr-Version: www-1.0

>Number:         57536
>Category:       toolchain
>Synopsis:       nbctfmerge hangs on evbarm during build.sh release
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    toolchain-manager
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Jul 21 23:05:00 +0000 2023
>Originator:     logothesia
>Release:        9.3
>Organization:
Acme Software Solutions
>Environment:
NetBSD pi 9.3 NetBSD 9.3 (RPI) #0: Thu Aug  4 15:30:37 UTC 2022  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/RPI evbarm
>Description:
I ran into a hang while running `build.sh release' for NetBSD 10-beta in order to upgrade a 9.3 system. The sources were fetched like this:

curl -O https://cdn.netbsd.org/pub/NetBSD/NetBSD-release-10/tar_files/src.tar.gz

This is the build command that was run:

./build.sh -O ../obj -T ../tools -U -u -N 1 release

Prior to running `release', both `distribution' and `kernel=RPI' were run with the same arguments. The purpose of running `release' was to generate a kernel image bootable on the Raspberry Pi, as suggested in https://wiki.netbsd.org/ports/evbarm/raspberry_pi/#index16h2
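
For reference, the sequence described above can be sketched as a small shell script. The build.sh flags and target names are the ones quoted in this report; the DRY_RUN variable and the function names build_cmd and run_sequence are illustrative assumptions, and the script is assumed to be run from the top of the extracted source tree.

```sh
#!/bin/sh
# Sketch of the build sequence from the report. DRY_RUN, build_cmd and
# run_sequence are hypothetical names introduced for illustration.

# Common build.sh arguments, copied from the report.
BUILD_ARGS="-O ../obj -T ../tools -U -u -N 1"

# Print the full build.sh command line for one target.
build_cmd() {
    printf './build.sh %s %s\n' "$BUILD_ARGS" "$1"
}

# Print (default) or, with DRY_RUN=0, run the three targets in order.
run_sequence() {
    for target in distribution kernel=RPI release; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            build_cmd "$target"
        else
            # Word splitting of $BUILD_ARGS is intended here.
            ./build.sh $BUILD_ARGS "$target" || return 1
        fi
    done
}
```

Calling run_sequence with no settings only prints the three command lines; setting DRY_RUN=0 first would actually invoke build.sh and stop at the first failing target.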

Partway through the run, nbctfmerge hung for an extended period. top(1) showed it as:
SIZE   RES STATE      TIME   WCPU    CPU COMMAND
 70M   27M parked     0:45  0.00%  0.00% nbctfmerge

I attached gdb to the running process and got the following backtraces:

(gdb) info threads
  Id   Target Id                 Frame
* 1    LWP 2 of process 17102 "" 0x70c79ca4 in ___lwp_park60 () from /usr/lib/libc.so.12
  2    LWP 1 of process 17102 "" 0x70c79ca4 in ___lwp_park60 () from /usr/lib/libc.so.12
(gdb) thread apply all bt

Thread 2 (LWP 1 of process 17102):
#0  0x70c79ca4 in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x70d09244 in ?? () from /usr/lib/libpthread.so.1
#2  0x70c1706c in je_malloc_mutex_lock_slow () from /usr/lib/libc.so.12
#3  0x70c08d2c in je_arena_tcache_fill_small () from /usr/lib/libc.so.12
#4  0x70bc3d4c in je_tcache_alloc_small_hard () from /usr/lib/libc.so.12
#5  0x70c10010 in malloc () from /usr/lib/libc.so.12
#6  0x000174f8 in xmalloc ()
#7  0x00017518 in xcalloc ()
#8  0x0001336c in ctf_load ()
#9  0x00016d9c in read_file ()
#10 0x00017148 in read_ctf ()
#11 0x000156e4 in main ()

Thread 1 (LWP 2 of process 17102):
#0  0x70c79ca4 in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x70d0a9a8 in pthread_cond_timedwait () from /usr/lib/libpthread.so.1
#2  0x000142c4 in worker_thread ()
#3  0x70d0c8f8 in ?? () from /usr/lib/libpthread.so.1
#4  0x70c79ae0 in __mknod50 () from /usr/lib/libc.so.12
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

I downloaded the debug.tgz set for 9.3 and reattached gdb, but the new backtraces did not look much more informative:

(gdb) thread apply all bt

Thread 2 (LWP 1 of process 17102):
#0  0x70c79ca4 in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x70d09244 in pthread__mutex_lock_slow (ptm=ptm@entry=0x70602148, ts=ts@entry=0x0) at /usr/src/lib/libpthread/pthread_mutex.c:384
#2  0x70d09788 in pthread_mutex_lock (ptm=ptm@entry=0x70602148) at /usr/src/lib/libpthread/pthread_mutex.c:208
#3  0x70c1706c in malloc_mutex_lock_final (mutex=0x70602108) at /usr/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/mutex.h:153
#4  je_malloc_mutex_lock_slow (mutex=mutex@entry=0x70602108) at /usr/src/external/bsd/jemalloc/lib/../dist/src/mutex.c:84
#5  0x70c08d2c in malloc_mutex_lock (mutex=0x70602108, tsdn=0x70d78008) at /usr/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/mutex.h:217
#6  je_arena_tcache_fill_small (tsdn=0x70d78008, arena=0x706003c0, tcache=0x70d780f8, tbin=0x70d78168, binind=4, prof_accumbytes=0) at /usr/src/external/bsd/jemalloc/lib/../dist/src/arena.c:1264
#7  0x70bc3d4c in je_tcache_alloc_small_hard (tsdn=tsdn@entry=0x70d78008, arena=<optimized out>, tcache=tcache@entry=0x70d780f8, tbin=tbin@entry=0x70d78168, binind=binind@entry=4, tcache_success=tcache_success@entry=0x7fecc51f) at /usr/src/external/bsd/jemalloc/lib/../dist/src/tcache.c:93
#8  0x70c10010 in tcache_alloc_small (slow_path=false, zero=false, binind=4, size=<optimized out>, tcache=0x70d780f8, arena=<optimized out>, tsd=<optimized out>) at /usr/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tcache_inlines.h:60
#9  arena_malloc (slow_path=false, tcache=0x70d780f8, zero=false, ind=4, size=<optimized out>, arena=0x0, tsdn=<optimized out>) at /usr/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/arena_inlines_b.h:94
#10 iallocztm (slow_path=false, arena=0x0, is_internal=false, tcache=0x70d780f8, zero=false, ind=4, size=<optimized out>, tsdn=<optimized out>) at /usr/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/jemalloc_internal_inlines_c.h:53
#11 imalloc_no_sample (ind=4, usize=40, size=<optimized out>, tsd=0x70d78008, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at /usr/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:1742
#12 imalloc_body (tsd=0x70d78008, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at /usr/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:1938
#13 imalloc (dopts=<synthetic pointer>, sopts=<synthetic pointer>) at /usr/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2038
#14 malloc (size=<optimized out>) at /usr/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2071
#15 0x000174f8 in xmalloc ()
#16 0x00017518 in xcalloc ()
#17 0x0001336c in ctf_load ()
#18 0x00016d9c in read_file ()
#19 0x00017148 in read_ctf ()
#20 0x000156e4 in main ()

Thread 1 (LWP 2 of process 17102):
#0  0x70c79ca4 in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x70d0a9a8 in pthread_cond_timedwait (cond=0x482d4 <wq+56>, mutex=0x482b4 <wq+24>, abstime=0x0) at /usr/src/lib/libpthread/pthread_cond.c:168
#2  0x000142c4 in worker_thread ()
#3  0x70d0c8f8 in pthread__create_tramp (cookie=0x70cf4000) at /usr/src/lib/libpthread/pthread.c:592
#4  0x70c79ae0 in __mknod50 () from /usr/lib/libc.so.12
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
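
The same information can probably be captured without an interactive session, which may help anyone trying to collect backtraces from another hung nbctfmerge. The following is a minimal sketch that assumes gdb and pgrep are available on the build host; the function names gdb_bt_cmd and dump_backtraces are hypothetical.

```sh
#!/bin/sh
# Sketch: dump backtraces from every thread of a hung process in batch mode.
# gdb_bt_cmd and dump_backtraces are hypothetical helper names.

# Print a non-interactive gdb command line for the given PID.
gdb_bt_cmd() {
    printf 'gdb -p %s -batch -ex "info threads" -ex "thread apply all bt"\n' "$1"
}

# Find each process by exact name and dump all of its threads' backtraces.
dump_backtraces() {
    for pid in $(pgrep -x "$1"); do
        gdb -p "$pid" -batch -ex "info threads" -ex "thread apply all bt"
    done
}
```

For example, `dump_backtraces nbctfmerge > nbctfmerge-bt.txt 2>&1' would save the output to a file that can be attached to the report.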

In the end, I prodded the process with `pkill -STOP nbctfmerge; pkill -CONT nbctfmerge' to see if it would wake up, but to no avail.

>How-To-Repeat:

>Fix:
