NetBSD Problem Report #57536
From www@netbsd.org Fri Jul 21 23:03:12 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 7AB391A923D
for <gnats-bugs@gnats.NetBSD.org>; Fri, 21 Jul 2023 23:03:12 +0000 (UTC)
Message-Id: <20230721230310.777AF1A923E@mollari.NetBSD.org>
Date: Fri, 21 Jul 2023 23:03:10 +0000 (UTC)
From: logothesia@disroot.org
Reply-To: logothesia@disroot.org
To: gnats-bugs@NetBSD.org
Subject: nbctfmerge hangs on evbarm during build.sh release
X-Send-Pr-Version: www-1.0
>Number: 57536
>Category: toolchain
>Synopsis: nbctfmerge hangs on evbarm during build.sh release
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: toolchain-manager
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Jul 21 23:05:00 +0000 2023
>Originator: logothesia
>Release: 9.3
>Organization:
Acme Software Solutions
>Environment:
NetBSD pi 9.3 NetBSD 9.3 (RPI) #0: Thu Aug 4 15:30:37 UTC 2022 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/RPI evbarm
>Description:
I encountered a hang while running `build.sh release' for NetBSD 10-beta in an attempt to upgrade a 9.3 system. The sources were fetched with:
curl -O https://cdn.netbsd.org/pub/NetBSD/NetBSD-release-10/tar_files/src.tar.gz
The build command was:
./build.sh -O ../obj -T ../tools -U -u -N 1 release
Prior to running `release', both `distribution' and `kernel=RPI' were run with identical arguments. The purpose of running `release' was to generate a kernel image bootable by the Raspberry Pi, as suggested in https://wiki.netbsd.org/ports/evbarm/raspberry_pi/#index16h2
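For reference, the sequence described above can be sketched as a small script. The build.sh arguments are copied from the command shown; the `run' and `build_sequence' helpers and the DRY_RUN variable are hypothetical names added for illustration, and the script assumes it is started from the top of the extracted source tree.

```shell
#!/bin/sh
# Sketch of the build sequence from this report.
# DRY_RUN, run() and build_sequence() are illustrative helpers, not part
# of build.sh. The build.sh flags are the ones quoted in the report.

DRY_RUN=${DRY_RUN:-1}   # default: only print the commands

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

build_sequence() {
    # Same arguments for every target, in the order the report describes.
    for target in distribution kernel=RPI release; do
        run ./build.sh -O ../obj -T ../tools -U -u -N 1 "$target" || return 1
    done
}

build_sequence
```

With the default DRY_RUN=1 the script only prints the three build.sh invocations; setting DRY_RUN=0 would actually run them.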
Partway through the run, nbctfmerge hung for an extended period. top(1) showed it as:
SIZE   RES STATE   TIME   WCPU    CPU COMMAND
 70M   27M parked  0:45  0.00%  0.00% nbctfmerge
I attached gdb to the running process and got the following backtraces:
(gdb) info threads
Id Target Id Frame
* 1 LWP 2 of process 17102 "" 0x70c79ca4 in ___lwp_park60 () from /usr/lib/libc.so.12
2 LWP 1 of process 17102 "" 0x70c79ca4 in ___lwp_park60 () from /usr/lib/libc.so.12
(gdb) thread apply all bt
Thread 2 (LWP 1 of process 17102):
#0 0x70c79ca4 in ___lwp_park60 () from /usr/lib/libc.so.12
#1 0x70d09244 in ?? () from /usr/lib/libpthread.so.1
#2 0x70c1706c in je_malloc_mutex_lock_slow () from /usr/lib/libc.so.12
#3 0x70c08d2c in je_arena_tcache_fill_small () from /usr/lib/libc.so.12
#4 0x70bc3d4c in je_tcache_alloc_small_hard () from /usr/lib/libc.so.12
#5 0x70c10010 in malloc () from /usr/lib/libc.so.12
#6 0x000174f8 in xmalloc ()
#7 0x00017518 in xcalloc ()
#8 0x0001336c in ctf_load ()
#9 0x00016d9c in read_file ()
#10 0x00017148 in read_ctf ()
#11 0x000156e4 in main ()
Thread 1 (LWP 2 of process 17102):
#0 0x70c79ca4 in ___lwp_park60 () from /usr/lib/libc.so.12
#1 0x70d0a9a8 in pthread_cond_timedwait () from /usr/lib/libpthread.so.1
#2 0x000142c4 in worker_thread ()
#3 0x70d0c8f8 in ?? () from /usr/lib/libpthread.so.1
#4 0x70c79ae0 in __mknod50 () from /usr/lib/libc.so.12
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
I downloaded the debug.tgz set for 9.3 and reattached gdb, but the new backtraces didn't look too informative:
(gdb) thread apply all bt
Thread 2 (LWP 1 of process 17102):
#0 0x70c79ca4 in ___lwp_park60 () from /usr/lib/libc.so.12
#1 0x70d09244 in pthread__mutex_lock_slow (ptm=ptm@entry=0x70602148, ts=ts@entry=0x0) at /usr/src/lib/libpthread/pthread_mutex.c:384
#2 0x70d09788 in pthread_mutex_lock (ptm=ptm@entry=0x70602148) at /usr/src/lib/libpthread/pthread_mutex.c:208
#3 0x70c1706c in malloc_mutex_lock_final (mutex=0x70602108) at /usr/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/mutex.h:153
#4 je_malloc_mutex_lock_slow (mutex=mutex@entry=0x70602108) at /usr/src/external/bsd/jemalloc/lib/../dist/src/mutex.c:84
#5 0x70c08d2c in malloc_mutex_lock (mutex=0x70602108, tsdn=0x70d78008) at /usr/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/mutex.h:217
#6 je_arena_tcache_fill_small (tsdn=0x70d78008, arena=0x706003c0, tcache=0x70d780f8, tbin=0x70d78168, binind=4, prof_accumbytes=0) at /usr/src/external/bsd/jemalloc/lib/../dist/src/arena.c:1264
#7 0x70bc3d4c in je_tcache_alloc_small_hard (tsdn=tsdn@entry=0x70d78008, arena=<optimized out>, tcache=tcache@entry=0x70d780f8, tbin=tbin@entry=0x70d78168, binind=binind@entry=4, tcache_success=tcache_success@entry=0x7fecc51f) at /usr/src/external/bsd/jemalloc/lib/../dist/src/tcache.c:93
#8 0x70c10010 in tcache_alloc_small (slow_path=false, zero=false, binind=4, size=<optimized out>, tcache=0x70d780f8, arena=<optimized out>, tsd=<optimized out>) at /usr/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tcache_inlines.h:60
#9 arena_malloc (slow_path=false, tcache=0x70d780f8, zero=false, ind=4, size=<optimized out>, arena=0x0, tsdn=<optimized out>) at /usr/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/arena_inlines_b.h:94
#10 iallocztm (slow_path=false, arena=0x0, is_internal=false, tcache=0x70d780f8, zero=false, ind=4, size=<optimized out>, tsdn=<optimized out>) at /usr/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/jemalloc_internal_inlines_c.h:53
#11 imalloc_no_sample (ind=4, usize=40, size=<optimized out>, tsd=0x70d78008, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at /usr/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:1742
#12 imalloc_body (tsd=0x70d78008, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at /usr/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:1938
#13 imalloc (dopts=<synthetic pointer>, sopts=<synthetic pointer>) at /usr/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2038
#14 malloc (size=<optimized out>) at /usr/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2071
#15 0x000174f8 in xmalloc ()
#16 0x00017518 in xcalloc ()
#17 0x0001336c in ctf_load ()
#18 0x00016d9c in read_file ()
#19 0x00017148 in read_ctf ()
#20 0x000156e4 in main ()
Thread 1 (LWP 2 of process 17102):
#0 0x70c79ca4 in ___lwp_park60 () from /usr/lib/libc.so.12
#1 0x70d0a9a8 in pthread_cond_timedwait (cond=0x482d4 <wq+56>, mutex=0x482b4 <wq+24>, abstime=0x0) at /usr/src/lib/libpthread/pthread_cond.c:168
#2 0x000142c4 in worker_thread ()
#3 0x70d0c8f8 in pthread__create_tramp (cookie=0x70cf4000) at /usr/src/lib/libpthread/pthread.c:592
#4 0x70c79ae0 in __mknod50 () from /usr/lib/libc.so.12
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
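The backtraces above were gathered interactively. As a rough sketch, the same two gdb commands could be run in batch mode so the output can be saved to a file and attached to a report. The `capture_backtraces' function name and the GDB override variable are assumptions made for this example; `-p', `-batch' and `-ex' are standard gdb options.

```shell
#!/bin/sh
# Sketch: dump thread list and backtraces from a running process without
# an interactive gdb session. capture_backtraces and the GDB variable are
# hypothetical names used only in this example.

capture_backtraces() {
    # $1 = process ID to attach to.
    # -batch makes gdb exit once the -ex commands have run; gdb should
    # detach from the attached process when it exits.
    "${GDB:-gdb}" -p "$1" -batch \
        -ex 'info threads' \
        -ex 'thread apply all bt'
}

# Run only when a PID is given on the command line.
if [ "$#" -gt 0 ]; then
    capture_backtraces "$1"
fi
```

Saved as a script (the file name is up to the user), it could be invoked with the PID of the hung process, 17102 in the session above, and its output redirected to a file.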
In the end, I prodded the process with `pkill -STOP nbctfmerge; pkill -CONT nbctfmerge' to see if it would wake up, but to no avail.
>How-To-Repeat:
>Fix: