NetBSD Problem Report #59784
From www@netbsd.org Sat Nov 22 16:18:07 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
client-signature RSA-PSS (2048 bits) client-digest SHA256)
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id E529C1A923A
for <gnats-bugs@gnats.NetBSD.org>; Sat, 22 Nov 2025 16:18:06 +0000 (UTC)
Message-Id: <20251122161804.D1BED1A923C@mollari.NetBSD.org>
Date: Sat, 22 Nov 2025 16:18:04 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: dlopening and dlclosing libpthread is broken
X-Send-Pr-Version: www-1.0
>Number: 59784
>Category: lib
>Synopsis: dlopening and dlclosing libpthread is broken
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: riastradh
>State: needs-pullups
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Nov 22 16:20:00 +0000 2025
>Closed-Date:
>Last-Modified: Sat Nov 29 16:10:01 +0000 2025
>Originator: Taylor R Campbell
>Release: current, 11, 10, 9, ...
>Organization:
Locked and Unloaded LLC
>Environment:
>Description:
A program that dlopens (a library linked against) libpthread
and then dlcloses it can find itself in a pretty pickle with
mysterious symptoms like this:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000079bbe310cccc in ?? ()
#0 0x000079bbe310cccc in ?? ()
#1 0x000079bbe2e9847c in __deregister_frame_info_bases () from /usr/lib/libgcc_s.so.1
#2 0x000079bbe2e86365 in __do_global_dtors_aux () from /usr/lib/libgcc_s.so.1
#3 0x000079bbe311ac00 in ?? ()
#4 0x000079bbe2e99a79 in _fini () from /usr/lib/libgcc_s.so.1
#5 0x000079bbe3585120 in atexit_handler_stack () from /usr/lib/libc.so.12
#6 0x00007f7ff709fbe1 in _rtld_call_initfini_function (mask=0x7f7fff539130, func=0x79bbe2e99a70 <_fini>) at /home/riastradh/netbsd/11/src/libexec/ld.elf_so/rtld.c:152
#7 _rtld_call_fini_function (obj=0x79bbe2e9ddf0, mask=0x7f7fff539130, cur_objgen=4) at /home/riastradh/netbsd/11/src/libexec/ld.elf_so/rtld.c:167
#8 0x00007f7ff70a06a6 in _rtld_call_fini_functions (force=1, mask=0x7f7fff539130) at /home/riastradh/netbsd/11/src/libexec/ld.elf_so/rtld.c:213
#9 _rtld_exit () at /home/riastradh/netbsd/11/src/libexec/ld.elf_so/rtld.c:431
#10 0x000079bbe32c895f in __cxa_finalize (dso=dso@entry=0x0) at /home/riastradh/netbsd/11/src/lib/libc/stdlib/atexit.c:222
#11 0x000079bbe32c853b in exit (status=status@entry=0) at /home/riastradh/netbsd/11/src/lib/libc/stdlib/exit.c:60
#12 0x000079bbe3592b90 in pass (ctx=0x79bbe359e860 <Current>) at /home/riastradh/netbsd/11/src/external/bsd/atf/dist/atf-c/tc.c:337
#13 0x000079bbe35931d5 in atf_tc_run (tc=0x792168 <atfu_dlopen_tc>, resfile=<optimized out>) at /home/riastradh/netbsd/11/src/external/bsd/atf/dist/atf-c/tc.c:1041
#14 0x000079bbe359000e in atf_tp_run (tp=tp@entry=0x7f7fff5392c0, tcname=<optimized out>, resfile=<optimized out>) at /home/riastradh/netbsd/11/src/external/bsd/atf/dist/atf-c/tp.c:205
#15 0x000079bbe358fb95 in run_tc (exitcode=<synthetic pointer>, p=0x7f7fff5392e0, tp=0x7f7fff5392c0) at /home/riastradh/netbsd/11/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:510
#16 controlled_main (exitcode=<synthetic pointer>, add_tcs_hook=0x78fad8 <atfu_tp_add_tcs>, argv=<optimized out>, argc=<optimized out>) at /home/riastradh/netbsd/11/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:580
#17 atf_tp_main (argc=<optimized out>, argv=<optimized out>, add_tcs_hook=add_tcs_hook@entry=0x78fad8 <atfu_tp_add_tcs>) at /home/riastradh/netbsd/11/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:610
#18 0x000000000078fcb6 in main (argc=<optimized out>, argv=<optimized out>) at /home/riastradh/netbsd/11/src/tests/lib/libpthread/dlopen/t_dlopen.c:163
#19 0x000000000078f4eb in ___start (cleanup=<optimized out>, ps_strings=0x7f7fff539fe0) at /home/riastradh/netbsd/11/src/lib/csu/common/crt0-common.c:375
#20 0x00007f7ff70a68d0 in ?? () from /usr/libexec/ld.elf_so
#21 0x0000000000000005 in ?? ()
#22 0x00007f7fff539968 in ?? ()
#23 0x00007f7fff539971 in ?? ()
#24 0x00007f7fff53998b in ?? ()
#25 0x00007f7fff5399ae in ?? ()
#26 0x00007f7fff5399c9 in ?? ()
#27 0x0000000000000000 in ?? ()
Setting a breakpoint on __deregister_frame_info_bases and
single-stepping through it reveals that the crash happens when
libgcc_s.so calls __libc_mutex_lock via the PLT and jumps into
code in libpthread.so that is no longer mapped after dlclose.
Why is it jumping there?
What happened is:
1. The program dlopened (a library linked against) libpthread.
2. The program called pthread_mutex_lock -- or rather,
__libc_mutex_lock, renamed via #define in <pthread.h>.
3. The symbol __libc_mutex_lock has two definitions:
(a) A weak definition in libc.so -- the no-op thread stub.
(b) A strong definition in libpthread.so -- the real one.
Lazy binding of the symbol chooses the strong one, so the
entry for __libc_mutex_lock in the .got.plt is bound to
libpthread.so's definition, as shown by `info proc mappings'
and single-stepping in gdb:
(gdb) info proc mappings
...
0x7ee838cfb000 0x7ee838d03000 0x8000 0x7000 r-x CNPD /lib/libpthread.so.1.5
...
(gdb) display/i $pc
1: x/i $pc
=> 0x7ee838a8a402 <__deregister_frame_info_bases+4>: push %r12
(gdb) si
...
(gdb) si
0x00007ee838a8a477 in __deregister_frame_info_bases ()
from /usr/lib/libgcc_s.so.1
1: x/i $pc
=> 0x7ee838a8a477 <__deregister_frame_info_bases+121>:
call 0x7ee838a78150 <__libc_mutex_lock@plt>
(gdb) si
0x00007ee838a78150 in __libc_mutex_lock@plt () from /usr/lib/libgcc_s.so.1
1: x/i $pc
=> 0x7ee838a78150 <__libc_mutex_lock@plt>:
jmp *0x17f42(%rip) # 0x7ee838a90098 <__libc_mutex_lock@got.plt>
(gdb) x/xg $rip + 6 + 0x17f42
0x7ee838a90098 <__libc_mutex_lock@got.plt>: 0x00007ee838cfeccc
(gdb) si
pthread_mutex_lock (ptm=0x7ee838a90400 <object_mutex>)
at /home/riastradh/netbsd/11/src/lib/libpthread/pthread_mutex.c:204
1: x/i $pc
=> 0x7ee838cfeccc <pthread_mutex_lock>:
mov 0x92b5(%rip),%rax # 0x7ee838d07f88
Note that 0x7ee838cfeccc lies in the interval
[0x7ee838cfb000,0x7ee838d03000) where libpthread.so is
mapped.
4. dlclose unmapped everything in libpthread.so -- including the
pages of instructions that the .got.plt entry for
__libc_mutex_lock now points to -- and dlclose has no
mechanism to _unbind_ that entry.
5. The next thing that tried to call __libc_mutex_lock jumped
into oblivion where libpthread.so used to be. In the test
case above, that happened to be in some mysterious code path
at program exit, but it could just as well have been, say,
one of the stdio(3) functions taking a FILE lock.
(gdb) si
0x00007ee838a8a477 in __deregister_frame_info_bases ()
from /usr/lib/libgcc_s.so.1
1: x/i $pc
=> 0x7ee838a8a477 <__deregister_frame_info_bases+121>:
call 0x7ee838a78150 <__libc_mutex_lock@plt>
(gdb) si
0x00007ee838a78150 in __libc_mutex_lock@plt () from /usr/lib/libgcc_s.so.1
1: x/i $pc
=> 0x7ee838a78150 <__libc_mutex_lock@plt>:
jmp *0x17f42(%rip) # 0x7ee838a90098 <__libc_mutex_lock@got.plt>
(gdb) si
0x00007ee838cfeccc in ?? ()
1: x/i $pc
=> 0x7ee838cfeccc: <error: Cannot access memory at address 0x7ee838cfeccc>
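The address arithmetic in steps 3 and 5 can be checked by hand.
The following is a minimal C sketch, not code from this PR: the
names got_slot_address and in_mapping are made up for
illustration, and the 6-byte instruction length is taken from
the `$rip + 6 + 0x17f42' expression in the gdb session above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Address of the .got.plt slot read by a RIP-relative
 * `jmp *disp(%rip)': the displacement is relative to the end of
 * the instruction.  (Hypothetical helper, for illustration only.)
 */
uint64_t
got_slot_address(uint64_t insn_addr, unsigned insn_len, int32_t disp)
{
    /* Sign-extend the displacement before adding it. */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}

/* True if addr lies in the half-open mapping [start, end). */
bool
in_mapping(uint64_t addr, uint64_t start, uint64_t end)
{
    assert(start <= end);
    return start <= addr && addr < end;
}
```

With the values from the transcript,
got_slot_address(0x7ee838a78150, 6, 0x17f42) gives
0x7ee838a90098, the address gdb labels
__libc_mutex_lock@got.plt, and
in_mapping(0x7ee838cfeccc, 0x7ee838cfb000, 0x7ee838d03000) is
true, so the slot's target lies inside the libpthread.so text
mapping.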
Why doesn't RTLD_LOCAL limit the scope of libpthread.so's
__libc_mutex_lock definition so only those .got.plt entries for
objects that dlclose is unloading will point to the
libpthread.so one, and any .got.plt entries for objects in the
global namespace will get the libc.so weak one?
=> Because the library that the test dlopens, which is linked
against libpthread.so, is _also_ linked against libgcc_s.so,
which is already marked with -Wl,-z,nodelete -- and
libgcc_s.so's .got.plt entry for __libc_mutex_lock is
resolved in the RTLD_LOCAL scope and bound to
libpthread.so's __libc_mutex_lock. If we remove libgcc_s.so
(by not using LIBISCXX=yes in the test library -- not sure
why we're using that anyway), the symptom goes away.
>How-To-Repeat:
cd /usr/tests/lib/libpthread/dlopen
atf-run | atf-report
Caveat: This no longer works as a test case for this particular
bug in HEAD, because __deregister_frame_info_bases has changed
to avoid taking a lock with __libc_mutex_lock. Need to
construct a test case that still works in HEAD in spite of
those changes.
>Fix:
Add to lib/libpthread/Makefile:
LDADD+= -Wl,-z,nodelete
This prevents rtld from actually unloading libpthread.
The same is probably needed for any library that provides
strong definitions of a symbol that is still used when the
library isn't loaded, via a weak definition from some other
source -- like __libc_mutex_lock.
It's a dark corner of ELF wizardry that we probably don't use
much outside of libpthread.so but I can't rule out the
possibility that someone has dabbled in such nefarious magic
elsewhere.
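The failure and the fix can be modelled in a few lines of C.
This is a toy model under stated assumptions, not how ld.elf_so
is implemented: struct toy_object and the toy_* functions are
invented here, symbol scope is reduced to "is the strong
provider mapped at bind time", and a .got.plt slot is modelled
as a pointer to the object that provides the definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a loaded shared object (hypothetical type). */
struct toy_object {
    const char *name;
    bool mapped;                    /* still in the address space? */
    bool nodelete;                  /* linked with -Wl,-z,nodelete */
    const struct toy_object *slot;  /* provider bound in .got.plt, or NULL */
};

/*
 * Model lazy binding of one symbol: on the first call, bind the
 * caller's slot to the strong provider if it is mapped, otherwise
 * to the weak stub provider.  Later calls reuse the slot.
 */
void
toy_bind(struct toy_object *caller, const struct toy_object *strong,
    const struct toy_object *weak)
{
    assert(weak->mapped);
    if (caller->slot == NULL)
        caller->slot = strong->mapped ? strong : weak;
}

/*
 * Model dlclose: unmap the object unless it is nodelete.  Nothing
 * here revisits slots in other objects, which is the bug.
 */
void
toy_dlclose(struct toy_object *obj)
{
    if (!obj->nodelete)
        obj->mapped = false;
}

/* A call through the slot is safe only if its target is still mapped. */
bool
toy_call_is_safe(const struct toy_object *caller)
{
    return caller->mapped && caller->slot != NULL && caller->slot->mapped;
}
```

In this model, with libgcc_s marked nodelete and libpthread not,
binding libgcc_s's slot and then closing both leaves libgcc_s
mapped with a slot that points at an unmapped object, so
toy_call_is_safe reports false.  Marking libpthread nodelete as
well makes it report true, which is the effect the LDADD change
is meant to have.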
>Release-Note:
>Audit-Trail:
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59784 CVS commit: src/tests/lib/libpthread/dlopen
Date: Sat, 22 Nov 2025 20:04:02 +0000
Module Name: src
Committed By: riastradh
Date: Sat Nov 22 20:04:02 UTC 2025
Modified Files:
src/tests/lib/libpthread/dlopen: t_dlopen.c
Log Message:
tests/lib/libpthread: Test unloading libpthread after lazy binding.
If you dlopen libpthread and dlclose it again, the thread stubs like
pthread_mutex_lock need to continue working -- a library might have
calls to it in order to support thread-safety for threaded
applications, but that library needs to continue working even in
non-threaded applications after lazy binding of the libpthread symbol
instead of the libc stub.
PR lib/59784: dlopening and dlclosing libpthread is broken
To generate a diff of this commit:
cvs rdiff -u -r1.1 -r1.2 src/tests/lib/libpthread/dlopen/t_dlopen.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59784 CVS commit: src/tests/lib/libpthread/dlopen
Date: Sat, 22 Nov 2025 20:05:21 +0000
Module Name: src
Committed By: riastradh
Date: Sat Nov 22 20:05:20 UTC 2025
Modified Files:
src/tests/lib/libpthread/dlopen: t_dso_pthread_create.c
Log Message:
tests/lib/libpthread: Don't abuse xfail.
Use a signal handler to check for SIGABRT, rather than
atf_tc_expect_signal.
xfail is for when there is a bug that we haven't fixed yet and the
test manifests a symptom of that bug -- a list of xfails is a list of
open bugs to be fixed. In this case, we are verifying that
pthread_create _correctly_ raises SIGABRT (or fails with nonzero
return code -- both are acceptable outcomes, really), and there is no
bug here at the moment.
Prompted by (but unrelated to):
PR lib/59784: dlopening and dlclosing libpthread is broken
To generate a diff of this commit:
cvs rdiff -u -r1.1 -r1.2 \
src/tests/lib/libpthread/dlopen/t_dso_pthread_create.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59784 CVS commit: src
Date: Sun, 23 Nov 2025 22:11:42 +0000
Module Name: src
Committed By: riastradh
Date: Sun Nov 23 22:11:42 UTC 2025
Modified Files:
src: UPDATING
src/lib/libpthread: Makefile
src/tests/lib/libpthread/dlopen: t_dlopen.c
Log Message:
libpthread: Link with -Wl,-z,nodelete.
Can't safely unload libpthread because of the interaction with libc
thread stubs.
PR lib/59784: dlopening and dlclosing libpthread is broken
To generate a diff of this commit:
cvs rdiff -u -r1.386 -r1.387 src/UPDATING
cvs rdiff -u -r1.102 -r1.103 src/lib/libpthread/Makefile
cvs rdiff -u -r1.2 -r1.3 src/tests/lib/libpthread/dlopen/t_dlopen.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Responsible-Changed-From-To: lib-bug-people->riastradh
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Sun, 23 Nov 2025 22:15:42 +0000
Responsible-Changed-Why:
fixed in HEAD with tests, needs pullup-11, pullup-10, pullup-9
State-Changed-From-To: open->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sun, 23 Nov 2025 22:15:42 +0000
State-Changed-Why:
mine
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59784 CVS commit: src
Date: Sat, 29 Nov 2025 14:39:36 +0000
Module Name: src
Committed By: riastradh
Date: Sat Nov 29 14:39:36 UTC 2025
Modified Files:
src: UPDATING
src/lib/libpthread: shlib_version
Log Message:
libpthread: Touch comment in shlib_version for recent LDADD.
This provokes relinking libpthread.so with the new arguments, without
needing manual intervention to follow a note in UPDATING.
PR lib/59784: dlopening and dlclosing libpthread is broken
To generate a diff of this commit:
cvs rdiff -u -r1.387 -r1.388 src/UPDATING
cvs rdiff -u -r1.23 -r1.24 src/lib/libpthread/shlib_version
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59784 CVS commit: [netbsd-11] src
Date: Sat, 29 Nov 2025 16:08:02 +0000
Module Name: src
Committed By: martin
Date: Sat Nov 29 16:08:01 UTC 2025
Modified Files:
src/lib/libpthread [netbsd-11]: Makefile shlib_version
src/tests/lib/libpthread/dlopen [netbsd-11]: t_dlopen.c
t_dso_pthread_create.c
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #110):
lib/libpthread/Makefile: revision 1.103
tests/lib/libpthread/dlopen/t_dso_pthread_create.c: revision 1.2
tests/lib/libpthread/dlopen/t_dlopen.c: revision 1.2
tests/lib/libpthread/dlopen/t_dlopen.c: revision 1.3
lib/libpthread/shlib_version: revision 1.24
tests/lib/libpthread: Test unloading libpthread after lazy binding.
If you dlopen libpthread and dlclose it again, the thread stubs like
pthread_mutex_lock need to continue working -- a library might have
calls to it in order to support thread-safety for threaded
applications, but that library needs to continue working even in
non-threaded applications after lazy binding of the libpthread symbol
instead of the libc stub.
PR lib/59784: dlopening and dlclosing libpthread is broken
tests/lib/libpthread: Don't abuse xfail.
Use a signal handler to check for SIGABRT, rather than
atf_tc_expect_signal.
xfail is for when there is a bug that we haven't fixed yet and the
test manifests a symptom of that bug -- a list of xfails is a list of
open bugs to be fixed. In this case, we are verifying that
pthread_create _correctly_ raises SIGABRT (or fails with nonzero
return code -- both are acceptable outcomes, really), and there is no
bug here at the moment.
Prompted by (but unrelated to):
PR lib/59784: dlopening and dlclosing libpthread is broken
libpthread: Link with -Wl,-z,nodelete.
Can't safely unload libpthread because of the interaction with libc
thread stubs.
PR lib/59784: dlopening and dlclosing libpthread is broken
libpthread: Touch comment in shlib_version for recent LDADD.
This provokes relinking libpthread.so with the new arguments, without
needing manual intervention to follow a note in UPDATING.
PR lib/59784: dlopening and dlclosing libpthread is broken
To generate a diff of this commit:
cvs rdiff -u -r1.100 -r1.100.2.1 src/lib/libpthread/Makefile
cvs rdiff -u -r1.20.2.1 -r1.20.2.2 src/lib/libpthread/shlib_version
cvs rdiff -u -r1.1 -r1.1.48.1 src/tests/lib/libpthread/dlopen/t_dlopen.c \
src/tests/lib/libpthread/dlopen/t_dso_pthread_create.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
Copyright © 1994-2025
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.