NetBSD Problem Report #56490
From he@smistad.uninett.no Tue Nov 9 10:09:53 2021
Return-Path: <he@smistad.uninett.no>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id E82C71A9239
for <gnats-bugs@gnats.NetBSD.org>; Tue, 9 Nov 2021 10:09:52 +0000 (UTC)
Message-Id: <20211109100945.8866743FD5C@smistad.uninett.no>
Date: Tue, 9 Nov 2021 11:09:45 +0100 (CET)
From: he@NetBSD.org
Reply-To: he@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: Very sporadic dig core dump, inside libpthread
X-Send-Pr-Version: 3.95
>Number: 56490
>Category: bin
>Synopsis: Very sporadic dig core dump, inside libpthread
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Nov 09 10:10:00 +0000 2021
>Originator: Havard Eidnes
>Release: NetBSD 9.99.92
>Organization:
I try...
>Environment:
System: NetBSD smistad.uninett.no 9.99.92 NetBSD 9.99.92 (GENERIC) #0: Fri Nov 5 23:39:47 UTC 2021 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
While diagnosing an entirely different problem, I wrote a
small shell script which loops doing "dig", and which reports
progress and any errors in case of a non-zero exit status from
dig. By oversight I left this running overnight, and *one*
of the dig invocations caused a segmentation fault:
: {1} ; sh bin/check-2.sh
2549609 .............................[1] Segmentation fault (core dumped) dig -6 @<redacted>. <redacted>. a
3540270 ..............................^C...........................
: {2} ; ls -l dig.core
-rw------- 1 he users 3549296 Nov 9 04:49 dig.core
: {3} ; gdb /usr/bin/dig dig.core
GNU gdb (GDB) 11.0.50.20200914-git
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/dig...
Reading symbols from /usr/libdata/debug//usr/bin/dig.debug...
[New process 11157]
[New process 2827]
[New process 14233]
[New process 28895]
Core was generated by `dig'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 pthread__mutex_wakeup (cur=0xdededededededede, self=<optimized out>)
at /usr/src/lib/libpthread/pthread_mutex.c:532
532 */
[Current thread is 1 (process 11157)]
(gdb) where
#0 pthread__mutex_wakeup (cur=0xdededededededede, self=<optimized out>)
at /usr/src/lib/libpthread/pthread_mutex.c:532
#1 0x000074023fa0a330 in pthread_mutex_unlock (ptm=ptm@entry=0x74024274dad0)
at /usr/src/lib/libpthread/pthread_mutex.c:503
#2 0x0000740240e41243 in task_ready (task=0x74024274dac0)
at /usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/task.c:354
#3 isc_task_detach (taskp=taskp@entry=0x16fc1e770 <global_task>)
at /usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/task.c:409
#4 0x000000016fa16518 in destroy_libs ()
at /usr/src/external/mpl/bind/bin/dig/../../dist/bin/dig/dighost.c:4356
#5 0x000000016fa0c4db in dig_shutdown ()
at /usr/src/external/mpl/bind/bin/dig/../../dist/bin/dig/dig.c:2707
#6 0x000000016fa16886 in main (argc=5, argv=0x7f7ffffd1008)
at /usr/src/external/mpl/bind/bin/dig/../../dist/bin/dig/dig.c:2717
(gdb) i reg
rax 0x0 0
rbx 0xdededededededede -2387225703656530210
rcx 0x0 0
rdx 0x740242a2e800 127553056729088
rsi 0x0 0
rdi 0xdededededededede -2387225703656530210
rbp 0x7f7ffffd0f20 0x7f7ffffd0f20
rsp 0x7f7ffffd0cf0 0x7f7ffffd0cf0
r8 0x0 0
r9 0x0 0
r10 0x740242a52a08 127553056877064
r11 0x246 582
r12 0x0 0
r13 0x80 128
r14 0x7f7ffffd0cf0 140187732348144
r15 0x0 0
rip 0x74023fa09d10 0x74023fa09d10 <pthread__mutex_wakeup+57>
eflags 0x10202 [ IF RF ]
cs 0x47 71
ss 0x3f 63
ds 0x23 35
es 0x23 35
fs 0x0 0
gs 0x0 0
fs_base <unavailable>
gs_base <unavailable>
(gdb) x/i
Argument required (starting display address).
(gdb) x/i 0x74023fa09d10
=> 0x74023fa09d10 <pthread__mutex_wakeup+57>: mov (%rbx),%r15
(gdb)
Not entirely sure where to point the finger, but it veers
slightly in the direction of NetBSD's pthread library(?),
hence this report.
Any other suggestions? Does this instead point towards the
BIND code?
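The faulting pointer in rbx and rdi is the byte 0xde repeated eight times, which looks more like a memory fill pattern than a valid address. Which component wrote it is not established here. Since the core contains four threads, the stacks of the other three may show what else was touching the task at shutdown. A minimal sketch for collecting that non-interactively follows. The function name dump_core_info and the GDB override variable are hypothetical, and the default paths are the ones from the session above.

```sh
#! /bin/sh
# Sketch: print all thread backtraces, the registers and the faulting
# instruction from a core file without an interactive gdb session.
# dump_core_info is a hypothetical helper name; GDB may be set to use a
# different debugger binary and defaults to plain "gdb".

# usage: dump_core_info [PROGRAM [COREFILE]]
dump_core_info() {
    prog=${1:-/usr/bin/dig}
    core=${2:-dig.core}
    # -batch exits after the -ex commands have run; '$pc' is single-quoted
    # so that the shell passes it to gdb unexpanded.
    "${GDB:-gdb}" -batch \
        -ex 'thread apply all bt' \
        -ex 'info registers' \
        -ex 'x/i $pc' \
        "$prog" "$core"
}
```

Calling dump_core_info with no arguments inspects dig.core in the current directory; redirecting its output to a file makes it easy to attach to a report.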
>How-To-Repeat:
It seems to be "somewhat difficult" to reproduce: there were
3.1M successful invocations and just one failure...
>Fix:
See above.
The test script I used was
#! /bin/sh
count=0
cr=$(echo " " | tr " " "\015")
while true; do
    out=$(dig -6 @<redacted> <redacted>. a)
    if [ $? != 0 ]; then
        echo
        echo "$out"
    else
        count=$(($count + 1))
        echo -n $cr
        c=$(($count % 60))
        echo -n "$count "
        while [ $c != 0 ]; do
            echo -n .
            c=$(($c - 1))
        done
    fi
done
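Because the failure is so rare, a second crash could overwrite the first dig.core before anyone looks at it. The sketch below is a variant of the loop that logs the exit status and time of each failure and moves the core file aside. It is an illustration under assumptions, not the script that produced the output above: run_checked and its arguments are hypothetical names, and the iteration limit was added so that the loop terminates.

```sh
#! /bin/sh
# Sketch: run a command repeatedly; on a non-zero exit status, log the
# iteration, status and UTC time, print the captured output, and rename
# any core file so that a later crash cannot overwrite it.
# run_checked is a hypothetical helper name.

# usage: run_checked ITERATIONS CORENAME COMMAND [ARGS...]
run_checked() {
    limit=$1
    corename=$2
    shift 2
    n=0
    failures=0
    while [ "$n" -lt "$limit" ]; do
        n=$((n + 1))
        out=$("$@" 2>&1)
        status=$?
        if [ "$status" -ne 0 ]; then
            failures=$((failures + 1))
            echo "iteration $n: exit status $status at $(date -u +%Y-%m-%dT%H:%M:%SZ)"
            echo "$out"
            # Keep one core file per failing iteration.
            if [ -f "$corename" ]; then
                mv "$corename" "$corename.$n"
            fi
        fi
    done
    echo "$n runs, $failures failures"
}
```

It would be invoked as, for example, run_checked 1000000 dig.core dig -6 @SERVER NAME. a, where SERVER and NAME are placeholders for the redacted values.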
>Unformatted: