NetBSD Problem Report #56490

From he@smistad.uninett.no  Tue Nov  9 10:09:53 2021
Return-Path: <he@smistad.uninett.no>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id E82C71A9239
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  9 Nov 2021 10:09:52 +0000 (UTC)
Message-Id: <20211109100945.8866743FD5C@smistad.uninett.no>
Date: Tue,  9 Nov 2021 11:09:45 +0100 (CET)
From: he@NetBSD.org
Reply-To: he@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: Very sporadic dig core dump, inside libpthread
X-Send-Pr-Version: 3.95

>Number:         56490
>Category:       bin
>Synopsis:       Very sporadic dig core dump, inside libpthread
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Nov 09 10:10:00 +0000 2021
>Originator:     Havard Eidnes
>Release:        NetBSD 9.99.92
>Organization:
  I try...
>Environment:


System: NetBSD smistad.uninett.no 9.99.92 NetBSD 9.99.92 (GENERIC) #0: Fri Nov 5 23:39:47 UTC 2021 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:

	While diagnosing an entirely different problem, I wrote a
	small shell script which runs "dig" in a loop, reports
	progress, and prints dig's output whenever dig exits with a
	non-zero status.  By oversight I left it running overnight,
	and *one* of the dig invocations caused a segmentation fault:

: {1} ; sh bin/check-2.sh 
2549609 .............................[1]   Segmentation fault (core dumped) dig -6 @<redacted>. <redacted>. a


3540270 ..............................^C...........................
: {2} ; ls -l dig.core
-rw-------  1 he  users  3549296 Nov  9 04:49 dig.core
: {3} ; gdb /usr/bin/dig dig.core
GNU gdb (GDB) 11.0.50.20200914-git
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/dig...
Reading symbols from /usr/libdata/debug//usr/bin/dig.debug...
[New process 11157]
[New process 2827]
[New process 14233]
[New process 28895]
Core was generated by `dig'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  pthread__mutex_wakeup (cur=0xdededededededede, self=<optimized out>)
    at /usr/src/lib/libpthread/pthread_mutex.c:532
532                      */
[Current thread is 1 (process 11157)]
(gdb) where
#0  pthread__mutex_wakeup (cur=0xdededededededede, self=<optimized out>)
    at /usr/src/lib/libpthread/pthread_mutex.c:532
#1  0x000074023fa0a330 in pthread_mutex_unlock (ptm=ptm@entry=0x74024274dad0)
    at /usr/src/lib/libpthread/pthread_mutex.c:503
#2  0x0000740240e41243 in task_ready (task=0x74024274dac0)
    at /usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/task.c:354
#3  isc_task_detach (taskp=taskp@entry=0x16fc1e770 <global_task>)
    at /usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/task.c:409
#4  0x000000016fa16518 in destroy_libs ()
    at /usr/src/external/mpl/bind/bin/dig/../../dist/bin/dig/dighost.c:4356
#5  0x000000016fa0c4db in dig_shutdown ()
    at /usr/src/external/mpl/bind/bin/dig/../../dist/bin/dig/dig.c:2707
#6  0x000000016fa16886 in main (argc=5, argv=0x7f7ffffd1008)
    at /usr/src/external/mpl/bind/bin/dig/../../dist/bin/dig/dig.c:2717
(gdb) i reg
rax            0x0                 0
rbx            0xdededededededede  -2387225703656530210
rcx            0x0                 0
rdx            0x740242a2e800      127553056729088
rsi            0x0                 0
rdi            0xdededededededede  -2387225703656530210
rbp            0x7f7ffffd0f20      0x7f7ffffd0f20
rsp            0x7f7ffffd0cf0      0x7f7ffffd0cf0
r8             0x0                 0
r9             0x0                 0
r10            0x740242a52a08      127553056877064
r11            0x246               582
r12            0x0                 0
r13            0x80                128
r14            0x7f7ffffd0cf0      140187732348144
r15            0x0                 0
rip            0x74023fa09d10      0x74023fa09d10 <pthread__mutex_wakeup+57>
eflags         0x10202             [ IF RF ]
cs             0x47                71
ss             0x3f                63
ds             0x23                35
es             0x23                35
fs             0x0                 0
gs             0x0                 0
fs_base        <unavailable>
gs_base        <unavailable>
(gdb) x/i
Argument required (starting display address).
(gdb) x/i 0x74023fa09d10
=> 0x74023fa09d10 <pthread__mutex_wakeup+57>:       mov    (%rbx),%r15
(gdb) 

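	The crash is in pthread__mutex_wakeup(), reached via
	pthread_mutex_unlock() from BIND's task_ready() while dig is
	shutting down (destroy_libs() -> isc_task_detach() ->
	task_ready()).  The faulting instruction, "mov (%rbx),%r15",
	dereferences %rbx, which holds 0xdededededededede, the same
	value gdb shows for "cur".  That repeated byte pattern looks
	more like fill or poison than a valid pointer, which may mean
	the mutex state was read from freed or otherwise corrupted
	memory.

	I'm not entirely sure where to point the finger, but it veers
	slightly in the direction of NetBSD's pthread library(?),
	hence this report.

	Any other suggestions?  Or does this instead point towards
	the BIND code?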
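
	One way to gather more evidence from the same core might be a
	non-interactive gdb run that, besides the backtrace and
	registers, also prints the mutex passed to
	pthread_mutex_unlock() (frame 1) and the task passed to
	task_ready() (frame 2).  This is a sketch: the helper name
	inspect_core is hypothetical, the frame numbers are taken from
	the backtrace above and would differ for another core, and it
	assumes the libpthread and libisc debug info is installed, as
	the source lines in the backtrace suggest it is here.  gdb's
	-batch mode also disables paging, and "x/i $pc" supplies the
	address that the bare "x/i" above asked for.

```sh
#!/bin/sh
# Sketch: dump the interesting parts of a core in one batch gdb run.
# "inspect_core" is a hypothetical helper; frames 1 and 2 refer to
# pthread_mutex_unlock() and task_ready() in the backtrace above.
inspect_core() {
	prog=$1		# executable, e.g. /usr/bin/dig
	core=$2		# core file, e.g. dig.core
	gdb -batch \
	    -ex 'bt' \
	    -ex 'info registers' \
	    -ex 'x/i $pc' \
	    -ex 'frame 1' -ex 'print *ptm' \
	    -ex 'frame 2' -ex 'print *task' \
	    "$prog" "$core"
}
```

	For example:  inspect_core /usr/bin/dig dig.core > dig-core.txt 2>&1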

>How-To-Repeat:
	It seems to be "somewhat difficult" to reproduce, given
	roughly 3.5M successful invocations (the counter had passed
	3540270 when I stopped it) and just one failure...

>Fix:
	See above.

	The test script I used was:

#! /bin/sh

count=0
cr=$(echo " " | tr " " "\015")
while true; do
        out=$(dig -6 @<redacted> <redacted>. a)
        if [ $? != 0 ]; then
                echo
                echo "$out"
        else
                count=$(($count + 1))
                echo -n $cr
                c=$(($count % 60))
                echo -n "$count "
                while [ $c != 0 ]; do
                        echo -n .
                        c=$(($c - 1))
                done
        fi
done
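
	A bounded variant of the script above might make a rerun
	easier to analyse: it stops at the first failure, so the core
	file from the failing run is not overwritten by a later crash,
	and it reports whether dig exited with an error or was killed
	by a signal.  This is a sketch, not what produced the results
	above; the function names are hypothetical, the progress
	display is dropped, and it assumes the shell reports a child
	killed by signal N as exit status 128+N, as NetBSD's sh does.

```sh
#!/bin/sh
# Sketch: bounded variant of the test loop.  Function names are
# hypothetical; assumes death by signal N shows up as status 128+N.

# describe_status STATUS: turn an exit status into a short description.
describe_status() {
	if [ "$1" -gt 128 ]; then
		echo "killed by signal $(($1 - 128))"
	else
		echo "exited with status $1"
	fi
}

# repeat_until_failure MAX CMD [ARG...]: run CMD at most MAX times,
# stopping at the first non-zero exit status.  Prints the number of
# successful runs and how the failing run ended, followed by any
# output it produced.  Returns the failing status, or 0 if none failed.
repeat_until_failure() {
	max=$1
	shift
	count=0
	while [ "$count" -lt "$max" ]; do
		out=$("$@" 2>&1)	# capture stdout and stderr, as the original did
		status=$?
		if [ "$status" -ne 0 ]; then
			echo "$count successful runs, then: $(describe_status "$status")"
			if [ -n "$out" ]; then
				printf '%s\n' "$out"
			fi
			return "$status"
		fi
		count=$((count + 1))
	done
	echo "$count successful runs, no failure"
	return 0
}
```

	For example, with the hypothetical variables server and name
	set to the (redacted) server address and query name:

	repeat_until_failure 5000000 dig -6 "@$server" "$name." a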

>Unformatted:
