NetBSD Problem Report #56770

From martin@duskware.de  Sat Mar 26 17:21:35 2022
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 64B091A9239
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 26 Mar 2022 17:21:35 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: rump threxec test randomly hangs the rump server process
X-Send-Pr-Version: 3.95

>Number:         56770
>Category:       kern
>Synopsis:       rump threxec test randomly hangs the rump server process
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Mar 26 17:25:00 +0000 2022
>Originator:     Martin Husemann
>Release:        NetBSD 9.99.95
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD thirdstage.duskware.de 9.99.95 NetBSD 9.99.95 (MODULAR) #536: Sat Mar 26 09:19:27 CET 2022 martin@thirdstage.duskware.de:/usr/src/sys/arch/sparc64/compile/MODULAR sparc64
Architecture: sparc64
Machine: sparc64
>Description:

The tests/lib/librumpclient/t_exec tests reproducibly (roughly 1 run in 10)
hang in the rump_server, and the test run can't clean that server up after
the timeout.

>How-To-Repeat:

On sparc64:

cd /usr/tests/lib/librumpclient
atf-run t_exec|atf-report

and repeat until it gets stuck at:

t_exec (1/1): 5 test cases
    cloexec: [0.580718s] Passed.
    exec: [0.568253s] Passed.
    noexec: [0.562706s] Passed.
    threxec: <<-- here
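
Since the hang only shows up in roughly one run out of ten, repeating the test
by hand gets tedious. The following is a minimal sketch of a loop that could
automate the repetition. The helper name repeat_until_stuck, the run count and
the time limit are all hypothetical and not part of the original report; it
also assumes timeout(1) is available and exits with status 124 when the limit
is hit.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): run a command repeatedly
# until it fails or exceeds a time limit.
# Usage: repeat_until_stuck MAX_RUNS LIMIT_SECONDS command [args...]
repeat_until_stuck() {
    max_runs=$1
    limit=$2
    shift 2
    i=1
    while [ "$i" -le "$max_runs" ]; do
        echo "run $i"
        # Assumption: timeout(1) exits with status 124 when the limit is reached.
        timeout "$limit" "$@"
        rc=$?
        if [ "$rc" -ne 0 ]; then
            echo "stopped at run $i with status $rc"
            return "$rc"
        fi
        i=$((i + 1))
    done
    echo "no hang in $max_runs runs"
    return 0
}
```

A possible invocation would be "cd /usr/tests/lib/librumpclient" followed by
"repeat_until_stuck 50 600 atf-run t_exec". The loop also stops on an ordinary
test failure, so a status of 124 is what would point at a hang. Whether the
stuck rump_server is still around for inspection with gdb afterwards has not
been verified here.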

gdb on the server process shows most threads simply waiting, plus two other,
more interesting threads:

[Switching to thread 23 (LWP 26770 of process 20281)]
#0  0x00000000407fbe30 in lwproc_proc_free (p=0x40b82040)
    at /usr/src/lib/librump/../../sys/rump/librump/rumpkern/lwproc.c:173
173                     LIST_INSERT_HEAD(&initproc->p_children, child, p_sibling);
(gdb) p *p
$1 = {p_list = {le_next = 0x40947ac0 <rumpns_proc0>, le_prev = 0x40b823c0}, 
[..]
(gdb) bt
#0  0x00000000407fbe30 in lwproc_proc_free (p=0x40b82040)
    at /usr/src/lib/librump/../../sys/rump/librump/rumpkern/lwproc.c:173
#1  lwproc_freelwp (l=<optimized out>)
    at /usr/src/lib/librump/../../sys/rump/librump/rumpkern/lwproc.c:341
#2  rump_lwproc_switch (newlwp=<optimized out>)
    at /usr/src/lib/librump/../../sys/rump/librump/rumpkern/lwproc.c:521
#3  0x00000000407fc438 in lwproc_makelwp (p=0x40b823c0, 
    doswitch=<optimized out>, procmake=<optimized out>)
    at /usr/src/lib/librump/../../sys/rump/librump/rumpkern/lwproc.c:385
#4  0x00000000407fc620 in rump_lwproc_newlwp (pid=<optimized out>)
    at /usr/src/lib/librump/../../sys/rump/librump/rumpkern/lwproc.c:439
#5  0x0000000040e06e2c in lwproc_newlwp (pid=23)
    at /usr/src/lib/librumpuser/rumpuser_sp.c:212
#6  serv_handlesyscall (data=0x4065e900 "", rhdr=0x406250c8, spc=0x40f0dbc0)
    at /usr/src/lib/librumpuser/rumpuser_sp.c:684
#7  serv_workbouncer (arg=<optimized out>)
    at /usr/src/lib/librumpuser/rumpuser_sp.c:767
#8  0x000000004100fcc8 in pthread__create_tramp (cookie=0x45188400)
    at /usr/src/lib/libpthread/pthread.c:564
#9  0x00000000412665d8 in _lwp_kill () from /usr/lib/libc.so.12

(gdb) thread 6
[Switching to thread 6 (LWP 29185 of process 20281)]
#0  pthread__mutex_lock_slow (ptm=0x4028ce80, ts=0x0)
    at /usr/src/lib/libpthread/pthread_mutex.c:368
368                             if (error < 0 && errno == ETIMEDOUT) {
(gdb) bt
#0  pthread__mutex_lock_slow (ptm=0x4028ce80, ts=0x0)
    at /usr/src/lib/libpthread/pthread_mutex.c:368
#1  0x0000000040e09e00 in rumpuser_mutex_enter (mtx=0x4028ce80)
    at /usr/src/lib/librumpuser/rumpuser_pth.c:202
#2  0x00000000407f4674 in mutex_enter (mtx=0x40947f00 <rumpns_proc_lock>)
    at /usr/src/lib/librump/../../sys/rump/librump/rumpkern/locks.c:166
#3  0x00000000407ce194 in proc_alloc_lwpid (p=0x40b82040, l=0x5b380b00)
    at /usr/src/lib/librump/../../sys/rump/../kern/kern_proc.c:1183
#4  0x00000000407fc328 in lwproc_makelwp (p=0x40b82040, 
    doswitch=<optimized out>, procmake=<optimized out>)
    at /usr/src/lib/librump/../../sys/rump/librump/rumpkern/lwproc.c:365
#5  0x00000000407f9cf8 in rump_schedule ()
    at /usr/src/lib/librump/../../sys/rump/librump/rumpkern/scheduler.c:268
#6  0x0000000040e06e20 in lwproc_newlwp (pid=23)
    at /usr/src/lib/librumpuser/rumpuser_sp.c:211
#7  serv_handlesyscall (data=0x4065eaa0 "", rhdr=0x40625488, spc=0x40f0dbc0)
    at /usr/src/lib/librumpuser/rumpuser_sp.c:684
#8  serv_workbouncer (arg=<optimized out>)
    at /usr/src/lib/librumpuser/rumpuser_sp.c:767
#9  0x000000004100fcc8 in pthread__create_tramp (cookie=0x40de2400)
    at /usr/src/lib/libpthread/pthread.c:564
#10 0x00000000412665d8 in _lwp_kill () from /usr/lib/libc.so.12

Not sure whether both are related to the issue, but all other threads look boring.

>Fix:
n/a
