NetBSD Problem Report #55466

From martin@duskware.de  Mon Jul  6 15:36:33 2020
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id EAD331A9213
	for <gnats-bugs@gnats.NetBSD.org>; Mon,  6 Jul 2020 15:36:32 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: can not complete a full test run
X-Send-Pr-Version: 3.95

>Number:         55466
>Category:       kern
>Synopsis:       can not complete a full test run
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jul 06 15:40:00 +0000 2020
>Closed-Date:    Wed May 17 11:26:54 +0000 2023
>Last-Modified:  Wed May 17 11:26:54 +0000 2023
>Originator:     Martin Husemann
>Release:        NetBSD 9.99.69
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD space-truckin.duskware.de 9.99.69 NetBSD 9.99.69 (GENERIC) #90: Mon Jul 6 11:46:24 CEST 2020 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/evbarm/compile/GENERIC evbarm
Architecture: earmv7hfeb
Machine: evbarm
>Description:

When doing a full test run there is a high chance it locks up here:

sbin/ifconfig/t_bridge (542/873): 1 test cases
    manybridges:

top shows a rump_server process busy-looping, and the console is totally
dead.

Killing the rump_server process (with -9) unlocks the console again.

If not doing a full test run, but just:

cd /usr/tests/sbin/ifconfig && atf-run t_bridge | atf-report

everything works as expected:
Tests root: /usr/tests/sbin/ifconfig

t_bridge (1/1): 1 test cases
    manybridges: [38.488706s] Passed.
[38.490496s]

Summary for 1 test programs:
    1 passed test cases.
    0 failed test cases.
    0 expected failed test cases.
    0 skipped test cases.


Note that this test does not use RUMP at all, so the leftover rump_server
process must be from some earlier test program.
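
One way to narrow down which earlier test program leaves the rump_server
behind might be to check the process table after each test program
finishes. The following is only a sketch and not part of ATF or the test
suite: the helper name leftover_rump_pids is hypothetical, and it assumes
ps(1) output with the PID in the first column and the command name in the
second (for example from "ps -axo pid,comm").

```shell
#!/bin/sh
# Hypothetical helper (not part of ATF or the NetBSD test suite).
# Reads "PID COMMAND" lines on stdin and prints the PID of every
# process whose command name is exactly "rump_server".
leftover_rump_pids() {
    awk '$2 == "rump_server" { print $1 }'
}

# Possible use on a live system, assuming ps supports -o pid,comm:
#   ps -axo pid,comm | leftover_rump_pids
```

Running this check between individual test programs, rather than only at
the end of a full run, would show after which program the first leftover
PID appears.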

>How-To-Repeat:
s/a

>Fix:
n/a

>Release-Note:

>Audit-Trail:
From: Jukka Ruohonen <jruohonen@iki.fi>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/55466: can not complete a full test run
Date: Mon, 6 Jul 2020 19:42:49 +0300

 On Mon, Jul 06, 2020 at 03:40:01PM +0000, martin@NetBSD.org wrote:
 > Note that this test does not use RUMP at all, so the leftover rump_server
 > process must be from some earlier test program.

 With respect to the recent Qemu run:

 http://releng.netbsd.org/b5reports/evbarm-aarch64/2020/2020.07.05.19.40.27/test.tps

 While the test passed, the clean-up routine contained this:

 tc-so:Burnt down bridge65274
 tc-se:[1]   Killed                  ifconfig "bridge${bridge}" destroy >/dev/null ...
 tc-so:Burnt down bridge65273

 Later on, when the subsequent t_repeated_link_addr is executed, the system
 hangs with:

 [...]
 tc-so:Restored state of vioif0 to up
 tc-so:Skipping lo0
 tc-so:Skipping bridge65273
 tc-se:t_repeated_link_addr: ERROR: Unreachable

 Where 'bridge65273' is a left-over from the previous test.

 - Jukka

From: Martin Husemann <martin@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55466: can not complete a full test run
Date: Mon, 6 Jul 2020 17:13:43 +0000

 On another machine the full test run completed, but it also had a
 leftover rump_server process.

 Most threads were in pthread_cond_timedwait(); this one was special:

 [Switching to thread 95 (LWP 1947 of process 14722)]
 #0  0x00000000407e32d8 in lwproc_proc_free (p=0x411be040)
     at /usr/src/lib/librump/../../sys/rump/librump/rumpkern/lwproc.c:173
 173                     LIST_INSERT_HEAD(&initproc->p_children, child, p_sibling);
 (gdb) bt
 #0  0x00000000407e32d8 in lwproc_proc_free (p=0x411be040)
     at /usr/src/lib/librump/../../sys/rump/librump/rumpkern/lwproc.c:173
 #1  lwproc_freelwp (l=<optimized out>)
     at /usr/src/lib/librump/../../sys/rump/librump/rumpkern/lwproc.c:341
 #2  rump_lwproc_switch (newlwp=<optimized out>)
     at /usr/src/lib/librump/../../sys/rump/librump/rumpkern/lwproc.c:521
 #3  0x00000000407e3700 in lwproc_makelwp (p=0x41feab00, 
     doswitch=<optimized out>, procmake=<optimized out>)
     at /usr/src/lib/librump/../../sys/rump/librump/rumpkern/lwproc.c:385
 #4  0x00000000407e3860 in rump_lwproc_newlwp (pid=<optimized out>)
     at /usr/src/lib/librump/../../sys/rump/librump/rumpkern/lwproc.c:439
 #5  0x0000000040e07c2c in lwproc_newlwp (pid=427)
     at /usr/src/lib/librumpuser/rumpuser_sp.c:212
 #6  serv_handlesyscall (rhdr=0x638b6fc8, rhdr=0x638b6fc8, data=0x638b8460 "", 
     spc=0x40f0f608) at /usr/src/lib/librumpuser/rumpuser_sp.c:684
 #7  serv_workbouncer (arg=<optimized out>)
     at /usr/src/lib/librumpuser/rumpuser_sp.c:767
 #8  0x000000004100f328 in pthread__create_tramp (cookie=0x59003400)
     at /usr/src/lib/libpthread/pthread.c:560
 #9  0x0000000041266698 in _lwp_kill () from /usr/lib/libc.so.12


 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/55466: can not complete a full test run
Date: Thu, 14 Oct 2021 17:49:27 +0200

 While I have not seen this on the original dual-core evbearmv7 machine
 in a while, it now hits me quite reproducibly on a dual-core macppc
 machine.

 This is a showstopper for netbsd-10.

 Martin

State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Wed, 17 May 2023 11:20:34 +0000
State-Changed-Why:
Is this still bust?


State-Changed-From-To: feedback->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Wed, 17 May 2023 11:26:54 +0000
State-Changed-Why:
Leftover busy-looping rump_server processes show up every now and then,
but this PR does not help diagnose the individual test failures or
bogus tests causing this (or: rump bugs?)


>Unformatted:
