NetBSD Problem Report #55340

From martin@duskware.de  Wed Jun  3 14:35:45 2020
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 6D9D01A9217
	for <gnats-bugs@gnats.NetBSD.org>; Wed,  3 Jun 2020 14:35:45 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: kernel hang during rump test run + ddb failure
X-Send-Pr-Version: 3.95

>Number:         55340
>Category:       port-macppc
>Synopsis:       kernel hang during rump test run + ddb failure
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-macppc-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jun 03 14:40:00 +0000 2020
>Last-Modified:  Sat Jun 20 11:40:01 +0000 2020
>Originator:     Martin Husemann
>Release:        NetBSD 9.99.64
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD gethsemane.duskware.de 9.99.64 NetBSD 9.99.64 (GETHSEMANE) #45: Wed Jun 3 12:29:44 CEST 2020 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/macppc/compile/GETHSEMANE macppc
Architecture: powerpc
Machine: macppc
>Description:

Running the ATF test suite on a dual G4 macppc today locked up the system.
It stopped in /usr/tests/rump/rumpkern/t_sp:


rump/rumpkern/t_sp (749/847): 10 test cases
    basic: [0.376696s] Passed.
    fork_fakeauth: [0.381822s] Passed.
    fork_pipecomm: [0.375336s] Passed.
    fork_simple: [0.380721s] Passed.
    reconnect: [301.288133s] Failed: Test case timed out after 300 seconds
    signal: [0.414238s] Passed.
    sigsafe: [5.543676s] Passed.
    stress_killer: [11.301062s] Passed.
    stress_long: 
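
For triaging logs like the one above, a small parser can separate failed test cases from the one that never reported a result. This is a minimal sketch: it assumes the per-case line format shown in the listing above, and the names `CASE_RE`, `parse_cases`, and `SAMPLE` are hypothetical helpers, not part of ATF.

```python
import re

# One test-case line: "<name>: [<seconds>s] <Status>[.:] <optional reason>".
# The bracketed part is optional so that a case that never finished
# (name followed by nothing) is still recognised.
CASE_RE = re.compile(r"^\s*(\w+):\s*(?:\[(\d+\.\d+)s\]\s+(\w+)[.:]?\s*(.*))?$")

def parse_cases(text):
    """Return (name, seconds, status, reason) tuples, one per test-case line.

    Cases without a result get None for seconds, status and reason.
    """
    cases = []
    for line in text.splitlines():
        m = CASE_RE.match(line)
        if m:
            name, secs, status, reason = m.groups()
            cases.append((name, float(secs) if secs else None, status, reason))
    return cases

# Abbreviated copy of the report above (illustration only).
SAMPLE = """\
rump/rumpkern/t_sp (749/847): 10 test cases
    basic: [0.376696s] Passed.
    reconnect: [301.288133s] Failed: Test case timed out after 300 seconds
    stress_killer: [11.301062s] Passed.
    stress_long:
"""

cases = parse_cases(SAMPLE)
failed = [name for name, _, status, _ in cases if status == "Failed"]
unfinished = [name for name, _, status, _ in cases if status is None]
print("failed:", failed)
print("unfinished:", unfinished)
```

On the abbreviated sample this reports `reconnect` as failed and `stress_long` as unfinished, matching the point where the machine hung.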

After watching it hang for more than 10 minutes, I dropped into ddb on the
console, but that did not work well either:

Stopped in pid 2868.16103 (rump_server) at      netbsd:zstty_stint+0x1d4:       b       netbsd:zstty_stint+0x140
db{0}> bt
0x16c23ea0: at zsc_intr_hard+0x74
0x16c23ec0: at zshard+0x18
0x16c23ed0: at intr_deliver.isra.1+0x90
0x16c23ef0: at pic_handle_intr+0x178
0x16c23f20: at trapstart+0x6b0
[ 12392.8289365] trap: kernel read DSI trap @ 0xea73ff14 by 0x12b1e8 (DSISR 0x40000000, err=14), lr 0x12b798
[ 12392.8289365] panic: trap
[ 12392.8289365] cpu0: Begin traceback...
[ 12392.8289365] 0x16c23960: at vpanic+0x12c
[ 12392.8289365] 0x16c23990: at panic+0x50
[ 12392.8289365] 0x16c239d0: at trap+0x100
[ 12392.8289365] 0x16c23a80: kernel DSI read trap @ 0xea73ff14 by db_stack_trace_print+0x11c: srr1=0x32
[ 12392.8289365]             r1=0x16c23b50 cr=0x20244444 xer=0 ctr=0x10a274 dsisr=0x40000000
[ 12392.8289365] 0x16c23b50: at db_stack_trace_print+0x6c8
[ 12392.8289365] 0x16c23bc0: at db_command+0x12c
[ 12392.8289365] 0x16c23c60: at db_command_loop+0xd8
[ 12392.8289365] 0x16c23d40: at db_trap+0xe0
[ 12392.8289365] 0x16c23d70: at kdb_trap+0x120
[ 12392.8289365] 0x16c23db0: at trapstart+0x95c
[ 12392.8289365] saved LR(0x804d924a) is invalid.cpu0: End traceback...
[ 12392.8289365] halting CPU 1
[ 12392.8289365] dumpsys: TBD
[ 12392.8289365] rebooting
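
As a reading aid for the trap line above, the DSISR value can be decoded bit by bit. This is a sketch under stated assumptions: the masks follow the 32-bit PowerPC OEA definition of DSISR for data storage interrupts as I understand it, only three of the defined bits are listed, and `DSISR_BITS` and `decode_dsisr` are hypothetical names, not kernel symbols.

```python
# Selected DSISR bits for a data storage interrupt on 32-bit PowerPC (OEA).
# Assumption: masks are taken from the architecture definition; this table
# is deliberately incomplete.
DSISR_BITS = {
    0x40000000: "translation not found (no page table entry or BAT match)",
    0x08000000: "access not permitted by page or BAT protection",
    0x02000000: "access was a store (bit clear means a load)",
}

def decode_dsisr(value):
    """Return descriptions of the known bits that are set in value."""
    return [desc for mask, desc in DSISR_BITS.items() if value & mask]

# Value reported in the trap message above.
for desc in decode_dsisr(0x40000000):
    print(desc)
```

For `0x40000000` only the "translation not found" bit is set and the store bit is clear, which agrees with the kernel reporting a read DSI trap on an unmapped address inside `db_stack_trace_print`.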


>How-To-Repeat:
See above.

>Fix:
n/a

>Audit-Trail:
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, port-macppc-maintainer@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: port-macppc/55340: kernel hang during rump test run + ddb failure
Date: Mon, 8 Jun 2020 10:10:01 +0900

 Hi.

 Could you try the patch attached in port-powerpc/55325:

    http://gnats.netbsd.org/55325

 With this patch, the kernel does not hang on my Mac mini (uniprocessor):

 | Tests root: /usr/tests/rump/rumpkern
 |
 | t_sp (1/1): 10 test cases
 |     basic: [1.030238s] Passed.
 |     fork_fakeauth: [0.528322s] Passed.
 |     fork_pipecomm: [0.530721s] Passed.
 |     fork_simple: [0.545351s] Passed.
 |     reconnect: [3.729749s] Passed.
 |     signal: [0.952547s] Passed.
 |     sigsafe: [6.074471s] Passed.
 |     stress_killer: [323.604229s] Passed.
 |     stress_long: [604.847985s] Failed: Test case timed out after 300 seconds
 |     stress_short: [603.281914s] Failed: Test case timed out after 300 seconds
 | [1545.160124s]
 |
 | Failed test cases:
 |     t_sp:stress_long, t_sp:stress_short
 |
 | Summary for 1 test programs:
 |     8 passed test cases.
 |     2 failed test cases.
 |     0 expected failed test cases.
 |     0 skipped test cases.

 Alternatively, DIAGNOSTIC may help. (Unfortunately, DIAGNOSTIC has been
 disabled for macppc for a very long time.) If your machine has gem(4), the
 dirty hack described in port-macppc/55326:

    http://gnats.netbsd.org/55326

 is necessary in order to enable DIAGNOSTIC.

 Thanks,
 rin

From: Martin Husemann <martin@duskware.de>
To: Rin Okuyama <rokuyama.rk@gmail.com>
Cc: gnats-bugs@netbsd.org
Subject: Re: port-macppc/55340: kernel hang during rump test run + ddb failure
Date: Sat, 20 Jun 2020 13:37:12 +0200

 On Mon, Jun 08, 2020 at 10:10:01AM +0900, Rin Okuyama wrote:
 > Hi.
 > 
 > Could you try the patch attached in port-powerpc/55325:
 > 
 >   http://gnats.netbsd.org/55325
 > 
 > With this patch, the kernel does not hang on my Mac mini (uniprocessor):

 I applied the patch and enabled DIAGNOSTIC - this makes the test run a lot
 better!

 It still dies due to PR 55272, but that is unrelated.

 Martin

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.