NetBSD Problem Report #56087

From he@smistad.uninett.no  Thu Apr  1 08:56:02 2021
Return-Path: <he@smistad.uninett.no>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 1BE6C1A9217
	for <gnats-bugs@gnats.NetBSD.org>; Thu,  1 Apr 2021 08:56:02 +0000 (UTC)
Message-Id: <20210401085556.940CE43FC5B@smistad.uninett.no>
Date: Thu,  1 Apr 2021 10:55:56 +0200 (CEST)
From: he@NetBSD.org
Reply-To: he@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: dual-CPU macppc panic: pr_phinpage_check: [pmap_upvopl] ...
X-Send-Pr-Version: 3.95

>Number:         56087
>Category:       kern
>Synopsis:       dual-CPU macppc panic: pr_phinpage_check: [pmap_upvopl] ...
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Apr 01 09:00:01 +0000 2021
>Last-Modified:  Thu Apr 01 13:10:01 +0000 2021
>Originator:     he@NetBSD.org
>Release:        NetBSD 9.99.81
>Organization:
	I try...
>Environment:
System: NetBSD bramley.urc.uninett.no 9.99.81 NetBSD 9.99.81 (GENERIC.MP) #0: Tue Mar 30 19:45:04 UTC 2021  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/macppc/compile/GENERIC.MP macppc
Architecture: powerpc
Machine: macppc
>Description:

	I recently had to re-install this "mirror drive doors" G4
	powermac because I got errors writing to the (old, re-used)
	drive.

	While extracting the new src.tar.gz, with "systat -w 5 vm"
	running and just after starting "top", I saw out of the
	corner of my eye the console screen go black.  Once it was
	back up again, I found this backtrace in the message buffer:

[ 4454.5805633] panic: pr_phinpage_check: [pmap_upvopl] item 0x4c87ce0 poolid 36 != 1
[ 4454.5805633] cpu0: Begin traceback...
[ 4454.5805633] 0x37707a80: at vpanic+0x12c
[ 4454.5805633] 0x37707ab0: at panic+0x50
[ 4454.5805633] 0x37707af0: at pool_put+0x580
[ 4454.5805633] 0x37707b40: at pmap_pvo_free_list.isra.0+0x6c
[ 4454.5805633] 0x37707b60: at pmap_remove+0x10c
[ 4454.5805633] 0x37707b90: at uvm_pagermapout+0x24
[ 4454.5805633] 0x37707bc0: at genfs_getpages+0x12d4
[ 4454.5805633] 0x37707cd0: at VOP_GETPAGES+0x6c
[ 4454.5805633] 0x37707d10: at ufs_balloc_range+0x184
[ 4454.5805633] 0x37707d80: at ffs_write+0x67c
[ 4454.5805633] 0x37707e10: at VOP_WRITE+0x50
[ 4454.5805633] 0x37707e40: at vn_write+0x140
[ 4454.5805633] 0x37707e70: at dofilewrite+0x8c
[ 4454.5805633] 0x37707ec0: at syscall+0x2a4
[ 4454.5805633] 0x37707f20: user SC trap #4 by 0xfdd2451c: srr1=0xd032
[ 4454.5805633]             r1=0xffffe4e0 cr=0x44444484 xer=0 ctr=0xfdd24514
[ 4454.5805633] cpu0: End traceback...
[ 4454.5805633] halting CPU 1
[ 4454.5805633] dumpsys: TBD
[ 4454.5805633] rebooting
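
	For readers unfamiliar with the message: going by its text,
	the panic fires when an item handed to pool_put() lives in a
	page whose recorded pool id (36) differs from the id of the
	pool it is being returned to (pmap_upvopl, id 1).  The
	following is only a user-space sketch of that kind of
	consistency check, not the kernel's code.  All sketch_*
	names, the 4096-byte page size and the header layout are
	assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, for illustration only */

/* Hypothetical stand-in for the per-page bookkeeping a pool keeps. */
struct sketch_page_header {
	uintptr_t page_base;	/* address of the page this header describes */
	unsigned  pool_id;	/* id of the pool that owns the page */
	size_t    item_offset;	/* first byte in the page usable for items */
};

/* Hypothetical stand-in for a pool. */
struct sketch_pool {
	const char *name;	/* e.g. "pmap_upvopl" */
	unsigned    pool_id;
};

enum sketch_check {
	SKETCH_OK,
	SKETCH_NOT_IN_PAGE,	/* header does not describe the item's page */
	SKETCH_BELOW_ITEMS,	/* item lies below the item area of the page */
	SKETCH_WRONG_POOL	/* page belongs to a different pool */
};

/* Round an item address down to the start of its page. */
static uintptr_t
sketch_page_of(uintptr_t item)
{
	return item & ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);
}

/* Check that an item being returned really belongs to pool pp. */
static enum sketch_check
sketch_check_item(const struct sketch_pool *pp,
    const struct sketch_page_header *ph, uintptr_t item)
{
	uintptr_t page = sketch_page_of(item);

	if (ph->page_base != page)
		return SKETCH_NOT_IN_PAGE;
	if (item < page + ph->item_offset)
		return SKETCH_BELOW_ITEMS;
	if (ph->pool_id != pp->pool_id)
		return SKETCH_WRONG_POOL;	/* the "poolid 36 != 1" case */
	return SKETCH_OK;
}
```

	With a pool of id 1 and a page header carrying pool id 36,
	the item address from the panic (0x4c87ce0) ends in the
	SKETCH_WRONG_POOL branch.  In this model that means either
	the item was returned to the wrong pool or the page
	bookkeeping was damaged.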

	With this particular drive I've not seen any write errors on
	the console, so I don't think that is the cause of this problem.

	I'm running with both CPUs active if that makes any
	difference:

[   1.0000000] mainbus0 (root)
[   1.0000000] cpu0 at mainbus0: 7455 (Revision 3.3), ID 0 (primary)
[   1.0000000] cpu0: HID0 0x84d0c1bc<EMCP,TBEN,HIGH_BAT_EN,NAP,DPM,ICE,DCE,XBSEN,SGE,BTIC,LRSTK,FOLD,BHT>, powersave: 1
[   1.0000000] cpu0: 1250.00 MHz, 256KB L2 cache no parity, 2MB no-parity L3 cache (DDR SRAM) at 6:1 ratio
[   1.0000000] cpu1 at mainbus0: 7455 (Revision 3.3), ID 1
[   1.0000000] cpu1: HID0 0x84d0c1bc<EMCP,TBEN,HIGH_BAT_EN,NAP,DPM,ICE,DCE,XBSEN,SGE,BTIC,LRSTK,FOLD,BHT>, powersave: 1
[   1.0000000] cpu1: 1250.00 MHz, 256KB L2 cache no parity, 2MB no-parity L3 cache (DDR SRAM) at 6:1 ratio

	I realize that this *may* be a powerpc-specific or even
	macppc-specific problem, but so far I've categorized it as a
	"kern" bug.  (I did notice that the 9.1 MP kernel doesn't really come
	properly up, seemingly stuck before initializing IPsec, though
	that's probably a different problem.)

>How-To-Repeat:
	See above.

>Fix:
	No idea, sorry...

>Audit-Trail:
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, Havard Eidnes <he@netbsd.org>
Cc: 
Subject: Re: kern/56087: dual-CPU macppc panic: pr_phinpage_check:
 [pmap_upvopl] ...
Date: Thu, 1 Apr 2021 18:06:12 +0900

 Probably same problem as reported by port-powerpc/55325:
 	http://gnats.netbsd.org/55325

 The patch attached to that PR may improve the situation, but
 we need a real fix for the powerpc/oea pmap...

 Thanks,
 rin

From: Havard Eidnes <he@NetBSD.org>
To: rokuyama.rk@gmail.com
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56087: dual-CPU macppc panic: pr_phinpage_check:
 [pmap_upvopl] ...
Date: Thu, 01 Apr 2021 15:08:17 +0200 (CEST)

 > Probably same problem as reported by port-powerpc/55325:
 > 	http://gnats.netbsd.org/55325
 >
 > The patch attached to that PR may improve the situation, but
 > we need a real fix for the powerpc/oea pmap...

 First, thanks for the suggestion!

 Sadly, in my case it didn't.  I applied the patch quoted in the
 PR to -current, and built the GENERIC.MP kernel and tried booting
 it.  It seemed to have a negative influence on the interrupt
 system, and I didn't fully get the new kernel up; the console
 showed (transcribed from a photo of the screen):

 [   4.6999901] virq != 0, value 10
 [   4.6999901] virq != 0, value 10
 [   4.6999901] virq != 0, value 10
 [   4.6999901] virq != 0, value 10
 [   4.6999901] virq != 0, value 10
 [   4.7100758] virq != 0, value 10
 [   4.7100758] virq != 0, value 10
 [   4.7100758] virq != 0, value 10
 [   4.7100758] virq != 0, value 10
 [   4.7100758] virq != 0, value 10
 [   4.7200755] uhidev0 at uhub2 port 1 configuration 1
 [   4.7200755] uhidev0: Mitsumi Electric (0x05ac) Apple Extended USB Keyboard
 [   4.7200755] virq != 0, value 10
 [   4.7200755] virq != 0, value 10
 [   4.7200755] virq != 0, value 10
 [   4.7200755] ukbd0 at uhidev0
 [   4.7200755] virq != 0, value 1
 [   4.7200755] virq != 0, value 10
 [   4.7200755] virq != 0, value 1
 [   4.7200755] virq != 0, value 10

 etc. ending in

 [   5.0999900] virq != 0, value 10
 [   5.1299891] virq != 0, value 1
 [   5.1299891] wskbd0 at ukbd0: console keyboard using wsdisplay0
 [   5.1299891] uhidev1 at uhub2 port 1 configurat

 and that's it; it seems to be wedged there.

 Also, as far as I know there is no way to get a serial console
 on this machine, so I can't capture the start of the boot
 messages, and the power button is the only possible next action.

 Aand... I just now heard the "gong" sound from my basement where
 this Mac is placed, indicating that this, perhaps unsurprisingly,
 is also a problem for single-CPU systems (I got the same
 backtrace as originally reported in this PR in my kernel message
 buffer).

 Regards,

 - Havard
