NetBSD Problem Report #53611

From tsutsui@ceres.dti.ne.jp  Sun Sep 16 16:49:42 2018
Return-Path: <tsutsui@ceres.dti.ne.jp>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 79F327A156
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 16 Sep 2018 16:49:42 +0000 (UTC)
Message-Id: <201809161649.w8GGnZEh006783@ceres.dti.ne.jp>
Date: Mon, 17 Sep 2018 01:49:35 +0900 (JST)
From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Reply-To: tsutsui@ceres.dti.ne.jp
To: gnats-bugs@NetBSD.org
Cc: tsutsui@ceres.dti.ne.jp
Subject: NetBSD/pmax 8.0 RAMDISK kernel hang on 3MIN (5000/125)
X-Send-Pr-Version: 3.95

>Number:         53611
>Category:       port-pmax
>Synopsis:       NetBSD/pmax 8.0 RAMDISK kernel hang on 3MIN (5000/125)
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    tsutsui
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Sep 16 16:50:00 +0000 2018
>Closed-Date:    Fri Nov 16 15:02:20 +0000 2018
>Last-Modified:  Fri Nov 16 15:02:20 +0000 2018
>Originator:     Izumi Tsutsui
>Release:        NetBSD 8.0
>Organization:
>Environment:
System: NetBSD 8.0 (RAMDISK) #0: Tue Jul 17 14:59:51 UTC 2018 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/pmax/compile/RAMDISK
Architecture: mipsel
Machine: pmax
>Description:
NetBSD/pmax 8.0 RAMDISK kernel hangs during boot
(right after framebuffer is attached?)

---
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
    2018 The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 8.0 (RAMDISK) #0: Tue Jul 17 14:59:51 UTC 2018
	mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/pmax/compile/RAMDISK
DECstation 5000/125 (3MIN)
total memory = 98304 KB
avail memory = 90456 KB
mainbus0 (root)
cpu0 at mainbus0: MIPS R3000A CPU (0x230) Rev. 3.0 with MIPS R3010 FPC Rev. 3.0
cpu0: 64KB/4B direct-mapped Instruction cache, 64 TLB entries
cpu0: 64KB/4B direct-mapped write-through Data cache
tc0 at mainbus0: 12.5 MHz clock
ioasic0 at tc0 slot 3 offset 0x0
le0 at ioasic0 offset 0xc0000: address 08:00:2b:24:e1:af
le0: 32 receive buffers, 8 transmit buffers
zsc0 at ioasic0 offset 0x100000
zsc0: channel 0 not configured
zstty0 at zsc0 channel 1
zsc1 at ioasic0 offset 0x180000
lkkbd0 at zsc1 channel 0
lkkbd0: no keyboard
wskbd0 at lkkbd0 mux 1
zstty1 at zsc1 channel 1 (console i/o)
mcclock0 at ioasic0 offset 0x200000: mc146818 or compatible
asc0 at ioasic0 offset 0x300000: NCR53C94, 25MHz, SCSI ID 7
scsibus0 at asc0: 8 targets, 8 luns per target
device PMAF-AA  at tc0 slot 2 offset 0x0 not configured
tfb0 at tc0 slot 1 offset 0x0: 1280x1024, 8,24bpp

---

Disabling the framebuffer drivers via userconf does not help:

---
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
    2018 The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 8.0 (RAMDISK) #0: Tue Jul 17 14:59:51 UTC 2018
	mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/pmax/compile/RAMDISK
DECstation 5000/125 (3MIN)
total memory = 98304 KB
avail memory = 90456 KB
userconf: configure system autoconfiguration:
uc> disable tfb*
[ 32] tfb* disabled
uc> quit
Continuing...
mainbus0 (root)
cpu0 at mainbus0: MIPS R3000A CPU (0x230) Rev. 3.0 with MIPS R3010 FPC Rev. 3.0
cpu0: 64KB/4B direct-mapped Instruction cache, 64 TLB entries
cpu0: 64KB/4B direct-mapped write-through Data cache
tc0 at mainbus0: 12.5 MHz clock
ioasic0 at tc0 slot 3 offset 0x0
le0 at ioasic0 offset 0xc0000: address 08:00:2b:24:e1:af
le0: 32 receive buffers, 8 transmit buffers
zsc0 at ioasic0 offset 0x100000
zsc0: channel 0 not configured
zstty0 at zsc0 channel 1
zsc1 at ioasic0 offset 0x180000
lkkbd0 at zsc1 channel 0
lkkbd0: no keyboard
wskbd0 at lkkbd0 mux 1
zstty1 at zsc1 channel 1 (console i/o)
mcclock0 at ioasic0 offset 0x200000: mc146818 or compatible
asc0 at ioasic0 offset 0x300000: NCR53C94, 25MHz, SCSI ID 7
scsibus0 at asc0: 8 targets, 8 luns per target
device PMAF-AA  at tc0 slot 2 offset 0x0 not configured
device PMAG-JA  at tc0 slot 1 offset 0x0 not configured
sfb0 at tc0 slot 0 offset 0x0: 1280x1024, 8bpp

---
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
    2018 The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 8.0 (RAMDISK) #0: Tue Jul 17 14:59:51 UTC 2018
	mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/pmax/compile/RAMDISK
DECstation 5000/125 (3MIN)
total memory = 98304 KB
avail memory = 90456 KB
userconf: configure system autoconfiguration:
uc> disable tfb*
[ 32] tfb* disabled
uc> disable sfb*
[ 33] sfb* disabled
uc> quit
Continuing...
mainbus0 (root)
cpu0 at mainbus0: MIPS R3000A CPU (0x230) Rev. 3.0 with MIPS R3010 FPC Rev. 3.0
cpu0: 64KB/4B direct-mapped Instruction cache, 64 TLB entries
cpu0: 64KB/4B direct-mapped write-through Data cache
tc0 at mainbus0: 12.5 MHz clock
ioasic0 at tc0 slot 3 offset 0x0
le0 at ioasic0 offset 0xc0000: address 08:00:2b:24:e1:af
le0: 32 receive buffers, 8 transmit buffers
zsc0 at ioasic0 offset 0x100000
zsc0: channel 0 not configured
zstty0 at zsc0 channel 1
zsc1 at ioasic0 offset 0x180000
lkkbd0 at zsc1 channel 0
lkkbd0: no keyboard
wskbd0 at lkkbd0 mux 1
zstty1 at zsc1 channel 1 (console i/o)
mcclock0 at ioasic0 offset 0x200000: mc146818 or compatible
asc0 at ioasic0 offset 0x300000: NCR53C94, 25MHz, SCSI ID 7
scsibus0 at asc0: 8 targets, 8 luns per target
device PMAF-AA  at tc0 slot 2 offset 0x0 not configured
device PMAG-JA  at tc0 slot 1 offset 0x0 not configured
device PMAGB-BA at tc0 slot 0 offset 0x0 not configured

---

On gxemul-0.3.6.2 (which has partial 3MIN support)
it says there is a NULL pointer dereference:
> [ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x00000000 <(no symbol)> ]

---
% ~/gxemul-0.3.6.2/gxemul -X -e 3min -d gxemul-pmax.img -d b:NetBSD-8.0-pmax.iso
GXemul-0.3.6.2   Copyright (C) 2003-2005  Anders Gavare
Read the source code and/or documentation for other Copyright messages.

Simple setup...
    net: simulating 10.0.0.0/8 (max outgoing: TCP=60, UDP=60)
        simulated gateway: 10.0.0.254 (60:50:40:30:20:10)
            using nameserver 192.168.1.1
    machine "default":
        memory: 32 MB
        bintrans: i386, 16 MB translation cache at 0xb35f0000
        cpu0: R3000A (I+D = 8+8 KB)
        device  0 at 0x001c000000: dec_ioasic
        device  1 at 0x001c0c0000: le_sram (dyntrans R/W)
        device  2 at 0x001c1c0000: le [10:20:30:00:00:10]
        device  3 at 0x001c100000: scc
        device  4 at 0x001c180000: scc
        device  5 at 0x001c200000: mc146818
        device  6 at 0x001c300000: asc
        device  7 at 0x001c340000: asc_dma_address_reg (dyntrans R/W)
        device  8 at 0x001c380000: asc_dma (dyntrans R/W)
        device  9 at 0x0010000000: fb [PMAG-BA] (dyntrans R/W)
        device 10 at 0x0010200000: bt459
        device 11 at 0x0010300000: bt459_irq
        device 12 at 0x0010380000: turbochannel [PMAG-BA]
        device 13 at 0x0014000000: turbochannel
        device 14 at 0x0018000000: turbochannel
        device 15 at 0x00a0000000: ram [mirror]
        machine: DECstation 5000/112 or 145 (3MIN, KN02BA) (33.00 MHz)
        bootstring(+bootarg): boot 3/rz1/ -a
        diskimage: gxemul-pmax.img
            SCSI DISK id 0, read/write, 2048 MB (4194304 sectors)
        diskimage: NetBSD-8.0-pmax.iso
            SCSI CD-ROM id 1, read-only, 344 MB (705488 sectors) (BOOT)
        DEC boot: loadaddr=0xa0700000, pc=0xa0700000: 13 blocks
        starting cpu0 at 0xa0700000
-------------------------------------------------------------------------------


NetBSD/pmax 8.0 ISO 9660 Primary Bootstrap

NetBSD/pmax 8.0 Secondary Bootstrap, Revision 1.5 (Tue Jul 17 14:59:51 UTC 2018)

Boot: 3/rz1/
Loading: 3/rz1/netbsd.pmax
open 3/rz1/netbsd.pmax: No such file or directory
Loading: 3/rz1/netbsd
5730096+80672=0x58aea8
Starting at 0x80030000

[ dec_ioasic: unimplemented write to address 0x40160, data=0x0000000000000003 ]
[ dec_ioasic: unimplemented write to address 0x40170, data=0x000000000000000e ]
segment  0 start 00000000 size 02000000
phys segment: 0x2000000 @ 0
adding 0x243000 @ 0x5bd000 to freelist 1
adding 0x1800000 @ 0x800000 to freelist 0

dec_ioasic: unimplemented read from address 0x0 ]
[ dec_ioasic: unimplemented write to address 0x40020, data=0x0000000004100000 ]
[ dec_ioasic: unimplemented write to address 0x401b0, data=0x0000000000000000 ]
[ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x00000000 <(no symbol)> ]
a0 points to: [00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00]
PROM emulation: unimplemented JUMP TABLE vector 0x200 (decimal function 64)
GXemul> quit
Press enter to quit.
---

According to gxemul -t (trace) output, it fails in ubc_init()?

---
 :
        <ssp_init(&softint_lock,0xc1d5e0f8,0,0x804bae40,..)>
        <ubc_init(&softint_lock,0xc1d5e0f8,0,0x804bae40,..)>
          <uvm_obj_init(0x804acf74,0x803efd8c,1,-2,..)>
 :
          <kmem_zalloc(0x10000,1,0,-2,..)>
 :
            <0x803cc750(0xc0030000,0,0x10000,0x80483d60,..)>
a0 points to: [00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00]
PROM emulation: unimplemented JUMP TABLE vector 0x280 (decimal function 80)
GXemul> 

(note 0x803cc750 in GENERIC is memset(9))
---

Note:
- The same GENERIC kernel on gxemul 3MAX emulation works
  (boots up to multiuser) without problem.
- NetBSD/pmax 5.99.48 GENERIC kernel also works on my 3MIN.
- NetBSD/pmax 7.2 GENERIC kernel seems to have the same issue.

>How-To-Repeat:
Install old gxemul 0.3.6.2 (which has partial 3MIN support)
and boot NetBSD/pmax 8.0 netbsd-INSTALL with 3MIN emulation.

Or boot netbsd-INSTALL from NetBSD/pmax 8.0 on real 5000/125.

>Fix:
No idea.

>Release-Note:

>Audit-Trail:
From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: gnats-bugs@NetBSD.org
Cc: tsutsui@ceres.dti.ne.jp
Subject: Re: port-pmax/53611: NetBSD/pmax 8.0 RAMDISK kernel hang on 3MIN (5000/125)
Date: Mon, 17 Sep 2018 05:32:11 +0900

 > NetBSD/pmax 8.0 RAMDISK kernel hangs during boot
  :
 > On gxemul-0.3.6.2 (which has partial 3MIN support)
 > it says there is a NULL pointer dereference:
 > > [ warning: LOW reference: vaddr=0x00000000, exception TLBL, pc=0x00000000 <(no symbol)> ]
  :
 > According to gxemul -t (trace) output, it fails in ubc_init()?

 This was not correct; the failure was maybe caused by random interrupts.

 After further investigation using gxemul -t, disabling framebuffer
 interrupts in the following line solves the hang on gxemul:
  https://nxr.netbsd.org/xref/src/sys/dev/tc/cfb.c?r=1.61#283

 According to dec_3min.c, the 3MIN seems to have a quirk: interrupts
 from each TC slot are connected directly to the MIPS_INT_MASK_[012]
 lines, not routed via the IOASIC as on other PMAXen.

 It looks like the spl adjustment code in dec_3min_intr_establish()
 should have been removed when the mips interrupt code was reorganized
 between netbsd-5 and -6.
 The following patch seems to solve the problem on my 3MIN
 (and vsync interrupts are accounted properly per systat(1) output),
 but I can't confirm that it's correct (I don't know the details of
 the mips interrupt handler changes).

 Index: pmax/dec_3min.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/pmax/pmax/dec_3min.c,v
 retrieving revision 1.73
 diff -u -p -d -r1.73 dec_3min.c
 --- pmax/dec_3min.c	24 Mar 2014 19:31:40 -0000	1.73
 +++ pmax/dec_3min.c	16 Sep 2018 20:12:04 -0000
 @@ -292,12 +292,7 @@ dec_3min_intr_establish(device_t dev, vo
  	case SYS_DEV_OPT0:
  	case SYS_DEV_OPT1:
  	case SYS_DEV_OPT2:
 -		/* it's an option slot */
 -		{
 -		int s = splhigh();
 -		s |= mask;
 -		splx(s);
 -		}
 +		/* it's an option slot and handled via MIPS_INT_MASK_[012] */
  		break;
  	default:
  		/* it's a baseboard device going via the IOASIC */
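
A toy model may help show why the removed sequence misbehaves. The sketch below assumes that, after the interrupt reorganization, splhigh() returns a small IPL number rather than status-register bits, and that `mask` for an option slot is one of the MIPS_INT_MASK_[012] bits. Neither assumption is confirmed in this PR. All toy_* names and the IPL range are hypothetical and are not the kernel's spl(9) implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Hardware interrupt enable bits of the MIPS status register. */
#define MIPS_INT_MASK_0 0x0400
#define MIPS_INT_MASK_1 0x0800
#define MIPS_INT_MASK_2 0x1000

/* Toy IPL range; names and values are hypothetical. */
enum { TOY_IPL_NONE = 0, TOY_IPL_HIGH = 7 };

int toy_cur_ipl = TOY_IPL_NONE;

/* Raise to the highest level and return the previous level. */
int toy_splhigh(void)
{
	int old = toy_cur_ipl;

	toy_cur_ipl = TOY_IPL_HIGH;
	return old;
}

/* Restore a previously saved level. */
void toy_splx(int ipl)
{
	toy_cur_ipl = ipl;
}

/* A level is valid only if it lies inside the toy IPL range. */
int toy_ipl_is_valid(int ipl)
{
	return ipl >= TOY_IPL_NONE && ipl <= TOY_IPL_HIGH;
}

/* The sequence the patch removes, replayed against the toy model. */
int toy_removed_sequence(int mask)
{
	int s = toy_splhigh();

	s |= mask;	/* mixes status-register bits into an IPL number */
	toy_splx(s);
	return toy_cur_ipl;
}

/* Print the resulting "level" for one option-slot mask. */
void toy_report(int mask)
{
	int level;

	assert(mask != 0);
	toy_cur_ipl = TOY_IPL_NONE;
	level = toy_removed_sequence(mask);
	printf("mask 0x%04x -> level 0x%04x (%s)\n", mask, level,
	    toy_ipl_is_valid(level) ? "valid" : "out of range");
}
```

Calling toy_report() with each of the three mask bits shows the saved level leaving the toy IPL range in every case. Under the stated assumptions, this is the kind of mismatch the patch avoids by not touching the level at all.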

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: port-pmax-maintainer@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Subject: re: port-pmax/53611: NetBSD/pmax 8.0 RAMDISK kernel hang on 3MIN (5000/125)
Date: Mon, 17 Sep 2018 10:39:55 +1000

 >   	case SYS_DEV_OPT0:
 >   	case SYS_DEV_OPT1:
 >   	case SYS_DEV_OPT2:
 >  -		/* it's an option slot */
 >  -		{
 >  -		int s = splhigh();
 >  -		s |= mask;
 >  -		splx(s);
 >  -		}
 >  +		/* it's an option slot and handled via MIPS_INT_MASK_[012] */
 >   		break;

 this code is very mind boggling, but it's clearly wrong in the modern
 mips interrupt world, compared to what 'mask' is set to.  even then,
 why does intr_establish change the ipl!?

 i'm pretty sure you should commit this.  nicely found.

 thanks!


 .mrg.

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: mrg@eterna.com.au
Cc: gnats-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: port-pmax/53611: NetBSD/pmax 8.0 RAMDISK kernel hang on 3MIN (5000/125)
Date: Mon, 17 Sep 2018 11:14:40 +0900

 > >  -		/* it's an option slot */
 > >  -		{
 > >  -		int s = splhigh();
 > >  -		s |= mask;
 > >  -		splx(s);
 > >  -		}
 > >  +		/* it's an option slot and handled via MIPS_INT_MASK_[012] */

 > this code is very mind boggling, but it's clearly wrong in the modern
 > mips interrupt world, compared to what 'mask' is set to.  even then,
 > why does intr_establish change the ipl!?

 It looks like it enables the MIPS_INT_MASK_[012] interrupts for TC slots.
 Maybe they stay disabled even at spl0() if no interrupt handler
 is established.

 That means that with -current code, if the sfb(4) and tfb(4) drivers
 (or other TC devices) are disabled (i.e. no handler is registered),
 the kernel still hangs on unhandled stray interrupts once
 interrupts are enabled.

 I'm not sure how I can enable/disable MIPS_INT_[012] in the
 intr_establish() and intr_disestablish() functions, though.

 ---
 Izumi Tsutsui
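
One way to picture the open question in the message above (enabling a slot's interrupt line only while a handler is registered) is a per-slot handler count that drives an enable mask. This is a stand-alone sketch, not the committed fix and not kernel code. The toy_* names, the three-slot layout, and the idea of keeping a software enable mask are all assumptions made for illustration.

```c
#include <assert.h>

/* Status-register bit for hardware interrupt line 0; lines 1 and 2 follow. */
#define MIPS_INT_MASK_0 0x0400
#define TOY_NSLOTS 3	/* hypothetical: one line per TC option slot */

unsigned toy_enabled_mask;	/* lines currently allowed to interrupt */
int toy_handlers[TOY_NSLOTS];	/* registered handlers per slot */

/* Map an option slot number to its interrupt line bit. */
unsigned toy_slot_bit(int slot)
{
	assert(slot >= 0 && slot < TOY_NSLOTS);
	return (unsigned)MIPS_INT_MASK_0 << slot;
}

/* Register a handler; the first one on a slot enables its line. */
int toy_intr_establish(int slot)
{
	if (slot < 0 || slot >= TOY_NSLOTS)
		return -1;
	if (toy_handlers[slot]++ == 0)
		toy_enabled_mask |= toy_slot_bit(slot);
	return 0;
}

/* Remove a handler; the last one on a slot disables its line. */
int toy_intr_disestablish(int slot)
{
	if (slot < 0 || slot >= TOY_NSLOTS || toy_handlers[slot] == 0)
		return -1;
	if (--toy_handlers[slot] == 0)
		toy_enabled_mask &= ~toy_slot_bit(slot);
	return 0;
}

/* Would a pending interrupt on this slot be delivered? */
int toy_would_deliver(int slot)
{
	return (toy_enabled_mask & toy_slot_bit(slot)) != 0;
}
```

In this model a slot with no registered handler never delivers, which is the property the message is looking for. How such a mask would be applied to the real status register is left open here, as it is in the thread.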

From: "Izumi Tsutsui" <tsutsui@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53611 CVS commit: src/sys/arch/pmax/pmax
Date: Mon, 17 Sep 2018 16:52:29 +0000

 Module Name:	src
 Committed By:	tsutsui
 Date:		Mon Sep 17 16:52:29 UTC 2018

 Modified Files:
 	src/sys/arch/pmax/pmax: dec_3min.c

 Log Message:
 Fix hangup after framebuffers are attached on 3MIN.  PR port-pmax/53611

 Ok'ed by mrg@.  Should be pulled up to netbsd-7 and netbsd-8.


 To generate a diff of this commit:
 cvs rdiff -u -r1.73 -r1.74 src/sys/arch/pmax/pmax/dec_3min.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Responsible-Changed-From-To: port-pmax-maintainer->tsutsui
Responsible-Changed-By: tsutsui@NetBSD.org
Responsible-Changed-When: Tue, 18 Sep 2018 14:14:34 +0000
Responsible-Changed-Why:


State-Changed-From-To: open->needs-pullups
State-Changed-By: tsutsui@NetBSD.org
State-Changed-When: Tue, 18 Sep 2018 14:14:34 +0000
State-Changed-Why:
I'll send pullup requests later..


State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: tsutsui@NetBSD.org
State-Changed-When: Sat, 22 Sep 2018 09:28:11 +0000
State-Changed-Why:
[pullup-8 #1033] [pullup-7 #1641]


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53611 CVS commit: [netbsd-8] src/sys/arch/pmax/pmax
Date: Sun, 23 Sep 2018 17:51:09 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sun Sep 23 17:51:09 UTC 2018

 Modified Files:
 	src/sys/arch/pmax/pmax [netbsd-8]: dec_3min.c

 Log Message:
 Pull up following revision(s) (requested by tsutsui in ticket #1033):

 	sys/arch/pmax/pmax/dec_3min.c: revision 1.74

 Fix hangup after framebuffers are attached on 3MIN.  PR port-pmax/53611

 Ok'ed by mrg@.  Should be pulled up to netbsd-7 and netbsd-8.


 To generate a diff of this commit:
 cvs rdiff -u -r1.73 -r1.73.22.1 src/sys/arch/pmax/pmax/dec_3min.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Stephen Borrill" <sborrill@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53611 CVS commit: [netbsd-7] src/sys/arch/pmax/pmax
Date: Tue, 30 Oct 2018 10:17:23 +0000

 Module Name:	src
 Committed By:	sborrill
 Date:		Tue Oct 30 10:17:23 UTC 2018

 Modified Files:
 	src/sys/arch/pmax/pmax [netbsd-7]: dec_3min.c

 Log Message:
 Pull up the following revisions(s) (requested by tsutsui in ticket #1641):
 	sys/arch/pmax/pmax/dec_3min.c:	revision 1.74

 Fix hangup after framebuffers are attached on 3MIN.
 Addresses PR port-pmax/53611


 To generate a diff of this commit:
 cvs rdiff -u -r1.73 -r1.73.4.1 src/sys/arch/pmax/pmax/dec_3min.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: tsutsui@NetBSD.org
State-Changed-When: Fri, 16 Nov 2018 15:02:20 +0000
State-Changed-Why:
Pullups complete.


>Unformatted:
