NetBSD Problem Report #57197

From jdbaker@consolidated.net  Tue Jan 24 04:08:11 2023
Return-Path: <jdbaker@consolidated.net>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id EF47D1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 24 Jan 2023 04:08:10 +0000 (UTC)
Message-Id: <20230124040234.706CA13E11B@spike.technoskunk.fur>
Date: Mon, 23 Jan 2023 22:02:34 -0600 (CST)
From: jdbaker@consolidated.net
Reply-To: jdbaker@consolidated.net
To: gnats-bugs@NetBSD.org
Subject: GENERIC kernel crash on pentium-III and earlier CPUs
X-Send-Pr-Version: 3.95

>Number:         57197
>Category:       port-i386
>Synopsis:       GENERIC kernel crash on pentium-III and earlier CPUs
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-i386-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jan 24 04:10:00 +0000 2023
>Closed-Date:    Wed Aug 02 13:51:03 +0000 2023
>Last-Modified:  Wed Aug 02 13:51:03 +0000 2023
>Originator:     John D. Baker
>Release:        NetBSD 10.0_BETA
>Organization:
>Environment:
NetBSD 10.0_BETA (PLEXOR) #5: Sun Jan 22 18:35:43 CST 2023 sysop@plex760.technoskunk.fur:/r0/build/netbsd-10/obj/i386/sys/arch/i386/compile/PLEXOR
Architecture: i386
Machine: i386
>Description:

Booting the GENERIC kernel (or one which includes the GENERIC config)
on a system with a pentium-III or lesser CPU (VIA Samuel, Am5x86) crashes
as follows:

pentium-III:
[   1.0000000] NetBSD 10.0_BETA (PLEXOR) #4: Wed Jan 18 21:10:13 CST 2023
[   1.0000000] 	sysop@plex760.technoskunk.fur:/r0/build/netbsd-10/obj/i386/sys/arch/i386/compile/PLEXOR
[   1.0000000] total memory = 510 MB
[   1.0000000] avail memory = 488 MB
[...]
[   1.0000030] cpu0 at mainbus0
[   1.0000030] cpu0: Intel 686-class, 936MHz, id 0x68a
[   1.0000030] cpu0: node 0, package 0, core 0, smt 0
[...]
[  30.4197804] fatal page fault in supervisor mode
[  30.4197804] trap type 6 code 0 eip 0xc0617718 cs 0x8 eflags 0x10246 cr2 0x1000003c ilevel 0x7 esp 0xc0a29500
[  30.4197804] curlwp 0xc16ac040 pid 0 lid 2 lowest kstack 0xd7c902c0
kernel: supervisor trap page fault, code=0
Stopped in pid 0.2 (system) at  netbsd:hardclock+0x23:  movl 3c(%esi),%eax
db{0}> bt
hardclock(10000000,d7a92ec4,c02864a1,0,10000000,c068d9f3,c027fc14,16b6000,7200,d7f375f0) at netbsd:hardclock+0x23
clockintr(0,10000000,c068d9f3,c027fc14,16b6000,7200,d7f375f0,0,c1761000,c010313a) at netbsd:clockintr+0x36
intr_kdtrace_wrapper(c1990540,d7a92ed4,6,d7a90010,c0620030,c16b0010,d7a90010,c16ac040,1,d7a92f34) at netbsd:intr_kdtrace_wrapper+0x21
Xintr_legacy0() at netbsd:Xintr_legacy0+0xda
--- interrupt ---
cx8_spllower(1,0,d7a92f6c,c02864a1,c186c800,c1761000,c1990540,7,0,c186ba00) at netbsd:cx8_spllower+0x14
intr_biglock_wrapper(c186be80,d7c92f10,0,0,0,0,0,0,0,0) at netbsd:intr_biglock_wrapper+0x68
--- switch to interrupt stack ---
Xintr_legacy5() at netbsd:Xintr_legacy5+0xda
--- interrupt ---
x86_stihlt(c16ac040,0,c0637e70,0,0,c168f100,c0a10100,c168f100,d7c90000,c168f100) at netbsd:x86_stihlt+0x5
idle_loop(c16ac040,cbb000,cc4000,0,c01005a8,0,0,0,0,0) at netbsd:idle_loop+0x153


Am5x86:
[   1.0000000] NetBSD 10.0_BETA (GENERIC) #4: Wed Jan 18 20:46:54 CST 2023
[   1.0000000] 	sysop@plex760.technoskunk.fur:/r0/build/netbsd-10/obj/i386/sys/arch/i386/compile/GENERIC
[   1.0000000] total memory = 65148 KB
[   1.0000000] avail memory = 38620 KB
[...]
[   1.0000040] cpu0: AMD 486-class, id 0x4f4
[   1.0000040] cpu0: node 0, package 0, core 0, smt 0
[...]
[   1.0000040] fatal page fault in supervisor mode
[   1.0000040] trap type 6 code 0 eip 0xc0d3d7d8 cs 0xc57b0008 eflags 0x10246 cr2 0x3c ilevel 0x7 esp 0
[   1.0000040] curlwp 0xc165a840 pid 0 lid 0 lowest kstack 0xc19f32c0
kernel: supervisor trap page fault, code=0
Stopped in pid 0.0 (system) at  netbsd:hardclock+0x23:  movl    3c(%esi),%eax
db{0}> bt
hardclock(0,0,c57bff6c,c04ac8f1,0,0,0,0,0,0) at netbsd:hardclock+0x23
clockintr(0,0,0,0,0,0,0,0,c1c72000,c010322a) at netbsd:clockintr+0x2a
intr_kdtrace_wrapper(c1c33b80,c19f5d9c,0,0,0,0,0,0,0,0) at netbsd:intr_kdtrace_wrapper+0x21
--- switch to interrupt stack ---
Xintr_legacy0() at netbsd:Xintr_legacy0+0xda
--- interrupt ---
outb(c16260c0,c1623f80,0,20,1,0,0,c16c5a80,c19f5e94,0) at netbsd:outb+0x9
intr_establish_xname(0,c16260c0,0,1,7,c04c96b5,0,0,c134f916,0) at netbsd:intr_establish_xname+0x2ba
isa_intr_establish_xname(0,0,1,7,c04c96b5,0,c134f916,c19f5f14,c04c9baf,0) at netbsd:isa_intr_establish_xname+0x91
isa_intr_establish(0,0,1,7,c04c96b5,0,c19f5f60,c0d3d19a,c04b6858,1000) at netbsd:isa_intr_establish+0x3c
i8254_initclocks(c04b6858,1000,3,c11b0770,c6020000,c601f000,c1670b40,0,c19f5f60,c0e5f527) at netbsd:i8254_initclocks+0x3a
initclocks(3,0,64,0,0,0,0,0,2a6a000,0) at netbsd:initclocks+0x1c
main(0,0,0,0,0,0,0,0,0,0) at netbsd:main+0x365


VIA "Samuel":
[   1.0000000] NetBSD 10.0_BETA (NEOWARE) #4: Wed Jan 18 21:19:15 CST 2023
[   1.0000000] 	sysop@plex760.technoskunk.fur:/r0/build/netbsd-10/obj/i386/sys/arch/i386/compile/NEOWARE
[   1.0000000] total memory = 959 MB
[   1.0000000] avail memory = 930 MB
[...]
[   1.0000040] cpu0 at mainbus0
[   1.0000040] cpu0: VIA Samuel 2, id 0x673
[   1.0000040] cpu0: node 0, package 0, core 0, smt 0
[...]
[   1.0011329] fatal page fault in supervisor mode
[   1.0011329] trap type 6 code 0 eip 0xc055c308 cs 0x8 eflags 0x10246 cr2 0x3c ilevel 0x7 esp 0x6
[   1.0011329] curlwp 0xc095f140 pid 0 lid 0 lowest kstack 0xc0bf92c0
kernel: supervisor trap page fault, code=0
Stopped in pid 0.0 (system) at  netbsd:hardclock+0x23:  movl    3c(%esi),%eax
db{0}> bt
hardclock(0,d95d2f6c,c022c031,0,0,0,0,0,0,0) at netbsd:hardclock+0x23
clockintr(0,0,0,0,0,0,0,0,c1ecc000,c010313a) at netbsd:clockintr+0x36
intr_kdtrace_wrapper(c213c500,c0bfbd9c,0,0,0,0,0,0,0,0) at netbsd:intr_kdtrace_wrapper+0x21
--- switch to interrupt stack ---
Xintr_legacy0() at netbsd:Xintr_legacy0+0xda
--- interrupt ---
outb(c0955480,c0953540,0,20,1,0,0,c099ffc0,c0bfbe94,0) at netbsd:outb+0x9
intr_establish_xname(0,c0955480,0,1,7,c0248305,0,0,c0813d2b,0) at netbsd:intr_establish_xname+0x2ba
isa_intr_establish_xname(0,0,1,7,c0248305,0,c0813d2b,c0bfbf14,c02487cc,0) at netbsd:isa_intr_establish_xname+0x91
isa_intr_establish(0,0,1,7,c0248305,0,c0bfbf60,c055bcca,c0235c38,1000) at netbsd:isa_intr_establish+0x3c
i8254_initclocks(c0235c38,1000,3,c0795264,da280000,da27f000,bfe8,c0bfbf60,c05c0824,c06797d7) at netbsd:i8254_initclocks+0x3a
initclocks(3,5,64,0,0,0,0,0,16800000,0) at netbsd:initclocks+0x1c
main(0,0,0,0,0,0,0,0,0,0) at netbsd:main+0x365
db{0}> 


After bisecting the source, the fault was introduced with:

  src/sys/arch/x86/x86/intr.c r1.163

Pentium-4 and later CPUs appear to be unaffected.

Curiously, on the Am5x86, the NET4501 kernel (or one derived from it)
boots without issues.

The added code between r1.162 and r1.163 looks innocuous enough, but
it apparently trips up these older/lesser CPUs.
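
As a rough cross-check of the traces above (an illustration, not part of
the analysis in the original report): the faulting instruction
"movl 3c(%esi),%eax" reads from %esi + 0x3c, and cr2 holds the faulting
address, so the value of %esi can be recovered by subtracting 0x3c from
cr2.  The helper name fault_base is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement taken from the faulting instruction: movl 3c(%esi),%eax */
#define FAULT_DISP	0x3cu

/* Recover the base register value from the faulting address in cr2. */
static uint32_t
fault_base(uint32_t cr2)
{
	assert(cr2 >= FAULT_DISP);	/* guard against wrap-around */
	return cr2 - FAULT_DISP;
}
```

With the cr2 values reported above this gives 0x10000000 for the first
pentium-III trace and 0 for the Am5x86 and VIA traces, which matches the
first argument shown for hardclock in each backtrace.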

>How-To-Repeat:
Boot GENERIC (or GENERIC-derived) kernel on system with pentium-III or
earlier CPU.

>Fix:
Workaround:  revert "src/sys/arch/x86/x86/intr.c" to r1.162

>Release-Note:

>Audit-Trail:
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@netbsd.org
Cc: port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: port-i386/57197: GENERIC kernel crash on pentium-III and earlier CPUs
Date: Tue, 24 Jan 2023 19:47:47 +1100

 > [   1.0000030] cpu0: Intel 686-class, 936MHz, id 0x68a
 > [   1.0000030] cpu0: node 0, package 0, core 0, smt 0
 > [...]
 > [  30.4197804] fatal page fault in supervisor mode
 > [  30.4197804] trap type 6 code 0 eip 0xc0617718 cs 0x8 eflags 0x10246 cr2

 can you provide a little more of the dmesg above the 'fatal page fault'
 for all instances?  ie, what was the previous message / what was going
 on in the system at that point.

 some of them fault in the interrupt handler, but some are faulting when
 an interrupt is being established.

 > intr_biglock_wrapper(c186be80,d7c92f10,0,0,0,0,0,0,0,0) at netbsd:intr_biglock_wrapper+0x68
 > --- switch to interrupt stack ---
 > Xintr_legacy5() at netbsd:Xintr_legacy5+0xda
 > --- interrupt ---
 > x86_stihlt(c16ac040,0,c0637e70,0,0,c168f100,c0a10100,c168f100,d7c90000,c168f100) at netbsd:x86_stihlt+0x5
 > idle_loop(c16ac040,cbb000,cc4000,0,c01005a8,0,0,0,0,0) at netbsd:idle_loop+0x153

 panicked after taking an interrupt.

 > hardclock(0,0,c57bff6c,c04ac8f1,0,0,0,0,0,0) at netbsd:hardclock+0x23
 > clockintr(0,0,0,0,0,0,0,0,c1c72000,c010322a) at netbsd:clockintr+0x2a
 > intr_kdtrace_wrapper(c1c33b80,c19f5d9c,0,0,0,0,0,0,0,0) at netbsd:intr_kdtrace_wrapper+0x21
 > --- switch to interrupt stack ---
 > Xintr_legacy0() at netbsd:Xintr_legacy0+0xda
 > --- interrupt ---
 > outb(c16260c0,c1623f80,0,20,1,0,0,c16c5a80,c19f5e94,0) at netbsd:outb+0x9
 > intr_establish_xname(0,c16260c0,0,1,7,c04c96b5,0,0,c134f916,0) at netbsd:intr_establish_xname+0x2ba

 this time, taking an interrupt while setting one up.

 > Xintr_legacy0() at netbsd:Xintr_legacy0+0xda
 > --- interrupt ---
 > outb(c0955480,c0953540,0,20,1,0,0,c099ffc0,c0bfbe94,0) at netbsd:outb+0x9
 > intr_establish_xname(0,c0955480,0,1,7,c0248305,0,0,c0813d2b,0) at netbsd:intr_establish_xname+0x2ba

 as above.

 i can't tell where the "outb" call in intr_establish_xname() comes
 from.  in my GENERIC, this ends up being:

    0xc04b0acd <+693>:     call   0xc0f86a47 <__x86_indirect_thunk_eax>
 >> 0xc04b0ad2 <+698>:     mov    %fs:0x304,%edx
    0xc04b0ad9 <+705>:     mov    -0x5c(%ebp),%eax
    0xc04b0adc <+708>:     cmp    %edx,%eax
    0xc04b0ade <+710>:     je     0xc04b0aed <intr_establish_xname+725>

 is intr_establish_xname+0x2ba().  this is x86_curcpu() it seems:

 (gdb) l *(intr_establish_xname+0x2ba)
 0xc04b0ad2 is in intr_establish_xname (./machine/cpu.h:53).
 48      __inline __always_inline static struct cpu_info * __unused
 ...
 53              __asm volatile("movl %%fs:%1, %0" :
 54                  "=r" (ci) :
 55                  "m"
 56                  (*(struct cpu_info * const *)offsetof(struct cpu_info, ci_self)));

 for me, this turns out to be here:

 (gdb) l *(intr_establish_xname+708)
 ...
 969             /* All set up, so add a route for the interrupt and unmask it. */
 970             (*pic->pic_addroute)(pic, ci, pin, idt_vec, type);
 971             if (ci == curcpu() || !mp_online) {
 972                     intr_hwunmask_xcall(ih, NULL);


 John, can you check your netbsd.gdb and see if you can confirm the
 same addresses, or point out where yours are?

From: Taylor R Campbell <riastradh@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, jdbaker@consolidated.net
Subject: Re: port-i386/57197: GENERIC kernel crash on pentium-III and earlier CPUs
Date: Tue, 24 Jan 2023 09:46:44 +0000

 This is the same problem as:
 https://mail-index.netbsd.org/tech-kern/2022/12/10/msg028574.html

 The legacy clockintr has a magic extra argument that intr_establish
 and intr_kdtrace_wrapper (and intr_biglock_wrapper) don't know about,
 requiring a sketchy function pointer cast (ding ding ding, alarm bell)
 to even pass to intr_establish, and intr_kdtrace_wrapper causes it to
 fail.

 The real fix is to change the bogus magic extra argument, but as a
 workaround we could pass through a flag to avoid intr_kdtrace_wrapper.
 Downside is it will not be possible to dtrace these interrupts, but
 upside is it doesn't require summoning Cthulhu to aid in legacy ISA
 clock surgery.
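
 A minimal user-space sketch of the workaround idea described above,
 assuming a simplified handler table; it is not the kernel code.  All
 toy_* names and the struct layout are hypothetical.  The wrapper is
 installed for ordinary one-argument handlers and skipped for the one
 handler that takes the extra frame argument.  The toy dispatcher casts
 the pointer back to its real type before calling it, which keeps the
 example well-defined C; the real assembly interrupt stub is not
 modeled here.

```c
#include <assert.h>
#include <stddef.h>

struct intrframe { int if_placeholder; };	/* stand-in, not the real layout */

typedef int (*intr_fn)(void *);				/* normal signature */
typedef int (*clock_fn)(void *, struct intrframe *);	/* with extra frame */

struct toy_intrhand {
	intr_fn	ih_fun;		/* what the dispatcher calls */
	void	*ih_arg;
	intr_fn	ih_realfun;	/* what the driver registered */
	void	*ih_realarg;
};

static int toy_clock_ticks;

/* Hypothetical clock handler that needs the interrupt frame. */
static int
toy_clockintr(void *arg, struct intrframe *frame)
{
	(void)arg;
	assert(frame != NULL);
	toy_clock_ticks++;
	return 1;
}

/* Hypothetical ordinary handler: bumps the counter passed as its argument. */
static int
toy_devintr(void *arg)
{
	(*(int *)arg)++;
	return 1;
}

/* Wrapper in the style of a tracing hook: forwards one argument only. */
static int
toy_wrapper(void *v)
{
	struct toy_intrhand *ih = v;

	return (*ih->ih_realfun)(ih->ih_realarg);
}

static void
toy_establish(struct toy_intrhand *ih, intr_fn handler, void *arg)
{
	ih->ih_realfun = ih->ih_fun = handler;
	ih->ih_realarg = ih->ih_arg = arg;
	/* Skip the wrapper for the handler with the extra argument. */
	if (handler != (intr_fn)toy_clockintr) {
		ih->ih_fun = toy_wrapper;
		ih->ih_arg = ih;
	}
}

/* Toy dispatcher: casts back to the real type before calling. */
static int
toy_dispatch(struct toy_intrhand *ih, struct intrframe *frame)
{
	if (ih->ih_fun == (intr_fn)toy_clockintr)
		return (*(clock_fn)ih->ih_fun)(ih->ih_arg, frame);
	return (*ih->ih_fun)(ih->ih_arg);
}
```

 The cost noted above shows up directly: the clock handler never passes
 through toy_wrapper, so anything the wrapper would have recorded is
 lost for that one interrupt.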

State-Changed-From-To: open->analyzed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Tue, 24 Jan 2023 09:50:52 +0000
State-Changed-Why:
uwe analyzed this last month


From: Taylor R Campbell <riastradh@NetBSD.org>
To: jdbaker@consolidated.net, uwe@NetBSD.org
Cc: gnats-bugs@netbsd.org, port-i386-maintainer@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: port-i386/57197: GENERIC kernel crash on pentium-III and earlier CPUs
Date: Tue, 24 Jan 2023 10:57:49 +0000

 This is a multi-part message in MIME format.
 --=_La7nkjIJilZyzQ04m7iYdJZjSuvrF8DG

 Can you try the attached patch?

 --=_La7nkjIJilZyzQ04m7iYdJZjSuvrF8DG
 Content-Type: text/plain; charset="ISO-8859-1"; name="x86clockintr"
 Content-Transfer-Encoding: quoted-printable
 Content-Disposition: attachment; filename="x86clockintr.patch"

 From e20d2a498a991899ab794174c5fcae888cbc84a4 Mon Sep 17 00:00:00 2001
 From: Taylor R Campbell <riastradh@NetBSD.org>
 Date: Tue, 24 Jan 2023 10:00:45 +0000
 Subject: [PATCH] x86/intr: Work around sleazy clockintr with a secret frame
  argument.

 PR kern/57197
 ---
  sys/arch/x86/include/intr_private.h | 39 +++++++++++++++++++++++++++++
  sys/arch/x86/isa/clock.c            | 14 +++++++----
  sys/arch/x86/x86/intr.c             | 16 ++++++++++--
  3 files changed, 62 insertions(+), 7 deletions(-)
  create mode 100644 sys/arch/x86/include/intr_private.h

 diff --git a/sys/arch/x86/include/intr_private.h b/sys/arch/x86/include/intr_private.h
 new file mode 100644
 index 000000000000..183e904a7dba
 --- /dev/null
 +++ b/sys/arch/x86/include/intr_private.h
 @@ -0,0 +1,39 @@
 +/*	$NetBSD$	*/
 +
 +/*-
 + * Copyright (c) 2023 The NetBSD Foundation, Inc.
 + * All rights reserved.
 + *
 + * Redistribution and use in source and binary forms, with or without
 + * modification, are permitted provided that the following conditions
 + * are met:
 + * 1. Redistributions of source code must retain the above copyright
 + *    notice, this list of conditions and the following disclaimer.
 + * 2. Redistributions in binary form must reproduce the above copyright
 + *    notice, this list of conditions and the following disclaimer in the
 + *    documentation and/or other materials provided with the distribution.
 + *
 + * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
 + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
 + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 + * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
 + * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 + * POSSIBILITY OF SUCH DAMAGE.
 + */
 +
 +#ifndef	_X86_INTR_PRIVATE_H_
 +#define	_X86_INTR_PRIVATE_H_
 +
 +/*
 + * XXX This is a horrible kludge to let intr_establish_xname detect
 + * when it needs to handle a sleazy extra argument to the interrupt
 + * handler that's not part of the normal interrupt handler signature.
 + */
 +int i8254_clockintr(void *, struct intrframe *);
 +
 +#endif	/* _X86_INTR_PRIVATE_H_ */
 diff --git a/sys/arch/x86/isa/clock.c b/sys/arch/x86/isa/clock.c
 index c50704cd13a8..399bbd46c6ed 100644
 --- a/sys/arch/x86/isa/clock.c
 +++ b/sys/arch/x86/isa/clock.c
 @@ -152,6 +152,7 @@ __KERNEL_RCSID(0, "$NetBSD: clock.c,v 1.39 2020/05/29 12:30:41 rin Exp $");
  #include <x86/lock.h>
  #include <machine/specialreg.h>
  #include <x86/rtc.h>
 +#include <x86/intr_private.h>
  
  #ifndef __x86_64__
  #include "mca.h"
 @@ -188,8 +189,6 @@ void (*x86_delay)(unsigned int) = i8254_delay;
  void		sysbeep(int, int);
  static void     tickle_tc(void);
  
 -static int	clockintr(void *, struct intrframe *);
 -
  int 		sysbeepdetach(device_t, int);
  
  static unsigned int	gettick_broken_latch(void);
 @@ -371,8 +370,8 @@ tickle_tc(void)
  
  }
  
 -static int
 -clockintr(void *arg, struct intrframe *frame)
 +int
 +i8254_clockintr(void *arg, struct intrframe *frame)
  {
 	tickle_tc();
  
 @@ -555,9 +554,14 @@ i8254_initclocks(void)
  	/*
  	 * XXX If you're doing strange things with multiple clocks, you might
  	 * want to keep track of clock handlers.
 +	 *
 +	 * XXX This is an abuse of the interrupt handler signature with
 +	 * __FPTRCAST which requires a special case for IPL_CLOCK in
 +	 * intr_establish_xname.  Please fix this nonsense!  See also
 +	 * the comments about i8254_clockintr in x86/x86/intr.c.
  	 */
  	(void)isa_intr_establish(NULL, 0, IST_PULSE, IPL_CLOCK,
 -	    __FPTRCAST(int (*)(void *), clockintr), 0);
 +	    __FPTRCAST(int (*)(void *), i8254_clockintr), 0);
  }
  
  void
 diff --git a/sys/arch/x86/x86/intr.c b/sys/arch/x86/x86/intr.c
 index 5bde34cf7514..54474897377a 100644
 --- a/sys/arch/x86/x86/intr.c
 +++ b/sys/arch/x86/x86/intr.c
 @@ -162,6 +162,8 @@ __KERNEL_RCSID(0, "$NetBSD: intr.c,v 1.163 2022/10/29 13:59:04 riastradh Exp $")
  #include <machine/i8259.h>
  #include <machine/pio.h>
  
 +#include <x86/intr_private.h>
 +
  #include "ioapic.h"
  #include "lapic.h"
  #include "pci.h"
 @@ -944,11 +946,21 @@ intr_establish_xname(int legacy_irq, struct pic *pic, int pin, int type,
 	ih->ih_slot = slot;
 	strlcpy(ih->ih_xname, xname, sizeof(ih->ih_xname));
  #ifdef KDTRACE_HOOKS
 -	ih->ih_fun = intr_kdtrace_wrapper;
 -	ih->ih_arg = ih;
 +	/*
 +	 * XXX i8254_clockintr is special -- takes a magic extra
 +	 * argument.  This should be fixed properly in some way that
 +	 * doesn't involve sketchy function pointer casts.  See also
 +	 * the comments in x86/isa/clock.c.
 +	 */
 +	if (handler != __FPTRCAST(int (*)(void *), i8254_clockintr)) {
 +		ih->ih_fun = intr_kdtrace_wrapper;
 +		ih->ih_arg = ih;
 +	}
  #endif
  #ifdef MULTIPROCESSOR
  	if (!mpsafe) {
 +		KASSERT(handler !=			/* XXX */
 +		    __FPTRCAST(int (*)(void *), i8254_clockintr));
 		ih->ih_fun = intr_biglock_wrapper;
 		ih->ih_arg = ih;
  	}

 --=_La7nkjIJilZyzQ04m7iYdJZjSuvrF8DG--

From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-i386/57197: GENERIC kernel crash on pentium-III and earlier
 CPUs
Date: Tue, 24 Jan 2023 14:30:37 +0300

 LGTM

 -uwe

From: "John D. Baker" <jdbaker@consolidated.net>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-i386/57197: GENERIC kernel crash on pentium-III and earlier
 CPUs
Date: Tue, 24 Jan 2023 19:15:13 -0600 (CST)

 On Tue, 24 Jan 2023, matthew green wrote:

 >  can you provide a little more of the dmesg above the 'fatal page fault'
 >  for all instances?  ie, what was the previous message / what was going
 >  on the system right now.
 >  
 >  some of the fault in the interrupt handler, but some are faulting when
 >  an interrupt is being established.

 This time booting the same GENERIC kernel on all test systems:

 Am5x86 w/more context:
 [   1.0000000] NetBSD 10.0_BETA (GENERIC) #5: Tue Jan 24 09:10:19 CST 2023
 [   1.0000000] 	sysop@plex760.technoskunk.fur:/r0/build/netbsd-10/obj/i386/sys/arch/i386/compile/GENERIC
 [   1.0000000] total memory = 65148 KB
 [   1.0000000] avail memory = 38620 KB
 [...]
 [   1.0000040] cpu0: AMD 486-class, id 0x4f4
 [   1.0000040] cpu0: node 0, package 0, core 0, smt 0
 [...]
 [   1.0000040] atabus0 at wdc0 channel 0
 [   1.0000040] pcppi0 at isa0 port 0x61
 [   1.0000040] midi0 at pcppi0: PC speaker
 [   1.0000040] sysbeep0 at pcppi0
 [   1.0000040] isapnp0 at isa0 port 0x279
 [   1.0000040] attimer0: attached to pcppi0
 [   1.0000040] WARNING: system needs entropy for security; see entropy(7)
 [   1.0000040] fatal page fault in supervisor mode
 [   1.0000040] trap type 6 code 0 eip 0xc0d3d7d8 cs 0xc57b0008 eflags 0x10246 cr2 0x3c ilevel 0x7 esp 0
 [   1.0000040] curlwp 0xc165a840 pid 0 lid 0 lowest kstack 0xc19f32c0
 kernel: supervisor trap page fault, code=0
 Stopped in pid 0.0 (system) at  netbsd:hardclock+0x23:  movl    3c(%esi),%eax
 db{0}> bt
 hardclock(0,0,c57bff6c,c04ac8f1,0,0,0,0,0,0) at netbsd:hardclock+0x23
 clockintr(0,0,0,0,0,0,0,0,c1c72000,c010322a) at netbsd:clockintr+0x2a
 intr_kdtrace_wrapper(c1c33b80,c19f5d9c,0,0,0,0,0,0,0,0) at netbsd:intr_kdtrace_wrapper+0x21
 --- switch to interrupt stack ---
 Xintr_legacy0() at netbsd:Xintr_legacy0+0xda
 --- interrupt ---
 outb(c16260c0,c1623f80,0,20,1,0,0,c16c5a80,c19f5e94,0) at netbsd:outb+0x9
 intr_establish_xname(0,c16260c0,0,1,7,c04c96b5,0,0,c134f91a,0) at netbsd:intr_establish_xname+0x2ba
 isa_intr_establish_xname(0,0,1,7,c04c96b5,0,c134f91a,c19f5f14,c04c9baf,0) at netbsd:isa_intr_establish_xname+0x91
 isa_intr_establish(0,0,1,7,c04c96b5,0,c19f5f60,c0d3d19a,c04b6858,1000) at netbsd:isa_intr_establish+0x3c
 i8254_initclocks(c04b6858,1000,3,c11b0770,c6020000,c601f000,c1670b40,0,c19f5f60,c0e5f527) at netbsd:i8254_initclocks+0x3a
 initclocks(3,0,64,0,0,0,0,0,2a6a000,0) at netbsd:initclocks+0x1c
 main(0,0,0,0,0,0,0,0,0,0) at netbsd:main+0x365
 db{0}> 


 pentium-III w/more context:
 [   1.0000000] NetBSD 10.0_BETA (GENERIC) #5: Tue Jan 24 09:10:19 CST 2023
 [   1.0000000] 	sysop@plex760.technoskunk.fur:/r0/build/netbsd-10/obj/i386/sys/arch/i386/compile/GENERIC
 [   1.0000000] total memory = 510 MB
 [   1.0000000] avail memory = 475 MB
 [...]
 [   1.0000040] cpu0 at mainbus0
 [   1.0000040] cpu0: Intel 686-class, 937MHz, id 0x68a
 [   1.0000040] cpu0: node 0, package 0, core 0, smt 0
 [...]
 [   1.0044400] isa0 at ichlpcib0
 [   1.0044400] lpt1 at isa0 port 0x278-0x27b irq : polled
 [   1.0044400] com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 16-byte FIFO
 [   1.0044400] com0: console
 [   1.0044400] com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, 16-byte FIFO
 [   1.0044400] fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
 [   1.0044400] fwohci0: BUS reset
 [   1.0044400] fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
 [   1.0044400] ieee1394if0: 1 nodes, maxhop <= 0 cable IRM irm(0) (me)
 [   1.0044400] ieee1394if0: bus manager 0
 [   1.0044400] WARNING: system needs entropy for security; see entropy(7)
 [   1.0044400] fatal page fault in supervisor mode
 [   1.0044400] trap type 6 code 0 eip 0xc0d3d7d8 cs 0x8 eflags 0x10246 cr2 0x3c ilevel 0x7 esp 0x6
 [   1.0044400] curlwp 0xc165a840 pid 0 lid 0 lowest kstack 0xc1a972c0
 kernel: supervisor trap page fault, code=0
 Stopped in pid 0.0 (system) at  netbsd:hardclock+0x23:  movl    3c(%esi),%eax
 db{0}> bt
 hardclock(0,0,d8830f6c,c04ac8f1,0,0,0,0,0,0) at netbsd:hardclock+0x23
 clockintr(0,0,0,0,0,0,0,0,c250a000,c010322a) at netbsd:clockintr+0x2a
 intr_kdtrace_wrapper(c2731cc0,c1a99d9c,0,0,0,0,0,0,0,0) at netbsd:intr_kdtrace_wrapper+0x21
 --- switch to interrupt stack ---
 Xintr_legacy0() at netbsd:Xintr_legacy0+0xda
 --- interrupt ---
 outb(c16260c0,c1623f80,0,20,1,0,0,c16c5a80,c1a99e94,0) at netbsd:outb+0x9
 intr_establish_xname(0,c16260c0,0,1,7,c04c96b5,0,0,c134f91a,0) at netbsd:intr_establish_xname+0x2ba
 isa_intr_establish_xname(0,0,1,7,c04c96b5,0,c134f91a,c1a99f14,c04c9baf,0) at netbsd:isa_intr_establish_xname+0x91
 isa_intr_establish(0,0,1,7,c04c96b5,0,c1a99f60,c0d3d19a,c04b6858,1000) at netbsd:isa_intr_establish+0x3c
 i8254_initclocks(c04b6858,1000,3,c11b0770,d97fd000,d97fc000,c1670b40,0,c1a99f60,c0e5f527) at netbsd:i8254_initclocks+0x3a
 initclocks(3,4,64,0,0,0,0,0,15432000,0) at netbsd:initclocks+0x1c
 main(0,0,0,0,0,0,0,0,0,0) at netbsd:main+0x365
 db{0}> 


 VIA Samuel 2 (Eden MSP) w/more context:
 [   1.0000000] NetBSD 10.0_BETA (GENERIC) #5: Tue Jan 24 09:10:19 CST 2023
 [   1.0000000] 	sysop@plex760.technoskunk.fur:/r0/build/netbsd-10/obj/i386/sys/arch/i386/compile/GENERIC
 [   1.0000000] total memory = 1015 MB
 [   1.0000000] avail memory = 971 MB
 [...]
 [   1.0000040] cpu0 at mainbus0
 [   1.0000040] cpu0: VIA Samuel 2, id 0x673
 [   1.0000040] cpu0: node 0, package 0, core 0, smt 0
 [...]
 [   1.0016861] isa0 at viapcib0
 [   1.0016861] lpt0 at isa0 port 0x378-0x37b irq 7
 [   1.0016861] com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 16-byte FIFO
 [   1.0016861] com0: console
 [   1.0016861] acpicpu0 at cpu0: ACPI CPU
 [   1.0016861] WARNING: system needs entropy for security; see entropy(7)
 [   1.0016861] fatal page fault in supervisor mode
 [   1.0016861] trap type 6 code 0 eip 0xc0d3d7d8 cs 0x8 eflags 0x10246 cr2 0x3c ilevel 0x7 esp 0x6
 [   1.0016861] curlwp 0xc165a840 pid 0 lid 0 lowest kstack 0xc1a972c0
 kernel: supervisor trap page fault, code=0
 Stopped in pid 0.0 (system) at  netbsd:hardclock+0x23:  movl    3c(%esi),%eax
 db{0}> bt
 hardclock(0,0,da534f6c,c04ac8f1,0,0,0,0,0,0) at netbsd:hardclock+0x23
 clockintr(0,0,0,0,0,0,0,0,c2e37000,c010322a) at netbsd:clockintr+0x2a
 intr_kdtrace_wrapper(c306f500,c1a99d9c,0,0,0,0,0,0,0,0) at netbsd:intr_kdtrace_wrapper+0x21
 --- switch to interrupt stack ---
 Xintr_legacy0() at netbsd:Xintr_legacy0+0xda
 --- interrupt ---
 outb(c16260c0,c1623f80,0,20,1,0,0,c16c5a80,c1a99e94,0) at netbsd:outb+0x9
 intr_establish_xname(0,c16260c0,0,1,7,c04c96b5,0,0,c134f91a,0) at netbsd:intr_establish_xname+0x2ba
 isa_intr_establish_xname(0,0,1,7,c04c96b5,0,c134f91a,c1a99f14,c04c9baf,0) at netbsd:isa_intr_establish_xname+0x91
 isa_intr_establish(0,0,1,7,c04c96b5,0,c1a99f60,c0d3d19a,c04b6858,1000) at netbsd:isa_intr_establish+0x3c
 i8254_initclocks(c04b6858,1000,3,c11b0770,db1cf000,db1ce000,cb1c,c1a99f60,c0da1cf4,c0e5f527) at netbsd:i8254_initclocks+0x3a
 initclocks(3,5,64,0,0,0,0,0,16800000,0) at netbsd:initclocks+0x1c
 main(0,0,0,0,0,0,0,0,0,0) at netbsd:main+0x365
 db{0}> 

 -- 
 |/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
 |\ / jdbaker[snail]consolidated[flyspeck]net  OpenBSD            FreeBSD
 | X  No HTML/proprietary data in email.   BSD just sits there and works!
 |/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645

From: matthew green <mrg@eterna.com.au>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@netbsd.org, port-i386-maintainer@netbsd.org,
    gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
    jdbaker@consolidated.net, uwe@NetBSD.org
Subject: re: port-i386/57197: GENERIC kernel crash on pentium-III and earlier CPUs
Date: Wed, 25 Jan 2023 17:01:58 +1100

 be nice to put a little more details in what needs fixing:

 arch/i386/i386/vector.S:#define INTRSTUB1(name, num, sub, off, early_ack, late_ack, mask, unmask, level_mask) \

 has this:

 	/* switch stack if necessary, and push a ptr to our intrframe */ \
 	IDEPTH_INCR

 the last part of IDEPTH_INCR being:

 999:    pushl   %eax; /* eax == pointer to intrframe */ \

 so it's _this_ that becomes the 2nd arg for clockintr().


 .mrg.
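
 A toy model of the mechanism described above, assuming handlers see
 only an explicit array of argument slots (slot 0 is the handler
 argument, slot 1 is the frame pointer pushed by the stub).  The model_*
 names are hypothetical, and the 0 placed in the unforwarded slot is a
 stand-in for whatever happens to be on the real stack.  It illustrates
 why a wrapper that forwards one argument leaves the clock handler
 without its frame pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct intrframe { int if_placeholder; };	/* stand-in, not the real layout */

/* Toy calling convention: a callee sees only the slots it is handed. */
typedef uintptr_t (*model_fn)(const uintptr_t *slots);

struct model_intrhand {
	model_fn	ih_realfun;
	uintptr_t	ih_realarg;
};

/* Like clockintr: returns the value it would use as its frame pointer. */
static uintptr_t
model_clockintr(const uintptr_t *slots)
{
	return slots[1];
}

/* Like the stub: hands over the handler argument and the frame pointer. */
static uintptr_t
model_stub(model_fn fun, uintptr_t arg, struct intrframe *frame)
{
	uintptr_t slots[2] = { arg, (uintptr_t)frame };

	return (*fun)(slots);
}

/* Like a one-argument wrapper: calls the real handler with its arg only. */
static uintptr_t
model_wrapper(const uintptr_t *slots)
{
	const struct model_intrhand *ih =
	    (const struct model_intrhand *)slots[0];
	/* Slot 1 is not forwarded; 0 stands in for stale stack contents. */
	uintptr_t inner[2] = { ih->ih_realarg, 0 };

	assert(ih->ih_realfun != NULL);
	return (*ih->ih_realfun)(inner);
}
```

 Called directly from model_stub, the clock handler sees the frame
 pointer; called through model_wrapper, it sees the stand-in value
 instead.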

From: "John D. Baker" <jdbaker@consolidated.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-i386/57197: GENERIC kernel crash on pentium-III and earlier
 CPUs
Date: Wed, 25 Jan 2023 09:32:39 -0600 (CST)

 On Tue, 24 Jan 2023, Taylor R Campbell wrote:

 > Can you try the attached patch?

 With the patch applied, the resulting GENERIC boots successfully on all
 the test systems.  The net4501 (Am5x86) hung on shutdown, but that's
 likely due to memory shortage since GENERIC consumes half of available
 memory.

 I tested against 10.0_BETA.  I'll re-check against -current although
 they should be the same at this point.


From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57197 CVS commit: src/sys/arch/x86
Date: Wed, 25 Jan 2023 15:54:53 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Wed Jan 25 15:54:53 UTC 2023

 Modified Files:
 	src/sys/arch/x86/isa: clock.c
 	src/sys/arch/x86/x86: intr.c
 Added Files:
 	src/sys/arch/x86/include: intr_private.h

 Log Message:
 x86/intr: Work around sleazy clockintr with a secret frame argument.

 PR kern/57197


 To generate a diff of this commit:
 cvs rdiff -u -r0 -r1.1 src/sys/arch/x86/include/intr_private.h
 cvs rdiff -u -r1.40 -r1.41 src/sys/arch/x86/isa/clock.c
 cvs rdiff -u -r1.163 -r1.164 src/sys/arch/x86/x86/intr.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: analyzed->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 25 Jan 2023 23:49:18 +0000
State-Changed-Why:
XXX pullup-10


From: Taylor R Campbell <riastradh@NetBSD.org>
To: matthew green <mrg@eterna.com.au>
Cc: gnats-bugs@netbsd.org, port-i386-maintainer@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
	jdbaker@consolidated.net, uwe@NetBSD.org
Subject: Re: port-i386/57197: GENERIC kernel crash on pentium-III and earlier CPUs
Date: Wed, 25 Jan 2023 23:49:56 +0000

 > Date: Wed, 25 Jan 2023 17:01:58 +1100
 > from: matthew green <mrg@eterna.com.au>
 > 
 > be nice to put a little more details in what needs fixing:
 > 
 > arch/i386/i386/vector.S:#define INTRSTUB1(name, num, sub, off, early_ack, late_ack, mask, unmask, level_mask) \
 > 
 > has this:
 > 
 > 	/* switch stack if necessary, and push a ptr to our intrframe */ \
 > 	IDEPTH_INCR
 > 
 > the last part of IDEPTH_INCR being:
 > 
 > 999:    pushl   %eax; /* eax == pointer to intrframe */ \
 > 
 > so it's _this_ that becomes the 2nd arg for clockintr().

 Tempted to say there should be a struct cpu_info::ci_iframe just like
 ci_idepth, and when an interrupt handler is interrupted, it should
 just be saved on the stack (perhaps in the same stack slot!) and
 restored on return.

 That way, i8254_clockintr could just do curcpu()->ci_iframe instead of
 these horrible function pointer casts.
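
 A user-space sketch of the ci_iframe idea above, assuming a single toy
 CPU.  struct toy_cpu_info, toy_curcpu, toy_intr_enter and the probe
 helpers are hypothetical stand-ins; ci_iframe is the field proposed in
 this message, not an existing member.  The previous frame pointer is
 kept in a local (that is, on the stack) across a nested interrupt and
 restored on return, so a handler can fetch the frame without an extra
 argument.

```c
#include <assert.h>
#include <stddef.h>

struct intrframe { int if_placeholder; };	/* stand-in, not the real layout */

typedef int (*intr_fn)(void *);

struct toy_cpu_info {
	int			ci_idepth;	/* 0 when idle in this toy */
	struct intrframe	*ci_iframe;	/* proposed field */
};

static struct toy_cpu_info toy_cpu;

static struct toy_cpu_info *
toy_curcpu(void)
{
	return &toy_cpu;
}

/* Stand-in for interrupt entry/exit around one handler call. */
static int
toy_intr_enter(struct intrframe *frame, intr_fn handler, void *arg)
{
	struct toy_cpu_info *ci = toy_curcpu();
	struct intrframe *saved = ci->ci_iframe;	/* lives on our stack */
	int rv;

	assert(frame != NULL);
	ci->ci_idepth++;
	ci->ci_iframe = frame;
	rv = (*handler)(arg);
	ci->ci_iframe = saved;		/* restore the interrupted frame */
	ci->ci_idepth--;
	return rv;
}

struct toy_probe {
	struct intrframe *inner_frame;	/* frame for the nested interrupt */
	struct intrframe *seen_inner;	/* what the clock handler observed */
	struct intrframe *seen_after;	/* what the outer handler saw after */
};

/* Clock handler with the ordinary signature: frame comes from the CPU. */
static int
toy_clockintr(void *arg)
{
	struct intrframe **seen = arg;

	*seen = toy_curcpu()->ci_iframe;
	return 1;
}

/* Outer handler that gets interrupted by the clock. */
static int
toy_outer(void *arg)
{
	struct toy_probe *p = arg;

	toy_intr_enter(p->inner_frame, toy_clockintr, &p->seen_inner);
	p->seen_after = toy_curcpu()->ci_iframe;
	return 1;
}
```

 In this toy the nested clock handler observes the inner frame, and the
 outer handler sees its own frame again once the nested call returns.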

From: "John D. Baker" <jdbaker@consolidated.net>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-i386/57197 (GENERIC kernel crash on pentium-III and earlier
 CPUs)
Date: Thu, 30 Mar 2023 18:48:48 -0500 (CDT)

 On Wed, 25 Jan 2023, riastradh@NetBSD.org wrote:

 > Synopsis: GENERIC kernel crash on pentium-III and earlier CPUs
 > 
 > State-Changed-From-To: analyzed->needs-pullups
 > State-Changed-By: riastradh@NetBSD.org
 > State-Changed-When: Wed, 25 Jan 2023 23:49:18 +0000
 > State-Changed-Why:
 > XXX pullup-10

 To date, no pullup request has been submitted for this PR.  As such I'm
 still carrying locally the previously-posted patches so that my lesser
 i386 boxen can boot NetBSD-10.


State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Fri, 31 Mar 2023 06:19:43 +0000
State-Changed-Why:
pullup-10 #136 https://releng.netbsd.org/cgi-bin/req-10.cgi?show=136


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57197 CVS commit: [netbsd-10] src/sys/arch/x86
Date: Sat, 1 Apr 2023 15:11:00 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sat Apr  1 15:11:00 UTC 2023

 Modified Files:
 	src/sys/arch/x86/isa [netbsd-10]: clock.c
 	src/sys/arch/x86/x86 [netbsd-10]: intr.c
 Added Files:
 	src/sys/arch/x86/include [netbsd-10]: intr_private.h

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #136):

 	sys/arch/x86/x86/intr.c: revision 1.164
 	sys/arch/x86/isa/clock.c: revision 1.41
 	sys/arch/x86/include/intr_private.h: revision 1.1

 x86/intr: Work around sleazy clockintr with a secret frame argument.
 PR kern/57197


 To generate a diff of this commit:
 cvs rdiff -u -r0 -r1.1.2.2 src/sys/arch/x86/include/intr_private.h
 cvs rdiff -u -r1.39 -r1.39.20.1 src/sys/arch/x86/isa/clock.c
 cvs rdiff -u -r1.163 -r1.163.2.1 src/sys/arch/x86/x86/intr.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 02 Aug 2023 13:51:03 +0000
State-Changed-Why:
fixed in HEAD and pulled up to 10
problem new since 9, no need for pullups to <=9


>Unformatted:
