NetBSD Problem Report #57197
From jdbaker@consolidated.net Tue Jan 24 04:08:11 2023
Return-Path: <jdbaker@consolidated.net>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id EF47D1A9239
for <gnats-bugs@gnats.NetBSD.org>; Tue, 24 Jan 2023 04:08:10 +0000 (UTC)
Message-Id: <20230124040234.706CA13E11B@spike.technoskunk.fur>
Date: Mon, 23 Jan 2023 22:02:34 -0600 (CST)
From: jdbaker@consolidated.net
Reply-To: jdbaker@consolidated.net
To: gnats-bugs@NetBSD.org
Subject: GENERIC kernel crash on pentium-III and earlier CPUs
X-Send-Pr-Version: 3.95
>Number: 57197
>Category: port-i386
>Synopsis: GENERIC kernel crash on pentium-III and earlier CPUs
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: port-i386-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Jan 24 04:10:00 +0000 2023
>Closed-Date: Wed Aug 02 13:51:03 +0000 2023
>Last-Modified: Wed Aug 02 13:51:03 +0000 2023
>Originator: John D. Baker
>Release: NetBSD 10.0_BETA
>Organization:
>Environment:
NetBSD 10.0_BETA (PLEXOR) #5: Sun Jan 22 18:35:43 CST 2023 sysop@plex760.technoskunk.fur:/r0/build/netbsd-10/obj/i386/sys/arch/i386/compile/PLEXOR
Architecture: i386
Machine: i386
>Description:
Booting the GENERIC kernel (or one which includes the GENERIC config)
on a system with a pentium-III or lesser CPU (VIA Samuel, Am5x86) crashes
as follows:
pentium-III:
[ 1.0000000] NetBSD 10.0_BETA (PLEXOR) #4: Wed Jan 18 21:10:13 CST 2023
[ 1.0000000] sysop@plex760.technoskunk.fur:/r0/build/netbsd-10/obj/i386/sys/arch/i386/compile/PLEXOR
[ 1.0000000] total memory = 510 MB
[ 1.0000000] avail memory = 488 MB
[...]
[ 1.0000030] cpu0 at mainbus0
[ 1.0000030] cpu0: Intel 686-class, 936MHz, id 0x68a
[ 1.0000030] cpu0: node 0, package 0, core 0, smt 0
[...]
[ 30.4197804] fatal page fault in supervisor mode
[ 30.4197804] trap type 6 code 0 eip 0xc0617718 cs 0x8 eflags 0x10246 cr2 0x1000003c ilevel 0x7 esp 0xc0a29500
[ 30.4197804] curlwp 0xc16ac040 pid 0 lid 2 lowest kstack 0xd7c902c0
kernel: supervisor trap page fault, code=0
Stopped in pid 0.2 (system) at netbsd:hardclock+0x23: movl 3c(%esi),%eax
db{0}> bt
hardclock(10000000,d7a92ec4,c02864a1,0,10000000,c068d9f3,c027fc14,16b6000,7200,d7f375f0) at netbsd:hardclock+0x23
clockintr(0,10000000,c068d9f3,c027fc14,16b6000,7200,d7f375f0,0,c1761000,c010313a) at netbsd:clockintr+0x36
intr_kdtrace_wrapper(c1990540,d7a92ed4,6,d7a90010,c0620030,c16b0010,d7a90010,c16ac040,1,d7a92f34) at netbsd:intr_kdtrace_wrapper+0x21
Xintr_legacy0() at netbsd:Xintr_legacy0+0xda
--- interrupt ---
cx8_spllower(1,0,d7a92f6c,c02864a1,c186c800,c1761000,c1990540,7,0,c186ba00) at netbsd:cx8_spllower+0x14
intr_biglock_wrapper(c186be80,d7c92f10,0,0,0,0,0,0,0,0) at netbsd:intr_biglock_wrapper+0x68
--- switch to interrupt stack ---
Xintr_legacy5() at netbsd:Xintr_legacy5+0xda
--- interrupt ---
x86_stihlt(c16ac040,0,c0637e70,0,0,c168f100,c0a10100,c168f100,d7c90000,c168f100) at netbsd:x86_stihlt+0x5
idle_loop(c16ac040,cbb000,cc4000,0,c01005a8,0,0,0,0,0) at netbsd:idle_loop+0x153
Am5x86:
[ 1.0000000] NetBSD 10.0_BETA (GENERIC) #4: Wed Jan 18 20:46:54 CST 2023
[ 1.0000000] sysop@plex760.technoskunk.fur:/r0/build/netbsd-10/obj/i386/sys/arch/i386/compile/GENERIC
[ 1.0000000] total memory = 65148 KB
[ 1.0000000] avail memory = 38620 KB
[...]
[ 1.0000040] cpu0: AMD 486-class, id 0x4f4
[ 1.0000040] cpu0: node 0, package 0, core 0, smt 0
[...]
[ 1.0000040] fatal page fault in supervisor mode
[ 1.0000040] trap type 6 code 0 eip 0xc0d3d7d8 cs 0xc57b0008 eflags 0x10246 cr2 0x3c ilevel 0x7 esp 0
[ 1.0000040] curlwp 0xc165a840 pid 0 lid 0 lowest kstack 0xc19f32c0
kernel: supervisor trap page fault, code=0
Stopped in pid 0.0 (system) at netbsd:hardclock+0x23: movl 3c(%esi),%eax
db{0}> bt
hardclock(0,0,c57bff6c,c04ac8f1,0,0,0,0,0,0) at netbsd:hardclock+0x23
clockintr(0,0,0,0,0,0,0,0,c1c72000,c010322a) at netbsd:clockintr+0x2a
intr_kdtrace_wrapper(c1c33b80,c19f5d9c,0,0,0,0,0,0,0,0) at netbsd:intr_kdtrace_wrapper+0x21
--- switch to interrupt stack ---
Xintr_legacy0() at netbsd:Xintr_legacy0+0xda
--- interrupt ---
outb(c16260c0,c1623f80,0,20,1,0,0,c16c5a80,c19f5e94,0) at netbsd:outb+0x9
intr_establish_xname(0,c16260c0,0,1,7,c04c96b5,0,0,c134f916,0) at netbsd:intr_establish_xname+0x2ba
isa_intr_establish_xname(0,0,1,7,c04c96b5,0,c134f916,c19f5f14,c04c9baf,0) at netbsd:isa_intr_establish_xname+0x91
isa_intr_establish(0,0,1,7,c04c96b5,0,c19f5f60,c0d3d19a,c04b6858,1000) at netbsd:isa_intr_establish+0x3c
i8254_initclocks(c04b6858,1000,3,c11b0770,c6020000,c601f000,c1670b40,0,c19f5f60,c0e5f527) at netbsd:i8254_initclocks+0x3a
initclocks(3,0,64,0,0,0,0,0,2a6a000,0) at netbsd:initclocks+0x1c
main(0,0,0,0,0,0,0,0,0,0) at netbsd:main+0x365
VIA "Samuel":
[ 1.0000000] NetBSD 10.0_BETA (NEOWARE) #4: Wed Jan 18 21:19:15 CST 2023
[ 1.0000000] sysop@plex760.technoskunk.fur:/r0/build/netbsd-10/obj/i386/sys/arch/i386/compile/NEOWARE
[ 1.0000000] total memory = 959 MB
[ 1.0000000] avail memory = 930 MB
[...]
[ 1.0000040] cpu0 at mainbus0
[ 1.0000040] cpu0: VIA Samuel 2, id 0x673
[ 1.0000040] cpu0: node 0, package 0, core 0, smt 0
[...]
[ 1.0011329] fatal page fault in supervisor mode
[ 1.0011329] trap type 6 code 0 eip 0xc055c308 cs 0x8 eflags 0x10246 cr2 0x3c ilevel 0x7 esp 0x6
[ 1.0011329] curlwp 0xc095f140 pid 0 lid 0 lowest kstack 0xc0bf92c0
kernel: supervisor trap page fault, code=0
Stopped in pid 0.0 (system) at netbsd:hardclock+0x23: movl 3c(%esi),%eax
db{0}> bt
hardclock(0,d95d2f6c,c022c031,0,0,0,0,0,0,0) at netbsd:hardclock+0x23
clockintr(0,0,0,0,0,0,0,0,c1ecc000,c010313a) at netbsd:clockintr+0x36
intr_kdtrace_wrapper(c213c500,c0bfbd9c,0,0,0,0,0,0,0,0) at netbsd:intr_kdtrace_wrapper+0x21
--- switch to interrupt stack ---
Xintr_legacy0() at netbsd:Xintr_legacy0+0xda
--- interrupt ---
outb(c0955480,c0953540,0,20,1,0,0,c099ffc0,c0bfbe94,0) at netbsd:outb+0x9
intr_establish_xname(0,c0955480,0,1,7,c0248305,0,0,c0813d2b,0) at netbsd:intr_establish_xname+0x2ba
isa_intr_establish_xname(0,0,1,7,c0248305,0,c0813d2b,c0bfbf14,c02487cc,0) at netbsd:isa_intr_establish_xname+0x91
isa_intr_establish(0,0,1,7,c0248305,0,c0bfbf60,c055bcca,c0235c38,1000) at netbsd:isa_intr_establish+0x3c
i8254_initclocks(c0235c38,1000,3,c0795264,da280000,da27f000,bfe8,c0bfbf60,c05c0824,c06797d7) at netbsd:i8254_initclocks+0x3a
initclocks(3,5,64,0,0,0,0,0,16800000,0) at netbsd:initclocks+0x1c
main(0,0,0,0,0,0,0,0,0,0) at netbsd:main+0x365
db{0}>
After bisecting the source, the fault was introduced with:
src/sys/arch/x86/x86/intr.c r1.163
Pentium-4 and later CPUs appear to be unaffected.
Curiously, on the Am5x86, the NET4501 kernel (or one derived from it)
boots without issues.
The added code between r1.162 and r1.163 looks innocuous enough, but
it apparently trips up these older/lesser CPUs.
>How-To-Repeat:
Boot GENERIC (or GENERIC-derived) kernel on system with pentium-III or
earlier CPU.
>Fix:
Workaround: revert "src/sys/arch/x86/x86/intr.c" to r1.162
>Release-Note:
>Audit-Trail:
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@netbsd.org
Cc: port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: re: port-i386/57197: GENERIC kernel crash on pentium-III and earlier CPUs
Date: Tue, 24 Jan 2023 19:47:47 +1100
> [ 1.0000030] cpu0: Intel 686-class, 936MHz, id 0x68a
> [ 1.0000030] cpu0: node 0, package 0, core 0, smt 0
> [...]
> [ 30.4197804] fatal page fault in supervisor mode
> [ 30.4197804] trap type 6 code 0 eip 0xc0617718 cs 0x8 eflags 0x10246 cr2
can you provide a little more of the dmesg above the 'fatal page fault'
for all instances? ie, what was the previous message / what was going
on in the system at that point.
some of them fault in the interrupt handler, but some are faulting when
an interrupt is being established.
> intr_biglock_wrapper(c186be80,d7c92f10,0,0,0,0,0,0,0,0) at netbsd:intr_biglock_wrapper+0x68
> --- switch to interrupt stack ---
> Xintr_legacy5() at netbsd:Xintr_legacy5+0xda
> --- interrupt ---
> x86_stihlt(c16ac040,0,c0637e70,0,0,c168f100,c0a10100,c168f100,d7c90000,c168f100) at netbsd:x86_stihlt+0x5
> idle_loop(c16ac040,cbb000,cc4000,0,c01005a8,0,0,0,0,0) at netbsd:idle_loop+0x153
panicked after taking an interrupt.
> hardclock(0,0,c57bff6c,c04ac8f1,0,0,0,0,0,0) at netbsd:hardclock+0x23
> clockintr(0,0,0,0,0,0,0,0,c1c72000,c010322a) at netbsd:clockintr+0x2a
> intr_kdtrace_wrapper(c1c33b80,c19f5d9c,0,0,0,0,0,0,0,0) at netbsd:intr_kdtrace_wrapper+0x21
> --- switch to interrupt stack ---
> Xintr_legacy0() at netbsd:Xintr_legacy0+0xda
> --- interrupt ---
> outb(c16260c0,c1623f80,0,20,1,0,0,c16c5a80,c19f5e94,0) at netbsd:outb+0x9
> intr_establish_xname(0,c16260c0,0,1,7,c04c96b5,0,0,c134f916,0) at netbsd:intr_establish_xname+0x2ba
this time, taking an interrupt while setting one up.
> Xintr_legacy0() at netbsd:Xintr_legacy0+0xda
> --- interrupt ---
> outb(c0955480,c0953540,0,20,1,0,0,c099ffc0,c0bfbe94,0) at netbsd:outb+0x9
> intr_establish_xname(0,c0955480,0,1,7,c0248305,0,0,c0813d2b,0) at netbsd:intr_establish_xname+0x2ba
as above.
i can't tell where the "outb" call in intr_establish_xname() comes
from. in my GENERIC, this ends up being:
0xc04b0acd <+693>: call 0xc0f86a47 <__x86_indirect_thunk_eax>
>> 0xc04b0ad2 <+698>: mov %fs:0x304,%edx
0xc04b0ad9 <+705>: mov -0x5c(%ebp),%eax
0xc04b0adc <+708>: cmp %edx,%eax
0xc04b0ade <+710>: je 0xc04b0aed <intr_establish_xname+725>
is intr_establish_xname+0x2ba(). this is x86_curcpu() it seems:
(gdb) l *(intr_establish_xname+0x2ba)
0xc04b0ad2 is in intr_establish_xname (./machine/cpu.h:53).
48 __inline __always_inline static struct cpu_info * __unused
...
53 __asm volatile("movl %%fs:%1, %0" :
54 "=r" (ci) :
55 "m"
56 (*(struct cpu_info * const *)offsetof(struct cpu_info, ci_self)));
this turns out, for me, to be here:
(gdb) l *(intr_establish_xname+708)
...
969 /* All set up, so add a route for the interrupt and unmask it. */
970 (*pic->pic_addroute)(pic, ci, pin, idt_vec, type);
971 if (ci == curcpu() || !mp_online) {
972 intr_hwunmask_xcall(ih, NULL);
John, can you check your netbsd.gdb and see if you can confirm the
same addresses, or point out where yours are?
From: Taylor R Campbell <riastradh@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, jdbaker@consolidated.net
Subject: Re: port-i386/57197: GENERIC kernel crash on pentium-III and earlier CPUs
Date: Tue, 24 Jan 2023 09:46:44 +0000
This is the same problem as:
https://mail-index.netbsd.org/tech-kern/2022/12/10/msg028574.html
The legacy clockintr has a magic extra argument that intr_establish
and intr_kdtrace_wrapper (and intr_biglock_wrapper) don't know about,
requiring a sketchy function pointer cast (ding ding ding, alarm bell)
to even pass to intr_establish, and intr_kdtrace_wrapper causes it to
fail.
The real fix is to change the bogus magic extra argument, but as a
workaround we could pass through a flag to avoid intr_kdtrace_wrapper.
Downside is it will not be possible to dtrace these interrupts, but
upside is it doesn't require summoning Cthulhu to aid in legacy ISA
clock surgery.
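A simplified user-space model of the failure and of the proposed workaround may help. This is only a sketch, not the kernel code: the names prefixed with model_ are made up for illustration, every handler is given a two-argument type so that the model stays well-defined C, and the frame argument that the real wrapper fails to forward is modeled as NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real trap frame; only its address matters here. */
struct intrframe { int if_dummy; };

/*
 * Model assumption: the low-level stub always calls with (arg, frame).
 * Ordinary handlers ignore the second argument; the clock handler uses it.
 */
typedef int (*handler_t)(void *, struct intrframe *);

struct intrhand {
	handler_t ih_fun;	/* what the stub calls */
	void *ih_arg;
	handler_t ih_realfun;	/* what a wrapper forwards to */
	void *ih_realarg;
};

/* Clock handler: returns 1 if it saw a usable frame, 0 if it was lost. */
static int
model_clockintr(void *arg, struct intrframe *frame)
{
	(void)arg;
	return frame != NULL;
}

/*
 * Wrapper in the style of intr_kdtrace_wrapper: it only knows the
 * one-argument signature, so the frame is not forwarded.  The lost
 * argument is modeled as NULL.
 */
static int
model_wrapper(void *arg, struct intrframe *frame)
{
	struct intrhand *ih = arg;

	(void)frame;
	return ih->ih_realfun(ih->ih_realarg, NULL);
}

/* Establish a handler; skip_clock selects the workaround. */
static void
model_establish(struct intrhand *ih, handler_t handler, void *arg,
    int skip_clock)
{
	assert(handler != NULL);
	ih->ih_realfun = handler;
	ih->ih_realarg = arg;
	ih->ih_fun = handler;
	ih->ih_arg = arg;
	if (!skip_clock || handler != model_clockintr) {
		ih->ih_fun = model_wrapper;
		ih->ih_arg = ih;
	}
}

/* What the stub does when the interrupt fires. */
static int
model_dispatch(struct intrhand *ih, struct intrframe *frame)
{
	return ih->ih_fun(ih->ih_arg, frame);
}
```

With skip_clock set to 0 the clock handler is reached through the wrapper and sees no frame; with skip_clock set to 1 it is called directly and gets the frame the stub passed in.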
State-Changed-From-To: open->analyzed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Tue, 24 Jan 2023 09:50:52 +0000
State-Changed-Why:
uwe analyzed this last month
From: Taylor R Campbell <riastradh@NetBSD.org>
To: jdbaker@consolidated.net, uwe@NetBSD.org
Cc: gnats-bugs@netbsd.org, port-i386-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: port-i386/57197: GENERIC kernel crash on pentium-III and earlier CPUs
Date: Tue, 24 Jan 2023 10:57:49 +0000
This is a multi-part message in MIME format.
--=_La7nkjIJilZyzQ04m7iYdJZjSuvrF8DG
Can you try the attached patch?
--=_La7nkjIJilZyzQ04m7iYdJZjSuvrF8DG
Content-Type: text/plain; charset="ISO-8859-1"; name="x86clockintr"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="x86clockintr.patch"
From e20d2a498a991899ab794174c5fcae888cbc84a4 Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh@NetBSD.org>
Date: Tue, 24 Jan 2023 10:00:45 +0000
Subject: [PATCH] x86/intr: Work around sleazy clockintr with a secret frame
argument.
PR kern/57197
---
sys/arch/x86/include/intr_private.h | 39 +++++++++++++++++++++++++++++
sys/arch/x86/isa/clock.c | 14 +++++++----
sys/arch/x86/x86/intr.c | 16 ++++++++++--
3 files changed, 62 insertions(+), 7 deletions(-)
create mode 100644 sys/arch/x86/include/intr_private.h
diff --git a/sys/arch/x86/include/intr_private.h b/sys/arch/x86/include/intr_private.h
new file mode 100644
index 000000000000..183e904a7dba
--- /dev/null
+++ b/sys/arch/x86/include/intr_private.h
@@ -0,0 +1,39 @@
+/* $NetBSD$ */
+
+/*-
+ * Copyright (c) 2023 The NetBSD Foundation, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+ * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _X86_INTR_PRIVATE_H_
+#define _X86_INTR_PRIVATE_H_
+
+/*
+ * XXX This is a horrible kludge to let intr_establish_xname detect
+ * when it needs to handle a sleazy extra argument to the interrupt
+ * handler that's not part of the normal interrupt handler signature.
+ */
+int i8254_clockintr(void *, struct intrframe *);
+
+#endif /* _X86_INTR_PRIVATE_H_ */
diff --git a/sys/arch/x86/isa/clock.c b/sys/arch/x86/isa/clock.c
index c50704cd13a8..399bbd46c6ed 100644
--- a/sys/arch/x86/isa/clock.c
+++ b/sys/arch/x86/isa/clock.c
@@ -152,6 +152,7 @@ __KERNEL_RCSID(0, "$NetBSD: clock.c,v 1.39 2020/05/29 12:30:41 rin Exp $");
#include <x86/lock.h>
#include <machine/specialreg.h>
#include <x86/rtc.h>
+#include <x86/intr_private.h>

#ifndef __x86_64__
#include "mca.h"
@@ -188,8 +189,6 @@ void (*x86_delay)(unsigned int) = i8254_delay;
void sysbeep(int, int);
static void tickle_tc(void);

-static int clockintr(void *, struct intrframe *);
-
int sysbeepdetach(device_t, int);

static unsigned int gettick_broken_latch(void);
@@ -371,8 +370,8 @@ tickle_tc(void)

}

-static int
-clockintr(void *arg, struct intrframe *frame)
+int
+i8254_clockintr(void *arg, struct intrframe *frame)
{
tickle_tc();

@@ -555,9 +554,14 @@ i8254_initclocks(void)
/*
* XXX If you're doing strange things with multiple clocks, you might
* want to keep track of clock handlers.
+ *
+ * XXX This is an abuse of the interrupt handler signature with
+ * __FPTRCAST which requires a special case for IPL_CLOCK in
+ * intr_establish_xname. Please fix this nonsense! See also
+ * the comments about i8254_clockintr in x86/x86/intr.c.
*/
(void)isa_intr_establish(NULL, 0, IST_PULSE, IPL_CLOCK,
- __FPTRCAST(int (*)(void *), clockintr), 0);
+ __FPTRCAST(int (*)(void *), i8254_clockintr), 0);
}

void
diff --git a/sys/arch/x86/x86/intr.c b/sys/arch/x86/x86/intr.c
index 5bde34cf7514..54474897377a 100644
--- a/sys/arch/x86/x86/intr.c
+++ b/sys/arch/x86/x86/intr.c
@@ -162,6 +162,8 @@ __KERNEL_RCSID(0, "$NetBSD: intr.c,v 1.163 2022/10/29 13:59:04 riastradh Exp $")
#include <machine/i8259.h>
#include <machine/pio.h>

+#include <x86/intr_private.h>
+
#include "ioapic.h"
#include "lapic.h"
#include "pci.h"
@@ -944,11 +946,21 @@ intr_establish_xname(int legacy_irq, struct pic *pic, int pin, int type,
ih->ih_slot = slot;
strlcpy(ih->ih_xname, xname, sizeof(ih->ih_xname));
#ifdef KDTRACE_HOOKS
- ih->ih_fun = intr_kdtrace_wrapper;
- ih->ih_arg = ih;
+ /*
+ * XXX i8254_clockintr is special -- takes a magic extra
+ * argument. This should be fixed properly in some way that
+ * doesn't involve sketchy function pointer casts. See also
+ * the comments in x86/isa/clock.c.
+ */
+ if (handler != __FPTRCAST(int (*)(void *), i8254_clockintr)) {
+ ih->ih_fun = intr_kdtrace_wrapper;
+ ih->ih_arg = ih;
+ }
#endif
#ifdef MULTIPROCESSOR
if (!mpsafe) {
+ KASSERT(handler != /* XXX */
+ __FPTRCAST(int (*)(void *), i8254_clockintr));
ih->ih_fun = intr_biglock_wrapper;
ih->ih_arg =3D ih;
}
--=_La7nkjIJilZyzQ04m7iYdJZjSuvrF8DG--
From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-i386/57197: GENERIC kernel crash on pentium-III and earlier
CPUs
Date: Tue, 24 Jan 2023 14:30:37 +0300
LGTM
-uwe
From: "John D. Baker" <jdbaker@consolidated.net>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-i386/57197: GENERIC kernel crash on pentium-III and earlier
CPUs
Date: Tue, 24 Jan 2023 19:15:13 -0600 (CST)
On Tue, 24 Jan 2023, matthew green wrote:
> can you provide a little more of the dmesg above the 'fatal page fault'
> for all instances? ie, what was the previous message / what was going
> on the system right now.
>
> some of the fault in the interrupt handler, but some are faulting when
> an interrupt is being established.
This time booting the same GENERIC kernel on all test systems:
Am5x86 w/more context:
[ 1.0000000] NetBSD 10.0_BETA (GENERIC) #5: Tue Jan 24 09:10:19 CST 2023
[ 1.0000000] sysop@plex760.technoskunk.fur:/r0/build/netbsd-10/obj/i386/sys/arch/i386/compile/GENERIC
[ 1.0000000] total memory = 65148 KB
[ 1.0000000] avail memory = 38620 KB
[...]
[ 1.0000040] cpu0: AMD 486-class, id 0x4f4
[ 1.0000040] cpu0: node 0, package 0, core 0, smt 0
[...]
[ 1.0000040] atabus0 at wdc0 channel 0
[ 1.0000040] pcppi0 at isa0 port 0x61
[ 1.0000040] midi0 at pcppi0: PC speaker
[ 1.0000040] sysbeep0 at pcppi0
[ 1.0000040] isapnp0 at isa0 port 0x279
[ 1.0000040] attimer0: attached to pcppi0
[ 1.0000040] WARNING: system needs entropy for security; see entropy(7)
[ 1.0000040] fatal page fault in supervisor mode
[ 1.0000040] trap type 6 code 0 eip 0xc0d3d7d8 cs 0xc57b0008 eflags 0x10246 cr2 0x3c ilevel 0x7 esp 0
[ 1.0000040] curlwp 0xc165a840 pid 0 lid 0 lowest kstack 0xc19f32c0
kernel: supervisor trap page fault, code=0
Stopped in pid 0.0 (system) at netbsd:hardclock+0x23: movl 3c(%esi),%eax
db{0}> bt
hardclock(0,0,c57bff6c,c04ac8f1,0,0,0,0,0,0) at netbsd:hardclock+0x23
clockintr(0,0,0,0,0,0,0,0,c1c72000,c010322a) at netbsd:clockintr+0x2a
intr_kdtrace_wrapper(c1c33b80,c19f5d9c,0,0,0,0,0,0,0,0) at netbsd:intr_kdtrace_wrapper+0x21
--- switch to interrupt stack ---
Xintr_legacy0() at netbsd:Xintr_legacy0+0xda
--- interrupt ---
outb(c16260c0,c1623f80,0,20,1,0,0,c16c5a80,c19f5e94,0) at netbsd:outb+0x9
intr_establish_xname(0,c16260c0,0,1,7,c04c96b5,0,0,c134f91a,0) at netbsd:intr_establish_xname+0x2ba
isa_intr_establish_xname(0,0,1,7,c04c96b5,0,c134f91a,c19f5f14,c04c9baf,0) at netbsd:isa_intr_establish_xname+0x91
isa_intr_establish(0,0,1,7,c04c96b5,0,c19f5f60,c0d3d19a,c04b6858,1000) at netbsd:isa_intr_establish+0x3c
i8254_initclocks(c04b6858,1000,3,c11b0770,c6020000,c601f000,c1670b40,0,c19f5f60,c0e5f527) at netbsd:i8254_initclocks+0x3a
initclocks(3,0,64,0,0,0,0,0,2a6a000,0) at netbsd:initclocks+0x1c
main(0,0,0,0,0,0,0,0,0,0) at netbsd:main+0x365
db{0}>
pentium-III w/more context:
[ 1.0000000] NetBSD 10.0_BETA (GENERIC) #5: Tue Jan 24 09:10:19 CST 2023
[ 1.0000000] sysop@plex760.technoskunk.fur:/r0/build/netbsd-10/obj/i386/sys/arch/i386/compile/GENERIC
[ 1.0000000] total memory = 510 MB
[ 1.0000000] avail memory = 475 MB
[...]
[ 1.0000040] cpu0 at mainbus0
[ 1.0000040] cpu0: Intel 686-class, 937MHz, id 0x68a
[ 1.0000040] cpu0: node 0, package 0, core 0, smt 0
[...]
[ 1.0044400] isa0 at ichlpcib0
[ 1.0044400] lpt1 at isa0 port 0x278-0x27b irq : polled
[ 1.0044400] com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 16-byte FIFO
[ 1.0044400] com0: console
[ 1.0044400] com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, 16-byte FIFO
[ 1.0044400] fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
[ 1.0044400] fwohci0: BUS reset
[ 1.0044400] fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
[ 1.0044400] ieee1394if0: 1 nodes, maxhop <= 0 cable IRM irm(0) (me)
[ 1.0044400] ieee1394if0: bus manager 0
[ 1.0044400] WARNING: system needs entropy for security; see entropy(7)
[ 1.0044400] fatal page fault in supervisor mode
[ 1.0044400] trap type 6 code 0 eip 0xc0d3d7d8 cs 0x8 eflags 0x10246 cr2 0x3c ilevel 0x7 esp 0x6
[ 1.0044400] curlwp 0xc165a840 pid 0 lid 0 lowest kstack 0xc1a972c0
kernel: supervisor trap page fault, code=0
Stopped in pid 0.0 (system) at netbsd:hardclock+0x23: movl 3c(%esi),%eax
db{0}> bt
hardclock(0,0,d8830f6c,c04ac8f1,0,0,0,0,0,0) at netbsd:hardclock+0x23
clockintr(0,0,0,0,0,0,0,0,c250a000,c010322a) at netbsd:clockintr+0x2a
intr_kdtrace_wrapper(c2731cc0,c1a99d9c,0,0,0,0,0,0,0,0) at netbsd:intr_kdtrace_wrapper+0x21
--- switch to interrupt stack ---
Xintr_legacy0() at netbsd:Xintr_legacy0+0xda
--- interrupt ---
outb(c16260c0,c1623f80,0,20,1,0,0,c16c5a80,c1a99e94,0) at netbsd:outb+0x9
intr_establish_xname(0,c16260c0,0,1,7,c04c96b5,0,0,c134f91a,0) at netbsd:intr_establish_xname+0x2ba
isa_intr_establish_xname(0,0,1,7,c04c96b5,0,c134f91a,c1a99f14,c04c9baf,0) at netbsd:isa_intr_establish_xname+0x91
isa_intr_establish(0,0,1,7,c04c96b5,0,c1a99f60,c0d3d19a,c04b6858,1000) at netbsd:isa_intr_establish+0x3c
i8254_initclocks(c04b6858,1000,3,c11b0770,d97fd000,d97fc000,c1670b40,0,c1a99f60,c0e5f527) at netbsd:i8254_initclocks+0x3a
initclocks(3,4,64,0,0,0,0,0,15432000,0) at netbsd:initclocks+0x1c
main(0,0,0,0,0,0,0,0,0,0) at netbsd:main+0x365
db{0}>
VIA Samuel 2 (Eden MSP) w/more context:
[ 1.0000000] NetBSD 10.0_BETA (GENERIC) #5: Tue Jan 24 09:10:19 CST 2023
[ 1.0000000] sysop@plex760.technoskunk.fur:/r0/build/netbsd-10/obj/i386/sys/arch/i386/compile/GENERIC
[ 1.0000000] total memory = 1015 MB
[ 1.0000000] avail memory = 971 MB
[...]
[ 1.0000040] cpu0 at mainbus0
[ 1.0000040] cpu0: VIA Samuel 2, id 0x673
[ 1.0000040] cpu0: node 0, package 0, core 0, smt 0
[...]
[ 1.0016861] isa0 at viapcib0
[ 1.0016861] lpt0 at isa0 port 0x378-0x37b irq 7
[ 1.0016861] com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 16-byte FIFO
[ 1.0016861] com0: console
[ 1.0016861] acpicpu0 at cpu0: ACPI CPU
[ 1.0016861] WARNING: system needs entropy for security; see entropy(7)
[ 1.0016861] fatal page fault in supervisor mode
[ 1.0016861] trap type 6 code 0 eip 0xc0d3d7d8 cs 0x8 eflags 0x10246 cr2 0x3c ilevel 0x7 esp 0x6
[ 1.0016861] curlwp 0xc165a840 pid 0 lid 0 lowest kstack 0xc1a972c0
kernel: supervisor trap page fault, code=0
Stopped in pid 0.0 (system) at netbsd:hardclock+0x23: movl 3c(%esi),%eax
db{0}> bt
hardclock(0,0,da534f6c,c04ac8f1,0,0,0,0,0,0) at netbsd:hardclock+0x23
clockintr(0,0,0,0,0,0,0,0,c2e37000,c010322a) at netbsd:clockintr+0x2a
intr_kdtrace_wrapper(c306f500,c1a99d9c,0,0,0,0,0,0,0,0) at netbsd:intr_kdtrace_wrapper+0x21
--- switch to interrupt stack ---
Xintr_legacy0() at netbsd:Xintr_legacy0+0xda
--- interrupt ---
outb(c16260c0,c1623f80,0,20,1,0,0,c16c5a80,c1a99e94,0) at netbsd:outb+0x9
intr_establish_xname(0,c16260c0,0,1,7,c04c96b5,0,0,c134f91a,0) at netbsd:intr_establish_xname+0x2ba
isa_intr_establish_xname(0,0,1,7,c04c96b5,0,c134f91a,c1a99f14,c04c9baf,0) at netbsd:isa_intr_establish_xname+0x91
isa_intr_establish(0,0,1,7,c04c96b5,0,c1a99f60,c0d3d19a,c04b6858,1000) at netbsd:isa_intr_establish+0x3c
i8254_initclocks(c04b6858,1000,3,c11b0770,db1cf000,db1ce000,cb1c,c1a99f60,c0da1cf4,c0e5f527) at netbsd:i8254_initclocks+0x3a
initclocks(3,5,64,0,0,0,0,0,16800000,0) at netbsd:initclocks+0x1c
main(0,0,0,0,0,0,0,0,0,0) at netbsd:main+0x365
db{0}>
--
|/"\ John D. Baker, KN5UKS NetBSD Darwin/MacOS X
|\ / jdbaker[snail]consolidated[flyspeck]net OpenBSD FreeBSD
| X No HTML/proprietary data in email. BSD just sits there and works!
|/ \ GPGkeyID: D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645
From: matthew green <mrg@eterna.com.au>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@netbsd.org, port-i386-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
jdbaker@consolidated.net, uwe@NetBSD.org
Subject: re: port-i386/57197: GENERIC kernel crash on pentium-III and earlier CPUs
Date: Wed, 25 Jan 2023 17:01:58 +1100
be nice to put a little more details in what needs fixing:
arch/i386/i386/vector.S:#define INTRSTUB1(name, num, sub, off, early_ack, late_ack, mask, unmask, level_mask) \
has this:
/* switch stack if necessary, and push a ptr to our intrframe */ \
IDEPTH_INCR
the last part of IDEPTH_INCR being:
999: pushl %eax; /* eax == pointer to intrframe */ \
so it's _this_ that becomes the 2nd arg for clockintr().
.mrg.
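To make the stack picture concrete, here is a toy model of the i386 stack around the handler call. It is only a sketch under stated assumptions: it assumes the stub pushes the handler argument after the intrframe pointer and then calls the handler, and it models whatever the wrapper leaves above its single outgoing argument as one "junk" word. The toy_ helpers and the two second_arg_ functions are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STACK_WORDS 16

/* Toy model of a downward-growing stack of 32-bit words, as on i386. */
struct toy_stack {
	uint32_t words[TOY_STACK_WORDS];
	size_t sp;	/* index of the word on top of the stack */
};

static void
toy_init(struct toy_stack *s)
{
	for (size_t i = 0; i < TOY_STACK_WORDS; i++)
		s->words[i] = 0;
	s->sp = TOY_STACK_WORDS;
}

static void
toy_push(struct toy_stack *s, uint32_t v)
{
	assert(s->sp > 0);
	s->words[--s->sp] = v;
}

/*
 * Just after a call, the return address is on top, so the callee's
 * argument n is the word at sp + 1 + n.
 */
static uint32_t
toy_arg(const struct toy_stack *s, size_t n)
{
	assert(s->sp + 1 + n < TOY_STACK_WORDS);
	return s->words[s->sp + 1 + n];
}

/* Stub calls the handler directly: push frame, push arg, call. */
static uint32_t
second_arg_direct(uint32_t frame, uint32_t arg)
{
	struct toy_stack s;

	toy_init(&s);
	toy_push(&s, frame);	/* IDEPTH_INCR: pushl %eax */
	toy_push(&s, arg);	/* assumed: stub pushes the handler argument */
	toy_push(&s, 0);	/* return address pushed by the call */
	return toy_arg(&s, 1);
}

/* Stub calls a one-argument wrapper, which then calls the real handler. */
static uint32_t
second_arg_via_wrapper(uint32_t frame, uint32_t ih, uint32_t arg,
    uint32_t junk)
{
	struct toy_stack s;

	toy_init(&s);
	toy_push(&s, frame);
	toy_push(&s, ih);	/* the wrapper's only argument */
	toy_push(&s, 0);	/* return address into the stub */
	toy_push(&s, junk);	/* wrapper's own saved register or local */
	toy_push(&s, arg);	/* wrapper passes one argument only */
	toy_push(&s, 0);	/* return address into the wrapper */
	return toy_arg(&s, 1);
}
```

In the direct case the handler's second argument is the frame pointer. Through the wrapper it is whatever word happens to sit above the real argument, which would be consistent with the faulting address being that value plus 0x3c in the backtraces.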
From: "John D. Baker" <jdbaker@consolidated.net>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-i386/57197: GENERIC kernel crash on pentium-III and earlier
CPUs
Date: Wed, 25 Jan 2023 09:32:39 -0600 (CST)
On Tue, 24 Jan 2023, Taylor R Campbell wrote:
> Can you try the attached patch?
With the patch applied, the resulting GENERIC boots successfully on all
the test systems. The net4501 (Am5x86) hung on shutdown, but that's
likely due to memory shortage since GENERIC consumes half of available
memory.
I tested against 10.0_BETA. I'll re-check against -current although
they should be the same at this point.
--
|/"\ John D. Baker, KN5UKS NetBSD Darwin/MacOS X
|\ / jdbaker[snail]consolidated[flyspeck]net OpenBSD FreeBSD
| X No HTML/proprietary data in email. BSD just sits there and works!
|/ \ GPGkeyID: D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57197 CVS commit: src/sys/arch/x86
Date: Wed, 25 Jan 2023 15:54:53 +0000
Module Name: src
Committed By: riastradh
Date: Wed Jan 25 15:54:53 UTC 2023
Modified Files:
src/sys/arch/x86/isa: clock.c
src/sys/arch/x86/x86: intr.c
Added Files:
src/sys/arch/x86/include: intr_private.h
Log Message:
x86/intr: Work around sleazy clockintr with a secret frame argument.
PR kern/57197
To generate a diff of this commit:
cvs rdiff -u -r0 -r1.1 src/sys/arch/x86/include/intr_private.h
cvs rdiff -u -r1.40 -r1.41 src/sys/arch/x86/isa/clock.c
cvs rdiff -u -r1.163 -r1.164 src/sys/arch/x86/x86/intr.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: analyzed->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 25 Jan 2023 23:49:18 +0000
State-Changed-Why:
XXX pullup-10
From: Taylor R Campbell <riastradh@NetBSD.org>
To: matthew green <mrg@eterna.com.au>
Cc: gnats-bugs@netbsd.org, port-i386-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
jdbaker@consolidated.net, uwe@NetBSD.org
Subject: Re: port-i386/57197: GENERIC kernel crash on pentium-III and earlier CPUs
Date: Wed, 25 Jan 2023 23:49:56 +0000
> Date: Wed, 25 Jan 2023 17:01:58 +1100
> from: matthew green <mrg@eterna.com.au>
>
> be nice to put a little more details in what needs fixing:
>
> arch/i386/i386/vector.S:#define INTRSTUB1(name, num, sub, off, early_ack, late_ack, mask, unmask, level_mask) \
>
> has this:
>
> /* switch stack if necessary, and push a ptr to our intrframe */ \
> IDEPTH_INCR
>
> the last part of IDEPTH_INCR being:
>
> 999: pushl %eax; /* eax == pointer to intrframe */ \
>
> so it's _this_ that becomes the 2nd arg for clockintr().
Tempted to say there should be a struct cpu_info::ci_iframe just like
ci_idepth, and when an interrupt handler is interrupted, it should
just be saved on the stack (perhaps in the same stack slot!) and
restored on return.
That way, i8254_clockintr could just do curcpu()->ci_iframe instead of
these horrible function pointer casts.
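A minimal user-space sketch of that idea follows. It is purely hypothetical: ci_iframe does not exist in the tree discussed here, the model_ functions stand in for what the interrupt entry and exit paths would have to do, and the idle value of -1 for ci_idepth is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real trap frame; only its address matters here. */
struct intrframe { int if_dummy; };

/* Hypothetical per-CPU state: ci_iframe is the proposed new member. */
struct cpu_info {
	int ci_idepth;			/* nesting depth, -1 when idle (assumed) */
	struct intrframe *ci_iframe;	/* frame of the innermost interrupt */
};

static struct cpu_info cpu0 = { -1, NULL };

static struct cpu_info *
model_curcpu(void)
{
	return &cpu0;
}

/*
 * Interrupt entry: publish the new frame and hand back the previous
 * one, which the caller keeps in its own stack slot.
 */
static struct intrframe *
model_intr_enter(struct intrframe *frame)
{
	struct cpu_info *ci = model_curcpu();
	struct intrframe *saved = ci->ci_iframe;

	ci->ci_idepth++;
	ci->ci_iframe = frame;
	return saved;
}

/* Interrupt exit: restore the frame of the interrupted handler, if any. */
static void
model_intr_exit(struct intrframe *saved)
{
	struct cpu_info *ci = model_curcpu();

	assert(ci->ci_idepth >= 0);
	ci->ci_iframe = saved;
	ci->ci_idepth--;
}

/* Clock handler with the ordinary one-argument signature. */
static int
model_clockintr(void *arg)
{
	(void)arg;
	return model_curcpu()->ci_iframe != NULL;
}
```

Since the handler then has the normal int (*)(void *) signature, a wrapper that forwards a single argument would no longer lose anything, and no function pointer cast would be needed to establish it.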
From: "John D. Baker" <jdbaker@consolidated.net>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-i386/57197 (GENERIC kernel crash on pentium-III and earlier
CPUs)
Date: Thu, 30 Mar 2023 18:48:48 -0500 (CDT)
On Wed, 25 Jan 2023, riastradh@NetBSD.org wrote:
> Synopsis: GENERIC kernel crash on pentium-III and earlier CPUs
>
> State-Changed-From-To: analyzed->needs-pullups
> State-Changed-By: riastradh@NetBSD.org
> State-Changed-When: Wed, 25 Jan 2023 23:49:18 +0000
> State-Changed-Why:
> XXX pullup-10
To date, no pullup request has been submitted for this PR. As such I'm
still carrying locally the previously-posted patches so that my lesser
i386 boxen can boot NetBSD-10.
--
|/"\ John D. Baker, KN5UKS NetBSD Darwin/MacOS X
|\ / jdbaker[snail]consolidated[flyspeck]net OpenBSD FreeBSD
| X No HTML/proprietary data in email. BSD just sits there and works!
|/ \ GPGkeyID: D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645
State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Fri, 31 Mar 2023 06:19:43 +0000
State-Changed-Why:
pullup-10 #136 https://releng.netbsd.org/cgi-bin/req-10.cgi?show=136
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57197 CVS commit: [netbsd-10] src/sys/arch/x86
Date: Sat, 1 Apr 2023 15:11:00 +0000
Module Name: src
Committed By: martin
Date: Sat Apr 1 15:11:00 UTC 2023
Modified Files:
src/sys/arch/x86/isa [netbsd-10]: clock.c
src/sys/arch/x86/x86 [netbsd-10]: intr.c
Added Files:
src/sys/arch/x86/include [netbsd-10]: intr_private.h
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #136):
sys/arch/x86/x86/intr.c: revision 1.164
sys/arch/x86/isa/clock.c: revision 1.41
sys/arch/x86/include/intr_private.h: revision 1.1
x86/intr: Work around sleazy clockintr with a secret frame argument.
PR kern/57197
To generate a diff of this commit:
cvs rdiff -u -r0 -r1.1.2.2 src/sys/arch/x86/include/intr_private.h
cvs rdiff -u -r1.39 -r1.39.20.1 src/sys/arch/x86/isa/clock.c
cvs rdiff -u -r1.163 -r1.163.2.1 src/sys/arch/x86/x86/intr.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 02 Aug 2023 13:51:03 +0000
State-Changed-Why:
fixed in HEAD and pulled up to 10
problem new since 9, no need for pullups to <=9
>Unformatted: