NetBSD Problem Report #58936
From www@netbsd.org Thu Dec 26 16:21:03 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits)
client-signature RSA-PSS (2048 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 845251A9238
for <gnats-bugs@gnats.NetBSD.org>; Thu, 26 Dec 2024 16:21:03 +0000 (UTC)
Message-Id: <20241226162102.118D31A923A@mollari.NetBSD.org>
Date: Thu, 26 Dec 2024 16:21:02 +0000 (UTC)
From: als@thangorodrim.ch
Reply-To: als@thangorodrim.ch
To: gnats-bugs@NetBSD.org
Subject: Building lang/swi-prolog-lite on NetBSD 10.1/sparc64 crashes the machine
X-Send-Pr-Version: www-1.0
>Number: 58936
>Category: port-sparc64
>Synopsis: Building lang/swi-prolog-lite on NetBSD 10.1/sparc64 crashes the machine
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: martin
>State: pending-pullups
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Dec 26 16:25:00 +0000 2024
>Closed-Date:
>Last-Modified: Tue Dec 31 01:50:01 +0000 2024
>Originator: Alexander Schreiber
>Release: NetBSD 10.1
>Organization:
not much
>Environment:
Sun Ultra-45 running NetBSD 10.1 sparc64 - unfortunately, the mainboard died a day after this test.
>Description:
Trying to build lang/swi-prolog-lite from pkgsrc on NetBSD/sparc64 crashes the system on some machine types. This was first reported by John Klos in https://mail-index.netbsd.org/port-sparc64/2024/12/18/msg003292.html and replicated by me on a Sun Ultra-45 (single CPU system).
The last output on the ssh session that initiated "make package" was:
[ 81%] Generating lib/clpfdlib.tex
and then the connection eventually dropped.
Console output on the graphic console was:
login: [ 1524.6727578] panic: fault type 104 for invalid va 9097b2684d812b0
[ 1524.6727578] cpu0: Begin traceback...
[ 1524.6727578] cpu0: End traceback...
Stopped in pid 9003.9003 (swipl) at netbsd:cpu_Debugger+0x4: nop
db{0}> bt
panic(1996a28, 68, 9097b2684d812b0, 0 , 59, 59) at netbsd:panic:+0x20
data_access_fault(3727172e0, 68, 1018178, 9097b2684d812b0, 9097b2684d80396, 118018) at netbsd:data_access_fault+0x508
?(9097b2684d80396, 10cafa800, 4fe, 9097b278877ab96, 10184c, 0) at 1010700
copyin_proc(0, 9097b2684d80396, 10cafa800, 500, e0048000,0) at netbsd:copyin_proc+0x24
proc_getauxv(10bc2a440, 372717678, 372717658, 10cafa800, 500, 9097b2684d80396) at netbsd:proc_getauxv+0x74
real_coredump_elf64(10bc34140, 372717910, 1c60a48, a0, 1, 10bc2a440) at netbsd:real_coredump_elf64+0x19c
coredump_elf64(10bc34140, 372717910, 106e8ee80, 0, 4e, 1c70f00) at netbsd:coredump_elf64+0x2c
coredump(10bc34140, 0, 1ccbc00, 0, 109068800, 10bc2a440) at netbsd:coredump+0x4ac
sigexit(10bc34140, b, 180000, 10bc2a440, 106e70100, 1c71080) at netbsd:sigexit+0x27c
postsig(b, 0, 10bc344b8, 372717b70, 10bc2a440, 10bc34140) at netbsd:postsig+0x198
lwp_userret(10bc34140, 1000000, 20000, 100000, 91a002, 10bc2a440) at netbsd:lwp_userret+0x1ac
data_access_fault(372717ed0, 30, 40566bd0, fffffffffffff2b0, ffffffffffffe000, 10bc2a440) at netbsd:data_access_fault+0x470
?(ffffffffffffa0c8, 40787b08, ffffffffffffa0c8, 40788110, 5310d, 0) at 1010700
db{0}>
There was an swipl.core file, but it was empty (ulimit -c was unlimited)
This was also tried on two Sun V100 machines, where it merely resulted in the swipl binary exiting with sig11 at the same place, also leaving behind an empty core file.
So I suspect this is CPU model specific, because:
- Sun V100: UltraSPARC-IIe
- Sun Ultra-45: UltraSPARC IIIi
- Sun Fire v450: UltraSPARC IIIi
I'm currently unable to help with any tests, as my Ultra-45 mainboard died two days after this test and I'm currently hunting for a replacement board. Once I have a functioning Ultra-45 again, I'm available for testing.
Kernel on the Ultra-45 was a lightly modified GENERIC (enabled igx, which didn't pick up the Sun 10G card, and enabled INSECURE to be able to run X). Kernel on the V100s was GENERIC with a lot of unneeded drivers yanked and DIAGNOSTIC enabled.
The Ultra-45 was running with NFS root at the time, as the firmware refuses to boot from the NetBSD install on the internal disk.
>How-To-Repeat:
Grab a Sun machine with an UltraSPARC IIIi CPU (replicated on Sun Ultra-45 and Fire v450) and try to build lang/swi-prolog-lite from pkgsrc. Observe the machine crashing.
>Fix:
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: port-sparc64-maintainer->martin
Responsible-Changed-By: martin@NetBSD.org
Responsible-Changed-When: Thu, 26 Dec 2024 16:58:11 +0000
Responsible-Changed-Why:
take
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-sparc64/58936 (Building lang/swi-prolog-lite on NetBSD
10.1/sparc64 crashes the machine)
Date: Fri, 27 Dec 2024 08:47:49 +0100
Which version of swi-prolog-lite did you try to build?
The package in pkgsrc-current fails to compile for me (and does not crash
the machine nor leave any core file):
/usr/pkgobj/lang/swi-prolog-lite/work/swipl-8.0.2/packages/ssl/crypto4pl.c: In function 'get_padding':
/usr/pkgobj/lang/swi-prolog-lite/work/swipl-8.0.2/packages/ssl/crypto4pl.c:851:69: error: 'RSA_SSLV23_PADDING' undeclared (first use in this function); did you mean 'RSA_PKCS1_PADDING'?
851 | else if ( a == ATOM_sslv23 && mode == RSA_MODE ) *padding = RSA_SSLV23_PADDING;
| ^~~~~~~~~~~~~~~~~~
| RSA_PKCS1_PADDING
Martin
From: Alexander Schreiber <als@thangorodrim.ch>
To: gnats-bugs@netbsd.org
Cc: martin@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: port-sparc64/58936 (Building lang/swi-prolog-lite on NetBSD
10.1/sparc64 crashes the machine)
Date: Fri, 27 Dec 2024 10:18:35 +0100
> Which version of swi-prolog-lite did you try to build?
> The package in pkgsrc-current fails to compile for me (and does not crash
> the machine nor leave any core file):
The Makefile says:
# $NetBSD: Makefile,v 1.34 2024/08/25 06:18:58 wiz Exp $
PKGNAME= swi-prolog-lite-${SWIPLVERS}
PKGREVISION= 3
and Makefile.common says:
SWIPLVERS= 8.0.2
This should be from pkgsrc-2024Q3.
I've tried "make package" on NetBSD/amd64, but there the build fails with
epic error spew at
[ 74%] Building C object packages/ssl/CMakeFiles/plugin_ssl4pl.dir/ssl4pl.c.o
with _lots_ of "this is deprecated" warnings and finally a bunch of errors
that indicate bitrot (boiling down to "you are trying to use _really_ old
stuff from the ssl libs"). Not surprised, given that pkgsrc has version 8.0.2
(upstream indicates release on Tue Mar 5 13:19:09 2019) and upstream at
https://www.swi-prolog.org/download/stable/src/ is currently at 9.2.9:
swipl-9.2.9.tar.gz
The current version 9.2.9 blows up _much_ earlier due to what looks like
mismatched assumptions about networking APIs for IPv6.
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-sparc64/58936 (Building lang/swi-prolog-lite on NetBSD
10.1/sparc64 crashes the machine)
Date: Fri, 27 Dec 2024 16:06:20 +0100
I get the same failure on NetBSD 10.1 and still no core dump nor any
crash, so I am afraid I can not reproduce this issue.
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-sparc64/58936 (Building lang/swi-prolog-lite on NetBSD
10.1/sparc64 crashes the machine)
Date: Sat, 28 Dec 2024 14:43:41 +0100
I could reproduce it (thanks Alexander for off-list help with that):
[ 76%] Built target archive
[ 76%] Generating SWI-Prolog-8.0.2.tex
[ 76%] Generating intro.tex
[ 76%] Generating overview.tex
[ 76%] Generating builtin.tex
[ 76%] Generating module.tex
[ 76%] Generating foreign.tex
[ 76%] Generating runtime.tex
[ 76%] Generating hack.tex
[ 76%] Generating summary.tex
[ 76%] Generating xpce.tex
[ 77%] Generating glossary.tex
[ 77%] Generating ide.tex
[ 77%] Generating license.tex
[ 77%] Generating threads.tex
[ 78%] Generating engines.tex
[ 78%] Generating profile.tex
[ 78%] Generating attvar.tex
[ 78%] Generating chr.tex
[ 78%] Generating xref.tex
[ 78%] Generating bit64.tex
[ 78%] Generating dialect.tex
[ 78%] Generating extensions.tex
[ 78%] Generating tabling.tex
[ 78%] Generating lib/clpfdlib.tex
this tries to dump core:
proc_getauxv(pid 21737 cmd swipl) - ps_envstr 0x697220616e737765 nenvstr 1920154122 auxv 0x69722065020d47bd
with obviously bogus ps_nenvstr, so we calculate an auxv vector way out of
bounds.
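
The out-of-bounds auxv address can be rechecked by hand from the values in the diagnostic line above. What follows is a minimal Python sketch, assuming the kernel looks for the auxiliary vector directly after the NULL-terminated environment pointer array, that is at ps_envstr + (ps_nenvstr + 1) * pointer size, with 8-byte pointers for a 64-bit sparc64 process. The formula and the helper name `auxv_address` are assumptions made for illustration and are not quoted from the kernel source.

```python
# Sketch: recompute the user-space auxv address from the logged values.
# Assumption (not quoted from the kernel source): auxv sits right after the
# NULL-terminated envp array, so its address is
#     ps_envstr + (ps_nenvstr + 1) * pointer_size

PTR_SIZE = 8  # assumed pointer size of a 64-bit sparc64 process


def auxv_address(ps_envstr: int, ps_nenvstr: int, ptr_size: int = PTR_SIZE) -> int:
    """Return the assumed auxv address, truncated to 64 bits."""
    return (ps_envstr + (ps_nenvstr + 1) * ptr_size) & 0xFFFF_FFFF_FFFF_FFFF


# Values taken from the proc_getauxv diagnostic line.
ps_envstr = 0x697220616E737765
ps_nenvstr = 1920154122

addr = auxv_address(ps_envstr, ps_nenvstr)
print(hex(addr))  # 0x69722065020d47bd, the auxv value in the log
```

Under that assumption the result matches the logged auxv value exactly, so the bad address follows directly from the corrupted ps_envstr and ps_nenvstr fields.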
Due to bugs in the sparc64 kernel the kernel crashes with:
[ 2338.2400852] panic: fault type 104 for invalid va 69722065020d4e78
[ 2338.3200858] cpu1: Begin traceback...
[ 2338.3600860] cpu1: End traceback...
[ 2338.4000864] Frame pointer is at 0x2526e6551
[ 2338.4500869] Call traceback:
[ 2338.4900872] netbsd:cpu_reboot+0x260(1cc46d8, 105a96140, ff0f0000000001, 0, 1c6f000, 1cc46d8) fp = 2526e6631
[ 2338.6100882] netbsd:kern_reboot+0x64(104, 0, 1cbc800, 0, 0, 105a96140) fp = 2526e66e1
[ 2338.7000889] netbsd:vpanic+0x18c(104, 0, ff0f0000000001, e0048000, 197f5f8, 0) fp = 2526e6791
[ 2338.8000898] netbsd:panic+0x20(1943498, 2526e7188, 1cc4528, 1cc4400, 1cc3000, 104) fp = 2526e6841
[ 2338.9100906] netbsd:data_access_fault+0x508(1943498, 68, 69722065020d4e78, 0, 0, 104414b90) fp = 2526e6901
[ 2339.0300920] netbsd:1010700+0(2526e72e0, 68, 1018128, 69722065020d4e78, 69722065020d47bd, 118018) fp = 2526e6a31
[ 2339.1500931] netbsd:copyin_proc+0x24(69722065020d47bd, 105443000, 500, 69722064074977bd, 101848c, 0) fp = 2526e6c11
[ 2339.2800947] netbsd:proc_getauxv+0x74(0, 69722065020d47bd, 105443000, 500, e0048000, 0) fp = 2526e6cd1
[ 2339.3900951] netbsd:real_coredump_elf64+0x19c(105a94780, 2526e7678, 2526e7658, 105443000, 500, 69722065020d47bd) fp = 2526e6da1
[ 2339.5300963] netbsd:coredump_elf64+0x2c(105a96140, 2526e7910, 1c5fc48, a0, 1, 105a94780) fp = 2526e6f81
[ 2339.6400972] netbsd:coredump+0x4ac(105a96140, 2526e7910, 103754700, 0, 4e, 1c70100) fp = 2526e7031
[ 2339.7500982] netbsd:sigexit+0x27c(105a96140, 0, 1cc9c00, 0, 10434ac00, 105a94780) fp = 2526e71e1
[ 2339.8500991] netbsd:postsig+0x198(105a96140, b, 180000, 105a94780, 103736380, 1c70280) fp = 2526e72c1
[ 2339.9701000] netbsd:lwp_userret+0x1ac(b, 0, 105a963b8, 2526e7b70, 105a94780, 105a96140) fp = 2526e73f1
[ 2340.0801012] netbsd:data_access_fault+0x470(105a96140, 1000000, 20000, 100000, 91a0002, 105a94780) fp = 2526e74f1
[ 2340.2001020] netbsd:1010700+0(2526e7ed0, 30, 40566bd0, ffffffffffffee78, ffffffffffffe000, 105a94780) fp = 2526e7621
[ 2340.3301031] netbsd:40566b64+0(ffffffffffffa078, 40787b08, ffffffffffffa078, 40788110, 5310d, 0) fp = ffffffffffff9661
I have fixed the copyin* fault handling for this case, but we still get a
zero sized core file, so it is hard to debug how the process got into
that state (and the build still fails).
The process has overwritten some parts of its memory with (random?) things,
and if those random things happen to match certain patterns it would trigger
the kernel crash.
Probably not worth debugging this any further with this old version.
Martin
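
One observation that may help with the "(random?)" overwrite mentioned above: the logged ps_envstr and nenvstr values are printable ASCII when read as big-endian bytes. The Python sketch below decodes them. The field widths (a 64-bit pointer and a 32-bit int) and the helper name `as_text` are assumptions for illustration, and the reading is an inference from the logged numbers only, not something confirmed in this thread.

```python
# Sketch: interpret the logged ps_strings fields as raw big-endian bytes.
# sparc64 is big-endian, so the most significant byte comes first in memory.
# Assumption: ps_envstr is a 64-bit pointer and ps_nenvstr a 32-bit int.


def as_text(value: int, width: int) -> str:
    """Decode an integer as `width` big-endian bytes of Latin-1 text."""
    return value.to_bytes(width, "big").decode("latin-1")


envstr_text = as_text(0x697220616E737765, 8)
nenvstr_text = as_text(1920154122, 4)

print(repr(envstr_text))   # 'ir answe'
print(repr(nenvstr_text))  # 'rs:\n'
```

Both fields decode to ordinary text, which would be consistent with a string having been written over the area holding the process's ps_strings rather than with arbitrary garbage.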
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/58936 CVS commit: src/sys/arch/sparc64/sparc64
Date: Sat, 28 Dec 2024 13:48:07 +0000
Module Name: src
Committed By: martin
Date: Sat Dec 28 13:48:07 UTC 2024
Modified Files:
src/sys/arch/sparc64/sparc64: trap.c
Log Message:
PR 58936: do not panic if we hit a VA-hole address in copyin/copyout.
To generate a diff of this commit:
cvs rdiff -u -r1.198 -r1.199 src/sys/arch/sparc64/sparc64/trap.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
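
To illustrate what "VA-hole address" means for the faulting addresses in this PR, here is a minimal Python sketch. It assumes a CPU that implements only the low `va_bits` bits of a virtual address and requires the upper bits to be a sign extension of the top implemented bit. The default of 44 bits and the function name `in_va_hole` are assumptions for illustration; neither is taken from the PR or from the committed change.

```python
# Sketch: test whether a 64-bit virtual address falls into a "VA hole".
# Assumption: only the low `va_bits` bits are implemented and the upper bits
# must be a sign extension of the top implemented bit. va_bits=44 is an
# assumed example value, not taken from the PR or the commit.

MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def in_va_hole(va: int, va_bits: int = 44) -> bool:
    """Return True if `va` is not a sign-extended `va_bits`-bit address."""
    va &= MASK64
    low_limit = 1 << (va_bits - 1)          # first address above the low valid range
    high_start = (MASK64 + 1) - low_limit   # first address of the high valid range
    return low_limit <= va < high_start


print(in_va_hole(0x69722065020D4E78))  # True: panic address from the second trace
print(in_va_hole(0x09097B2684D812B0))  # True: panic address from the first trace
print(in_va_hole(0x0000000040566BD0))  # False: an ordinary low user address
```

Both panic addresses fall into the hole under this assumption, while a normal user-space address does not.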
State-Changed-From-To: open->pending-pullups
State-Changed-By: martin@NetBSD.org
State-Changed-When: Sat, 28 Dec 2024 13:55:25 +0000
State-Changed-Why:
Fixed in -current, [pullup-10 #1035]
From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/58936 CVS commit: [netbsd-10] src/sys/arch/sparc64/sparc64
Date: Tue, 31 Dec 2024 01:45:51 +0000
Module Name: src
Committed By: snj
Date: Tue Dec 31 01:45:51 UTC 2024
Modified Files:
src/sys/arch/sparc64/sparc64 [netbsd-10]: trap.c
Log Message:
Pull up following revision(s) (requested by martin in ticket #1035):
sys/arch/sparc64/sparc64/trap.c: 1.199
PR 58936: do not panic if we hit a VA-hole address in copyin/copyout.
To generate a diff of this commit:
cvs rdiff -u -r1.194 -r1.194.4.1 src/sys/arch/sparc64/sparc64/trap.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
Copyright © 1994-2025
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.