NetBSD Problem Report #57280
From www@netbsd.org Tue Mar 21 18:38:54 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id F13F71A9239
for <gnats-bugs@gnats.NetBSD.org>; Tue, 21 Mar 2023 18:38:53 +0000 (UTC)
Message-Id: <20230321183852.01F6E1A923C@mollari.NetBSD.org>
Date: Tue, 21 Mar 2023 18:38:51 +0000 (UTC)
From: denis@ovsienko.info
Reply-To: denis@ovsienko.info
To: gnats-bugs@NetBSD.org
Subject: Octeon boot fails (kernel: bus error (load or store) trap)
X-Send-Pr-Version: www-1.0
>Number: 57280
>Category: port-mips
>Synopsis: Octeon boot fails (kernel: bus error (load or store) trap)
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: riastradh
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Mar 21 18:40:00 +0000 2023
>Closed-Date: Sun Jul 30 15:07:01 +0000 2023
>Last-Modified: Sun Jul 30 15:07:01 +0000 2023
>Originator: Denis Ovsienko
>Release: NetBSD 10 beta
>Organization:
>Environment:
EdgeRouter 4
>Description:
reading netbsd
7481080 bytes read in 217 ms (32.9 MiB/s)
reading netbsd.md5
** Unable to read file netbsd.md5 **
Allocating memory for ELF segment: addr: 0xffffffff80200000 (adjusted to: 0x200000), size 0x63b980
## Loading big-endian Linux kernel with entry point: 0xffffffff80200000 ...
Bootloader: Done loading app on coremask: 0xf
Starting cores: 0xf
[ 1.0000000] MIPS32/64 params: cpu arch: 256
[ 1.0000000] MIPS32/64 params: TLB entries: 256
[ 1.0000000] MIPS32/64 params: Icache: line=128, total=79872, ways=39, sets=16, colors=0
[ 1.0000000] MIPS32/64 params: Dcache: line=128, total=32768, ways=32, sets=8, colors=0
[ 1.0000000] MIPS32/64 params: SDcache: line=128, total=1048576, ways=8, sets=1024, colors=16
[ 1.0000000] Dcache is coherent
[ 1.0000000] u-boot bootmem desc @ 0x6c108 version 3.0
[ 1.0000000] phys segment: 0x100000 @ 0x100000
[ 1.0000000] adding 0x100000 @ 0x100000 to freelist 0
[ 1.0000000] phys segment: 0xf574000 @ 0x83c000
[ 1.0000000] adding 0xf574000 @ 0x83c000 to freelist 0
[ 1.0000000] phys segment: 0xe000 @ 0xfdb4000
[ 1.0000000] adding 0xe000 @ 0xfdb4000 to freelist 0
[ 1.0000000] phys segment: 0x2f0 @ 0xffb6000 (short)
[ 1.0000000] phys segment: 0x80 @ 0xffd6000 (short)
[ 1.0000000] phys segment: 0x60 @ 0xfff6000 (short)
[ 1.0000000] phys segment: 0x2effe000 @ 0x20000000
[ 1.0000000] adding 0x2effe000 @ 0x20000000 to freelist 0
[ 1.0000000] phys segment: 0x1080 @ 0x4f000000 (short)
[ 1.0000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
[ 1.0000000] 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
[ 1.0000000] 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023
[ 1.0000000] The NetBSD Foundation, Inc. All rights reserved.
[ 1.0000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[ 1.0000000] The Regents of the University of California. All rights reserved.
[ 1.0000000] NetBSD 10.0_BETA (OCTEON) #0: Mon Mar 20 17:25:14 UTC 2023
[ 1.0000000] mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbmips/compile/OCTEON
[ 1.0000000] Cavium Octeon CN7130-1000
[ 1.0000000] total memory = 1024 MB
[ 1.0000000] avail memory = 982 MB
[ 1.0000000] mainbus0 (root)
[ 1.0000000] cpunode0 at mainbus0: 4 cores, crypto+kasumi, 64bit-mul, unaligned-access ok
[ 1.0000000] cpu0 at cpunode0 core 0: 1000.00MHz
[ 1.0000000] cpu0: Cavium CN7130-1000 (0xd9602) Rev. 2 with built-in FPU
[ 1.0000000] cpu0: 256 TLB entries, 512TB (49-bit) VAs, 512TB (49-bit) PAs, 256MB max page size
[ 1.0000000] cpu0: 78KB/128B 39-way set-associative L1 instruction cache
[ 1.0000000] cpu0: 32KB/128B 32-way set-associative write-through coherent L1 data cache
[ 1.0000000] cpu0: 1024KB/128B 8-way set-associative write-back L2 unified cache
[ 1.0000000] cpu1 at cpunode0 core 1: disabled (uniprocessor kernel)
[ 1.0000000] cpu2 at cpunode0 core 2: disabled (uniprocessor kernel)
[ 1.0000000] cpu3 at cpunode0 core 3: disabled (uniprocessor kernel)
[ 1.0000000] wdog0 at cpunode0: default period is 4 seconds
[ 1.0000000] iobus0 at mainbus0
[ 1.0000000] iobus0: initializing POW
[ 1.0000000] iobus0: initializing FPA
[ 1.0000000] octrnm0 at iobus0 address 0x0001180040000000
[ 1.0000000] pid 0(system): trap: cpu0, bus error (load or store) in kernel mode
[ 1.0000000] status=0xa3, cause=0x1c, epc=0xffffffff80210568, vaddr=0xc0000000001fe000
[ 1.0000000] tf=0x980000000fdb7870 ksp=0x980000000fdb79b0 ra=0xffffffff80210540 ppl=0x98000000
[ 1.0000000] kernel: bus error (load or store) trap
Stopped in pid 0.0 (system) at netbsd:octrnm_attach+0xf8: ld a2,0(v0)
db> trace
0x980000000fdb79b0: octrnm_attach+0xf8 (0xffffffff808102c8,0x9001180040000000,0,0xfffffeb2a7fa0000) ra 0xffffffff804b908c sz 32
0x980000000fdb79d0: config_attach_internal+0x1d4 (0xffffffff808102c8,0x9001180040000000,0,0xfffffeb2a7fa0000) ra 0xffffffff804b933c sz 80
0x980000000fdb7a20: config_found+0x144 (0xffffffff808102c8,0x9001180040000000,0,0xfffffeb2a7fa0000) ra 0xffffffff8020d148 sz 112
0x980000000fdb7a90: iobus_attach+0x198 (0xffffffff808102c8,0x9001180040000000,0,0xfffffeb2a7fa0000) ra 0xffffffff804b908c sz 192
0x980000000fdb7b50: config_attach_internal+0x1d4 (0xffffffff808102c8,0x9001180040000000,0,0xfffffeb2a7fa0000) ra 0xffffffff804b933c sz 80
0x980000000fdb7ba0: config_found+0x144 (0xffffffff808102c8,0x9001180040000000,0,0xfffffeb2a7fa0000) ra 0xffffffff80206668 sz 112
0x980000000fdb7c10: mainbus_attach+0x120 (0xffffffff808102c8,0x9001180040000000,0,0xfffffeb2a7fa0000) ra 0xffffffff804b908c sz 336
0x980000000fdb7d60: config_attach_internal+0x1d4 (0xffffffff808102c8,0x9001180040000000,0,0xfffffeb2a7fa0000) ra 0xffffffff804b952c sz 80
0x980000000fdb7db0: config_rootfound+0x54 (0xffffffff808102c8,0,0,0xfffffeb2a7fa0000) ra 0xffffffff802004b8 sz 80
0x980000000fdb7e00: cpu_configure+0x28 (0xffffffff808102c8,0,0,0xfffffeb2a7fa0000) ra 0xffffffff80617428 sz 16
0x980000000fdb7e10: main+0x3f8 (0xffffffff808102c8,0,0,0xfffffeb2a7fa0000) ra 0xffffffff802000d4 sz 144
0x980000000fdb7ea0: kernel_text+0xd4 (0xffffffff808102c8,0,0,0xfffffeb2a7fa0000) ra 0 sz 0
User-level: pid 0.0
db>
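For readers decoding the trap line and the trace arguments above, here is a minimal sketch in C. It assumes the standard MIPS64 layout: ExcCode in bits 6..2 of the CP0 Cause register, the segment selector in virtual address bits 63..62, and for xkphys the cache attribute in bits 61..59. The helper names are hypothetical and are not taken from the NetBSD sources.

```c
#include <stdint.h>

/* ExcCode occupies bits 6..2 of the MIPS CP0 Cause register. */
static unsigned
mips_cause_exccode(uint32_t cause)
{
	return (cause >> 2) & 0x1f;
}

/* Mnemonics for a few architecturally defined ExcCode values. */
static const char *
mips_exccode_name(unsigned exccode)
{
	switch (exccode) {
	case 0: return "Int";	/* interrupt */
	case 4: return "AdEL";	/* address error, load or fetch */
	case 5: return "AdES";	/* address error, store */
	case 6: return "IBE";	/* bus error, instruction fetch */
	case 7: return "DBE";	/* bus error, data load or store */
	default: return "other";
	}
}

/*
 * Bits 63..62 of a 64-bit virtual address select the segment:
 * 0 = xuseg, 1 = xsseg, 2 = xkphys, 3 = xkseg and the 32-bit
 * compatibility segments at the top of the address space.
 */
static unsigned
mips64_segment(uint64_t va)
{
	return (unsigned)(va >> 62);
}

/* In xkphys, bits 61..59 carry the cache coherency attribute (2 = uncached). */
static unsigned
mips64_xkphys_cca(uint64_t va)
{
	return (unsigned)((va >> 59) & 0x7);
}

/* In xkphys, the remaining low 59 bits are the physical address. */
static uint64_t
mips64_xkphys_pa(uint64_t va)
{
	return va & ((UINT64_C(1) << 59) - 1);
}
```

Applied to the values in this report: cause=0x1c gives ExcCode 7 (DBE), matching the "bus error (load or store)" message. The trace argument 0x9001180040000000 is an uncached xkphys address for physical 0x0001180040000000, which is the address printed for octrnm0. The faulting vaddr 0xc0000000001fe000 has segment selector 3, so it is not an xkphys address.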
>How-To-Repeat:
1. Write https://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-10/202303210730Z/evbmips-mips64eb/binary/gzimg/octeon.img.gz to USB storage.
2. Boot EdgeRouter 4 from the USB storage, FAT partition 1, file "netbsd".
3. Observe the kernel crash early in boot, immediately after octrnm0 attaches.
Serial console access can be provided if necessary.
>Fix:
>Release-Note:
>Audit-Trail:
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57280 CVS commit: src/sys/arch/mips/cavium/dev
Date: Tue, 21 Mar 2023 22:07:29 +0000
Module Name: src
Committed By: riastradh
Date: Tue Mar 21 22:07:29 UTC 2023
Modified Files:
src/sys/arch/mips/cavium/dev: octeon_rnm.c
Log Message:
octrnm(4): Raise delay on startup.
According to CN50XX-HRM-V0.99E and CN78XX-HM-0.99E:
The entropy is provided by the jitter of 125 of 128 free-running
oscillators XORed into a 128-bit LFSR. The LFSR accumulates entropy
over 81 cycles, after which it is fed into a SHA-1 engine.
[...]
The SHA-1 engine runs once every 81 cycles.
[...]
The hardware produces new 64-bit random number every 81 cycles.
The last sentence means that we only need to wait 81 cycles _between_
consecutive SHA-1 outputs (which isn't relevant anyway because we
reconfigure it into raw mode later), but the first two quotes might
mean that we need to wait 81+81 cycles for the _first_ output to be
produced on boot when running the self-test.
Now, in this case, the self-test is run with the LFSR unhooked, by
clearing the RNM_CTL_STATUS[ENT_EN] bit, so that SHA-1 is computed
from a known input -- this is really just paranoia to make sure that
_some_ functions of the device (which is conjured out of thin air at
a fixed virtual address, with no firmware bindings to guide us)
behave as we expect.
And it's not clear if it really does take 81+81 cycles for the first
SHA-1 output to appear when the LFSR isn't feeding into it anyway.
But experimentally, a delay of 81+81 cycles seems to work whereas a
delay of only 81 cycles crashes.
PR port-mips/57280
XXX pullup-10
XXX pullup-9
To generate a diff of this commit:
cvs rdiff -u -r1.15 -r1.16 src/sys/arch/mips/cavium/dev/octeon_rnm.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
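The arithmetic in the log message above can be sketched in C as follows. This is an illustration only, not the code from octeon_rnm.c: the macro and function names are hypothetical, and the PR does not say which clock the 81-cycle figures refer to, so the clock frequency is left as a parameter.

```c
#include <stdint.h>

/* Cycle counts quoted from the hardware manuals in the log message. */
#define RNM_LFSR_CYCLES	81	/* LFSR accumulation period */
#define RNM_SHA1_CYCLES	81	/* SHA-1 engine period */

/*
 * Worst-case wait before the first SHA-1 output, under the reading
 * that the LFSR period and the SHA-1 period add up.
 */
static uint64_t
rnm_first_output_cycles(void)
{
	return RNM_LFSR_CYCLES + RNM_SHA1_CYCLES;
}

/*
 * Convert a cycle count to whole microseconds at clock rate hz,
 * rounding up so the wait is never shorter than requested.
 */
static uint64_t
cycles_to_usec_ceil(uint64_t cycles, uint64_t hz)
{
	return (cycles * UINT64_C(1000000) + hz - 1) / hz;
}
```

If the cycles were counted at the 1000 MHz core clock reported for cpu0 (an assumption), 162 cycles would round up to a single microsecond, so the difference between waiting 81 and 162 cycles is well under a microsecond of wall-clock time.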
State-Changed-From-To: open->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Tue, 21 Mar 2023 22:11:57 +0000
State-Changed-Why:
Candidate fix committed -- can you try a new kernel?
Patch should apply cleanly on netbsd-10 too.
From: Denis Ovsienko <denis@ovsienko.info>
To: riastradh@NetBSD.org
Cc: gnats-bugs@netbsd.org, port-mips-maintainer@netbsd.org,
netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: port-mips/57280 (Octeon boot fails (kernel: bus error (load or
store) trap))
Date: Wed, 22 Mar 2023 19:28:26 +0000
With
http://nycdn.netbsd.org/pub/NetBSD-daily/HEAD/202303220350Z/evbmips-mips64eb/binary/gzimg/octeon.img.gz
this particular problem no longer occurs, thank you!
--
Denis Ovsienko
State-Changed-From-To: feedback->needs-pullups
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 01 Apr 2023 23:20:18 +0000
State-Changed-Why:
confirmed fixed, now needs to get into -9 and -10
State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: gutteridge@NetBSD.org
State-Changed-When: Thu, 27 Jul 2023 21:57:13 +0000
State-Changed-Why:
Pullups submitted as pullup-10 #256, pullup-9 #1674.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57280 CVS commit: [netbsd-10] src/sys/arch/mips/cavium/dev
Date: Sun, 30 Jul 2023 11:39:33 +0000
Module Name: src
Committed By: martin
Date: Sun Jul 30 11:39:33 UTC 2023
Modified Files:
src/sys/arch/mips/cavium/dev [netbsd-10]: octeon_rnm.c
Log Message:
Pull up following revision(s) (requested by gutteridge in ticket #256):
sys/arch/mips/cavium/dev/octeon_rnm.c: revision 1.16
octrnm(4): Raise delay on startup.
According to CN50XX-HRM-V0.99E and CN78XX-HM-0.99E:
The entropy is provided by the jitter of 125 of 128 free-running
oscillators XORed into a 128-bit LFSR. The LFSR accumulates entropy
over 81 cycles, after which it is fed into a SHA-1 engine.
[...]
The SHA-1 engine runs once every 81 cycles.
[...]
The hardware produces new 64-bit random number every 81 cycles.
The last sentence means that we only need to wait 81 cycles _between_
consecutive SHA-1 outputs (which isn't relevant anyway because we
reconfigure it into raw mode later), but the first two quotes might
mean that we need to wait 81+81 cycles for the _first_ output to be
produced on boot when running the self-test.
Now, in this case, the self-test is run with the LFSR unhooked, by
clearing the RNM_CTL_STATUS[ENT_EN] bit, so that SHA-1 is computed
from a known input -- this is really just paranoia to make sure that
_some_ functions of the device (which is conjured out of thin air at
a fixed virtual address, with no firmware bindings to guide us)
behave as we expect.
And it's not clear if it really does take 81+81 cycles for the first
SHA-1 output to appear when the LFSR isn't feeding into it anyway.
But experimentally, a delay of 81+81 cycles seems to work whereas a
delay of only 81 cycles crashes.
PR port-mips/57280
To generate a diff of this commit:
cvs rdiff -u -r1.15 -r1.15.4.1 src/sys/arch/mips/cavium/dev/octeon_rnm.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57280 CVS commit: [netbsd-9] src/sys/arch/mips/cavium/dev
Date: Sun, 30 Jul 2023 11:41:48 +0000
Module Name: src
Committed By: martin
Date: Sun Jul 30 11:41:48 UTC 2023
Modified Files:
src/sys/arch/mips/cavium/dev [netbsd-9]: octeon_rnm.c
Log Message:
Pull up following revision(s) (requested by gutteridge in ticket #1674):
sys/arch/mips/cavium/dev/octeon_rnm.c: revision 1.16 (patch)
octrnm(4): Raise delay on startup.
According to CN50XX-HRM-V0.99E and CN78XX-HM-0.99E:
The entropy is provided by the jitter of 125 of 128 free-running
oscillators XORed into a 128-bit LFSR. The LFSR accumulates entropy
over 81 cycles, after which it is fed into a SHA-1 engine.
[...]
The SHA-1 engine runs once every 81 cycles.
[...]
The hardware produces new 64-bit random number every 81 cycles.
The last sentence means that we only need to wait 81 cycles _between_
consecutive SHA-1 outputs (which isn't relevant anyway because we
reconfigure it into raw mode later), but the first two quotes might
mean that we need to wait 81+81 cycles for the _first_ output to be
produced on boot when running the self-test.
Now, in this case, the self-test is run with the LFSR unhooked, by
clearing the RNM_CTL_STATUS[ENT_EN] bit, so that SHA-1 is computed
from a known input -- this is really just paranoia to make sure that
_some_ functions of the device (which is conjured out of thin air at
a fixed virtual address, with no firmware bindings to guide us)
behave as we expect.
And it's not clear if it really does take 81+81 cycles for the first
SHA-1 output to appear when the LFSR isn't feeding into it anyway.
But experimentally, a delay of 81+81 cycles seems to work whereas a
delay of only 81 cycles crashes.
PR port-mips/57280
To generate a diff of this commit:
cvs rdiff -u -r1.2.4.2 -r1.2.4.3 src/sys/arch/mips/cavium/dev/octeon_rnm.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Responsible-Changed-From-To: port-mips-maintainer->riastradh
Responsible-Changed-By: gutteridge@NetBSD.org
Responsible-Changed-When: Sun, 30 Jul 2023 15:06:34 +0000
Responsible-Changed-Why:
Give credit to riastradh@.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: gutteridge@NetBSD.org
State-Changed-When: Sun, 30 Jul 2023 15:07:01 +0000
State-Changed-Why:
Pullups completed, closing PR.
>Unformatted:
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.