NetBSD Problem Report #57280

From www@netbsd.org  Tue Mar 21 18:38:54 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id F13F71A9239
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 21 Mar 2023 18:38:53 +0000 (UTC)
Message-Id: <20230321183852.01F6E1A923C@mollari.NetBSD.org>
Date: Tue, 21 Mar 2023 18:38:51 +0000 (UTC)
From: denis@ovsienko.info
Reply-To: denis@ovsienko.info
To: gnats-bugs@NetBSD.org
Subject: Octeon boot fails (kernel: bus error (load or store) trap)
X-Send-Pr-Version: www-1.0

>Number:         57280
>Category:       port-mips
>Synopsis:       Octeon boot fails (kernel: bus error (load or store) trap)
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    riastradh
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Mar 21 18:40:00 +0000 2023
>Closed-Date:    Sun Jul 30 15:07:01 +0000 2023
>Last-Modified:  Sun Jul 30 15:07:01 +0000 2023
>Originator:     Denis Ovsienko
>Release:        NetBSD 10 beta
>Organization:
>Environment:
EdgeRouter 4
>Description:
reading netbsd
7481080 bytes read in 217 ms (32.9 MiB/s)
reading netbsd.md5
** Unable to read file netbsd.md5 **
Allocating memory for ELF segment: addr: 0xffffffff80200000 (adjusted to: 0x200000), size 0x63b980
## Loading big-endian Linux kernel with entry point: 0xffffffff80200000 ...
Bootloader: Done loading app on coremask:
 0xf
Starting cores:
 0xf
[   1.0000000] MIPS32/64 params: cpu arch: 256
[   1.0000000] MIPS32/64 params: TLB entries: 256
[   1.0000000] MIPS32/64 params: Icache: line=128, total=79872, ways=39, sets=16, colors=0
[   1.0000000] MIPS32/64 params: Dcache: line=128, total=32768, ways=32, sets=8, colors=0
[   1.0000000] MIPS32/64 params: SDcache: line=128, total=1048576, ways=8, sets=1024, colors=16
[   1.0000000]   Dcache is coherent
[   1.0000000] u-boot bootmem desc @ 0x6c108 version 3.0
[   1.0000000] phys segment: 0x100000 @ 0x100000
[   1.0000000] adding 0x100000 @ 0x100000 to freelist 0
[   1.0000000] phys segment: 0xf574000 @ 0x83c000
[   1.0000000] adding 0xf574000 @ 0x83c000 to freelist 0
[   1.0000000] phys segment: 0xe000 @ 0xfdb4000
[   1.0000000] adding 0xe000 @ 0xfdb4000 to freelist 0
[   1.0000000] phys segment: 0x2f0 @ 0xffb6000 (short)
[   1.0000000] phys segment: 0x80 @ 0xffd6000 (short)
[   1.0000000] phys segment: 0x60 @ 0xfff6000 (short)
[   1.0000000] phys segment: 0x2effe000 @ 0x20000000
[   1.0000000] adding 0x2effe000 @ 0x20000000 to freelist 0
[   1.0000000] phys segment: 0x1080 @ 0x4f000000 (short)
[   1.0000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
[   1.0000000]     2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
[   1.0000000]     2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023
[   1.0000000]     The NetBSD Foundation, Inc.  All rights reserved.
[   1.0000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[   1.0000000]     The Regents of the University of California.  All rights reserved.

[   1.0000000] NetBSD 10.0_BETA (OCTEON) #0: Mon Mar 20 17:25:14 UTC 2023
[   1.0000000]  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbmips/compile/OCTEON
[   1.0000000] Cavium Octeon CN7130-1000
[   1.0000000] total memory = 1024 MB
[   1.0000000] avail memory = 982 MB
[   1.0000000] mainbus0 (root)
[   1.0000000] cpunode0 at mainbus0: 4 cores, crypto+kasumi, 64bit-mul, unaligned-access ok
[   1.0000000] cpu0 at cpunode0 core 0: 1000.00MHz
[   1.0000000] cpu0: Cavium CN7130-1000 (0xd9602) Rev. 2 with built-in FPU
[   1.0000000] cpu0: 256 TLB entries, 512TB (49-bit) VAs, 512TB (49-bit) PAs, 256MB max page size
[   1.0000000] cpu0: 78KB/128B 39-way set-associative L1 instruction cache
[   1.0000000] cpu0: 32KB/128B 32-way set-associative write-through coherent L1 data cache
[   1.0000000] cpu0: 1024KB/128B 8-way set-associative write-back L2 unified cache
[   1.0000000] cpu1 at cpunode0 core 1: disabled (uniprocessor kernel)
[   1.0000000] cpu2 at cpunode0 core 2: disabled (uniprocessor kernel)
[   1.0000000] cpu3 at cpunode0 core 3: disabled (uniprocessor kernel)
[   1.0000000] wdog0 at cpunode0: default period is 4 seconds
[   1.0000000] iobus0 at mainbus0
[   1.0000000] iobus0: initializing POW
[   1.0000000] iobus0: initializing FPA
[   1.0000000] octrnm0 at iobus0 address 0x0001180040000000
[   1.0000000] pid 0(system): trap: cpu0, bus error (load or store) in kernel mode
[   1.0000000] status=0xa3, cause=0x1c, epc=0xffffffff80210568, vaddr=0xc0000000001fe000
[   1.0000000] tf=0x980000000fdb7870 ksp=0x980000000fdb79b0 ra=0xffffffff80210540 ppl=0x98000000
[   1.0000000] kernel: bus error (load or store) trap
Stopped in pid 0.0 (system) at  netbsd:octrnm_attach+0xf8:      ld      a2,0(v0)

db> trace
0x980000000fdb79b0: octrnm_attach+0xf8 (0xffffffff808102c8,0x9001180040000000,0,0xfffffeb2a7fa0000) ra 0xffffffff804b908c sz 32
0x980000000fdb79d0: config_attach_internal+0x1d4 (0xffffffff808102c8,0x9001180040000000,0,0xfffffeb2a7fa0000) ra 0xffffffff804b933c sz 80
0x980000000fdb7a20: config_found+0x144 (0xffffffff808102c8,0x9001180040000000,0,0xfffffeb2a7fa0000) ra 0xffffffff8020d148 sz 112
0x980000000fdb7a90: iobus_attach+0x198 (0xffffffff808102c8,0x9001180040000000,0,0xfffffeb2a7fa0000) ra 0xffffffff804b908c sz 192
0x980000000fdb7b50: config_attach_internal+0x1d4 (0xffffffff808102c8,0x9001180040000000,0,0xfffffeb2a7fa0000) ra 0xffffffff804b933c sz 80
0x980000000fdb7ba0: config_found+0x144 (0xffffffff808102c8,0x9001180040000000,0,0xfffffeb2a7fa0000) ra 0xffffffff80206668 sz 112
0x980000000fdb7c10: mainbus_attach+0x120 (0xffffffff808102c8,0x9001180040000000,0,0xfffffeb2a7fa0000) ra 0xffffffff804b908c sz 336
0x980000000fdb7d60: config_attach_internal+0x1d4 (0xffffffff808102c8,0x9001180040000000,0,0xfffffeb2a7fa0000) ra 0xffffffff804b952c sz 80
0x980000000fdb7db0: config_rootfound+0x54 (0xffffffff808102c8,0,0,0xfffffeb2a7fa0000) ra 0xffffffff802004b8 sz 80
0x980000000fdb7e00: cpu_configure+0x28 (0xffffffff808102c8,0,0,0xfffffeb2a7fa0000) ra 0xffffffff80617428 sz 16
0x980000000fdb7e10: main+0x3f8 (0xffffffff808102c8,0,0,0xfffffeb2a7fa0000) ra 0xffffffff802000d4 sz 144
0x980000000fdb7ea0: kernel_text+0xd4 (0xffffffff808102c8,0,0,0xfffffeb2a7fa0000) ra 0 sz 0
User-level: pid 0.0
db> 
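
The register dump above can be decoded by hand. The following is a
minimal C sketch, not code from the NetBSD tree; the helper names
(mips_exccode, is_xkphys, xkphys_cca, xkphys_pa) are hypothetical. It
relies on two properties of the MIPS64 architecture: ExcCode occupies
bits 6:2 of the Cause register, and an XKPHYS address has binary 10 in
bits 63:62, a cache coherency attribute (CCA) in bits 61:59, and the
physical address in the low 59 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract ExcCode (bits 6:2) from a MIPS CP0 Cause register value. */
static unsigned
mips_exccode(uint32_t cause)
{
	return (cause >> 2) & 0x1f;
}

/* True if the 64-bit virtual address lies in XKPHYS (bits 63:62 == 0b10). */
static bool
is_xkphys(uint64_t va)
{
	return (va >> 62) == 2;
}

/* Cache coherency attribute of an XKPHYS address (bits 61:59). */
static unsigned
xkphys_cca(uint64_t va)
{
	return (unsigned)((va >> 59) & 0x7);
}

/* Physical address carried in the low 59 bits of an XKPHYS address. */
static uint64_t
xkphys_pa(uint64_t va)
{
	return va & ((UINT64_C(1) << 59) - 1);
}
```

With cause=0x1c, mips_exccode() yields 7, the data bus error code that
matches the "bus error (load or store)" message. The value
0x9001180040000000 shown in the trace is an XKPHYS address with CCA 2
(uncached) and physical address 0x0001180040000000, the address printed
for octrnm0. The trace shows the same argument values in almost every
frame, so they are probably not each function's real arguments.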

>How-To-Repeat:
1. Write https://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-10/202303210730Z/evbmips-mips64eb/binary/gzimg/octeon.img.gz to USB storage.
2. Boot EdgeRouter 4 from the USB storage, FAT partition 1, file "netbsd".
3. Observe the crash early in boot, immediately after octrnm0 attaches.

Serial console access can be provided if necessary.
>Fix:

>Release-Note:

>Audit-Trail:
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57280 CVS commit: src/sys/arch/mips/cavium/dev
Date: Tue, 21 Mar 2023 22:07:29 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Tue Mar 21 22:07:29 UTC 2023

 Modified Files:
 	src/sys/arch/mips/cavium/dev: octeon_rnm.c

 Log Message:
 octrnm(4): Raise delay on startup.

 According to CN50XX-HRM-V0.99E and CN78XX-HM-0.99E:

    The entropy is provided by the jitter of 125 of 128 free-running
    oscillators XORed into a 128-bit LFSR.  The LFSR accumulates entropy
    over 81 cycles, after which it is fed into a SHA-1 engine.
    [...]
    The SHA-1 engine runs once every 81 cycles.
    [...]
    The hardware produces new 64-bit random number every 81 cycles.

 The last sentence means that we only need to wait 81 cycles _between_
 consecutive SHA-1 outputs (which isn't relevant anyway because we
 reconfigure it into raw mode later), but the first two quotes might
 mean that we need to wait 81+81 cycles for the _first_ output to be
 produced on boot when running the self-test.

 Now, in this case, the self-test is run with the LFSR unhooked, by
 clearing the RNM_CTL_STATUS[ENT_EN] bit, so that SHA-1 is computed
 from a known input -- this is really just paranoia to make sure that
 _some_ functions of the device (which is conjured out of thin air at
 a fixed virtual address, with no firmware bindings to guide us)
 behave as we expect.

 And it's not clear if it really does take 81+81 cycles for the first
 SHA-1 output to appear when the LFSR isn't feeding into it anyway.
 But experimentally, delay of 81+81 cycles seems to work whereas a
 delay of only 81 cycles crashes.

 PR kern/57280

 XXX pullup-10
 XXX pullup-9


 To generate a diff of this commit:
 cvs rdiff -u -r1.15 -r1.16 src/sys/arch/mips/cavium/dev/octeon_rnm.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
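
The arithmetic in the log message above can be sketched in C. This is
an illustration only, not the contents of revision 1.16; the names
RNM_LFSR_CYCLES, RNM_SHA1_CYCLES, rnm_first_output_cycles and
cycles_to_usec_roundup are hypothetical. It also assumes that the
quoted cycle counts are CPU core clock cycles, which the quoted manual
text does not state.

```c
#include <stdint.h>

/* Cycle counts taken from the manual text quoted in the log message. */
#define RNM_LFSR_CYCLES	81u	/* LFSR accumulates entropy over 81 cycles */
#define RNM_SHA1_CYCLES	81u	/* SHA-1 engine runs once every 81 cycles */

/*
 * Worst-case wait for the first SHA-1 output after startup: one LFSR
 * accumulation period followed by one SHA-1 run (the "81+81" case).
 */
static uint32_t
rnm_first_output_cycles(void)
{
	return RNM_LFSR_CYCLES + RNM_SHA1_CYCLES;
}

/*
 * Convert a cycle count to whole microseconds, rounding up so the
 * resulting delay is never shorter than requested.
 * cpu_mhz must be nonzero.
 */
static uint32_t
cycles_to_usec_roundup(uint32_t cycles, uint32_t cpu_mhz)
{
	return (cycles + cpu_mhz - 1) / cpu_mhz;
}
```

On the 1000 MHz CN7130 in this report, the 162 cycles returned by
rnm_first_output_cycles() round up to 1 microsecond under that
assumption.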

State-Changed-From-To: open->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Tue, 21 Mar 2023 22:11:57 +0000
State-Changed-Why:
Candidate fix committed -- can you try a new kernel?
Patch should apply cleanly on netbsd-10 too.


From: Denis Ovsienko <denis@ovsienko.info>
To: riastradh@NetBSD.org
Cc: gnats-bugs@netbsd.org, port-mips-maintainer@netbsd.org,
        netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: port-mips/57280 (Octeon boot fails (kernel: bus error (load or
 store) trap))
Date: Wed, 22 Mar 2023 19:28:26 +0000

 With
 http://nycdn.netbsd.org/pub/NetBSD-daily/HEAD/202303220350Z/evbmips-mips64eb/binary/gzimg/octeon.img.gz
 this particular problem no longer occurs, thank you!

 -- 
     Denis Ovsienko

State-Changed-From-To: feedback->needs-pullups
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 01 Apr 2023 23:20:18 +0000
State-Changed-Why:
confirmed fixed, now needs to get into -9 and -10


State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: gutteridge@NetBSD.org
State-Changed-When: Thu, 27 Jul 2023 21:57:13 +0000
State-Changed-Why:
Pullups submitted as pullup-10 #256, pullup-9 #1674.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57280 CVS commit: [netbsd-10] src/sys/arch/mips/cavium/dev
Date: Sun, 30 Jul 2023 11:39:33 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sun Jul 30 11:39:33 UTC 2023

 Modified Files:
 	src/sys/arch/mips/cavium/dev [netbsd-10]: octeon_rnm.c

 Log Message:
 Pull up following revision(s) (requested by gutteridge in ticket #256):

 	sys/arch/mips/cavium/dev/octeon_rnm.c: revision 1.16

 octrnm(4): Raise delay on startup.

 According to CN50XX-HRM-V0.99E and CN78XX-HM-0.99E:
     The entropy is provided by the jitter of 125 of 128 free-running
     oscillators XORed into a 128-bit LFSR.  The LFSR accumulates entropy
     over 81 cycles, after which it is fed into a SHA-1 engine.
     [...]
     The SHA-1 engine runs once every 81 cycles.
     [...]
     The hardware produces new 64-bit random number every 81 cycles.

 The last sentence means that we only need to wait 81 cycles _between_
 consecutive SHA-1 outputs (which isn't relevant anyway because we
 reconfigure it into raw mode later), but the first two quotes might
 mean that we need to wait 81+81 cycles for the _first_ output to be
 produced on boot when running the self-test.

 Now, in this case, the self-test is run with the LFSR unhooked, by
 clearing the RNM_CTL_STATUS[ENT_EN] bit, so that SHA-1 is computed
 from a known input -- this is really just paranoia to make sure that
 _some_ functions of the device (which is conjured out of thin air at
 a fixed virtual address, with no firmware bindings to guide us)
 behave as we expect.

 And it's not clear if it really does take 81+81 cycles for the first
 SHA-1 output to appear when the LFSR isn't feeding into it anyway.

 But experimentally, delay of 81+81 cycles seems to work whereas a
 delay of only 81 cycles crashes.
 PR kern/57280


 To generate a diff of this commit:
 cvs rdiff -u -r1.15 -r1.15.4.1 src/sys/arch/mips/cavium/dev/octeon_rnm.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57280 CVS commit: [netbsd-9] src/sys/arch/mips/cavium/dev
Date: Sun, 30 Jul 2023 11:41:48 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sun Jul 30 11:41:48 UTC 2023

 Modified Files:
 	src/sys/arch/mips/cavium/dev [netbsd-9]: octeon_rnm.c

 Log Message:
 Pull up following revision(s) (requested by gutteridge in ticket #1674):

 	sys/arch/mips/cavium/dev/octeon_rnm.c: revision 1.16 (patch)

 octrnm(4): Raise delay on startup.

 According to CN50XX-HRM-V0.99E and CN78XX-HM-0.99E:
     The entropy is provided by the jitter of 125 of 128 free-running
     oscillators XORed into a 128-bit LFSR.  The LFSR accumulates entropy
     over 81 cycles, after which it is fed into a SHA-1 engine.
     [...]
     The SHA-1 engine runs once every 81 cycles.
     [...]
     The hardware produces new 64-bit random number every 81 cycles.

 The last sentence means that we only need to wait 81 cycles _between_
 consecutive SHA-1 outputs (which isn't relevant anyway because we
 reconfigure it into raw mode later), but the first two quotes might
 mean that we need to wait 81+81 cycles for the _first_ output to be
 produced on boot when running the self-test.

 Now, in this case, the self-test is run with the LFSR unhooked, by
 clearing the RNM_CTL_STATUS[ENT_EN] bit, so that SHA-1 is computed
 from a known input -- this is really just paranoia to make sure that
 _some_ functions of the device (which is conjured out of thin air at
 a fixed virtual address, with no firmware bindings to guide us)
 behave as we expect.

 And it's not clear if it really does take 81+81 cycles for the first
 SHA-1 output to appear when the LFSR isn't feeding into it anyway.

 But experimentally, delay of 81+81 cycles seems to work whereas a
 delay of only 81 cycles crashes.
 PR kern/57280


 To generate a diff of this commit:
 cvs rdiff -u -r1.2.4.2 -r1.2.4.3 src/sys/arch/mips/cavium/dev/octeon_rnm.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Responsible-Changed-From-To: port-mips-maintainer->riastradh
Responsible-Changed-By: gutteridge@NetBSD.org
Responsible-Changed-When: Sun, 30 Jul 2023 15:06:34 +0000
Responsible-Changed-Why:
Give credit to riastradh@.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: gutteridge@NetBSD.org
State-Changed-When: Sun, 30 Jul 2023 15:07:01 +0000
State-Changed-Why:
Pullups completed, closing PR.

>Unformatted:
