NetBSD Problem Report #57614
From www@netbsd.org Sun Sep 10 22:00:09 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id A34841A923C
for <gnats-bugs@gnats.NetBSD.org>; Sun, 10 Sep 2023 22:00:09 +0000 (UTC)
Message-Id: <20230910220007.8911B1A923F@mollari.NetBSD.org>
Date: Sun, 10 Sep 2023 22:00:07 +0000 (UTC)
From: joelp@sdf.org
Reply-To: joelp@sdf.org
To: gnats-bugs@NetBSD.org
Subject: Kernel panic rebooting a Pine64 RockPro64 with netbsd10.0_beta; successful after panic reboot.
X-Send-Pr-Version: www-1.0
>Number: 57614
>Category: port-evbarm
>Synopsis: Kernel panic rebooting a Pine64 RockPro64 with netbsd10.0_beta; successful after panic reboot.
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-evbarm-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Sep 10 22:05:00 +0000 2023
>Last-Modified: Tue Sep 12 23:20:01 +0000 2023
>Originator: Joel P.
>Release: NetBSD 10.0_BETA
>Organization:
(individual)
>Environment:
rock64# uname -a
NetBSD rock64 10.0_BETA NetBSD 10.0_BETA (ROCKPRO64) #0: Sat Sep 2 11:44:00 PDT 2023 joelp@cow.labdatum.net:/home/joelp/src-netbsd10/obj/sys/arch/evbarm/compile/ROCKPRO64 evbarm
>Description:
Initial power on or an initiated reboot results in a kernel panic. When the panic reboots the machine (or after cont command in db), it will complete boot after fsck. I have a serial console attached.
There are two different panics, 1) starts with usb_syncmem(), 2) starts with usb_allocmem().
#1
Starting devpubd.
[ 4.5020523] panic: Trap: Data Abort (EL1): Translation Fault L0 with read access for 0000000000000000: pc ffffc0000018c6b4: ldp x0, x1, [x5]
[ 4.5020523] cpu3: Begin traceback...
[ 4.5020523] trace fp ffffc001557776c0
[ 4.5020523] fp ffffc001557776f0 vpanic() at ffffc00000590398 netbsd:vpanic+0x178
[ 4.5020523] fp ffffc00155777750 panic() at ffffc000005904a4 netbsd:panic+0x44
[ 4.5120594] fp ffffc001557777e0 data_abort_handler() at ffffc000000b599c netbsd:data_abort_handler+0x1ec
[ 4.5120594] tf ffffc00155777850 el1_trap() at ffffc000000b6f84 netbsd:el1_vectors+0x784
[ 4.5120594] ---- Data Abort (EL1): trapframe 0xffffc00155777850 (304 bytes) ----
[ 4.5120594] pc=ffffc0000018c6b4, spsr=0000000060000005
[ 4.5120594] esr=0000000096000004, far=0000000000000000
[ 4.5120594] x0=ffffc0016665f7c0, x1=0000000000000000
[ 4.5120594] x2=0000000000000000, x3=0000000000000010
[ 4.5120594] x4=0000000000000005, x5=0000000000000000
[ 4.5120594] x6=0000000000002000, x7=ffff00000408f700
[ 4.5220548] x8=0000000000001f80, x9=0000000000000008
[ 4.5220548] x10=ffffc000000abbd4, x11=000000000322bf00
[ 4.5220548] x12=fffffc00000b4b0c, x13=fffffc00000b4b11
[ 4.5220548] x14=0000000000000020, x15=ffff0000030f15f8
[ 4.5220548] x16=ffffc000000abbd4, x17=0000f51dd8e2c374
[ 4.5220548] x18=0000000000001000, x19=ffffc0004029c000
[ 4.5220548] x20=ffff000002d2c2e0, x21=ffffc0016665f790
[ 4.5220548] x22=ffffc0016665f6b0, x23=ffff000004442c80
[ 4.5220548] x24=ffffc0016665f720, x25=ffff0000004aff40
[ 4.5220548] x26=ffffc0016665f640, x27=00000000ffffff80
[ 4.5220548] x28=ffff000004442d28, fp=x29=ffffc00155777b80
[ 4.5220548] lr=x30=ffffc0000022cfe4, sp=ffffc00155777b80
[ 4.5220548] ------------------------------------------------
[ 4.5220548] fp ffffc00155777b80 usb_syncmem() at ffffc0000018c6b4 netbsd:usb_syncmem+0x20
[ 4.5320546] fp ffffc00155777c00 usbd_transfer() at ffffc00000184ef8 netbsd:usbd_transfer+0x218
[ 4.5320546] fp ffffc00155777c40 usbd_do_request_len() at ffffc00000186978 netbsd:usbd_do_request_len+0xd8
[ 4.5320546] fp ffffc00155777cb0 usbd_get_initial_ddesc() at ffffc00000187494 netbsd:usbd_get_initial_ddesc+0x34
[ 4.5320546] fp ffffc00155777d20 usbd_new_device() at ffffc0000018b284 netbsd:usbd_new_device+0x214
[ 4.5420541] fp ffffc00155777dc0 uhub_explore() at ffffc0000018f6dc netbsd:uhub_explore+0x4fc
[ 4.5420541] fp ffffc00155777e70 usb_discover() at ffffc0000018260c netbsd:usb_discover+0x5c
[ 4.5520554] fp ffffc00155777ea0 usb_event_thread() at ffffc00000182a4c netbsd:usb_event_thread+0x7c
[ 4.5520554] tf ffffc00155777ed0 el0_trap() at ffffc000000b6ff0 netbsd:el1_trap_exit+0x68
[ 4.5520554] cpu3: End traceback...
Stopped in pid 0.210 (system) at netbsd:cpu_Debugger+0xc: ldp
x29, x30, [sp],#16
db{3}>
#2
IPv6 mode: host
Configuring network interfaces:.
Adding interface aliases:.
Waiting for duplicate address detection to finish...
Starting dhcpcd.
[ 4.8321866] panic: Trap: Data Abort (EL1): Translation Fault L0 with read access for 0000000000000000: pc ffffc0000018c1b0: ldr x5, [x2]
[ 4.8321866] cpu5: Begin traceback...
[ 4.8321866] trace fp ffffc001557e9750
[ 4.8321866] fp ffffc001557e9780 vpanic() at ffffc00000590398 netbsd:vpanic+0x178
[ 4.8321866] fp ffffc001557e97e0 panic() at ffffc000005904a4 netbsd:panic+0x44
[ 4.8321866] fp ffffc001557e9870 data_abort_handler() at ffffc000000b599c netbsd:data_abort_handler+0x1ec
[ 4.8321866] tf ffffc001557e98e0 el1_trap() at ffffc000000b6f84 netbsd:el1_vectors+0x784
[ 4.8321866] ---- Data Abort (EL1): trapframe 0xffffc001557e98e0 (304 bytes) ----
[ 4.8321866] pc=ffffc0000018c1b0, spsr=0000000020000005
[ 4.8421934] esr=0000000096000004, far=0000000000000000
[ 4.8421934] x0=0000000000000000, x1=ffffc00166678f00
[ 4.8421934] x2=0000000000000000, x3=0000000000000000
[ 4.8421934] x4=ffff000003705d28, x5=ffff0000f4b73b80
[ 4.8421934] x6=0000000000000000, x7=ffffffffffffffff
[ 4.8421934] x8=00000000000010c0, x9=0000000000000010
[ 4.8421934] x10=ffffc000000abbd4, x11=0000000007f444c8
[ 4.8421934] x12=0000000007f29f58, x13=0000000007f29f48
[ 4.8421934] x14=0000000007f29f50, x15=0000000007f29f28
[ 4.8421934] x16=ffffc000000abbd4, x17=0000fd9d4266c244
[ 4.8421934] x18=0000000007f29000, x19=ffffc00000f0ee20
[ 4.8421934] x20=ffff0000f4b73b80, x21=0000000000000008
[ 4.8421934] x22=ffffc00001315b80, x23=ffff000003705d28
[ 4.8421934] x24=0000000000000002, x25=0000000000000000
[ 4.8421934] x26=0000000000000002, x27=0000000000031128
[ 4.8421934] x28=0000000000000000, fp=x29=ffffc001557e9c10
[ 4.8421934] lr=x30=ffffc0000018c198, sp=ffffc001557e9c10
[ 4.8421934] ------------------------------------------------
[ 4.8421934] fp ffffc001557e9c10 usb_allocmem() at ffffc0000018c1b0 netbsd:usb_allocmem+0x7c
[ 4.8521885] fp ffffc001557e9c70 ohci_open() at ffffc0000022c418 netbsd:ohci_open+0x198
[ 4.8521885] fp ffffc001557e9cd0 usbd_setup_pipe_flags() at ffffc0000018a9e0 netbsd:usbd_setup_pipe_flags+0xe0
[ 4.8521885] fp ffffc001557e9d20 usbd_new_device() at ffffc0000018b3ac netbsd:usbd_new_device+0x33c
[ 4.8521885] fp ffffc001557e9dc0 uhub_explore() at ffffc0000018f6dc netbsd:uhub_explore+0x4fc
[ 4.8521885] fp ffffc001557e9e70 usb_discover() at ffffc0000018260c netbsd:usb_discover+0x5c
[ 4.8621872] fp ffffc001557e9ea0 usb_event_thread() at ffffc00000182a4c netbsd:usb_event_thread+0x7c
[ 4.8621872] tf ffffc001557e9ed0 el0_trap() at ffffc000000b6ff0 netbsd:el1_trap_exit+0x68
[ 4.8621872] cpu5: End traceback...
[ 4.8621872] rebooting...
First few lines of dmesg:
[ 1.000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
[ 1.000000] 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
[ 1.000000] 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023
[ 1.000000] The NetBSD Foundation, Inc. All rights reserved.
[ 1.000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[ 1.000000] The Regents of the University of California. All rights reserved.
[ 1.000000] NetBSD 10.0_BETA (ROCKPRO64) #0: Sat Sep 2 11:44:00 PDT 2023
[ 1.000000] joelp@cow:/home/joelp/src-netbsd10/obj/sys/arch/evbarm/compile/ROCKPRO64
[ 1.000000] total memory = 3914 MB
[ 1.000000] avail memory = 3772 MB
>How-To-Repeat:
Use a Pine64 4GB RockPro64 with a recent netbsd-10 (mine is Sep 02).
With a serial console attached or with HDMI monitor attached, you can see the kernel panic every other reboot. It's either a power on (from full power off) or from a (for example) shutdown -r now.
You will see the panic and it will then reboot or halt in debugger. After the automatic reboot or after cont command in debugger, the kernel successfully boots and the system runs correctly.
I only recently noticed the panic (it just seemed to take longer to boot), but unfortunately, I can't put a date on a build when it first started.
Thanks - Joel
>Fix:
No known fix with recent kernel builds as of Sep 10.
Is it possible the second reboot pauses long enough during fsck to avoid a race condition causing the panic?
>Audit-Trail:
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@netbsd.org, joelp@sdf.org
Cc: port-evbarm-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: re: port-evbarm/57614: Kernel panic rebooting a Pine64 RockPro64 with netbsd10.0_beta; successful after panic reboot.
Date: Tue, 12 Sep 2023 06:34:36 +1000
do you have devices plugged in to the usb2 (ehci) port? if possible,
can you use the usb3 (xhci) port only and see if the problem occurs?
i've seen this problem (at least the syncmem version of it) irregularly
for a while on rk3399 (pinebookpro mostly, not very often on rockpro64),
but i haven't yet figured out what causes it.
i don't think there's a race condition related fsck, while it normally
happens at reboot, i've seen it after being up, and then later inserting
a usb device and it crashed.
(i thought i'd logged a PR about it, but i can't find it.)
can you build a kernel with USB_DEBUG, UHUB_DEBUG, XHCI_DEBUG, and
EHCI_DEBUG debug options enabled, and with set these sysctl values
in /etc sysctl.conf:
hw.usb.debug=1
hw.uhub.debug=1
hw.xhci.debug=1
hw.ehci.debug=1
and then when it crashes, you can either collect a crash dump or if
the serial console works properly, simply run this:
db{1}> show kernhist usbhist
which will output a bunch debugging events from before the crash.
thanks
.mrg.
From: joelp@sdf.org
To: "matthew green" <mrg@eterna.com.au>
Cc: gnats-bugs@netbsd.org, joelp@sdf.org, port-evbarm-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: re: port-evbarm/57614: Kernel panic rebooting a Pine64 RockPro64
with netbsd10.0_beta; successful after panic reboot.
Date: Tue, 12 Sep 2023 23:19:15 -0000
I have an APC UPS attached to a USB2 port (and running apcupsd) when I see
the panic on boot.
I do not see a panic when I attach the UPS to the TypeA USB3 port. I
rebooted several times to be sure -- no panic.
I've been unable to get any more information from the debugger. I first
tried without DDB and DDB_ONPANIC=1, and about every 2nd or 3rd time, the
debugger would break, but 'show kernhist usbhist' did not give any more
details.
Looking at the docs, I tried increasing the value of hw.usb.debug, etc,
sysctl settings but that didn't help either.
I then added DDB and DDB_ONPANIC=1 options and that kernel never landed in
the debugger after several manual reboot attempts. It does still panic and
reboots (see below).
Also, per the docs, it mentions I can use boot netbsd -d to start in ddb;
however I'm never able to continue running under ddb:
[ 1.0000000] total memory = 3909 MB
[ 1.0000000] avail memory = 3768 MB
[ 1.0000000] entropy: ready
[ 1.0000000] Entering DDB...
Stopped in pid 0.0 (system) at netbsd:cpu_Debugger+0xc: ldp
x29, x30
, [sp],#16
db{0}> continue
After that continue, there are several more panics, then it freezes. I'm
probably missing something here, sorry, my kernel debugging background is
Windows unfortunately :)
Is there any more config or kernel options I can include? Is there
something more I need to do after boot netbsd -d?
Here's the panic that won't break into debugger with DDB set:
Waiting for duplicate address detection to finish...
Starting dhcpcd.
[ 5.0229996] panic: kernel debugging assertion
"usb_valid_block_p(f->ufd_block, &usb_blk_fraglist)" failed: file
"/lab/src-netbsd10/src/sys/dev/usb/usb_mem.c", line 303 usb_allocmem: usb
frag 0xffffc001553cf080: unknown block pointer 0x0
[ 5.0229996] cpu2: Begin traceback...
[ 5.0229996] trace fp ffffc001552cca60
[ 5.0229996] fp ffffc001552cca90 vpanic() at ffffc000005b8768
netbsd:vpanic+0x178
[ 5.0229996] fp ffffc001552ccaf0 kern_assert() at ffffc00000851588
netbsd:kern_assert+0x58
[ 5.0330044] fp ffffc001552ccb80 usb_allocmem() at ffffc00000197a4c
netbsd:usb_allocmem+0x5ac
[ 5.0330044] fp ffffc001552ccc00 ohci_open() at ffffc0000023f668
netbsd:ohci_open+0x198
[ 5.0330044] fp ffffc001552ccc50 usbd_setup_pipe_flags() at
ffffc00000193670 netbsd:usbd_setup_pipe_flags+0x1a0
[ 5.0330044] fp ffffc001552cccc0 usbd_new_device() at ffffc000001956a4
netbsd:usbd_new_device+0x744
[ 5.0430010] fp ffffc001552ccd60 uhub_explore() at ffffc0000019b200
netbsd:uhub_explore+0x2d0
[ 5.0430010] fp ffffc001552cce20 usb_discover() at ffffc000001846a0
netbsd:usb_discover+0x70
[ 5.0430010] fp ffffc001552cce70 usb_event_thread() at ffffc00000184988
netbsd:usb_event_thread+0xc8
[ 5.0430010] tf ffffc001552cced0 el0_trap() at ffffc000000b6ff0
netbsd:el1_trap_exit+0x68
[ 5.0530002] cpu2: End traceback...
[ 5.0530002] rebooting...
Here are my changes to the kernel config (copied from GENERIC64):
#options DIAGNOSTIC # internal consistency checks
options DDB
options DDB_ONPANIC=1
options DEBUG
options USB_DEBUG
options UHUB_DEBUG
options XHCI_DEBUG
options EHCI_DEBUG
#options LOCKDEBUG
Thanks - Joel
> do you have devices plugged in to the usb2 (ehci) port? if possible,
> can you use the usb3 (xhci) port only and see if the problem occurs?
>
> i've seen this problem (at least the syncmem version of it) irregularly
> for a while on rk3399 (pinebookpro mostly, not very often on rockpro64),
> but i haven't yet figured out what causes it.
>
> i don't think there's a race condition related fsck, while it normally
> happens at reboot, i've seen it after being up, and then later inserting
> a usb device and it crashed.
>
> (i thought i'd logged a PR about it, but i can't find it.)
>
> can you build a kernel with USB_DEBUG, UHUB_DEBUG, XHCI_DEBUG, and
> EHCI_DEBUG debug options enabled, and with set these sysctl values
> in /etc sysctl.conf:
>
> hw.usb.debug=1
> hw.uhub.debug=1
> hw.xhci.debug=1
> hw.ehci.debug=1
>
> and then when it crashes, you can either collect a crash dump or if
> the serial console works properly, simply run this:
>
> db{1}> show kernhist usbhist
>
> which will output a bunch debugging events from before the crash.
>
> thanks
>
>
> .mrg.
>
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.