NetBSD Problem Report #57614

From www@netbsd.org  Sun Sep 10 22:00:09 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id A34841A923C
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 10 Sep 2023 22:00:09 +0000 (UTC)
Message-Id: <20230910220007.8911B1A923F@mollari.NetBSD.org>
Date: Sun, 10 Sep 2023 22:00:07 +0000 (UTC)
From: joelp@sdf.org
Reply-To: joelp@sdf.org
To: gnats-bugs@NetBSD.org
Subject: Kernel panic rebooting a Pine64 RockPro64 with netbsd10.0_beta; successful after panic reboot.
X-Send-Pr-Version: www-1.0

>Number:         57614
>Category:       port-evbarm
>Synopsis:       Kernel panic rebooting a Pine64 RockPro64 with netbsd10.0_beta; successful after panic reboot.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-evbarm-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Sep 10 22:05:00 +0000 2023
>Last-Modified:  Tue Sep 12 23:20:01 +0000 2023
>Originator:     Joel P.
>Release:        NetBSD 10.0_BETA
>Organization:
(individual)
>Environment:
rock64# uname -a
NetBSD rock64 10.0_BETA NetBSD 10.0_BETA (ROCKPRO64) #0: Sat Sep  2 11:44:00 PDT 2023  joelp@cow.labdatum.net:/home/joelp/src-netbsd10/obj/sys/arch/evbarm/compile/ROCKPRO64 evbarm
>Description:
Initial power on or an initiated reboot results in a kernel panic. When the panic reboots the machine (or after cont command in db), it will complete boot after fsck. I have a serial console attached.

There are two different panics, 1) starts with usb_syncmem(), 2) starts with usb_allocmem().

#1
Starting devpubd.
[   4.5020523] panic: Trap: Data Abort (EL1): Translation Fault L0 with read access for 0000000000000000: pc ffffc0000018c6b4: ldp x0, x1, [x5]

[   4.5020523] cpu3: Begin traceback...
[   4.5020523] trace fp ffffc001557776c0
[   4.5020523] fp ffffc001557776f0 vpanic() at ffffc00000590398 netbsd:vpanic+0x178
[   4.5020523] fp ffffc00155777750 panic() at ffffc000005904a4 netbsd:panic+0x44
[   4.5120594] fp ffffc001557777e0 data_abort_handler() at ffffc000000b599c netbsd:data_abort_handler+0x1ec
[   4.5120594] tf ffffc00155777850 el1_trap() at ffffc000000b6f84 netbsd:el1_vectors+0x784
[   4.5120594] ---- Data Abort (EL1): trapframe 0xffffc00155777850 (304 bytes) ----
[   4.5120594]     pc=ffffc0000018c6b4,   spsr=0000000060000005
[   4.5120594]    esr=0000000096000004,    far=0000000000000000
[   4.5120594]     x0=ffffc0016665f7c0,     x1=0000000000000000
[   4.5120594]     x2=0000000000000000,     x3=0000000000000010
[   4.5120594]     x4=0000000000000005,     x5=0000000000000000
[   4.5120594]     x6=0000000000002000,     x7=ffff00000408f700
[   4.5220548]     x8=0000000000001f80,     x9=0000000000000008
[   4.5220548]    x10=ffffc000000abbd4,    x11=000000000322bf00
[   4.5220548]    x12=fffffc00000b4b0c,    x13=fffffc00000b4b11
[   4.5220548]    x14=0000000000000020,    x15=ffff0000030f15f8
[   4.5220548]    x16=ffffc000000abbd4,    x17=0000f51dd8e2c374
[   4.5220548]    x18=0000000000001000,    x19=ffffc0004029c000
[   4.5220548]    x20=ffff000002d2c2e0,    x21=ffffc0016665f790
[   4.5220548]    x22=ffffc0016665f6b0,    x23=ffff000004442c80
[   4.5220548]    x24=ffffc0016665f720,    x25=ffff0000004aff40
[   4.5220548]    x26=ffffc0016665f640,    x27=00000000ffffff80
[   4.5220548]    x28=ffff000004442d28, fp=x29=ffffc00155777b80
[   4.5220548] lr=x30=ffffc0000022cfe4,     sp=ffffc00155777b80
[   4.5220548] ------------------------------------------------
[   4.5220548] fp ffffc00155777b80 usb_syncmem() at ffffc0000018c6b4 netbsd:usb_syncmem+0x20
[   4.5320546] fp ffffc00155777c00 usbd_transfer() at ffffc00000184ef8 netbsd:usbd_transfer+0x218
[   4.5320546] fp ffffc00155777c40 usbd_do_request_len() at ffffc00000186978 netbsd:usbd_do_request_len+0xd8
[   4.5320546] fp ffffc00155777cb0 usbd_get_initial_ddesc() at ffffc00000187494 netbsd:usbd_get_initial_ddesc+0x34
[   4.5320546] fp ffffc00155777d20 usbd_new_device() at ffffc0000018b284 netbsd:usbd_new_device+0x214
[   4.5420541] fp ffffc00155777dc0 uhub_explore() at ffffc0000018f6dc netbsd:uhub_explore+0x4fc
[   4.5420541] fp ffffc00155777e70 usb_discover() at ffffc0000018260c netbsd:usb_discover+0x5c
[   4.5520554] fp ffffc00155777ea0 usb_event_thread() at ffffc00000182a4c netbsd:usb_event_thread+0x7c
[   4.5520554] tf ffffc00155777ed0 el0_trap() at ffffc000000b6ff0 netbsd:el1_trap_exit+0x68
[   4.5520554] cpu3: End traceback...
Stopped in pid 0.210 (system) at        netbsd:cpu_Debugger+0xc:        ldp
x29, x30, [sp],#16
db{3}>

#2
IPv6 mode: host
Configuring network interfaces:.
Adding interface aliases:.
Waiting for duplicate address detection to finish...
Starting dhcpcd.
[   4.8321866] panic: Trap: Data Abort (EL1): Translation Fault L0 with read access for 0000000000000000: pc ffffc0000018c1b0: ldr x5, [x2]

[   4.8321866] cpu5: Begin traceback...
[   4.8321866] trace fp ffffc001557e9750
[   4.8321866] fp ffffc001557e9780 vpanic() at ffffc00000590398 netbsd:vpanic+0x178
[   4.8321866] fp ffffc001557e97e0 panic() at ffffc000005904a4 netbsd:panic+0x44
[   4.8321866] fp ffffc001557e9870 data_abort_handler() at ffffc000000b599c netbsd:data_abort_handler+0x1ec
[   4.8321866] tf ffffc001557e98e0 el1_trap() at ffffc000000b6f84 netbsd:el1_vectors+0x784
[   4.8321866] ---- Data Abort (EL1): trapframe 0xffffc001557e98e0 (304 bytes) ----
[   4.8321866]     pc=ffffc0000018c1b0,   spsr=0000000020000005
[   4.8421934]    esr=0000000096000004,    far=0000000000000000
[   4.8421934]     x0=0000000000000000,     x1=ffffc00166678f00
[   4.8421934]     x2=0000000000000000,     x3=0000000000000000
[   4.8421934]     x4=ffff000003705d28,     x5=ffff0000f4b73b80
[   4.8421934]     x6=0000000000000000,     x7=ffffffffffffffff
[   4.8421934]     x8=00000000000010c0,     x9=0000000000000010
[   4.8421934]    x10=ffffc000000abbd4,    x11=0000000007f444c8
[   4.8421934]    x12=0000000007f29f58,    x13=0000000007f29f48
[   4.8421934]    x14=0000000007f29f50,    x15=0000000007f29f28
[   4.8421934]    x16=ffffc000000abbd4,    x17=0000fd9d4266c244
[   4.8421934]    x18=0000000007f29000,    x19=ffffc00000f0ee20
[   4.8421934]    x20=ffff0000f4b73b80,    x21=0000000000000008
[   4.8421934]    x22=ffffc00001315b80,    x23=ffff000003705d28
[   4.8421934]    x24=0000000000000002,    x25=0000000000000000
[   4.8421934]    x26=0000000000000002,    x27=0000000000031128
[   4.8421934]    x28=0000000000000000, fp=x29=ffffc001557e9c10
[   4.8421934] lr=x30=ffffc0000018c198,     sp=ffffc001557e9c10
[   4.8421934] ------------------------------------------------
[   4.8421934] fp ffffc001557e9c10 usb_allocmem() at ffffc0000018c1b0 netbsd:usb_allocmem+0x7c
[   4.8521885] fp ffffc001557e9c70 ohci_open() at ffffc0000022c418 netbsd:ohci_open+0x198
[   4.8521885] fp ffffc001557e9cd0 usbd_setup_pipe_flags() at ffffc0000018a9e0 netbsd:usbd_setup_pipe_flags+0xe0
[   4.8521885] fp ffffc001557e9d20 usbd_new_device() at ffffc0000018b3ac netbsd:usbd_new_device+0x33c
[   4.8521885] fp ffffc001557e9dc0 uhub_explore() at ffffc0000018f6dc netbsd:uhub_explore+0x4fc
[   4.8521885] fp ffffc001557e9e70 usb_discover() at ffffc0000018260c netbsd:usb_discover+0x5c
[   4.8621872] fp ffffc001557e9ea0 usb_event_thread() at ffffc00000182a4c netbsd:usb_event_thread+0x7c
[   4.8621872] tf ffffc001557e9ed0 el0_trap() at ffffc000000b6ff0 netbsd:el1_trap_exit+0x68
[   4.8621872] cpu5: End traceback...
[   4.8621872] rebooting...


First few lines of dmesg:
[     1.000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
[     1.000000]     2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
[     1.000000]     2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023
[     1.000000]     The NetBSD Foundation, Inc.  All rights reserved.
[     1.000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[     1.000000]     The Regents of the University of California.  All rights reserved.

[     1.000000] NetBSD 10.0_BETA (ROCKPRO64) #0: Sat Sep  2 11:44:00 PDT 2023
[     1.000000]         joelp@cow:/home/joelp/src-netbsd10/obj/sys/arch/evbarm/compile/ROCKPRO64
[     1.000000] total memory = 3914 MB
[     1.000000] avail memory = 3772 MB
>How-To-Repeat:
Use a Pine64 4GB RockPro64 with a recent netbsd-10 (mine is Sep 02).

With a serial console attached or with HDMI monitor attached, you can see the kernel panic every other reboot. It's either a power on (from full power off) or from a (for example) shutdown -r now.

You will see the panic and it will then reboot or halt in debugger. After the automatic reboot or after cont command in debugger, the kernel successfully boots and the system runs correctly.

I only recently noticed the panic (it just seemed to take longer to boot), but unfortunately, I can't put a date on a build when it first started.

Thanks - Joel
>Fix:
No known fix with recent kernel builds as of Sep 10.

Is it possible the second reboot pauses long enough during fsck to avoid a race condition causing the panic?

>Audit-Trail:
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@netbsd.org, joelp@sdf.org
Cc: port-evbarm-maintainer@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: port-evbarm/57614: Kernel panic rebooting a Pine64 RockPro64 with netbsd10.0_beta; successful after panic reboot.
Date: Tue, 12 Sep 2023 06:34:36 +1000

 do you have devices plugged in to the usb2 (ehci) port?  if possible,
 can you use the usb3 (xhci) port only and see if the problem occurs?

 i've seen this problem (at least the syncmem version of it) irregularly
 for a while on rk3399 (pinebookpro mostly, not very often on rockpro64),
 but i haven't yet figured out what causes it.

 i don't think there's a race condition related fsck, while it normally
 happens at reboot, i've seen it after being up, and then later inserting
 a usb device and it crashed.

 (i thought i'd logged a PR about it, but i can't find it.)

 can you build a kernel with USB_DEBUG, UHUB_DEBUG, XHCI_DEBUG, and
 EHCI_DEBUG debug options enabled, and with set these sysctl values
 in /etc sysctl.conf:

    hw.usb.debug=1
    hw.uhub.debug=1
    hw.xhci.debug=1
    hw.ehci.debug=1

 and then when it crashes, you can either collect a crash dump or if
 the serial console works properly, simply run this:

   db{1}> show kernhist usbhist

 which will output a bunch debugging events from before the crash.

 thanks


 .mrg.

From: joelp@sdf.org
To: "matthew green" <mrg@eterna.com.au>
Cc: gnats-bugs@netbsd.org, joelp@sdf.org, port-evbarm-maintainer@netbsd.org,
        gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: re: port-evbarm/57614: Kernel panic rebooting a Pine64 RockPro64
 with netbsd10.0_beta; successful after panic reboot.
Date: Tue, 12 Sep 2023 23:19:15 -0000

 I have an APC UPS attached to a USB2 port (and running apcupsd) when I see
 the panic on boot.

 I do not see a panic when I attach the UPS to the TypeA USB3 port. I
 rebooted several times to be sure -- no panic.

 I've been unable to get any more information from the debugger. I first
 tried without DDB and DDB_ONPANIC=1, and about every 2nd or 3rd time, the
 debugger would break, but 'show kernhist usbhist' did not give any more
 details.

 Looking at the docs, I tried increasing the value of hw.usb.debug, etc,
 sysctl settings but that didn't help either.

 I then added DDB and DDB_ONPANIC=1 options and that kernel never landed in
 the debugger after several manual reboot attempts. It does still panic and
 reboots (see below).

 Also, per the docs, it mentions I can use boot netbsd -d to start in ddb;
 however I'm never able to continue running under ddb:

 [   1.0000000] total memory = 3909 MB
 [   1.0000000] avail memory = 3768 MB
 [   1.0000000] entropy: ready
 [   1.0000000] Entering DDB...
 Stopped in pid 0.0 (system) at  netbsd:cpu_Debugger+0xc:        ldp    
 x29, x30
 , [sp],#16
 db{0}> continue

 After that continue, there are several more panics, then it freezes. I'm
 probably missing something here, sorry, my kernel debugging background is
 Windows unfortunately :)

 Is there any more config or kernel options I can include? Is there
 something more I need to do after boot netbsd -d?

 Here's the panic that won't break into debugger with DDB set:

 Waiting for duplicate address detection to finish...
 Starting dhcpcd.
 [   5.0229996] panic: kernel debugging assertion
 "usb_valid_block_p(f->ufd_block, &usb_blk_fraglist)" failed: file
 "/lab/src-netbsd10/src/sys/dev/usb/usb_mem.c", line 303 usb_allocmem: usb
 frag 0xffffc001553cf080: unknown block pointer 0x0
 [   5.0229996] cpu2: Begin traceback...
 [   5.0229996] trace fp ffffc001552cca60
 [   5.0229996] fp ffffc001552cca90 vpanic() at ffffc000005b8768
 netbsd:vpanic+0x178
 [   5.0229996] fp ffffc001552ccaf0 kern_assert() at ffffc00000851588
 netbsd:kern_assert+0x58
 [   5.0330044] fp ffffc001552ccb80 usb_allocmem() at ffffc00000197a4c
 netbsd:usb_allocmem+0x5ac
 [   5.0330044] fp ffffc001552ccc00 ohci_open() at ffffc0000023f668
 netbsd:ohci_open+0x198
 [   5.0330044] fp ffffc001552ccc50 usbd_setup_pipe_flags() at
 ffffc00000193670 netbsd:usbd_setup_pipe_flags+0x1a0
 [   5.0330044] fp ffffc001552cccc0 usbd_new_device() at ffffc000001956a4
 netbsd:usbd_new_device+0x744
 [   5.0430010] fp ffffc001552ccd60 uhub_explore() at ffffc0000019b200
 netbsd:uhub_explore+0x2d0
 [   5.0430010] fp ffffc001552cce20 usb_discover() at ffffc000001846a0
 netbsd:usb_discover+0x70
 [   5.0430010] fp ffffc001552cce70 usb_event_thread() at ffffc00000184988
 netbsd:usb_event_thread+0xc8
 [   5.0430010] tf ffffc001552cced0 el0_trap() at ffffc000000b6ff0
 netbsd:el1_trap_exit+0x68
 [   5.0530002] cpu2: End traceback...
 [   5.0530002] rebooting...

 Here are my changes to the kernel config (copied from GENERIC64):

 #options    DIAGNOSTIC  # internal consistency checks
 options     DDB
 options     DDB_ONPANIC=1
 options     DEBUG
 options     USB_DEBUG
 options     UHUB_DEBUG
 options     XHCI_DEBUG
 options     EHCI_DEBUG
 #options    LOCKDEBUG

 Thanks - Joel

 > do you have devices plugged in to the usb2 (ehci) port?  if possible,
 > can you use the usb3 (xhci) port only and see if the problem occurs?
 >
 > i've seen this problem (at least the syncmem version of it) irregularly
 > for a while on rk3399 (pinebookpro mostly, not very often on rockpro64),
 > but i haven't yet figured out what causes it.
 >
 > i don't think there's a race condition related fsck, while it normally
 > happens at reboot, i've seen it after being up, and then later inserting
 > a usb device and it crashed.
 >
 > (i thought i'd logged a PR about it, but i can't find it.)
 >
 > can you build a kernel with USB_DEBUG, UHUB_DEBUG, XHCI_DEBUG, and
 > EHCI_DEBUG debug options enabled, and with set these sysctl values
 > in /etc sysctl.conf:
 >
 >    hw.usb.debug=1
 >    hw.uhub.debug=1
 >    hw.xhci.debug=1
 >    hw.ehci.debug=1
 >
 > and then when it crashes, you can either collect a crash dump or if
 > the serial console works properly, simply run this:
 >
 >   db{1}> show kernhist usbhist
 >
 > which will output a bunch debugging events from before the crash.
 >
 > thanks
 >
 >
 > .mrg.
 >

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.