NetBSD Problem Report #58073

From www@netbsd.org  Sun Mar 24 11:50:51 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 3F0401A9239
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 24 Mar 2024 11:50:51 +0000 (UTC)
Message-Id: <20240324115049.1C22F1A923A@mollari.NetBSD.org>
Date: Sun, 24 Mar 2024 11:50:49 +0000 (UTC)
From: jspath55@gmail.com
Reply-To: jspath55@gmail.com
To: gnats-bugs@NetBSD.org
Subject: panic: Trap: Data Abort (EL1): Translation Fault L0 on Pi3 during automated tests
X-Send-Pr-Version: www-1.0

>Number:         58073
>Category:       port-evbarm
>Synopsis:       panic: Trap: Data Abort (EL1): Translation Fault L0 on Pi3 during automated tests
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-evbarm-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Mar 24 11:55:00 +0000 2024
>Last-Modified:  Fri Mar 29 14:20:02 +0000 2024
>Originator:     Jim Spath
>Release:        10.0 RC6
>Organization:
>Environment:
NetBSD nb3b.home 10.0_RC6 NetBSD 10.0_RC6 (GENERIC64) #0: Tue Mar 12 10:19:02 UTC 2024  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/GENERIC64 evbarm
>Description:
After several uneventful usr-test runs on a Pi3B, the system panicked and rebooted.
The usr test suite was nearly complete (test program 757 of 935); log portions follow below.
The last commands run appear to have been in the fs/ffs/t_snapshot test case snapshotstress.

Automated Test Framework run command:
/usr/bin/atf-run | /usr/bin/tee /log/tests-${nm}-${dt}.log | /usr/bin/atf-report >/log/tests-${nm}-${dt}.txt 2>/log/tests-${nm}-${dt}.err

Log files: 
-rw-r--r--   1 root  wheel    413629 Mar 22 08:03 tests-nb3b.home-202403220442.txt
-rw-r--r--   1 root  wheel  10327858 Mar 22 08:03 tests-nb3b.home-202403220442.log

$ tail  /log/tests-nb3b.home-202403220442.txt
    extattr_simple: [0.297141s] Passed.
[0.600340s]

fs/ffs/t_fifos (756/935): 1 test cases
    fifos: [0.281438s] Passed.
[0.292858s]

fs/ffs/t_snapshot (757/935): 2 test cases
    snapshot: [1.241567s] Passed.
    snapshotstress:

$ tail  /log/tests-nb3b.home-202403220442.log
tc-so:super-block backups (for fsck_ffs -b #) at:
tc-so:32, 2536, 5040, 7544,
tc-so:[   1.0000000] entropy: ready
tc-so:[   2.0100050] /dev/fss0: file system not clean (fs_clean=0); please fsck(8)
tc-end: 1711094632.792737, snapshot, passed
tc-start: 1711094632.804160, snapshotstress
tc-so:ffs.img: 4.9MB (10000 sectors) block size 4096, fragment size 512
tc-so:  using 4 cylinder groups of 1.22MB, 313 blks, 608 inodes.
tc-so:super-block backups (for fsck_ffs -b #) at:
tc-so:32, 2536, 5040, 7544,

End of the /var/log/messages report:

Mar 22 07:40:29 nb3b ntpd[1739]: error resolving pool 1.netbsd.pool.ntp.org: Temporary failure in name resolution (2)
Mar 22 07:40:55 nb3b root: /usr/tests/sys/rc/h_simple: ERROR: the restart command does not take any parameters
Mar 22 07:40:55 nb3b root: /usr/tests/sys/rc/h_simple: ERROR: the start command does not take any parameters
Mar 22 07:40:56 nb3b root: /usr/tests/sys/rc/h_simple: ERROR: the stop command does not take any parameters
Mar 22 07:40:57 nb3b dhcpcd[731]: ps_root_recvmsg: Host is down
Mar 22 07:41:34 nb3b ntpd[1739]: error resolving pool 1.netbsd.pool.ntp.org: Temporary failure in name resolution (2)
Mar 22 07:42:02 nb3b dhcpcd[731]: ps_root_recvmsg: Host is down
Mar 22 08:03:31 nb3b inetd[10767]: 5439/tcp: max spawn rate (0 in 60 seconds) already met; closing for 600 seconds
Mar 22 08:05:04  syslogd[715]: restart
Mar 22 08:05:04  /netbsd: [ 465954.2272048] panic: Trap: Data Abort (EL1): Translation Fault L0 with read access for 000000000000005a: pc ffffc000003d0e34: ldrb w3, [x22,#90]
Mar 22 08:05:04  /netbsd:
Mar 22 08:05:04  /netbsd: [ 465954.2272048] cpu0: Begin traceback...
Mar 22 08:05:04  /netbsd: [ 465954.2272048] trace fp ffffc0009a497550
Mar 22 08:05:04  /netbsd: [ 465954.2272048] fp ffffc0009a497580 vpanic() at ffffc000004ef218 netbsd:vpanic+0x178
Mar 22 08:05:04  /netbsd: [ 465954.2272048] fp ffffc0009a4975e0 panic() at ffffc000004ef324 netbsd:panic+0x44
Mar 22 08:05:04  /netbsd: [ 465954.2272048] fp ffffc0009a497670 data_abort_handler() at ffffc000000a962c netbsd:data_abort_handler+0x1ec
Mar 22 08:05:04  /netbsd: [ 465954.2272048] tf ffffc0009a4976e0 el1_trap() at ffffc000000aaf84 netbsd:el1_vectors+0x784
Mar 22 08:05:04  /netbsd: [ 465954.2272048] ---- Data Abort (EL1): trapframe 0xffffc0009a4976e0 (304 bytes) ----
Mar 22 08:05:04  /netbsd: [ 465954.2272048]     pc=ffffc000003d0e34,   spsr=0000000020000005
Mar 22 08:05:04  /netbsd: [ 465954.2272048]    esr=0000000096000004,    far=000000000000005a
Mar 22 08:05:04  /netbsd: [ 465954.2272048]     x0=ffff00000b1ca9f8,     x1=ffff00003a9cc0d0
Mar 22 08:05:04  /netbsd: [ 465954.2272048]     x2=0000000000000000,     x3=0000000000000001
Mar 22 08:05:04  /netbsd: [ 465954.2272048]     x4=ffff00003a9fe3e8,     x5=ffff00003abf7000
Mar 22 08:05:04  /netbsd: [ 465954.2272048]     x6=ffff00002858c4d0,     x7=ffff00003a9c9b80
Mar 22 08:05:04  /netbsd: [ 465954.2272048]     x8=0000000000000418,     x9=ffff00003b0d9bc0
Mar 22 08:05:04  /netbsd: [ 465954.2272048]    x10=ffffc000000a07d4,    x11=000000000000003f
Mar 22 08:05:04  /netbsd: [ 465954.2272048]    x12=000003fffffff738,    x13=000003fffffff746
Mar 22 08:05:04  /netbsd: [ 465954.2272048]    x14=0000000000000005,    x15=0000fffffffdd1b0
Mar 22 08:05:04  /netbsd: [ 465954.2272048]    x16=ffffc0000009e384,    x17=0000f5864516c4f4
Mar 22 08:05:04  /netbsd: [ 465954.2272048]    x18=00000000ffffffff,    x19=ffff00003a9cc200
Mar 22 08:05:04  /netbsd: [ 465954.2272048]    x20=ffff00000b1ca9a0,    x21=ffff00003a9cc250
Mar 22 08:05:04  /netbsd: [ 465954.2272048]    x22=0000000000000000,    x23=ffff00003a9fe000
Mar 22 08:05:04  /netbsd: [ 465954.2272048]    x24=ffff00003a8a2640,    x25=ffff00003a9fe400
Mar 22 08:05:04  /netbsd: [ 465954.2272048]    x26=ffff00003a9fe3e8,    x27=000000000000005a
Mar 22 08:05:04  /netbsd: [ 465954.2272048]    x28=ffff00003a8a2670, fp=x29=ffffc0009a497a10
Mar 22 08:05:04  /netbsd: [ 465954.2272048] lr=x30=ffffc000003d1e00,     sp=ffffc0009a497a10
Mar 22 08:05:04  /netbsd: [ 465954.2272048] ------------------------------------------------
Mar 22 08:05:04  /netbsd: [ 465954.2272048] fp ffffc0009a497a10 dwc2_assign_and_init_hc() at ffffc000003d0e34 netbsd:dwc2_assign_and_init_hc+0x84
Mar 22 08:05:04  /netbsd: [ 465954.2272048] fp ffffc0009a497a90 dwc2_hcd_select_transactions() at ffffc000003d1dfc netbsd:dwc2_hcd_select_transactions+0x15c
Mar 22 08:05:04  /netbsd: [ 465954.2272048] fp ffffc0009a497b00 dwc2_release_channel() at ffffc000003d48e0 netbsd:dwc2_release_channel+0xe0
Mar 22 08:05:04  /netbsd: [ 465954.2272048] fp ffffc0009a497b30 dwc2_hc_xfercomp_intr() at ffffc000003d58ec netbsd:dwc2_hc_xfercomp_intr+0x32c
Mar 22 08:05:04  /netbsd: [ 465954.2272048] fp ffffc0009a497b80 dwc2_handle_hcd_intr() at ffffc000003d68ac netbsd:dwc2_handle_hcd_intr+0x568
Mar 22 08:05:04  /netbsd: [ 465954.2272048] fp ffffc0009a497c00 dwc2_intr() at ffffc000003cc4b8 netbsd:dwc2_intr+0xe4
Mar 22 08:05:04  /netbsd: [ 465954.2272048] fp ffffc0009a497c30 bcm2835_icu_intr() at ffffc0000001d77c netbsd:bcm2835_icu_intr+0x1c
Mar 22 08:05:04  /netbsd: [ 465954.2272048] fp ffffc0009a497c50 pic_dispatch() at ffffc000000023a8 netbsd:pic_dispatch+0x44
Mar 22 08:05:04  /netbsd: [ 465954.2272048] fp ffffc0009a497c90 pic_do_pending_ints() at ffffc00000002858 netbsd:pic_do_pending_ints+0x358
Mar 22 08:05:04  /netbsd: [ 465954.2272048] fp ffffc0009a497e28 cpu_idle() at ffffc000000a56f0 netbsd:cpu_idle+0x4c
Mar 22 08:05:04  /netbsd: [ 465954.2272048] fp ffffc0009a497e70 idle_loop() at ffffc000004a2844 netbsd:idle_loop+0xb4
Mar 22 08:05:04  /netbsd: [ 465954.2272048] tf ffffc0009a497ed0 el0_trap() at ffffc000000aaff0 netbsd:el1_trap_exit+0x68
Mar 22 08:05:04  /netbsd: [ 465954.2272048] cpu0: End traceback...
Mar 22 08:05:04  /netbsd: [ 465954.2272048] rebooting...
Mar 22 08:05:04  /netbsd: [   1.0000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
[...]
Mar 22 08:05:04  /netbsd: [   1.4858384] sdmmc0: direct I/O error 5, r=6 p=0xffffc000aa9c6e4c write
Mar 22 08:05:04  /netbsd: [   1.5358413] sdmmc0: SD card status: 4-bit, C10, U3, V30, A2
[...]
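
Note on reading the trap: the panic line and trapframe above can be decoded with a little arithmetic. The sketch below is illustrative only; it assumes the standard ESR_EL1 field layout from the Arm architecture manual (EC in bits 31:26, WnR in bit 6, DFSC in bits 5:0) and uses only values copied from this log. The variable names are made up for the example.

```shell
#!/bin/sh
# Values copied from the trapframe in the log above.
esr=0x96000004   # esr=0000000096000004
x22=0x0          # x22=0000000000000000
imm=90           # offset in "ldrb w3, [x22,#90]"

# Assumed ESR_EL1 layout: EC = bits [31:26], WnR = bit 6, DFSC = bits [5:0].
ec=$(( ($esr >> 26) & 0x3f ))
wnr=$(( ($esr >> 6) & 1 ))
dfsc=$(( $esr & 0x3f ))

# Effective address of the faulting load: base register plus immediate.
addr=$(( $x22 + $imm ))

printf 'EC=0x%02x WnR=%d DFSC=0x%02x addr=0x%x\n' "$ec" "$wnr" "$dfsc" "$addr"
```

EC 0x25 is a data abort taken without a change in exception level, WnR 0 is a read, and DFSC 0x04 is a level 0 translation fault, which matches the panic message. The computed address 0x5a equals the reported far, so the load used a zero base register (x22) plus offset 90. That is consistent with a NULL pointer dereference in dwc2_assign_and_init_hc(), reached from the USB interrupt path while cpu0 was in the idle loop; which pointer was NULL cannot be determined from this log alone.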
>How-To-Repeat:
Unknown.
I will isolate this case and run it more frequently to see if/when the fault repeats.
At least 10 automated test runs finished without this problem occurring.
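
A minimal sketch of how the isolated runs could be scripted follows. The helper name run_repeatedly and its arguments are hypothetical, not part of ATF; it simply runs a given command a fixed number of times with a pause between runs and stops at the first failure.

```shell
#!/bin/sh
# run_repeatedly COUNT INTERVAL CMD...
# Run CMD COUNT times, sleeping INTERVAL seconds between runs.
# Stops and returns 1 at the first run that exits non-zero.
run_repeatedly() {
    count=$1
    interval=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        "$@" || { echo "run $i failed" >&2; return 1; }
        i=$((i + 1))
        # No need to sleep after the final run.
        [ "$i" -le "$count" ] && sleep "$interval"
    done
    echo "completed $count runs"
}

# Possible use on the test machine (untested assumption; atf-run runs the
# whole t_snapshot test program, i.e. both snapshot and snapshotstress):
#   cd /usr/tests/fs/ffs &&
#       run_repeatedly 2400 60 sh -c 'atf-run t_snapshot | atf-report'
```

Since the panic reboots the machine, the loop itself will not report the fault; the run count reached before the reboot would have to be recovered from a log written by the command being run.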
>Fix:
Unknown.

>Audit-Trail:
From: Jim Spath <jspath55@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-evbarm/58073: panic: Trap: Data Abort (EL1): Translation
 Fault L0 on Pi3 during automated tests
Date: Fri, 29 Mar 2024 10:19:01 -0400

 On Sun, Mar 24, 2024 at 7:55 AM <gnats-admin@netbsd.org> wrote:
 > >Synopsis:       panic: Trap: Data Abort (EL1): Translation Fault L0 on Pi3 during automated tests
 > >Arrival-Date:   Sun Mar 24 11:55:00 +0000 2024

 I have run the last test case logged before this panic by itself
 (>2400 runs), once a minute, with no ill effect.
 I do not fully understand the stack trace details, but I think something
 other than the fs/ffs/t_snapshot test triggered the panic.
 I will let this run a bit longer, then go back to full automated test
 framework runs.
