NetBSD Problem Report #55388

From www@netbsd.org  Mon Jun 15 08:12:40 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 63D361A9213
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 15 Jun 2020 08:12:40 +0000 (UTC)
Message-Id: <20200615081239.3004E1A9218@mollari.NetBSD.org>
Date: Mon, 15 Jun 2020 08:12:39 +0000 (UTC)
From: marklmi26-intf@yahoo.com
Reply-To: marklmi26-intf@yahoo.com
To: gnats-bugs@NetBSD.org
Subject: current on aarch64 Odroid C2: tar -xzf src.tar.gz crashes NetBSD multiple ways
X-Send-Pr-Version: www-1.0

>Number:         55388
>Category:       kern
>Synopsis:       current on aarch64 Odroid C2: tar -xzf src.tar.gz crashes NetBSD multiple ways
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    ad
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jun 15 08:15:00 +0000 2020
>Last-Modified:  Tue Jun 16 01:40:01 +0000 2020
>Originator:     Mark Millard
>Release:        current
>Organization:
>Environment:
NetBSD NBSDODC2 9.99.64 NetBSD 9.99.64 (GENERIC64) #0: Wed Jun  3 07:06:18 UTC 2020  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/GENERIC64 evbarm aarch64

>Description:
The details of the failure varied, but vcache_reclaim always
appeared in the call chain. Two crash backtraces are reported
below.

# tar -xzf src.tar.gz -C /usr
[ 809055.7412856] panic: Trap: Data Abort (EL1): Translation Fault L0 with read access for e456e6619dd79e2a: pc ffffc00000476e48: opcode f940029c: ldr x28, [x20]

[ 809055.7512870] cpu2: Begin traceback...
[ 809055.7512870] trace fp ffffc00042bcf7b0
[ 809055.7612880] fp ffffc00042bcf7d0 vpanic() at ffffc000004b2324 netbsd:vpanic+0x15c
[ 809055.7612880] fp ffffc00042bcf840 panic() at ffffc000004b241c netbsd:panic+0x44
[ 809055.7712931] fp ffffc00042bcf8d0 data_abort_handler() at ffffc0000008c2ec netbsd:data_abort_handler+0x4dc
[ 809055.7812920] tf ffffc00042bcf950 el1_trap() at ffffc00000088bd8 netbsd:el1_trap
[ 809055.7912930] ---- trapframe 0xffffc00042bcf950 (304 bytes) ----
[ 809055.7912930]     pc=ffffc00000476e48,   spsr=0000000060000005
[ 809055.8012925]    esr=0000000096000004,    far=e456e6619dd79e2a
[ 809055.8012925]     x0=0000000000000001,     x1=00000000000000c4
[ 809055.8112929]     x2=00000000000000c4,     x3=0000000000000000
[ 809055.8212920]     x4=ffffc000000815ec,     x5=0000000000000000
[ 809055.8212920]     x6=0000000000000006,     x7=000000000000000a
[ 809055.8312921]     x8=0000000000000004,     x9=ffffc00042bcfdb8
[ 809055.8312921]    x10=ffffc00042bcfdb8,    x11=0000000000000000
[ 809055.8412930]    x12=000000007ccf4000,    x13=0000000000000018
[ 809055.8512946]    x14=0000000000006000,    x15=ffffffffffffffe8
[ 809055.8512946]    x16=0000000000000000,    x17=0000000000000000
[ 809055.8612947]    x18=0000000000001000,    x19=ffff00007fdea580
[ 809055.8612947]    x20=e456e6619dd79e2a,    x21=ffffffffffffffe4
[ 809055.8712943]    x22=0000000000000000,    x23=ffff00007fdea580
[ 809055.8812963]    x24=0000000000000008,    x25=0000000000000000
[ 809055.8812963]    x26=0000000000000001,    x27=ffff00007f12f000
[ 809055.8912956]    x28=ffff00006751c000, fp=x29=ffffc00042bcfc80
[ 809055.8912956] lr=x30=ffffc00000476df4,     sp=ffffc00042bcfc80
[ 809055.9012962] ------------------------------------------------
[ 809055.9112976] fp ffffc00042bcfc80 rw_enter() at ffffc00000476e48 netbsd:rw_enter+0x88
[ 809055.9212996] fp ffffc00042bcfd30 vcache_reclaim() at ffffc00000514ae8 netbsd:vcache_reclaim+0x98
[ 809055.9212996] fp ffffc00042bcfdf0 vrecycle() at ffffc00000516714 netbsd:vrecycle+0x13c
[ 809055.9313029] fp ffffc00042bcfe30 vdrain_thread() at ffffc00000517118 netbsd:vdrain_thread+0x518
address 0x100 is invalid
address 0xe8 is invalid
[ 809055.9413058] cpu2: End traceback...

[ 809055.9513026] dump to dev 92,9 not possible
[ 809060.9515521] rebooting...

Trying again after the reboot got:

# tar -xf src.tar.gz -C /usr
[ 2906.7962893] panic: kernel diagnostic assertion "RB_SENTINEL_P(standin_son) || RB_RED_P(standin_son)" failed: file "/home/source/ab/HEAD/src/sys/lib/libkern/../../../common/lib/libc/gen/rb.c", line 582 
[ 2906.8062904] cpu3: Begin traceback...
[ 2906.8162905] trace fp ffffc00042bcfaf0
[ 2906.8162905] fp ffffc00042bcfb10 vpanic() at ffffc000004b2324 netbsd:vpanic+0x15c
[ 2906.8262906] fp ffffc00042bcfb80 kern_assert() at ffffc000007d054c netbsd:kern_assert+0x5c
[ 2906.8362924] fp ffffc00042bcfc10 rb_tree_remove_node() at ffffc000007cfe84 netbsd:rb_tree_remove_node+0x454
[ 2906.8462917] fp ffffc00042bcfc90 cache_remove() at ffffc000004fb654 netbsd:cache_remove+0x124
[ 2906.8562921] fp ffffc00042bcfcd0 cache_purge1() at ffffc000004fe1f4 netbsd:cache_purge1+0x1cc
[ 2906.8662922] fp ffffc00042bcfd30 vcache_reclaim() at ffffc00000514b34 netbsd:vcache_reclaim+0xe4
[ 2906.8662922] fp ffffc00042bcfdf0 vrecycle() at ffffc00000516714 netbsd:vrecycle+0x13c
[ 2906.8762927] fp ffffc00042bcfe30 vdrain_thread() at ffffc00000517118 netbsd:vdrain_thread+0x518
address 0x100 is invalid
address 0xe8 is invalid
[ 2906.8862939] cpu3: End traceback...

[ 2906.8962945] dump to dev 92,9 not possible
[ 2911.8965471] rebooting...

In both cases, the extraction ran for a long time before the
panic occurred.
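
For reference, the esr value in the first trap frame can be decoded
by hand. The following is only a minimal sketch, assuming the ARMv8-A
ESR_EL1 layout (EC in bits 31:26, IL in bit 25, and, for data aborts,
WnR in bit 6 and DFSC in bits 5:0). The variable names are made up
for illustration.

```shell
#!/bin/sh
# Decode esr=0000000096000004 from the first trap frame.
# Assumption: ARMv8-A ESR_EL1 field layout for a data abort.
esr=0x96000004

ec=$(( ($esr >> 26) & 0x3f ))   # exception class, bits 31:26
il=$(( ($esr >> 25) & 0x1 ))    # instruction length bit, bit 25
wnr=$(( ($esr >> 6) & 0x1 ))    # 0 = read access, 1 = write access
dfsc=$(( $esr & 0x3f ))         # data fault status code, bits 5:0

printf 'EC=0x%x IL=%d WnR=%d DFSC=0x%02x\n' "$ec" "$il" "$wnr" "$dfsc"
# prints: EC=0x25 IL=1 WnR=0 DFSC=0x04
```

EC 0x25 is a data abort taken without a change in exception level,
WnR 0 is a read, and DFSC 0x04 is a level 0 translation fault, which
agrees with the panic message. Note also that far equals x20, the
register dereferenced by the faulting "ldr x28, [x20]", and that its
value does not resemble the other kernel addresses in the trace
(which begin with ffff).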

FYI:

# uname -ap
NetBSD NBSDODC2 9.99.64 NetBSD 9.99.64 (GENERIC64) #0: Wed Jun  3 07:06:18 UTC 2020  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/GENERIC64 evbarm aarch64

# df -m
Filesystem    1M-blocks       Used      Avail %Cap Mounted on
/dev/ld1a        117296      10346     101085   9% /
/dev/ld1e            79         28         51  35% /boot
kernfs                0          0          0 100% /kern
ptyfs                 0          0          0 100% /dev/pts
procfs                0          0          0 100% /proc
tmpfs               503          0        503   0% /var/shm

/dev/ld1 is the (removable) eMMC.

I do not expect it to matter, but a swap file was set up for
the -current runs above and not for the 9.0_STABLE run below.

By contrast . . .

Under 9.0_STABLE, the analogous tar -xzf commands for the sources
for 9.0_STABLE completed just fine.

# uname -ap
NetBSD arm64 9.0_STABLE NetBSD 9.0_STABLE (GENERIC64) #0: Thu Jun 11 11:04:11 UTC 2020  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/GENERIC64 evbarm aarch64

This suggests that the problem is specific to current.

I did all this activity as root. I am new to NetBSD.

>How-To-Repeat:
I had not done much beyond dd'ing the image from armbsd.org, booting it,
listing a name in /etc/rc.conf, and setting up a swap file. Days later I
downloaded (via ftp) src.tar.gz and then tried to expand it. I also
downloaded xsrc.tar.gz and a tarball for pkgsrc.

Given src.tar.gz is already present, all it took to get
the error was (in each case):

# tar -xzf src.tar.gz -C /usr

and waiting for it.

>Fix:

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->ad
Responsible-Changed-By: ad@NetBSD.org
Responsible-Changed-When: Mon, 15 Jun 2020 18:25:38 +0000
Responsible-Changed-Why:
I'll take a look.


From: Mark Millard <marklmi26-intf@yahoo.com>
To: gnats-bugs@netbsd.org
Cc: ad@netbsd.org,
 kern-bug-people@netbsd.org,
 netbsd-bugs@netbsd.org,
 gnats-admin@netbsd.org
Subject: Re: kern/55388 (current on aarch64 Odroid C2: tar -xzf src.tar.gz
 crashes NetBSD multiple ways)
Date: Mon, 15 Jun 2020 17:06:13 -0700

 On 2020-Jun-15, at 11:25, ad@netbsd.org <ad at NetBSD.org> wrote:

 > Synopsis: current on aarch64 Odroid C2: tar -xzf src.tar.gz crashes NetBSD multiple ways
 >
 > Responsible-Changed-From-To: kern-bug-people->ad
 > Responsible-Changed-By: ad@NetBSD.org
 > Responsible-Changed-When: Mon, 15 Jun 2020 18:25:38 +0000
 > Responsible-Changed-Why:
 > I'll take a look.
 >

 I got to thinking about it and it is possible that I'd
 added the following to /etc/sysctl.conf for the testing
 of -current on the ODroid C2, copying from my earlier
 RPi4 experiments:

 vm.anonmin=70
 vm.anonmax=90
 vm.execmin=2
 vm.execmax=4
 vm.filemin=1
 vm.filemax=6

 I did not do so for the later ODroid C2 NetBSD 9.0_STABLE
 experiment, so this might be another difference beyond
 whether a swap file was in use.
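
 One way to check whether such settings are actually in effect is to
 list the vm.* overrides in the configuration file and compare each
 with the running value. The following is only a sketch, assuming the
 overrides are in name=value form in /etc/sysctl.conf. The function
 name list_vm_overrides is hypothetical, and the file path can be
 passed as the first argument instead.

```shell
#!/bin/sh
# Compare vm.* overrides in a sysctl.conf-style file with the live values.
# Assumption: the overrides live in /etc/sysctl.conf unless a path is given.
conf=${1:-/etc/sysctl.conf}

# Print "name=value" for each uncommented vm.* assignment in file $1.
list_vm_overrides() {
    sed -n 's/^[[:space:]]*\(vm\.[A-Za-z0-9_.]*\)[[:space:]]*=[[:space:]]*\([^#[:space:]]*\).*$/\1=\2/p' "$1"
}

if [ -r "$conf" ]; then
    list_vm_overrides "$conf" | while IFS='=' read -r name want; do
        # sysctl -n prints only the value; fall back if the name is unknown.
        have=$(sysctl -n "$name" 2>/dev/null) || have=unknown
        printf '%s: configured=%s current=%s\n' "$name" "$want" "$have"
    done
fi
```

 Running this on both the -current and the 9.0_STABLE installs would
 show whether the two tests really differed in these settings.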

 I've not tried the tar based technique of getting the
 sources on the (4 GiByte) RPi4.

 ===
 Mark Millard
 marklmi at yahoo.com
 ( dsl-only.net went
 away in early 2018-Mar)

>Unformatted:
