NetBSD Problem Report #57385

From www@netbsd.org  Sun Apr 30 20:35:13 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 206B01A9241
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 30 Apr 2023 20:35:13 +0000 (UTC)
Message-Id: <20230430203511.679C81A9242@mollari.NetBSD.org>
Date: Sun, 30 Apr 2023 20:35:11 +0000 (UTC)
From: jspath55@gmail.com
Reply-To: jspath55@gmail.com
To: gnats-bugs@NetBSD.org
Subject: Automated test case for puffs file system fails intermittently on different architectures
X-Send-Pr-Version: www-1.0

>Number:         57385
>Category:       kern
>Synopsis:       Automated test case for puffs file system fails intermittently on different architectures
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Apr 30 20:40:01 +0000 2023
>Last-Modified:  Sun Apr 30 21:50:01 +0000 2023
>Originator:     Jim Spath
>Release:        10.0_BETA
>Organization:
>Environment:
NetBSD nb10 10.0_BETA NetBSD 10.0_BETA (GENERIC64) #0: Fri Jan 13 19:15:32 UTC 2023  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/GENERIC64 evbarm

>Description:
Found this result on some test case runs:

tc-end: 1682130361.979255, mountfuzz8, failed, Test program received signal 11 (core dumped)
tc-end: 1682281741.642125, mountfuzz8, failed, Test program received signal 11 (core dumped)

I set up repeated runs on different systems, after observing some passes and some failures. 

Example 1 (i386 system):

tps-count: 1
tp-start: 1682130360.870328, fs/puffs/t_fuzz, 1
tc-start: 1682130360.870402, mountfuzz8
tc-so:test seeded RNG with 1682130360
tc-so:[   1.0000000] entropy: ready
tc-so:[   1.0800050] putter_fop_ioctl: putter 0 not inited
...
tc-so:[   1.3200050] putter_fop_read: private 0 not inited
tc-so:[   1.3200050] putter_fop_ioctl: putter 0 not inited
tc-so:[   1
tc-se:Test program crashed; attempting to get stack trace
tc-se:[New process 25814]
tc-se:[New process 12868]
...
tc-se:Core was generated by `t_fuzz'.
tc-se:Program terminated with signal SIGSEGV, Segmentation fault.
tc-se:#0  0xb876cc10 in rumpuser_mutex_spin_p () from /usr/lib/librumpuser.so.0
tc-se:[Current thread is 1 (process 25814)]
tc-se:#0  0xb876cc10 in rumpuser_mutex_spin_p () from /usr/lib/librumpuser.so.0
tc-se:#1  0xb88519bd in rumpns_mutex_enter () from /usr/lib/librump.so.0
tc-se:#2  0xb8a65d96 in rumpns_puffs_msgif_getout () from /usr/lib/librumpfs_syspuffs.so.0
tc-se:#3  0xb8a7784b in ?? () from /usr/lib/librumpdev_putter.so.0
tc-se:#4  0xb87c8e84 in rumpns_dofileread () from /usr/lib/librump.so.0
tc-se:#5  0xb87c8f4f in rumpns_sys_read () from /usr/lib/librump.so.0
tc-se:#6  0xb885a47f in rump_syscall () from /usr/lib/librump.so.0
tc-se:#7  0xb884c29b in rump___sysimpl_read () from /usr/lib/librump.so.0
tc-se:#8  0x00473518 in respondthread ()
tc-se:#9  0xb875aeab in ?? () from /usr/lib/libpthread.so.1
tc-se:#10 0xb85b82b0 in __mknod50 () from /usr/lib/libc.so.12
tc-se:#11 0xb5d76a00 in ?? ()
tc-se:#12 0xb892e000 in ?? () from /usr/lib/librumpvfs.so.0
tc-se:Stack trace complete
tc-end: 1682130361.979255, mountfuzz8, failed, Test program received signal 11 (core dumped)
tp-end: 1682130361.982031, fs/puffs/t_fuzz
info: time.end, Sat Apr 22 02:26:02 UTC 2023


Example 2 (i386 system):
tps-count: 1
tp-start: 1682281740.804085, fs/puffs/t_fuzz, 1
tc-start: 1682281740.804158, mountfuzz8
tc-so:test seeded RNG with 1682281740
tc-so:[   1.0000000] entropy: ready
tc-so:[   1.1400050] putter_fop_ioctl: putter 0 not inited
tc-so:[   1.1400050] putter_fop_read: private 0 not inited
...
tc-so:[   1.1400050] putter_fop_ioctl: putter 0 not inited
tc-so:[   1.1400050] putter_fop_read: private 0 not inite
tc-se:Test program crashed; attempting to get stack trace
tc-se:[New process 16178]
tc-se:[New process 28523]
tc-se:[New process 28498]
...
tc-se:[New process 16350]
tc-se:Core was generated by `t_fuzz'.
tc-se:Program terminated with signal SIGSEGV, Segmentation fault.
tc-se:#0  0xbb385c10 in rumpuser_mutex_spin_p () from /usr/lib/librumpuser.so.0
tc-se:[Current thread is 1 (process 16178)]
tc-se:#0  0xbb385c10 in rumpuser_mutex_spin_p () from /usr/lib/librumpuser.so.0
tc-se:#1  0xbb2159bd in rumpns_mutex_enter () from /usr/lib/librump.so.0
tc-se:#2  0xbb469299 in rumpns_puffs_msgif_waitcount () from /usr/lib/librumpfs_syspuffs.so.0
tc-se:#3  0xbb47a5a4 in ?? () from /usr/lib/librumpdev_putter.so.0
tc-se:#4  0xbb1b0acd in ?? () from /usr/lib/librump.so.0
tc-se:#5  0xbb1b1586 in rumpns_pollcommon () from /usr/lib/librump.so.0
tc-se:#6  0xbb1b16ec in rumpns_sys_poll () from /usr/lib/librump.so.0
tc-se:#7  0xbb21e47f in rump_syscall () from /usr/lib/librump.so.0
tc-se:#8  0xbb212363 in rump___sysimpl_poll () from /usr/lib/librump.so.0
tc-se:#9  0x005dd4fa in respondthread ()
tc-se:#10 0xbb139eab in ?? () from /usr/lib/libpthread.so.1
tc-se:#11 0xbaf972b0 in __mknod50 () from /usr/lib/libc.so.12
tc-se:#12 0xb8766a00 in ?? ()
tc-se:#13 0xbb415000 in ?? () from /usr/lib/librumpvfs.so.0
tc-se:Stack trace complete
tc-end: 1682281741.642125, mountfuzz8, failed, Test program received signal 11 (core dumped)
tp-end: 1682281741.644855, fs/puffs/t_fuzz
info: time.end, Sun Apr 23 20:29:01 UTC 2023


Example 3 (aarch64):

tps-count: 1
tp-start: 1682142241.88128, fs/puffs/t_fuzz, 1
tc-start: 1682142241.88273, mountfuzz8
tc-so:test seeded RNG with 1682142241
tc-so:[   1.0000000] entropy: ready
tc-so:[   1.2100050] putter_fop_ioctl: putter 0 not inited
tc-so:[   1.2100050] putter_fop_read: private 0 not inited
...
tc-so:[   1.2700050] putter_fop_ioctl: putter 0 not inited
tc-so:[   1.2700050] panic: kernel diagnostic assertion "pmp->pmp_status == PUFFSTAT_MOUNTING" failed: file "/usr/src/sys/rump/fs/lib/libsyspuffs/../../../../fs/puffs/puffs_vfsops.c", line 330
tc-so:[   1.2700050] rump kernel halting...
tc-so:halted
tc-se:Test program crashed; attempting to get stack trace
tc-se:[New process 21855]
tc-se:[New process 24938]
...
tc-se:Core was generated by `t_fuzz'.
tc-se:Program terminated with signal SIGABRT, Aborted.
tc-se:#0  0x0000f21029f0e14c in _lwp_kill () from /usr/lib/libc.so.12
tc-se:[Current thread is 1 (process 21855)]
tc-se:#0  0x0000f21029f0e14c in _lwp_kill () from /usr/lib/libc.so.12
tc-se:#1  0x0000f21029f0e40c in abort () from /usr/lib/libc.so.12
tc-se:#2  0x0000f2102a218314 in rumpuser_exit () from /usr/lib/librumpuser.so.0
tc-se:#3  0x0000f2102a33ff14 in rumpns_cpu_reboot () from /usr/lib/librump.so.0
tc-se:#4  0x0000f2102a2ea2a0 in rumpns_kern_reboot () from /usr/lib/librump.so.0
tc-se:#5  0x0000f2102a2e91b4 in rumpns_vpanic () from /usr/lib/librump.so.0
tc-se:#6  0x0000f2102a2c8f54 in rumpns_kern_assert () from /usr/lib/librump.so.0
tc-se:#7  0x0000f2102a5c8fa0 in rumpns_puffs_vfsop_start () from /usr/lib/librumpfs_syspuffs.so.0
tc-se:#8  0x0000f2102a4090c4 in rumpns_VFS_START () from /usr/lib/librumpvfs.so.0
tc-se:#9  0x0000f2102a40c134 in rumpns_mount_domount () from /usr/lib/librumpvfs.so.0
tc-se:#10 0x0000f2102a402034 in rumpns_do_sys_mount () from /usr/lib/librumpvfs.so.0
tc-se:#11 0x0000f2102a40253c in rumpns_sys___mount50 () from /usr/lib/librumpvfs.so.0
tc-se:#12 0x0000f2102a340f44 in rump_syscall () from /usr/lib/librump.so.0
tc-se:#13 0x0000f2102a3368dc in rump___sysimpl_mount50 () from /usr/lib/librump.so.0
tc-se:#14 0x000000000efe22b0 in testbody ()
tc-se:#15 0x0000f2102a1ba8b8 in atf_tc_run () from /usr/lib/libatf-c.so.0
tc-se:#16 0x0000f2102a1b6cc8 in atf_tp_main () from /usr/lib/libatf-c.so.0
tc-se:#17 0x000000000efe1af8 in ___start ()
tc-se:#18 0x0000fffffb180b10 in _rtld_start () from /usr/libexec/ld.elf_so
tc-se:Backtrace stopped: previous frame identical to this frame (corrupt stack?)
tc-se:Stack trace complete
tc-end: 1682142243.141752, mountfuzz8, failed, Test program received signal 6 (core dumped)
tp-end: 1682142243.154181, fs/puffs/t_fuzz
info: time.end, Sat Apr 22 05:44:03 UTC 2023


Example 4 (aarch64):
tps-count: 1
tp-start: 1682183460.907036, fs/puffs/t_fuzz, 1
tc-start: 1682183460.907189, mountfuzz8
tc-so:test seeded RNG with 1682183460
tc-so:[   1.0000000] entropy: ready
tc-so:[   1.1500050] putter_fop_ioctl: putter 0 not inited
tc-so:[   1.1500050] putter_fop_read: private 0 not inited
tc-so:[   1.1500050] putter_fop_ioctl: putter 0 not inited
tc-so:[   1.1500050] putter_fop_read: private 0 not inited
...
tc-so:[   1.2700050] putter_fop_read: private 0 not inited
tc-so:[   1.2700050] putter_fop_ioctl: putter 0 not inited
tc-so:[   1
tc-se:Test program crashed; attempting to get stack trace
tc-se:[New process 11979]
tc-se:[New process 10027]
tc-se:[New process 5696]
...
tc-se:[New process 15031]
tc-se:[New process 2646]
tc-se:Core was generated by `t_fuzz'.
tc-se:Program terminated with signal SIGSEGV, Segmentation fault.
tc-se:#0  0x0000f254ee308794 in rumpuser_mutex_spin_p () from /usr/lib/librumpuser.so.0
tc-se:[Current thread is 1 (process 11979)]
tc-se:#0  0x0000f254ee308794 in rumpuser_mutex_spin_p () from /usr/lib/librumpuser.so.0
tc-se:#1  0x0000f254ee427ed8 in rumpns_mutex_enter () from /usr/lib/librump.so.0
tc-se:#2  0x0000f254ee6c1154 in rumpns_puffs_msgif_waitcount () from /usr/lib/librumpfs_syspuffs.so.0
tc-se:#3  0x0000f254ee701e54 in ?? () from /usr/lib/librumpdev_putter.so.0
tc-se:#4  0x0000f254ee3cccbc in ?? () from /usr/lib/librump.so.0
tc-se:#5  0x0000f254ee3cd608 in rumpns_pollcommon () from /usr/lib/librump.so.0
tc-se:#6  0x0000f254ee3cd71c in rumpns_sys_poll () from /usr/lib/librump.so.0
tc-se:#7  0x0000f254ee430f44 in rump_syscall () from /usr/lib/librump.so.0
tc-se:#8  0x0000f254ee424d40 in rump___sysimpl_poll () from /usr/lib/librump.so.0
tc-se:#9  0x0000000008cf1cac in respondthread ()
tc-se:#10 0x0000f254ee2ddc14 in ?? () from /usr/lib/libpthread.so.1
tc-se:#11 0x0000f254edf13b5c in __mknod50 () from /usr/lib/libc.so.12
tc-se:Backtrace stopped: previous frame identical to this frame (corrupt stack?)
tc-se:Stack trace complete
tc-end: 1682183463.425932, mountfuzz8, failed, Test program received signal 11 (core dumped)
tp-end: 1682183463.438386, fs/puffs/t_fuzz
info: time.end, Sat Apr 22 17:11:03 UTC 2023


Test result observations of potential interest:

- failures on several architectures
- low frequency of failures
- core dumps; signal 6 and signal 11
- all failures on last test (8 of 8) mount


>How-To-Repeat:
cd /usr/tests
atf-run fs/puffs/t_fuzz:mountfuzz8
>Fix:
Not fixes per se, just documentation change suggestions based on trying to read and comprehend the test goal(s)

$ diff -u t_fuzz.c.df  t_fuzz.c
--- t_fuzz.c.df 2021-09-16 22:19:11.000000000 +0000
+++ t_fuzz.c    2023-04-30 20:02:19.550000000 +0000
@@ -28,12 +28,14 @@

 /*
  * Fuzztest puffs mount.  There are n different levels of testing:
- * each one pours more and more sane garbage into the args to that
+ * each one pours more and more sane garbage into the args so that
  * the mount progresses further and further.  Level 8 (at least when
  * writing this comment) should be the one where mounting actually
  * succeeds.
  *
  * Our metric of success is crash / no crash.
+ * Test failures may include core dumps and/or stack traces.
+ * Failures might occur with low frequency.
  */

 #include <sys/types.h>


$diff -u  h_macros.h.df h_macros.h   
--- h_macros.h.df       2016-08-20 15:49:08.000000000 +0000
+++ h_macros.h  2023-04-30 20:10:49.100000000 +0000
@@ -70,6 +70,9 @@
        atf_tc_fail("%s", buf);
 }

+/*
+ * fill a buffer with fuzz/random data to use with file system mounts
+ */
 static __inline void
 tests_makegarbage(void *space, size_t len)
 {


Previous PRs (open, not directly related):

kern/43800 [serious/low]: PUFFS kernel crash
Release:        NetBSD-5.0.2/i386
Arrival-Date:   Thu Aug 26 04:55:00 +0000 2010

kern/46722 [critical/high]: rump_nfs dies and crashes the kernel
Release:        NetBSD 6.0_BETA2
Arrival-Date:   Thu Jul 19 11:25:00 +0000 2012

kern/49396 [critical/high]: reproducable panic on puffs
Release:        NetBSD 7.99.1
Arrival-Date:   Sun Nov 16 13:10:00 +0000 2014

Other PRs referenced or found possibly related:
PR kern/32535 closed
PR kern/33374 closed
PR kern/38188 closed
PR kern/41840 closed
PR kern/42210 closed
PR kern/51762 closed
PR kern/54759 closed
PR kern/54889 closed


>Audit-Trail:
From: Jim Spath <jspath55@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57385: Automated test case for puffs file system fails
 intermittently on different architectures
Date: Sun, 30 Apr 2023 17:47:03 -0400

 Test case failure counts for 3 systems, all 10.0_BETA:

 system runs passes fails percent fail
 evbarm 1516 1480 36 2.4%
 i386 1624 1622 2 0.1%
 aarch64 1705 1671 34 2.0%

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.