NetBSD Problem Report #55004

From gson@gson.org  Sat Feb 22 16:12:51 2020
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 41D7B1A9213
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 22 Feb 2020 16:12:51 +0000 (UTC)
Message-Id: <20200222161245.8DF7B253FA3@guava.gson.org>
Date: Sat, 22 Feb 2020 18:12:45 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: Hundreds of file system tests now fail on real hardware
X-Send-Pr-Version: 3.95

>Number:         55004
>Category:       kern
>Synopsis:       Hundreds of file system tests now fail on real hardware
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    jdolecek
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Feb 22 16:15:00 +0000 2020
>Closed-Date:    Wed Feb 26 15:29:23 +0000 2020
>Last-Modified:  Wed Feb 26 15:29:23 +0000 2020
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date >= 2020.02.21.02.04.40
>Organization:

>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:

My amd64 testbed bare metal is showing hundreds of new test failures
in file system tests, such as:

  http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.02.21.02.04.40/test.html#fs_vfs_t_full_ext2fs_fillfs

The qemu-based TNF i386 testbed does not appear to be affected.

The problem started during a recent period when the system did not
even install, so it's not easily auto-bisected.  During this period,
the following commits were made:

  commit 2020.02.20.15.48.38 riastradh src/sys/kern/vfs_bio.c 1.288
  commit 2020.02.20.15.48.52 riastradh src/lib/libp2k/p2k.c 1.72
  commit 2020.02.20.18.24.20 pgoyette src/bin/sh/sh.1 1.224
  commit 2020.02.20.19.59.12 christos src/external/historical/nawk/dist/lex.c 1.7
  commit 2020.02.20.19.59.12 christos src/external/historical/nawk/dist/lib.c 1.11
  commit 2020.02.20.19.59.12 christos src/external/historical/nawk/dist/main.c 1.11
  commit 2020.02.20.19.59.12 christos src/external/historical/nawk/dist/proto.h 1.11
  commit 2020.02.20.19.59.12 christos src/external/historical/nawk/dist/run.c 1.12
  commit 2020.02.20.21.14.23 jdolecek src/sys/kern/subr_autoconf.c 1.266
  commit 2020.02.20.22.38.54 kamil src/tests/lib/libc/sys/t_ptrace_wait.c 1.164
  commit 2020.02.20.22.52.10 joerg src/sys/rump/Makefile.rump 1.125
  commit 2020.02.20.22.52.10 joerg src/sys/rump/librump/rumpkern/kobj_rename.c 1.3
  commit 2020.02.20.23.57.16 kamil src/tests/lib/libc/sys/t_ptrace_x86_wait.h 1.24
  commit 2020.02.21.00.26.21 joerg src/external/bsd/libevent/dist/test/regress_http.c 1.6
  commit 2020.02.21.00.26.21 joerg src/external/bsd/libevent/dist/test/regress_ssl.c 1.4
  commit 2020.02.21.00.26.22 joerg src/external/bsd/wpa/dist/src/radius/radius_client.c 1.2
  commit 2020.02.21.00.26.22 joerg src/external/cddl/osnet/dist/tools/ctf/cvt/iidesc.c 1.4
  commit 2020.02.21.00.26.22 joerg src/sys/arch/x86/x86/spectre.c 1.34
  commit 2020.02.21.00.26.22 joerg src/sys/arch/x86/x86/tsc.c 1.38
  commit 2020.02.21.00.26.22 joerg src/sys/dev/clockctl.c 1.38
  commit 2020.02.21.00.26.22 joerg src/sys/dev/nvmm/x86/nvmm_x86_svm.c 1.56
  commit 2020.02.21.00.26.22 joerg src/sys/dev/nvmm/x86/nvmm_x86_vmx.c 1.49
  commit 2020.02.21.00.26.22 joerg src/sys/dist/pf/net/pf_ioctl.c 1.57
  commit 2020.02.21.00.26.22 joerg src/sys/external/bsd/ipf/netinet/ip_fil_netbsd.c 1.34
  commit 2020.02.21.00.26.22 joerg src/sys/kern/kern_ktrace.c 1.175
  commit 2020.02.21.00.26.22 joerg src/sys/kern/kern_proc.c 1.241
  commit 2020.02.21.00.26.22 joerg src/sys/kern/kern_resource.c 1.186
  commit 2020.02.21.00.26.22 joerg src/sys/kern/kern_veriexec.c 1.23
  commit 2020.02.21.00.26.22 joerg src/sys/kern/sys_pset.c 1.23
  commit 2020.02.21.00.26.22 joerg src/sys/kern/sysv_ipc.c 1.41
  commit 2020.02.21.00.26.22 joerg src/sys/kern/uipc_socket.c 1.287
  commit 2020.02.21.00.26.22 joerg src/sys/kern/vfs_init.c 1.50
  commit 2020.02.21.00.26.23 joerg src/sys/net/if.c 1.473
  commit 2020.02.21.00.26.23 joerg src/sys/netsmb/smb_conn.c 1.31
  commit 2020.02.21.00.26.23 joerg src/sys/secmodel/extensions/secmodel_extensions.c 1.11
  commit 2020.02.21.00.26.23 joerg src/sys/secmodel/keylock/secmodel_keylock.c 1.10
  commit 2020.02.21.00.26.23 joerg src/sys/secmodel/securelevel/secmodel_securelevel.c 1.33
  commit 2020.02.21.00.26.23 joerg src/sys/secmodel/suser/secmodel_suser.c 1.51
  commit 2020.02.21.02.04.40 riastradh src/sys/kern/vfs_bio.c 1.289

>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->feedback
State-Changed-By: maya@NetBSD.org
State-Changed-When: Sat, 22 Feb 2020 19:46:55 +0000
State-Changed-Why:
Assuming fixed: http://mail-index.netbsd.org/current-users/2020/02/21/msg037802.html
With vfs_bio.c 1.289


From: Andreas Gustafsson <gson@gson.org>
To: maya@NetBSD.org
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/55004 (Hundreds of file system tests now fail on real hardware)
Date: Sat, 22 Feb 2020 22:27:10 +0200

 maya@NetBSD.org wrote:
 > Assuming fixed: http://mail-index.netbsd.org/current-users/2020/02/21/msg037802.html
 > With vfs_bio.c 1.289

 Your assumption is incorrect.  I said the problem started during the
 period ending with the commit of vfs_bio.c 1.289, and that means it
 was failing *after* that commit:

   http://www.gson.org/netbsd/bugs/build/amd64-baremetal/commits-2020.02.html#2020.02.21.02.04.40

 -- 
 Andreas Gustafsson, gson@gson.org

State-Changed-From-To: feedback->open
State-Changed-By: gson@NetBSD.org
State-Changed-When: Sat, 22 Feb 2020 20:37:36 +0000
State-Changed-Why:
Still broken.


Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Sat, 22 Feb 2020 20:58:56 +0000
Responsible-Changed-Why:
It seems this started with my config_mountroot() mutex changes; I'll fix this.


From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55004 CVS commit: src/sys/rump/librump/rumpkern
Date: Sat, 22 Feb 2020 21:45:35 +0000

 Module Name:	src
 Committed By:	ad
 Date:		Sat Feb 22 21:45:35 UTC 2020

 Modified Files:
 	src/sys/rump/librump/rumpkern: rump.c

 Log Message:
 rump_init(): need to call config_init() now.

 PR kern/55004 (Hundreds of file system tests now fail on real hardware)


 To generate a diff of this commit:
 cvs rdiff -u -r1.341 -r1.342 src/sys/rump/librump/rumpkern/rump.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: jdolecek@netbsd.org, kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
	gnats-admin@netbsd.org, Andreas Gustafsson <gson@gson.org>,
	chs@netbsd.org
Subject: Re: kern/55004 (Hundreds of file system tests now fail on real
 hardware)
Date: Sat, 22 Feb 2020 22:11:44 +0000

 I still see more breakage, probably due to the recent removal of aiodoned
 intersecting with LFS brain damage.

 fs_cleanerd[27468]: /mnt: attaching cleaner
 [   1.1700090] panic: kernel diagnostic assertion "giantcnt == 1" failed: file "klock.c", line 127
 [   1.1700090] rump kernel halting...
 halted

 Thread 42 "" received signal SIGABRT, Aborted.
 [Switching to LWP 42 of process 27468]
 0x000070110c58609a in _lwp_kill () from /usr/lib/libc.so.12
 (gdb) bt
 #0  0x000070110c58609a in _lwp_kill () from /usr/lib/libc.so.12
 #1  0x000070110c58643a in abort () from /usr/lib/libc.so.12
 #2  0x000070110e608713 in ?? () from /usr/lib/librumpuser.so.0
 #3  0x000070110eaf0efa in cpu_reboot (howto=4, bootstr=0x0) at emul.c:431
 #4  0x000070110ea8a8dd in kern_reboot (howto=4, bootstr=0x0) at /home/ad/src/sys/rump/librump/rumpkern/../../../kern/kern_reboot.c:61
 #5  0x000070110ea874d5 in vpanic (fmt=0x70110eb097a8 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ", ap=0x7010ffb4fd48) at /home/ad/src/sys/rump/librump/rumpkern/../../../kern/subr_prf.c:336
 #6  0x000070110ea67c63 in kern_assert (fmt=0x70110eb097a8 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ") at /home/ad/src/sys/rump/librump/rumpkern/../../../lib/libkern/kern_assert.c:51
 #7  0x000070110eaf2d5d in _kernel_unlock (nlocks=-1, countp=0x0) at klock.c:127
 #8  0x00007011126535cd in lfs_free_aiodone (bp=0x70111291f3c8) at /home/ad/src/sys/rump/fs/lib/liblfs/../../../../ufs/lfs/lfs_segment.c:2516
 #9  0x000070110f266737 in biodone2 (bp=0x70111291f3c8) at /home/ad/src/sys/rump/librump/rumpvfs/../../../kern/vfs_bio.c:1702
 #10 0x000070110f266616 in biodone (bp=0x70111291f3c8) at /home/ad/src/sys/rump/librump/rumpvfs/../../../kern/vfs_bio.c:1666
 #11 0x0000701112653aa9 in lfs_cluster_aiodone (bp=0x70111291f758) at /home/ad/src/sys/rump/fs/lib/liblfs/../../../../ufs/lfs/lfs_segment.c:2621
 #12 0x000070110f266737 in biodone2 (bp=0x70111291f758) at /home/ad/src/sys/rump/librump/rumpvfs/../../../kern/vfs_bio.c:1702
 #13 0x000070110f266616 in biodone (bp=0x70111291f758) at /home/ad/src/sys/rump/librump/rumpvfs/../../../kern/vfs_bio.c:1666
 #14 0x000070110f25f3aa in rump_biodone (arg=0x70111291f758, count=1536, error=0) at rump_vfs.c:521
 #15 0x000070110e60722f in ?? () from /usr/lib/librumpuser.so.0
 #16 0x000070110e607313 in ?? () from /usr/lib/librumpuser.so.0
 #17 0x000070110e20caf2 in ?? () from /usr/lib/libpthread.so.1
 #18 0x000070110c48fd10 in ?? () from /usr/lib/libc.so.12
 #19 0x0000000000000000 in ?? ()

From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55004 CVS commit: src/sys/ufs/lfs
Date: Sat, 22 Feb 2020 22:20:47 +0000

 Module Name:	src
 Committed By:	ad
 Date:		Sat Feb 22 22:20:47 UTC 2020

 Modified Files:
 	src/sys/ufs/lfs: lfs_segment.c

 Log Message:
 Make LFS/rump play nice with aiodoned removal.

 PR kern/55004 (Hundreds of file system tests now fail on real hardware)


 To generate a diff of this commit:
 cvs rdiff -u -r1.282 -r1.283 src/sys/ufs/lfs/lfs_segment.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sun, 23 Feb 2020 01:46:02 +0000
State-Changed-Why:
ad committed a change to rump_init() to call config_init() early; this should
fix the problem. Can you confirm it works now?


From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55004 CVS commit: src/sys/rump/librump/rumpdev
Date: Sun, 23 Feb 2020 01:53:03 +0000

 Module Name:	src
 Committed By:	jdolecek
 Date:		Sun Feb 23 01:53:03 UTC 2020

 Modified Files:
 	src/sys/rump/librump/rumpdev: rump_dev.c

 Log Message:
 no need to call config_init_mi() in rumpdev any more - rump_init() now calls
 config_init(), and the sysctl shouldn't be needed

 PR kern/55004


 To generate a diff of this commit:
 cvs rdiff -u -r1.27 -r1.28 src/sys/rump/librump/rumpdev/rump_dev.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: feedback->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Wed, 26 Feb 2020 15:29:23 +0000
State-Changed-Why:
Confirmed fixed, thanks.


>Unformatted:

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.