NetBSD Problem Report #56546
From gson@gson.org Sun Dec 12 15:21:15 2021
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 23BB81A9239
for <gnats-bugs@gnats.NetBSD.org>; Sun, 12 Dec 2021 15:21:15 +0000 (UTC)
Message-Id: <20211212152102.906B6254286@guava.gson.org>
Date: Sun, 12 Dec 2021 17:21:02 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: cgd tests fail randomly
X-Send-Pr-Version: 3.95
>Number: 56546
>Category: kern
>Synopsis: cgd tests fail randomly
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Dec 12 15:25:00 +0000 2021
>Closed-Date: Wed Dec 15 12:47:05 +0000 2021
>Last-Modified: Wed Dec 15 12:47:05 +0000 2021
>Originator: Andreas Gustafsson
>Release: NetBSD-current
>Organization:
>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:
On some testbeds, various cgd test cases are randomly failing with the
error message "panic: rumpuser fatal failure 22 (Invalid argument)".
For example:
http://releng.netbsd.org/b5reports/sparc64/2021/2021.12.10.01.18.29/test.html#dev_cgd_t_cgd_blowfish_cgd_bf_cbc_448_encblkno1
https://www.gson.org/netbsd/bugs/build/i386-laptop/2021/2021.12.11.11.13.30/test.html#dev_cgd_t_cgd_aes_cgd_aes_cbc_128_encblkno8
Based on the "0xdead" in ptm_magic, I'm guessing it's trying to lock a freed mutex:
(gdb) bt
#0 0xb996f847 in _lwp_kill () from /usr/lib/libc.so.12
#1 0xb996f7d6 in raise (s=6)
at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/libc/gen/raise.c:48
#2 0xb996fe06 in abort ()
at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/libc/stdlib/abort.c:74
#3 0xb9a395de in rumpuser_mutex_enter_nowrap (mtx=0xb94914c0)
at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/librumpuser/rumpuser_pth.c:211
#4 0xb9a3967a in rumpuser_mutex_enter (mtx=0xb94914c0)
at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/librumpuser/rumpuser_pth.c:196
#5 0xb9b18a85 in mutex_enter (mtx=0xb9752e90)
at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/librump/../../sys/rump/librump/rumpkern/locks.c:166
#6 0xb9c49aa4 in cgd_process (wk=0xb5ca6fb0, arg=0x0)
at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/sys/rump/dev/lib/libcgd/../../../../dev/cgd.c:1574
#7 0xb9abbae8 in workqueue_runlist (list=0xb9463a94, list=0xb9463a94, wq=0xb9463a40)
at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/librump/../../sys/rump/../kern/subr_workqueue.c:105
#8 workqueue_worker (cookie=<optimized out>, cookie@entry=0xb9463a40)
at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/librump/../../sys/rump/../kern/subr_workqueue.c:135
#9 0xb9b1bd04 in threadbouncer (arg=0xb9752ec0)
at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/librump/../../sys/rump/librump/rumpkern/threads.c:90
#10 0xb9a26e8b in pthread__create_tramp (cookie=0xb6ff4000)
at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/libpthread/pthread.c:561
#11 0xb9863b10 in __mknod50 () from /usr/lib/libc.so.12
(gdb) frame 4
#4 0xb9a3967a in rumpuser_mutex_enter (mtx=0xb94914c0)
at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/librumpuser/rumpuser_pth.c:196
196 in /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/librumpuser/rumpuser_pth.c
(gdb) print /x *mtx
$4 = {pthmtx = {ptm_magic = 0xdead0003, ptm_errorcheck = 0x1, ptm_pad1 = {0x0,
0x0, 0x0}, {ptm_ceiling = 0x0, ptm_unused = 0x0}, ptm_pad2 = {0x0, 0x0,
0x0}, ptm_owner = 0x0, ptm_waiters = 0x0, ptm_recursed = 0x0,
ptm_spare2 = 0x0}, owner = 0x0, flags = 0x3}
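A note on reading ptm_magic: the "lock a freed mutex" guess can be checked mechanically. NetBSD's libpthread appears to stamp a live mutex with one magic value and to overwrite it with a 0xDEAD-prefixed value in pthread_mutex_destroy(). The two constants below are recalled from <pthread_types.h> and are assumptions to verify against the source tree; the helper name classify_ptm_magic is hypothetical. Failure 22 is EINVAL, which is presumably what pthread_mutex_lock() reports for a mutex whose magic is not the live value. A minimal sketch:

```c
#include <stdint.h>

/*
 * Assumed values, recalled from NetBSD's <pthread_types.h>.
 * Verify them against the tree before relying on this.
 */
#define ASSUMED_PT_MUTEX_MAGIC	0x33330003u	/* initialized, usable mutex */
#define ASSUMED_PT_MUTEX_DEAD	0xDEAD0003u	/* after pthread_mutex_destroy() */

enum ptm_state { PTM_LIVE, PTM_DESTROYED, PTM_UNKNOWN };

/* Hypothetical helper: interpret a ptm_magic value printed by gdb. */
static enum ptm_state
classify_ptm_magic(uint32_t magic)
{
	if (magic == ASSUMED_PT_MUTEX_MAGIC)
		return PTM_LIVE;
	if (magic == ASSUMED_PT_MUTEX_DEAD)
		return PTM_DESTROYED;
	return PTM_UNKNOWN;	/* never initialized, or memory reused */
}
```

Under those assumptions, the value 0xdead0003 printed above classifies as PTM_DESTROYED, which matches the use-after-destroy reading of the backtrace.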
>How-To-Repeat:
cd /usr/tests/dev/cgd/
sysctl -w kern.defcorename="/tmp/%n.core"
i=0; while echo && echo $i && atf-run t_cgd_aes:cgd_aes_cbc_128_encblkno8; do i=$(expr $i + 1); done
gdb ./t_cgd_aes /tmp/t_cgd_aes.core
>Fix:
>Release-Note:
>Audit-Trail:
From: Taylor R Campbell <campbell@mumble.net>
To: gnats-bugs@NetBSD.org
Cc: gson@gson.org (Andreas Gustafsson)
Subject: Re: kern/56546: cgd tests fail randomly
Date: Mon, 13 Dec 2021 00:21:40 +0000
This is a multi-part message in MIME format.
--=_Jk88w0qeZbZBxabcXkD5GbHSqAbVNpLw
Try the attached patch?
--=_Jk88w0qeZbZBxabcXkD5GbHSqAbVNpLw
Content-Type: text/plain; charset="ISO-8859-1"; name="cgd"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="cgd.patch"
diff --git a/sys/dev/cgd.c b/sys/dev/cgd.c
index c0befcb865f6..5dec568df1ff 100644
--- a/sys/dev/cgd.c
+++ b/sys/dev/cgd.c
@@ -692,14 +692,20 @@ cgd_create_worker(void)
 static void
 cgd_destroy_worker(struct cgd_worker *cw)
 {
+
+	/*
+	 * Wait for all worker threads to complete before destroying
+	 * the rest of the cgd_worker.
+	 */
+	if (cw->cw_wq)
+		workqueue_destroy(cw->cw_wq);
+
 	mutex_destroy(&cw->cw_lock);
 
 	if (cw->cw_cpool) {
 		pool_destroy(cw->cw_cpool);
 		kmem_free(cw->cw_cpool, sizeof(struct pool));
 	}
-	if (cw->cw_wq)
-		workqueue_destroy(cw->cw_wq);
 
 	kmem_free(cw, sizeof(struct cgd_worker));
 }
--=_Jk88w0qeZbZBxabcXkD5GbHSqAbVNpLw--
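The ordering rule behind the patch (stop the threads that use a lock before destroying the lock) can be illustrated outside the kernel. The following is a minimal userland sketch using POSIX threads, not the cgd(4) code itself: struct wpool and the wpool_* functions are hypothetical names, pthread_join() stands in for the waiting that workqueue_destroy() is expected to do, and pthread_mutex_destroy() stands in for mutex_destroy().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NWORKERS 4
#define NITEMS   1000

/* Hypothetical stand-in for struct cgd_worker: a lock plus its workers. */
struct wpool {
	pthread_mutex_t	lock;
	pthread_t	threads[NWORKERS];
	long		processed;	/* protected by lock */
};

/* Each worker takes the lock once per item it processes. */
static void *
wpool_worker(void *arg)
{
	struct wpool *wp = arg;

	for (int i = 0; i < NITEMS; i++) {
		pthread_mutex_lock(&wp->lock);
		wp->processed++;
		pthread_mutex_unlock(&wp->lock);
	}
	return NULL;
}

static int
wpool_create(struct wpool *wp)
{
	int error;

	wp->processed = 0;
	error = pthread_mutex_init(&wp->lock, NULL);
	if (error)
		return error;
	for (int i = 0; i < NWORKERS; i++) {
		error = pthread_create(&wp->threads[i], NULL, wpool_worker, wp);
		assert(error == 0);	/* no recovery path, to keep the sketch short */
	}
	return 0;
}

/* Tear down in the safe order and return the number of items processed. */
static long
wpool_destroy(struct wpool *wp)
{
	int error;
	long n;

	/* Step 1: wait until every worker has finished using the lock. */
	for (int i = 0; i < NWORKERS; i++) {
		error = pthread_join(wp->threads[i], NULL);
		assert(error == 0);
	}
	n = wp->processed;

	/* Step 2: no thread can reach the lock any more, so destroy it. */
	error = pthread_mutex_destroy(&wp->lock);
	assert(error == 0);
	return n;
}
```

Calling wpool_create() followed by wpool_destroy() on the same struct wpool should return NWORKERS * NITEMS. Swapping the two steps in wpool_destroy() would give the ordering the patch removes: a worker that is still running could then lock a mutex that has already been destroyed.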
From: Andreas Gustafsson <gson@gson.org>
To: Taylor R Campbell <campbell@mumble.net>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/56546: cgd tests fail randomly
Date: Mon, 13 Dec 2021 22:56:51 +0200
Taylor R Campbell wrote:
> Try the attached patch?
Without the patch, the test failed after some 168 runs, and with the
patch, it has now run more than 16,800 times without failing. This
leads me to believe that the patch indeed fixes the bug with roughly
99% confidence. :)
--
Andreas Gustafsson, gson@gson.org
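The run counts above support a stronger statement than the tongue-in-cheek 99%. As a rough sketch, assume the runs are independent and that the per-run failure probability without the patch is about 1/168. That figure is an estimate from a single observed failure, so it is good to an order of magnitude at best. The chance of 16,800 consecutive passes with the bug still present would then be:

```latex
\[
p \approx \frac{1}{168}, \qquad
\Pr[\text{16{,}800 passes} \mid \text{bug still present}]
  = (1 - p)^{16800}
  \approx e^{-16800/168}
  = e^{-100}
  \approx 4 \times 10^{-44}.
\]
```

Even if the true failure rate were ten times lower than assumed, the same calculation gives about e^{-10}, or roughly 5 in 100,000.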
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56546 CVS commit: src/sys/dev
Date: Mon, 13 Dec 2021 21:15:26 +0000
Module Name: src
Committed By: riastradh
Date: Mon Dec 13 21:15:26 UTC 2021
Modified Files:
src/sys/dev: cgd.c
Log Message:
cgd(4): Wait for worker threads to complete before destroying mutex.
Fixes PR kern/56546 (probably!).
To generate a diff of this commit:
cvs rdiff -u -r1.140 -r1.141 src/sys/dev/cgd.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 13 Dec 2021 21:23:15 +0000
State-Changed-Why:
fixed
State-Changed-From-To: closed->pending-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 13 Dec 2021 21:27:45 +0000
State-Changed-Why:
same code appears in netbsd-9
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56546 CVS commit: [netbsd-9] src/sys/dev
Date: Tue, 14 Dec 2021 19:05:11 +0000
Module Name: src
Committed By: martin
Date: Tue Dec 14 19:05:11 UTC 2021
Modified Files:
src/sys/dev [netbsd-9]: cgd.c
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #1393):
sys/dev/cgd.c: revision 1.141
cgd(4): Wait for worker threads to complete before destroying mutex.
Fixes PR kern/56546 (probably!).
To generate a diff of this commit:
cvs rdiff -u -r1.116.10.3 -r1.116.10.4 src/sys/dev/cgd.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 15 Dec 2021 12:47:05 +0000
State-Changed-Why:
fixed and pulled up to netbsd-9 (not relevant in netbsd<=8)
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.