NetBSD Problem Report #56546

From gson@gson.org  Sun Dec 12 15:21:15 2021
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 23BB81A9239
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 12 Dec 2021 15:21:15 +0000 (UTC)
Message-Id: <20211212152102.906B6254286@guava.gson.org>
Date: Sun, 12 Dec 2021 17:21:02 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: cgd tests fail randomly
X-Send-Pr-Version: 3.95

>Number:         56546
>Category:       kern
>Synopsis:       cgd tests fail randomly
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Dec 12 15:25:00 +0000 2021
>Closed-Date:    Wed Dec 15 12:47:05 +0000 2021
>Last-Modified:  Wed Dec 15 12:47:05 +0000 2021
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current
>Organization:

>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:

On some testbeds, various cgd test cases are randomly failing with the
error message "panic: rumpuser fatal failure 22 (Invalid argument)".
For example:

  http://releng.netbsd.org/b5reports/sparc64/2021/2021.12.10.01.18.29/test.html#dev_cgd_t_cgd_blowfish_cgd_bf_cbc_448_encblkno1
  https://www.gson.org/netbsd/bugs/build/i386-laptop/2021/2021.12.11.11.13.30/test.html#dev_cgd_t_cgd_aes_cgd_aes_cbc_128_encblkno8

Based on the "0xdead" in ptm_magic, I'm guessing that cgd_process() is trying to lock a freed mutex:

  (gdb) bt
  #0  0xb996f847 in _lwp_kill () from /usr/lib/libc.so.12
  #1  0xb996f7d6 in raise (s=6)
      at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/libc/gen/raise.c:48
  #2  0xb996fe06 in abort ()
      at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/libc/stdlib/abort.c:74
  #3  0xb9a395de in rumpuser_mutex_enter_nowrap (mtx=0xb94914c0)
      at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/librumpuser/rumpuser_pth.c:211
  #4  0xb9a3967a in rumpuser_mutex_enter (mtx=0xb94914c0)
      at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/librumpuser/rumpuser_pth.c:196
  #5  0xb9b18a85 in mutex_enter (mtx=0xb9752e90)
      at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/librump/../../sys/rump/librump/rumpkern/locks.c:166
  #6  0xb9c49aa4 in cgd_process (wk=0xb5ca6fb0, arg=0x0)
      at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/sys/rump/dev/lib/libcgd/../../../../dev/cgd.c:1574
  #7  0xb9abbae8 in workqueue_runlist (list=0xb9463a94, list=0xb9463a94, wq=0xb9463a40)
      at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/librump/../../sys/rump/../kern/subr_workqueue.c:105
  #8  workqueue_worker (cookie=<optimized out>, cookie@entry=0xb9463a40)
      at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/librump/../../sys/rump/../kern/subr_workqueue.c:135
  #9  0xb9b1bd04 in threadbouncer (arg=0xb9752ec0)
      at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/librump/../../sys/rump/librump/rumpkern/threads.c:90
  #10 0xb9a26e8b in pthread__create_tramp (cookie=0xb6ff4000)
      at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/libpthread/pthread.c:561
  #11 0xb9863b10 in __mknod50 () from /usr/lib/libc.so.12
  (gdb) frame 4
  #4  0xb9a3967a in rumpuser_mutex_enter (mtx=0xb94914c0)
      at /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/librumpuser/rumpuser_pth.c:196
  196	in /tmp/build/2021.12.12.11.18.46-i386-debug-laptop/src/lib/librumpuser/rumpuser_pth.c
  (gdb) print /x *mtx
  $4 = {pthmtx = {ptm_magic = 0xdead0003, ptm_errorcheck = 0x1, ptm_pad1 = {0x0,
        0x0, 0x0}, {ptm_ceiling = 0x0, ptm_unused = 0x0}, ptm_pad2 = {0x0, 0x0,
        0x0}, ptm_owner = 0x0, ptm_waiters = 0x0, ptm_recursed = 0x0,
      ptm_spare2 = 0x0}, owner = 0x0, flags = 0x3}
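The ptm_magic value of 0xdead0003 is consistent with a mutex that was destroyed and then poisoned, and the failure code 22 (EINVAL) is consistent with a lock routine that rejects such a mutex instead of using it. A minimal sketch of that kind of check follows. The checked_mutex wrapper, its functions and its magic values are hypothetical and only modelled on the gdb output above; this is not NetBSD's libpthread code.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical magic values, modelled on the 0xdead0003 seen in gdb. */
#define CM_MAGIC_LIVE	0x33330003u
#define CM_MAGIC_DEAD	0xdead0003u

struct checked_mutex {
	uint32_t	magic;
	pthread_mutex_t	mtx;
};

static int
cm_init(struct checked_mutex *cm)
{
	int error = pthread_mutex_init(&cm->mtx, NULL);

	if (error == 0)
		cm->magic = CM_MAGIC_LIVE;
	return error;
}

static int
cm_destroy(struct checked_mutex *cm)
{
	int error;

	if (cm->magic != CM_MAGIC_LIVE)
		return EINVAL;
	error = pthread_mutex_destroy(&cm->mtx);
	if (error == 0)
		cm->magic = CM_MAGIC_DEAD;	/* poison: later use is detectable */
	return error;
}

static int
cm_lock(struct checked_mutex *cm)
{
	/* A destroyed (poisoned) mutex is rejected with EINVAL (22). */
	if (cm->magic != CM_MAGIC_LIVE)
		return EINVAL;
	return pthread_mutex_lock(&cm->mtx);
}

static int
cm_unlock(struct checked_mutex *cm)
{
	if (cm->magic != CM_MAGIC_LIVE)
		return EINVAL;
	return pthread_mutex_unlock(&cm->mtx);
}
```

Calling cm_lock() after cm_destroy() returns EINVAL rather than touching the destroyed mutex. Such a check is only best-effort: once the memory has been freed and reused, the stale magic may be overwritten and a use-after-destroy goes undetected.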

>How-To-Repeat:

cd /usr/tests/dev/cgd/
sysctl -w kern.defcorename="/tmp/%n.core"
i=0; while echo && echo $i && atf-run t_cgd_aes:cgd_aes_cbc_128_encblkno8; do i=$(expr $i + 1); done
gdb ./t_cgd_aes /tmp/t_cgd_aes.core

>Fix:

>Release-Note:

>Audit-Trail:
From: Taylor R Campbell <campbell@mumble.net>
To: gnats-bugs@NetBSD.org
Cc: gson@gson.org (Andreas Gustafsson)
Subject: Re: kern/56546: cgd tests fail randomly
Date: Mon, 13 Dec 2021 00:21:40 +0000


 Try the attached patch?

 diff --git a/sys/dev/cgd.c b/sys/dev/cgd.c
 index c0befcb865f6..5dec568df1ff 100644
 --- a/sys/dev/cgd.c
 +++ b/sys/dev/cgd.c
 @@ -692,14 +692,20 @@ cgd_create_worker(void)
  static void
  cgd_destroy_worker(struct cgd_worker *cw)
  {
 +
 +	/*
 +	 * Wait for all worker threads to complete before destroying
 +	 * the rest of the cgd_worker.
 +	 */
 +	if (cw->cw_wq)
 +		workqueue_destroy(cw->cw_wq);
 +
  	mutex_destroy(&cw->cw_lock);
  
  	if (cw->cw_cpool) {
  		pool_destroy(cw->cw_cpool);
  		kmem_free(cw->cw_cpool, sizeof(struct pool));
  	}
 -	if (cw->cw_wq)
 -		workqueue_destroy(cw->cw_wq);
  
  	kmem_free(cw, sizeof(struct cgd_worker));
  }

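The essence of the patch is teardown order: stop and wait for every thread that can still take the lock before destroying it. The following is a hypothetical userland sketch of that rule with POSIX threads. struct worker, worker_create(), worker_submit() and worker_destroy() are illustrative names, not cgd(4) or workqueue(9) APIs, and joining a fixed pool of pthreads stands in for workqueue_destroy().

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define NWORKERS	4

/* Hypothetical analogue of struct cgd_worker: one lock plus worker threads. */
struct worker {
	pthread_mutex_t	lock;
	pthread_cond_t	cv;
	unsigned	pending;	/* jobs queued, protected by lock */
	unsigned	done;		/* jobs processed, protected by lock */
	bool		stopping;
	pthread_t	threads[NWORKERS];
};

static void *
worker_main(void *arg)
{
	struct worker *w = arg;

	pthread_mutex_lock(&w->lock);	/* like cgd_process() taking cw_lock */
	for (;;) {
		while (w->pending == 0 && !w->stopping)
			pthread_cond_wait(&w->cv, &w->lock);
		if (w->pending == 0)	/* stopping and queue drained */
			break;
		w->pending--;
		w->done++;
	}
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

static struct worker *
worker_create(void)
{
	struct worker *w = calloc(1, sizeof(*w));

	if (w == NULL)
		return NULL;
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cv, NULL);
	for (int i = 0; i < NWORKERS; i++) {
		if (pthread_create(&w->threads[i], NULL, worker_main, w) != 0)
			abort();	/* keep the sketch simple */
	}
	return w;
}

static void
worker_submit(struct worker *w, unsigned njobs)
{
	pthread_mutex_lock(&w->lock);
	w->pending += njobs;
	pthread_cond_broadcast(&w->cv);
	pthread_mutex_unlock(&w->lock);
}

/*
 * Same order as the patch: first stop and join every thread (the
 * analogue of workqueue_destroy()), only then destroy the lock and
 * free the structure.  Returns the number of jobs processed.
 */
static unsigned
worker_destroy(struct worker *w)
{
	unsigned done;

	pthread_mutex_lock(&w->lock);
	w->stopping = true;
	pthread_cond_broadcast(&w->cv);
	pthread_mutex_unlock(&w->lock);

	for (int i = 0; i < NWORKERS; i++)
		pthread_join(w->threads[i], NULL);

	/* No thread can touch w->lock any more, so destroying it is safe. */
	done = w->done;
	pthread_cond_destroy(&w->cv);
	pthread_mutex_destroy(&w->lock);
	free(w);
	return done;
}
```

Moving pthread_mutex_destroy() above the join loop would reintroduce the bug class from this report: a worker still inside worker_main() could lock a destroyed mutex. Letting the workers drain outstanding jobs before exiting is a choice made in this sketch, not a statement about workqueue_destroy() semantics.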

From: Andreas Gustafsson <gson@gson.org>
To: Taylor R Campbell <campbell@mumble.net>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/56546: cgd tests fail randomly
Date: Mon, 13 Dec 2021 22:56:51 +0200

 Taylor R Campbell wrote:
 > Try the attached patch?

 Without the patch, the test failed after about 168 runs, and with the
 patch, it has now run more than 16,800 times without failing.  This
 leads me to believe that the patch indeed fixes the bug with roughly
 99% confidence. :)
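The "99%" is a deliberate understatement. Assuming the runs are independent and that the unpatched failure rate is roughly 1/168 (estimated from a single observation, so only a rough figure), the chance of 16,800 consecutive passes with the bug still present is (1 - 1/168)^16800, which is roughly e^-100. A small sketch of that arithmetic; p_all_pass() is a hypothetical helper:

```c
/*
 * Probability of `runs` consecutive passes if each run independently
 * fails with probability `p_fail`, i.e. (1 - p_fail)^runs.
 */
static double
p_all_pass(double p_fail, unsigned runs)
{
	double p = 1.0;

	for (unsigned i = 0; i < runs; i++)
		p *= 1.0 - p_fail;	/* each run passes with 1 - p_fail */
	return p;
}
```

p_all_pass(1.0 / 168, 16800) comes out on the order of 1e-44.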
 -- 
 Andreas Gustafsson, gson@gson.org

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56546 CVS commit: src/sys/dev
Date: Mon, 13 Dec 2021 21:15:26 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Dec 13 21:15:26 UTC 2021

 Modified Files:
 	src/sys/dev: cgd.c

 Log Message:
 cgd(4): Wait for worker threads to complete before destroying mutex.

 Fixes PR kern/56546 (probably!).


 To generate a diff of this commit:
 cvs rdiff -u -r1.140 -r1.141 src/sys/dev/cgd.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 13 Dec 2021 21:23:15 +0000
State-Changed-Why:
fixed


State-Changed-From-To: closed->pending-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 13 Dec 2021 21:27:45 +0000
State-Changed-Why:
same code appears in netbsd-9


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56546 CVS commit: [netbsd-9] src/sys/dev
Date: Tue, 14 Dec 2021 19:05:11 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Tue Dec 14 19:05:11 UTC 2021

 Modified Files:
 	src/sys/dev [netbsd-9]: cgd.c

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #1393):

 	sys/dev/cgd.c: revision 1.141

 cgd(4): Wait for worker threads to complete before destroying mutex.

 Fixes PR kern/56546 (probably!).


 To generate a diff of this commit:
 cvs rdiff -u -r1.116.10.3 -r1.116.10.4 src/sys/dev/cgd.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 15 Dec 2021 12:47:05 +0000
State-Changed-Why:
fixed and pulled up to netbsd-9 (not relevant in netbsd<=8)


>Unformatted:

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.