NetBSD Problem Report #57703

From paul@whooppee.com  Sat Nov 18 06:19:53 2023
Return-Path: <paul@whooppee.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id F235B1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 18 Nov 2023 06:19:52 +0000 (UTC)
Message-Id: <20231118061950.6E66E999C5@speedy.whooppee.com>
Date: Fri, 17 Nov 2023 22:19:50 -0800 (PST)
From: paul@whooppee.com
Reply-To: paul@whooppee.com
To: gnats-bugs@NetBSD.org
Subject: kernel panic in eventfd_fop_close()
X-Send-Pr-Version: 3.95

>Number:         57703
>Category:       kern
>Synopsis:       kernel panic in eventfd_fop_close()
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    thorpej
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Nov 18 06:20:00 +0000 2023
>Closed-Date:    Sun Nov 26 13:16:06 +0000 2023
>Last-Modified:  Sun Nov 26 13:16:06 +0000 2023
>Originator:     Paul Goyette
>Release:        NetBSD 10.99.8
>Organization:
+---------------------+--------------------------+----------------------+
| Paul Goyette (.sig) | PGP Key fingerprint:     | E-mail addresses:    |
| (Retired)           | 1B11 1849 721C 56C8 F63A | paul@whooppee.com    |
| Software Developer  | 6E2E 05FD 15CE 9F2D 5102 | pgoyette@netbsd.org  |
| & Network Engineer  |                          | pgoyette99@gmail.com |
+---------------------+--------------------------+----------------------+
>Environment:


System: NetBSD speedy.whooppee.com 10.99.8 NetBSD 10.99.8 (SPEEDY 2023-09-07 23:08:48 UTC) #0: Sat Sep 9 03:34:26 UTC 2023 paul@speedy.whooppee.com:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/SPEEDY amd64
Architecture: x86_64
Machine: amd64
>Description:
Firefox got confused loading a particular website, so I attempted to
bounce the network.  That did not fix the problem, so I started a
shutdown.  It panicked rather than shutting down normally.

I got a 20 GB crash file!

[ 4235856.8095913] wm0: link state DOWN (was UP)
[ 4235860.4814167] wm0: link state UP (was DOWN)
[ 4236043.6425062] pid 27107 (pulseaudio), uid 1000: exited on signal 11 (core not dumped, err = 78)
[ 4236044.0427050] panic: kernel diagnostic assertion "efd->efd_has_read_waiters == false" failed: file "/build/netbsd-local/src_ro/sys/kern/sys_eventfd.c", line 120
[ 4236044.0427050] cpu1: Begin traceback...
[ 4236044.0427050] vpanic() at netbsd:vpanic+0x173
[ 4236044.0427050] kern_assert() at netbsd:kern_assert+0x4b
[ 4236044.0927296] eventfd_fop_close() at netbsd:eventfd_fop_close+0xe7
[ 4236044.0927296] closef() at netbsd:closef+0xa3
[ 4236044.0927296] fd_free() at netbsd:fd_free+0x1ee
[ 4236044.0927296] exit1() at netbsd:exit1+0x126
[ 4236044.0927296] sigexit() at netbsd:sigexit+0x2e7
[ 4236044.0927296] postsig() at netbsd:postsig+0x35b
[ 4236044.1027348] lwp_userret() at netbsd:lwp_userret+0x20c
[ 4236044.1027348] mi_userret() at netbsd:mi_userret+-0x979b4
[ 4236044.1027348] trap() at netbsd:trap+0x12b
[ 4236044.1027348] --- trap (number 6) ---
[ 4236044.1027348] 70223343862e:
[ 4236044.1027348] cpu1: End traceback...
[ 4236044.1027348] fatal breakpoint trap in supervisor mode
[ 4236044.1027348] trap type 1 code 0 rip 0xffffffff80232835 cs 0x8 rflags 0x202 cr2 0x70222d61f010 ilevel 0 rsp 0xffffd92125fa0aa0
[ 4236044.1027348] curlwp 0xffffb0e3e5267b40 pid 27107.11854 lowest kstack 0xffffd92125f9c2c0
Stopped in pid 27107.11854 (pulseaudio) at      netbsd:breakpoint+0x5:  leave
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x173
kern_assert() at netbsd:kern_assert+0x4b
eventfd_fop_close() at netbsd:eventfd_fop_close+0xe7
closef() at netbsd:closef+0xa3
fd_free() at netbsd:fd_free+0x1ee
exit1() at netbsd:exit1+0x126
sigexit() at netbsd:sigexit+0x2e7
postsig() at netbsd:postsig+0x35b
lwp_userret() at netbsd:lwp_userret+0x20c
mi_userret() at netbsd:mi_userret+-0x979b4
trap() at netbsd:trap+0x12b
[ 4236044.1027348] --- trap (number 6) ---
[ 4236044.1027348] 70223343862e:
ds          8
es          1
fs          180
gs          a50
rdi         0
rsi         1
rbp         ffffd92125fa0aa0
rbx         ffffb0dcbff90780
rdx         1
rcx         ffffffffffffff
rax         800000000000000
r8          0
r9          0
r10         0
r11         0
r12         ffffffff80ac8010    ostype+0x8f8
r13         ffffd92125fa0ae8
r14         104
r15         18
rip         ffffffff80232835    breakpoint+0x5
cs          8
rflags      202
rsp         ffffd92125fa0aa0
ss          0
netbsd:breakpoint+0x5:  leave

[ 4236044.1027348] dumping to dev 168,3 (offset=8, size=33531543):
[ 4236044.1027348] dump


>How-To-Repeat:
see above

>Fix:

please do

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->thorpej
Responsible-Changed-By: thorpej@NetBSD.org
Responsible-Changed-When: Sat, 18 Nov 2023 12:51:25 +0000
Responsible-Changed-Why:
Take.


From: Taylor R Campbell <riastradh@NetBSD.org>
To: paul@whooppee.com
Cc: gnats-bugs@NetBSD.org, thorpej@NetBSD.org
Subject: Re: kern/57703: kernel panic in eventfd_fop_close()
Date: Sat, 18 Nov 2023 20:45:43 +0000

 It looks like the problem is that the cv_wait_sig in eventfd_wait was
 interrupted by a signal, and nothing cleared efd_has_read_waiters.

 The only things that do clear efd_has_read_waiters are eventfd_wake
 and eventfd_fop_restart, but:

 1. eventfd_wake is only used when a read or write is in progress and a
    concurrent matching write or read is triggered -- not happening
    here, as far as I can tell; the pulseaudio process is just waiting
    and was woken by a signal.

 2. eventfd_fop_restart is only used when a read or write is in progress
    and a concurrent close is triggered -- also not happening here, as
    far as I can tell; the read or write was woken by a signal, and
    _then_ (sequentially, not concurrently) the file was closed during
    process teardown.

 I think we should just nix efd_has_read/write_waiters, because
 cv_broadcast already has this micro-optimization built-in, and the
 assertion would need more bookkeeping to make it actually work.  We
 can just keep the efd_nwaiters == 0 assertion and it'll be fine.
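
 The failure mode described above can be sketched in user space.  This
 is only an illustrative model, not the code in sys_eventfd.c: the
 struct, the function names (model_wait, model_close_asserts_hold) and
 the exact bookkeeping are assumptions made for the example.  It shows
 a hint flag that is cleared only by the waker, so a wait that ends
 because of a signal leaves the flag stale, while a waiter counter that
 the waiter itself maintains stays balanced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the eventfd state (not the real struct). */
struct model_efd {
	unsigned	nwaiters;		/* maintained by the waiter itself */
	bool		has_read_waiters;	/* hint flag, cleared only by a waker */
};

/*
 * Model one blocking read.  The sleep ends either because a waker woke
 * us (the waker clears the hint flag) or because a signal arrived (in
 * which case nobody clears it).  Returns 0 on wakeup, -1 on signal.
 */
static int
model_wait(struct model_efd *efd, bool interrupted_by_signal)
{
	efd->nwaiters++;
	efd->has_read_waiters = true;	/* set before going to sleep */

	if (!interrupted_by_signal) {
		/* Wakeup path: the waker clears the flag on our behalf. */
		efd->has_read_waiters = false;
	}
	/* Signal path: the flag is left as it was. */

	assert(efd->nwaiters > 0);
	efd->nwaiters--;		/* always balanced by the waiter */
	return interrupted_by_signal ? -1 : 0;
}

/*
 * Model the checks made at close time.  With check_flag set, the stale
 * hint flag is also checked; without it, only the waiter count is.
 */
static bool
model_close_asserts_hold(const struct model_efd *efd, bool check_flag)
{
	if (efd->nwaiters != 0)
		return false;
	if (check_flag && efd->has_read_waiters)
		return false;
	return true;
}
```

 In this model, a signal-interrupted wait followed by a close fails the
 flag check but passes the counter check, which is the reason given
 above for keeping only the efd_nwaiters == 0 assertion.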

From: Taylor R Campbell <riastradh@NetBSD.org>
To: thorpej@NetBSD.org
Cc: paul@whooppee.com, gnats-bugs@NetBSD.org
Subject: Re: kern/57703: kernel panic in eventfd_fop_close()
Date: Sat, 18 Nov 2023 21:14:10 +0000

 I wrote an atf test case to reproduce this bug, which has now prompted
 me to rewrite these tests with rump so I don't crash my development
 machine while drafting the fix!

From: Taylor R Campbell <riastradh@NetBSD.org>
To: thorpej@NetBSD.org
Cc: paul@whooppee.com, gnats-bugs@NetBSD.org
Subject: Re: kern/57703: kernel panic in eventfd_fop_close()
Date: Sat, 18 Nov 2023 22:56:58 +0000

 Here's the test that crashed my laptop -- working on converting this
 to use rump instead.

 [attachment: pr57703-eventfdsignaltestwip.patch]

 From f71c52d125f8376c3e99f1fda37f1f466c97e199 Mon Sep 17 00:00:00 2001
 From: Taylor R Campbell <riastradh@NetBSD.org>
 Date: Sat, 18 Nov 2023 21:16:13 +0000
 Subject: [PATCH] WIP: add test for interrupting eventfd read/write by signal

 PR kern/57703
 ---
  tests/lib/libc/sys/t_eventfd.c | 71 +++++++++++++++++++++++++++++++++-
  1 file changed, 69 insertions(+), 2 deletions(-)

 diff --git a/tests/lib/libc/sys/t_eventfd.c b/tests/lib/libc/sys/t_eventfd.c
 index 4eaea2aff3ba..4dc2fe3e4baf 100644
 --- a/tests/lib/libc/sys/t_eventfd.c
 +++ b/tests/lib/libc/sys/t_eventfd.c
 @@ -31,23 +31,27 @@ __COPYRIGHT("@(#) Copyright (c) 2020\
   The NetBSD Foundation, inc. All rights reserved.");
  __RCSID("$NetBSD: t_eventfd.c,v 1.3 2022/02/20 15:21:14 thorpej Exp $");
  
 -#include <sys/types.h>
  #include <sys/event.h>
  #include <sys/eventfd.h>
  #include <sys/ioctl.h>
  #include <sys/select.h>
  #include <sys/stat.h>
  #include <sys/syscall.h>
 +#include <sys/types.h>
 +
  #include <errno.h>
  #include <poll.h>
  #include <pthread.h>
 -#include <stdlib.h>
 +#include <signal.h>
  #include <stdio.h>
 +#include <stdlib.h>
  #include <time.h>
  #include <unistd.h>
  
  #include <atf-c.h>
  
 +#include "h_macros.h"
 +
  struct helper_context {
  	int	efd;
  
 @@ -815,6 +819,67 @@ ATF_TC_BODY(eventfd_fcntl, tc)
  
  /*****************************************************************************/
  
 +static pthread_key_t eventfd_signal_key;
 +
 +static void
 +eventfd_read_signal_handler(int signo)
 +{
 +	volatile sig_atomic_t *const flag =
 +	    pthread_getspecific(eventfd_signal_key);
 +
 +	*flag = 1;
 +}
 +
 +static void *
 +eventfd_read_signal_helper(void * const v)
 +{
 +	struct helper_context * const ctx = v;
 +	eventfd_t efd_value;
 +	sig_atomic_t flag = 0;
 +	int error;
 +
 +	RZ(pthread_setspecific(eventfd_signal_key, &flag));
 +	if (signal(SIGUSR1, &eventfd_read_signal_handler) == SIG_ERR)
 +		atf_tc_fail("signal(SIGUSR1): %s", strerror(errno));
 +
 +	ATF_REQUIRE(wait_barrier(ctx));
 +	ATF_REQUIRE(eventfd_read(ctx->efd, &efd_value) == -1);
 +	error = errno;
 +	ATF_REQUIRE_MSG(error == EINTR, "errno=%d (%s)", error,
 +	    strerror(error));
 +	ATF_REQUIRE_MSG(flag, "signal not delivered");
 +
 +	return NULL;
 +}
 +
 +ATF_TC(eventfd_read_signal);
 +ATF_TC_HEAD(eventfd_read_signal, tc)
 +{
 +	atf_tc_set_md_var(tc, "descr",
 +	    "Tests eventfd reads can be interrupted by signal");
 +}
 +ATF_TC_BODY(eventfd_read_signal, tc)
 +{
 +	struct helper_context ctx;
 +	pthread_t helper;
 +
 +	RZ(pthread_key_create(&eventfd_signal_key, NULL));
 +
 +	init_helper_context(&ctx);
 +
 +	RL(ctx.efd = eventfd(0, 0));
 +	RZ(pthread_create(&helper, NULL, &eventfd_read_signal_helper, &ctx));
 +
 +	ATF_REQUIRE(wait_barrier(&ctx)); /* wait for helper to start */
 +	(void)sleep(1);		/* wait for the read to block */
 +	(void)alarm(1);		/* set a deadline */
 +	RZ(pthread_kill(helper, SIGUSR1)); /* wake helper */
 +	RZ(pthread_join(helper, NULL)); /* wait for helper to wake and fail */
 +	(void)alarm(0);		/* clear deadline */
 +}
 +
 +/*****************************************************************************/
 +
  ATF_TP_ADD_TCS(tp)
  {
  	ATF_TP_ADD_TC(tp, eventfd_normal);
 @@ -825,6 +890,8 @@ ATF_TP_ADD_TCS(tp)
  	ATF_TP_ADD_TC(tp, eventfd_select_poll_kevent_block);
  	ATF_TP_ADD_TC(tp, eventfd_restart);
  	ATF_TP_ADD_TC(tp, eventfd_fcntl);
 +	ATF_TP_ADD_TC(tp, eventfd_read_signal);
 +//	ATF_TP_ADD_TC(tp, eventfd_write_signal);
  
  	return atf_no_error();
  }

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57703 CVS commit: src/sys/kern
Date: Sun, 19 Nov 2023 04:13:38 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Nov 19 04:13:38 UTC 2023

 Modified Files:
 	src/sys/kern: sys_eventfd.c

 Log Message:
 eventfd(2): Omit needless micro-optimization causing PR kern/57703.

 Unfortunately, owing to PR kern/57705 and PR misc/57706, it isn't
 convenient to flip the xfail switch on a test for this bug.  So we'll
 do that separately.  (But I did verify that a rumpified version of
 the test posted to PR kern/57703 failed without this change, and
 passed with this change.)

 PR kern/57703

 XXX pullup-10


 To generate a diff of this commit:
 cvs rdiff -u -r1.9 -r1.10 src/sys/kern/sys_eventfd.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sun, 19 Nov 2023 13:13:47 +0000
State-Changed-Why:
fix committed (automatic tests aside), needs pullup to 10 now


State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sun, 19 Nov 2023 13:20:16 +0000
State-Changed-Why:
pullup-10 #468 https://releng.netbsd.org/cgi-bin/req-10.cgi?show=468
not needed <10 because eventfd is new in 10


From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57703 CVS commit: src/sys/kern
Date: Sun, 19 Nov 2023 17:16:01 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Nov 19 17:16:00 UTC 2023

 Modified Files:
 	src/sys/kern: sys_eventfd.c

 Log Message:
 eventfd(2): Prune dead branch.

 Fallout from PR kern/57703 fix.

 XXX pullup-10


 To generate a diff of this commit:
 cvs rdiff -u -r1.10 -r1.11 src/sys/kern/sys_eventfd.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Manuel Bouyer" <bouyer@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57703 CVS commit: [netbsd-10] src/sys/kern
Date: Sun, 26 Nov 2023 12:33:19 +0000

 Module Name:	src
 Committed By:	bouyer
 Date:		Sun Nov 26 12:33:19 UTC 2023

 Modified Files:
 	src/sys/kern [netbsd-10]: sys_eventfd.c

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #468):
 	sys/kern/sys_eventfd.c: revision 1.10
 eventfd(2): Omit needless micro-optimization causing PR kern/57703.
 Unfortunately, owing to PR kern/57705 and PR misc/57706, it isn't
 convenient to flip the xfail switch on a test for this bug.  So we'll
 do that separately.  (But I did verify that a rumpified version of
 the test posted to PR kern/57703 failed without this change, and
 passed with this change.)
 PR kern/57703
 XXX pullup-10


 To generate a diff of this commit:
 cvs rdiff -u -r1.9 -r1.9.4.1 src/sys/kern/sys_eventfd.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: thorpej@NetBSD.org
State-Changed-When: Sun, 26 Nov 2023 13:16:06 +0000
State-Changed-Why:
Fix pulled up to netbsd-10 branch.  Fix already confirmed with test case.


>Unformatted:

Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.