NetBSD Problem Report #57703
From paul@whooppee.com Sat Nov 18 06:19:53 2023
Return-Path: <paul@whooppee.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id F235B1A9239
for <gnats-bugs@gnats.NetBSD.org>; Sat, 18 Nov 2023 06:19:52 +0000 (UTC)
Message-Id: <20231118061950.6E66E999C5@speedy.whooppee.com>
Date: Fri, 17 Nov 2023 22:19:50 -0800 (PST)
From: paul@whoooppee.com
Reply-To: paul@whooppee.com
To: gnats-bugs@NetBSD.org
Subject: kernel panic in eventfd_fop_close()
X-Send-Pr-Version: 3.95
>Number: 57703
>Category: kern
>Synopsis: kernel panic in eventfd_fop_close()
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: thorpej
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Nov 18 06:20:00 +0000 2023
>Closed-Date: Sun Nov 26 13:16:06 +0000 2023
>Last-Modified: Sun Nov 26 13:16:06 +0000 2023
>Originator: Paul Goyette
>Release: NetBSD 10.99.8
>Organization:
+---------------------+--------------------------+----------------------+
| Paul Goyette (.sig) | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | 1B11 1849 721C 56C8 F63A | paul@whooppee.com |
| Software Developer | 6E2E 05FD 15CE 9F2D 5102 | pgoyette@netbsd.org |
| & Network Engineer | | pgoyette99@gmail.com |
+---------------------+--------------------------+----------------------+
>Environment:
System: NetBSD speedy.whooppee.com 10.99.8 NetBSD 10.99.8 (SPEEDY 2023-09-07 23:08:48 UTC) #0: Sat Sep 9 03:34:26 UTC 2023 paul@speedy.whooppee.com:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/SPEEDY amd64
Architecture: x86_64
Machine: amd64
>Description:
Firefox got confused loading a particular website, so I attempted to
bounce the network. That did not fix the problem, so I started a
shutdown. It panicked instead of shutting down normally.
I got a 20 GB crash file!
[ 4235856.8095913] wm0: link state DOWN (was UP)
[ 4235860.4814167] wm0: link state UP (was DOWN)
[ 4236043.6425062] pid 27107 (pulseaudio), uid 1000: exited on signal 11 (core not dumped, err = 78)
[ 4236044.0427050] panic: kernel diagnostic assertion "efd->efd_has_read_waiters == false" failed: file "/build/netbsd-local/src_ro/sys/kern/sys_eventfd.c", line 120
[ 4236044.0427050] cpu1: Begin traceback...
[ 4236044.0427050] vpanic() at netbsd:vpanic+0x173
[ 4236044.0427050] kern_assert() at netbsd:kern_assert+0x4b
[ 4236044.0927296] eventfd_fop_close() at netbsd:eventfd_fop_close+0xe7
[ 4236044.0927296] closef() at netbsd:closef+0xa3
[ 4236044.0927296] fd_free() at netbsd:fd_free+0x1ee
[ 4236044.0927296] exit1() at netbsd:exit1+0x126
[ 4236044.0927296] sigexit() at netbsd:sigexit+0x2e7
[ 4236044.0927296] postsig() at netbsd:postsig+0x35b
[ 4236044.1027348] lwp_userret() at netbsd:lwp_userret+0x20c
[ 4236044.1027348] mi_userret() at netbsd:mi_userret+-0x979b4
[ 4236044.1027348] trap() at netbsd:trap+0x12b
[ 4236044.1027348] --- trap (number 6) ---
[ 4236044.1027348] 70223343862e:
[ 4236044.1027348] cpu1: End traceback...
[ 4236044.1027348] fatal breakpoint trap in supervisor mode
[ 4236044.1027348] trap type 1 code 0 rip 0xffffffff80232835 cs 0x8 rflags 0x202 cr2 0x70222d61f010 ilevel 0 rsp 0xffffd92125fa0aa0
[ 4236044.1027348] curlwp 0xffffb0e3e5267b40 pid 27107.11854 lowest kstack 0xffffd92125f9c2c0
Stopped in pid 27107.11854 (pulseaudio) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x173
kern_assert() at netbsd:kern_assert+0x4b
eventfd_fop_close() at netbsd:eventfd_fop_close+0xe7
closef() at netbsd:closef+0xa3
fd_free() at netbsd:fd_free+0x1ee
exit1() at netbsd:exit1+0x126
sigexit() at netbsd:sigexit+0x2e7
postsig() at netbsd:postsig+0x35b
lwp_userret() at netbsd:lwp_userret+0x20c
mi_userret() at netbsd:mi_userret+-0x979b4
trap() at netbsd:trap+0x12b
[ 4236044.1027348] --- trap (number 6) ---
70223343862e:
ds 8
es 1
fs 180
gs a50
rdi 0
rsi 1
rbp ffffd92125fa0aa0
rbx ffffb0dcbff90780
rdx 1
rcx ffffffffffffff
rax 800000000000000
r8 0
r9 0
r10 0
r11 0
r12 ffffffff80ac8010 ostype+0x8f8
r13 ffffd92125fa0ae8
r14 104
r15 18
rip ffffffff80232835 breakpoint+0x5
cs 8
rflags 202
rsp ffffd92125fa0aa0
ss 0
netbsd:breakpoint+0x5: leave
[ 4236044.1027348] dumping to dev 168,3 (offset=8, size=33531543):
[ 4236044.1027348] dump
>How-To-Repeat:
see above
>Fix:
please do
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->thorpej
Responsible-Changed-By: thorpej@NetBSD.org
Responsible-Changed-When: Sat, 18 Nov 2023 12:51:25 +0000
Responsible-Changed-Why:
Take.
From: Taylor R Campbell <riastradh@NetBSD.org>
To: paul@whooppee.com
Cc: gnats-bugs@NetBSD.org, thorpej@NetBSD.org
Subject: Re: kern/57703: kernel panic in eventfd_fop_close()
Date: Sat, 18 Nov 2023 20:45:43 +0000
It looks like the problem is that the cv_wait_sig in eventfd_wait was
interrupted by a signal, and nothing cleared efd_has_read_waiters.
The only things that do clear efd_has_read_waiters are eventfd_wake
and eventfd_fop_restart, but:
1. eventfd_wake is only used when a read or write is in progress and a
concurrent matching write or read is triggered -- not happening
here, as far as I can tell; the pulseaudio process is just waiting
and was woken by a signal.
2. eventfd_fop_restart is only used when a read or write is in progress
and a concurrent close is triggered -- also not happening here, as
far as I can tell; the read or write was woken by a signal, and
_then_ (sequentially, not concurrently) the file was closed during
process teardown.
I think we should just nix efd_has_read/write_waiters, because
cv_broadcast already has this micro-optimization built-in, and the
assertion would need more bookkeeping to make it actually work. We
can just keep the efd_nwaiters == 0 assertion and it'll be fine.
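The bookkeeping problem described above can be modelled in a few lines of userland C. This is a hypothetical sketch: `struct efd_model` and the `model_*` functions are illustrative names, not the real sys_eventfd.c code, and sleeping is replaced by explicit begin/end calls. It shows that the waiter count returns to zero on every exit path, while the flag stays set when the sleep ends because of a signal rather than a wakeup.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the eventfd reader bookkeeping; names are
 * illustrative and do not match sys_eventfd.c.
 */
struct efd_model {
	unsigned nwaiters;		/* plays the role of efd_nwaiters */
	bool has_read_waiters;		/* plays the role of efd_has_read_waiters */
};

/* A reader goes to sleep: it is counted and the flag is set. */
static void
model_wait_begin(struct efd_model *efd)
{
	efd->nwaiters++;
	efd->has_read_waiters = true;
}

/* A reader stops sleeping, for any reason: only the count is undone. */
static void
model_wait_end(struct efd_model *efd)
{
	efd->nwaiters--;
}

/* A matching write (the eventfd_wake() role) is what clears the flag. */
static void
model_wake(struct efd_model *efd)
{
	if (efd->has_read_waiters) {
		efd->has_read_waiters = false;
		/* the real code would broadcast on the condvar here */
	}
}

/* Both checks made at close before the fix, returned instead of asserted. */
static bool
model_close_old_checks_pass(const struct efd_model *efd)
{
	return efd->nwaiters == 0 && !efd->has_read_waiters;
}

/* After the fix, only the waiter count is asserted at close. */
static void
model_close_fixed(const struct efd_model *efd)
{
	assert(efd->nwaiters == 0);
}
```

A reader woken by `model_wake()` leaves both old checks satisfied. A reader that leaves because of a signal (`model_wait_begin()` followed directly by `model_wait_end()`) satisfies only the count check, which is why keeping just the `efd_nwaiters == 0` assertion is enough.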
From: Taylor R Campbell <riastradh@NetBSD.org>
To: thorpej@NetBSD.org
Cc: paul@whooppee.com, gnats-bugs@NetBSD.org
Subject: Re: kern/57703: kernel panic in eventfd_fop_close()
Date: Sat, 18 Nov 2023 21:14:10 +0000
I wrote an atf test case to reproduce this bug, which has now prompted
me to rewrite these tests with rump so I don't crash my development
machine while drafting the fix!
From: Taylor R Campbell <riastradh@NetBSD.org>
To: thorpej@NetBSD.org
Cc: paul@whooppee.com, gnats-bugs@NetBSD.org
Subject: Re: kern/57703: kernel panic in eventfd_fop_close()
Date: Sat, 18 Nov 2023 22:56:58 +0000
Here's the test that crashed my laptop -- working on converting this
to use rump instead.
Attachment: pr57703-eventfdsignaltestwip.patch
From f71c52d125f8376c3e99f1fda37f1f466c97e199 Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh@NetBSD.org>
Date: Sat, 18 Nov 2023 21:16:13 +0000
Subject: [PATCH] WIP: add test for interrupting eventfd read/write by signal
PR kern/57703
---
tests/lib/libc/sys/t_eventfd.c | 71 +++++++++++++++++++++++++++++++++-
1 file changed, 69 insertions(+), 2 deletions(-)
diff --git a/tests/lib/libc/sys/t_eventfd.c b/tests/lib/libc/sys/t_eventfd.c
index 4eaea2aff3ba..4dc2fe3e4baf 100644
--- a/tests/lib/libc/sys/t_eventfd.c
+++ b/tests/lib/libc/sys/t_eventfd.c
@@ -31,23 +31,27 @@ __COPYRIGHT("@(#) Copyright (c) 2020\
 The NetBSD Foundation, inc. All rights reserved.");
 __RCSID("$NetBSD: t_eventfd.c,v 1.3 2022/02/20 15:21:14 thorpej Exp $");
 
-#include <sys/types.h>
 #include <sys/event.h>
 #include <sys/eventfd.h>
 #include <sys/ioctl.h>
 #include <sys/select.h>
 #include <sys/stat.h>
 #include <sys/syscall.h>
+#include <sys/types.h>
+
 #include <errno.h>
 #include <poll.h>
 #include <pthread.h>
-#include <stdlib.h>
+#include <signal.h>
 #include <stdio.h>
+#include <stdlib.h>
 #include <time.h>
 #include <unistd.h>
 
 #include <atf-c.h>
 
+#include "h_macros.h"
+
 struct helper_context {
 	int efd;
 
@@ -815,6 +819,67 @@ ATF_TC_BODY(eventfd_fcntl, tc)
 
 /*****************************************************************************/
 
+static pthread_key_t eventfd_signal_key;
+
+static void
+eventfd_read_signal_handler(int signo)
+{
+	volatile sig_atomic_t *const flag =
+	    pthread_getspecific(eventfd_signal_key);
+
+	*flag = 1;
+}
+
+static void *
+eventfd_read_signal_helper(void * const v)
+{
+	struct helper_context * const ctx = v;
+	eventfd_t efd_value;
+	sig_atomic_t flag = 0;
+	int error;
+
+	RZ(pthread_setspecific(eventfd_signal_key, &flag));
+	if (signal(SIGUSR1, &eventfd_read_signal_handler) == SIG_ERR)
+		atf_tc_fail("signal(SIGUSR1): %s", strerror(errno));
+
+	ATF_REQUIRE(wait_barrier(ctx));
+	ATF_REQUIRE(eventfd_read(ctx->efd, &efd_value) == -1);
+	error = errno;
+	ATF_REQUIRE_MSG(error == EINTR, "errno=%d (%s)", error,
+	    strerror(error));
+	ATF_REQUIRE_MSG(flag, "signal not delivered");
+
+	return NULL;
+}
+
+ATF_TC(eventfd_read_signal);
+ATF_TC_HEAD(eventfd_read_signal, tc)
+{
+	atf_tc_set_md_var(tc, "descr",
+	    "Tests eventfd reads can be interrupted by signal");
+}
+ATF_TC_BODY(eventfd_read_signal, tc)
+{
+	struct helper_context ctx;
+	pthread_t helper;
+
+	RZ(pthread_key_create(&eventfd_signal_key, NULL));
+
+	init_helper_context(&ctx);
+
+	RL(ctx.efd = eventfd(0, 0));
+	RZ(pthread_create(&helper, NULL, &eventfd_read_signal_helper, &ctx));
+
+	ATF_REQUIRE(wait_barrier(&ctx)); /* wait for helper to start */
+	(void)sleep(1); /* wait for the read to block */
+	(void)alarm(1); /* set a deadline */
+	RZ(pthread_kill(helper, SIGUSR1)); /* wake helper */
+	RZ(pthread_join(helper, NULL)); /* wait for helper to wake and fail */
+	(void)alarm(0); /* clear deadline */
+}
+
+/*****************************************************************************/
+
 ATF_TP_ADD_TCS(tp)
 {
 	ATF_TP_ADD_TC(tp, eventfd_normal);
@@ -825,6 +890,8 @@ ATF_TP_ADD_TCS(tp)
 	ATF_TP_ADD_TC(tp, eventfd_select_poll_kevent_block);
 	ATF_TP_ADD_TC(tp, eventfd_restart);
 	ATF_TP_ADD_TC(tp, eventfd_fcntl);
+	ATF_TP_ADD_TC(tp, eventfd_read_signal);
+// ATF_TP_ADD_TC(tp, eventfd_write_signal);
 
 	return atf_no_error();
 }
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57703 CVS commit: src/sys/kern
Date: Sun, 19 Nov 2023 04:13:38 +0000
Module Name: src
Committed By: riastradh
Date: Sun Nov 19 04:13:38 UTC 2023
Modified Files:
src/sys/kern: sys_eventfd.c
Log Message:
eventfd(2): Omit needless micro-optimization causing PR kern/57703.
Unfortunately, owing to PR kern/57705 and PR misc/57706, it isn't
convenient to flip the xfail switch on a test for this bug. So we'll
do that separately. (But I did verify that a rumpified version of
the test posted to PR kern/57703 failed without this change, and
passed with this change.)
PR kern/57703
XXX pullup-10
To generate a diff of this commit:
cvs rdiff -u -r1.9 -r1.10 src/sys/kern/sys_eventfd.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sun, 19 Nov 2023 13:13:47 +0000
State-Changed-Why:
fix committed (automatic tests aside), needs pullup to 10 now
State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sun, 19 Nov 2023 13:20:16 +0000
State-Changed-Why:
pullup-10 #468 https://releng.netbsd.org/cgi-bin/req-10.cgi?show=468
not needed before 10 because eventfd is new in 10
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57703 CVS commit: src/sys/kern
Date: Sun, 19 Nov 2023 17:16:01 +0000
Module Name: src
Committed By: riastradh
Date: Sun Nov 19 17:16:00 UTC 2023
Modified Files:
src/sys/kern: sys_eventfd.c
Log Message:
eventfd(2): Prune dead branch.
Fallout from PR kern/57703 fix.
XXX pullup-10
To generate a diff of this commit:
cvs rdiff -u -r1.10 -r1.11 src/sys/kern/sys_eventfd.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Manuel Bouyer" <bouyer@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57703 CVS commit: [netbsd-10] src/sys/kern
Date: Sun, 26 Nov 2023 12:33:19 +0000
Module Name: src
Committed By: bouyer
Date: Sun Nov 26 12:33:19 UTC 2023
Modified Files:
src/sys/kern [netbsd-10]: sys_eventfd.c
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #468):
sys/kern/sys_eventfd.c: revision 1.10
eventfd(2): Omit needless micro-optimization causing PR kern/57703.
Unfortunately, owing to PR kern/57705 and PR misc/57706, it isn't
convenient to flip the xfail switch on a test for this bug. So we'll
do that separately. (But I did verify that a rumpified version of
the test posted to PR kern/57703 failed without this change, and
passed with this change.)
PR kern/57703
XXX pullup-10
To generate a diff of this commit:
cvs rdiff -u -r1.9 -r1.9.4.1 src/sys/kern/sys_eventfd.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: thorpej@NetBSD.org
State-Changed-When: Sun, 26 Nov 2023 13:16:06 +0000
State-Changed-Why:
Fix pulled up to netbsd-10 branch. Fix already confirmed with test case.
>Unformatted:
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.