NetBSD Problem Report #56730
From gson@gson.org Thu Feb 24 11:20:17 2022
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 6F5841A9239
for <gnats-bugs@gnats.NetBSD.org>; Thu, 24 Feb 2022 11:20:17 +0000 (UTC)
Message-Id: <20220224112006.11FD7254381@guava.gson.org>
Date: Thu, 24 Feb 2022 13:20:06 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: Lock error panic in entropy_enter_early()
X-Send-Pr-Version: 3.95
>Number: 56730
>Category: kern
>Synopsis: Lock error panic in entropy_enter_early()
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Feb 24 11:25:01 +0000 2022
>Last-Modified: Fri Mar 18 23:35:01 +0000 2022
>Originator: Andreas Gustafsson
>Release: NetBSD-current, source date 2022.02.23.12.17.06
>Organization:
>Environment:
System: NetBSD
Architecture: mips
Machine: pmax
>Description:
The TNF pmax testbed recently paniced during boot:
[ 1.0000000] entropy: WARNING: extracting entropy too early
[ 1.0078165] Mutex error: mutex_vector_enter,479: locking against myself
[ 1.0078165] lock address : 0x00000000805a5380
[ 1.0078165] current cpu : 0
[ 1.0078165] current lwp : 0x000000008059c600
[ 1.0078165] owner field : 000000000000000000 wait/spin: 0/1
[ 1.0078165] panic: lock error: Mutex: mutex_vector_enter,479: locking against myself: lock 0x805a5380 cpu 0 lwp 0x8059c600
[ 1.0078165] cpu0: Begin traceback...
[ 1.0078165] pid -2140824880 not found
[ 1.0078165] cpu0: End traceback...
[ 1.0078165] kernel: breakpoint trap
Stopped in pid 0.0 (system) at netbsd:cpu_Debugger+0x4: jr ra
bdslot: nop
0x80659af8: cpu_Debugger+0x4 (0,0x8000,0x805d34fc,0x8058ed40) ra 0x80353a0c sz 0
0x80659af8: vpanic+0x1a4 (0,0x8000,0x805d34fc,0x8058ed40) ra 0x80353a88 sz 48
0x80659b28: panic+0x24 (0,0x804fcc94,0x804bc5c4,0x1df) ra 0x80346a50 sz 32
0x80659b48: lockdebug_abort+0xe4 (0,0x804fcc94,0x804bc5c4,0x1df) ra 0x802f6b14 sz 56
0x80659b80: mutex_abort+0x48 (0,0x804fcc94,0x804bc5c4,0x1df) ra 0x802d4760 sz 32
0x80659ba0: entropy_enter_early+0x110 (0,0x804fcc94,0x804bc5c4,0x1df) ra 0x802d5204 sz 40
0x80659bc8: rnd_add_data_1+0x4f0 (0,0x804fcc94,0x804bc5c4,0x1df) ra 0x802d71f0 sz 88
0x80659c20: rnd_add_data+0x120 (0,0x804fcc94,0x804bc5c4,0x1df) ra 0x802d7400 sz 96
0x80659c80: rnd_add_uint32+0x20 (0,0x804fcc94,0x804bc5c4,0x1df) ra 0x802c980c sz 32
0x80659ca0: clockrnd_sample+0x40 (0,0x804fcc94,0x804bc5c4,0x1df) ra 0 sz 0
User-level: pid 0.0
at 1
v0 1
v1 8058ed40 cpu_info_store
The full console log is at
http://releng.netbsd.org/b5reports/pmax/2022/2022.02.23.12.17.06/test.log
Grepping the logs shows two earlier instances:
http://releng.netbsd.org/b5reports/pmax/2021/2021.06.04.06.28.42/test.log
http://releng.netbsd.org/b5reports/pmax/2021/2021.10.04.07.04.39/test.log
This hpcmips panic also looks like it could be the same issue:
http://releng.netbsd.org/b5reports/hpcmips/2021/2021.01.17.00.13.49/test.log
And it has also happened once on sparc, so it's not mips specific:
http://releng.netbsd.org/b5reports/sparc/2021/2021.04.20.10.39.58/test.log
>How-To-Repeat:
>Fix:
>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: gson@gson.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/56730: Lock error panic in entropy_enter_early()
Date: Thu, 24 Feb 2022 13:00:20 +0000
This happens because there's a short window early at boot between when
interrupts are enabled and when entering entropy stops taking the
global entropy lock:
1. Initially at boot, there are no interrupts, so entering entropy
doesn't require blocking interrupts in order to prevent them from
interfering with updates to the entropy pool.
2. Then interrupts are enabled, but entering entropy still takes the
global entropy lock.
3. Then, after entropy_init_late, entering entropy stops taking the
global entropy lock -- if something is already entering entropy
when an interrupt handler runs, it just drops the sample on the
floor, so that the crypto operations involved in the entropy pool
don't hurt interrupt latency.
Currently the global entropy lock runs at IPL_VM, which is lower
priority than IPL_SCHED where the hardclock timer runs -- that was an
oversight on my part when adding clockrnd, but it only causes a
problem during (2).
We could raise the lock to IPL_SCHED as a provisional workaround:
diff --git a/sys/kern/kern_entropy.c b/sys/kern/kern_entropy.c
index 8b97c49b8d2a..4c2f18ea1c92 100644
--- a/sys/kern/kern_entropy.c
+++ b/sys/kern/kern_entropy.c
@@ -369,7 +369,7 @@ entropy_init(void)
NULL, 0, &E->epoch, 0, CTL_CREATE, CTL_EOL);
=20
/* Initialize the global state for multithreaded operation. */
- mutex_init(&E->lock, MUTEX_DEFAULT, IPL_VM);
+ mutex_init(&E->lock, MUTEX_DEFAULT, IPL_SCHED);
cv_init(&E->cv, "entropy");
selinit(&E->selq);
cv_init(&E->sourcelock_cv, "entsrclock");
However, it _should_ run at IPL_SOFTSERIAL so that it never blocks
hard interrupt handlers. Unfortunately, this is tricky: there's a lot
of details to get right in the short window early boot between when
interrupts start running and when entropy_init_late transitions
everything to the non-early paths. And debugging is hard because
mistakes often cause the machine to wedge before the console works.
Here's the diff so far in the branch. It doesn't work yet, though,
and I haven't finished diagnosing what's wrong with it:
diff --git a/share/man/man9/cprng.9 b/share/man/man9/cprng.9
index 960c97c555d7..40b4dd677d3d 100644
--- a/share/man/man9/cprng.9
+++ b/share/man/man9/cprng.9
@@ -93,16 +93,14 @@ the strong generator can also be created by calling
.Pp
The
.Nm
-functions may be used at interrupt priority level
-.Dv IPL_VM
-or below,
+functions may be used in soft interrupt context,
except for
.Fn cprng_strong_create
and
.Fn cprng_strong_destroy
which are allowed only at
-.Dv IPL_NONE ;
-see
+.Dv IPL_NONE
+in thread context; see
.Xr spl 9 .
.Pp
The
diff --git a/sys/kern/kern_entropy.c b/sys/kern/kern_entropy.c
index 8b97c49b8d2a..6c2c2a452940 100644
--- a/sys/kern/kern_entropy.c
+++ b/sys/kern/kern_entropy.c
@@ -167,9 +167,9 @@ struct {
kcondvar_t sourcelock_cv; /* notifies sourcelock release */
LIST_HEAD(,krndsource) sources; /* list of entropy sources */
enum entropy_stage {
- ENTROPY_COLD =3D 0, /* single-threaded */
- ENTROPY_WARM, /* multi-threaded at boot before CPUs */
- ENTROPY_HOT, /* multi-threaded multi-CPU */
+ ENTROPY_COLD =3D 0, /* before initialization */
+ ENTROPY_WARM, /* after initialization */
+ ENTROPY_HOT, /* housekeeping thread started */
} stage;
bool consolidate; /* kick thread to consolidate */
bool seed_rndsource; /* true if seed source is attached */
@@ -211,9 +211,6 @@ EVCNT_ATTACH_STATIC(entropy_partial_evcnt);
static struct evcnt entropy_consolidate_evcnt =3D
EVCNT_INITIALIZER(EVCNT_TYPE_MISC, NULL, "entropy", "consolidate");
EVCNT_ATTACH_STATIC(entropy_consolidate_evcnt);
-static struct evcnt entropy_extract_intr_evcnt =3D
- EVCNT_INITIALIZER(EVCNT_TYPE_MISC, NULL, "entropy", "extract intr");
-EVCNT_ATTACH_STATIC(entropy_extract_intr_evcnt);
static struct evcnt entropy_extract_fail_evcnt =3D
EVCNT_INITIALIZER(EVCNT_TYPE_MISC, NULL, "entropy", "extract fail");
EVCNT_ATTACH_STATIC(entropy_extract_fail_evcnt);
@@ -369,7 +366,7 @@ entropy_init(void)
NULL, 0, &E->epoch, 0, CTL_CREATE, CTL_EOL);
=20
/* Initialize the global state for multithreaded operation. */
- mutex_init(&E->lock, MUTEX_DEFAULT, IPL_VM);
+ mutex_init(&E->lock, MUTEX_DEFAULT, IPL_SOFTSERIAL);
cv_init(&E->cv, "entropy");
selinit(&E->selq);
cv_init(&E->sourcelock_cv, "entsrclock");
@@ -385,6 +382,19 @@ entropy_init(void)
LIST_FOREACH(rs, &E->sources, list)
rs->state =3D percpu_alloc(sizeof(struct rndsource_cpu));
=20
+ /* Allocate and initialize the per-CPU state. */
+ entropy_percpu =3D percpu_create(sizeof(struct entropy_cpu),
+ entropy_init_cpu, entropy_fini_cpu, NULL);
+
+ /*
+ * Establish the softint at the highest softint priority level.
+ * Must happen after CPU detection.
+ */
+ entropy_sih =3D softint_establish(SOFTINT_SERIAL|SOFTINT_MPSAFE,
+ &entropy_softintr, NULL);
+ if (entropy_sih =3D=3D NULL)
+ panic("unable to establish entropy softint");
+
/* Enter the boot cycle count to get started. */
extra[i++] =3D entropy_timer();
KASSERT(i =3D=3D __arraycount(extra));
@@ -409,19 +419,6 @@ entropy_init_late(void)
=20
KASSERT(E->stage =3D=3D ENTROPY_WARM);
=20
- /* Allocate and initialize the per-CPU state. */
- entropy_percpu =3D percpu_create(sizeof(struct entropy_cpu),
- entropy_init_cpu, entropy_fini_cpu, NULL);
-
- /*
- * Establish the softint at the highest softint priority level.
- * Must happen after CPU detection.
- */
- entropy_sih =3D softint_establish(SOFTINT_SERIAL|SOFTINT_MPSAFE,
- &entropy_softintr, NULL);
- if (entropy_sih =3D=3D NULL)
- panic("unable to establish entropy softint");
-
/*
* Create the entropy housekeeping thread. Must happen after
* lwpinit.
@@ -652,7 +649,7 @@ entropy_account_cpu(struct entropy_cpu *ec)
{
unsigned diff;
=20
- KASSERT(E->stage =3D=3D ENTROPY_HOT);
+ KASSERT(E->stage >=3D ENTROPY_WARM);
=20
/*
* If there's no entropy needed, and entropy has been
@@ -739,8 +736,7 @@ entropy_enter_early(const void *buf, size_t len, unsign=
ed nbits)
{
bool notify =3D false;
=20
- if (E->stage >=3D ENTROPY_WARM)
- mutex_enter(&E->lock);
+ KASSERT(E->stage =3D=3D ENTROPY_COLD);
=20
/* Enter it into the pool. */
entpool_enter(&E->pool, buf, len);
@@ -759,9 +755,6 @@ entropy_enter_early(const void *buf, size_t len, unsign=
ed nbits)
entropy_notify();
entropy_immediate_evcnt.ev_count++;
}
-
- if (E->stage >=3D ENTROPY_WARM)
- mutex_exit(&E->lock);
}
=20
/*
@@ -785,7 +778,7 @@ entropy_enter(const void *buf, size_t len, unsigned nbi=
ts)
"impossible entropy rate: %u bits in %zu-byte string", nbits, len);
=20
/* If it's too early after boot, just use entropy_enter_early. */
- if (__predict_false(E->stage < ENTROPY_HOT)) {
+ if (__predict_false(E->stage =3D=3D ENTROPY_COLD)) {
entropy_enter_early(buf, len, nbits);
return;
}
@@ -845,7 +838,7 @@ entropy_enter_intr(const void *buf, size_t len, unsigne=
d nbits)
"impossible entropy rate: %u bits in %zu-byte string", nbits, len);
=20
/* If it's too early after boot, just use entropy_enter_early. */
- if (__predict_false(E->stage < ENTROPY_HOT)) {
+ if (__predict_false(E->stage =3D=3D ENTROPY_COLD)) {
entropy_enter_early(buf, len, nbits);
return true;
}
@@ -1278,14 +1271,13 @@ entropy_extract(void *buf, size_t len, int flags)
"can't wait for entropy until warm");
}
=20
+ /* Refuse to operate in interrupt context. */
+ KASSERT(!cpu_intr_p());
+
/* Acquire the global lock to get at the global pool. */
if (E->stage >=3D ENTROPY_WARM)
mutex_enter(&E->lock);
=20
- /* Count up request for entropy in interrupt context. */
- if (cpu_intr_p())
- entropy_extract_intr_evcnt.ev_count++;
-
/* Wait until there is enough entropy in the system. */
error =3D 0;
while (E->needed) {
@@ -1875,9 +1867,7 @@ rnd_add_data_1(struct krndsource *rs, const void *buf=
, uint32_t len,
* contributed from this source.
*/
if (fullyused) {
- if (E->stage < ENTROPY_HOT) {
- if (E->stage >=3D ENTROPY_WARM)
- mutex_enter(&E->lock);
+ if (__predict_false(E->stage =3D=3D ENTROPY_COLD)) {
rs->total =3D add_sat(rs->total, entropybits);
switch (flag) {
case RND_FLAG_COLLECT_TIME:
@@ -1889,8 +1879,6 @@ rnd_add_data_1(struct krndsource *rs, const void *buf=
, uint32_t len,
add_sat(rs->value_delta.insamples, 1);
break;
}
- if (E->stage >=3D ENTROPY_WARM)
- mutex_exit(&E->lock);
} else {
struct rndsource_cpu *rc =3D percpu_getref(rs->state);
=20
diff --git a/sys/kern/subr_cprng.c b/sys/kern/subr_cprng.c
index 97973ae8eb73..b513f3844d37 100644
--- a/sys/kern/subr_cprng.c
+++ b/sys/kern/subr_cprng.c
@@ -44,7 +44,7 @@
* This code serves the first two categories without having extra
* logic for /dev/random.
*
- * kern_cprng - available at IPL_VM or lower
+ * kern_cprng - available at IPL_SOFTSERIAL or lower
* user_cprng - available only at IPL_NONE in thread context
*
* The name kern_cprng is for hysterical raisins. The name
@@ -88,7 +88,6 @@ struct cprng_cpu {
struct nist_hash_drbg *cc_drbg;
struct {
struct evcnt reseed;
- struct evcnt intr;
} *cc_evcnt;
unsigned cc_epoch;
};
@@ -99,7 +98,7 @@ static void cprng_init_cpu(void *, void *, struct cpu_inf=
o *);
static void cprng_fini_cpu(void *, void *, struct cpu_info *);
=20
/* Well-known CPRNG instances */
-struct cprng_strong *kern_cprng __read_mostly; /* IPL_VM */
+struct cprng_strong *kern_cprng __read_mostly; /* IPL_SOFTSERIAL */
struct cprng_strong *user_cprng __read_mostly; /* IPL_NONE */
=20
static struct sysctllog *cprng_sysctllog __read_mostly;
@@ -112,12 +111,12 @@ cprng_init(void)
panic("NIST Hash_DRBG failed self-test");
=20
/*
- * Create CPRNG instances at two IPLs: IPL_VM for kernel use
- * that may occur inside IPL_VM interrupt handlers (!!??!?!?),
+ * Create CPRNG instances at two IPLs: IPL_SOFTSERIAL for
+ * kernel use that may occur inside soft interrupt handlers,
* and IPL_NONE for userland use which need not block
* interrupts.
*/
- kern_cprng =3D cprng_strong_create("kern", IPL_VM, 0);
+ kern_cprng =3D cprng_strong_create("kern", IPL_SOFTSERIAL, 0);
user_cprng =3D cprng_strong_create("user", IPL_NONE, 0);
=20
/* Create kern.urandom and kern.arandom sysctl nodes. */
@@ -246,8 +245,6 @@ cprng_init_cpu(void *ptr, void *cookie, struct cpu_info=
*ci)
/* Attach the event counters. */
/* XXX ci_cpuname may not be initialized early enough. */
cpuname =3D ci->ci_cpuname[0] =3D=3D '\0' ? "cpu0" : ci->ci_cpuname;
- evcnt_attach_dynamic(&cc->cc_evcnt->intr, EVCNT_TYPE_MISC, NULL,
- cpuname, "cprng_strong intr");
evcnt_attach_dynamic(&cc->cc_evcnt->reseed, EVCNT_TYPE_MISC, NULL,
cpuname, "cprng_strong reseed");
=20
@@ -261,7 +258,6 @@ cprng_fini_cpu(void *ptr, void *cookie, struct cpu_info=
*ci)
struct cprng_cpu *cc =3D ptr;
=20
evcnt_detach(&cc->cc_evcnt->reseed);
- evcnt_detach(&cc->cc_evcnt->intr);
if (__predict_false(nist_hash_drbg_destroy(cc->cc_drbg)))
panic("nist_hash_drbg_destroy");
=20
@@ -277,6 +273,9 @@ cprng_strong(struct cprng_strong *cprng, void *buf, siz=
e_t len, int flags)
unsigned epoch;
int s;
=20
+ /* Not allowed in hard interrupt context. */
+ KASSERT(!cpu_intr_p());
+
/*
* Verify maximum request length. Caller should really limit
* their requests to 32 bytes to avoid spending much time with
@@ -292,9 +291,6 @@ cprng_strong(struct cprng_strong *cprng, void *buf, siz=
e_t len, int flags)
cc =3D percpu_getref(cprng->cs_percpu);
s =3D splraiseipl(cprng->cs_iplcookie);
=20
- if (cpu_intr_p())
- cc->cc_evcnt->intr.ev_count++;
-
/* If the entropy epoch has changed, (re)seed. */
epoch =3D entropy_epoch();
if (__predict_false(epoch !=3D cc->cc_epoch)) {
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56730 CVS commit: src/sys/kern
Date: Fri, 18 Mar 2022 23:34:56 +0000
Module Name: src
Committed By: riastradh
Date: Fri Mar 18 23:34:56 UTC 2022
Modified Files:
src/sys/kern: kern_entropy.c
Log Message:
entropy(9): Use the early-entropy path only while cold.
This way, we never take the global entropy lock from interrupt
handlers (no interrupts while cold), so the global entropy lock need
not block interrupts.
There's an annoying ordering issue here: softint_establish doesn't
work until after CPUs have been detected, which happens inside
configure(), which is also what enables interrupts. So we have no
opportunity to softint_establish the entropy softint _before_
interrupts are enabled.
To work around this, we have to put a conditional into the interrupt
path, and go out of our way to process any queued samples after
establishing the softint. If we just made softint_establish work
early, like percpu_create does now, this problem would go away and we
could delete a bit of logic here.
Candidate fix for PR kern/56730.
To generate a diff of this commit:
cvs rdiff -u -r1.36 -r1.37 src/sys/kern/kern_entropy.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
(Contact us)
$NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.