NetBSD Problem Report #56730

From gson@gson.org  Thu Feb 24 11:20:17 2022
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 6F5841A9239
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 24 Feb 2022 11:20:17 +0000 (UTC)
Message-Id: <20220224112006.11FD7254381@guava.gson.org>
Date: Thu, 24 Feb 2022 13:20:06 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: Lock error panic in entropy_enter_early()
X-Send-Pr-Version: 3.95

>Number:         56730
>Category:       kern
>Synopsis:       Lock error panic in entropy_enter_early()
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Feb 24 11:25:01 +0000 2022
>Last-Modified:  Fri Mar 18 23:35:01 +0000 2022
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date 2022.02.23.12.17.06
>Organization:

>Environment:
System: NetBSD
Architecture: mips
Machine: pmax
>Description:

The TNF pmax testbed recently paniced during boot:

[   1.0000000] entropy: WARNING: extracting entropy too early
[   1.0078165] Mutex error: mutex_vector_enter,479: locking against myself

[   1.0078165] lock address : 0x00000000805a5380
[   1.0078165] current cpu  :                  0
[   1.0078165] current lwp  : 0x000000008059c600
[   1.0078165] owner field  : 000000000000000000 wait/spin:                0/1

[   1.0078165] panic: lock error: Mutex: mutex_vector_enter,479: locking against myself: lock 0x805a5380 cpu 0 lwp 0x8059c600
[   1.0078165] cpu0: Begin traceback...
[   1.0078165] pid -2140824880 not found
[   1.0078165] cpu0: End traceback...
[   1.0078165] kernel: breakpoint trap
Stopped in pid 0.0 (system) at  netbsd:cpu_Debugger+0x4:        jr      ra
                bdslot: nop
0x80659af8: cpu_Debugger+0x4 (0,0x8000,0x805d34fc,0x8058ed40) ra 0x80353a0c sz 0
0x80659af8: vpanic+0x1a4 (0,0x8000,0x805d34fc,0x8058ed40) ra 0x80353a88 sz 48
0x80659b28: panic+0x24 (0,0x804fcc94,0x804bc5c4,0x1df) ra 0x80346a50 sz 32
0x80659b48: lockdebug_abort+0xe4 (0,0x804fcc94,0x804bc5c4,0x1df) ra 0x802f6b14 sz 56
0x80659b80: mutex_abort+0x48 (0,0x804fcc94,0x804bc5c4,0x1df) ra 0x802d4760 sz 32
0x80659ba0: entropy_enter_early+0x110 (0,0x804fcc94,0x804bc5c4,0x1df) ra 0x802d5204 sz 40
0x80659bc8: rnd_add_data_1+0x4f0 (0,0x804fcc94,0x804bc5c4,0x1df) ra 0x802d71f0 sz 88
0x80659c20: rnd_add_data+0x120 (0,0x804fcc94,0x804bc5c4,0x1df) ra 0x802d7400 sz 96
0x80659c80: rnd_add_uint32+0x20 (0,0x804fcc94,0x804bc5c4,0x1df) ra 0x802c980c sz 32
0x80659ca0: clockrnd_sample+0x40 (0,0x804fcc94,0x804bc5c4,0x1df) ra 0 sz 0
User-level: pid 0.0
at          1
v0          1
v1          8058ed40    cpu_info_store

The full console log is at

  http://releng.netbsd.org/b5reports/pmax/2022/2022.02.23.12.17.06/test.log

Grepping the logs shows two earlier instances:

  http://releng.netbsd.org/b5reports/pmax/2021/2021.06.04.06.28.42/test.log
  http://releng.netbsd.org/b5reports/pmax/2021/2021.10.04.07.04.39/test.log

This hpcmips panic also looks like it could be the same issue:

  http://releng.netbsd.org/b5reports/hpcmips/2021/2021.01.17.00.13.49/test.log

And it has also happened once on sparc, so it's not mips specific:

  http://releng.netbsd.org/b5reports/sparc/2021/2021.04.20.10.39.58/test.log

>How-To-Repeat:

>Fix:

>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: gson@gson.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/56730: Lock error panic in entropy_enter_early()
Date: Thu, 24 Feb 2022 13:00:20 +0000

 This happens because there's a short window early at boot between when
 interrupts are enabled and when entering entropy stops taking the
 global entropy lock:

 1. Initially at boot, there are no interrupts, so entering entropy
    doesn't require blocking interrupts in order to prevent them from
    interfering with updates to the entropy pool.

 2. Then interrupts are enabled, but entering entropy still takes the
    global entropy lock.

 3. Then, after entropy_init_late, entering entropy stops taking the
    global entropy lock -- if something is already entering entropy
    when an interrupt handler runs, it just drops the sample on the
    floor, so that the crypto operations involved in the entropy pool
    don't hurt interrupt latency.

 Currently the global entropy lock runs at IPL_VM, which is lower
 priority than IPL_SCHED where the hardclock timer runs -- that was an
 oversight on my part when adding clockrnd, but it only causes a
 problem during (2).

 We could raise the lock to IPL_SCHED as a provisional workaround:

 diff --git a/sys/kern/kern_entropy.c b/sys/kern/kern_entropy.c
 index 8b97c49b8d2a..4c2f18ea1c92 100644
 --- a/sys/kern/kern_entropy.c
 +++ b/sys/kern/kern_entropy.c
 @@ -369,7 +369,7 @@ entropy_init(void)
  	    NULL, 0, &E->epoch, 0, CTL_CREATE, CTL_EOL);
 =20
  	/* Initialize the global state for multithreaded operation.  */
 -	mutex_init(&E->lock, MUTEX_DEFAULT, IPL_VM);
 +	mutex_init(&E->lock, MUTEX_DEFAULT, IPL_SCHED);
  	cv_init(&E->cv, "entropy");
  	selinit(&E->selq);
  	cv_init(&E->sourcelock_cv, "entsrclock");

 However, it _should_ run at IPL_SOFTSERIAL so that it never blocks
 hard interrupt handlers.  Unfortunately, this is tricky: there's a lot
 of details to get right in the short window early boot between when
 interrupts start running and when entropy_init_late transitions
 everything to the non-early paths.  And debugging is hard because
 mistakes often cause the machine to wedge before the console works.

 Here's the diff so far in the branch.  It doesn't work yet, though,
 and I haven't finished diagnosing what's wrong with it:

 diff --git a/share/man/man9/cprng.9 b/share/man/man9/cprng.9
 index 960c97c555d7..40b4dd677d3d 100644
 --- a/share/man/man9/cprng.9
 +++ b/share/man/man9/cprng.9
 @@ -93,16 +93,14 @@ the strong generator can also be created by calling
  .Pp
  The
  .Nm
 -functions may be used at interrupt priority level
 -.Dv IPL_VM
 -or below,
 +functions may be used in soft interrupt context,
  except for
  .Fn cprng_strong_create
  and
  .Fn cprng_strong_destroy
  which are allowed only at
 -.Dv IPL_NONE ;
 -see
 +.Dv IPL_NONE
 +in thread context; see
  .Xr spl 9 .
  .Pp
  The
 diff --git a/sys/kern/kern_entropy.c b/sys/kern/kern_entropy.c
 index 8b97c49b8d2a..6c2c2a452940 100644
 --- a/sys/kern/kern_entropy.c
 +++ b/sys/kern/kern_entropy.c
 @@ -167,9 +167,9 @@ struct {
  	kcondvar_t	sourcelock_cv;	/* notifies sourcelock release */
  	LIST_HEAD(,krndsource) sources;	/* list of entropy sources */
  	enum entropy_stage {
 -		ENTROPY_COLD =3D 0, /* single-threaded */
 -		ENTROPY_WARM,	  /* multi-threaded at boot before CPUs */
 -		ENTROPY_HOT,	  /* multi-threaded multi-CPU */
 +		ENTROPY_COLD =3D 0, /* before initialization */
 +		ENTROPY_WARM,	  /* after initialization */
 +		ENTROPY_HOT,	  /* housekeeping thread started */
  	}		stage;
  	bool		consolidate;	/* kick thread to consolidate */
  	bool		seed_rndsource;	/* true if seed source is attached */
 @@ -211,9 +211,6 @@ EVCNT_ATTACH_STATIC(entropy_partial_evcnt);
  static struct evcnt entropy_consolidate_evcnt =3D
      EVCNT_INITIALIZER(EVCNT_TYPE_MISC, NULL, "entropy", "consolidate");
  EVCNT_ATTACH_STATIC(entropy_consolidate_evcnt);
 -static struct evcnt entropy_extract_intr_evcnt =3D
 -    EVCNT_INITIALIZER(EVCNT_TYPE_MISC, NULL, "entropy", "extract intr");
 -EVCNT_ATTACH_STATIC(entropy_extract_intr_evcnt);
  static struct evcnt entropy_extract_fail_evcnt =3D
      EVCNT_INITIALIZER(EVCNT_TYPE_MISC, NULL, "entropy", "extract fail");
  EVCNT_ATTACH_STATIC(entropy_extract_fail_evcnt);
 @@ -369,7 +366,7 @@ entropy_init(void)
  	    NULL, 0, &E->epoch, 0, CTL_CREATE, CTL_EOL);
 =20
  	/* Initialize the global state for multithreaded operation.  */
 -	mutex_init(&E->lock, MUTEX_DEFAULT, IPL_VM);
 +	mutex_init(&E->lock, MUTEX_DEFAULT, IPL_SOFTSERIAL);
  	cv_init(&E->cv, "entropy");
  	selinit(&E->selq);
  	cv_init(&E->sourcelock_cv, "entsrclock");
 @@ -385,6 +382,19 @@ entropy_init(void)
  	LIST_FOREACH(rs, &E->sources, list)
  		rs->state =3D percpu_alloc(sizeof(struct rndsource_cpu));
 =20
 +	/* Allocate and initialize the per-CPU state.  */
 +	entropy_percpu =3D percpu_create(sizeof(struct entropy_cpu),
 +	    entropy_init_cpu, entropy_fini_cpu, NULL);
 +
 +	/*
 +	 * Establish the softint at the highest softint priority level.
 +	 * Must happen after CPU detection.
 +	 */
 +	entropy_sih =3D softint_establish(SOFTINT_SERIAL|SOFTINT_MPSAFE,
 +	    &entropy_softintr, NULL);
 +	if (entropy_sih =3D=3D NULL)
 +		panic("unable to establish entropy softint");
 +
  	/* Enter the boot cycle count to get started.  */
  	extra[i++] =3D entropy_timer();
  	KASSERT(i =3D=3D __arraycount(extra));
 @@ -409,19 +419,6 @@ entropy_init_late(void)
 =20
  	KASSERT(E->stage =3D=3D ENTROPY_WARM);
 =20
 -	/* Allocate and initialize the per-CPU state.  */
 -	entropy_percpu =3D percpu_create(sizeof(struct entropy_cpu),
 -	    entropy_init_cpu, entropy_fini_cpu, NULL);
 -
 -	/*
 -	 * Establish the softint at the highest softint priority level.
 -	 * Must happen after CPU detection.
 -	 */
 -	entropy_sih =3D softint_establish(SOFTINT_SERIAL|SOFTINT_MPSAFE,
 -	    &entropy_softintr, NULL);
 -	if (entropy_sih =3D=3D NULL)
 -		panic("unable to establish entropy softint");
 -
  	/*
  	 * Create the entropy housekeeping thread.  Must happen after
  	 * lwpinit.
 @@ -652,7 +649,7 @@ entropy_account_cpu(struct entropy_cpu *ec)
  {
  	unsigned diff;
 =20
 -	KASSERT(E->stage =3D=3D ENTROPY_HOT);
 +	KASSERT(E->stage >=3D ENTROPY_WARM);
 =20
  	/*
  	 * If there's no entropy needed, and entropy has been
 @@ -739,8 +736,7 @@ entropy_enter_early(const void *buf, size_t len, unsign=
 ed nbits)
  {
  	bool notify =3D false;
 =20
 -	if (E->stage >=3D ENTROPY_WARM)
 -		mutex_enter(&E->lock);
 +	KASSERT(E->stage =3D=3D ENTROPY_COLD);
 =20
  	/* Enter it into the pool.  */
  	entpool_enter(&E->pool, buf, len);
 @@ -759,9 +755,6 @@ entropy_enter_early(const void *buf, size_t len, unsign=
 ed nbits)
  		entropy_notify();
  		entropy_immediate_evcnt.ev_count++;
  	}
 -
 -	if (E->stage >=3D ENTROPY_WARM)
 -		mutex_exit(&E->lock);
  }
 =20
  /*
 @@ -785,7 +778,7 @@ entropy_enter(const void *buf, size_t len, unsigned nbi=
 ts)
  	    "impossible entropy rate: %u bits in %zu-byte string", nbits, len);
 =20
  	/* If it's too early after boot, just use entropy_enter_early.  */
 -	if (__predict_false(E->stage < ENTROPY_HOT)) {
 +	if (__predict_false(E->stage =3D=3D ENTROPY_COLD)) {
  		entropy_enter_early(buf, len, nbits);
  		return;
  	}
 @@ -845,7 +838,7 @@ entropy_enter_intr(const void *buf, size_t len, unsigne=
 d nbits)
  	    "impossible entropy rate: %u bits in %zu-byte string", nbits, len);
 =20
  	/* If it's too early after boot, just use entropy_enter_early.  */
 -	if (__predict_false(E->stage < ENTROPY_HOT)) {
 +	if (__predict_false(E->stage =3D=3D ENTROPY_COLD)) {
  		entropy_enter_early(buf, len, nbits);
  		return true;
  	}
 @@ -1278,14 +1271,13 @@ entropy_extract(void *buf, size_t len, int flags)
  		    "can't wait for entropy until warm");
  	}
 =20
 +	/* Refuse to operate in interrupt context.  */
 +	KASSERT(!cpu_intr_p());
 +
  	/* Acquire the global lock to get at the global pool.  */
  	if (E->stage >=3D ENTROPY_WARM)
  		mutex_enter(&E->lock);
 =20
 -	/* Count up request for entropy in interrupt context.  */
 -	if (cpu_intr_p())
 -		entropy_extract_intr_evcnt.ev_count++;
 -
  	/* Wait until there is enough entropy in the system.  */
  	error =3D 0;
  	while (E->needed) {
 @@ -1875,9 +1867,7 @@ rnd_add_data_1(struct krndsource *rs, const void *buf=
 , uint32_t len,
  	 * contributed from this source.
  	 */
  	if (fullyused) {
 -		if (E->stage < ENTROPY_HOT) {
 -			if (E->stage >=3D ENTROPY_WARM)
 -				mutex_enter(&E->lock);
 +		if (__predict_false(E->stage =3D=3D ENTROPY_COLD)) {
  			rs->total =3D add_sat(rs->total, entropybits);
  			switch (flag) {
  			case RND_FLAG_COLLECT_TIME:
 @@ -1889,8 +1879,6 @@ rnd_add_data_1(struct krndsource *rs, const void *buf=
 , uint32_t len,
  				    add_sat(rs->value_delta.insamples, 1);
  				break;
  			}
 -			if (E->stage >=3D ENTROPY_WARM)
 -				mutex_exit(&E->lock);
  		} else {
  			struct rndsource_cpu *rc =3D percpu_getref(rs->state);
 =20
 diff --git a/sys/kern/subr_cprng.c b/sys/kern/subr_cprng.c
 index 97973ae8eb73..b513f3844d37 100644
 --- a/sys/kern/subr_cprng.c
 +++ b/sys/kern/subr_cprng.c
 @@ -44,7 +44,7 @@
   *	This code serves the first two categories without having extra
   *	logic for /dev/random.
   *
 - *	kern_cprng - available at IPL_VM or lower
 + *	kern_cprng - available at IPL_SOFTSERIAL or lower
   *	user_cprng - available only at IPL_NONE in thread context
   *
   *	The name kern_cprng is for hysterical raisins.  The name
 @@ -88,7 +88,6 @@ struct cprng_cpu {
  	struct nist_hash_drbg	*cc_drbg;
  	struct {
  		struct evcnt	reseed;
 -		struct evcnt	intr;
  	}			*cc_evcnt;
  	unsigned		cc_epoch;
  };
 @@ -99,7 +98,7 @@ static void	cprng_init_cpu(void *, void *, struct cpu_inf=
 o *);
  static void	cprng_fini_cpu(void *, void *, struct cpu_info *);
 =20
  /* Well-known CPRNG instances */
 -struct cprng_strong *kern_cprng __read_mostly; /* IPL_VM */
 +struct cprng_strong *kern_cprng __read_mostly; /* IPL_SOFTSERIAL */
  struct cprng_strong *user_cprng __read_mostly; /* IPL_NONE */
 =20
  static struct sysctllog *cprng_sysctllog __read_mostly;
 @@ -112,12 +111,12 @@ cprng_init(void)
  		panic("NIST Hash_DRBG failed self-test");
 =20
  	/*
 -	 * Create CPRNG instances at two IPLs: IPL_VM for kernel use
 -	 * that may occur inside IPL_VM interrupt handlers (!!??!?!?),
 +	 * Create CPRNG instances at two IPLs: IPL_SOFTSERIAL for
 +	 * kernel use that may occur inside soft interrupt handlers,
  	 * and IPL_NONE for userland use which need not block
  	 * interrupts.
  	 */
 -	kern_cprng =3D cprng_strong_create("kern", IPL_VM, 0);
 +	kern_cprng =3D cprng_strong_create("kern", IPL_SOFTSERIAL, 0);
  	user_cprng =3D cprng_strong_create("user", IPL_NONE, 0);
 =20
  	/* Create kern.urandom and kern.arandom sysctl nodes.  */
 @@ -246,8 +245,6 @@ cprng_init_cpu(void *ptr, void *cookie, struct cpu_info=
  *ci)
  	/* Attach the event counters.  */
  	/* XXX ci_cpuname may not be initialized early enough.  */
  	cpuname =3D ci->ci_cpuname[0] =3D=3D '\0' ? "cpu0" : ci->ci_cpuname;
 -	evcnt_attach_dynamic(&cc->cc_evcnt->intr, EVCNT_TYPE_MISC, NULL,
 -	    cpuname, "cprng_strong intr");
  	evcnt_attach_dynamic(&cc->cc_evcnt->reseed, EVCNT_TYPE_MISC, NULL,
  	    cpuname, "cprng_strong reseed");
 =20
 @@ -261,7 +258,6 @@ cprng_fini_cpu(void *ptr, void *cookie, struct cpu_info=
  *ci)
  	struct cprng_cpu *cc =3D ptr;
 =20
  	evcnt_detach(&cc->cc_evcnt->reseed);
 -	evcnt_detach(&cc->cc_evcnt->intr);
  	if (__predict_false(nist_hash_drbg_destroy(cc->cc_drbg)))
  		panic("nist_hash_drbg_destroy");
 =20
 @@ -277,6 +273,9 @@ cprng_strong(struct cprng_strong *cprng, void *buf, siz=
 e_t len, int flags)
  	unsigned epoch;
  	int s;
 =20
 +	/* Not allowed in hard interrupt context.  */
 +	KASSERT(!cpu_intr_p());
 +
  	/*
  	 * Verify maximum request length.  Caller should really limit
  	 * their requests to 32 bytes to avoid spending much time with
 @@ -292,9 +291,6 @@ cprng_strong(struct cprng_strong *cprng, void *buf, siz=
 e_t len, int flags)
  	cc =3D percpu_getref(cprng->cs_percpu);
  	s =3D splraiseipl(cprng->cs_iplcookie);
 =20
 -	if (cpu_intr_p())
 -		cc->cc_evcnt->intr.ev_count++;
 -
  	/* If the entropy epoch has changed, (re)seed.  */
  	epoch =3D entropy_epoch();
  	if (__predict_false(epoch !=3D cc->cc_epoch)) {

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56730 CVS commit: src/sys/kern
Date: Fri, 18 Mar 2022 23:34:56 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Fri Mar 18 23:34:56 UTC 2022

 Modified Files:
 	src/sys/kern: kern_entropy.c

 Log Message:
 entropy(9): Use the early-entropy path only while cold.

 This way, we never take the global entropy lock from interrupt
 handlers (no interrupts while cold), so the global entropy lock need
 not block interrupts.

 There's an annoying ordering issue here: softint_establish doesn't
 work until after CPUs have been detected, which happens inside
 configure(), which is also what enables interrupts.  So we have no
 opportunity to softint_establish the entropy softint _before_
 interrupts are enabled.

 To work around this, we have to put a conditional into the interrupt
 path, and go out of our way to process any queued samples after
 establishing the softint.  If we just made softint_establish work
 early, like percpu_create does now, this problem would go away and we
 could delete a bit of logic here.

 Candidate fix for PR kern/56730.


 To generate a diff of this commit:
 cvs rdiff -u -r1.36 -r1.37 src/sys/kern/kern_entropy.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.