NetBSD Problem Report #55641
From www@netbsd.org Thu Sep 3 18:40:26 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 2FC5B1A9217
for <gnats-bugs@gnats.NetBSD.org>; Thu, 3 Sep 2020 18:40:26 +0000 (UTC)
Message-Id: <20200903184025.0F63B1A9239@mollari.NetBSD.org>
Date: Thu, 3 Sep 2020 18:40:25 +0000 (UTC)
From: davshao@gmail.com
Reply-To: davshao@gmail.com
To: gnats-bugs@NetBSD.org
Subject: Recent changes to random/entropy "pkgsrc devel brick" an Intel Ivy Bridge system, with workaround
X-Send-Pr-Version: www-1.0
>Number: 55641
>Category: kern
>Synopsis: Recent changes to random/entropy "pkgsrc devel brick" an Intel Ivy Bridge system, with workaround
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Sep 03 18:45:00 +0000 2020
>Last-Modified: Fri Jun 30 21:45:02 +0000 2023
>Originator: David Shao
>Release: NetBSD current after a couple of weeks ago
>Organization:
>Environment:
NetBSD xxxxxx.xxx 9.99.72 NetBSD 9.99.72 (GENERIC) #1: Thu Sep 3 08:36:53 PDT 2020 xxxxxx.xxx:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
>Description:
After updating NetBSD current a couple of weeks ago, building pkgsrc devel/glib2 would stop on an Intel Ivy Bridge machine (Asus P8H77-V motherboard, i3 CPU) at a line ending in something like
. output
That is what I mean by "pkgsrc devel bricking". Rebuilding the system, reinstalling the system, nothing will fix this.
Pressing Ctrl-C on this system, I discovered that the devel/glib2 build had stopped at a line calling a random number function. I then remembered seeing a line about entropy flash by during boot.
This problem does not appear at all on an Intel Eagle Lake system or on a Comet Lake system, so it will not affect every machine. But from what I am seeing on the mailing lists, it will appear for some number of contributors and will render their machines useless for further pkgsrc development unless worked around.
Why Ivy Bridge? I believe it introduced the RDRAND instruction. It would not surprise me if a first-generation implementation had some edge cases.
>How-To-Repeat:
>Fix:
Look up a posting:
HEADS UP: Entropy overhaul
Date: Fri, 1 May 2020 21:10:58 +0000
Apply the following two lines as root as a workaround:
dd if=/dev/urandom of=/dev/random bs=32 count=1
sysctl -w kern.entropy.consolidate=1
In the long run, it seems unlikely a fix will ever be applied since the problem occurs for a small number of older machines. But there must be some way to communicate to people upgrading to current, or later to a release, to apply the workaround to "pkgsrc devel unbrick" their systems.
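A minimal sketch of how the workaround above could be wrapped in a script. It assumes that kern.entropy.needed reports the number of bits of entropy the system is still waiting for, as entropy(7) describes it. The helper names needs_workaround and apply_workaround are hypothetical, and the script has not been tested on the affected machine:

```shell
#!/bin/sh
# Hypothetical helper (not part of NetBSD): succeed if the system is
# still waiting for entropy.
# $1: integer as printed by `sysctl -n kern.entropy.needed`
# (assumption: nonzero means reads from /dev/random may still block).
needs_workaround() {
	[ "$1" -gt 0 ]
}

# Hypothetical helper: run the two workaround commands from this report.
# Must be run as root on the affected NetBSD machine.
apply_workaround() {
	dd if=/dev/urandom of=/dev/random bs=32 count=1 &&
	sysctl -w kern.entropy.consolidate=1
}

# Example use on the affected machine (left commented out so that
# sourcing this file has no side effects):
#   needed=$(sysctl -n kern.entropy.needed)
#   if needs_workaround "$needed"; then apply_workaround; fi
```

Checking kern.entropy.needed first means the workaround is applied only on machines that are actually blocked, so the script should be harmless on systems that already have enough entropy.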
>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: davshao@gmail.com
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/55641: Recent changes to random/entropy "pkgsrc devel brick" an Intel Ivy Bridge system, with workaround
Date: Fri, 4 Sep 2020 02:26:00 +0000
> Date: Thu, 3 Sep 2020 18:40:25 +0000 (UTC)
> From: davshao@gmail.com
>
> After updating NetBSD current a couple of weeks ago, building pkgsrc
> devel/glib2 would stop on an Intel Ivy Bridge machine (Asus P8H77-V
> motherboard, i3 CPU) at a line ending in something like
>
> . output
>
> That is what I mean by "pkgsrc devel bricking". Rebuilding the
> system, reinstalling the system, nothing will fix this.
>
> Pressing Ctrl-C, on this system, I discovered building devel/glib2
> stopped in a line calling a random number function. Then I remember
> seeing on boot a line flashing by talking about entropy.
Can you please share the dmesg output, and share whatever kind of
trace led you to conclude `building devel/glib2 stopped in a line
calling a random number function'? Can you also share any relevant ps
or pstree output?
When did you last update before your most recent update a couple of
weeks ago?
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55641 CVS commit: src
Date: Fri, 30 Jun 2023 21:42:06 +0000
Module Name: src
Committed By: riastradh
Date: Fri Jun 30 21:42:06 UTC 2023
Modified Files:
src/share/man/man7: entropy.7
src/sys/kern: kern_clock.c kern_entropy.c
Log Message:
entropy(9): Reintroduce netbsd<=9 time-delta estimator for unblocking.
The system will (in a subsequent change) by default block for this
condition before almost all of userland is running (including
/etc/rc.d/sshd key generation). That way, a never-blocking
getentropy(3) API will never return any data without at least
best-effort entropy like netbsd<=9 did to applications except in
single-user mode (where you have to be careful about everything
anyway) or in the few processes that run before a seed can even be
loaded (where blocking indefinitely, e.g. when generating a stack
protector cookie in libc, could pose a severe availability problem
that can't be configured away, but where the security impact is low).
However, (in another subsequent change) we will continue to use
_only_ HWRNG driver estimates and seed estimates, and _not_
time-delta estimator, for _warning_ about security in motd, daily
security report, etc. And if HWRNG/seed provides enough entropy
before time-delta estimator does, that will unblock /dev/random too.
The result is:
- Machines with HWRNG or seed won't warn about entropy and will
  essentially never block -- even on first boot without a seed, it
  will take only as long as the fastest HWRNG to unblock.
- Machines with neither HWRNG nor seed:
  . will warn about entropy, giving feedback about security;
  and
  . will avoid returning anything more predictable than netbsd<=9;
  but
  . won't block (much) longer than netbsd<=9 would (and won't block
    again after blocking once, except with kern.entropy.depletion=1 for
    testing).
(The threshold for unblocking is now somewhat higher than before:
512 samples that pass the time-delta estimator, rather than 80 as
it used to be.)
And, of course, adding a seed (or HWRNG) will prevent both warnings
and blocking.
The mechanism is:
1. /dev/random will block until _either_
   (a) enough bits of entropy (256) from reliable sources have been
       added to the pool, _or_
   (b) enough samples have been added from any sources (512), passing
       the old time-delta entropy estimator, that the possible
       security benefit doesn't justify holding up availability any
       longer (`best effort'), except on systems with higher security
       requirements like securelevel=2 which can disable non-HWRNG,
       non-seed sources with rndctl_flags in rc.conf(5).
2. dmesg will report `entropy: ready' when 1(a) is satisfied, but if
   1(b) is satisfied first, it will report `entropy: best effort', so
   the concise log messages will reflect the timing and whether in
   any period of time any of the system might be relying on best
   effort entropy.
3. The sysctl knob kern.entropy.needed (and the ioctl RNDGETPOOLSTAT
   variable rndpoolstat_t::added) still reflects the number of bits
   of entropy from reliable sources, so we can still use this to
   suggest regenerating ssh keys.
   This matters on platforms that can only be reached, after flashing
   an installation image, by sshing in over a (private) network, like
   small network appliances or remote virtual machines without
   (interactive) serial consoles. If we blocked indefinitely at boot
   when generating ssh keys, such platforms would be unusable. This
   way, platforms are usable, but operators can still be advised at
   login time to regenerate keys as soon as they can actually load
   entropy onto the system, e.g. with rndctl(8) on a seed file copied
   from a local machine over the (private) network.
4. On machines without HWRNG, using a seed file still suppresses
   warnings for users who need more confident security. But it is no
   longer necessary for availability.
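The blocking rule in items 1 and 2 above can be modelled as a small shell function. This is only an illustration of the decision described in this log message, not the code in kern_entropy.c. The function name entropy_state and its two arguments are hypothetical, and the model looks at a single snapshot of the counters rather than tracking which condition was met first:

```shell
#!/bin/sh
# Illustrative model only; not the kernel implementation.
# $1: bits of entropy credited from reliable sources (HWRNG or seed)
# $2: number of samples that passed the time-delta estimator
entropy_state() {
	if [ "$1" -ge 256 ]; then
		# Condition 1(a): enough bits from reliable sources.
		echo "entropy: ready"
	elif [ "$2" -ge 512 ]; then
		# Condition 1(b): enough time-delta samples.
		echo "entropy: best effort"
	else
		# Neither threshold reached: /dev/random keeps blocking.
		echo "blocked"
	fi
}
```

For example, `entropy_state 0 512` prints `entropy: best effort`, while `entropy_state 256 0` prints `entropy: ready`.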
This is a compromise between availability and security:
- The security mechanism of blocking indefinitely on machines without
  HWRNG hurts availability too much, as painful experience over the
  multiple years since I made the mistake of introducing it has
  shown. (Sorry!)
- The other main alternative, not having a blocking path at all (as I
  pushed for, and as OpenBSD has done for a long time) could
  potentially reduce security vs netbsd<=9, and would run against the
  expectations set by many popular operating systems to the severe
  detriment of public perception of NetBSD security.
Even though we can't _confidently_ assess enough entropy from, e.g.,
sampling interrupt timings, this is the traditional behaviour that
most operating systems provide -- and the result here is a net
nondecrease in security over netbsd<=9, because all paths from the
entropy pool to userland now have at least as high a standard before
returning data as they did in netbsd<=9.
PR kern/55641
PR pkg/55847
PR kern/57185
https://mail-index.netbsd.org/current-users/2020/09/02/msg039470.html
https://mail-index.netbsd.org/current-users/2020/11/21/msg039931.html
https://mail-index.netbsd.org/current-users/2020/12/05/msg040019.html
XXX pullup-10
To generate a diff of this commit:
cvs rdiff -u -r1.8 -r1.9 src/share/man/man7/entropy.7
cvs rdiff -u -r1.148 -r1.149 src/sys/kern/kern_clock.c
cvs rdiff -u -r1.61 -r1.62 src/sys/kern/kern_entropy.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.