NetBSD Problem Report #55641

From www@netbsd.org  Thu Sep  3 18:40:26 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 2FC5B1A9217
	for <gnats-bugs@gnats.NetBSD.org>; Thu,  3 Sep 2020 18:40:26 +0000 (UTC)
Message-Id: <20200903184025.0F63B1A9239@mollari.NetBSD.org>
Date: Thu,  3 Sep 2020 18:40:25 +0000 (UTC)
From: davshao@gmail.com
Reply-To: davshao@gmail.com
To: gnats-bugs@NetBSD.org
Subject: Recent changes to random/entropy "pkgsrc devel brick" an Intel Ivy Bridge system, with workaround
X-Send-Pr-Version: www-1.0

>Number:         55641
>Category:       kern
>Synopsis:       Recent changes to random/entropy "pkgsrc devel brick" an Intel Ivy Bridge system, with workaround
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Sep 03 18:45:00 +0000 2020
>Last-Modified:  Fri Jun 30 21:45:02 +0000 2023
>Originator:     David Shao
>Release:        NetBSD current after a couple of weeks ago
>Organization:
>Environment:
NetBSD xxxxxx.xxx 9.99.72 NetBSD 9.99.72 (GENERIC) #1: Thu Sep  3 08:36:53 PDT 2020  xxxxxx.xxx:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
>Description:
After updating NetBSD current a couple of weeks ago, building pkgsrc devel/glib2 would stop on an Intel Ivy Bridge machine (Asus P8H77-V motherboard, i3 CPU) at a line ending in something like

. output

That is what I mean by "pkgsrc devel bricking".  Neither rebuilding nor reinstalling the system fixes this.

After pressing Ctrl-C on this system, I discovered that the devel/glib2 build had stopped at a line calling a random number function.  Then I remembered seeing a line about entropy flash by at boot.

This problem does not appear at all on an Intel Eagle Lake system and on a Comet Lake system.  Therefore it will not appear across a wide variety of machines.  But from what I am seeing in the mailing lists, it will appear for some number of contributors and will render their machines useless for further pkgsrc development unless worked around.

Why Ivy Bridge?  I believe it introduced the RDRAND instruction.  It would not surprise me if a first-generation implementation had some edge cases.

>How-To-Repeat:

>Fix:
Look up a posting:

HEADS UP: Entropy overhaul

Date: Fri, 1 May 2020 21:10:58 +0000

Apply the following two lines as root as a workaround:

dd if=/dev/urandom of=/dev/random bs=32 count=1
sysctl -w kern.entropy.consolidate=1
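
Before applying the workaround, it may help to check whether the kernel is still waiting for entropy.  The following is a minimal sketch, not a tested procedure: it assumes the sysctl node kern.entropy.needed is present and reports the number of bits still wanted from reliable sources, and needs_seed is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: decide whether the workaround is worth applying.
# needs_seed is a hypothetical helper; its argument is assumed to be
# the value of kern.entropy.needed (bits of entropy still wanted).
needs_seed() {
    [ "${1:-0}" -gt 0 ]
}

# Query the kernel only where the sysctl node exists.
if needed=$(sysctl -n kern.entropy.needed 2>/dev/null); then
    if needs_seed "$needed"; then
        echo "entropy needed: $needed bits; the workaround may help"
    else
        echo "no entropy needed; the workaround should be unnecessary"
    fi
else
    echo "kern.entropy.needed is not available on this system"
fi
```

If the first message appears, running the two workaround commands as root and then re-running the check should show whether the count dropped to zero.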

In the long run, it seems unlikely a fix will ever be applied since the problem occurs for a small number of older machines.  But there must be some way to communicate to people upgrading to current, or later to a release, to apply the workaround to "pkgsrc devel unbrick" their systems.

>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: davshao@gmail.com
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/55641: Recent changes to random/entropy "pkgsrc devel brick" an Intel Ivy Bridge system, with workaround
Date: Fri, 4 Sep 2020 02:26:00 +0000

 > Date: Thu,  3 Sep 2020 18:40:25 +0000 (UTC)
 > From: davshao@gmail.com
 > 
 > After updating NetBSD current a couple of weeks ago, building pkgsrc
 > devel/glib2 would stop on an Intel Ivy Bridge machine (Asus P8H77-V
 > motherboard, i3 CPU) at a line ending in something like
 > 
 > . output
 > 
 > That is what I mean by "pkgsrc devel bricking".  Rebuilding the
 > system, reinstalling the system, nothing will fix this.
 > 
 > Pressing Ctrl-C, on this system, I discovered building devel/glib2
 > stopped in a line calling a random number function.  Then I remember
 > seeing on boot a line flashing by talking about entropy.

 Can you please share the dmesg output, and share whatever kind of
 trace led you to conclude `building devel/glib2 stopped in a line
 calling a random number function'?  Can you also share any relevant ps
 or pstree output?

 When did you last update before your most recent update a couple of
 weeks ago?

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55641 CVS commit: src
Date: Fri, 30 Jun 2023 21:42:06 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Fri Jun 30 21:42:06 UTC 2023

 Modified Files:
 	src/share/man/man7: entropy.7
 	src/sys/kern: kern_clock.c kern_entropy.c

 Log Message:
 entropy(9): Reintroduce netbsd<=9 time-delta estimator for unblocking.

 The system will (in a subsequent change) by default block for this
 condition before almost all of userland is running (including
 /etc/rc.d/sshd key generation).  That way, a never-blocking
 getentropy(3) API will never return any data without at least
 best-effort entropy like netbsd<=9 did to applications except in
 single-user mode (where you have to be careful about everything
 anyway) or in the few processes that run before a seed can even be
 loaded (where blocking indefinitely, e.g. when generating a stack
 protector cookie in libc, could pose a severe availability problem
 that can't be configured away, but where the security impact is low).

 However, (in another subsequent change) we will continue to use
 _only_ HWRNG driver estimates and seed estimates, and _not_
 time-delta estimator, for _warning_ about security in motd, daily
 security report, etc.  And if HWRNG/seed provides enough entropy
 before time-delta estimator does, that will unblock /dev/random too.

 The result is:

 - Machines with HWRNG or seed won't warn about entropy and will
   essentially never block -- even on first boot without a seed, it
   will take only as long as the fastest HWRNG to unblock.

 - Machines with neither HWRNG nor seed:
   . will warn about entropy, giving feedback about security;
     and
   . will avoid returning anything more predictable than netbsd<=9;
     but
   . won't block (much) longer than netbsd<=9 would (and won't block
     again after blocking once, except with kern.entropy.depletion=1 for
     testing).

   (The threshold for unblocking is now somewhat higher than before:
   512 samples that pass the time-delta estimator, rather than 80 as
   it used to be.)

   And, of course, adding a seed (or HWRNG) will prevent both warnings
   and blocking.

 The mechanism is:

 1. /dev/random will block until _either_

    (a) enough bits of entropy (256) from reliable sources have been
        added to the pool, _or_

    (b) enough samples have been added from any sources (512), passing
        the old time-delta entropy estimator, that the possible
        security benefit doesn't justify holding up availability any
        longer (`best effort'), except on systems with higher security
        requirements like securelevel=2 which can disable non-HWRNG,
        non-seed sources with rndctl_flags in rc.conf(5).

 2. dmesg will report `entropy: ready' when 1(a) is satisfied, but if
    1(b) is satisfied first, it will report `entropy: best effort', so
    the concise log messages will reflect the timing and whether in
    any period of time any of the system might be relying on best
    effort entropy.

 3. The sysctl knob kern.entropy.needed (and the ioctl RNDGETPOOLSTAT
    variable rndpoolstat_t::added) still reflects the number of bits
    of entropy from reliable sources, so we can still use this to
    suggest regenerating ssh keys.

    This matters on platforms that can only be reached, after flashing
    an installation image, by sshing in over a (private) network, like
    small network appliances or remote virtual machines without
    (interactive) serial consoles.  If we blocked indefinitely at boot
    when generating ssh keys, such platforms would be unusable.  This
    way, platforms are usable, but operators can still be advised at
    login time to regenerate keys as soon as they can actually load
    entropy onto the system, e.g. with rndctl(8) on a seed file copied
    from a local machine over the (private) network.

 4. On machines without HWRNG, using a seed file still suppresses
    warnings for users who need more confident security.  But it is no
    longer necessary for availability.

 This is a compromise between availability and security:

 - The security mechanism of blocking indefinitely on machines without
   HWRNG hurts availability too much, as painful experience over the
   multiple years since I made the mistake of introducing it has
   shown.  (Sorry!)

 - The other main alternative, not having a blocking path at all (as I
   pushed for, and as OpenBSD has done for a long time) could
   potentially reduce security vs netbsd<=9, and would run against the
   expectations set by many popular operating systems to the severe
   detriment of public perception of NetBSD security.

 Even though we can't _confidently_ assess enough entropy from, e.g.,
 sampling interrupt timings, this is the traditional behaviour that
 most operating systems provide -- and the result here is a net
 nondecrease in security over netbsd<=9, because all paths from the
 entropy pool to userland now have at least as high a standard before
 returning data as they did in netbsd<=9.

 PR kern/55641
 PR pkg/55847
 PR kern/57185
 https://mail-index.netbsd.org/current-users/2020/09/02/msg039470.html
 https://mail-index.netbsd.org/current-users/2020/11/21/msg039931.html
 https://mail-index.netbsd.org/current-users/2020/12/05/msg040019.html

 XXX pullup-10


 To generate a diff of this commit:
 cvs rdiff -u -r1.8 -r1.9 src/share/man/man7/entropy.7
 cvs rdiff -u -r1.148 -r1.149 src/sys/kern/kern_clock.c
 cvs rdiff -u -r1.61 -r1.62 src/sys/kern/kern_entropy.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
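
The unblocking rule described under `The mechanism is:' in the log message above can be sketched as a small shell function.  This is an illustration under stated assumptions, not the kernel code: entropy_state is a hypothetical name, the two arguments stand in for the counters the kernel keeps internally, and `blocked' is a label invented here rather than a dmesg message.  The sketch evaluates a single snapshot, so it ignores which condition was met first over time and the possibility of disabling non-HWRNG, non-seed sources with rndctl_flags.

```shell
#!/bin/sh
# Sketch of the unblocking rule from the log message; not kernel code.
# entropy_state is a hypothetical helper.
#   $1: bits of entropy credited from reliable sources (HWRNG or seed)
#   $2: samples that passed the time-delta estimator
entropy_state() {
    bits=$1
    samples=$2
    if [ "$bits" -ge 256 ]; then
        echo "ready"            # condition 1(a): 256 bits from reliable sources
    elif [ "$samples" -ge 512 ]; then
        echo "best effort"      # condition 1(b): 512 time-delta samples
    else
        echo "blocked"          # neither threshold reached yet
    fi
}

entropy_state 256 0     # prints "ready"
entropy_state 0 512     # prints "best effort"
entropy_state 100 80    # prints "blocked"
```

The last call uses 80 samples, the old threshold mentioned in the log message, to show that it no longer suffices on its own.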
