NetBSD Problem Report #57230

From www@netbsd.org  Tue Feb 14 09:52:03 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id BA2D41A9239
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 14 Feb 2023 09:52:03 +0000 (UTC)
Message-Id: <20230214095201.588731A923A@mollari.NetBSD.org>
Date: Tue, 14 Feb 2023 09:52:01 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: set DIT/DOITM bit on arm/x86
X-Send-Pr-Version: www-1.0

>Number:         57230
>Category:       kern
>Synopsis:       set DIT/DOITM bit on arm/x86
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Feb 14 09:55:00 +0000 2023
>Last-Modified:  Tue Feb 14 16:10:02 +0000 2023
>Originator:     Taylor R Campbell
>Release:        current
>Organization:
The DoitBSD Foundation
>Environment:
gotta roast it a wee tiny bit more for security
>Description:
Cryptographic secrets can leak through side channels based on timing when cryptographic operations on them take variable time that depends on the secrets.

Some CPU instructions, such as addition and bitwise XOR, traditionally run in constant time independent of their operands -- there's no temptation to make bitwise XOR take a different number of cycles depending on the inputs.  Others, such as division, conditional branches, or loads and stores, typically run in variable time for various reasons (division algorithms, branch prediction, cache hit or miss depending on load/store address).  Modern cryptography software is often limited to instructions that traditionally run in constant time.
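To make the distinction concrete, here is a minimal sketch in C contrasting a comparison that branches on secret data with one built only from XOR, OR, subtraction, and shifts.  The names leaky_equal and ct_equal are made up for illustration and do not come from the NetBSD tree (NetBSD itself provides consttime_memequal(3) for real use).  A compiler is still free to transform the second function, so treat this as an illustration of the idea rather than a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Variable time: returns at the first mismatching byte, so the running
 * time reveals how long the matching prefix is.
 */
int
leaky_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (a[i] != b[i])
			return 0;
	}
	return 1;
}

/*
 * Traditionally constant time for a given n: every byte is always
 * visited, and the only operations applied to the data are XOR, OR,
 * subtraction, and shift.  The loop bound n is assumed to be public.
 */
int
ct_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
	uint32_t diff = 0;

	for (size_t i = 0; i < n; i++)
		diff |= (uint32_t)(a[i] ^ b[i]);

	/*
	 * diff is in [0, 255].  If diff == 0, diff - 1 wraps around to
	 * 0xffffffff and bit 8 is set; otherwise diff - 1 is in [0, 254]
	 * and bit 8 is clear.
	 */
	return (int)(1 & ((diff - 1) >> 8));
}
```

Both functions return 1 for equal buffers and 0 otherwise; only the second avoids a conditional branch on the contents.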

However, this behaviour is merely _traditional_ based on the obvious implementation techniques in the logic gates.  It has, until recently, never been _guaranteed_.  Arm and Intel recently added some architectural state bits to enable a guarantee:

- ARMv8.4-DIT (mandatory in Armv8.4) adds PSTATE.DIT (aarch64) and CPSR.DIT (aarch32) bits, for Data Independent Timing.  When this bit is set, certain instructions are guaranteed to run in time independent of the values of any register operands, and loads and stores are guaranteed to run in time independent of the values being loaded or stored (but not independent of the address).  Details: https://developer.arm.com/documentation/ddi0601/2020-12/AArch64-Registers/DIT--Data-Independent-Timing

- Newer Intel CPUs have an MSR with a DOITM bit, for Data Operand Independent Timing Mode, similar to the Arm DIT bit.  Details: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/data-operand-independent-timing-isa-guidance.html

- Newer Intel CPUs also appear to have a bug where the DOITM bit isn't quite enough for some instructions that were previously advertised to have data-operand-independent timing, such as PMULDQ -- when the floating-point exception status bits are unset in the MXCSR, these instructions sometimes have data-dependent timing.  Setting all the floating-point exception status bits in the MXCSR in advance avoids this leak.  Details: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/resources/mcdt-data-operand-independent-timing-instructions.html

(Unfortunately, the page on instructions with MXCSR Configuration Dependent Timing (MCDT) is resistant to archiving in the Internet Archive for some reason.  Currently the list is: PMADDUBSW PMADDWD PMULDQ PMULHRSW PMULHUW PMULHW PMULLD PMULLW PMULUDQ VPLZCNTD VPLZCNTQ VPMADD52HUQ VPMADD52LUQ VPMADDUBSW VPMADDWD VPMULDQ VPMULHRSW VPMULHUW VPMULHW VPMULLD VPMULLQ VPMULLW VPMULUDQ)
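As a rough sketch of which bits are involved, the helpers below compute the register values with the relevant bits set.  The function and macro names are hypothetical.  The bit positions are my reading of the vendor documents linked above (MSR address 0x1b01 with DOITM at bit 0, MXCSR exception status flags in bits 0-5, DIT at bit 24 of the AArch64 DIT register) and should be checked against them.  Actually reading and writing the registers (rdmsr/wrmsr, stmxcsr/ldmxcsr, mrs/msr dit) is left out so that the sketch stays portable.

```c
#include <stdint.h>

/* Assumed values, to be verified against the Intel and Arm documents. */
#define MSR_IA32_UARCH_MISC_CTL	0x1b01			/* MSR address */
#define UARCH_MISC_CTL_DOITM	(UINT64_C(1) << 0)	/* DOITM enable bit */
#define MXCSR_EXC_FLAGS		UINT32_C(0x3f)		/* IE DE ZE OE UE PE */
#define AARCH64_DIT_BIT		(UINT64_C(1) << 24)	/* DIT register view */

/* New IA32_UARCH_MISC_CTL value with DOITM set, other bits preserved. */
uint64_t
doitm_enable(uint64_t uarch_misc_ctl)
{
	return uarch_misc_ctl | UARCH_MISC_CTL_DOITM;
}

/* New MXCSR value with all six exception status flags set. */
uint32_t
mxcsr_avoid_mcdt(uint32_t mxcsr)
{
	return mxcsr | MXCSR_EXC_FLAGS;
}

/* New AArch64 DIT register value with the DIT bit set. */
uint64_t
dit_enable(uint64_t dit)
{
	return dit | AARCH64_DIT_BIT;
}
```

For example, applying mxcsr_avoid_mcdt to the usual reset value 0x1f80 (all exceptions masked, no flags set) yields 0x1fbf.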

Some options:

1. Set DIT on Arm and DOITM/MXCSR on Intel in the kernel unconditionally, 100% of the time.
2. Set DIT on Arm and DOITM on Intel in the kernel unconditionally, 100% of the time.  Set the MXCSR exception status bits in fpu_kern_enter.
3. Set DIT on Arm and DOITM/MXCSR on Intel in the kernel in fpu_kern_enter, and restore it on fpu_kern_leave.
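To illustrate what option (3) would involve compared with option (1), here is a small simulation in C of the save/set/restore bracket.  Everything in it is hypothetical: struct sim_cpu stands in for the real registers, and sim_dit_enter/sim_dit_leave are made-up names modelling what fpu_kern_enter/fpu_kern_leave would additionally have to do on x86; the bit values are assumptions taken from the Intel guidance.  It is not NetBSD code and touches no hardware.

```c
#include <stdint.h>

#define SIM_DOITM	(UINT64_C(1) << 0)	/* assumed DOITM bit */
#define SIM_MXCSR_EXC	UINT32_C(0x3f)		/* assumed exception flags */

/* Stand-in for the CPU state that option (3) would toggle. */
struct sim_cpu {
	uint64_t uarch_misc_ctl;	/* models IA32_UARCH_MISC_CTL */
	uint32_t mxcsr;			/* models MXCSR */
};

/* Values to put back when leaving the bracketed region. */
struct sim_saved {
	uint64_t uarch_misc_ctl;
	uint32_t mxcsr;
};

/* Option (3), entry: remember the old state, then turn everything on. */
void
sim_dit_enter(struct sim_cpu *cpu, struct sim_saved *save)
{
	save->uarch_misc_ctl = cpu->uarch_misc_ctl;
	save->mxcsr = cpu->mxcsr;
	cpu->uarch_misc_ctl |= SIM_DOITM;
	cpu->mxcsr |= SIM_MXCSR_EXC;
}

/* Option (3), exit: restore whatever was there before. */
void
sim_dit_leave(struct sim_cpu *cpu, const struct sim_saved *save)
{
	cpu->uarch_misc_ctl = save->uarch_misc_ctl;
	cpu->mxcsr = save->mxcsr;
}
```

Under option (1) there is no such bracket: the bits are set once and never restored, so code outside fpu_kern_enter/leave is covered too.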

Exactly what performance impact to expect is unclear -- maybe Arm and Intel will do something bonkers and make XOR take longer with the DIT/DOITM bit set, but that seems unlikely because you'd have to go out of your way to design an XOR instruction that takes variable time anyway.

More likely, I think, some of the fancier vectorized operations already have a long latency that is tempting to make slightly variable -- e.g., maybe taking one cycle longer to set a condition code at the end -- and these might be altered to always take the maximum latency.  Of course, many instructions which currently run in variable time anyway, such as division, are unaffected by the DIT/DOITM bit and must still be avoided when handling secrets.

Only some cryptographic code in the kernel is bracketed by fpu_kern_enter/leave -- just the code with MD vectorized implementations.  None of the portable C implementations of cryptographic primitives use this; they run in the normal mode of the kernel.

Further, it would be bad if, for example, copyin and copyout had data-dependent timing when transferring secrets through a pipe, or if reading or writing data in swap took time that depends on the bits of the data.

So I think we should do option (1).

Some further discussion:
https://seclists.org/oss-sec/2023/q1/52
https://lkml.org/lkml/2023/1/24/1393

Note: There are also timing side channels based on dynamic voltage and frequency scaling (see, e.g., https://www.hertzbleed.com/hertzbleed.pdf and https://arxiv.org/pdf/2206.13660v1.pdf).  This is not about that -- this is only about the new architectural DIT/DOITM bits.
>How-To-Repeat:
ask CPU designers to codify microarchitectural guarantees like https://xkcd.com/1172/
>Fix:
Yes please!

>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/57230: set DIT/DOITM bit on arm/x86
Date: Tue, 14 Feb 2023 16:09:31 +0000

 One difference between Arm DIT and Intel DOITM:

 - The Arm PSTATE.DIT bit can be read and written at any exception
   level.  Userland programs could set PSTATE.DIT=0 if they are willing
   to run on the edge to take advantage of what are most likely
   minuscule microarchitectural microoptimizations, while the kernel
   could have PSTATE.DIT=1.

 - The Intel DOITM bit lives in an MSR, IA32_UARCH_MISC_CTL, which I
   suspect can only be read or written while privileged.  We could
   create a syscall/ioctl path by which userland programs could request
   to have the DOITM bit cleared when executing in userland, and have
   the kernel run with the DOITM bit set.

   (Note: The MXCSR bits, which affect the MCDT instructions, can be
   read and written unprivileged.)

Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.