NetBSD Problem Report #55790

From www@netbsd.org  Fri Nov  6 22:58:44 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 5717C1A9246
	for <gnats-bugs@gnats.NetBSD.org>; Fri,  6 Nov 2020 22:58:44 +0000 (UTC)
Message-Id: <20201106225843.043621A925D@mollari.NetBSD.org>
Date: Fri,  6 Nov 2020 22:58:42 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: Floating point exception crashes DIAGNOSTIC kernel
X-Send-Pr-Version: www-1.0

>Number:         55790
>Category:       port-arm
>Synopsis:       Floating point exception crashes DIAGNOSTIC kernel
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-arm-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Nov 06 23:00:00 +0000 2020
>Originator:     Rin Okuyama
>Release:        9.99.75
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD rpi0w 9.99.75 NetBSD 9.99.75 (RPI) #68: Sat Nov  7 07:24:01 JST 2020  rin@latipes:/sys/arch/evbarm/compile/RPI evbarm earmv6hf
>Description:
On CPUs that raise floating point exceptions, e.g., the ARM1176 in RPI[01],
an FPE crashes DIAGNOSTIC kernels:

----
$ cat fpe.c
#include <fenv.h>
#include <stdlib.h>

int
main(void)
{
        volatile double x, y;

        feenableexcept(FE_ALL_EXCEPT);

        x = atoi("1");
        y = atoi("0");
        return (int)(x / y);
}
$ cc fpe.c -lm && ./a.out
[ 16754.2789480] panic: kernel diagnostic assertion "(armreg_fpexc_read() & VFP_FPEXC_EN) == 0" failed: file "../../../../arch/arm/vfp/vfp_init.c", line 547
[ 16754.2789480] cpu0: Begin traceback...
[ 16754.2789480] 0xc9c11d94: netbsd:db_panic+0xc
[ 16754.2789480] 0xc9c11dac: netbsd:vpanic+0xc4
[ 16754.2789480] 0xc9c11dc4: netbsd:kern_assert+0x3c
[ 16754.2789480] 0xc9c11dfc: netbsd:vfp_state_load+0x144
[ 16754.2789480] 0xc9c11e5c: netbsd:pcu_load+0x1e0
[ 16754.2789480] 0xc9c11ef4: netbsd:vfp_handler+0x64
[ 16754.2789480] 0xc9c11fac: netbsd:undefinedinstruction+0x124
[ 16754.2789480] cpu0: End traceback...
Stopped in pid 309.309 (a.out) at       netbsd:cpu_Debugger+0x4:        bx r14
----

In vfp_handler(),

	https://nxr.netbsd.org/xref/src/sys/arch/arm/vfp/vfp_init.c#421

421 /* The real handler for VFP bounces.  */
422 static int
423 vfp_handler(u_int address, u_int insn, trapframe_t *frame, int fault_code)
...
437 	/*
438 	 * If we already own the FPU and it's enabled (and no exception), raise
439 	 * SIGILL.  If there is an exception, drop through to raise a SIGFPE.
440 	 */
441 	if (curcpu()->ci_pcu_curlwp[PCU_FPU] == curlwp
442 	    && (armreg_fpexc_read() & (VFP_FPEXC_EX|VFP_FPEXC_EN)) == VFP_FPEXC_EN)
443 		return 1;
444 
445 	/*
446 	 * Make sure we own the FP.
447 	 */
448 	pcu_load(&arm_vfp_ops);
449 
450 	uint32_t fpexc = armreg_fpexc_read();
451 	if (fpexc & VFP_FPEXC_EX) {
...		----> raise SIGFPE

As the comment says, when curlwp owns the FPU and it is enabled, the
handler raises SIGILL if there is no exception. Otherwise, it falls through
in order to raise SIGFPE. However, since curlwp already owns the enabled
FPU, the KASSERT in vfp_state_load(), called from pcu_load(), fires, as
shown in the panic above.
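
The failing path can be modelled in user space as follows. This is only a
sketch: the bit values of the two FPEXC flags are assumptions written in
for illustration (the kernel headers are authoritative), and the function
names are hypothetical. It tracks only FPU ownership and the FPEXC value,
not anything else that pcu_load() does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only; see the kernel's VFP headers. */
#define FPEXC_EX 0x80000000u	/* exception pending */
#define FPEXC_EN 0x40000000u	/* VFP enabled */

/* Model of the early-return test at lines 441-443 of the excerpt. */
static bool
returns_sigill(bool owns_fpu, uint32_t fpexc)
{
	return owns_fpu && (fpexc & (FPEXC_EX | FPEXC_EN)) == FPEXC_EN;
}

/* Model of the KASSERT in vfp_state_load(): the FPU must be disabled. */
static bool
load_assertion_holds(uint32_t fpexc)
{
	return (fpexc & FPEXC_EN) == 0;
}

/*
 * True when the original handler would fall through to pcu_load() with
 * the FPU still enabled, i.e. the case in which the assertion fires.
 */
static bool
original_panics(bool owns_fpu, uint32_t fpexc)
{
	if (returns_sigill(owns_fpu, fpexc))
		return false;
	return !load_assertion_holds(fpexc);
}
```

In this model, original_panics(true, FPEXC_EN | FPEXC_EX) is the case hit
by fpe.c: the LWP owns the FPU, the FPU is enabled, and an exception is
pending.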

Therefore, we need to skip pcu_load() in this case:

	http://www.netbsd.org/~rin/vfp_init_20201107.patch

----
Index: sys/arch/arm/vfp/vfp_init.c
===================================================================
RCS file: /home/netbsd/src/sys/arch/arm/vfp/vfp_init.c,v
retrieving revision 1.72
diff -p -u -r1.72 vfp_init.c
--- sys/arch/arm/vfp/vfp_init.c	30 Oct 2020 18:54:37 -0000	1.72
+++ sys/arch/arm/vfp/vfp_init.c	6 Nov 2020 21:59:29 -0000
@@ -423,6 +423,7 @@ static int
 vfp_handler(u_int address, u_int insn, trapframe_t *frame, int fault_code)
 {
 	struct cpu_info * const ci = curcpu();
+	uint32_t fpexc;

 	/* This shouldn't ever happen.  */
 	if (fault_code != FAULT_USER &&
@@ -438,20 +439,30 @@ vfp_handler(u_int address, u_int insn, t
 	 * If we already own the FPU and it's enabled (and no exception), raise
 	 * SIGILL.  If there is an exception, drop through to raise a SIGFPE.
 	 */
-	if (curcpu()->ci_pcu_curlwp[PCU_FPU] == curlwp
-	    && (armreg_fpexc_read() & (VFP_FPEXC_EX|VFP_FPEXC_EN)) == VFP_FPEXC_EN)
-		return 1;
+	if (curlwp->l_pcu_cpu[PCU_FPU] == ci) {
+		KASSERT(ci->ci_pcu_curlwp[PCU_FPU] == curlwp);
+
+		fpexc = armreg_fpexc_read();
+		if (fpexc & VFP_FPEXC_EN) {
+			if ((fpexc & VFP_FPEXC_EX) == 0) {
+				return 1;	/* SIGILL */
+			} else {
+				goto fpe;	/* SIGFPE; skip pcu_load(9) */
+			}
+		}
+	}

 	/*
 	 * Make sure we own the FP.
 	 */
 	pcu_load(&arm_vfp_ops);

-	uint32_t fpexc = armreg_fpexc_read();
+	fpexc = armreg_fpexc_read();
 	if (fpexc & VFP_FPEXC_EX) {
 		ksiginfo_t ksi;
 		KASSERT(fpexc & VFP_FPEXC_EN);

+fpe:
 		curcpu()->ci_vfp_evs[2].ev_count++;

 		/*
----

Then, the process receives SIGFPE as expected. Note that the original
code checks ``curcpu()->ci_pcu_curlwp[PCU_FPU] == curlwp'' to determine
whether curlwp owns the FPU. However, the code in pcu_load(9) checks
``curlwp->l_pcu_cpu[PCU_FPU] == curcpu()'' for this purpose, and asserts
the former condition:

	https://nxr.netbsd.org/xref/src/sys/kern/subr_pcu.c#305

For clarity, this patch follows the same convention.
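
The relation between the two ownership checks can be sketched with a small
user-space model. The structure and function names below (model_lwp,
model_cpu, model_owns_fpu, model_take_fpu) are hypothetical and only mimic
the two fields named above; this is not the subr_pcu.c implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_cpu;

struct model_lwp {
	struct model_cpu *l_pcu_cpu;	/* CPU holding this LWP's state, or NULL */
};

struct model_cpu {
	struct model_lwp *ci_pcu_curlwp;	/* LWP whose state is loaded, or NULL */
};

/* Test the LWP-side link, then assert that the CPU-side link agrees. */
static bool
model_owns_fpu(const struct model_lwp *l, const struct model_cpu *ci)
{
	if (l->l_pcu_cpu != ci)
		return false;
	assert(ci->ci_pcu_curlwp == l);
	return true;
}

/* Hypothetical helper: hand ci's FPU to l, keeping both links consistent. */
static void
model_take_fpu(struct model_lwp *l, struct model_cpu *ci)
{
	/* Drop l's state from whichever CPU held it before. */
	if (l->l_pcu_cpu != NULL)
		l->l_pcu_cpu->ci_pcu_curlwp = NULL;
	/* Evict the LWP that was loaded on ci, if any. */
	if (ci->ci_pcu_curlwp != NULL)
		ci->ci_pcu_curlwp->l_pcu_cpu = NULL;
	l->l_pcu_cpu = ci;
	ci->ci_pcu_curlwp = l;
}
```

As long as every update goes through a helper that sets both links
together, the two checks give the same answer, so testing one and
asserting the other documents the invariant without changing behavior.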

No new regressions are observed in tests/lib/libm with this patch on RPI0.
>How-To-Repeat:
Described above; trigger an FPE on a Raspberry Pi 1 or Zero.
>Fix:
Described above; http://www.netbsd.org/~rin/vfp_init_20201107.patch
