NetBSD Problem Report #55790

From www@netbsd.org  Fri Nov  6 22:58:44 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 5717C1A9246
	for <gnats-bugs@gnats.NetBSD.org>; Fri,  6 Nov 2020 22:58:44 +0000 (UTC)
Message-Id: <20201106225843.043621A925D@mollari.NetBSD.org>
Date: Fri,  6 Nov 2020 22:58:42 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: Floating point exception crashes DIAGNOSTIC kernel
X-Send-Pr-Version: www-1.0

>Number:         55790
>Category:       port-arm
>Synopsis:       Floating point exception crashes DIAGNOSTIC kernel
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-arm-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Nov 06 23:00:00 +0000 2020
>Closed-Date:    Sun Sep 05 06:15:35 +0000 2021
>Last-Modified:  Sun Sep 05 06:15:35 +0000 2021
>Originator:     Rin Okuyama
>Release:        9.99.75
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD rpi0w 9.99.75 NetBSD 9.99.75 (RPI) #68: Sat Nov  7 07:24:01 JST 2020  rin@latipes:/sys/arch/evbarm/compile/RPI evbarm earmv6hf
>Description:
For CPU which raises floating point exceptions, e.g., ARM1176 in RPI[01],
a FPE crashes DIAGNOSTIC kernels:

----
$ cat fpe.c
#include <fenv.h>
#include <stdlib.h>

int
main(void)
{
        volatile double x, y;

        feenableexcept(FE_ALL_EXCEPT);

        x = atoi("1");
        y = atoi("0");
        return (int)(x / y);
}
$ cc fpe.c -lm && ./a.out
[ 16754.2789480] panic: kernel diagnostic assertion "(armreg_fpexc_read() & VFP_FPEXC_EN) == 0" failed: file "../../../../arch/arm/vfp/vfp_init.c", line 547
[ 16754.2789480] cpu0: Begin traceback...
[ 16754.2789480] 0xc9c11d94: netbsd:db_panic+0xc
[ 16754.2789480] 0xc9c11dac: netbsd:vpanic+0xc4
[ 16754.2789480] 0xc9c11dc4: netbsd:kern_assert+0x3c
[ 16754.2789480] 0xc9c11dfc: netbsd:vfp_state_load+0x144
[ 16754.2789480] 0xc9c11e5c: netbsd:pcu_load+0x1e0
[ 16754.2789480] 0xc9c11ef4: netbsd:vfp_handler+0x64
[ 16754.2789480] 0xc9c11fac: netbsd:undefinedinstruction+0x124
[ 16754.2789480] cpu0: End traceback...
Stopped in pid 309.309 (a.out) at       netbsd:cpu_Debugger+0x4:        bx r14
----

In vfp_handler(),

	https://nxr.netbsd.org/xref/src/sys/arch/arm/vfp/vfp_init.c#421

421 /* The real handler for VFP bounces.  */
422 static int
423 vfp_handler(u_int address, u_int insn, trapframe_t *frame, int fault_code)
...
437 	/*
438 	 * If we already own the FPU and it's enabled (and no exception), raise
439 	 * SIGILL.  If there is an exception, drop through to raise a SIGFPE.
440 	 */
441 	if (curcpu()->ci_pcu_curlwp[PCU_FPU] == curlwp
442 	    && (armreg_fpexc_read() & (VFP_FPEXC_EX|VFP_FPEXC_EN)) == VFP_FPEXC_EN)
443 		return 1;
444 
445 	/*
446 	 * Make sure we own the FP.
447 	 */
448 	pcu_load(&arm_vfp_ops);
449 
450 	uint32_t fpexc = armreg_fpexc_read();
451 	if (fpexc & VFP_FPEXC_EX) {
...		----> raise SIGFPE

as comment says, when curlwp owns FPU and it is enabled, if there's no
exception, it raises SIGILL. Otherwise, it falls through in order to raise
SIGFPE. However, since curlwp already owns enabled FPU, the KASSERT fires
in vfp_state_load() called from pcu_load(), as described above.

Therefore, we need to skip pcu_load() in this case:

	http://www.netbsd.org/~rin/vfp_init_20201107.patch

----
Index: sys/arch/arm/vfp/vfp_init.c
===================================================================
RCS file: /home/netbsd/src/sys/arch/arm/vfp/vfp_init.c,v
retrieving revision 1.72
diff -p -u -r1.72 vfp_init.c
--- sys/arch/arm/vfp/vfp_init.c	30 Oct 2020 18:54:37 -0000	1.72
+++ sys/arch/arm/vfp/vfp_init.c	6 Nov 2020 21:59:29 -0000
@@ -423,6 +423,7 @@ static int
 vfp_handler(u_int address, u_int insn, trapframe_t *frame, int fault_code)
 {
 	struct cpu_info * const ci = curcpu();
+	uint32_t fpexc;

 	/* This shouldn't ever happen.  */
 	if (fault_code != FAULT_USER &&
@@ -438,20 +439,30 @@ vfp_handler(u_int address, u_int insn, t
 	 * If we already own the FPU and it's enabled (and no exception), raise
 	 * SIGILL.  If there is an exception, drop through to raise a SIGFPE.
 	 */
-	if (curcpu()->ci_pcu_curlwp[PCU_FPU] == curlwp
-	    && (armreg_fpexc_read() & (VFP_FPEXC_EX|VFP_FPEXC_EN)) == VFP_FPEXC_EN)
-		return 1;
+	if (curlwp->l_pcu_cpu[PCU_FPU] == ci) {
+		KASSERT(ci->ci_pcu_curlwp[PCU_FPU] == curlwp);
+
+		fpexc = armreg_fpexc_read();
+		if (fpexc & VFP_FPEXC_EN) {
+			if ((fpexc & VFP_FPEXC_EX) == 0) {
+				return 1;	/* SIGILL */
+			} else {
+				goto fpe;	/* SIGFPE; skip pcu_load(9) */
+			}
+		}
+	}

 	/*
 	 * Make sure we own the FP.
 	 */
 	pcu_load(&arm_vfp_ops);

-	uint32_t fpexc = armreg_fpexc_read();
+	fpexc = armreg_fpexc_read();
 	if (fpexc & VFP_FPEXC_EX) {
 		ksiginfo_t ksi;
 		KASSERT(fpexc & VFP_FPEXC_EN);

+fpe:
 		curcpu()->ci_vfp_evs[2].ev_count++;

 		/*
----

Then, the process receives SIGFPE as expected. Note that the original
code checks ``curcpu()->ci_pcu_curlwp[PCU_FPU] == curlwp'' to determine
whether curlwp owns FPU. But, code in pcu_load(9) checks
``curlwp->l_pcu_cpu[PCU_FPU] == curcpu()'' for this purpose, and asserts
the former condition:

	https://nxr.netbsd.org/xref/src/sys/kern/subr_pcu.c#305

In this patch, I follow this manner for the clarity.

No new regression in tests/lib/libm is observed with this patch on RPI0.
>How-To-Repeat:
Described above; cause FPE on Raspberry Pi 1 or ZERO.
>Fix:
Described above; http://www.netbsd.org/~rin/vfp_init_20201107.patch

>Release-Note:

>Audit-Trail:
From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55790 CVS commit: src/sys/arch/arm/vfp
Date: Tue, 1 Jun 2021 00:13:19 +0000

 Module Name:	src
 Committed By:	rin
 Date:		Tue Jun  1 00:13:19 UTC 2021

 Modified Files:
 	src/sys/arch/arm/vfp: vfp_init.c

 Log Message:
 PR port-arm/55790

 Style fix for clarity, in preparation of main fix.

 Replace condition ``curcpu()->ci_pcu_curlwp[PCU_FPU] == curlwp'' with
 ``curlwp->l_pcu_cpu[PCU_FPU] == curcpu()''. And add KASSERT to check
 the two conditions are equivalent, as done for MI pcu code:

 https://nxr.netbsd.org/xref/src/sys/kern/subr_pcu.c#323

 No functional changes.


 To generate a diff of this commit:
 cvs rdiff -u -r1.72 -r1.73 src/sys/arch/arm/vfp/vfp_init.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55790 CVS commit: src/sys/arch/arm/vfp
Date: Tue, 1 Jun 2021 00:30:22 +0000

 Module Name:	src
 Committed By:	rin
 Date:		Tue Jun  1 00:30:22 UTC 2021

 Modified Files:
 	src/sys/arch/arm/vfp: vfp_init.c

 Log Message:
 PR port-arm/55790

 Fix KASSERT failure with floating-point exception in userland.

 Consider the case in which curlwp owns enabled FPU in vfp_handler().
 If FPE is raised, we must skip pcu_load(9) rather than just falling
 through. Otherwise, KASSERT fires in vfp_state_load(), since curlwp
 already owns enabled FPU.

 No regression for ATF is introduced.


 To generate a diff of this commit:
 cvs rdiff -u -r1.73 -r1.74 src/sys/arch/arm/vfp/vfp_init.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Sun, 05 Sep 2021 06:15:35 +0000
State-Changed-Why:
patch applied.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.