NetBSD Problem Report #56820

From www@netbsd.org  Sat May  7 12:47:32 2022
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 230841A923B
	for <gnats-bugs@gnats.NetBSD.org>; Sat,  7 May 2022 12:47:32 +0000 (UTC)
Message-Id: <20220507124700.AB8F51A923C@mollari.NetBSD.org>
Date: Sat,  7 May 2022 12:47:00 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: Many FPE related tests fail on softfloat machines
X-Send-Pr-Version: www-1.0

>Number:         56820
>Category:       misc
>Synopsis:       Many FPE related tests fail on softfloat machines
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    misc-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat May 07 12:50:01 +0000 2022
>Last-Modified:  Mon Feb 19 17:25:01 +0000 2024
>Originator:     Rin Okuyama
>Release:        9.99.96
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD usl5p 9.99.96 NetBSD 9.99.96 (USL-5P) #1: Fri May  6 14:47:54 JST 2022  rin@latipes:/build/src/sys/arch/landisk/compile/USL-5P landisk
>Description:
As observed for, e.g.,

armv5:	https://www.netbsd.org/~martin/evbarm-atf/
sh3:	https://www.netbsd.org/~martin/landisk-atf/

many FPE related tests, like
libc/sys/t_ptrace_*_signal{ignore,masked}_crash_fpe,
fail on softfloat machines.

These tests expect that SIGFPE can be neither ignored nor blocked, as
it is raised by the FPE handler in the kernel, like SIGBUS or SIGILL.

However, for softfloat environments, SIGFPE can be ignored/blocked
like other ``normal'' signals, as it is raised by libc/softfloat.
>How-To-Repeat:
On softfloat machines:

# cd /usr/tests/lib/libc/sys && atf-run t_ptrace_wait
>Fix:
Skip these tests on softfloat machines?

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Sat, 7 May 2022 14:56:58 +0200

 Shouldn't we try to make softfloat vs. hardware FPU mostly undetectable
 by userland?  Besides related "sysctl machdep" entries like machdep.fpu_present
 of course.

 Martin

From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, misc-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Sat, 7 May 2022 22:16:26 +0900

 On 2022/05/07 22:00, Martin Husemann wrote:
 >   Shouldn't we try to make softfloat vs. hardware FPU mostly undetectable
 >   by userland?  Besides related "sysctl machdep" entries like machdep.fpu_present
 >   of course.

 It would be nice, but doesn't it require specific system call?

 Thanks,
 rin

From: Martin Husemann <martin@duskware.de>
To: Rin Okuyama <rokuyama.rk@gmail.com>
Cc: gnats-bugs@netbsd.org
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Sat, 7 May 2022 15:19:28 +0200

 On Sat, May 07, 2022 at 10:16:26PM +0900, Rin Okuyama wrote:
 > It would be nice, but doesn't it require specific system call?

 Or special kernel code that filters the signal mask changes? I'm not
 sure, just thinking out loud.

 Martin

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Thu, 26 May 2022 02:58:32 +0000

 On Sat, May 07, 2022 at 01:20:05PM +0000, Martin Husemann wrote:
  > From: Martin Husemann <martin@duskware.de>
  > To: Rin Okuyama <rokuyama.rk@gmail.com>
  > Cc: gnats-bugs@netbsd.org
  > Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
  > Date: Sat, 7 May 2022 15:19:28 +0200
  > 
  >  On Sat, May 07, 2022 at 10:16:26PM +0900, Rin Okuyama wrote:
  >  > It would be nice, but doesn't it require specific system call?
  >  
  >  Or special kernel code that filters the signal mask changes? I'm not
  >  sure, just thinking out loud.

 I'm pretty sure the behavior if you block or ignore and then trigger
 SIGFPE is undefined, same as SIGSEGV.

 But if we want it to die, the best approach seems to be this: in the
 place where the library posts SIGFPE, if that returns without exiting,
 unblock the signal, set it to SIG_DFL, and post it again. And maybe
 if that returns too, whine on stderr and post SIGKILL.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Tue, 3 Oct 2023 20:11:20 +0200

 I like the unblocking/default action approach, but I'm not sure
 I understand the "if that returns w/o exiting" part.

 The code that posts the signal is src/lib/libc/softfloat/softfloat-specialize
 around line 95:

         memset(&info, 0, sizeof info);
         info.si_signo = SIGFPE;
         info.si_pid = getpid();
         info.si_uid = geteuid();
         if (flags & float_flag_underflow)
             info.si_code = FPE_FLTUND;
         else if (flags & float_flag_overflow)
             info.si_code = FPE_FLTOVF;
         else if (flags & float_flag_divbyzero)
             info.si_code = FPE_FLTDIV;
         else if (flags & float_flag_invalid)
             info.si_code = FPE_FLTINV;
         else if (flags & float_flag_inexact)
             info.si_code = FPE_FLTRES;
         sigqueueinfo(getpid(), &info);


 Wouldn't it be good enough to always call sigprocmask to unblock SIGFPE,
 then check the old sigaction state and, if it is SIG_IGN, set it to SIG_DFL?

 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Thu, 5 Oct 2023 13:50:09 +0200

 The patch below seems to work for me. Am I overlooking some corner
 cases?

 Martin


 Index: softfloat-specialize
 ===================================================================
 RCS file: /cvsroot/src/lib/libc/softfloat/softfloat-specialize,v
 retrieving revision 1.9
 diff -u -p -r1.9 softfloat-specialize
 --- softfloat-specialize	10 Aug 2014 05:57:31 -0000	1.9
 +++ softfloat-specialize	5 Oct 2023 11:46:10 -0000
 @@ -66,6 +66,8 @@ fp_except float_exception_mask = 0;
  void
  float_raise( fp_except flags )
  {
 +    struct sigaction sa;
 +    sigset_t set;
      siginfo_t info;
      fp_except mask = float_exception_mask;

 @@ -92,6 +94,26 @@ float_raise( fp_except flags )
  	    info.si_code = FPE_FLTINV;
  	else if (flags & float_flag_inexact)
  	    info.si_code = FPE_FLTRES;
 +
 +	/*
 +	 * Make sure SIGFPE is not blocked/ignored - that would be impossible
 +	 * with FP hardware.
 +	 */
 +	sigemptyset(&set);
 +	sigaddset(&set, SIGFPE);
 +	sigprocmask(SIG_UNBLOCK, &set, NULL);
 +	if (sigaction(SIGFPE, NULL, &sa) == 0) {
 +		if (sa.sa_handler == SIG_IGN) {
 +			memset(&sa, 0, sizeof(sa));
 +			sa.sa_handler = SIG_DFL;
 +			sigemptyset(&sa.sa_mask);
 +		} else {
 +			sa.sa_flags |= SA_RESETHAND;
 +		}
 +		sigaction(SIGFPE, &sa, NULL);
 +	}
 +
 +	/* deliver the signal */
  	sigqueueinfo(getpid(), &info);
      }
  }

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Sat, 7 Oct 2023 14:13:34 +0200

 There is still something missing - float_exception_flags needs to be cleared
 "more often", but I'm not quite sure where.

 For example with the patch applied, the following still fails:

 Test case: lib/libc/gen/t_fpsetmask/fpsetmask_unmasked_double
 Duration: 21.741232 seconds
 Termination reason

 FAILED: Test program received signal 8 (core dumped)
 Standard error stream

 Test program crashed; attempting to get stack trace
 [New process 9660]
 Core was generated by `t_fpsetmask'.
 Program terminated with signal SIGFPE, Arithmetic exception.
 #0  0x752ea0f4 in sigqueueinfo () from /usr/lib/libc.so.12
 #0  0x752ea0f4 in sigqueueinfo () from /usr/lib/libc.so.12
 #1  0x752e84c2 in _softfloat_float_raise (flags=<optimized out>) at /work/src/lib/libc/softfloat/softfloat-specialize:117
 #2  0x752e9bc8 in __divdf3 (a=<optimized out>, b=<optimized out>) at /work/src/lib/libc/softfloat/bits64/softfloat.c:2944
 #3  0x004011b0 in d_inv () at /work/src/tests/lib/libc/gen/t_fpsetmask.c:126
 #4  0x0040146e in fpsetmask_unmasked (test_ops=0x4127f8 <double_ops>) at /work/src/tests/lib/libc/gen/t_fpsetmask.c:281
 #5  0x7539947c in atf_tc_run (tc=0x412a98 <atfu_fpsetmask_unmasked_double_tc>, resfile=<optimized out>) at /work/src/external/bsd/atf/dist/atf-c/tc.c:1024
 #6  0x753964f4 in atf_tp_run (tp=<optimized out>, tcname=<optimized out>, resfile=0x753b5018 "/tmp/atf-run.Uu9ByZ/tcr") at /work/src/external/bsd/atf/dist/atf-c/tp.c:205
 #7  0x75395ff0 in run_tc (exitcode=<synthetic pointer>, p=0x7ff074f0, tp=0x7ff074e4) at /work/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:510
 #8  controlled_main (exitcode=<synthetic pointer>, add_tcs_hook=0x401968 <atfu_tp_add_tcs>, argv=<optimized out>, argc=<optimized out>) at /work/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:580
 #9  atf_tp_main (argc=<optimized out>, argv=<optimized out>, add_tcs_hook=0x401968 <atfu_tp_add_tcs>) at /work/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:610
 #10 0x00401d4a in main (argc=<optimized out>, argv=<optimized out>) at /work/src/tests/lib/libc/gen/t_fpsetmask.c:352
 Stack trace complete


 This is due to float_exception_flags being set to 1 for the first round
 of that test (d_dz -> double division by zero) and that bit is still present
 in the second iteration (d_inv -> double infinite value), where bit 4 gets
 added to float_exception_flags and masked in _softfloat_float_raise(),
 but due to 1 being left over after the mask operation it still triggers
 a SIGFPE.

 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Wed, 11 Oct 2023 17:47:45 +0200

 On Sat, Oct 07, 2023 at 02:13:34PM +0200, Martin Husemann wrote:
 > For example with the patch applied, the following still fails:
 > 
 > Test case: lib/libc/gen/t_fpsetmask/fpsetmask_unmasked_double
 > Duration: 21.741232 seconds
 > Termination reason
 > 
 > FAILED: Test program received signal 8 (core dumped)
 > Standard error stream
 > 
 > Test program crashed; attempting to get stack trace
 > [New process 9660]
 > Core was generated by `t_fpsetmask'.
 > Program terminated with signal SIGFPE, Arithmetic exception.
 > #0  0x752ea0f4 in sigqueueinfo () from /usr/lib/libc.so.12
 > #0  0x752ea0f4 in sigqueueinfo () from /usr/lib/libc.so.12
 > #1  0x752e84c2 in _softfloat_float_raise (flags=<optimized out>) at /work/src/lib/libc/softfloat/softfloat-specialize:117
 > #2  0x752e9bc8 in __divdf3 (a=<optimized out>, b=<optimized out>) at /work/src/lib/libc/softfloat/bits64/softfloat.c:2944
 > #3  0x004011b0 in d_inv () at /work/src/tests/lib/libc/gen/t_fpsetmask.c:126

 This is due to the SA_RESETHAND used in the patch:

         sigemptyset(&set);
         sigaddset(&set, SIGFPE);
         sigprocmask(SIG_UNBLOCK, &set, NULL);
         if (sigaction(SIGFPE, NULL, &sa) == 0) {
                 if (sa.sa_handler == SIG_IGN) {
                         memset(&sa, 0, sizeof(sa));
                         sa.sa_handler = SIG_DFL;
                         sigemptyset(&sa.sa_mask);
                 } else {
                         sa.sa_flags |= SA_RESETHAND;
                 }
                 sigaction(SIGFPE, &sa, NULL);
         }


 This SA_RESETHAND is required to make SIGFPE generated while running the
 SIGFPE signal handler terminate the program. The test program
 /usr/tests/kernel/t_trapsignal tests for this in the fpe_handle_recurse
 test case. The helper program catches the signal and generates a new
 one in the handler.

 But for the lib/libc/gen/t_fpsetmask "unmasked" tests this causes failure:
 the test code iterates over four FP exceptions, triggers them one after
 the other, catches the signal and uses siglongjmp to go from the signal
 handler right back to the loop over the four exceptions.

 The first loop works fine, but the SA_RESETHAND causes the signal handler
 to be reset when we enter the signal handler:

 	{_sa_handler = 0x0, _sa_sigaction = 0x0}

 and then we siglongjmp to the main loop, restoring the signal mask, but
 not the sigaction handler. So when the loop triggers the second exception,
 the SIGFPE kills the test program.

 Anyone have a good idea how to deal with this?

 Martin

From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Wed, 11 Oct 2023 21:16:34 +0300

 >  and then we siglongjmp to the main loop, restoring the signal mask,
 >  but not the sigaction handler.  So when the loop trigers the second
 >  exception, the SIGFPE kills the test program.

 That sounds like a bug in the test.  It asked for the signal
 disposition to be reset, it needs to re-install the handler if it
 wants it to be used again.

 -uwe

From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Wed, 11 Oct 2023 23:29:24 +0300

 Oh, I replied to the last comment, didn't look at the context.  The
 SA_RESETHAND patch doesn't look correct to me.  It says "if the signal
 is handled by the program, let the handler run, but then reset its
 disposition", which is not what the program requested if it didn't
 specify SA_RESETHAND, like e.g. the t_fpsetmask test.

 -uwe

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Thu, 12 Oct 2023 07:19:58 +0200

 On Wed, Oct 11, 2023 at 08:30:03PM +0000, Valery Ushakov wrote:
 >  Oh, I replied to the last comment, didn't look at the context.  The
 >  SA_RESETHAND patch doesn't look correct to me.  It says "if the signal
 >  is handled by the program, let the handler run, but then reset its
 >  disposition", which is not what the program requested if it didn't
 >  specify SA_RESETHAND, like e.g. the t_fpsetmask test.

 Yes, it is wrong but I see no good way to reset the sigaction handler only
 while inside the handler.

 I added the SA_RESETHAND because another test requires this:

 	/usr/tests/kernel/h_segv fpe handle recurse

 installs a signal handler for SIGFPE, triggers one, and inside the signal
 handler triggers another SIGFPE. This second SIGFPE needs to cause a core
 dump.

 On machines with FPU you will get:

  > /usr/tests/kernel/h_segv fpe handle recurse
 got 8
 Floating exception (core dumped)

 but with softfloat and no SA_RESETHAND the second SIGFPE will just recursively
 invoke the handler again (and the helper will print another "got 8").

 If we want to solve this purely in userland we would have to add hooks
 to (sig)longjmp and maybe (sig)setjmp, and interpose the signal handler,
 but I'm not sure this would work.

 Open for any alternative ideas...

 Martin

From: Taylor R Campbell <riastradh@NetBSD.org>
To: rokuyama.rk@gmail.com
Cc: gnats-bugs@NetBSD.org, martin@NetBSD.org
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Mon, 19 Feb 2024 17:24:36 +0000

 The relevant fragment of the kernel's hardware SIGFPE delivery path
 that we need to emulate is this:

 	mutex_enter(&proc_lock);
 	mutex_enter(p->p_lock);
 ...
 	action = SIGACTION_PS(ps, signo).sa_handler;
 ...
 	const bool masked = sigismember(mask, signo);
 	const bool ignored = action == SIG_IGN;
 	if (masked || ignored) {
 		mutex_enter(&ps->sa_mutex);
 		sigdelset(mask, signo);
 		sigdelset(&p->p_sigctx.ps_sigcatch, signo);
 		sigdelset(&p->p_sigctx.ps_sigignore, signo);
 		sigdelset(&SIGACTION_PS(ps, signo).sa_mask, signo);
 		SIGACTION_PS(ps, signo).sa_handler = SIG_DFL;
 		mutex_exit(&ps->sa_mutex);
 	}

 	kpsignal2(p, ksi);
 	mutex_exit(p->p_lock);
 	mutex_exit(&proc_lock);

 Unfortunately, I don't think there's any non-racy way to do that in
 userland without a new syscall.

 That said, for a single-threaded program, perhaps the following
 userland logic will work better to emulate the kernel trap logic --
 with a caveat noted in the XXX comment about where it should loop:

 	/*
 	 * Deliver the signal, and loop in case the signal handler
 	 * returns -- executing the same instruction should have the
 	 * same effect.
 	 *
 	 * XXX What if the signal handler changes the set of masked
 	 * exceptions in an attempt to restart the operation?  For
 	 * example, this could record a fine-grained stack trace of
 	 * where an invalid-operation or divide-by-zero first happened,
 	 * and then pick up where it left off with the exception
 	 * masked.  So really this loop should be around the
 	 * floating-point operation, not around the signal delivery.
 	 */
 	for (;;) {
 		struct sigaction sa;
 		sigset_t mask, omask;

 		/*
 		 * Block all signals while we figure out how to deliver
 		 * an uncatchable SIGFPE, and obtain the current signal
 		 * mask.
 		 */
 		sigfillset(&mask);
 		sigprocmask(SIG_BLOCK, &mask, &omask);

 		/*
 		 * Find the current signal disposition of SIGFPE.
 		 */
 		sigaction(SIGFPE, NULL, &sa);

 		/*
 		 * If SIGFPE is masked or ignored, unmask it and reset
 		 * it to the default disposition to deliver the signal.
 		 */
 		if (sigismember(&omask, SIGFPE) ||
 		    ((sa.sa_flags & SA_SIGINFO) == 0 &&
 			sa.sa_handler == SIG_IGN)) {
 			/*
 			 * Prepare to unmask SIGFPE.  This will take
 			 * effect when we use sigprocmask(SIG_SETMASK,
 			 * ...) below, once the signal has been queued,
 			 * so that it happens atomically with respect
 			 * to other signal delivery.
 			 */
 			sigdelset(&omask, SIGFPE);

 			/*
 			 * Reset SIGFPE to the default disposition,
 			 * which is to terminate the process.
 			 */
 			memset(&sa, 0, sizeof(sa));
 			sa.sa_handler = SIG_DFL;
 			sigemptyset(&sa.sa_mask);
 			sa.sa_flags = 0;
 			sigaction(SIGFPE, &sa, NULL);
 		}

 		/*
 		 * Queue the signal for delivery.  It won't trigger the
 		 * signal handler yet, because it's still masked, but
 		 * as soon as we unmask it either the process will
 		 * terminate or the signal handler will be called.
 		 */
 		sigqueueinfo(getpid(), &info);

 		/*
 		 * Restore the old signal mask, except with SIGFPE
 		 * unmasked even if it was masked before.
 		 *
 		 * At this point, either the process will terminate (if
 		 * SIGFPE had or now has the default disposition) or
 		 * the signal handler will be called (if SIGFPE had a
 		 * non-default, non-ignored disposition).
 		 */
 		sigprocmask(SIG_SETMASK, &omask, NULL);
 	}
