NetBSD Problem Report #56820
From www@netbsd.org Sat May 7 12:47:32 2022
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 230841A923B
for <gnats-bugs@gnats.NetBSD.org>; Sat, 7 May 2022 12:47:32 +0000 (UTC)
Message-Id: <20220507124700.AB8F51A923C@mollari.NetBSD.org>
Date: Sat, 7 May 2022 12:47:00 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: Many FPE related tests fail on softfloat machines
X-Send-Pr-Version: www-1.0
>Number: 56820
>Category: misc
>Synopsis: Many FPE related tests fail on softfloat machines
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: misc-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat May 07 12:50:01 +0000 2022
>Last-Modified: Mon Feb 19 17:25:01 +0000 2024
>Originator: Rin Okuyama
>Release: 9.99.96
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD usl5p 9.99.96 NetBSD 9.99.96 (USL-5P) #1: Fri May 6 14:47:54 JST 2022 rin@latipes:/build/src/sys/arch/landisk/compile/USL-5P landisk
>Description:
As observed for, e.g.,
armv5: https://www.netbsd.org/~martin/evbarm-atf/
sh3: https://www.netbsd.org/~martin/landisk-atf/
many FPE related tests, like
libc/sys/t_ptrace_*_signal{ignore,masked}_crash_fpe,
fail on softfloat machines.
These tests expect that SIGFPE can be neither ignored nor blocked, because
it is raised by the FPE handler in the kernel, like SIGBUS or SIGILL.
In softfloat environments, however, SIGFPE can be ignored or blocked
like other ``normal'' signals, because it is raised by libc/softfloat.
>How-To-Repeat:
On softfloat machines:
# cd /usr/tests/lib/libc/sys && atf-run t_ptrace_wait
>Fix:
Skip these tests on softfloat machines?
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Sat, 7 May 2022 14:56:58 +0200
Shouldn't we try to make softfloat vs. hardware FPU mostly undetectable
by userland? Besides related "sysctl machdep" entries like machdep.fpu_present
of course.
Martin
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, misc-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Sat, 7 May 2022 22:16:26 +0900
On 2022/05/07 22:00, Martin Husemann wrote:
> Shouldn't we try to make softfloat vs. hardware FPU mostly undetectable
> by userland? Besides related "sysctl machdep" entries like machdep.fpu_present
> of course.
It would be nice, but doesn't it require a specific system call?
Thanks,
rin
From: Martin Husemann <martin@duskware.de>
To: Rin Okuyama <rokuyama.rk@gmail.com>
Cc: gnats-bugs@netbsd.org
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Sat, 7 May 2022 15:19:28 +0200
On Sat, May 07, 2022 at 10:16:26PM +0900, Rin Okuyama wrote:
> It would be nice, but doesn't it require a specific system call?
Or special kernel code that filters the signal mask changes? I'm not
sure, just thinking out loud.
Martin
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Thu, 26 May 2022 02:58:32 +0000
On Sat, May 07, 2022 at 01:20:05PM +0000, Martin Husemann wrote:
> From: Martin Husemann <martin@duskware.de>
> To: Rin Okuyama <rokuyama.rk@gmail.com>
> Cc: gnats-bugs@netbsd.org
> Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
> Date: Sat, 7 May 2022 15:19:28 +0200
>
> On Sat, May 07, 2022 at 10:16:26PM +0900, Rin Okuyama wrote:
> > It would be nice, but doesn't it require a specific system call?
>
> Or special kernel code that filters the signal mask changes? I'm not
> sure, just thinking out loud.
I'm pretty sure the behavior if you block or ignore and then trigger
SIGFPE is undefined, same as SIGSEGV.
But if we want it to die, it seems like the best approach is in the
place where the library posts SIGFPE, if that returns without exiting
unblock the signal, then set it to SIG_DFL, then post it again. And
maybe then if that returns too, whine on stderr and post SIGKILL.
--
David A. Holland
dholland@netbsd.org
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Tue, 3 Oct 2023 20:11:20 +0200
I like the unblocking/default action approach, but I'm not sure
I understand the "if that returns w/o exiting" part.
The code that posts the signal is src/lib/libc/softfloat/softfloat-specialize
around line 95:
	memset(&info, 0, sizeof info);
	info.si_signo = SIGFPE;
	info.si_pid = getpid();
	info.si_uid = geteuid();
	if (flags & float_flag_underflow)
		info.si_code = FPE_FLTUND;
	else if (flags & float_flag_overflow)
		info.si_code = FPE_FLTOVF;
	else if (flags & float_flag_divbyzero)
		info.si_code = FPE_FLTDIV;
	else if (flags & float_flag_invalid)
		info.si_code = FPE_FLTINV;
	else if (flags & float_flag_inexact)
		info.si_code = FPE_FLTRES;
	sigqueueinfo(getpid(), &info);
Wouldn't it be good enough to always call sigprocmask to unblock SIGFPE
and then check sigaction old state and if it is SIG_IGN set it to SIG_DFL?
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Thu, 5 Oct 2023 13:50:09 +0200
The patch below seems to work for me. Am I overlooking some corner
cases?
Martin
Index: softfloat-specialize
===================================================================
RCS file: /cvsroot/src/lib/libc/softfloat/softfloat-specialize,v
retrieving revision 1.9
diff -u -p -r1.9 softfloat-specialize
--- softfloat-specialize 10 Aug 2014 05:57:31 -0000 1.9
+++ softfloat-specialize 5 Oct 2023 11:46:10 -0000
@@ -66,6 +66,8 @@ fp_except float_exception_mask = 0;
 void
 float_raise( fp_except flags )
 {
+	struct sigaction sa;
+	sigset_t set;
 	siginfo_t info;
 	fp_except mask = float_exception_mask;
@@ -92,6 +94,26 @@ float_raise( fp_except flags )
 			info.si_code = FPE_FLTINV;
 		else if (flags & float_flag_inexact)
 			info.si_code = FPE_FLTRES;
+
+		/*
+		 * Make sure SIGFPE is not blocked/ignored - that would be impossible
+		 * with FP hardware.
+		 */
+		sigemptyset(&set);
+		sigaddset(&set, SIGFPE);
+		sigprocmask(SIG_UNBLOCK, &set, NULL);
+		if (sigaction(SIGFPE, NULL, &sa) == 0) {
+			if (sa.sa_handler == SIG_IGN) {
+				memset(&sa, 0, sizeof(sa));
+				sa.sa_handler = SIG_DFL;
+				sigemptyset(&sa.sa_mask);
+			} else {
+				sa.sa_flags |= SA_RESETHAND;
+			}
+			sigaction(SIGFPE, &sa, NULL);
+		}
+
+		/* deliver the signal */
 		sigqueueinfo(getpid(), &info);
 	}
 }
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Sat, 7 Oct 2023 14:13:34 +0200
There is still something missing - float_exception_flags needs to be cleared
"more often", but I'm not quite sure where.
For example with the patch applied, the following still fails:
Test case: lib/libc/gen/t_fpsetmask/fpsetmask_unmasked_double
Duration: 21.741232 seconds
Termination reason
FAILED: Test program received signal 8 (core dumped)
Standard error stream
Test program crashed; attempting to get stack trace
[New process 9660]
Core was generated by `t_fpsetmask'.
Program terminated with signal SIGFPE, Arithmetic exception.
#0 0x752ea0f4 in sigqueueinfo () from /usr/lib/libc.so.12
#0 0x752ea0f4 in sigqueueinfo () from /usr/lib/libc.so.12
#1 0x752e84c2 in _softfloat_float_raise (flags=<optimized out>) at /work/src/lib/libc/softfloat/softfloat-specialize:117
#2 0x752e9bc8 in __divdf3 (a=<optimized out>, b=<optimized out>) at /work/src/lib/libc/softfloat/bits64/softfloat.c:2944
#3 0x004011b0 in d_inv () at /work/src/tests/lib/libc/gen/t_fpsetmask.c:126
#4 0x0040146e in fpsetmask_unmasked (test_ops=0x4127f8 <double_ops>) at /work/src/tests/lib/libc/gen/t_fpsetmask.c:281
#5 0x7539947c in atf_tc_run (tc=0x412a98 <atfu_fpsetmask_unmasked_double_tc>, resfile=<optimized out>) at /work/src/external/bsd/atf/dist/atf-c/tc.c:1024
#6 0x753964f4 in atf_tp_run (tp=<optimized out>, tcname=<optimized out>, resfile=0x753b5018 "/tmp/atf-run.Uu9ByZ/tcr") at /work/src/external/bsd/atf/dist/atf-c/tp.c:205
#7 0x75395ff0 in run_tc (exitcode=<synthetic pointer>, p=0x7ff074f0, tp=0x7ff074e4) at /work/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:510
#8 controlled_main (exitcode=<synthetic pointer>, add_tcs_hook=0x401968 <atfu_tp_add_tcs>, argv=<optimized out>, argc=<optimized out>) at /work/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:580
#9 atf_tp_main (argc=<optimized out>, argv=<optimized out>, add_tcs_hook=0x401968 <atfu_tp_add_tcs>) at /work/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:610
#10 0x00401d4a in main (argc=<optimized out>, argv=<optimized out>) at /work/src/tests/lib/libc/gen/t_fpsetmask.c:352
Stack trace complete
This is due to float_exception_flags being set to 1 for the first round
of that test (d_dz -> double division by zero) and that bit is still present
in the second iteration (d_inv -> double infinite value), where bit 4 gets
added to float_exception_flags and masked in _softfloat_float_raise(),
but due to 1 being left over after the mask operation it still triggers
a SIGFPE.
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Wed, 11 Oct 2023 17:47:45 +0200
On Sat, Oct 07, 2023 at 02:13:34PM +0200, Martin Husemann wrote:
> For example with the patch applied, the following still fails:
>
> Test case: lib/libc/gen/t_fpsetmask/fpsetmask_unmasked_double
> Duration: 21.741232 seconds
> Termination reason
>
> FAILED: Test program received signal 8 (core dumped)
> Standard error stream
>
> Test program crashed; attempting to get stack trace
> [New process 9660]
> Core was generated by `t_fpsetmask'.
> Program terminated with signal SIGFPE, Arithmetic exception.
> #0 0x752ea0f4 in sigqueueinfo () from /usr/lib/libc.so.12
> #0 0x752ea0f4 in sigqueueinfo () from /usr/lib/libc.so.12
> #1 0x752e84c2 in _softfloat_float_raise (flags=<optimized out>) at /work/src/lib/libc/softfloat/softfloat-specialize:117
> #2 0x752e9bc8 in __divdf3 (a=<optimized out>, b=<optimized out>) at /work/src/lib/libc/softfloat/bits64/softfloat.c:2944
> #3 0x004011b0 in d_inv () at /work/src/tests/lib/libc/gen/t_fpsetmask.c:126
This is due to the SA_RESETHAND used in the patch:
	sigemptyset(&set);
	sigaddset(&set, SIGFPE);
	sigprocmask(SIG_UNBLOCK, &set, NULL);
	if (sigaction(SIGFPE, NULL, &sa) == 0) {
		if (sa.sa_handler == SIG_IGN) {
			memset(&sa, 0, sizeof(sa));
			sa.sa_handler = SIG_DFL;
			sigemptyset(&sa.sa_mask);
		} else {
			sa.sa_flags |= SA_RESETHAND;
		}
		sigaction(SIGFPE, &sa, NULL);
	}
This SA_RESETHAND is required to make SIGFPE generated while running the
SIGFPE signal handler terminate the program. The test program
/usr/tests/kernel/t_trapsignal tests for this in the fpe_handle_recurse
test case. The helper program catches the signal and generates a new
one in the handler.
But for the lib/libc/gen/t_fpsetmask "unmasked" tests this causes failure:
the test code iterates over four FP exceptions, triggers them one after
the other, catches the signal and uses siglongjmp to go from the signal
handler right back to the loop over the four exceptions.
The first loop works fine, but the SA_RESETHAND causes the signal handler
to be reset when we enter the signal handler:
{_sa_handler = 0x0, _sa_sigaction = 0x0}
and then we siglongjmp to the main loop, restoring the signal mask, but
not the sigaction handler. So when the loop triggers the second exception,
the SIGFPE kills the test program.
Anyone have a good idea how to deal with this?
Martin
From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Wed, 11 Oct 2023 21:16:34 +0300
> and then we siglongjmp to the main loop, restoring the signal mask,
> but not the sigaction handler. So when the loop triggers the second
> exception, the SIGFPE kills the test program.
That sounds like a bug in the test. It asked for the signal
disposition to be reset, it needs to re-install the handler if it
wants it to be used again.
-uwe
From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Wed, 11 Oct 2023 23:29:24 +0300
Oh, I replied to the last comment, didn't look at the context. The
SA_RESETHAND patch doesn't look correct to me. It says "if the signal
is handled by the program, let the handler run, but then reset its
disposition", which is not what the program requested if it didn't
specify SA_RESETHAND, like e.g. the t_fpsetmask test.
-uwe
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Thu, 12 Oct 2023 07:19:58 +0200
On Wed, Oct 11, 2023 at 08:30:03PM +0000, Valery Ushakov wrote:
> Oh, I replied to the last comment, didn't look at the context. The
> SA_RESETHAND patch doesn't look correct to me. It says "if the signal
> is handled by the program, let the handler run, but then reset its
> disposition", which is not what the program requested if it didn't
> specify SA_RESETHAND, like e.g. the t_fpsetmask test.
Yes, it is wrong but I see no good way to reset the sigaction handler only
while inside the handler.
I added the SA_RESETHAND because another test requires this:
/usr/tests/kernel/h_segv fpe handle recurse
installs a signal handler for SIGFPE, triggers one, and inside the signal
handler triggers another SIGFPE. This second SIGFPE needs to cause a core
dump.
On machines with FPU you will get:
> /usr/tests/kernel/h_segv fpe handle recurse
got 8
Floating exception (core dumped)
but with softfloat and no SA_RESETHAND the second SIGFPE will just recursively
invoke the handler again (and the helper will print another "got 8").
If we want to solve this purely in userland we would have to add hooks
to (sig)longjmp and maybe (sig)setjmp, and interpose the signal handler,
but I'm not sure this would work.
Open for any alternative ideas...
Martin
From: Taylor R Campbell <riastradh@NetBSD.org>
To: rokuyama.rk@gmail.com
Cc: gnats-bugs@NetBSD.org, martin@NetBSD.org
Subject: Re: misc/56820: Many FPE related tests fail on softfloat machines
Date: Mon, 19 Feb 2024 17:24:36 +0000
The relevant fragment of the kernel's hardware SIGFPE delivery path
that we need to emulate is this:
	mutex_enter(&proc_lock);
	mutex_enter(p->p_lock);
	...
	action = SIGACTION_PS(ps, signo).sa_handler;
	...
	const bool masked = sigismember(mask, signo);
	const bool ignored = action == SIG_IGN;
	if (masked || ignored) {
		mutex_enter(&ps->sa_mutex);
		sigdelset(mask, signo);
		sigdelset(&p->p_sigctx.ps_sigcatch, signo);
		sigdelset(&p->p_sigctx.ps_sigignore, signo);
		sigdelset(&SIGACTION_PS(ps, signo).sa_mask, signo);
		SIGACTION_PS(ps, signo).sa_handler = SIG_DFL;
		mutex_exit(&ps->sa_mutex);
	}
	kpsignal2(p, ksi);
	mutex_exit(p->p_lock);
	mutex_exit(&proc_lock);
Unfortunately, I don't think there's any non-racy way to do that in
userland without a new syscall.
That said, for a single-threaded program, perhaps the following
userland logic will work better to emulate the kernel trap logic --
with a caveat noted in the XXX comment about where it should loop:
	/*
	 * Deliver the signal, and loop in case the signal handler
	 * returns -- executing the same instruction should have the
	 * same effect.
	 *
	 * XXX What if the signal handler changes the set of masked
	 * exceptions in an attempt to restart the operation?  For
	 * example, this could record a fine-grained stack trace of
	 * where an invalid-operation or divide-by-zero first happened,
	 * and then pick up where it left off with the exception
	 * masked.  So really this loop should be around the
	 * floating-point operation, not around the signal delivery.
	 */
	for (;;) {
		struct sigaction sa;
		sigset_t mask, omask;

		/*
		 * Block all signals while we figure out how to deliver
		 * an uncatchable SIGFPE, and obtain the current signal
		 * mask.
		 */
		sigfillset(&mask);
		sigprocmask(SIG_BLOCK, &mask, &omask);

		/*
		 * Find the current signal disposition of SIGFPE.
		 */
		sigaction(SIGFPE, NULL, &sa);

		/*
		 * If SIGFPE is masked or ignored, unmask it and reset
		 * it to the default disposition to deliver the signal.
		 */
		if (sigismember(&omask, SIGFPE) ||
		    ((sa.sa_flags & SA_SIGINFO) == 0 &&
			sa.sa_handler == SIG_IGN)) {
			/*
			 * Prepare to unmask SIGFPE.  This will take
			 * effect when we use sigprocmask(SIG_SETMASK,
			 * ...) below, once the signal has been queued,
			 * so that it happens atomically with respect
			 * to other signal delivery.
			 */
			sigdelset(&omask, SIGFPE);

			/*
			 * Reset SIGFPE to the default disposition,
			 * which is to terminate the process.
			 */
			memset(&sa, 0, sizeof(sa));
			sa.sa_handler = SIG_DFL;
			sigemptyset(&sa.sa_mask);
			sa.sa_flags = 0;
			sigaction(SIGFPE, &sa, NULL);
		}

		/*
		 * Queue the signal for delivery.  It won't trigger the
		 * signal handler yet, because it's still masked, but
		 * as soon as we unmask it either the process will
		 * terminate or the signal handler will be called.
		 */
		sigqueueinfo(getpid(), &info);

		/*
		 * Restore the old signal mask, except with SIGFPE
		 * unmasked even if it was masked before.
		 *
		 * At this point, either the process will terminate (if
		 * SIGFPE had or now has the default disposition) or
		 * the signal handler will be called (if SIGFPE had a
		 * non-default, non-ignored disposition).
		 */
		sigprocmask(SIG_SETMASK, &omask, NULL);
	}