NetBSD Problem Report #46395

From www@NetBSD.org  Tue May  1 17:24:04 2012
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 88E8863C785
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  1 May 2012 17:24:04 +0000 (UTC)
Message-Id: <20120501172403.37ED663BA4F@www.NetBSD.org>
Date: Tue,  1 May 2012 17:24:03 +0000 (UTC)
From: glee@force10networks.com
Reply-To: glee@force10networks.com
To: gnats-bugs@NetBSD.org
Subject: Modified i386 FP context after signal delivery and context switch
X-Send-Pr-Version: www-1.0

>Number:         46395
>Category:       port-i386
>Synopsis:       Modified i386 FP context after signal delivery and context switch
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-i386-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue May 01 17:25:00 +0000 2012
>Originator:     Bob Lee
>Release:        5.1 Stable
>Organization:
Dell - Force 10
>Environment:
NetBSD 5.1_STABLE Dell Force10 (S3240) #0: Mon May 23 06:55:18 PDT 2011 build@ecluster-sjc-04:/work/build/buildSpaces/build15/Z9000-8-3-11/SW-NetBSD5/usr/src/sys/arch/i386/compile/S3240

>Description:
Problem:
    Floating point registers in an i386 context are corrupted when a
signal handler is invoked and a context switch to another process
that uses the fpu occurs before the interrupted code resumes.

Assumptions:
    Semantics of a standard x86 signal handler are such that a copy of
an active fpu context is saved for inspection and potential alteration
by the signal handler, and that context (possibly modified) is
restored on program resumption on signal handler completion.  Further,
the signal handler would be provided a clean fpu context for its use.
    The semantics of a sigcontext signal version less than 2 result in a
reset fpu context for the currently running lwp.  I believe the intent
was that the signal handler continues to use any existing fpu context.

Proposed Solution:
    Preserve the state of MDL_USEDFPU in
compat_16_machdep.c:sendsig_sigcontext before calling buildcontext,
restoring the flag after the function call.

Discussion:
    The primary symptom is that a sequence of floating point
operations is "sliced" by a signal.  On return from the signal handler
to the base user context, the floating point context appeared corrupted.
Subsequent investigation showed that the fpu context was not corrupted
but reset.  This unexpected reset of the hardware fpu
context resulted in a number of user level symptoms.  Further, the
problem is limited to the 16 emulation code.
    Looking at the kernel signal delivery code, the 16 emulation is
only in the IA32 (i386) code.  In the normal signal delivery, there are
basically two methods, siginfo and sigcontext.  "siginfo" is the
currently preferred method.  Looking at its mechanism, the entire
processor context is saved on the signal stack (be it independent of the
current running lwp stack or not), and a bit is set in the signal context
if the fpu context is valid in the siginfo.  After this, buildcontext
is used to create the initial signal running context (registers and such).
Within buildcontext, the MD lwp flag MDL_USEDFPU is always cleared.
The return from signal restores the fpu context from siginfo, if it
was valid (I assume this allows the signal handler to indirectly modify
the base lwp fpu context, for emulation or some other MD specific reason).
This allows for a distinct fpu context for the signal handler that has
no direct effect on the base lwp context.  I can see where resetting
MDL_USEDFPU could also be an lwp context switch optimization, but this
is the resulting siginfo semantics.
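
    The siginfo sequence described above can be sketched as a small
userland model.  This is only an illustration, not kernel code: every
sim_* name and SIM_USEDFPU are hypothetical stand-ins, a single double
stands in for the whole fpu context, and the "clean state on first use
when the flag is clear" behaviour is an assumption about how the flag
is consumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
#define SIM_USEDFPU 0x1

struct sim_lwp {
    int md_flags;   /* stand-in for l->l_md.md_flags */
    double st0;     /* stand-in for the lwp's whole fpu context */
};

struct sim_frame {
    bool fpu_valid; /* the "fpu context is valid" bit */
    double st0;     /* fpu context copied to the signal stack */
};

/* buildcontext always clears the flag, as described in the text. */
static void sim_buildcontext(struct sim_lwp *l)
{
    l->md_flags &= ~SIM_USEDFPU;
}

/* siginfo delivery: copy the context out first, then build the handler context. */
static void sim_sendsig_siginfo(struct sim_lwp *l, struct sim_frame *f)
{
    f->fpu_valid = (l->md_flags & SIM_USEDFPU) != 0;
    if (f->fpu_valid)
        f->st0 = l->st0;
    sim_buildcontext(l);
}

/* Assumption: with the flag clear, the first fpu use starts from a clean state. */
static void sim_handler_uses_fpu(struct sim_lwp *l, double scratch)
{
    if (!(l->md_flags & SIM_USEDFPU)) {
        l->st0 = 0.0;
        l->md_flags |= SIM_USEDFPU;
    }
    l->st0 += scratch;
}

/* Return from signal: reload the saved context if it was marked valid. */
static void sim_sigreturn(struct sim_lwp *l, const struct sim_frame *f)
{
    if (f->fpu_valid) {
        l->st0 = f->st0;
        l->md_flags |= SIM_USEDFPU;
    }
}

/* Value the interrupted code sees after a deliver/handle/return cycle. */
double sim_siginfo_roundtrip(double before, double handler_scratch)
{
    struct sim_lwp l = { SIM_USEDFPU, before };
    struct sim_frame f = { false, 0.0 };

    sim_sendsig_siginfo(&l, &f);
    assert(!(l.md_flags & SIM_USEDFPU)); /* handler starts with the flag clear */
    sim_handler_uses_fpu(&l, handler_scratch);
    sim_sigreturn(&l, &f);
    return l.st0;
}
```

    In this model sim_siginfo_roundtrip(3.5, 99.0) hands 3.5 back to the
interrupted code: the handler's own fpu use never reaches the base
context, because that context is reloaded from the signal frame.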
    However, the sigcontext semantics are drastically different from
those of siginfo.  The i386 code tests the number of arguments required
by the signal handler, and if the version of the sigaction descriptor
is less than 2, the 16 specific function sendsig_sigcontext is called.
This function makes no copy of the fpu context, and it also
calls buildcontext, which in turn resets MDL_USEDFPU.  Thus the
semantics are that any emulated sigcontext version less than 2 resets
the fpu context of the lwp.
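
    A rough model of the sigcontext path, with and without the
proposed flag preservation, might look like the following.  It is a
sketch under assumptions: the model_* names and MODEL_USEDFPU are
hypothetical, one double stands in for the fpu context, and the context
switch is reduced to "reload the saved state if the flag is set,
otherwise start from a fresh state".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
#define MODEL_USEDFPU 0x1

struct model_lwp {
    int md_flags;      /* stand-in for l->l_md.md_flags */
    double saved_st0;  /* stand-in for the lwp's fpu save area */
};

/* buildcontext always clears the flag, as described in the text. */
static void model_buildcontext(struct model_lwp *l)
{
    l->md_flags &= ~MODEL_USEDFPU;
}

/* sigcontext delivery: no fpu copy is made; optionally keep the flag. */
static void model_sendsig_sigcontext(struct model_lwp *l, bool preserve)
{
    int svufpu = l->md_flags & MODEL_USEDFPU;

    model_buildcontext(l);
    if (preserve)
        l->md_flags |= svufpu;   /* the proposed change */
}

/*
 * Another process takes the fpu, then this lwp runs and uses it again.
 * Assumption: a clear flag means the next fpu use gets a fresh state.
 */
static double model_switch_away_and_back(struct model_lwp *l, double live_st0)
{
    l->saved_st0 = live_st0;
    if (l->md_flags & MODEL_USEDFPU)
        return l->saved_st0;     /* saved state is reloaded */
    l->md_flags |= MODEL_USEDFPU;
    return 0.0;                  /* fresh (reset) state */
}

/* Value the interrupted FP sequence sees once it resumes. */
double model_resume_value(double live_st0, bool preserve)
{
    struct model_lwp l = { MODEL_USEDFPU, 0.0 };

    model_sendsig_sigcontext(&l, preserve);
    assert(preserve || !(l.md_flags & MODEL_USEDFPU));
    return model_switch_away_and_back(&l, live_st0);
}
```

    With preserve set to false the model returns the reset value 0.0
instead of the live value, which matches the reported symptom; with
preserve set to true the live value survives the context switch.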
    Resetting the fpu context appears to be an unintended side effect
of this signal handling, unless the signal is expected to
terminate the lwp (process).  I believe that in the compat_16_machdep.c
>How-To-Repeat:
An active process with 16 emulation enabled in the kernel.  SIGPROF occurs and is delivered in the midst of a set of FP instructions, and prior to the return to the signal handler, a second process runs.  When the original FP sequence is resumed, the FP context has been modified.
>Fix:
Possible fix, IIUC is:

--- //depot/main/Dev/Cyclone/ManagedPVT/NAVASOTA-DEV-9-1-0/SW-NetBSD5/usr/src/sys/arch/i386/i386/compat_16_machdep.c   2011-09-07 05:46:27.000000000 -0700
+++ /work/swos-01/glee/glee-nav4/SW-NetBSD5/usr/src/sys/arch/i386/i386/compat_16_machdep.c     2011-09-07 05:46:27.000000000 -0700
@@ -175,6 +175,7 @@
        u_long code = KSI_TRAPCODE(ksi);
        struct sigframe_sigcontext *fp = getframe(l, sig, &onstack), frame;
        sig_t catcher = SIGACTION(p, sig).sa_handler;
+       int svufpu;

        fp--;

@@ -259,8 +260,9 @@
                sigexit(l, SIGILL);
                /* NOTREACHED */
        }
-
+       svufpu = l->l_md.md_flags & MDL_USEDFPU;
        buildcontext(l, sel, catcher, fp);
+       l->l_md.md_flags |= svufpu;

        /* Remember that we're now on the signal stack. */
        if (onstack)
