NetBSD Problem Report #30977

From cube@eve-team.com  Fri Aug 12 14:12:30 2005
Return-Path: <cube@eve-team.com>
Received: from relay01a.clb.oleane.net (relay01a.clb.oleane.net [213.56.31.145])
	by narn.netbsd.org (Postfix) with ESMTP id 927F663B116
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 12 Aug 2005 14:12:24 +0000 (UTC)
Message-Id: <20050812141129.4C26014691@vidal.eve>
Date: Fri, 12 Aug 2005 16:11:29 +0200 (CEST)
From: cube@cubidou.net
Reply-To: cube@cubidou.net
To: gnats-bugs@netbsd.org
Subject: FPU troubles
X-Send-Pr-Version: 3.95

>Number:         30977
>Category:       port-xen
>Synopsis:       FPU troubles
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    bouyer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Aug 12 14:13:00 +0000 2005
>Closed-Date:    Sun Mar 05 12:28:37 +0000 2006
>Last-Modified:  Sun Mar 05 12:28:37 +0000 2006
>Originator:     Quentin Garnier
>Release:        NetBSD 3.99.7
>Organization:
>Environment:
System: NetBSD vidal.eve 3.99.7 NetBSD 3.99.7 (XEN_VIDAL) #0: Thu Aug 11 16:39:22 CEST 2005 cube@vidal.eve:/empty/NetBSD/vidal/obj/empty/NetBSD/current/src/sys/arch/i386/compile/XEN_VIDAL i386
Architecture: i386
Machine: i386
>Description:
	Applications that use the FPU show weird behaviour:  xpdf has
	trouble displaying images (reloading a page several time shows
	different results) as they get black regions on it.

	Ogg decoders such as ogg123 or any player linked to libvorbis
	insert beeps and cracks in the music.

	I could prouve it is a FPU issue by listening to the difference
	when using libvorbis.so (from audio/libvorbis) and
	libvorbisidec.so (from audio/tremor).

	The former uses the FPU, while the latter is a fixed-point
	implementation.  The former inserts beeps and cracks, sound is
	just fine when using the latter.

	I also had funny results one day when tried manipulating images
	with ImageMagick one day.  Black lines appeared on the result.
>How-To-Repeat:
	Use mp3blaster to listen to your collection of Ogg/Vorbis files.
	Cry in despair when KT Tunstall's music gets desecrated.
>Fix:
	Unknown.  Obviously updating Xen to 2.0.7 and NetBSD to a very
	-current wasn't enough.

>Release-Note:

>Audit-Trail:
From: Quentin Garnier <cube@cubidou.net>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/30977: FPU troubles
Date: Thu, 8 Sep 2005 17:01:21 +0200

 --nWclrlgqnAApBvUd
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 Content-Transfer-Encoding: quoted-printable

 On Fri, Aug 12, 2005 at 02:13:01PM +0000, cube@cubidou.net wrote:
 > >Number:         30977
 > >Category:       port-xen
 > >Synopsis:       FPU troubles
 [...]
 > >Description:
 > 	Applications that use the FPU show weird behaviour:  xpdf has
 > 	trouble displaying images (reloading a page several time shows
 > 	different results) as they get black regions on it.
 >=20
 > 	Ogg decoders such as ogg123 or any player linked to libvorbis
 > 	insert beeps and cracks in the music.
 >=20
 > 	I could prouve it is a FPU issue by listening to the difference
 > 	when using libvorbis.so (from audio/libvorbis) and
 > 	libvorbisidec.so (from audio/tremor).
 >=20
 > 	The former uses the FPU, while the latter is a fixed-point
 > 	implementation.  The former inserts beeps and cracks, sound is
 > 	just fine when using the latter.
 >=20
 > 	I also had funny results one day when tried manipulating images
 > 	with ImageMagick one day.  Black lines appeared on the result.
 [...]
 > >Fix:
 > 	Unknown.  Obviously updating Xen to 2.0.7 and NetBSD to a very
 > 	-current wasn't enough.

 I've updated userland & kernel to latest -current, and I still see the
 behaviour.

 --=20
 Quentin Garnier - cube@cubidou.net - cube@NetBSD.org
 "When I find the controls, I'll go where I like, I'll know where I want
 to be, but maybe for now I'll stay right here on a silent sea."
 KT Tunstall, Silent Sea, Eye to the Telescope, 2004.

 --nWclrlgqnAApBvUd
 Content-Type: application/pgp-signature
 Content-Disposition: inline

 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.2.6 (NetBSD)

 iQEVAwUBQyBSQdgoQloHrPnoAQJm5AgAglICs+jSDG+3XSK44sGT5XdOsEH3J2z3
 5H0WN26UGUHW9U+gTNNIIOAmg+MgVu61kQU883//jd+FZnLg7Zz08B2KSLA+KWFg
 GxI77cthoa6OF2xQjFPNY80UZ8nEP0Ij2rBlfOfIskAkwt9q0bOdFnUxQT2LccKz
 GQpDCaOBcEEsFnk3kkBuaDr+fRFkw/FAsh3/TLEvMMrm1FK+jEFPd6ShVQEvnhar
 TTFpvZUnj2sqXxv0jK4PHWa/dzeGVr103v/lgSWdCKrtCWfrMWXZEWA4BqFm4Rtv
 lbRZa9FotfUY995nj/FPHzTkZ+RrOKfFoe7/3VXcZfd1BOtP7n8sHQ==
 =lSoq
 -----END PGP SIGNATURE-----

 --nWclrlgqnAApBvUd--

From: Paul Ripke <stix@stix.id.au>
To: gnats-bugs@netbsd.org
Cc: cube@cubidou.net, port-xen@netbsd.org
Subject: Re: port-xen/30977: FPU troubles
Date: Wed, 23 Nov 2005 08:15:04 +1100

 I've finally got around to trying xen (NetBSD-3.0BETA, xen 2.0.7) and
 have run into this PR in DOM0.

 First sign was xearth coredumping on login. Next was only "clicking"
 from xmms, and I've had xpdf core, too.

 A little debugging shows that the FPU state is not being correctly
 saved or restored during "some" context switches. The following code
 appears to exhibit the behaviour reliably:

 marvin:ksh$ cat fp2.c
 #include <assert.h>
 #include <pthread.h>
 #include <stdio.h>
 #include <time.h>

 void *
 runloop(void *p)
 {
          int i;
          double fi;

          i = 1;
          while (1) {
                  fi = i;
                  if ((int)fi != i) {
                          printf("%9ld %9d %9d\n", clock(), i, (int)fi);
                  }
                  i++;
          }
 }

 int
 main(int argc, char **argv)
 {
          pthread_t a, b;

          setlinebuf(stdout);
          assert(pthread_create(&a, NULL, &runloop, NULL) == 0);
          assert(pthread_create(&b, NULL, &runloop, NULL) == 0);
          assert(pthread_join(a, NULL) == 0);

          return 0;
 }
 marvin:ksh$ gcc -O -Wall -lpthread -o fp2 fp2.c
 marvin:ksh$ ./fp2 | head
         19   2810714   4887424
         34   4782167 -2147483648
        152  29109559 -2147483648
        189  36046059 -2147483648
        217  43217401 -2147483648
        295  58190482  60223234
        334  65918432 -2147483648
        415  84428916 -2147483648
        478  93211664  97457195
        507 102074913 -2147483648

 I've dug through NetBSD code (on the assumption xen is doing the
 right thing), but I'm not 100% sure exactly what I'm looking for.
 I guess this would be a nice bug to squash for 3.0 (assuming the
 bug is in NetBSD).

 Cheers,
 --
 Paul Ripke

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Paul Ripke <stix@stix.id.au>
Cc: gnats-bugs@NetBSD.org, cube@cubidou.net, port-xen@NetBSD.org
Subject: Re: port-xen/30977: FPU troubles
Date: Wed, 23 Nov 2005 13:47:04 +0100

 --T4sUOijqQbZv57TR
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 On Wed, Nov 23, 2005 at 08:15:04AM +1100, Paul Ripke wrote:
 > I've finally got around to trying xen (NetBSD-3.0BETA, xen 2.0.7) and
 > have run into this PR in DOM0.
 > 
 > First sign was xearth coredumping on login. Next was only "clicking"
 > from xmms, and I've had xpdf core, too.
 > 
 > A little debugging shows that the FPU state is not being correctly
 > saved or restored during "some" context switches. The following code
 > appears to exhibit the behaviour reliably:

 Interesting. I modified your test prog to also print a thread id (see attached
 file). I get the same result as you with a stock current kernel:
 1        33    873373 -2147483648
 1        55   2001359 -2147483648
 0        62   2770587 -2147483648
 1        92   3483395 -2147483648
 0       128   5914000 -2147483648
 1       132   5228374 -2147483648
 1       172   6972647   6098061
 0       182   8008230   6098061
 0       222   9753133 -2147483648
 1       232   9592029 -2147483648
 1       272  11335805 -2147483648
 0       282  12371751 -2147483648
 1       312  13079804  13241628
 0       328  14641093 -2147483648
 1       352  14824306  14985832
 1       412  17443856 -2147483648
 0       482  21107482  20059525

 But with a kernel with the FPU patch I posted here some times ago (see second
 attachement), I get:
 1        72   2622678   3646769
 0        81   3646769   2622678
 1       152   6105605   7113173
 0       162   7113173   6105605
 0       281  12357157  12217903
 1       291  12217903  12357157
 0       321  14100004  13958543
 1       331  13958543  14970612
 0       341  14970612  14100004
 1       751  32317194  33249945
 0       761  33249945  32317194

 Now the values are just swapped between threads. It seems that the fpu context
 is properly restored, but sometimes to the wrong thread ...

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

 --T4sUOijqQbZv57TR
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="fp2.c"

 #include <assert.h>
 #include <pthread.h>
 #include <stdio.h>
 #include <time.h>

 void *
 runloop(void *p)
 {
          int i;
          double fi;
          int *myid = p;

          i = 1;
          while (1) {
                  fi = i;
                  if ((int)fi != i) {
                          printf("%d %9ld %9d %9d\n", *myid, clock(), i, (int)fi);
                  }
                  i++;
          }
 }

 int
 main(int argc, char **argv)
 {
          pthread_t a, b;
 	 int ia = 0, ib = 1;

          setlinebuf(stdout);
          assert(pthread_create(&a, NULL, &runloop, &ia) == 0);
          assert(pthread_create(&b, NULL, &runloop, &ib) == 0);
          assert(pthread_join(a, NULL) == 0);

          return 0;
 }

 --T4sUOijqQbZv57TR
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename=diff

 Index: i386/machdep.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/xen/i386/machdep.c,v
 retrieving revision 1.18
 diff -u -r1.18 machdep.c
 --- i386/machdep.c	19 Aug 2005 16:06:12 -0000	1.18
 +++ i386/machdep.c	7 Nov 2005 15:55:22 -0000
 @@ -2419,6 +2419,7 @@
  		if (l->l_addr->u_pcb.pcb_fpcpu != NULL)
  			npxsave_lwp(l, 0);
  #endif
 +		KASSERT((flags & _UC_FXSAVE) == 0);
  		if (flags & _UC_FXSAVE) {
  			if (i386_use_fxsave) {
  				memcpy(
 @@ -2444,6 +2445,11 @@
  		}
  		/* If not set already. */
  		l->l_md.md_flags |= MDL_USEDFPU;
 +#define frstor(addr)            __asm("frstor %0" : : "m" (*addr))
 +		frstor(&l->l_addr->u_pcb.pcb_savefpu.sv_87);
 +#undef frstor
 +		l->l_addr->u_pcb.pcb_fpcpu = curcpu();
 +		curcpu()->ci_fpcurlwp = l;
  #if 0
  		/* Apparently unused. */
  		l->l_addr->u_pcb.pcb_saveemc = mcp->mc_fp.fp_emcsts;
 Index: i386/npx.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/xen/i386/npx.c,v
 retrieving revision 1.4
 diff -u -r1.4 npx.c
 --- i386/npx.c	20 Mar 2005 13:12:59 -0000	1.4
 +++ i386/npx.c	7 Nov 2005 15:55:22 -0000
 @@ -70,7 +70,7 @@
  #include <sys/cdefs.h>
  __KERNEL_RCSID(0, "$NetBSD: npx.c,v 1.4 2005/03/20 13:12:59 bouyer Exp $");

 -#if 0
 +#if 1
  #define IPRINTF(x)	printf x
  #else
  #define	IPRINTF(x)
 @@ -116,7 +116,7 @@
   * 387 and 287 Numeric Coprocessor Extension (NPX) Driver.
   *
   * We do lazy initialization and switching using the TS bit in cr0 and the
 - * MDL_USEDFPU bit in mdproc.
 + * MDL_USEDFPU bit in mdlwp.
   *
   * DNA exceptions are handled like this:
   *
 @@ -149,7 +149,6 @@
  #define	stts()			HYPERVISOR_fpu_taskswitch()
  #endif

 -int npxdna(struct cpu_info *);
  static int	npxdna_notset(struct cpu_info *);
  static int	npxdna_s87(struct cpu_info *);
  #ifdef I686_CPU
 @@ -319,8 +318,7 @@
  }
  #endif

 -void npxinit(ci)
 -	struct cpu_info *ci;
 +void npxinit(struct cpu_info *ci)
  {
  	lcr0(rcr0() & ~(CR0_EM|CR0_TS));
  	fninit();
 @@ -534,7 +532,7 @@
  {
  	struct lwp *l;
  	int s;
 -#ifdef XXXXENDEBUG_LOW
 +#if 1 /* def XXXXENDEBUG_LOW */
  	struct trapframe *f = (struct trapframe *)((uint32_t *)(void *)&ci)[1];
  #endif

 @@ -565,9 +563,11 @@
  		IPRINTF(("Save"));
  		npxsave_cpu(ci, 1);
  	} else {
 +		clts();
  		IPRINTF(("Init"));
  		fninit();
  		fwait();
 +		stts();
  	}
  	splx(s);

 @@ -610,8 +610,11 @@
  	 * fpu_save, if we're saving the FPU state not from interrupt
  	 * context (f.i. during fork()).  Just return in this case.
  	 */
 -	if (ci->ci_fpsaving)
 +	if (ci->ci_fpsaving) {
 +		printf("recursive npx trap; cr0=%x\n", rcr0());
 +		Debugger();
  		return (1);
 +	}

  	s = splipi();		/* lock out IPI's while we clean house.. */
  #ifdef MULTIPROCESSOR
 @@ -650,7 +653,7 @@
  	s = splipi();
  	ci->ci_fpcurlwp = l;
  	l->l_addr->u_pcb.pcb_fpcpu = ci;
 -	ci->ci_fpused = 1;
 +	//ci->ci_fpused = 1;
  	splx(s);


 @@ -689,8 +692,8 @@
  	if (l == NULL)
  		return;

 -	IPRINTF(("%s: fp CPU %s lwp %p\n", ci->ci_dev->dv_xname,
 -	    save? "save" : "flush", l));
 +	/* IPRINTF(("%s: fp CPU %s lwp %p\n", ci->ci_dev->dv_xname,
 +	    save? "save" : "flush", l)); */

  	if (save) {
  #ifdef DIAGNOSTIC
 @@ -720,12 +723,12 @@
  	 * We set the TS bit in the saved CR0 for this process, so that it
  	 * will get a DNA exception on any FPU instruction and force a reload.
  	 */
 -	l->l_addr->u_pcb.pcb_cr0 |= CR0_TS;
 +	//l->l_addr->u_pcb.pcb_cr0 |= CR0_TS;

  	s = splipi();
  	l->l_addr->u_pcb.pcb_fpcpu = NULL;
  	ci->ci_fpcurlwp = NULL;
 -	ci->ci_fpused = 1;
 +	//ci->ci_fpused = 1;
  	splx(s);
  }


 --T4sUOijqQbZv57TR--

From: Quentin Garnier <cube@cubidou.net>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: Paul Ripke <stix@stix.id.au>, gnats-bugs@NetBSD.org,
	port-xen@NetBSD.org
Subject: Re: port-xen/30977: FPU troubles
Date: Fri, 9 Dec 2005 18:45:50 +0100

 --GeDkoc8jIzHasOdk
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 Content-Transfer-Encoding: quoted-printable

 On Wed, Nov 23, 2005 at 01:47:04PM +0100, Manuel Bouyer wrote:
 > On Wed, Nov 23, 2005 at 08:15:04AM +1100, Paul Ripke wrote:
 > > I've finally got around to trying xen (NetBSD-3.0BETA, xen 2.0.7) and
 > > have run into this PR in DOM0.
 > >=20
 > > First sign was xearth coredumping on login. Next was only "clicking"
 > > from xmms, and I've had xpdf core, too.
 > >=20
 > > A little debugging shows that the FPU state is not being correctly
 > > saved or restored during "some" context switches. The following code
 > > appears to exhibit the behaviour reliably:
 [...]
 > Index: i386/machdep.c
 > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

 ogg123 is happier with this, but still not perfect.

 gkrellm, OTOH, is really not happy with it.

 --=20
 Quentin Garnier - cube@cubidou.net - cube@NetBSD.org
 "When I find the controls, I'll go where I like, I'll know where I want
 to be, but maybe for now I'll stay right here on a silent sea."
 KT Tunstall, Silent Sea, Eye to the Telescope, 2004.

 --GeDkoc8jIzHasOdk
 Content-Type: application/pgp-signature
 Content-Disposition: inline

 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.2.6 (NetBSD)

 iQEVAwUBQ5nCztgoQloHrPnoAQIRcwgAgf4dyJO36mhuYK4cdur4fn6glk9sbvyC
 nSimx59JQZw/5xgdZriEJrcqHzxg73WqHdL23+2qyF+yAYI3xOlU6J6lCiDuDtYN
 NEL/8CFvAvEyc2aVM5dMmLNF+8SqXvYEfi+kEl1rWWm/nw1vwc4qmdkHcfqxUb5R
 fD7Z/JR4PeqltW/pKxEZ9O0oJOqSLoCXUdV+phHnPQoeeipfHYxP95uk+GaVzkwj
 DfTYY7vxyiQ4iD8Gs4axTSmX/3vHkP9CjG1HUuSj2k9nP712lOcAzoGwqysjuSf0
 ds0sny9ECmnR0x+J7VTXxVRQjuu9VI4bHNmBSeT3OceMPMXcujfO0g==
 =dF8G
 -----END PGP SIGNATURE-----

 --GeDkoc8jIzHasOdk--

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Quentin Garnier <cube@cubidou.net>
Cc: Paul Ripke <stix@stix.id.au>, gnats-bugs@NetBSD.org,
	port-xen@NetBSD.org
Subject: Re: port-xen/30977: FPU troubles
Date: Fri, 9 Dec 2005 21:21:31 +0100

 On Fri, Dec 09, 2005 at 06:45:50PM +0100, Quentin Garnier wrote:
 > On Wed, Nov 23, 2005 at 01:47:04PM +0100, Manuel Bouyer wrote:
 > > On Wed, Nov 23, 2005 at 08:15:04AM +1100, Paul Ripke wrote:
 > > > I've finally got around to trying xen (NetBSD-3.0BETA, xen 2.0.7) and
 > > > have run into this PR in DOM0.
 > > > 
 > > > First sign was xearth coredumping on login. Next was only "clicking"
 > > > from xmms, and I've had xpdf core, too.
 > > > 
 > > > A little debugging shows that the FPU state is not being correctly
 > > > saved or restored during "some" context switches. The following code
 > > > appears to exhibit the behaviour reliably:
 > [...]
 > > Index: i386/machdep.c
 > > ===================================================================
 > 
 > ogg123 is happier with this, but still not perfect.

 OK, this may be related to the flaw paranoia is still reporting.

 > gkrellm, OTOH, is really not happy with it.

 Is this a threaded program ?

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Paul Ripke <stix@stix.id.au>
To: Manuel Bouyer <bouyer@antioche.eu.org>,
	Quentin Garnier <cube@cubidou.net>
Cc: gnats-bugs@NetBSD.org, port-xen@NetBSD.org
Subject: Re: port-xen/30977: FPU troubles
Date: Mon, 19 Dec 2005 09:34:01 +1100

 OK, I've had a closer look at this, and come up with the included patch,
 which appears to work fine for me, with threaded and non-threaded 
 programs.
 I can't repeat the behaviour of the FPU state being not being restored,
 restored to the wrong thread, or wiped.

 Thoughts behind the patch:
 - since clts() is a no-op under xen 2 (in xen 3, 
 HYPERVISOR_fpu_taskswitch
    takes an int parameter allowing set/clear), it makes no sense to 
 stts()
    inside the DNA handler. We can't clear it again.
 - I don't get the npxsave_lwp() calls in the DNA handler. On SMP, it's
    used to dump an LWP's FPU state from a different CPU. On a 
 uniprocessor,
    the FPU state has already been dumped by npxsave_cpu(). SMP support 
 isn't
    relevant for xen 2.

 Anyway, please test the patch and see how it goes. I note that my sound
 (SBlive) is still crook, looks like another bug.

 Index: sys/arch/xen/i386/machdep.c
 ===================================================================
 RCS file: /export/netbsd/cvsroot/src/sys/arch/xen/i386/machdep.c,v
 retrieving revision 1.13.2.3
 diff -u -d -b -w -r1.13.2.3 machdep.c
 --- sys/arch/xen/i386/machdep.c	25 Aug 2005 20:16:21 -0000	1.13.2.3
 +++ sys/arch/xen/i386/machdep.c	18 Dec 2005 08:00:53 -0000
 @@ -1,4 +1,4 @@
 -/*	$NetBSD$	*/
 +/*	$NetBSD: machdep.c,v 1.13.2.3 2005/08/25 20:16:21 tron Exp $	*/
   /*	NetBSD: machdep.c,v 1.559 2004/07/22 15:12:46 mycroft Exp 	*/

   /*-
 @@ -73,7 +73,7 @@
    */

   #include <sys/cdefs.h>
 -__KERNEL_RCSID(0, "$NetBSD$");
 +__KERNEL_RCSID(0, "$NetBSD: machdep.c,v 1.13.2.3 2005/08/25 20:16:21 
 tron Exp $");

   #include "opt_beep.h"
   #include "opt_compat_ibcs2.h"
 @@ -1174,8 +1174,10 @@

   #if NNPX > 0
   	/* If we were using the FPU, forget about it. */
 -	if (l->l_addr->u_pcb.pcb_fpcpu != NULL)
 +	if (l->l_addr->u_pcb.pcb_fpcpu != NULL) {
   		npxsave_lwp(l, 0);
 +		HYPERVISOR_fpu_taskswitch();
 +	}
   #endif

   #ifdef USER_LDT
 @@ -2335,6 +2337,7 @@
   		 */
   		if (l->l_addr->u_pcb.pcb_fpcpu) {
   			npxsave_lwp(l, 1);
 +			HYPERVISOR_fpu_taskswitch();
   		}
   #endif
   		if (i386_use_fxsave) {
 @@ -2418,8 +2421,10 @@
   		/*
   		 * If we were using the FPU, forget that we were.
   		 */
 -		if (l->l_addr->u_pcb.pcb_fpcpu != NULL)
 +		if (l->l_addr->u_pcb.pcb_fpcpu != NULL) {
   			npxsave_lwp(l, 0);
 +			HYPERVISOR_fpu_taskswitch();
 +		}
   #endif
   		if (flags & _UC_FXSAVE) {
   			if (i386_use_fxsave) {
 Index: sys/arch/xen/i386/npx.c
 ===================================================================
 RCS file: /export/netbsd/cvsroot/src/sys/arch/xen/i386/npx.c,v
 retrieving revision 1.3.14.1
 diff -u -d -b -w -r1.3.14.1 npx.c
 --- sys/arch/xen/i386/npx.c	20 Mar 2005 14:38:21 -0000	1.3.14.1
 +++ sys/arch/xen/i386/npx.c	18 Dec 2005 21:40:44 -0000
 @@ -1,4 +1,4 @@
 -/*	$NetBSD$	*/
 +/*	$NetBSD: npx.c,v 1.3.14.1 2005/03/20 14:38:21 tron Exp $	*/
   /*	NetBSD: npx.c,v 1.103 2004/03/21 10:56:24 simonb Exp 	*/

   /*-
 @@ -68,10 +68,11 @@
    */

   #include <sys/cdefs.h>
 -__KERNEL_RCSID(0, "$NetBSD$");
 +__KERNEL_RCSID(0, "$NetBSD: npx.c,v 1.3.14.1 2005/03/20 14:38:21 tron 
 Exp $");

   #if 0
   #define IPRINTF(x)	printf x
 +#define	XXXXENDEBUG_LOW
   #else
   #define	IPRINTF(x)
   #endif
 @@ -575,12 +576,8 @@
   	KDASSERT(ci->ci_fpcurlwp == NULL);
   #ifndef MULTIPROCESSOR
   	KDASSERT(l->l_addr->u_pcb.pcb_fpcpu == NULL);
 -#else
 -	if (l->l_addr->u_pcb.pcb_fpcpu != NULL)
 -		npxsave_lwp(l, 1);
   #endif
   	l->l_addr->u_pcb.pcb_cr0 &= ~CR0_TS;
 -	clts();
   	s = splipi();
   	ci->ci_fpcurlwp = l;
   	l->l_addr->u_pcb.pcb_fpcpu = ci;
 @@ -629,11 +626,9 @@
   	if (ci->ci_fpcurlwp != NULL)
   		npxsave_cpu(ci, 1);
   	else {
 -		clts();
   		IPRINTF(("%s: fp init\n", ci->ci_dev->dv_xname));
   		fninit();
   		fwait();
 -		stts();
   	}
   	splx(s);

 @@ -641,12 +636,8 @@
   	KDASSERT(ci->ci_fpcurlwp == NULL);
   #ifndef MULTIPROCESSOR
   	KDASSERT(l->l_addr->u_pcb.pcb_fpcpu == NULL);
 -#else
 -	if (l->l_addr->u_pcb.pcb_fpcpu != NULL)
 -		npxsave_lwp(l, 1);
   #endif
   	l->l_addr->u_pcb.pcb_cr0 &= ~CR0_TS;
 -	clts();
   	s = splipi();
   	ci->ci_fpcurlwp = l;
   	l->l_addr->u_pcb.pcb_fpcpu = ci;
 @@ -710,7 +701,6 @@
   		  * which report FP failures via traps rather than irq13).
   		  * XXX punting for now..
   		  */
 -		clts();
   		ci->ci_fpsaving = 1;
   		fpu_save(&l->l_addr->u_pcb.pcb_savefpu);
   		ci->ci_fpsaving = 0;

 --
 Paul Ripke

Responsible-Changed-From-To: port-xen-maintainer->bouyer
Responsible-Changed-By: bouyer@netbsd.org
Responsible-Changed-When: Tue, 03 Jan 2006 20:23:52 +0000
Responsible-Changed-Why:
I commited a variant of Paul's patch.


State-Changed-From-To: open->feedback
State-Changed-By: bouyer@netbsd.org
State-Changed-When: Tue, 03 Jan 2006 20:23:52 +0000
State-Changed-Why:
Hi
can you please try a current kernel with xen/i386/npx.c 1.7 and see if it's
fixed ?


From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Paul Ripke <stix@stix.id.au>
Cc: Quentin Garnier <cube@cubidou.net>, gnats-bugs@NetBSD.org,
	port-xen@NetBSD.org
Subject: Re: port-xen/30977: FPU troubles
Date: Tue, 3 Jan 2006 21:21:52 +0100

 On Mon, Dec 19, 2005 at 09:34:01AM +1100, Paul Ripke wrote:
 > OK, I've had a closer look at this, and come up with the included patch,
 > which appears to work fine for me, with threaded and non-threaded 
 > programs.
 > I can't repeat the behaviour of the FPU state being not being restored,
 > restored to the wrong thread, or wiped.

 This is great, many thanks ! I commited the attached patch, which
 calls HYPERVISOR_fpu_taskswitch() at the end of npxsave_lwp() instead of the
 various places in which call npxsave_lwp() in the code.

 > 
 > Thoughts behind the patch:
 > - since clts() is a no-op under xen 2 (in xen 3, 
 > HYPERVISOR_fpu_taskswitch
 >   takes an int parameter allowing set/clear), it makes no sense to 
 > stts()
 >   inside the DNA handler. We can't clear it again.
 > - I don't get the npxsave_lwp() calls in the DNA handler. On SMP, it's
 >   used to dump an LWP's FPU state from a different CPU. On a 
 > uniprocessor,
 >   the FPU state has already been dumped by npxsave_cpu(). SMP support 
 > isn't
 >   relevant for xen 2.

 Sure, but as it's #ifdef MULTIPROCESSOR, there's no problem keeping
 it here. The goal is to eventually be able to share more code with i386.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Paul Ripke <stix@stix.id.au>
Cc: Quentin Garnier <cube@cubidou.net>, gnats-bugs@NetBSD.org,
	port-xen@NetBSD.org
Subject: Re: port-xen/30977: FPU troubles
Date: Tue, 3 Jan 2006 21:23:43 +0100

 --+HP7ph2BbKc20aGI
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 On Tue, Jan 03, 2006 at 09:21:52PM +0100, Manuel Bouyer wrote:
 > On Mon, Dec 19, 2005 at 09:34:01AM +1100, Paul Ripke wrote:
 > > OK, I've had a closer look at this, and come up with the included patch,
 > > which appears to work fine for me, with threaded and non-threaded 
 > > programs.
 > > I can't repeat the behaviour of the FPU state being not being restored,
 > > restored to the wrong thread, or wiped.
 > 
 > This is great, many thanks ! I commited the attached patch, which

  here is the patch

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

 --+HP7ph2BbKc20aGI
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="xen.diff"

 Index: npx.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/xen/i386/npx.c,v
 retrieving revision 1.6
 diff -u -r1.6 npx.c
 --- i386/npx.c	24 Dec 2005 20:07:48 -0000	1.6
 +++ i386/npx.c	3 Jan 2006 19:09:33 -0000
 @@ -146,7 +146,7 @@
  #define	stts()			lcr0(rcr0() | CR0_TS)
  #else
  #define	clts()
 -#define	stts()			HYPERVISOR_fpu_taskswitch()
 +#define	stts()
  #endif

  int npxdna(struct cpu_info *);
 @@ -749,7 +749,7 @@

  	oci = l->l_addr->u_pcb.pcb_fpcpu;
  	if (oci == NULL)
 -		return;
 +		goto end;

  	IPRINTF(("%s: fp %s lwp %p\n", ci->ci_dev->dv_xname,
  	    save? "save" : "flush", l));
 @@ -792,4 +792,6 @@
  	KASSERT(ci->ci_fpcurlwp == l);
  	npxsave_cpu(ci, save);
  #endif
 +end:
 +	HYPERVISOR_fpu_taskswitch();
  }

 --+HP7ph2BbKc20aGI--

State-Changed-From-To: feedback->closed
State-Changed-By: cube@netbsd.org
State-Changed-When: Sun, 05 Mar 2006 12:28:37 +0000
State-Changed-Why:
Problem is fixed, now.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.