NetBSD Problem Report #52966
From mlelstv@serpens.de Tue Jan 30 17:51:11 2018
Return-Path: <mlelstv@serpens.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id D5D567A16F
for <gnats-bugs@gnats.NetBSD.org>; Tue, 30 Jan 2018 17:51:11 +0000 (UTC)
Message-Id: <201801301750.w0UHoox5012906@serpens.de>
Date: Tue, 30 Jan 2018 18:50:52 +0100 (MET)
From: mlelstv@serpens.de
Reply-To: mlelstv@serpens.de
To: gnats-bugs@NetBSD.org
Subject: amd64 FPU handling broken on AMD
X-Send-Pr-Version: 3.95
>Number: 52966
>Category: port-amd64
>Synopsis: amd64 FPU handling broken on AMD
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: port-amd64-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Jan 30 17:55:00 +0000 2018
>Closed-Date: Tue Oct 29 12:44:38 +0000 2019
>Last-Modified: Tue Oct 29 12:44:38 +0000 2019
>Originator: Michael van Elst
>Release: NetBSD 8.99.12
>Organization:
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
>Environment:
System: NetBSD slowpoke 8.99.12 NetBSD 8.99.12 (SLOWPOKE) #19: Tue Jan 30 13:57:16 CET 2018 mlelstv@gossam:/home/netbsd-current/obj.amd64/home/netbsd-current/src/sys/arch/amd64/compile/SLOWPOKE amd64
Architecture: x86_64
Machine: amd64
>Description:
A version of the stream benchmark fails on AMD Ryzen CPUs. The benchmark
does multi-threaded floating point operations (using OpenMP) for testing
memory bandwidth and also validates the result by comparing it with a scalar
computation. While the benchmark itself runs fine, the validation fails if
multiple threads are used. With more than 4 threads it fails almost always,
with fewer threads it sometimes succeeds, and with a single thread it
succeeds.
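To illustrate the kind of check described above, here is a minimal sketch of
a threaded kernel plus a scalar validation pass. It is not the code from
stream.c; the function names triad() and count_mismatches() and the
tolerance parameter are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * STREAM-style "triad" kernel: a[i] = b[i] + q * c[i].
 * When built with -fopenmp the loop is split across threads; without it
 * the pragma is ignored and the loop runs on a single thread.
 */
void
triad(double *a, const double *b, const double *c, double q, size_t n)
{
	assert(a != NULL && b != NULL && c != NULL);

	#pragma omp parallel for
	for (long i = 0; i < (long)n; i++)
		a[i] = b[i] + q * c[i];
}

/*
 * Validation pass: count the elements of a[] whose relative deviation
 * from the scalar reference value exceeds eps. A corrupted FPU or SIMD
 * register in any thread would show up here as a nonzero count.
 */
size_t
count_mismatches(const double *a, size_t n, double expected, double eps)
{
	size_t bad = 0;
	double limit = expected < 0 ? -expected * eps : expected * eps;

	for (size_t i = 0; i < n; i++) {
		double d = a[i] - expected;
		if (d < 0)
			d = -d;
		if (d > limit)
			bad++;
	}
	return bad;
}
```

If every element of b[] and c[] holds the same value, the reference is a
single scalar expression (b + q * c), so any nonzero count points at a
wrong result in one of the threads rather than at the arithmetic itself.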
>How-To-Repeat:
Get source from
http://ftp.netbsd.org/pub/NetBSD/misc/mlelstv/stream.c
which has been slightly adjusted from the original to compile without -lnuma,
and compile with:
gcc -O3 -std=c99 -fopenmp -DNON_NUMA -DN=80000000 -DNTIMES=100 stream.c -o stream
and let it run. With that value of N you need about 1.8GB RAM.
When the validation succeeds the program reports "Solution validates",
otherwise it reports the error. On the Ryzen system the errors are somewhat
random.
The same machine runs the benchmark fine with the latest netbsd-8 kernel,
which predates XSAVEOPT support.
>Fix:
A workaround suggested by maxv@ is to disable the use of XSAVEOPT by
commenting out:
	if (descs[0] & CPUID_PES1_XSAVEOPT)
		x86_fpu_save = FPU_SAVE_XSAVEOPT;
in sys/arch/x86/x86/identcpu.c. The kernel then falls back to use XSAVE
to save and restore the FPU registers.
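The effect of the workaround can be modelled outside the kernel. The sketch
below is a stand-alone illustration, not the identcpu.c code: the names
prefixed with sketch_/SKETCH_ and the allow_xsaveopt switch are hypothetical.
The only hardware fact it relies on is that XSAVEOPT support is reported in
bit 0 of EAX for CPUID leaf 0Dh, sub-leaf 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPUID.(EAX=0Dh,ECX=1):EAX bit 0 reports XSAVEOPT support. */
#define SKETCH_PES1_XSAVEOPT	0x00000001u

/* Hypothetical stand-ins for the kernel's save-method constants. */
enum sketch_fpu_save {
	SKETCH_SAVE_XSAVE,
	SKETCH_SAVE_XSAVEOPT
};

/*
 * Pick the FPU save method from the CPUID leaf 0Dh/sub-leaf 1 registers
 * (descs[0] is EAX). Passing allow_xsaveopt == 0 models the workaround:
 * the feature bit is ignored and plain XSAVE is used.
 */
enum sketch_fpu_save
sketch_pick_fpu_save(const uint32_t descs[4], int allow_xsaveopt)
{
	assert(descs != NULL);

	if (allow_xsaveopt && (descs[0] & SKETCH_PES1_XSAVEOPT))
		return SKETCH_SAVE_XSAVEOPT;
	return SKETCH_SAVE_XSAVE;
}
```

With the switch off, a CPU that advertises XSAVEOPT is treated exactly like
one that does not, which is the fallback behaviour the workaround aims for.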
>Release-Note:
>Audit-Trail:
From: "Maya Rashish" <maya@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52966 CVS commit: src/sys/arch/x86/x86
Date: Wed, 7 Feb 2018 22:49:32 +0000
Module Name: src
Committed By: maya
Date: Wed Feb 7 22:49:32 UTC 2018
Modified Files:
src/sys/arch/x86/x86: identcpu.c
Log Message:
stopgap fix: restrict XSAVEOPT to Intel CPUs
The current code causes floating point miscalculations on AMD Ryzen.
PR port-amd64/52966: amd64 FPU handling broken on AMD
To generate a diff of this commit:
cvs rdiff -u -r1.67 -r1.68 src/sys/arch/x86/x86/identcpu.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Maxime Villard" <maxv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52966 CVS commit: src/sys/arch/x86/x86
Date: Fri, 9 Feb 2018 18:45:55 +0000
Module Name: src
Committed By: maxv
Date: Fri Feb 9 18:45:55 UTC 2018
Modified Files:
src/sys/arch/x86/x86: identcpu.c
Log Message:
Disable XSAVEOPT, until it is clear what's wrong with it (PR/52966).
To generate a diff of this commit:
cvs rdiff -u -r1.68 -r1.69 src/sys/arch/x86/x86/identcpu.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Maxime Villard" <maxv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52966 CVS commit: src/sys/arch
Date: Thu, 14 Jun 2018 14:36:46 +0000
Module Name: src
Committed By: maxv
Date: Thu Jun 14 14:36:46 UTC 2018
Modified Files:
src/sys/arch/amd64/amd64: locore.S
src/sys/arch/x86/include: cpu.h fpu.h
src/sys/arch/x86/x86: fpu.c x86_machdep.c
Log Message:
Add some code to support eager fpu switch, INTEL-SA-00145. We restore the
FPU state of the lwp right away during context switches. This guarantees
that when the CPU executes in userland, the FPU doesn't contain secrets.
Maybe we also need to clear the FPU in setregs(), not sure about this one.
Can be enabled/disabled via:
machdep.fpu_eager = {0/1}
Not yet turned on automatically on affected CPUs (Intel Family 6).
More generally it would be good to turn it on automatically when XSAVEOPT
is supported, because in this case there is probably a non-negligible
performance gain; but we need to fix PR/52966.
To generate a diff of this commit:
cvs rdiff -u -r1.165 -r1.166 src/sys/arch/amd64/amd64/locore.S
cvs rdiff -u -r1.91 -r1.92 src/sys/arch/x86/include/cpu.h
cvs rdiff -u -r1.8 -r1.9 src/sys/arch/x86/include/fpu.h
cvs rdiff -u -r1.32 -r1.33 src/sys/arch/x86/x86/fpu.c
cvs rdiff -u -r1.115 -r1.116 src/sys/arch/x86/x86/x86_machdep.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: maxv@NetBSD.org
State-Changed-When: Mon, 06 Aug 2018 13:06:58 +0000
State-Changed-Why:
I don't see where the problem is. I had added XSAVEOPT after following the
Intel spec, and everything worked fine back then. Today it still works
fine on my Intel hardware, with and without EagerFPU (I have tested your
"stream" program, repeatedly, I always get "solution validates").
It may be a Ryzen CPU bug. So can you:
* Update the BIOS of your motherboard. Make sure you have the latest
version. Then re-test. If there's still a problem:
* Disable SMT in your BIOS, and re-test. Maybe it is an SMT-related
problem.
Once this is done, and if there is still an issue, we'll have to dig
deeper.
From: coypu@sdf.org
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: PR/52966 CVS commit: src/sys/arch/x86/x86
Date: Sat, 26 Oct 2019 07:51:00 +0000
with:
AMD Ryzen 7 2700X Eight-Core Processor
NetBSD 9.99.17
the solution validates.
But I don't know if it failed before in the same setup. It'd be
interesting to hear whether mlelstv's machine still fails.
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: PR/52966 CVS commit: src/sys/arch/x86/x86
Date: Sat, 26 Oct 2019 08:27:02 -0000 (UTC)
coypu@sdf.org writes:
>The following reply was made to PR port-amd64/52966; it has been noted by GNATS.
>From: coypu@sdf.org
>To: gnats-bugs@netbsd.org
>Cc:
>Subject: Re: PR/52966 CVS commit: src/sys/arch/x86/x86
>Date: Sat, 26 Oct 2019 07:51:00 +0000
> with:
> AMD Ryzen 7 2700X Eight-Core Processor
> NetBSD 9.99.17
> the solution validates.
>
> But I don't know if it failed before in the same setup. It'd be
> interesting to hear whether mlelstv's machine still fails.
>
-current as of today works. But that's probably because using
XSAVEOPT is still #ifdef'd out in identcpu.c.
--
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: Lars Reichardt <lars@paradoxon.info>
To: gnats-bugs@netbsd.org, port-amd64-maintainer@netbsd.org,
netbsd-bugs@netbsd.org, mlelstv@serpens.de, coypu@sdf.org
Cc:
Subject: Re: PR/52966 CVS commit: src/sys/arch/x86/x86
Date: Sat, 26 Oct 2019 15:37:26 +0200
I've checked both variants XSAVEOPT and XSAVE on my AMD Ryzen 2700 month
ago and both worked correctly.
I can recheck.
--
You will continue to suffer
if you have an emotional reaction to everything that is said to you.
True power is sitting back and observing everything with logic.
If words control you that means everyone else can control you.
Breathe and allow things to pass.
--- Bruce Lee
State-Changed-From-To: feedback->closed
State-Changed-By: maxv@NetBSD.org
State-Changed-When: Tue, 29 Oct 2019 12:44:38 +0000
State-Changed-Why:
Close this PR. The code has changed a lot, we've now dropped LazyFPU, the
code is a lot simpler, and I've re-enabled XSAVEOPT after several days of
testing with NVMM and also your stream.c code.
>Unformatted:
$NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.