NetBSD Problem Report #53399
From www@NetBSD.org Tue Jun 26 18:08:09 2018
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id C8F6A7A157
for <gnats-bugs@gnats.NetBSD.org>; Tue, 26 Jun 2018 18:08:09 +0000 (UTC)
Message-Id: <20180626180808.5D7DF7A280@mollari.NetBSD.org>
Date: Tue, 26 Jun 2018 18:08:08 +0000 (UTC)
From: phil@netbsd.org
Reply-To: phil@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: Assert failure in ch_voltag_convert_in called from fpu_eagerswitch()
X-Send-Pr-Version: www-1.0
>Number: 53399
>Category: kern
>Synopsis: Assert failure: fpu_eagerswitch()
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: maxv
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Jun 26 18:10:00 +0000 2018
>Closed-Date: Tue Jul 24 08:35:39 +0000 2018
>Last-Modified: Tue Jul 24 08:35:39 +0000 2018
>Originator: Phil Nelson
>Release: 8.99.20
>Organization:
>Environment:
NetBSD steelhead.pcnelson.net 8.99.20 NetBSD 8.99.20 (STEELHEAD) #1: Tue Jun 26 09:29:27 PDT 2018 phil@steelhead.pcnelson.net:/home/phil/netbsd/src/sys/arch/amd64/compile/STEELHEAD amd64
Machine is a Dell OptiPlex 3020, quad-core Intel(R) Core(TM) i5-4590, 8 GB of memory
>Description:
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] panic: kernel diagnostic assertion "pcb->pcb_fpcpu == NULL" failed: file "/usr/src/sys/arch/x86/x86/fpu.c", line 345
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] cpu2: Begin traceback...
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] vpanic() at netbsd:vpanic+0x16f
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] fpu_eagerswitch() at netbsd:fpu_eagerswitch+0xd5
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] cpu_switchto() at netbsd:cpu_switchto+0x56
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] sleepq_block() at netbsd:sleepq_block+0xaa
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] cv_wait() at netbsd:cv_wait+0xfb
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] fork1() at netbsd:fork1+0x87e
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] sys___vfork14() at netbsd:sys___vfork14+0x2c
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] syscall() at netbsd:syscall+0x208
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] --- syscall (number 282) ---
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] 70da5228adec:
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] cpu2: End traceback...
Jun 25 14:57:42 steelhead savecore: reboot after panic: [ 21411.5077674] panic: kernel diagnostic assertion "pcb->pcb_fpcpu == NULL" failed: file "/usr/src/sys/arch/x86/x86/fpu.c", line 345
>How-To-Repeat:
I was building packages ... the machine has panicked this way consistently while building packages.
>Fix:
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->maxv
Responsible-Changed-By: kamil@NetBSD.org
Responsible-Changed-When: Tue, 26 Jun 2018 20:22:56 +0200
Responsible-Changed-Why:
Assign <maxv> as he is the author of the change.
From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53399: Assert failure in ch_voltag_convert_in called from
fpu_eagerswitch()
Date: Tue, 26 Jun 2018 20:23:39 +0200
On 26.06.2018 20:10, phil@netbsd.org wrote:
> Machine is a Dell OptiPlex 3020, quad-core Intel(R) Core(TM) i5-4590, 8 GB of memory
> >Description:
> Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] panic: kernel diagnostic assertion "pcb->pcb_fpcpu == NULL" failed: file "/usr/src/sys/arch/x86/x86/fpu.c", line 345
> Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] cpu2: Begin traceback...
> Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] vpanic() at netbsd:vpanic+0x16f
> Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
> Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] fpu_eagerswitch() at netbsd:fpu_eagerswitch+0xd5
> Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] cpu_switchto() at netbsd:cpu_switchto+0x56
> Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] sleepq_block() at netbsd:sleepq_block+0xaa
> Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] cv_wait() at netbsd:cv_wait+0xfb
> Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] fork1() at netbsd:fork1+0x87e
> Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] sys___vfork14() at netbsd:sys___vfork14+0x2c
> Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] syscall() at netbsd:syscall+0x208
> Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] --- syscall (number 282) ---
> Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] 70da5228adec:
> Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] cpu2: End traceback...
> Jun 25 14:57:42 steelhead savecore: reboot after panic: [ 21411.5077674] panic: kernel diagnostic assertion "pcb->pcb_fpcpu == NULL" failed: file "/usr/src/sys/arch/x86/x86/fpu.c", line 345
>
I sometimes reproduce the same panic while building the release.
I'm running a kernel with all of the x86 FPU kernel patches committed
as of today.
From: "Maxime Villard" <maxv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53399 CVS commit: src/sys/arch
Date: Fri, 29 Jun 2018 19:21:43 +0000
Module Name: src
Committed By: maxv
Date: Fri Jun 29 19:21:43 UTC 2018
Modified Files:
src/sys/arch/amd64/amd64: locore.S
src/sys/arch/i386/i386: locore.S
Log Message:
Call fpu_eagerswitch a little later, after we make sure newlwp is not
pinned.
Because if it is, the fpu state of the lwp we are context-switching to
is already installed on the current cpu, so no point re-installing it.
Or, it isn't, and in this case we don't want to install it.
This wrong re-installation can occur when we leave a softint.
It may fix bugs in places that call fpusave_lwp with spl != IPL_HIGH,
and that expect the fpu state to stay in memory. As far as I can tell
only cpu_lwp_free meets these conditions, and as far as I can tell
again, there it's harmless.
Should help PR/53399.
To generate a diff of this commit:
cvs rdiff -u -r1.166 -r1.167 src/sys/arch/amd64/amd64/locore.S
cvs rdiff -u -r1.157 -r1.158 src/sys/arch/i386/i386/locore.S
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Maxime Villard" <maxv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53399 CVS commit: src/sys/arch/x86/x86
Date: Fri, 29 Jun 2018 19:34:35 +0000
Module Name: src
Committed By: maxv
Date: Fri Jun 29 19:34:35 UTC 2018
Modified Files:
src/sys/arch/x86/x86: fpu.c
Log Message:
Add more KASSERTs.
Should help PR/53399.
To generate a diff of this commit:
cvs rdiff -u -r1.43 -r1.44 src/sys/arch/x86/x86/fpu.c
State-Changed-From-To: open->feedback
State-Changed-By: maxv@NetBSD.org
State-Changed-When: Fri, 29 Jun 2018 19:40:24 +0000
State-Changed-Why:
I've committed a few things; tell me if the new KASSERTs get triggered, or
if the problem got fixed. I don't have time to handle this right now, but
I will in ~2 weeks. If the problem is still there, you can temporarily
work around it by typing "sysctl -w machdep.fpu_eager=0".
Also changed the title.
State-Changed-From-To: feedback->closed
State-Changed-By: maxv@NetBSD.org
State-Changed-When: Tue, 24 Jul 2018 08:35:39 +0000
State-Changed-Why:
Fixed.
>Unformatted:
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.