NetBSD Problem Report #53399

From www@NetBSD.org  Tue Jun 26 18:08:09 2018
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id C8F6A7A157
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 26 Jun 2018 18:08:09 +0000 (UTC)
Message-Id: <20180626180808.5D7DF7A280@mollari.NetBSD.org>
Date: Tue, 26 Jun 2018 18:08:08 +0000 (UTC)
From: phil@netbsd.org
Reply-To: phil@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: Assert failure in ch_voltag_convert_in called from fpu_eagerswitch()
X-Send-Pr-Version: www-1.0

>Number:         53399
>Category:       kern
>Synopsis:       Assert failure: fpu_eagerswitch()
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    maxv
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jun 26 18:10:00 +0000 2018
>Closed-Date:    Tue Jul 24 08:35:39 +0000 2018
>Last-Modified:  Tue Jul 24 08:35:39 +0000 2018
>Originator:     Phil Nelson
>Release:        8.99.20
>Organization:
>Environment:
NetBSD steelhead.pcnelson.net 8.99.20 NetBSD 8.99.20 (STEELHEAD) #1: Tue Jun 26 09:29:27 PDT 2018  phil@steelhead.pcnelson.net:/home/phil/netbsd/src/sys/arch/amd64/compile/STEELHEAD amd64

Machine is a Dell OptiPlex 3020, quad-core Intel(R) Core(TM) i5-4590, 8GB of memory
>Description:
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] panic: kernel diagnostic assertion "pcb->pcb_fpcpu == NULL" failed: file "/usr/src/sys/arch/x86/x86/fpu.c", line 345 
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] cpu2: Begin traceback...
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] vpanic() at netbsd:vpanic+0x16f
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] fpu_eagerswitch() at netbsd:fpu_eagerswitch+0xd5
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] cpu_switchto() at netbsd:cpu_switchto+0x56
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] sleepq_block() at netbsd:sleepq_block+0xaa
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] cv_wait() at netbsd:cv_wait+0xfb
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] fork1() at netbsd:fork1+0x87e
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] sys___vfork14() at netbsd:sys___vfork14+0x2c
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] syscall() at netbsd:syscall+0x208
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] --- syscall (number 282) ---
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] 70da5228adec:
Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] cpu2: End traceback...
Jun 25 14:57:42 steelhead savecore: reboot after panic: [ 21411.5077674] panic: kernel diagnostic assertion "pcb->pcb_fpcpu == NULL" failed: file "/usr/src/sys/arch/x86/x86/fpu.c", line 345 

>How-To-Repeat:
I was building packages ... The panic has occurred consistently while building packages.
>Fix:

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->maxv
Responsible-Changed-By: kamil@NetBSD.org
Responsible-Changed-When: Tue, 26 Jun 2018 20:22:56 +0200
Responsible-Changed-Why:
Assign <maxv> as he is the author of the change.


From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53399: Assert failure in ch_voltag_convert_in called from
 fpu_eagerswitch()
Date: Tue, 26 Jun 2018 20:23:39 +0200

 On 26.06.2018 20:10, phil@netbsd.org wrote:
 plex 3020, quad core Intel(R) Core(TM) i5-4590, 8GB of memory
 >> Description:
 > Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] panic: kernel diagnostic assertion "pcb->pcb_fpcpu == NULL" failed: file "/usr/src/sys/arch/x86/x86/fpu.c", line 345
 > Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] cpu2: Begin traceback...
 > Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] vpanic() at netbsd:vpanic+0x16f
 > Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
 > Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] fpu_eagerswitch() at netbsd:fpu_eagerswitch+0xd5
 > Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] cpu_switchto() at netbsd:cpu_switchto+0x56
 > Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] sleepq_block() at netbsd:sleepq_block+0xaa
 > Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] cv_wait() at netbsd:cv_wait+0xfb
 > Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] fork1() at netbsd:fork1+0x87e
 > Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] sys___vfork14() at netbsd:sys___vfork14+0x2c
 > Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] syscall() at netbsd:syscall+0x208
 > Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] --- syscall (number 282) ---
 > Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] 70da5228adec:
 > Jun 25 14:57:29 steelhead /netbsd: [ 21411.5077674] cpu2: End traceback...
 > Jun 25 14:57:42 steelhead savecore: reboot after panic: [ 21411.5077674] panic: kernel diagnostic assertion "pcb->pcb_fpcpu == NULL" failed: file "/usr/src/sys/arch/x86/x86/fpu.c", line 345
 >


 I can sometimes reproduce the same panic while building the release.

 I'm running a kernel with all the x86 FPU kernel patches available as of
 today.



From: "Maxime Villard" <maxv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53399 CVS commit: src/sys/arch
Date: Fri, 29 Jun 2018 19:21:43 +0000

 Module Name:	src
 Committed By:	maxv
 Date:		Fri Jun 29 19:21:43 UTC 2018

 Modified Files:
 	src/sys/arch/amd64/amd64: locore.S
 	src/sys/arch/i386/i386: locore.S

 Log Message:
 Call fpu_eagerswitch a little later, after we make sure newlwp is not
 pinned.

 Because if it is, the fpu state of the lwp we are context-switching to
 is already installed on the current cpu, so no point re-installing it.
 Or, it isn't, and in this case we don't want to install it.

 This wrong re-installation can occur when we leave a softint.

 It may fix bugs in places that call fpusave_lwp with spl != IPL_HIGH,
 and that expect the fpu state to stay in memory. As far as I can tell
 only cpu_lwp_free meets these conditions, and as far as I can tell
 again, there it's harmless.

 Should help PR/53399.


 To generate a diff of this commit:
 cvs rdiff -u -r1.166 -r1.167 src/sys/arch/amd64/amd64/locore.S
 cvs rdiff -u -r1.157 -r1.158 src/sys/arch/i386/i386/locore.S

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
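
The ordering change described in the log message above can be illustrated
with a small user-space model. This is only a sketch under stated
assumptions, not the code in locore.S or fpu.c: the structures, the field
names (fpcpu standing in for pcb_fpcpu, pinned, installs) and the functions
model_eagerswitch, switch_old_order and switch_new_order are all
hypothetical. It shows that checking the pinned flag first avoids installing
FPU state for a pinned lwp, and leaves state that was saved to memory where
it is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel structures (hypothetical, model only). */
struct cpu;

struct lwp {
	struct cpu *fpcpu;	/* CPU holding this lwp's FPU state, or NULL */
	bool pinned;		/* true if interrupted by a softint on this CPU */
};

struct cpu {
	struct lwp *fpcurlwp;	/* lwp whose FPU state is live on this CPU */
	int installs;		/* number of FPU state installs performed */
};

/* Model of an eager switch: save the live state, then install newlwp's. */
static void
model_eagerswitch(struct cpu *ci, struct lwp *newlwp)
{
	if (ci->fpcurlwp != NULL) {
		ci->fpcurlwp->fpcpu = NULL;	/* state now lives in memory */
		ci->fpcurlwp = NULL;
	}
	/* Mirrors the spirit of the failing assertion in the report. */
	assert(newlwp->fpcpu == NULL);
	newlwp->fpcpu = ci;
	ci->fpcurlwp = newlwp;
	ci->installs++;
}

/* Assumed old ordering: the eager switch runs before the pinned check. */
static int
switch_old_order(struct cpu *ci, struct lwp *newlwp)
{
	ci->installs = 0;
	model_eagerswitch(ci, newlwp);	/* runs even when newlwp is pinned */
	return ci->installs;
}

/* Assumed new ordering: a pinned lwp skips the eager switch entirely. */
static int
switch_new_order(struct cpu *ci, struct lwp *newlwp)
{
	ci->installs = 0;
	if (!newlwp->pinned)
		model_eagerswitch(ci, newlwp);
	return ci->installs;
}
```

In this model, switching to a pinned lwp whose state is in memory performs
one install with the old ordering and none with the new one, while an
unpinned lwp is handled identically by both.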

From: "Maxime Villard" <maxv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53399 CVS commit: src/sys/arch/x86/x86
Date: Fri, 29 Jun 2018 19:34:35 +0000

 Module Name:	src
 Committed By:	maxv
 Date:		Fri Jun 29 19:34:35 UTC 2018

 Modified Files:
 	src/sys/arch/x86/x86: fpu.c

 Log Message:
 Add more KASSERTs.

 Should help PR/53399.


 To generate a diff of this commit:
 cvs rdiff -u -r1.43 -r1.44 src/sys/arch/x86/x86/fpu.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: maxv@NetBSD.org
State-Changed-When: Fri, 29 Jun 2018 19:40:24 +0000
State-Changed-Why:
I've committed a few changes; tell me whether the new KASSERTs get triggered
or whether the problem is fixed. I don't have time to handle this right now,
but I will in ~2 weeks. If the problem is still there, you can temporarily
work around it by typing "sysctl -w machdep.fpu_eager=0".

Also changed the title.


State-Changed-From-To: feedback->closed
State-Changed-By: maxv@NetBSD.org
State-Changed-When: Tue, 24 Jul 2018 08:35:39 +0000
State-Changed-Why:
Fixed.


>Unformatted:

Copyright © 1994-2017 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.