NetBSD Problem Report #53383

From gson@gson.org  Mon Jun 18 10:35:25 2018
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id BE9267A111
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 18 Jun 2018 10:35:25 +0000 (UTC)
Message-Id: <20180618103521.5A4A398969F@guava.gson.org>
Date: Mon, 18 Jun 2018 13:35:21 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: i386 tests panic under qemu with x86/identcpu.c 1.72
X-Send-Pr-Version: 3.95

>Number:         53383
>Category:       port-i386
>Synopsis:       i386 tests panic under qemu with x86/identcpu.c 1.72
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    maxv
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jun 18 10:40:00 +0000 2018
>Closed-Date:    Wed Jun 20 12:48:15 +0000 2018
>Last-Modified:  Wed Jun 20 12:48:15 +0000 2018
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date >= 2018.06.17.07.13.02
>Organization:

>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:

Since the commit

 2018.06.17.07.13.02 maxv src/sys/arch/x86/x86/identcpu.c 1.72

the i386 test runs on the TNF testbed are panicing, as seen in the
following console output:

net/net/t_forwarding (428/759): 6 test cases
    ipforwarding_fastforward_v4: [38.869879s] Passed.
    ipforwarding_fastforward_v6: [42.693007s] Passed.
    ipforwarding_fragment_v4: [43.053724s] Passed.
    ipforwarding_misc: [79.858593s] Passed.
    ipforwarding_v4: [55.624037s] Passed.
    ipforwarding_v6: [72.979567s] Passed.
[333.552765s]

[ 6727.3164359] panic: kernel diagnostic assertion "pcb->pcb_fpcpu == NULL" failed: file "/tmp/bracket/build/2018.06.17.13.12.25-i386/src/sys/arch/x86/x86/fpu.c", line 345
[ 6727.3164359] cpu0: Begin traceback...
[ 6727.3164359] vpanic(c0fc6f80,c939ad9c,c939adbc,c011e886,c0fc6f80,c0fc6ebf,c0fc8289,c0fc834c,159,c8f64000) at netbsd:vpanic+0x132
[ 6727.3164359] kern_assert(c0fc6f80,c0fc6ebf,c0fc8289,c0fc834c,159,c8f64000,c24b6020,c24b6020,c2504aa0,c939ae2c) at netbsd:kern_assert+0x23
[ 6727.3164359] fpu_eagerswitch(c24b6020,c2504aa0,0,c124dc80,0,c24b6020,c092f720,c24b6020,c2504aa0,0) at netbsd:fpu_eagerswitch+0xe0
[ 6727.3164359] cpu_switchto(c24b6020,c17c7d40,c939ae78,c1e6d920,c10850df,c1af9c80,c24b608c,c1af9c80,c17c56c0,c939ae84) at netbsd:cpu_switchto+0x7a
[ 6727.3164359] sleepq_block(0,1,c10850df,c1296770,3,c17c7d40,c1af9c7c,80,c1ccd540,c939aec0) at netbsd:sleepq_block+0x9d
[ 6727.3164359] cv_wait_sig(c1af9c80,c1ccd540,c2368c40,c08fbbc4,c1aeb51c,c1af9ca4,0,c24b61c8,0,c1e5d210) at netbsd:cv_wait_sig+0x100
[ 6727.3164359] pipe_read(c1aeb280,c1aeb280,c939aee4,c17c3a80,1,bfbac0b8,80,c939aedc,1,c1b26004) at netbsd:pipe_read+0x2ab
[ 6727.3164359] dofileread(3,c1aeb280,bfbac0b8,80,c1aeb280,1,c939af60,c24b6020,c1293f5c,c939af9c) at netbsd:dofileread+0x8a
[ 6727.3164359] sys_read(c24b6020,c939af68,c939af60,c1b26004,c9398000,c939af60,c939af68,3,0,0) at netbsd:sys_read+0x60
[ 6727.3164359] syscall() at netbsd:syscall+0x1f7
[ 6727.3164359] --- syscall (number 3) ---
[ 6727.3164359] ae1260d7:
[ 6727.3164359] cpu0: End traceback...

[ 6727.3164359] dumping to dev 0,1 offset 2680
[ 6727.3164359] dump 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 succeeded


[ 6727.3164359] rebooting...

The above was extracted from http://releng.netbsd.org/b5reports/i386/2018/2018.06.17.13.12.25/test.log .
There are more logs at http://releng.netbsd.org/b5reports/i386/commits-2018.06.html#2018.06.17.07.13.02 .

>How-To-Repeat:

Inspect the test logs.

>Fix:

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: port-i386-maintainer->maxv
Responsible-Changed-By: gson@NetBSD.org
Responsible-Changed-When: Mon, 18 Jun 2018 10:41:44 +0000
Responsible-Changed-Why:
Over to committer.


From: "Maxime Villard" <maxv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53383 CVS commit: src/sys/arch/x86/x86
Date: Mon, 18 Jun 2018 20:20:27 +0000

 Module Name:	src
 Committed By:	maxv
 Date:		Mon Jun 18 20:20:27 UTC 2018

 Modified Files:
 	src/sys/arch/x86/x86: fpu.c

 Log Message:
 Add more KASSERTs, see if they help PR/53383.


 To generate a diff of this commit:
 cvs rdiff -u -r1.37 -r1.38 src/sys/arch/x86/x86/fpu.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Andreas Gustafsson <gson@gson.org>
To: maxv@NetBSD.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-i386/53383: i386 tests panic under qemu with x86/identcpu.c 1.72
Date: Tue, 19 Jun 2018 10:01:04 +0300

 With x86/x86/fpu.c 1.38, the panic has moved to fpu.c line 699:

 bin/sh/t_syntax (21/759): 20 test cases
     a_basic_tokenisation: [3.212352s] Passed.
     b_comments: [7.355291s] Passed.
     c_line_wrapping: [7.315732s] Passed.
     d_cstrings: [6.448371s] Passed.
     f_redirects: [8.697452s] Passed.
     g_variable_syntax: [210.182721s] Passed.
     h_var_assign: [4.478792s] Passed.
     i_pipelines: [81.822791s] Passed.
     j_and_or_lists: [20.252139s] Passed.
     k_lists: [29.831012s] Passed.
     l_async_lists: [38.177734s] Passed.
     m_compound_lists: [ 1828.7424373] panic: kernel diagnostic assertion "pcb2->pcb_fpcpu == NULL" failed: file "/tmp/bracket/build/2018.06.19.02.08.12-i386/src/sys/arch/x86/x86/fpu.c", line 699
 [ 1828.7424373] cpu0: Begin traceback...
 [ 1828.7424373] vpanic(c0fc6fa0,c916ada8,c916adc0,c011ee6a,c0fc6fa0,c0fc6edf,c0fc830f,c0fc8384,2bb,c8df0000) at netbsd:vpanic+0x132
 [ 1828.7628038] kern_assert(c0fc6fa0,c0fc6edf,c0fc830f,c0fc8384,2bb,c8df0000,c8df02c0,c916adf4,c0157dec,c8df0000) at netbsd:kern_assert+0x23
 [ 1828.7628038] fpu_save_area_fork(c8df0000,c9168000,c916ade8,c0913518,c1b2b9e4,2,c9168000,c8df0000,c1b2b800,c1b227e4) at netbsd:fpu_save_area_fork+0x79
 [ 1828.7828451] cpu_lwp_fork(c2170aa0,c1b2b800,0,0,c0157646,c1b2b800,c1b2b800,c916ae78,c090f1e3,c2170aa0) at netbsd:cpu_lwp_fork+0x95
 [ 1828.7828451] uvm_lwp_fork(c2170aa0,c1b2b800,0,0,c0157646,c1b2b800,c19df618,c19df168,bfb28000,1) at netbsd:uvm_lwp_fork+0x35
 [ 1828.8028224] lwp_create(c2170aa0,c1b227e4,c8df0000,0,0,0,c0157646,c1b2b800,c916aed0,0) at netbsd:lwp_create+0x30a
 [ 1828.8228318] fork1(c2170aa0,0,14,0,0,c0157646,0,c916af60,c916af9c,c0157577) at netbsd:fork1+0x43c
 [ 1828.8228318] sys_fork(c2170aa0,c916af68,c916af60,c1fe55f0,c9168000,c916af60,c916af68,2,0,0) at netbsd:sys_fork+0x48
 [ 1828.8428979] syscall() at netbsd:syscall+0x1f7
 [ 1828.8428979] --- syscall (number 2) ---
 [ 1828.8428979] b379cd97:
 [ 1828.8428979] cpu0: End traceback...

 [ 1828.8628954] dumping to dev 0,1 offset 2680
 [ 1828.8628954] dump 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 succeeded

 Full console logs:

 http://releng.netbsd.org/b5reports/i386/2018/2018.06.18.23.50.35/test.log
 http://releng.netbsd.org/b5reports/i386/2018/2018.06.19.02.08.12/test.log

 -- 
 Andreas Gustafsson, gson@gson.org

From: Maxime Villard <max@m00nbsd.net>
To: Andreas Gustafsson <gson@gson.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-i386/53383: i386 tests panic under qemu with x86/identcpu.c
 1.72
Date: Tue, 19 Jun 2018 09:06:36 +0200

 Le 19/06/2018 à 09:01, Andreas Gustafsson a écrit :
 > With x86/x86/fpu.c 1.38, the panic has moved to fpu.c line 699:

 Yes I saw, looks like there is a race in cpu_lwp_fork(), the thread could
 have context-switched between fpusave_lwp() and fpu_save_area_fork(), and
 in this case we're back with pcb_fpcpu!=NULL.

From: "Maxime Villard" <maxv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53383 CVS commit: src/sys/arch/x86/x86
Date: Tue, 19 Jun 2018 07:23:45 +0000

 Module Name:	src
 Committed By:	maxv
 Date:		Tue Jun 19 07:23:45 UTC 2018

 Modified Files:
 	src/sys/arch/x86/x86: vm_machdep.c

 Log Message:
 Explicitly clear l2's pcb_fpcpu when forking.

 A context switch (preemption) could occur between

 	fpusave_lwp(l1, true);
 and
 	memcpy(pcb2, pcb1, sizeof(struct pcb));

 In this case, l1's FPU state is re-installed on the current CPU, and
 pcb1->pcb_fpcpu becomes non NULL. While it's fine to have l1's state
 installed, we don't want to indicate l2's state is installed too.

 With lazy fpu this was not a problem, because the context-switch would
 not re-install the state, so pcb1->pcb_fpcpu was NULL.

 Should fix PR/53383.


 To generate a diff of this commit:
 cvs rdiff -u -r1.33 -r1.34 src/sys/arch/x86/x86/vm_machdep.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Wed, 20 Jun 2018 12:48:15 +0000
State-Changed-Why:
Fixed by src/sys/arch/x86/x86/fpu.c 1.39 - thanks maxv.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.