NetBSD Problem Report #53383
From gson@gson.org Mon Jun 18 10:35:25 2018
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id BE9267A111
for <gnats-bugs@gnats.NetBSD.org>; Mon, 18 Jun 2018 10:35:25 +0000 (UTC)
Message-Id: <20180618103521.5A4A398969F@guava.gson.org>
Date: Mon, 18 Jun 2018 13:35:21 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: i386 tests panic under qemu with x86/identcpu.c 1.72
X-Send-Pr-Version: 3.95
>Number: 53383
>Category: port-i386
>Synopsis: i386 tests panic under qemu with x86/identcpu.c 1.72
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: maxv
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Jun 18 10:40:00 +0000 2018
>Closed-Date: Wed Jun 20 12:48:15 +0000 2018
>Last-Modified: Wed Jun 20 12:48:15 +0000 2018
>Originator: Andreas Gustafsson
>Release: NetBSD-current, source date >= 2018.06.17.07.13.02
>Organization:
>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:
Since the commit
2018.06.17.07.13.02 maxv src/sys/arch/x86/x86/identcpu.c 1.72
the i386 test runs on the TNF testbed are panicing, as seen in the
following console output:
net/net/t_forwarding (428/759): 6 test cases
ipforwarding_fastforward_v4: [38.869879s] Passed.
ipforwarding_fastforward_v6: [42.693007s] Passed.
ipforwarding_fragment_v4: [43.053724s] Passed.
ipforwarding_misc: [79.858593s] Passed.
ipforwarding_v4: [55.624037s] Passed.
ipforwarding_v6: [72.979567s] Passed.
[333.552765s]
[ 6727.3164359] panic: kernel diagnostic assertion "pcb->pcb_fpcpu == NULL" failed: file "/tmp/bracket/build/2018.06.17.13.12.25-i386/src/sys/arch/x86/x86/fpu.c", line 345
[ 6727.3164359] cpu0: Begin traceback...
[ 6727.3164359] vpanic(c0fc6f80,c939ad9c,c939adbc,c011e886,c0fc6f80,c0fc6ebf,c0fc8289,c0fc834c,159,c8f64000) at netbsd:vpanic+0x132
[ 6727.3164359] kern_assert(c0fc6f80,c0fc6ebf,c0fc8289,c0fc834c,159,c8f64000,c24b6020,c24b6020,c2504aa0,c939ae2c) at netbsd:kern_assert+0x23
[ 6727.3164359] fpu_eagerswitch(c24b6020,c2504aa0,0,c124dc80,0,c24b6020,c092f720,c24b6020,c2504aa0,0) at netbsd:fpu_eagerswitch+0xe0
[ 6727.3164359] cpu_switchto(c24b6020,c17c7d40,c939ae78,c1e6d920,c10850df,c1af9c80,c24b608c,c1af9c80,c17c56c0,c939ae84) at netbsd:cpu_switchto+0x7a
[ 6727.3164359] sleepq_block(0,1,c10850df,c1296770,3,c17c7d40,c1af9c7c,80,c1ccd540,c939aec0) at netbsd:sleepq_block+0x9d
[ 6727.3164359] cv_wait_sig(c1af9c80,c1ccd540,c2368c40,c08fbbc4,c1aeb51c,c1af9ca4,0,c24b61c8,0,c1e5d210) at netbsd:cv_wait_sig+0x100
[ 6727.3164359] pipe_read(c1aeb280,c1aeb280,c939aee4,c17c3a80,1,bfbac0b8,80,c939aedc,1,c1b26004) at netbsd:pipe_read+0x2ab
[ 6727.3164359] dofileread(3,c1aeb280,bfbac0b8,80,c1aeb280,1,c939af60,c24b6020,c1293f5c,c939af9c) at netbsd:dofileread+0x8a
[ 6727.3164359] sys_read(c24b6020,c939af68,c939af60,c1b26004,c9398000,c939af60,c939af68,3,0,0) at netbsd:sys_read+0x60
[ 6727.3164359] syscall() at netbsd:syscall+0x1f7
[ 6727.3164359] --- syscall (number 3) ---
[ 6727.3164359] ae1260d7:
[ 6727.3164359] cpu0: End traceback...
[ 6727.3164359] dumping to dev 0,1 offset 2680
[ 6727.3164359] dump 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 succeeded
[ 6727.3164359] rebooting...
The above was extracted from http://releng.netbsd.org/b5reports/i386/2018/2018.06.17.13.12.25/test.log .
There are more logs at http://releng.netbsd.org/b5reports/i386/commits-2018.06.html#2018.06.17.07.13.02 .
>How-To-Repeat:
Inspect the test logs.
>Fix:
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: port-i386-maintainer->maxv
Responsible-Changed-By: gson@NetBSD.org
Responsible-Changed-When: Mon, 18 Jun 2018 10:41:44 +0000
Responsible-Changed-Why:
Over to committer.
From: "Maxime Villard" <maxv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53383 CVS commit: src/sys/arch/x86/x86
Date: Mon, 18 Jun 2018 20:20:27 +0000
Module Name: src
Committed By: maxv
Date: Mon Jun 18 20:20:27 UTC 2018
Modified Files:
src/sys/arch/x86/x86: fpu.c
Log Message:
Add more KASSERTs, see if they help PR/53383.
To generate a diff of this commit:
cvs rdiff -u -r1.37 -r1.38 src/sys/arch/x86/x86/fpu.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Andreas Gustafsson <gson@gson.org>
To: maxv@NetBSD.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-i386/53383: i386 tests panic under qemu with x86/identcpu.c 1.72
Date: Tue, 19 Jun 2018 10:01:04 +0300
With x86/x86/fpu.c 1.38, the panic has moved to fpu.c line 699:
bin/sh/t_syntax (21/759): 20 test cases
a_basic_tokenisation: [3.212352s] Passed.
b_comments: [7.355291s] Passed.
c_line_wrapping: [7.315732s] Passed.
d_cstrings: [6.448371s] Passed.
f_redirects: [8.697452s] Passed.
g_variable_syntax: [210.182721s] Passed.
h_var_assign: [4.478792s] Passed.
i_pipelines: [81.822791s] Passed.
j_and_or_lists: [20.252139s] Passed.
k_lists: [29.831012s] Passed.
l_async_lists: [38.177734s] Passed.
m_compound_lists: [ 1828.7424373] panic: kernel diagnostic assertion "pcb2->pcb_fpcpu == NULL" failed: file "/tmp/bracket/build/2018.06.19.02.08.12-i386/src/sys/arch/x86/x86/fpu.c", line 699
[ 1828.7424373] cpu0: Begin traceback...
[ 1828.7424373] vpanic(c0fc6fa0,c916ada8,c916adc0,c011ee6a,c0fc6fa0,c0fc6edf,c0fc830f,c0fc8384,2bb,c8df0000) at netbsd:vpanic+0x132
[ 1828.7628038] kern_assert(c0fc6fa0,c0fc6edf,c0fc830f,c0fc8384,2bb,c8df0000,c8df02c0,c916adf4,c0157dec,c8df0000) at netbsd:kern_assert+0x23
[ 1828.7628038] fpu_save_area_fork(c8df0000,c9168000,c916ade8,c0913518,c1b2b9e4,2,c9168000,c8df0000,c1b2b800,c1b227e4) at netbsd:fpu_save_area_fork+0x79
[ 1828.7828451] cpu_lwp_fork(c2170aa0,c1b2b800,0,0,c0157646,c1b2b800,c1b2b800,c916ae78,c090f1e3,c2170aa0) at netbsd:cpu_lwp_fork+0x95
[ 1828.7828451] uvm_lwp_fork(c2170aa0,c1b2b800,0,0,c0157646,c1b2b800,c19df618,c19df168,bfb28000,1) at netbsd:uvm_lwp_fork+0x35
[ 1828.8028224] lwp_create(c2170aa0,c1b227e4,c8df0000,0,0,0,c0157646,c1b2b800,c916aed0,0) at netbsd:lwp_create+0x30a
[ 1828.8228318] fork1(c2170aa0,0,14,0,0,c0157646,0,c916af60,c916af9c,c0157577) at netbsd:fork1+0x43c
[ 1828.8228318] sys_fork(c2170aa0,c916af68,c916af60,c1fe55f0,c9168000,c916af60,c916af68,2,0,0) at netbsd:sys_fork+0x48
[ 1828.8428979] syscall() at netbsd:syscall+0x1f7
[ 1828.8428979] --- syscall (number 2) ---
[ 1828.8428979] b379cd97:
[ 1828.8428979] cpu0: End traceback...
[ 1828.8628954] dumping to dev 0,1 offset 2680
[ 1828.8628954] dump 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 succeeded
Full console logs:
http://releng.netbsd.org/b5reports/i386/2018/2018.06.18.23.50.35/test.log
http://releng.netbsd.org/b5reports/i386/2018/2018.06.19.02.08.12/test.log
--
Andreas Gustafsson, gson@gson.org
From: Maxime Villard <max@m00nbsd.net>
To: Andreas Gustafsson <gson@gson.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-i386/53383: i386 tests panic under qemu with x86/identcpu.c
1.72
Date: Tue, 19 Jun 2018 09:06:36 +0200
Le 19/06/2018 à 09:01, Andreas Gustafsson a écrit :
> With x86/x86/fpu.c 1.38, the panic has moved to fpu.c line 699:
Yes I saw, looks like there is a race in cpu_lwp_fork(), the thread could
have context-switched between fpusave_lwp() and fpu_save_area_fork(), and
in this case we're back with pcb_fpcpu!=NULL.
From: "Maxime Villard" <maxv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53383 CVS commit: src/sys/arch/x86/x86
Date: Tue, 19 Jun 2018 07:23:45 +0000
Module Name: src
Committed By: maxv
Date: Tue Jun 19 07:23:45 UTC 2018
Modified Files:
src/sys/arch/x86/x86: vm_machdep.c
Log Message:
Explicitly clear l2's pcb_fpcpu when forking.
A context switch (preemption) could occur between
fpusave_lwp(l1, true);
and
memcpy(pcb2, pcb1, sizeof(struct pcb));
In this case, l1's FPU state is re-installed on the current CPU, and
pcb1->pcb_fpcpu becomes non NULL. While it's fine to have l1's state
installed, we don't want to indicate l2's state is installed too.
With lazy fpu this was not a problem, because the context-switch would
not re-install the state, so pcb1->pcb_fpcpu was NULL.
Should fix PR/53383.
To generate a diff of this commit:
cvs rdiff -u -r1.33 -r1.34 src/sys/arch/x86/x86/vm_machdep.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Wed, 20 Jun 2018 12:48:15 +0000
State-Changed-Why:
Fixed by src/sys/arch/x86/x86/fpu.c 1.39 - thanks maxv.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.