NetBSD Problem Report #43007
From www@NetBSD.org Fri Mar 19 07:27:14 2010
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id 346D963B873
for <gnats-bugs@gnats.NetBSD.org>; Fri, 19 Mar 2010 07:27:14 +0000 (UTC)
Message-Id: <20100319072713.E043D63B86C@www.NetBSD.org>
Date: Fri, 19 Mar 2010 07:27:13 +0000 (UTC)
From: irino@src.ricoh.co.jp
Reply-To: irino@src.ricoh.co.jp
To: gnats-bugs@NetBSD.org
Subject: kernel preemption in NMI handler (tprof shows inaccurate results on MP system)
X-Send-Pr-Version: www-1.0
>Number: 43007
>Category: port-i386
>Synopsis: kernel preemption in NMI handler (tprof shows inaccurate results on MP system)
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-i386-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Mar 19 07:30:01 +0000 2010
>Closed-Date: Sat Nov 19 02:12:29 +0000 2011
>Last-Modified: Sat Nov 19 02:12:29 +0000 2011
>Originator: IRINO yoshiaki
>Release: NetBSD-5.0.2
>Organization:
RICOH COMPANY, LTD.
>Environment:
NetBSD 5.0.2 NetBSD 5.0.2 (GENERIC_TPROF) #1: Mon Mar 15 14:58:42 JST 2010 root@:/usr/src/sys/arch/i386/compile/GENERIC_TPROF i386
>Description:
I modified tprof module (src/sys/arch/x86/x86/tprof_pmi.c) to work
on Core2Duo CPU. But the tprof offten shows inaccurate result
(large "emptybuf" value), and sometimes cause kernel assertion fail
("KASSERT((pcint & LAPIC_LVT_MASKED) != 0)").
I tried to trace this problem by using debug-ful (== with many
breakpoints) kernel. This kernel's backtrace shows a rdmsr() in
tprof_pmi_nmi() was interrupted by clock interrupt, and kernel
preemption occurred in the NMI handler context.
If the preempted context was resumed on the same CPU core later,
tprof counts emptybuf. Because the CPU's performace counter is not
reloaded for long seconds in heavy loaded system.
If the preempted context was resumed on the other CPU core, kernel
assertion may occur.
Because the other CPU's performance counter may be not yet fired.
---
fatal breakpoing trap in supervisor mode
trap type 1 code 0 eip c055f6fc cs 8 eflags 202 cr2 80a5000 ilevel 7
Stopped in pid 4.78 (np1) at netbsd:breakpoint+0x4: popl %ebp
db{0}> bt
breakpoint(ccc64eac,1,96d0,0,8,0,800000a,ccc64cc0,c0b27bc0,ccc5cc2c,ccc64cc0) at netbsd:breakpoint+0x4
cpu_need_resched(c0b27bc0,4,0,a,ccc64d54,ccc64cc0,ccc64cc0,c0b27bc0,ccc5cc2c,ccc64cc0) at netbsd:cpu_need_resched+0xf2
sched_tick(c0b27bc0,0,0,0,0,cb39c780,ccc5cc8c,c01013b7,ccc5cc2c,ccc5cc2c) at netbsd:sched_tick+0xd1
hardclock(ccc5cc2c,ccc5cc2c,0,0,0,0,1100,0,0,0) at netbsd:hardclock+0xbb
DDB lost frame for netbsd:Xresume_lapic_ltimer+0x27, trying 0xcbbfcf74
Xresume_lapic_ltimer() at netbsd:Xresume_lapic_ltimer+0x27
--- interrupt ---
--- switch to interrupt stack ---
rdmsr(ccc5cd48,cca8c868,c0ceda54,13,ccc5cd00,ccc5cccc,ccc5cd00,c04baf18,ccc5cccc,9) at netbsd:rdmsr+0x6
trap() at netbsd:trap+0x20b
--- trap (number 9) ---
0x804911c:
db{0}> continue
fatal breakpoint trap in supervisor mode
...
db{0}> bt
...
kpreempt(c055f68e)
...
trap() at netbsd:trap+0x20b
--- trap (number 9) ---
0x804911c:
db{0}>
0xc055f688 <rdmsr>: mov 0x4(%esp),%ecx
0xc055f68c <rdmsr+4>: rdmsr
0xc055f68e <rdmsr+6>: ret
0xc055f68f <rdmsr+7>: nop
0xc055f690 <wrmsr>: mov 0x4(%esp),%ecx
0xc055f694 <wrmsr+4>: mov 0x8(%esp),%eax
0xc055f698 <wrmsr+8>: mov 0xc(%esp),%edx
0xc055f69c <wrmsr+12>: wrmsr
0xc055f69e <wrmsr+14>: ret
0xc055f69f <wrmsr+15>: nop
---
>How-To-Repeat:
Enable kernel's tprof module and make /usr/sbin/tprof command.
Prepare multi-threaded and "hogging the CPU" program.
% /usr/sbin/tprof -o /dev/null program
If you need my modified kernel code and/or test program,
I can send them.
>Fix:
This patch worked fine for me, but I am not sure that all
T_NMI must be executed in interrupt disabled state or not.
% diff -c src/sys/arch/i386/i386/vector.S src/sys/arch/i386/i386/vector.S.NMI
*** src/sys/arch/i386/i386/vector.S Thu Sep 24 16:21:27 2009
--- src/sys/arch/i386/i386/vector.S.NMI Fri Mar 19 14:47:19 2010
***************
*** 841,847 ****
IDTVEC(trap01)
BPTTRAP(T_TRCTRAP)
IDTVEC(trap02)
! ZTRAP(T_NMI)
IDTVEC(trap03)
BPTTRAP(T_BPTFLT)
IDTVEC(trap04)
--- 841,850 ----
IDTVEC(trap01)
BPTTRAP(T_TRCTRAP)
IDTVEC(trap02)
! pushl $0
! pushl $(T_NMI)
! INTRENTRY
! jmp _C_LABEL(calltrap)
IDTVEC(trap03)
BPTTRAP(T_BPTFLT)
IDTVEC(trap04)
>Release-Note:
>Audit-Trail:
From: "YAMAMOTO Takashi" <yamt@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/43007 CVS commit: src/sys/arch/i386/i386
Date: Sun, 31 Oct 2010 04:51:19 +0000
Module Name: src
Committed By: yamt
Date: Sun Oct 31 04:51:19 UTC 2010
Modified Files:
src/sys/arch/i386/i386: vector.S
Log Message:
keep interrupts disabled in NMI handler.
the patch provided by IRINO yoshiaki in PR/43007.
To generate a diff of this commit:
cvs rdiff -u -r1.53 -r1.54 src/sys/arch/i386/i386/vector.S
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 14 Aug 2011 19:12:10 +0000
State-Changed-Why:
Is this fixed?
(should it be pulled up to -5?)
From: yoshiaki.irino@nts.ricoh.co.jp
To: gnats-bugs@NetBSD.org
Cc: dholland@NetBSD.org,
gnats-admin@netbsd.org,
irino@src.ricoh.co.jp,
netbsd-bugs@netbsd.org,
port-i386-maintainer@netbsd.org
Subject: Re: port-i386/43007 (kernel preemption in NMI handler (tprof shows
inaccurate results on MP system))
Date: Mon, 15 Aug 2011 08:24:01 +0900
dholland@NetBSD.org wrote on 2011/08/15 04:12:11:
> Synopsis: kernel preemption in NMI handler (tprof shows inaccurate
> results on MP system)
>
> State-Changed-From-To: open->feedback
> State-Changed-By: dholland@NetBSD.org
> State-Changed-When: Sun, 14 Aug 2011 19:12:10 +0000
> State-Changed-Why:
> Is this fixed?
Yes.
> (should it be pulled up to -5?)
I hope so.
irino@src.ricoh.co.jp
State-Changed-From-To: feedback->pending-pullups
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 05 Nov 2011 16:06:54 +0000
State-Changed-Why:
pullup-5 #1687
From: "Stephen Borrill" <sborrill@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/43007 CVS commit: [netbsd-5] src/sys/arch/i386/i386
Date: Fri, 18 Nov 2011 21:09:25 +0000
Module Name: src
Committed By: sborrill
Date: Fri Nov 18 21:09:24 UTC 2011
Modified Files:
src/sys/arch/i386/i386 [netbsd-5]: vector.S
Log Message:
Pull up the following revisions(s) (requested by dholland in ticket #1687):
sys/arch/i386/i386/vector.S: revision 1.54
Keep interrupts disabled in NMI handler. Addresses PR/43007.
To generate a diff of this commit:
cvs rdiff -u -r1.42.6.3 -r1.42.6.4 src/sys/arch/i386/i386/vector.S
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 19 Nov 2011 02:12:29 +0000
State-Changed-Why:
Pullup applied; thanks all
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.