NetBSD Problem Report #43007

From www@NetBSD.org  Fri Mar 19 07:27:14 2010
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 346D963B873
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 19 Mar 2010 07:27:14 +0000 (UTC)
Message-Id: <20100319072713.E043D63B86C@www.NetBSD.org>
Date: Fri, 19 Mar 2010 07:27:13 +0000 (UTC)
From: irino@src.ricoh.co.jp
Reply-To: irino@src.ricoh.co.jp
To: gnats-bugs@NetBSD.org
Subject: kernel preemption in NMI handler (tprof shows inaccurate results on MP system)
X-Send-Pr-Version: www-1.0

>Number:         43007
>Category:       port-i386
>Synopsis:       kernel preemption in NMI handler (tprof shows inaccurate results on MP system)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-i386-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Mar 19 07:30:01 +0000 2010
>Closed-Date:    Sat Nov 19 02:12:29 +0000 2011
>Last-Modified:  Sat Nov 19 02:12:29 +0000 2011
>Originator:     IRINO yoshiaki
>Release:        NetBSD-5.0.2
>Organization:
RICOH COMPANY, LTD.
>Environment:
NetBSD  5.0.2 NetBSD 5.0.2 (GENERIC_TPROF) #1: Mon Mar 15 14:58:42 JST 2010  root@:/usr/src/sys/arch/i386/compile/GENERIC_TPROF i386
>Description:
I modified tprof module (src/sys/arch/x86/x86/tprof_pmi.c) to work 
on Core2Duo CPU. But the tprof offten shows inaccurate result
(large "emptybuf" value), and sometimes cause kernel assertion fail
("KASSERT((pcint & LAPIC_LVT_MASKED) != 0)").
I tried to trace this problem by using debug-ful (== with many 
breakpoints) kernel. This kernel's backtrace shows a rdmsr() in 
tprof_pmi_nmi() was interrupted by clock interrupt, and kernel 
preemption occurred in the NMI handler context.
If the preempted context was resumed on the same CPU core later, 
tprof counts emptybuf. Because the CPU's performace counter is not 
reloaded for long seconds in heavy loaded system.
If the preempted context was resumed on the other CPU core, kernel 
assertion may occur. 
Because the other CPU's performance counter may be not yet fired.

---
fatal breakpoing trap in supervisor mode
trap type 1 code 0 eip c055f6fc cs 8 eflags 202 cr2 80a5000 ilevel 7
Stopped in pid 4.78 (np1) at netbsd:breakpoint+0x4: popl %ebp
db{0}> bt
breakpoint(ccc64eac,1,96d0,0,8,0,800000a,ccc64cc0,c0b27bc0,ccc5cc2c,ccc64cc0) at netbsd:breakpoint+0x4
cpu_need_resched(c0b27bc0,4,0,a,ccc64d54,ccc64cc0,ccc64cc0,c0b27bc0,ccc5cc2c,ccc64cc0) at netbsd:cpu_need_resched+0xf2
sched_tick(c0b27bc0,0,0,0,0,cb39c780,ccc5cc8c,c01013b7,ccc5cc2c,ccc5cc2c) at netbsd:sched_tick+0xd1
hardclock(ccc5cc2c,ccc5cc2c,0,0,0,0,1100,0,0,0) at netbsd:hardclock+0xbb
DDB lost frame for netbsd:Xresume_lapic_ltimer+0x27, trying 0xcbbfcf74
Xresume_lapic_ltimer() at netbsd:Xresume_lapic_ltimer+0x27
--- interrupt ---
--- switch to interrupt stack ---
rdmsr(ccc5cd48,cca8c868,c0ceda54,13,ccc5cd00,ccc5cccc,ccc5cd00,c04baf18,ccc5cccc,9) at netbsd:rdmsr+0x6
trap() at netbsd:trap+0x20b
--- trap (number 9) ---
0x804911c:
db{0}> continue
fatal breakpoint trap in supervisor mode
...
db{0}> bt
...
kpreempt(c055f68e)
...
trap() at netbsd:trap+0x20b
--- trap (number 9) ---
0x804911c:
db{0}>

0xc055f688 <rdmsr>:     mov    0x4(%esp),%ecx
0xc055f68c <rdmsr+4>:   rdmsr  
0xc055f68e <rdmsr+6>:   ret    
0xc055f68f <rdmsr+7>:   nop    
0xc055f690 <wrmsr>:     mov    0x4(%esp),%ecx
0xc055f694 <wrmsr+4>:   mov    0x8(%esp),%eax
0xc055f698 <wrmsr+8>:   mov    0xc(%esp),%edx
0xc055f69c <wrmsr+12>:  wrmsr  
0xc055f69e <wrmsr+14>:  ret    
0xc055f69f <wrmsr+15>:  nop    
---
>How-To-Repeat:
Enable kernel's tprof module and make /usr/sbin/tprof command.
Prepare multi-threaded and "hogging the CPU" program.
% /usr/sbin/tprof -o /dev/null  program

If you need my modified kernel code and/or test program, 
I can send them.
>Fix:
This patch worked fine for me, but I am not sure that all 
T_NMI must be executed in interrupt disabled state or not.
% diff -c src/sys/arch/i386/i386/vector.S src/sys/arch/i386/i386/vector.S.NMI
*** src/sys/arch/i386/i386/vector.S     Thu Sep 24 16:21:27 2009
--- src/sys/arch/i386/i386/vector.S.NMI Fri Mar 19 14:47:19 2010
***************
*** 841,847 ****
  IDTVEC(trap01)
        BPTTRAP(T_TRCTRAP)
  IDTVEC(trap02)
!       ZTRAP(T_NMI)
  IDTVEC(trap03)
        BPTTRAP(T_BPTFLT)
  IDTVEC(trap04)
--- 841,850 ----
  IDTVEC(trap01)
        BPTTRAP(T_TRCTRAP)
  IDTVEC(trap02)
!       pushl $0
!       pushl $(T_NMI)
!       INTRENTRY
!       jmp _C_LABEL(calltrap)
  IDTVEC(trap03)
        BPTTRAP(T_BPTFLT)
  IDTVEC(trap04)

>Release-Note:

>Audit-Trail:
From: "YAMAMOTO Takashi" <yamt@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/43007 CVS commit: src/sys/arch/i386/i386
Date: Sun, 31 Oct 2010 04:51:19 +0000

 Module Name:	src
 Committed By:	yamt
 Date:		Sun Oct 31 04:51:19 UTC 2010

 Modified Files:
 	src/sys/arch/i386/i386: vector.S

 Log Message:
 keep interrupts disabled in NMI handler.
 the patch provided by IRINO yoshiaki in PR/43007.


 To generate a diff of this commit:
 cvs rdiff -u -r1.53 -r1.54 src/sys/arch/i386/i386/vector.S

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 14 Aug 2011 19:12:10 +0000
State-Changed-Why:
Is this fixed?
(should it be pulled up to -5?)


From: yoshiaki.irino@nts.ricoh.co.jp
To: gnats-bugs@NetBSD.org
Cc: dholland@NetBSD.org,
	gnats-admin@netbsd.org,
	irino@src.ricoh.co.jp,
	netbsd-bugs@netbsd.org,
	port-i386-maintainer@netbsd.org
Subject: Re: port-i386/43007 (kernel preemption in NMI handler (tprof shows
 inaccurate results on MP system))
Date: Mon, 15 Aug 2011 08:24:01 +0900

 dholland@NetBSD.org wrote on 2011/08/15 04:12:11:

 > Synopsis: kernel preemption in NMI handler (tprof shows inaccurate
 > results on MP system)
 >
 > State-Changed-From-To: open->feedback
 > State-Changed-By: dholland@NetBSD.org
 > State-Changed-When: Sun, 14 Aug 2011 19:12:10 +0000
 > State-Changed-Why:
 > Is this fixed?
 Yes.
 > (should it be pulled up to -5?)
 I hope so.

 irino@src.ricoh.co.jp

State-Changed-From-To: feedback->pending-pullups
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 05 Nov 2011 16:06:54 +0000
State-Changed-Why:
pullup-5 #1687


From: "Stephen Borrill" <sborrill@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/43007 CVS commit: [netbsd-5] src/sys/arch/i386/i386
Date: Fri, 18 Nov 2011 21:09:25 +0000

 Module Name:	src
 Committed By:	sborrill
 Date:		Fri Nov 18 21:09:24 UTC 2011

 Modified Files:
 	src/sys/arch/i386/i386 [netbsd-5]: vector.S

 Log Message:
 Pull up the following revisions(s) (requested by dholland in ticket #1687):
 	sys/arch/i386/i386/vector.S:	revision 1.54

 Keep interrupts disabled in NMI handler. Addresses PR/43007.


 To generate a diff of this commit:
 cvs rdiff -u -r1.42.6.3 -r1.42.6.4 src/sys/arch/i386/i386/vector.S

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 19 Nov 2011 02:12:29 +0000
State-Changed-Why:
Pullup applied; thanks all


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.