NetBSD Problem Report #41106
From www@NetBSD.org Tue Mar 31 01:48:40 2009
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id ED65563B946
for <gnats-bugs@gnats.netbsd.org>; Tue, 31 Mar 2009 01:48:39 +0000 (UTC)
Message-Id: <20090331014839.B690263B8C8@www.NetBSD.org>
Date: Tue, 31 Mar 2009 01:48:39 +0000 (UTC)
From: dmarquess@gmail.com
Reply-To: dmarquess@gmail.com
To: gnats-bugs@NetBSD.org
Subject: GENERIC.MP memory management faults on API CS20/HP DS20L
X-Send-Pr-Version: www-1.0
>Number: 41106
>Category: port-alpha
>Synopsis: GENERIC.MP memory management faults on API CS20/HP DS20L
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-alpha-maintainer
>State: closed
>Class: doc-bug
>Submitter-Id: net
>Arrival-Date: Tue Mar 31 01:50:00 +0000 2009
>Closed-Date: Wed Sep 16 20:31:36 +0000 2009
>Last-Modified: Wed Sep 16 20:31:36 +0000 2009
>Originator: Dustin Marquess
>Release: netbsd-5-200903290002Z
>Organization:
>Environment:
NetBSD 5.0_RC3 (GENERIC-$Revision: 1.325 $) #0: Sun Mar 29 21:24:52 PDT 2009
builds@wb27:/home/builds/ab/netbsd-5/alpha/200903290002Z-obj/home/builds/ab/netbsd-5/src/sys/arch/alpha/compile/GENERIC.MP
hp AlphaServer DS20L 833 MHz, s/n 6969696969
8192 byte page size, 2 processors.
total memory = 2048 MB
(2792 KB reserved for PROM, 2045 MB used by NetBSD)
avail memory = 2006 MB
>Description:
When booting a GENERIC.MP kernel, the kernel always receives a memory management fault trap while scanning the SCSI bus:
fxp1 at pci1 dev 3 function 0: i82559 Ethernet, rev 8
fxp1: interrupting at dec 6600 irq 32
fxp1: Ethernet address 00:02:56:00:08:b6
inphy1 at fxp1 phy 1: i82555 10/100 media interface, rev. 4
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ahc0 at pci1 dev 4 function 0: Adaptec 29160 Ultra160 SCSI adapter
ahc0: interrupting at dec 6600 irq 36
ahc0: aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
scsibus1 at ahc0: 16 targets, 8 luns per target
scsibus0: waiting 2 seconds for devices to settle...
CPU 1: fatal kernel trap:
CPU 1 trap entry = 0x2 (memory management fault)
CPU 1 a0 = 0x8
CPU 1 a1 = 0x1
CPU 1 a2 = 0x0
CPU 1 pc = 0xfffffc0000634414
CPU 1 ra = 0xfffffc0000634400
CPU 1 pv = 0xfffffc0000636900
CPU 1 curlwp = 0xfffffc007f8a3400
CPU 1 pid = 0, comm = system
scsibus1: waiting 2 seconds for devices to settle...
panic: trap
Stopped in pid 0.12 (system) at netbsd:cpu_Debugger+0x4: ret zero,(ra)
db{1}> bt
cpu_Debugger() at netbsd:cpu_Debugger+0x4
panic() at netbsd:panic+0x244
trap() at netbsd:trap+0x35c
XentMM() at netbsd:XentMM+0x20
--- memory management fault (from ipl 5) ---
uvm_pagealloc_strat() at netbsd:uvm_pagealloc_strat+0x64
uvm_km_alloc_poolpage() at netbsd:uvm_km_alloc_poolpage+0x40
pool_page_alloc_meta() at netbsd:pool_page_alloc_meta+0x24
pool_grow() at netbsd:pool_grow+0x64
pool_get() at netbsd:pool_get+0x5c
pool_cache_put_slow() at netbsd:pool_cache_put_slow+0x1f4
pool_cache_put_paddr() at netbsd:pool_cache_put_paddr+0x168
pmap_do_tlb_shootdown() at netbsd:pmap_do_tlb_shootdown+0x178
alpha_ipi_process() at netbsd:alpha_ipi_process+0xb8
interrupt() at netbsd:interrupt+0x88
XentInt() at netbsd:XentInt+0x1c
--- interrupt (from ipl 0) ---
printf() at netbsd:printf+0xf8
trap() at netbsd:trap+0x168
XentMM() at netbsd:XentMM+0x20
--- memory management fault ---
uvm_pageidlezero() at netbsd:uvm_pageidlezero+0x34
idle_loop() at netbsd:idle_loop+0x1a4
cpu_spinup_trampoline() at netbsd:cpu_spinup_trampoline+0x5c
--- root of call graph ---
Problem does not occur with a GENERIC kernel, only GENERIC.MP
>How-To-Repeat:
Boot a GENERIC.MP kernel.
>Fix:
Unknown
>Release-Note:
>Audit-Trail:
From: "Michael L. Hitch" <mhitch@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/41106 CVS commit: src/sys/arch/alpha/alpha
Date: Sun, 6 Sep 2009 18:06:24 +0000
Module Name: src
Committed By: mhitch
Date: Sun Sep 6 18:06:24 UTC 2009
Modified Files:
src/sys/arch/alpha/alpha: cpu.c
Log Message:
There's now some per-cpu initialization that occurs before the secondary
cpus are told to begin running. Since the secondary cpus weren't being
added to the cpu_info list until then, that initialization wasn't being
done and resulted in crashes on the secondary cpus. Add the secondary
cpus to the cpu_info_list after they've been started (but waiting to be
told to start running). This fixes the problem specifically stated in
PR port-alpha/41106. MP alphas will now at least boot and begin running,
but will eventually crash in various ways later.
To generate a diff of this commit:
cvs rdiff -u -r1.85 -r1.86 src/sys/arch/alpha/alpha/cpu.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->analyzed
State-Changed-By: mhitch@NetBSD.org
State-Changed-When: Sun, 06 Sep 2009 18:44:10 +0000
State-Changed-Why:
I've analyzed the problem (and it was quite a bit easier than I thought it would be).
From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc: port-alpha-maintainer@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: port-alpha/41106: GENERIC.MP memory management faults on API CS20/HP DS20L
Date: Sun, 6 Sep 2009 11:46:28 -0600 (MDT)
On Tue, 31 Mar 2009, dmarquess@gmail.com wrote:
> db{1}> bt
> cpu_Debugger() at netbsd:cpu_Debugger+0x4
> panic() at netbsd:panic+0x244
> trap() at netbsd:trap+0x35c
> XentMM() at netbsd:XentMM+0x20
> --- memory management fault (from ipl 5) ---
> uvm_pagealloc_strat() at netbsd:uvm_pagealloc_strat+0x64
...
> Problem does not occur with a GENERIC kernel, only GENERIC.MP
It will also work with GENERIC.MP if you disable all the secondary cpus
(which rather defeats the purpose of GENERIC.MP).
I finally got a chance to start looking into this and found out that
there's some per-cpu initialization going on before the secondary cpus are
told they can start running. Prior to that point, only the primary cpu
has been added to the list of cpus, so the per-cpu setup doesn't occur on
the secondary cpus. I've got a fix for that, which at least lets me boot
up and start running with multiple cpus, but eventually crashes later when
it gets busy. I'm in the process of trying to track that down now.
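
The ordering problem described above can be illustrated with a small
user-space model. This is only a sketch, not the code in cpu.c: the
struct layout and the names ci_percpu_ready, cpu_list_add,
percpu_init_all, cpu_release, and simulate_boot are hypothetical, and
cpu_list_head merely stands in for the kernel's cpu_info_list. It assumes
the per-cpu setup pass walks the list once and that a cpu which missed
the pass faults when released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAXCPUS 4

/* Hypothetical, much-simplified stand-in for the kernel's struct cpu_info. */
struct cpu_info {
	int ci_cpuid;
	bool ci_percpu_ready;		/* set by the per-cpu setup pass */
	struct cpu_info *ci_next;	/* link in the global cpu list */
};

/* Models cpu_info_list: the list that the per-cpu setup pass walks. */
static struct cpu_info *cpu_list_head;

/* Append a cpu to the tail of the global list. */
static void
cpu_list_add(struct cpu_info *ci)
{
	struct cpu_info **pp = &cpu_list_head;

	while (*pp != NULL)
		pp = &(*pp)->ci_next;
	ci->ci_next = NULL;
	*pp = ci;
}

/* Per-cpu setup: only cpus already on the list are visited. */
static void
percpu_init_all(void)
{
	struct cpu_info *ci;

	for (ci = cpu_list_head; ci != NULL; ci = ci->ci_next)
		ci->ci_percpu_ready = true;
}

/* Tell a cpu to start running; false models the fault from the report. */
static bool
cpu_release(const struct cpu_info *ci)
{
	return ci->ci_percpu_ready;
}

/*
 * Simulate a boot with ncpus processors and return how many of them
 * would fault. With add_secondaries_early false, secondaries join the
 * list only after the setup pass (the old order); with true, they join
 * before it (the order the fix establishes).
 */
int
simulate_boot(int ncpus, bool add_secondaries_early)
{
	static struct cpu_info cpus[MAXCPUS];
	int i, faults = 0;

	assert(ncpus >= 1 && ncpus <= MAXCPUS);
	cpu_list_head = NULL;
	for (i = 0; i < ncpus; i++) {
		cpus[i].ci_cpuid = i;
		cpus[i].ci_percpu_ready = false;
		cpus[i].ci_next = NULL;
	}

	cpu_list_add(&cpus[0]);			/* primary is always listed */
	if (add_secondaries_early)
		for (i = 1; i < ncpus; i++)
			cpu_list_add(&cpus[i]);

	percpu_init_all();

	if (!add_secondaries_early)		/* old order: too late */
		for (i = 1; i < ncpus; i++)
			cpu_list_add(&cpus[i]);

	for (i = 0; i < ncpus; i++)
		if (!cpu_release(&cpus[i]))
			faults++;
	return faults;
}
```

In this model, simulate_boot(2, false) reports one faulting cpu (the
secondary), while simulate_boot(2, true) reports none, which matches the
behaviour seen on the two-processor DS20L before and after the change.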
--
Michael L. Hitch mhitch@montana.edu
Computer Consultant
Information Technology Center
Montana State University Bozeman, MT USA
From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/41106 CVS commit: [netbsd-5] src/sys/arch/alpha/alpha
Date: Wed, 16 Sep 2009 04:12:49 +0000
Module Name: src
Committed By: snj
Date: Wed Sep 16 04:12:49 UTC 2009
Modified Files:
src/sys/arch/alpha/alpha [netbsd-5]: cpu.c
Log Message:
Pull up following revision(s) (requested by mhitch in ticket #949):
sys/arch/alpha/alpha/cpu.c: revision 1.86
There's now some per-cpu initialization that occurs before the secondary
cpus are told to begin running. Since the secondary cpus weren't being
added to the cpu_info list until then, that initialization wasn't being
done and resulted in crashes on the secondary cpus. Add the secondary
cpus to the cpu_info_list after they've been started (but waiting to be
told to start running). This fixes the problem specifically stated in
PR port-alpha/41106. MP alphas will now at least boot and begin running,
but will eventually crash in various ways later.
To generate a diff of this commit:
cvs rdiff -u -r1.82 -r1.82.10.1 src/sys/arch/alpha/alpha/cpu.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/41106 CVS commit: [netbsd-5-0] src/sys/arch/alpha/alpha
Date: Wed, 16 Sep 2009 04:14:33 +0000
Module Name: src
Committed By: snj
Date: Wed Sep 16 04:14:33 UTC 2009
Modified Files:
src/sys/arch/alpha/alpha [netbsd-5-0]: cpu.c
Log Message:
Pull up following revision(s) (requested by mhitch in ticket #949):
sys/arch/alpha/alpha/cpu.c: revision 1.86
There's now some per-cpu initialization that occurs before the secondary
cpus are told to begin running. Since the secondary cpus weren't being
added to the cpu_info list until then, that initialization wasn't being
done and resulted in crashes on the secondary cpus. Add the secondary
cpus to the cpu_info_list after they've been started (but waiting to be
told to start running). This fixes the problem specifically stated in
PR port-alpha/41106. MP alphas will now at least boot and begin running,
but will eventually crash in various ways later.
To generate a diff of this commit:
cvs rdiff -u -r1.82 -r1.82.14.1 src/sys/arch/alpha/alpha/cpu.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: analyzed->closed
State-Changed-By: mhitch@NetBSD.org
State-Changed-When: Wed, 16 Sep 2009 20:31:36 +0000
State-Changed-Why:
Problem found, fixed, and pulled up to the netbsd-5* branches.
On to the next.....
>Unformatted: