NetBSD Problem Report #38699

From tron@zhadum.org.uk  Mon May 19 20:41:58 2008
Return-Path: <tron@zhadum.org.uk>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 426D063B8BC
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 19 May 2008 20:41:58 +0000 (UTC)
Message-Id: <20080519204154.6B1251A2928@lyssa.zhadum.org.uk>
Date: Mon, 19 May 2008 21:41:54 +0100 (BST)
From: tron@zhadum.org.uk
Reply-To: tron@zhadum.org.uk
To: gnats-bugs@gnats.NetBSD.org
Subject: NetBSD-amd64 Xen dom0 with 4.5GB memory panics
X-Send-Pr-Version: 3.95

>Number:         38699
>Category:       port-xen
>Synopsis:       NetBSD-amd64 Xen dom0 with 4.5GB memory panics
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bouyer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon May 19 20:45:00 +0000 2008
>Closed-Date:    Sat Feb 25 12:17:36 +0000 2012
>Last-Modified:  Sat Feb 25 12:17:36 +0000 2012
>Originator:     tron@zhadum.org.uk
>Release:        NetBSD 4.99.63
>Organization:
Matthias Scheler                                  http://zhadum.org.uk/
>Environment:
System: NetBSD lyssa.zhadum.org.uk 4.99.63 NetBSD 4.99.63 (XEN3_DOM0) #0: Sun May 18 13:34:14 BST 2008  tron@lyssa.zhadum.org.uk:/src/sys/compile/XEN3_DOM0 amd64
Architecture: x86_64
Machine: amd64
>Description:
When I boot a Xen dom0 kernel on my NetBSD-amd64 system with 4.5GB of
memory assigned to it the kernel panics as soon as any serious amount
of filesystem I/O takes place:

uvm_fault(0xffffffff80b37a20, 0xffffffff81400000, 1) -> e
kernel: page fault trap, code=0
Stopped in pid 325.1 (tar) at   netbsd:pmap_kenter_pa+0x163:    movq    0(%rax),%rsi
pmap_kenter_pa() at netbsd:pmap_kenter_pa+0x163
uvm_pagermapin() at netbsd:uvm_pagermapin+0x189
genfs_getpages() at netbsd:genfs_getpages+0xc2f
VOP_GETPAGES() at netbsd:VOP_GETPAGES+0x42
ufs_balloc_range() at netbsd:ufs_balloc_range+0xed
ffs_write() at netbsd:ffs_write+0x7c3
VOP_WRITE() at netbsd:VOP_WRITE+0x2a
vn_write() at netbsd:vn_write+0xce
dofilewrite() at netbsd:dofilewrite+0x7c
sys_write() at netbsd:sys_write+0x72
syscall() at netbsd:syscall+0x98
ds          0x6
es          0
fs          0xffff
gs          0
rdi         0xffffa0004a983000
rsi         0xe67d5000
rbp         0xffffa000528b6700
rbx         0xe67d5
rdx         0x7f8000000000
rcx         0x6
rax         0xffffffff81400ea8
r8          0xffffffff80adeb80  cpu_info_primary
r9          0
r10         0xffffa000528b65d0
r11         0x7f8000000000
r12         0x3
r13         0x7fd000254c18
r14         0xffffa000528b6858
r15         0xffffa0004a983000
rip         0xffffffff804cdcc3  pmap_kenter_pa+0x163
cs          0xe030
rflags      0x10282
rsp         0xffffa000528b66d0
ss          0xe02b
netbsd:pmap_kenter_pa+0x163:    movq    0(%rax),%rsi

Here is the backtrace:

pmap_kenter_pa() at netbsd:pmap_kenter_pa+0x163
uvm_pagermapin() at netbsd:uvm_pagermapin+0x189
genfs_getpages() at netbsd:genfs_getpages+0xc2f
VOP_GETPAGES() at netbsd:VOP_GETPAGES+0x42
ufs_balloc_range() at netbsd:ufs_balloc_range+0xed
ffs_write() at netbsd:ffs_write+0x7c3
VOP_WRITE() at netbsd:VOP_WRITE+0x2a
vn_write() at netbsd:vn_write+0xce
dofilewrite() at netbsd:dofilewrite+0x7c
sys_write() at netbsd:sys_write+0x72
syscall() at netbsd:syscall+0x98
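
The addresses in the panic output can be decoded with a little arithmetic. The sketch below is illustrative only: it assumes the usual amd64 values KERNBASE = 0xffffffff80000000 and NBPD_L2 = 2 MiB, neither of which is stated in this report, and the helper names are hypothetical. It checks that the faulting register %rax lies in the page passed to uvm_fault(), and expresses that page's offset from KERNBASE in L2-sized units.

```python
# Assumed amd64 constants; they are not given anywhere in this report.
KERNBASE = 0xffffffff80000000   # base VA of the kernel image
NBPD_L2 = 1 << 21               # bytes mapped by one L2 entry (2 MiB)
PAGE_SIZE = 4096                # base page size in bytes

# Values copied from the panic output above.
FAULT_VA = 0xffffffff81400000   # second argument of uvm_fault()
RAX = 0xffffffff81400ea8        # dereferenced by "movq 0(%rax),%rsi"


def trunc_page(va: int) -> int:
    """Round a virtual address down to the start of its page."""
    return va & ~(PAGE_SIZE - 1)


def l2_offset(va: int) -> tuple[int, int]:
    """Return (whole L2 slots, leftover bytes) between KERNBASE and va."""
    return divmod(va - KERNBASE, NBPD_L2)


if __name__ == "__main__":
    print(hex(trunc_page(RAX)))   # page containing the faulting access
    print(l2_offset(FAULT_VA))    # distance from KERNBASE in L2 slots
```

Under these assumptions the faulting page starts exactly on an L2 boundary above KERNBASE. Whether that boundary matches KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2 for this particular kernel cannot be confirmed from the report itself.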

>How-To-Repeat:
Boot a NetBSD-amd64 Xen dom0 with the following grub config:

title Xen 3.1 / NetBSD (hda0, serial)
  root(hd0,0)
  kernel (hd0,a)/xen.gz dom0_mem=4718592 com1=115200,8n1
  module (hd0,a)/netbsd-XEN3_DOM0 bootdev=wd0a ro console=ttyS0 
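
As a quick sanity check on the dom0_mem value above, the following sketch converts it to GiB and to a page count. It assumes that Xen reads a dom0_mem value without a suffix as KiB and that the page size is 4 KiB; the function names are made up for illustration.

```python
# Assumption: a suffix-less dom0_mem value is interpreted as KiB.
DOM0_MEM_KIB = 4718592   # from the grub "kernel" line above
PAGE_SIZE = 4096         # amd64 base page size in bytes (assumed)


def kib_to_gib(kib: int) -> float:
    """Convert KiB to GiB."""
    return kib / (1024 * 1024)


def kib_to_pages(kib: int, page_size: int = PAGE_SIZE) -> int:
    """Number of whole pages covered by the given amount of KiB."""
    return kib * 1024 // page_size


if __name__ == "__main__":
    print(kib_to_gib(DOM0_MEM_KIB))     # memory assigned to dom0, in GiB
    print(kib_to_pages(DOM0_MEM_KIB))   # same amount, in 4 KiB pages
```

With these assumptions the value corresponds to the 4.5GB mentioned in the description.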

>Fix:
None provided.

>Release-Note:

>Audit-Trail:
From: Matthias Scheler <tron@zhadum.org.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-xen/38699: NetBSD-amd64 Xen dom0 with 4.5GB memory panics
Date: Wed, 20 Aug 2008 20:46:51 +0100

 On Mon, May 19, 2008 at 08:45:00PM +0000, gnats-admin@NetBSD.org wrote:
 > Thank you very much for your problem report.
 > It has the internal identification `port-xen/38699'.
 > The individual assigned to look at your
 > report is: port-xen-maintainer. 
 > 
 > >Category:       port-xen
 > >Responsible:    port-xen-maintainer
 > >Synopsis:       NetBSD-amd64 Xen dom0 with 4.5GB memory panics
 > >Arrival-Date:   Mon May 19 20:45:00 +0000 2008

 Todd Kover can reproduce this problem on his hardware:

 http://mail-index.netbsd.org/port-xen/2008/08/20/msg004156.html

 	Kind regards

 -- 
 Matthias Scheler                                  http://zhadum.org.uk/

From: "Manuel Bouyer" <bouyer@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/38699 CVS commit: src/sys/arch
Date: Thu, 23 Feb 2012 18:59:22 +0000

 Module Name:	src
 Committed By:	bouyer
 Date:		Thu Feb 23 18:59:22 UTC 2012

 Modified Files:
 	src/sys/arch/x86/x86: pmap.c
 	src/sys/arch/xen/x86: x86_xpmap.c

 Log Message:
 On Xen, there is variable-sized Xen data after the kernel's text+data+bss
 (this includes the physical->machine table).
 (vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
 on a domU with lots of RAM (more than 4GB, and so a large
 xpmap_phys_to_machine_mapping table), this can point to some of Xen's data
 set up at bootstrap (either the xpmap_phys_to_machine_mapping table,
 some page shared with the hypervisor, or our kernel page table). Using it for
 early_zerop will cause one of these pages to be unmapped after bootstrap.
 This will cause a kernel page fault for the domU, either immediately or
 eventually much later, depending on where early_zerop points to.
 To fix this, account for early_zerop when building the bootstrap pages,
 and take its VA from there.

 May fix PR port-xen/38699
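
To give a feel for how the xpmap_phys_to_machine_mapping table grows with memory, here is a rough sketch. It assumes one 8-byte entry per 4 KiB page on amd64; that entry size, the constant names, and the function name are assumptions for illustration and are not taken from the commit.

```python
PAGE_SIZE = 4096        # bytes per page (assumed)
P2M_ENTRY_BYTES = 8     # assumed: one 64-bit entry per page on amd64
MIB = 1 << 20


def p2m_table_bytes(mem_bytes: int) -> int:
    """Estimated size of the physical->machine table for a domain."""
    pages = mem_bytes // PAGE_SIZE
    return pages * P2M_ENTRY_BYTES


if __name__ == "__main__":
    # Domain sizes in MiB; 4608 MiB is the 4.5GB case from this PR.
    for mem_mib in (1024, 4096, 4608, 8192):
        size = p2m_table_bytes(mem_mib * MIB)
        print(f"{mem_mib} MiB domain -> {size / MIB:g} MiB table")
```

Under these assumptions the table takes 2 MiB per GiB of domain memory, so the area after the kernel's text+data+bss grows steadily as more RAM is assigned.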


 To generate a diff of this commit:
 cvs rdiff -u -r1.169 -r1.170 src/sys/arch/x86/x86/pmap.c
 cvs rdiff -u -r1.39 -r1.40 src/sys/arch/xen/x86/x86_xpmap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Responsible-Changed-From-To: port-xen-maintainer->bouyer
Responsible-Changed-By: bouyer@NetBSD.org
Responsible-Changed-When: Thu, 23 Feb 2012 20:28:37 +0000
Responsible-Changed-Why:
.


State-Changed-From-To: open->feedback
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Thu, 23 Feb 2012 20:28:37 +0000
State-Changed-Why:
Can you check if it still happens with 
sys/arch/x86/x86/pmap.c 1.170
sys/arch/xen/x86/x86_xpmap.c 1.40 ?


From: "Jeff Rizzo" <riz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/38699 CVS commit: [netbsd-6] src/sys/arch
Date: Thu, 23 Feb 2012 21:17:25 +0000

 Module Name:	src
 Committed By:	riz
 Date:		Thu Feb 23 21:17:25 UTC 2012

 Modified Files:
 	src/sys/arch/x86/x86 [netbsd-6]: pmap.c
 	src/sys/arch/xen/x86 [netbsd-6]: x86_xpmap.c

 Log Message:
 Pull up following revision(s) (requested by bouyer in ticket #39):
 	sys/arch/x86/x86/pmap.c: revision 1.170
 	sys/arch/xen/x86/x86_xpmap.c: revision 1.40
 On Xen, there is variable-sized Xen data after the kernel's text+data+bss
 (this includes the physical->machine table).
 (vaddr_t)(KERNBASE + NKL2_KIMG_ENTRIES * NBPD_L2) is after text+data+bss but,
 on a domU with lots of RAM (more than 4GB, and so a large
 xpmap_phys_to_machine_mapping table), this can point to some of Xen's data
 set up at bootstrap (either the xpmap_phys_to_machine_mapping table,
 some page shared with the hypervisor, or our kernel page table). Using it for
 early_zerop will cause one of these pages to be unmapped after bootstrap.
 This will cause a kernel page fault for the domU, either immediately or
 eventually much later, depending on where early_zerop points to.
 To fix this, account for early_zerop when building the bootstrap pages,
 and take its VA from there.
 May fix PR port-xen/38699


 To generate a diff of this commit:
 cvs rdiff -u -r1.164.2.2 -r1.164.2.3 src/sys/arch/x86/x86/pmap.c
 cvs rdiff -u -r1.38.2.1 -r1.38.2.2 src/sys/arch/xen/x86/x86_xpmap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Matthias Scheler <tron@zhadum.org.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-xen/38699 (NetBSD-amd64 Xen dom0 with 4.5GB memory panics)
Date: Thu, 23 Feb 2012 23:11:41 +0000

 On Thu, Feb 23, 2012 at 08:28:37PM +0000, bouyer@NetBSD.org wrote:
 > State-Changed-From-To: open->feedback
 > State-Changed-By: bouyer@NetBSD.org
 > State-Changed-When: Thu, 23 Feb 2012 20:28:37 +0000
 > State-Changed-Why:
 > Can you check if it still happens with 
 > sys/arch/x86/x86/pmap.c 1.170
 > sys/arch/xen/x86/x86_xpmap.c 1.40 ?

 I'm afraid not. The machine is in "production use" and runs NetBSD/amd64,
 not NetBSD/xen. But thanks a lot for looking into this.

 	Kind regards

 -- 
 Matthias Scheler                                  http://zhadum.org.uk/

From: "Stephen Borrill" <sborrill@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/38699 CVS commit: [netbsd-5] src/sys/arch
Date: Fri, 24 Feb 2012 17:29:33 +0000

 Module Name:	src
 Committed By:	sborrill
 Date:		Fri Feb 24 17:29:32 UTC 2012

 Modified Files:
 	src/sys/arch/x86/x86 [netbsd-5]: pmap.c
 	src/sys/arch/xen/x86 [netbsd-5]: x86_xpmap.c

 Log Message:
 Pull up the following revision(s) (requested by bouyer in ticket #1729):
 	sys/arch/x86/x86/pmap.c:	revision 1.170 via patch
 	sys/arch/xen/x86/x86_xpmap.c:	revision 1.40 via patch

 Fix random kernel panic on domains with large memory.
 May fix PR port-xen/38699


 To generate a diff of this commit:
 cvs rdiff -u -r1.74.4.3 -r1.74.4.4 src/sys/arch/x86/x86/pmap.c
 cvs rdiff -u -r1.11 -r1.11.4.1 src/sys/arch/xen/x86/x86_xpmap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: feedback->closed
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Sat, 25 Feb 2012 12:17:36 +0000
State-Changed-Why:
Matthias can't test any more; let's assume this is fixed.


>Unformatted:
