NetBSD Problem Report #45975

From riz@wintermute.localdomain  Fri Feb 10 18:32:01 2012
Return-Path: <riz@wintermute.localdomain>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 60C8E63BCF4
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 10 Feb 2012 18:32:01 +0000 (UTC)
Message-Id: <20120210183157.312ED11C87F@wintermute.localdomain>
Date: Fri, 10 Feb 2012 10:31:57 -0800 (PST)
From: riz@NetBSD.org
Reply-To: riz@NetBSD.org
To: gnats-bugs@gnats.NetBSD.org
Subject: panic: HYPERVISOR_mmu_update failed, ret: -22 on -current MP domU (amd64)
X-Send-Pr-Version: 3.95

>Number:         45975
>Category:       port-xen
>Synopsis:       panic: HYPERVISOR_mmu_update failed, ret: -22 during heavy activity
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    bouyer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Feb 10 18:35:00 +0000 2012
>Closed-Date:    Sat May 20 20:54:51 +0000 2017
>Last-Modified:  Sat May 20 20:54:51 +0000 2017
>Originator:     Jeff Rizzo
>Release:        NetBSD 5.99.64
>Organization:

>Environment:


System: NetBSD breadfruit.tastylime.net 5.99.64 NetBSD 5.99.64 (XEN3_DOMU) #0: Fri Feb 10 09:10:33 PST 2012  riz@breadfruit.tastylime.net:/home/riz/obj/sys/arch/amd64/compile/XEN3_DOMU amd64
Architecture: x86_64
Machine: amd64
>Description:

I've been discussing this particular problem with cherry@ for a while,
and figured it was time to file a PR.

During heavy load (such as a build.sh -j16), my amd64 MP domUs sometimes
panic like so:


evtchn_do_event: handler 0xffffffff80121b77 didn't lower ipl 8 7
xpq_flush_queue: 1 entries (0 successful)
0x00000000d9fe9de0: 0x000000011beb9007
panic: HYPERVISOR_mmu_update failed, ret: -22

fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff801345c5 cs e030 rflags 246 cr2  7f7ff780027c cWl R6NIrNsGp:  fSPfLf aN0O0T0 bL9O9W0EfR7E1D0 
N TRAP EXIT St6o p0p                        O
d in pid 1833.1 (sh) at   netbsd:breakpoint+0x5:  leave
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x1f2
printf_nolog() at netbsd:printf_nolog
xpq_queue_machphys_update() at netbsd:xpq_queue_machphys_update
pmap_enter_ma() at netbsd:pmap_enter_ma+0xb74
pmap_enter() at netbsd:pmap_enter+0x35
uvm_fault_internal() at netbsd:uvm_fault_internal+0xf17
trap() at netbsd:trap+0x5f5
--- trap (number 7632997) ---
7374757066007469:
ds          ffea
es          f750
fs          100
gs          b180
rdi         0
rsi         d
rbp         ffffa000b990f710
rbx         104
rdx         0
rcx         8
rax         1
r8          ffffa00008978000
r9          1
r10         0
r11         e033
r12         ffffffff804b4a10    copyright+0x3ea10
r13         ffffa000b990f750
r14         ffffffea
r15         2
rip         ffffffff801345c5    breakpoint+0x5
cs          e030
rflags      246
rsp         ffffa000b990f710
ss          e02b
netbsd:breakpoint+0x5:  leave
db{3}> 

Here's the dom0's 'xm dmesg' (some may not be relevant):

durian:riz  ~> sudo xm dmesg
aps.c:2432:d164 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d164 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d164 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d164 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d164 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) traps.c:2432:d165 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d165 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d165 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d165 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d165 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) mm.c:2424:d165 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 1f4443 (pfn 5d67)
(XEN) mm.c:915:d165 Attempt to create linear p.t. with write perms
(XEN) traps.c:2432:d166 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d166 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d166 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d166 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d166 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) mm.c:2424:d166 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 1f689f (pfn 1970c)
(XEN) mm.c:915:d166 Attempt to create linear p.t. with write perms
(XEN) traps.c:2432:d167 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d167 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d167 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d167 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d167 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) traps.c:2432:d168 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d168 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d168 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d168 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d168 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) traps.c:2432:d169 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d169 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d169 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d169 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d169 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) traps.c:2432:d170 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d170 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d170 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d170 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d170 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) mm.c:2424:d170 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 1e6705 (pfn 6edb)
(XEN) mm.c:915:d170 Attempt to create linear p.t. with write perms
(XEN) traps.c:2432:d171 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d171 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d171 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d171 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d171 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) traps.c:2432:d172 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d172 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d172 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d172 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d172 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) traps.c:2432:d173 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d173 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d173 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d173 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d173 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d173 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d173 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) traps.c:2432:d174 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d174 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d174 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d174 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d174 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) traps.c:2432:d175 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d175 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d175 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d175 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d175 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d175 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d175 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) mm.c:2424:d174 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 1e51ff (pfn 1edaf)
(XEN) mm.c:915:d174 Attempt to create linear p.t. with write perms
(XEN) traps.c:2432:d176 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d176 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d176 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d176 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d176 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) traps.c:2432:d177 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d177 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d177 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d177 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d177 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) mm.c:2424:d177 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 1c496c (pfn de83)
(XEN) mm.c:915:d177 Attempt to create linear p.t. with write perms
(XEN) traps.c:2432:d178 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d178 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d178 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d178 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d178 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) traps.c:2432:d179 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d179 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d179 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d179 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d179 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d179 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d179 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) mm.c:2424:d178 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 1c0e0e (pfn 141c4)
(XEN) mm.c:915:d178 Attempt to create linear p.t. with write perms
(XEN) traps.c:2432:d180 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d180 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d180 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d180 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d180 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) traps.c:2432:d181 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d181 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d181 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d181 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d181 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) traps.c:2432:d182 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d182 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d182 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d182 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d182 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) traps.c:2432:d183 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d183 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d183 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d183 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d183 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) mm.c:2424:d183 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 1c47ef (pfn 65cb)
(XEN) mm.c:982:d183 Attempt to create linear p.t. with write perms
(XEN) traps.c:2432:d184 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d184 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d184 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d184 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d184 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) traps.c:2432:d185 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d185 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d185 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d185 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d185 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) traps.c:2432:d186 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d186 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d186 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d186 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d186 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) mm.c:2424:d186 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 1f4ff6 (pfn b804)
(XEN) mm.c:915:d186 Attempt to create linear p.t. with write perms
(XEN) traps.c:2432:d187 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d187 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d187 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d187 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d187 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) mm.c:2424:d187 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 1e48ca (pfn 196f5)
(XEN) mm.c:915:d187 Attempt to create linear p.t. with write perms
(XEN) traps.c:2432:d188 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d188 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d188 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d188 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d188 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) mm.c:2424:d179 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 1b224c (pfn 723a3)
(XEN) mm.c:982:d179 Attempt to create linear p.t. with write perms
(XEN) traps.c:2432:d190 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d190 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d190 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d190 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d190 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d190 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d190 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) traps.c:2432:d191 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d191 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d191 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d191 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d191 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d191 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d191 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) mm.c:2424:d191 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn 11beb9 (pfn c8753)
(XEN) mm.c:915:d191 Attempt to create linear p.t. with write perms
(XEN) traps.c:2432:d192 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d192 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d192 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d192 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d192 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
(XEN) domain.c:652:d192 Attempt to change CR4 flags 00002660 -> 00000620
(XEN) traps.c:2432:d192 Domain attempted WRMSR 0000000000000277 from 0x0000050100070406 to 0x0007010600070106.
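
Decoding the failing queue entry by hand ties the panic to the Xen log above. The following is a minimal sketch, assuming the standard x86-64 page-table entry layout; the macro and function names are illustrative and do not come from the NetBSD tree. The value 0x000000011beb9007 names machine frame 0x11beb9 with the present, writable and user bits set, which is the frame in the "Bad type ... for mfn 11beb9" line logged for d191. The return value -22 corresponds to EINVAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86-64 PTE bits; the macro names are illustrative. */
#define PTE_P      UINT64_C(0x001)		/* present */
#define PTE_RW     UINT64_C(0x002)		/* writable */
#define PTE_US     UINT64_C(0x004)		/* user-accessible */
#define PTE_FRAME  UINT64_C(0x000ffffffffff000)	/* bits 12..51 */

/* Machine frame number named by a PTE value. */
static uint64_t
pte_mfn(uint64_t pte)
{
	return (pte & PTE_FRAME) >> 12;
}

/* True if the PTE asks for a present, writable mapping. */
static bool
pte_is_writable_mapping(uint64_t pte)
{
	return (pte & (PTE_P | PTE_RW)) == (PTE_P | PTE_RW);
}

/* Sanity check against the values printed in this report. */
static void
check_report_values(void)
{
	const uint64_t val = UINT64_C(0x000000011beb9007);

	assert(pte_mfn(val) == UINT64_C(0x11beb9));
	assert(pte_is_writable_mapping(val));
	assert((val & PTE_US) != 0);
}
```

In other words, the domU asked for a writable mapping of a frame that Xen still accounts for differently, which is consistent with the "Attempt to create linear p.t. with write perms" message.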


>How-To-Repeat:
	build.sh -j16 on an MP amd64 domU;  sometimes the problem takes
	a while to manifest; not usually more than 3 or 4 builds, and
	often during the first.
>Fix:
	None given. 

>Release-Note:

>Audit-Trail:
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: port-xen-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org
Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update failed, ret: -22 on
 -current MP domU (amd64)
Date: Fri, 10 Feb 2012 19:42:08 +0100

 On Fri, Feb 10, 2012 at 06:35:00PM +0000, riz@NetBSD.org wrote:
 > I've been discussing this particular problem with cherry@ for a while,
 > and figured it was time to file a PR.
 > 
 > During heavy load (such as a build.sh -j16), my amd64 MP domUs sometimes
 > panic like so:
 > 
 > 
 > evtchn_do_event: handler 0xffffffff80121b77 didn't lower ipl 8 7

 This is annoying but probably unrelated

 > xpq_flush_queue: 1 entries (0 successful)
 > 0x00000000d9fe9de0: 0x000000011beb9007
 > panic: HYPERVISOR_mmu_update failed, ret: -22
 > 
 > fatal breakpoint trap in supervisor mode
 > trap type 1 code 0 rip ffffffff801345c5 cs e030 rflags 246 cr2  7f7ff780027c cWl R6NIrNsGp:  fSPfLf aN0O0T0 bL9O9W0EfR7E1D0 
 > N TRAP EXIT St6o p0p                        O
 > d in pid 1833.1 (sh) at   netbsd:breakpoint+0x5:  leave
 > breakpoint() at netbsd:breakpoint+0x5
 > vpanic() at netbsd:vpanic+0x1f2
 > printf_nolog() at netbsd:printf_nolog
 > xpq_queue_machphys_update() at netbsd:xpq_queue_machphys_update
 > pmap_enter_ma() at netbsd:pmap_enter_ma+0xb74
 > pmap_enter() at netbsd:pmap_enter+0x35
 > uvm_fault_internal() at netbsd:uvm_fault_internal+0xf17
 > trap() at netbsd:trap+0x5f5
 > --- trap (number 7632997) ---

 I'm seeing this too. It seems to be caused by UVM using a page which
 was previously being used as a page table. Looks like Xen isn't aware that
 this page is no longer in use as a page table.
 It can either be because we didn't unpin yet (or because some mapping clearing
 is still pending in an xpq_queue on another CPU), or because the page
 is still effectively used as a page table (possibly via a recursive
 mapping) by some other CPU.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

Responsible-Changed-From-To: port-xen-maintainer->bouyer
Responsible-Changed-By: bouyer@NetBSD.org
Responsible-Changed-When: Sun, 12 Feb 2012 19:20:38 +0000
Responsible-Changed-Why:
I proposed a patch


State-Changed-From-To: open->feedback
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Sun, 12 Feb 2012 19:20:38 +0000
State-Changed-Why:
Please test the patch; I tested it successfully on a Xen 3.3 install
with amd64 and i386 domUs.


From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: port-xen-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org
Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update failed, ret: -22 on
 -current MP domU (amd64)
Date: Sun, 12 Feb 2012 20:18:12 +0100

 --jRHKVT23PllUwdXP
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 On Fri, Feb 10, 2012 at 07:42:08PM +0100, Manuel Bouyer wrote:
 > > xpq_flush_queue: 1 entries (0 successful)
 > > 0x00000000d9fe9de0: 0x000000011beb9007
 > > panic: HYPERVISOR_mmu_update failed, ret: -22
 > > 
 > > fatal breakpoint trap in supervisor mode
 > > trap type 1 code 0 rip ffffffff801345c5 cs e030 rflags 246 cr2  7f7ff780027c cWl R6NIrNsGp:  fSPfLf aN0O0T0 bL9O9W0EfR7E1D0 
 > > N TRAP EXIT St6o p0p                        O
 > > d in pid 1833.1 (sh) at   netbsd:breakpoint+0x5:  leave
 > > breakpoint() at netbsd:breakpoint+0x5
 > > vpanic() at netbsd:vpanic+0x1f2
 > > printf_nolog() at netbsd:printf_nolog
 > > xpq_queue_machphys_update() at netbsd:xpq_queue_machphys_update
 > > pmap_enter_ma() at netbsd:pmap_enter_ma+0xb74
 > > pmap_enter() at netbsd:pmap_enter+0x35
 > > uvm_fault_internal() at netbsd:uvm_fault_internal+0xf17
 > > trap() at netbsd:trap+0x5f5
 > > --- trap (number 7632997) ---
 > 
 > I'm seeing this too. It seems to be caused by UVM using a page which
 > was previously being used as a page table. Looks like Xen isn't aware that
 > this page is no longer in use as a page table.
 > It can either be because we didn't unpin yet (or because some mapping clearing
 > is still pending in an xpq_queue on another CPU), or because the page
 > is still effectively used as a page table (possibly via a recursive
 > mapping) by some other CPU.

 I've made progress on this; I think I understood what's going on and
 I have a fix.

 The page is indeed still used as a page table; it's still in another CPU's
 ci_kpm_pdir. The reason is that xen_kpm_sync() is not working as expected,
 leading to races between CPUs.
 1 the check (xpq_cpu != &x86_curcpu) is always false because we
   have different x86_curcpu symbols with different addresses in the kernel.
   Fortunately, all addresses disassemble to the same code.
   Because of this we always use the code intended for bootstrap, which doesn't
   use cross-calls or locks.

 2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
   which causes it to sleep, and pmap.c doesn't like that. It triggers this
   KASSERT() in pmap_unmap_ptes():
   KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);

 3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
   needs to know on which CPU a pmap is loaded *now*:
   pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
   to a new pmap, leaving a window where a pmap is still in a CPU's
   ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
   by the hypervisor at any time, this window can be large enough to let
   another CPU free the PTP and reuse it as a normal page.
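
The window described in 3 can be reproduced in a single-threaded toy model. This is only a sketch under stated assumptions: the structures and function names below are hypothetical stand-ins, not the kernel's, and a single L4 slot stands in for the whole per-CPU page directory. It shows that between the two steps of a switch, pm_cpus claims the pmap is unused while the CPU's directory still points into it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for the kernel structures; all names are illustrative. */
struct toy_pmap {
	uint32_t pm_cpus;	/* mask of CPUs the pmap claims to run on */
	uint64_t l4_slot;	/* one user L4 entry, 0 if none */
};

struct toy_cpu {
	uint32_t cpumask;	/* this CPU's bit */
	uint64_t kpm_slot;	/* copy of the L4 entry, models ci_kpm_pdir */
};

/* Step 1 of a switch: the old pmap drops this CPU from pm_cpus. */
static void
switch_begin(struct toy_cpu *ci, struct toy_pmap *old)
{
	old->pm_cpus &= ~ci->cpumask;
}

/* Step 2: only now is the per-CPU page directory rewritten. */
static void
switch_finish(struct toy_cpu *ci, struct toy_pmap *next)
{
	ci->kpm_slot = next->l4_slot;
	next->pm_cpus |= ci->cpumask;
}

/* What a remote CPU concludes from pm_cpus alone. */
static bool
looks_unused(const struct toy_pmap *pm)
{
	return pm->pm_cpus == 0;
}

/* What is actually true: does this CPU's directory still point into pm? */
static bool
still_mapped(const struct toy_cpu *ci, const struct toy_pmap *pm)
{
	return pm->l4_slot != 0 && ci->kpm_slot == pm->l4_slot;
}

/* Returns true if the unsafe window is observable between the two steps. */
static bool
window_exists(void)
{
	struct toy_pmap old = { .pm_cpus = 0x1, .l4_slot = 0x1007 };
	struct toy_pmap next = { .pm_cpus = 0, .l4_slot = 0x2007 };
	struct toy_cpu cpu0 = { .cpumask = 0x1, .kpm_slot = 0x1007 };
	bool seen;

	switch_begin(&cpu0, &old);
	/* A vCPU preempted here keeps the window open indefinitely. */
	seen = looks_unused(&old) && still_mapped(&cpu0, &old);
	switch_finish(&cpu0, &next);
	assert(!still_mapped(&cpu0, &old));
	return seen;
}
```

A remote CPU that trusts looks_unused() during that window would skip this CPU when syncing and could free the PTP while it is still referenced.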

 To fix 2) I chose to avoid cross-calls and IPIs completely, and instead
 use a mutex to update all CPUs' ci_kpm_pdir from the local CPU.
 It's safe because we just need to update the table page; a tlbflush IPI will
 happen later. As a side effect, we don't need different code for bootstrap.

 To fix 3), I introduced a pm_xen_ptp_cpus mask which is updated from
 cpu_load_pmap(), with the ci_kpm_mtx mutex held. Checking it with
 ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
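
The bookkeeping behind the two fixes can be sketched as a single-threaded model. This is not the attached patch: the names are hypothetical, a boolean stands in for the kmutex_t, one slot stands in for the per-CPU directory, and clearing the old pmap's bit inside the same critical section is an assumption suggested by the new oldpmap argument. The point it illustrates is that the mask and the directory change together under the per-CPU lock, and the sync loop checks the mask under that same lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NCPU 4

struct toy_pmap {
	uint32_t ptp_cpus;	/* models pm_xen_ptp_cpus */
	uint64_t l4_slot;	/* one user L4 entry */
};

struct toy_cpu {
	bool     mtx_held;	/* models ci_kpm_mtx (a kmutex_t in the kernel) */
	uint32_t cpumask;	/* this CPU's bit */
	uint64_t kpm_slot;	/* models one entry of ci_kpm_pdir */
};

static void
toy_lock(struct toy_cpu *ci)
{
	assert(!ci->mtx_held);
	ci->mtx_held = true;
}

static void
toy_unlock(struct toy_cpu *ci)
{
	assert(ci->mtx_held);
	ci->mtx_held = false;
}

/* Switch pmaps on ci: mask and directory change together under the lock. */
static void
toy_load_pmap(struct toy_cpu *ci, struct toy_pmap *next, struct toy_pmap *old)
{
	toy_lock(ci);
	next->ptp_cpus |= ci->cpumask;
	ci->kpm_slot = next->l4_slot;
	if (old != NULL && old != next)
		old->ptp_cpus &= ~ci->cpumask;	/* assumption, see lead-in */
	toy_unlock(ci);
}

/*
 * Propagate a changed L4 entry of pm to every CPU that has pm loaded,
 * from the calling CPU and without cross-calls.
 * Returns the number of per-CPU directories written.
 */
static int
toy_kpm_sync(struct toy_cpu cpus[], int ncpu, struct toy_pmap *pm, uint64_t val)
{
	int i, written = 0;

	pm->l4_slot = val;
	for (i = 0; i < ncpu; i++) {
		toy_lock(&cpus[i]);
		if (pm->ptp_cpus & cpus[i].cpumask) {
			cpus[i].kpm_slot = val;
			written++;
		}
		toy_unlock(&cpus[i]);
	}
	return written;
}

/* cpu0 and cpu1 load pmap a, cpu2 loads b, then cpu1 moves from a to b. */
static int
toy_demo(void)
{
	struct toy_cpu cpus[NCPU];
	struct toy_pmap a = { 0, 0x1007 }, b = { 0, 0x2007 };
	int i;

	for (i = 0; i < NCPU; i++)
		cpus[i] = (struct toy_cpu){ false, UINT32_C(1) << i, 0 };
	toy_load_pmap(&cpus[0], &a, NULL);
	toy_load_pmap(&cpus[1], &a, NULL);
	toy_load_pmap(&cpus[2], &b, NULL);
	toy_load_pmap(&cpus[1], &b, &a);
	/* Only cpu0 still has a loaded, so only its directory is written. */
	return toy_kpm_sync(cpus, NCPU, &a, 0x3007);
}
```

Because a CPU cannot change its directory without holding its own lock, the sync loop never writes a stale entry into a directory that now belongs to a different pmap.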

 While there, I removed the unused pmap_is_active() function
 and added some more details to DIAGNOSTIC panics.

 The attached patch implements this; it has been tested on a 4-CPU domU
 (on a physical dual-core box, so the race described in 2) is more likely to
 happen) and I couldn't trigger the panic any more (a build.sh -j8 release
 would never complete before for me).

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

 --jRHKVT23PllUwdXP
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="xen_kpm_sync.diff"

 ? amd64/conf/GENERIC_DIAG
 Index: x86/include/cpu.h
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/x86/include/cpu.h,v
 retrieving revision 1.46
 diff -u -p -u -r1.46 cpu.h
 --- x86/include/cpu.h	28 Jan 2012 07:19:17 -0000	1.46
 +++ x86/include/cpu.h	12 Feb 2012 19:08:19 -0000
 @@ -70,6 +70,7 @@
  #ifdef XEN
  #include <xen/xen-public/xen.h>
  #include <xen/xen-public/event_channel.h>
 +#include <sys/mutex.h>
  #endif /* XEN */

  struct intrsource;
 @@ -185,6 +186,7 @@ struct cpu_info {
  	/* Currently active user PGD (can't use rcr3() with Xen) */
  	pd_entry_t *	ci_kpm_pdir;	/* per-cpu PMD (va) */
  	paddr_t		ci_kpm_pdirpa;  /* per-cpu PMD (pa) */
 +	kmutex_t	ci_kpm_mtx;
  #if defined(__x86_64__)
  	/* per-cpu version of normal_pdes */
  	pd_entry_t *	ci_normal_pdes[3]; /* Ok to hardcode. only for x86_64 && XEN */
 @@ -317,7 +319,7 @@ lwp_t   *x86_curlwp(void);
  void cpu_boot_secondary_processors(void);
  void cpu_init_idle_lwps(void);
  void cpu_init_msrs(struct cpu_info *, bool);
 -void cpu_load_pmap(struct pmap *);
 +void cpu_load_pmap(struct pmap *, struct pmap *);
  void cpu_broadcast_halt(void);
  void cpu_kick(struct cpu_info *);

 Index: x86/include/pmap.h
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/x86/include/pmap.h,v
 retrieving revision 1.49
 diff -u -p -u -r1.49 pmap.h
 --- x86/include/pmap.h	4 Dec 2011 16:24:13 -0000	1.49
 +++ x86/include/pmap.h	12 Feb 2012 19:08:19 -0000
 @@ -165,6 +165,8 @@ struct pmap {
  	uint32_t pm_cpus;		/* mask of CPUs using pmap */
  	uint32_t pm_kernel_cpus;	/* mask of CPUs using kernel part
  					 of pmap */
 +	uint32_t pm_xen_ptp_cpus;	/* mask of CPUs which have this pmap's
 +					 ptp mapped */
  	uint64_t pm_ncsw;		/* for assertions */
  	struct vm_page *pm_gc_ptp;	/* pages from pmap g/c */
  };
 Index: x86/x86/cpu.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/x86/x86/cpu.c,v
 retrieving revision 1.96
 diff -u -p -u -r1.96 cpu.c
 --- x86/x86/cpu.c	18 Oct 2011 05:16:02 -0000	1.96
 +++ x86/x86/cpu.c	12 Feb 2012 19:08:19 -0000
 @@ -1228,7 +1228,7 @@ x86_cpu_idle_halt(void)
   * Loads pmap for the current CPU.
   */
  void
 -cpu_load_pmap(struct pmap *pmap)
 +cpu_load_pmap(struct pmap *pmap, struct pmap *oldpmap)
  {
  #ifdef PAE
  	int i, s;
 Index: x86/x86/pmap.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/x86/x86/pmap.c,v
 retrieving revision 1.163
 diff -u -p -u -r1.163 pmap.c
 --- x86/x86/pmap.c	1 Feb 2012 18:55:32 -0000	1.163
 +++ x86/x86/pmap.c	12 Feb 2012 19:08:19 -0000
 @@ -561,7 +561,6 @@ static void		 pmap_freepage(struct pmap 
  static void		 pmap_free_ptp(struct pmap *, struct vm_page *,
  				       vaddr_t, pt_entry_t *,
  				       pd_entry_t * const *);
 -static bool		 pmap_is_active(struct pmap *, struct cpu_info *, bool);
  static bool		 pmap_remove_pte(struct pmap *, struct vm_page *,
  					 pt_entry_t *, vaddr_t,
  					 struct pv_entry **);
 @@ -680,19 +679,6 @@ pmap_is_curpmap(struct pmap *pmap)
  }

  /*
 - * pmap_is_active: is this pmap loaded into the specified processor's %cr3?
 - */
 -
 -inline static bool
 -pmap_is_active(struct pmap *pmap, struct cpu_info *ci, bool kernel)
 -{
 -
 -	return (pmap == pmap_kernel() ||
 -	    (pmap->pm_cpus & ci->ci_cpumask) != 0 ||
 -	    (kernel && (pmap->pm_kernel_cpus & ci->ci_cpumask) != 0));
 -}
 -
 -/*
   *	Add a reference to the specified pmap.
   */

 @@ -781,7 +767,7 @@ pmap_map_ptes(struct pmap *pmap, struct 
  		ci->ci_tlbstate = TLBSTATE_VALID;
  		atomic_or_32(&pmap->pm_cpus, cpumask);
  		atomic_or_32(&pmap->pm_kernel_cpus, cpumask);
 -		cpu_load_pmap(pmap);
 +		cpu_load_pmap(pmap, curpmap);
  	}
  	pmap->pm_ncsw = l->l_ncsw;
  	*pmap2 = curpmap;
 @@ -2223,6 +2209,7 @@ pmap_create(void)
  	pmap->pm_flags = 0;
  	pmap->pm_cpus = 0;
  	pmap->pm_kernel_cpus = 0;
 +	pmap->pm_xen_ptp_cpus = 0;
  	pmap->pm_gc_ptp = NULL;

  	/* init the LDT */
 @@ -2313,9 +2300,26 @@ pmap_destroy(struct pmap *pmap)
  	}

  #ifdef DIAGNOSTIC
 -	for (CPU_INFO_FOREACH(cii, ci))
 +	for (CPU_INFO_FOREACH(cii, ci)) {
  		if (ci->ci_pmap == pmap)
  			panic("destroying pmap being used");
 +#if defined(XEN) && defined(__x86_64__)
 +		for (i = 0; i < PDIR_SLOT_PTE; i++) {
 +			if (pmap->pm_pdir[i] != 0 &&
 +			    ci->ci_kpm_pdir[i] == pmap->pm_pdir[i]) {
 +				printf("pmap_destroy(%p) pmap_kernel %p "
 +				    "curcpu %d cpu %d ci_pmap %p "
 +				    "ci->ci_kpm_pdir[%d]=%" PRIx64
 +				    " pmap->pm_pdir[%d]=%" PRIx64 "\n",
 +				    pmap, pmap_kernel(), curcpu()->ci_index,
 +				    ci->ci_index, ci->ci_pmap,
 +				    i, ci->ci_kpm_pdir[i],
 +				    i, pmap->pm_pdir[i]);
 +				panic("pmap_destroy: used pmap");
 +			}
 +		}
 +#endif
 +	}
  #endif /* DIAGNOSTIC */

  	/*
 @@ -2744,7 +2748,7 @@ pmap_load(void)
  	lldt(pmap->pm_ldt_sel);

  	u_int gen = uvm_emap_gen_return();
 -	cpu_load_pmap(pmap);
 +	cpu_load_pmap(pmap, oldpmap);
  	uvm_emap_update(gen);

  	ci->ci_want_pmapload = 0;
 Index: xen/include/hypervisor.h
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/xen/include/hypervisor.h,v
 retrieving revision 1.36
 diff -u -p -u -r1.36 hypervisor.h
 --- xen/include/hypervisor.h	7 Dec 2011 15:47:42 -0000	1.36
 +++ xen/include/hypervisor.h	12 Feb 2012 19:08:19 -0000
 @@ -91,7 +91,6 @@ struct xen_npx_attach_args {
  #include <xen/xen-public/io/netif.h>
  #include <xen/xen-public/io/blkif.h>

 -#include <machine/cpu.h>
  #include <machine/hypercalls.h>

  #undef u8
 Index: xen/include/intr.h
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/xen/include/intr.h,v
 retrieving revision 1.33
 diff -u -p -u -r1.33 intr.h
 --- xen/include/intr.h	11 Aug 2011 17:58:59 -0000	1.33
 +++ xen/include/intr.h	12 Feb 2012 19:08:19 -0000
 @@ -39,12 +39,13 @@
  #include <xen/xen.h>
  #include <xen/hypervisor.h>
  #include <xen/evtchn.h>
 -#include <machine/cpu.h>
  #include <machine/pic.h>
  #include <sys/evcnt.h>

  #include "opt_xen.h"

 +
 +struct cpu_info;
  /*
   * Struct describing an event channel. 
   */
 @@ -152,8 +153,6 @@ splraiseipl(ipl_cookie_t icookie)
   * Stub declarations.
   */

 -struct cpu_info;
 -
  struct pcibus_attach_args;

  #ifdef MULTIPROCESSOR
 Index: xen/x86/cpu.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/xen/x86/cpu.c,v
 retrieving revision 1.79
 diff -u -p -u -r1.79 cpu.c
 --- xen/x86/cpu.c	28 Jan 2012 12:15:19 -0000	1.79
 +++ xen/x86/cpu.c	12 Feb 2012 19:08:19 -0000
 @@ -596,6 +596,9 @@ cpu_init(struct cpu_info *ci)
  	/* No user PGD mapped for this CPU yet */
  	ci->ci_xen_current_user_pgd = 0;
  #endif
 +#if defined(__x86_64__) || defined(PAE)
 +	mutex_init(&ci->ci_kpm_mtx, MUTEX_DEFAULT, IPL_VM);
 +#endif

  	atomic_or_32(&cpus_running, ci->ci_cpumask);
  	atomic_or_32(&ci->ci_flags, CPUF_RUNNING);
 @@ -1173,62 +1176,76 @@ x86_cpu_idle_xen(void)
   * Loads pmap for the current CPU.
   */
  void
 -cpu_load_pmap(struct pmap *pmap)
 +cpu_load_pmap(struct pmap *pmap, struct pmap *oldpmap)
  {
 +#if defined(__x86_64__) || defined(PAE)
 +	struct cpu_info *ci = curcpu();
 +	uint32_t cpumask = ci->ci_cpumask;
 +
 +	mutex_enter(&ci->ci_kpm_mtx);
 +	/* make new pmap visible to pmap_kpm_sync_xcall() */
 +	atomic_or_32(&pmap->pm_xen_ptp_cpus, cpumask);
 +#endif
  #ifdef i386
  #ifdef PAE
 -	int i, s;
 -	struct cpu_info *ci;
 -
 -	s = splvm(); /* just to be safe */
 -	ci = curcpu();
 -	paddr_t l3_pd = xpmap_ptom_masked(ci->ci_pae_l3_pdirpa);
 -	/* don't update the kernel L3 slot */
 -	for (i = 0 ; i < PDP_SIZE - 1; i++) {
 -		xpq_queue_pte_update(l3_pd + i * sizeof(pd_entry_t),
 -		    xpmap_ptom(pmap->pm_pdirpa[i]) | PG_V);
 +	{
 +		int i;
 +		paddr_t l3_pd = xpmap_ptom_masked(ci->ci_pae_l3_pdirpa);
 +		/* don't update the kernel L3 slot */
 +		for (i = 0 ; i < PDP_SIZE - 1; i++) {
 +			xpq_queue_pte_update(l3_pd + i * sizeof(pd_entry_t),
 +			    xpmap_ptom(pmap->pm_pdirpa[i]) | PG_V);
 +		}
 +		tlbflush();
  	}
 -	splx(s);
 -	tlbflush();
  #else /* PAE */
  	lcr3(pmap_pdirpa(pmap, 0));
  #endif /* PAE */
  #endif /* i386 */

  #ifdef __x86_64__
 -	int i, s;
 -	pd_entry_t *new_pgd;
 -	struct cpu_info *ci;
 -	paddr_t l4_pd_ma;
 -
 -	ci = curcpu();
 -	l4_pd_ma = xpmap_ptom_masked(ci->ci_kpm_pdirpa);
 +	{
 +		int i;
 +		pd_entry_t *new_pgd;
 +		paddr_t l4_pd_ma;
 +
 +		l4_pd_ma = xpmap_ptom_masked(ci->ci_kpm_pdirpa);
 +
 +		/*
 +		 * Map user space address in kernel space and load
 +		 * user cr3
 +		 */
 +		new_pgd = pmap->pm_pdir;
 +		KASSERT(pmap == ci->ci_pmap);
 +
 +		/* Copy user pmap L4 PDEs (in user addr. range) to per-cpu L4 */
 +		for (i = 0; i < PDIR_SLOT_PTE; i++) {
 +			KASSERT(pmap != pmap_kernel() || new_pgd[i] == 0);
 +			if (ci->ci_kpm_pdir[i] != new_pgd[i]) {
 +				xpq_queue_pte_update(
 +				   l4_pd_ma + i * sizeof(pd_entry_t),
 +				    new_pgd[i]);
 +			}
 +		}
 +
 +		if (__predict_true(pmap != pmap_kernel())) {
 +			xen_set_user_pgd(pmap_pdirpa(pmap, 0));
 +			ci->ci_xen_current_user_pgd = pmap_pdirpa(pmap, 0);
 +		}
 +		else {
 +			xpq_queue_pt_switch(l4_pd_ma);
 +			ci->ci_xen_current_user_pgd = 0;
 +		}

 -	/*
 -	 * Map user space address in kernel space and load
 -	 * user cr3
 -	 */
 -	s = splvm();
 -	new_pgd = pmap->pm_pdir;
 -
 -	/* Copy user pmap L4 PDEs (in user addr. range) to per-cpu L4 */
 -	for (i = 0; i < PDIR_SLOT_PTE; i++) {
 -		xpq_queue_pte_update(l4_pd_ma + i * sizeof(pd_entry_t), new_pgd[i]);
 -	}
 -
 -	if (__predict_true(pmap != pmap_kernel())) {
 -		xen_set_user_pgd(pmap_pdirpa(pmap, 0));
 -		ci->ci_xen_current_user_pgd = pmap_pdirpa(pmap, 0);
 -	}
 -	else {
 -		xpq_queue_pt_switch(l4_pd_ma);
 -		ci->ci_xen_current_user_pgd = 0;
 +		tlbflush();
  	}

 -	tlbflush();
 -	splx(s);
 -
  #endif /* __x86_64__ */
 +#if defined(__x86_64__) || defined(PAE)
 +	/* old pmap no longer visible to pmap_kpm_sync_xcall() */
 +	atomic_and_32(&oldpmap->pm_xen_ptp_cpus, ~cpumask);
 +	mutex_exit(&ci->ci_kpm_mtx);
 +#endif
  }

   /*
 Index: xen/x86/x86_xpmap.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/xen/x86/x86_xpmap.c,v
 retrieving revision 1.38
 diff -u -p -u -r1.38 x86_xpmap.c
 --- xen/x86/x86_xpmap.c	12 Jan 2012 19:49:37 -0000	1.38
 +++ xen/x86/x86_xpmap.c	12 Feb 2012 19:08:19 -0000
 @@ -185,8 +185,12 @@ retry:
  	ret = HYPERVISOR_mmu_update_self(xpq_queue, xpq_idx, &ok);

  	if (xpq_idx != 0 && ret < 0) {
 -		printf("xpq_flush_queue: %d entries (%d successful)\n",
 -		    xpq_idx, ok);
 +		struct cpu_info *ci;
 +		CPU_INFO_ITERATOR cii;
 +
 +		printf("xpq_flush_queue: %d entries (%d successful) on "
 +		    "cpu%d (%ld)\n",
 +		    xpq_idx, ok, xpq_cpu()->ci_index, xpq_cpu()->ci_cpuid);

  		if (ok != 0) {
  			xpq_queue += ok;
 @@ -195,9 +199,23 @@ retry:
  			goto retry;
  		}

 -		for (i = 0; i < xpq_idx; i++)
 -			printf("0x%016" PRIx64 ": 0x%016" PRIx64 "\n",
 -			   xpq_queue[i].ptr, xpq_queue[i].val);
 +		for (CPU_INFO_FOREACH(cii, ci)) {
 +			xpq_queue = xpq_queue_array[ci->ci_cpuid];
 +			xpq_idx = xpq_idx_array[ci->ci_cpuid];
 +			printf("cpu%d (%ld):\n", ci->ci_index, ci->ci_cpuid);
 +			for (i = 0; i < xpq_idx; i++) {
 +				printf("  0x%016" PRIx64 ": 0x%016" PRIx64 "\n",
 +				   xpq_queue[i].ptr, xpq_queue[i].val);
 +			}
 +#ifdef __x86_64__
 +			for (i = 0; i < PDIR_SLOT_PTE; i++) {
 +				if (ci->ci_kpm_pdir[i] == 0)
 +					continue;
 +				printf(" kpm_pdir[%d]: 0x%" PRIx64 "\n",
 +				    i, ci->ci_kpm_pdir[i]);
 +			}
 +#endif
 +		}
  		panic("HYPERVISOR_mmu_update failed, ret: %d\n", ret);
  	}
  	xpq_idx_array[xpq_cpu()->ci_cpuid] = 0;
 Index: xen/x86/xen_ipi.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/xen/x86/xen_ipi.c,v
 retrieving revision 1.9
 diff -u -p -u -r1.9 xen_ipi.c
 --- xen/x86/xen_ipi.c	30 Dec 2011 12:16:19 -0000	1.9
 +++ xen/x86/xen_ipi.c	12 Feb 2012 19:08:19 -0000
 @@ -41,14 +41,13 @@ __KERNEL_RCSID(0, "$NetBSD: xen_ipi.c,v 
  #include <sys/types.h>

  #include <sys/atomic.h>
 -#include <sys/mutex.h>
  #include <sys/cpu.h>
 +#include <sys/mutex.h>
  #include <sys/device.h>
  #include <sys/xcall.h>
  #include <sys/errno.h>
  #include <sys/systm.h>

 -#include <machine/cpu.h>
  #ifdef __x86_64__
  #include <machine/fpu.h>
  #else
 Index: xen/x86/xen_pmap.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/xen/x86/xen_pmap.c,v
 retrieving revision 1.16
 diff -u -p -u -r1.16 xen_pmap.c
 --- xen/x86/xen_pmap.c	28 Jan 2012 07:19:17 -0000	1.16
 +++ xen/x86/xen_pmap.c	12 Feb 2012 19:08:19 -0000
 @@ -350,34 +350,7 @@ pmap_kpm_setpte(struct cpu_info *ci, str
  		xpmap_ptetomach(&ci->ci_kpm_pdir[index]),
  		pmap->pm_pdir[index]);
  #endif /* PAE */
 -}
 -
 -static void
 -pmap_kpm_sync_xcall(void *arg1, void *arg2)
 -{
 -	KASSERT(arg1 != NULL);
 -	KASSERT(arg2 != NULL);
 -
 -	struct pmap *pmap = arg1;
 -	int index = *(int *)arg2;
 -	KASSERT(pmap == pmap_kernel() || index < PDIR_SLOT_PTE);
 -	
 -	struct cpu_info *ci = xpq_cpu();
 -
 -#ifdef PAE
 -	KASSERTMSG(pmap == pmap_kernel(), "%s not allowed for PAE user pmaps", __func__);
 -#endif /* PAE */
 -
 -	if (__predict_true(pmap != pmap_kernel()) &&
 -	    pmap != ci->ci_pmap) {
 -		/* User pmap changed. Nothing to do. */
 -		return;
 -	}
 -
 -	/* Update per-cpu kpm */
 -	pmap_kpm_setpte(ci, pmap, index);
 -	pmap_pte_flush();
 -	return;
 +	xpq_flush_queue();
  }

  /*
 @@ -387,68 +360,30 @@ pmap_kpm_sync_xcall(void *arg1, void *ar
  void
  xen_kpm_sync(struct pmap *pmap, int index)
  {
 -	uint64_t where;
 +	CPU_INFO_ITERATOR cii;
 +	struct cpu_info *ci;

  	KASSERT(pmap != NULL);

  	pmap_pte_flush();

 -	if (__predict_false(xpq_cpu != &x86_curcpu)) { /* Too early to xcall */
 -		CPU_INFO_ITERATOR cii;
 -		struct cpu_info *ci;
 -		int s = splvm();
 -		for (CPU_INFO_FOREACH(cii, ci)) {
 -			if (ci == NULL) {
 -				continue;
 -			}
 -			if (pmap == pmap_kernel() ||
 -			    ci->ci_cpumask & pmap->pm_cpus) {
 -				pmap_kpm_setpte(ci, pmap, index);
 -			}
 +	for (CPU_INFO_FOREACH(cii, ci)) {
 +		if (ci == NULL) {
 +			continue;
  		}
 -		pmap_pte_flush();
 -		splx(s);
 -		return;
 -	}
 -
 -	if (pmap == pmap_kernel()) {
 -		where = xc_broadcast(XC_HIGHPRI,
 -		    pmap_kpm_sync_xcall, pmap, &index);
 -		xc_wait(where);
 -	} else {
 -		KASSERT(mutex_owned(pmap->pm_lock));
 -		KASSERT(kpreempt_disabled());
 -
 -		CPU_INFO_ITERATOR cii;
 -		struct cpu_info *ci;
 -		for (CPU_INFO_FOREACH(cii, ci)) {
 -			if (ci == NULL) {
 -				continue;
 -			}
 -			while (ci->ci_cpumask & pmap->pm_cpus) {
 -#ifdef MULTIPROCESSOR
 -#define CPU_IS_CURCPU(ci) __predict_false((ci) == curcpu())
 -#else /* MULTIPROCESSOR */
 -#define CPU_IS_CURCPU(ci) __predict_true((ci) == curcpu())
 -#endif /* MULTIPROCESSOR */
 -#if 0 /* XXX: Race with remote pmap_load() */
 -				if (ci->ci_want_pmapload &&
 -				    !CPU_IS_CURCPU(ci)) {
 -					/*
 -					 * XXX: make this more cpu
 -					 *  cycle friendly/co-operate
 -					 *  with pmap_load()
 -					 */
 -					continue;
 -				    }
 -#endif /* 0 */
 -				where = xc_unicast(XC_HIGHPRI, pmap_kpm_sync_xcall,
 -				    pmap, &index, ci);
 -				xc_wait(where);
 -				break;
 -			}
 +		if (pmap != pmap_kernel() &&
 +		    (ci->ci_cpumask & pmap->pm_xen_ptp_cpus) == 0)
 +			continue;
 +
 +		/* take the lock and check again */
 +		mutex_enter(&ci->ci_kpm_mtx);
 +		if (pmap == pmap_kernel() ||
 +		    (ci->ci_cpumask & pmap->pm_xen_ptp_cpus) != 0) {
 +			pmap_kpm_setpte(ci, pmap, index);
  		}
 +		mutex_exit(&ci->ci_kpm_mtx);
  	}
 +	return;
  }

  #endif /* PAE || __x86_64__ */

 --jRHKVT23PllUwdXP--

From: Jeff Rizzo <riz@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-xen/45975 (panic: HYPERVISOR_mmu_update failed, ret: -22
 during heavy activity)
Date: Sun, 12 Feb 2012 15:45:28 -0800

 On 2/12/12 11:20 AM, bouyer@NetBSD.org wrote:
 > Please test the patch; I tested it successfully on a Xen 3.3 install
 > with amd64 and i386 domUs.
 >
 >

 I just got the panic on a 4-vcpu domU under Xen 4.1.2 (4 physical 
 cores with 2 threads per core on the hardware) while testing the patch:

 panic: HYPERVISOR_mmu_update failed, ret: -22

 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 rip ffffffff80134665 cs e030 rflags 246 cr2  
 7f7ff7e12000 cpl 6 rsp ffffa000b423dea0
 Stopped in pid 765.1 (sh) at    netbsd:breakpoint+0x5:  leave
 breakpoint() at netbsd:breakpoint+0x5
 vpanic() at netbsd:vpanic+0x1f2
 printf_nolog() at netbsd:printf_nolog
 xpq_queue_machphys_update() at netbsd:xpq_queue_machphys_update
 xen_kpm_sync() at netbsd:xen_kpm_sync+0x31
 pmap_enter_ma() at netbsd:pmap_enter_ma+0xe10
 pmap_enter() at netbsd:pmap_enter+0x35
 uvm_fault_lower_enter() at netbsd:uvm_fault_lower_enter+0xfe
 uvm_fault_internal() at netbsd:uvm_fault_internal+0xbf9
 trap() at netbsd:trap+0x5f5

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, riz@NetBSD.org
Subject: Re: port-xen/45975 (panic: HYPERVISOR_mmu_update failed, ret: -22
 during heavy activity)
Date: Mon, 13 Feb 2012 10:49:03 +0100

 On Sun, Feb 12, 2012 at 11:50:04PM +0000, Jeff Rizzo wrote:
 > The following reply was made to PR port-xen/45975; it has been noted by GNATS.
 > 
 > From: Jeff Rizzo <riz@netbsd.org>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: port-xen/45975 (panic: HYPERVISOR_mmu_update failed, ret: -22
 >  during heavy activity)
 > Date: Sun, 12 Feb 2012 15:45:28 -0800
 > 
 >  On 2/12/12 11:20 AM, bouyer@NetBSD.org wrote:
 >  > Please test the patch; I tested it successfully on a Xen 3.3 install
 >  > with amd64 and i386 domUs.
 >  >
 >  >
 >  
 >  I just got the panic on a 4-vcpu domU under Xen 4.1.2 (4 physical 
 >  cores with 2 threads per core on the hardware) while testing the patch:
 >  
 >  panic: HYPERVISOR_mmu_update failed, ret: -22
 >  
 >  fatal breakpoint trap in supervisor mode
 >  trap type 1 code 0 rip ffffffff80134665 cs e030 rflags 246 cr2  
 >  7f7ff7e12000 cpl 6 rsp ffffa000b423dea0
 >  Stopped in pid 765.1 (sh) at    netbsd:breakpoint+0x5:  leave
 >  breakpoint() at netbsd:breakpoint+0x5
 >  vpanic() at netbsd:vpanic+0x1f2
 >  printf_nolog() at netbsd:printf_nolog
 >  xpq_queue_machphys_update() at netbsd:xpq_queue_machphys_update
 >  xen_kpm_sync() at netbsd:xen_kpm_sync+0x31
 >  pmap_enter_ma() at netbsd:pmap_enter_ma+0xe10
 >  pmap_enter() at netbsd:pmap_enter+0x35
 >  uvm_fault_lower_enter() at netbsd:uvm_fault_lower_enter+0xfe
 >  uvm_fault_internal() at netbsd:uvm_fault_internal+0xbf9
 >  trap() at netbsd:trap+0x5f5

 please also send the last few lines of 'xm dmesg' if possible.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Jeff Rizzo <riz@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-xen/45975 (panic: HYPERVISOR_mmu_update failed, ret: -22
 during heavy activity)
Date: Mon, 13 Feb 2012 08:23:27 -0800

 On 2/13/12 1:49 AM, Manuel Bouyer wrote:
 > please also send the last few lines of 'xm dmesg' if possible.
 >

 Unfortunately, they're not timestamped, so it's hard for me to figure 
 out what's relevant.  Here's the entire 'xm dmesg'.

 2:d170 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d170 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d170 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d170 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) mm.c:2424:d170 Bad type (saw 7400000000000001 != exp 
 1000000000000000) for mfn 1e6705 (pfn 6edb)
 (XEN) mm.c:915:d170 Attempt to create linear p.t. with write perms
 (XEN) traps.c:2432:d171 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d171 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d171 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d171 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d171 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) traps.c:2432:d172 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d172 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d172 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d172 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d172 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) traps.c:2432:d173 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d173 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d173 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d173 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d173 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d173 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d173 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) traps.c:2432:d174 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d174 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d174 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d174 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d174 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) traps.c:2432:d175 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d175 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d175 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d175 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d175 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d175 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d175 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) mm.c:2424:d174 Bad type (saw 7400000000000001 != exp 
 1000000000000000) for mfn 1e51ff (pfn 1edaf)
 (XEN) mm.c:915:d174 Attempt to create linear p.t. with write perms
 (XEN) traps.c:2432:d176 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d176 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d176 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d176 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d176 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) traps.c:2432:d177 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d177 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d177 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d177 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d177 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) mm.c:2424:d177 Bad type (saw 7400000000000001 != exp 
 1000000000000000) for mfn 1c496c (pfn de83)
 (XEN) mm.c:915:d177 Attempt to create linear p.t. with write perms
 (XEN) traps.c:2432:d178 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d178 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d178 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d178 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d178 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) traps.c:2432:d179 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d179 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d179 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d179 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d179 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d179 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d179 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) mm.c:2424:d178 Bad type (saw 7400000000000001 != exp 
 1000000000000000) for mfn 1c0e0e (pfn 141c4)
 (XEN) mm.c:915:d178 Attempt to create linear p.t. with write perms
 (XEN) traps.c:2432:d180 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d180 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d180 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d180 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d180 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) traps.c:2432:d181 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d181 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d181 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d181 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d181 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) traps.c:2432:d182 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d182 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d182 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d182 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d182 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) traps.c:2432:d183 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d183 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d183 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d183 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d183 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) mm.c:2424:d183 Bad type (saw 7400000000000001 != exp 
 3000000000000000) for mfn 1c47ef (pfn 65cb)
 (XEN) mm.c:982:d183 Attempt to create linear p.t. with write perms
 (XEN) traps.c:2432:d184 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d184 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d184 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d184 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d184 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) traps.c:2432:d185 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d185 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d185 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d185 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d185 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) traps.c:2432:d186 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d186 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d186 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d186 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d186 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) mm.c:2424:d186 Bad type (saw 7400000000000001 != exp 
 1000000000000000) for mfn 1f4ff6 (pfn b804)
 (XEN) mm.c:915:d186 Attempt to create linear p.t. with write perms
 (XEN) traps.c:2432:d187 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d187 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d187 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d187 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d187 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) mm.c:2424:d187 Bad type (saw 7400000000000001 != exp 
 1000000000000000) for mfn 1e48ca (pfn 196f5)
 (XEN) mm.c:915:d187 Attempt to create linear p.t. with write perms
 (XEN) traps.c:2432:d188 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d188 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d188 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d188 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d188 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) mm.c:2424:d179 Bad type (saw 7400000000000001 != exp 
 3000000000000000) for mfn 1b224c (pfn 723a3)
 (XEN) mm.c:982:d179 Attempt to create linear p.t. with write perms
 (XEN) traps.c:2432:d190 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d190 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d190 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d190 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d190 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d190 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d190 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) traps.c:2432:d191 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d191 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d191 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d191 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d191 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d191 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d191 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) mm.c:2424:d191 Bad type (saw 7400000000000001 != exp 
 1000000000000000) for mfn 11beb9 (pfn c8753)
 (XEN) mm.c:915:d191 Attempt to create linear p.t. with write perms
 (XEN) traps.c:2432:d192 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d192 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d192 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d192 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d192 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d192 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d192 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) traps.c:2432:d193 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d193 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d193 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d193 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d193 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d193 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d193 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) mm.c:2424:d193 Bad type (saw 7400000000000001 != exp 
 1000000000000000) for mfn 1c90f7 (pfn 1b4e5)
 (XEN) mm.c:915:d193 Attempt to create linear p.t. with write perms
 (XEN) mm.c:2424:d193 Bad type (saw 7400000000000001 != exp 
 1000000000000000) for mfn 1c8d36 (pfn 1b8a6)
 (XEN) mm.c:915:d193 Attempt to create linear p.t. with write perms
 (XEN) traps.c:2432:d194 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d194 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d194 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d194 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d194 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d194 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d194 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) traps.c:2432:d196 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d196 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d196 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d196 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d196 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d196 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d196 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) mm.c:2424:d196 Bad type (saw 7400000000000001 != exp 
 1000000000000000) for mfn 1fdc1f (pfn 169b6)
 (XEN) mm.c:915:d196 Attempt to create linear p.t. with write perms
 (XEN) traps.c:2432:d197 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d197 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d197 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d197 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d197 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) domain.c:652:d197 Attempt to change CR4 flags 00002660 -> 00000620
 (XEN) traps.c:2432:d197 Domain attempted WRMSR 0000000000000277 from 
 0x0000050100070406 to 0x0007010600070106.
 (XEN) mm.c:2424:d197 Bad type (saw 7400000000000001 != exp 
 3000000000000000) for mfn 1fe715 (pfn ded3)
 (XEN) mm.c:982:d197 Attempt to create linear p.t. with write perms

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, riz@NetBSD.org
Subject: Re: port-xen/45975 (panic: HYPERVISOR_mmu_update failed, ret: -22
 during heavy activity)
Date: Mon, 13 Feb 2012 17:37:38 +0100

 On Mon, Feb 13, 2012 at 04:25:02PM +0000, Jeff Rizzo wrote:
 > The following reply was made to PR port-xen/45975; it has been noted by GNATS.
 > 
 > From: Jeff Rizzo <riz@netbsd.org>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: port-xen/45975 (panic: HYPERVISOR_mmu_update failed, ret: -22
 >  during heavy activity)
 > Date: Mon, 13 Feb 2012 08:23:27 -0800
 > 
 >  On 2/13/12 1:49 AM, Manuel Bouyer wrote:
 >  > please also send the last few lines of 'xm dmesg' if possible.
 >  >
 >  
 >  Unfortunately, they're not timestamped, so it's hard for me to figure 
 >  out what's relevant.  Here's the entire 'xm dmesg'.

 You have the domain number ...

 >  (XEN) mm.c:2424:d170 Bad type (saw 7400000000000001 != exp 
 >  1000000000000000) for mfn 1e6705 (pfn 6edb)
 >  (XEN) mm.c:915:d170 Attempt to create linear p.t. with write perms

 All errors are of the same type, and look similar to what you've
 reported first. It means we're trying to use a page as a page table
 which is already mapped read/write. So it's not the same problem
 as the bug I corrected (I should have checked more carefully), where
 Xen was telling me that a domU is trying to map read/write a page
 already used as page table (so it's the opposite !).
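
 To make the "saw ... != exp ..." values in the log easier to read, here
 is a minimal decoding sketch. It assumes the Xen 4.1-era 64-bit
 type_info layout (page type in the top four bits, a "validated" flag at
 bit 58, use count in the low bits); the toy_* names are hypothetical
 and the layout should be checked against the Xen headers of the
 version actually in use.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ASSUMPTION: Xen 4.1-era 64-bit type_info layout. These constants are
 * written from memory for illustration, not copied from the Xen tree.
 */
#define TOY_TYPE_SHIFT	60			  /* type: top 4 bits */
#define TOY_VALIDATED	(UINT64_C(1) << 58)	  /* type has been validated */
#define TOY_COUNT_MASK	((UINT64_C(1) << 55) - 1) /* type use count */

/* Page type number: under the assumed layout 1..4 = L1..L4 page table,
 * 7 = writable page. */
static unsigned
toy_page_type(uint64_t type_info)
{
	return (unsigned)(type_info >> TOY_TYPE_SHIFT);
}

/* Non-zero if the page's current type has been validated. */
static int
toy_is_validated(uint64_t type_info)
{
	return (type_info & TOY_VALIDATED) != 0;
}

/* Number of outstanding uses of the page under its current type. */
static uint64_t
toy_use_count(uint64_t type_info)
{
	return type_info & TOY_COUNT_MASK;
}

/* Human-readable name for the page type (hypothetical helper). */
static const char *
toy_type_name(unsigned type)
{
	switch (type) {
	case 1: return "L1 page table";
	case 2: return "L2 page table";
	case 3: return "L3 page table";
	case 4: return "L4 page table";
	case 7: return "writable page";
	default: return "other";
	}
}
```

 Under that assumption, "saw 7400000000000001" decodes as a validated
 writable page with one outstanding writable mapping, while the expected
 values 1000000000000000 and 3000000000000000 are L1 and L3 page tables,
 which matches the reading above: the page is still mapped read/write
 when the domU tries to use it as a page table.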

 Today I ran into the issue you're describing now. So the patch I
 proposed is correct and fixes an issue; but it's not this one ...
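
 The rule described above (a page cannot be used as a page table while
 it is mapped read/write, and the reverse) can be modelled in a few
 lines. This is a toy sketch, not hypervisor code: the structure and
 function names are invented, and -EINVAL is used because the panic
 reports ret -22, which is assumed here to be -EINVAL.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a page that carries one exclusive use at a time. */
enum toy_use { TOY_USE_NONE, TOY_USE_WRITABLE, TOY_USE_PAGE_TABLE };

struct toy_xen_page {
	enum toy_use use;	/* current typed use */
	unsigned refs;		/* references holding that use */
};

/* Take a typed reference; a conflicting use fails with -EINVAL (-22). */
int
toy_get_type(struct toy_xen_page *pg, enum toy_use want)
{
	assert(want != TOY_USE_NONE);
	if (pg->refs != 0 && pg->use != want)
		return -EINVAL;
	pg->use = want;
	pg->refs++;
	return 0;
}

/* Drop a typed reference; the page becomes untyped when the last one goes. */
void
toy_put_type(struct toy_xen_page *pg)
{
	assert(pg->refs > 0);
	if (--pg->refs == 0)
		pg->use = TOY_USE_NONE;
}
```

 In this model the failure in this PR is a TOY_USE_PAGE_TABLE request
 on a page that still has a TOY_USE_WRITABLE reference; the bug fixed
 earlier was the other direction.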

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Cherry G. Mathew <cherry@zyx.in>
To: gnats-bugs@NetBSD.org
Cc: port-xen-maintainer@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update failed, ret: -22 on -current MP domU (amd64)
Date: Tue, 14 Feb 2012 23:41:02 +0530

 Hi Manuel,


 > I've made progress on this; I think I understood what's going on and
 > I have a fix.
 >
 >
 > The page is inded still used as a page table; it's still in another CPU's
 > ci_kpm_pdir. The reason is that xen_kpm_sync() is not working as expected,
 > leading to races between CPUs.
 > 1 the check (xpq_cpu != &x86_curcpu) is always false because we
 >   have different x86_curcpu symbols with different addresses in the kernel.
 >   Fortunably, all addresses dissaemble to the same code.

 Ok, this is my mess-up - I was looking for a way to not use %gs/%fs
 before they are set up via cpu_init_msrs(). The reason it doesn't work,
 as dh pointed out elsewhere, is that x86_curcpu() is defined in a
 header as static inline.




 >
 > 3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
 >   needs to know on which CPU a pmap is loaded *now*:
 >   pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
 >   to a new pmap, leaving a window where a pmap is still in a CPU's
 >   ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
 >   by the hypervisor at any time, it can be large enough to let another
 >   CPU free the PTP and reuse it as a normal page.
 >

 Makes sense.

 >
 > To fix 2) I choose to avoid cross-calls and IPIs completely, and instead
 > use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
 > It's safe because we just need to update the table page, a tlbflush IPI will
 > happen later. As a side effect, we don't need a different code for bootstrap.
 >
 >
 > to fix 3), I introduced a pm_xen_ptp_cpus which is updated from
 > cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
 > ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
 >
 >
 > While there I removed the unused pmap_is_active() function;
 > and added some more details to DIAGNOSTIC panics.
 >

 Can we add these broken down into separate patch sets ?

 >
 > The attached patch implements this; it has been tested on 4-CPUs domU
 > (on a physical dual-core box, so the race described in 2) is more likely to
 > happen) and I couldn't trigger the panic any more (a build.sh -j8 release
 > would never complete before for me).
 >

 ...

 > @@ -387,68 +360,30 @@ pmap_kpm_sync_xcall(void *arg1, void *ar
 >  void
 >  xen_kpm_sync(struct pmap *pmap, int index)
 >  {
 > -      uint64_t where;
 > +      CPU_INFO_ITERATOR cii;
 > +      struct cpu_info *ci;
 >
 >
 >        KASSERT(pmap != NULL);
 >
 >
 >        pmap_pte_flush();
 >
 >
 > -      if (__predict_false(xpq_cpu != &x86_curcpu)) { /* Too early to xcall */
 > -              CPU_INFO_ITERATOR cii;
 > -              struct cpu_info *ci;
 > -              int s = splvm();
 > -              for (CPU_INFO_FOREACH(cii, ci)) {
 > -                      if (ci == NULL) {
 > -                              continue;
 > -                      }
 > -                      if (pmap == pmap_kernel() ||
 > -                          ci->ci_cpumask & pmap->pm_cpus) {
 > -                              pmap_kpm_setpte(ci, pmap, index);
 > -                      }
 > +      for (CPU_INFO_FOREACH(cii, ci)) {
 > +              if (ci == NULL) {
 > +                      continue;
 >                }
 > -              pmap_pte_flush();
 > -              splx(s);
 > -              return;
 > -      }
 > -
 > -      if (pmap == pmap_kernel()) {
 > -              where = xc_broadcast(XC_HIGHPRI,
 > -                  pmap_kpm_sync_xcall, pmap, &index);
 > -              xc_wait(where);
 > -      } else {
 > -              KASSERT(mutex_owned(pmap->pm_lock));
 > -              KASSERT(kpreempt_disabled());
 > -
 > -              CPU_INFO_ITERATOR cii;
 > -              struct cpu_info *ci;
 > -              for (CPU_INFO_FOREACH(cii, ci)) {
 > -                      if (ci == NULL) {
 > -                              continue;
 > -                      }
 > -                      while (ci->ci_cpumask & pmap->pm_cpus) {
 > -#ifdef MULTIPROCESSOR
 > -#define CPU_IS_CURCPU(ci) __predict_false((ci) == curcpu())
 > -#else /* MULTIPROCESSOR */
 > -#define CPU_IS_CURCPU(ci) __predict_true((ci) == curcpu())
 > -#endif /* MULTIPROCESSOR */
 > -#if 0 /* XXX: Race with remote pmap_load() */
 > -                              if (ci->ci_want_pmapload &&
 > -                                  !CPU_IS_CURCPU(ci)) {
 > -                                      /*
 > -                                       * XXX: make this more cpu
 > -                                       *  cycle friendly/co-operate
 > -                                       *  with pmap_load()
 > -                                       */
 > -                                      continue;
 > -                                  }
 > -#endif /* 0 */
 > -                              where = xc_unicast(XC_HIGHPRI, pmap_kpm_sync_xcall,
 > -                                  pmap, &index, ci);
 > -                              xc_wait(where);
 > -                              break;
 > -                      }
 > +              if (pmap != pmap_kernel() &&
 > +                  (ci->ci_cpumask & pmap->pm_xen_ptp_cpus) == 0)
 > +                      continue;
 > +
 > +              /* take the lock and check again */
 > +              mutex_enter(&ci->ci_kpm_mtx);
 > +              if (pmap == pmap_kernel() ||
 > +                  (ci->ci_cpumask & pmap->pm_xen_ptp_cpus) != 0) {
 > +                      pmap_kpm_setpte(ci, pmap, index);

 Isn't a tlb shootdown needed after this, to make sure the old pte is not
 referred to in the TLB ?
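
 The "check, take the lock and check again" step in the quoted hunk can
 be sketched in userland with pthread mutexes. Everything named toy_*
 is invented for illustration; the fields only stand in for
 ci_kpm_mtx, ci_cpumask, ci_kpm_pdir and pm_xen_ptp_cpus, and the
 unlock at the end is assumed, since the quoted hunk stops before it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct toy_cpu {
	pthread_mutex_t kpm_mtx;	/* stands in for ci_kpm_mtx */
	uint32_t cpumask;		/* one bit per CPU, like ci_cpumask */
	int kpm_pdir_entry;		/* stands in for one ci_kpm_pdir slot */
};

struct toy_pmap {
	uint32_t ptp_cpus;	/* stands in for pm_xen_ptp_cpus */
	int is_kernel;		/* stands in for pmap == pmap_kernel() */
};

/* Give every CPU its own mutex, its own mask bit and an empty slot. */
void
toy_cpus_init(struct toy_cpu *cpus, int ncpu)
{
	for (int i = 0; i < ncpu; i++) {
		pthread_mutex_init(&cpus[i].kpm_mtx, NULL);
		cpus[i].cpumask = (uint32_t)1 << i;
		cpus[i].kpm_pdir_entry = 0;
	}
}

/* Update the slot on every CPU using pmap; returns how many were updated. */
int
toy_kpm_sync(struct toy_cpu *cpus, int ncpu, struct toy_pmap *pmap, int value)
{
	int updated = 0;

	assert(cpus != NULL && pmap != NULL);
	for (int i = 0; i < ncpu; i++) {
		struct toy_cpu *ci = &cpus[i];

		/* Cheap unlocked check: skip CPUs that clearly don't use pmap. */
		if (!pmap->is_kernel && (ci->cpumask & pmap->ptp_cpus) == 0)
			continue;

		/* Take the lock and check again before touching the slot. */
		pthread_mutex_lock(&ci->kpm_mtx);
		if (pmap->is_kernel || (ci->cpumask & pmap->ptp_cpus) != 0) {
			ci->kpm_pdir_entry = value;
			updated++;
		}
		pthread_mutex_unlock(&ci->kpm_mtx);
	}
	return updated;
}
```

 The unlocked test is only an optimisation; the decision that matters
 is the one made with the per-CPU mutex held.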

 Separately, please note that the tlb shootdown code ignores requests to
 shootdown when the pmap is being destroyed. 

 http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap_tlb.c#222

 I have a hypothesis that stale TLB entries involving PTPs in the
 ci_kpm_pdir[] may be responsible for the "other" bug that riz@ is seeing
 ?

 Cheers,

 Cherry.

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: "Cherry G. Mathew" <cherry@zyx.in>
Cc: gnats-bugs@NetBSD.org, port-xen-maintainer@NetBSD.org,
        gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update
 failed, ret: -22 on -current MP domU (amd64)
Date: Tue, 14 Feb 2012 19:19:44 +0100

 On Tue, Feb 14, 2012 at 11:41:02PM +0530, Cherry G. Mathew wrote:
 > Hi Manuel,
 > 
 > 
 > > I've made progress on this; I think I understood what's going on and
 > > I have a fix.
 > >
 > >
 > > The page is inded still used as a page table; it's still in another CPU's
 > > ci_kpm_pdir. The reason is that xen_kpm_sync() is not working as expected,
 > > leading to races between CPUs.
 > > 1 the check (xpq_cpu != &x86_curcpu) is always false because we
 > >   have different x86_curcpu symbols with different addresses in the kernel.
 > >   Fortunably, all addresses dissaemble to the same code.
 > 
 > Ok, this is my messup - I was looking for a way to not use %gs/%fs
 > before they are setup via cpu_init_msrs(). The reason it doesn't work,
 > as dh pointed out elsewhere, is because x86_curcpu() is defined in a
 > header as static inline.
 > 
 > 
 > 
 > 
 > >
 > > 3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
 > >   needs to know on which CPU a pmap is loaded *now*:
 > >   pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
 > >   to a new pmap, leaving a window where a pmap is still in a CPU's
 > >   ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
 > >   by the hypervisor at any time, it can be large enough to let another
 > >   CPU free the PTP and reuse it as a normal page.
 > >
 > 
 > Makes sense.
 > 
 > >
 > > To fix 2) I choose to avoid cross-calls and IPIs completely, and instead
 > > use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
 > > It's safe because we just need to update the table page, a tlbflush IPI will
 > > happen later. As a side effect, we don't need a different code for bootstrap.
 > >
 > >
 > > to fix 3), I introduced a pm_xen_ptp_cpus which is updated from
 > > cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
 > > ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
 > >
 > >
 > > While there I removed the unused pmap_is_active() function;
 > > and added some more details to DIAGNOSTIC panics.
 > >
 > 
 > Can we add these broken down into separate patch sets ?

 Not easy, they're touching the same part of the code.

 > 
 > >
 > > The attached patch implements this; it has been tested on 4-CPUs domU
 > > (on a physical dual-core box, so the race described in 2) is more likely to
 > > happen) and I couldn't trigger the panic any more (a build.sh -j8 release
 > > would never complete before for me).
 > >
 > 
 > ...
 > 
 > > @@ -387,68 +360,30 @@ pmap_kpm_sync_xcall(void *arg1, void *ar
 > >  void
 > >  xen_kpm_sync(struct pmap *pmap, int index)
 > >  {
 > > -      uint64_t where;
 > > +      CPU_INFO_ITERATOR cii;
 > > +      struct cpu_info *ci;
 > >
 > >
 > >        KASSERT(pmap != NULL);
 > >
 > >
 > >        pmap_pte_flush();
 > >
 > >
 > > -      if (__predict_false(xpq_cpu != &x86_curcpu)) { /* Too early to xcall */
 > > -              CPU_INFO_ITERATOR cii;
 > > -              struct cpu_info *ci;
 > > -              int s = splvm();
 > > -              for (CPU_INFO_FOREACH(cii, ci)) {
 > > -                      if (ci == NULL) {
 > > -                              continue;
 > > -                      }
 > > -                      if (pmap == pmap_kernel() ||
 > > -                          ci->ci_cpumask & pmap->pm_cpus) {
 > > -                              pmap_kpm_setpte(ci, pmap, index);
 > > -                      }
 > > +      for (CPU_INFO_FOREACH(cii, ci)) {
 > > +              if (ci == NULL) {
 > > +                      continue;
 > >                }
 > > -              pmap_pte_flush();
 > > -              splx(s);
 > > -              return;
 > > -      }
 > > -
 > > -      if (pmap == pmap_kernel()) {
 > > -              where = xc_broadcast(XC_HIGHPRI,
 > > -                  pmap_kpm_sync_xcall, pmap, &index);
 > > -              xc_wait(where);
 > > -      } else {
 > > -              KASSERT(mutex_owned(pmap->pm_lock));
 > > -              KASSERT(kpreempt_disabled());
 > > -
 > > -              CPU_INFO_ITERATOR cii;
 > > -              struct cpu_info *ci;
 > > -              for (CPU_INFO_FOREACH(cii, ci)) {
 > > -                      if (ci == NULL) {
 > > -                              continue;
 > > -                      }
 > > -                      while (ci->ci_cpumask & pmap->pm_cpus) {
 > > -#ifdef MULTIPROCESSOR
 > > -#define CPU_IS_CURCPU(ci) __predict_false((ci) == curcpu())
 > > -#else /* MULTIPROCESSOR */
 > > -#define CPU_IS_CURCPU(ci) __predict_true((ci) == curcpu())
 > > -#endif /* MULTIPROCESSOR */
 > > -#if 0 /* XXX: Race with remote pmap_load() */
 > > -                              if (ci->ci_want_pmapload &&
 > > -                                  !CPU_IS_CURCPU(ci)) {
 > > -                                      /*
 > > -                                       * XXX: make this more cpu
 > > -                                       *  cycle friendly/co-operate
 > > -                                       *  with pmap_load()
 > > -                                       */
 > > -                                      continue;
 > > -                                  }
 > > -#endif /* 0 */
 > > -                              where = xc_unicast(XC_HIGHPRI, pmap_kpm_sync_xcall,
 > > -                                  pmap, &index, ci);
 > > -                              xc_wait(where);
 > > -                              break;
 > > -                      }
 > > +              if (pmap != pmap_kernel() &&
 > > +                  (ci->ci_cpumask & pmap->pm_xen_ptp_cpus) == 0)
 > > +                      continue;
 > > +
 > > +              /* take the lock and check again */
 > > +              mutex_enter(&ci->ci_kpm_mtx);
 > > +              if (pmap == pmap_kernel() ||
 > > +                  (ci->ci_cpumask & pmap->pm_xen_ptp_cpus) != 0) {
 > > +                      pmap_kpm_setpte(ci, pmap, index);
 > 
 > Isn't a tlb shootdown needed after this, to make sure the old pte is not
 > referred to in the TLB ?

 The pmap code does it. Plus, I suspect that the hypervisor does tlb
 flushes itself, to make sure you can't write accidentally to a ptp page
 using stale TLB entries.

 > 
 > Separately, please note that the tlb shootdown code ignores requests to
 > shootdown when the pmap is being destroyed. 

 this shouldn't be a problem.

 > 
 > http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap_tlb.c#222
 > 
 > I have a hypothesis that stale TLB entries involving PTPs in the
 > ci_kpm_pdir[] may be responsible for the "other" bug that riz@ is seeing
 > ?

 I don't think so. The bug is that a new, free page is still registered
 as PTP to the hypervisor. It's not a TLB issue (this is in the CPU's cache,
 not in the hypervisor's data structure). I don't know yet if it's because
 NetBSD freed a page without clearing the PTE (this would be a NetBSD
 bug) or if Xen didn't track it properly (this would be a hypervisor bug).

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Cherry G. Mathew <cherry@zyx.in>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@NetBSD.org,  port-xen-maintainer@NetBSD.org,  gnats-admin@NetBSD.org,  netbsd-bugs@NetBSD.org
Subject: Re: Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update failed, ret: -22 on -current MP domU (amd64)
Date: Wed, 15 Feb 2012 00:49:38 +0530

 >>>>> Manuel Bouyer <bouyer@antioche.eu.org> writes:

     >> On Tue, Feb 14, 2012 at 11:41:02PM +0530, Cherry G. Mathew wrote:

 [...]

     >> Isn't a tlb shootdown needed after this, to make sure the old pte
     >> is not referred to in the TLB ?

     >> The pmap code does it. Plus, I suspect that the hypervisor does
     >> tlb flushes itself, to make sure you can't write accidentally to
     >> a ptp page using stale TLB entries.

     >> 
     >> Separately, please note that the tlb shootdown code ignores
     >> requests to shootdown when the pmap is being destroyed.

     >> this shouldn't be a problem.

 Perhaps - but if there's a stale entry pointing to the now destroyed
 pmap pdir....

 Incidentally, it looks like dom0 with MP hits this bug/symptom at boot -
 might be easier to trigger + debug. (I don't have a dom0 setup handy).

 Cheers,
 -- 
 Cherry

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: "Cherry G. Mathew" <cherry@zyx.in>
Cc: gnats-bugs@NetBSD.org, port-xen-maintainer@NetBSD.org,
        gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update
 failed, ret: -22 on -current MP domU (amd64)
Date: Tue, 14 Feb 2012 20:30:38 +0100

 On Wed, Feb 15, 2012 at 12:49:38AM +0530, Cherry G. Mathew wrote:
 >     >> this shouldn't be a problem.
 > 
 > Perhaps - but if there's a stale entry pointing to the now destroyed
 > pmap pdir....

 The hypervisor won't allow that.

 > 
 > Incidentally, it looks like dom0 with MP hits this bug/symptom at boot -
 > might be easier to trigger + debug. (I don't have a dom0 setup handy).

 It doesn't in my case (or, it didn't when I tried it some time ago).

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Greg Oster <oster@cs.usask.ca>
To: gnats-bugs@NetBSD.org
Cc: bouyer@antioche.eu.org, bouyer@NetBSD.org, gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org, riz@NetBSD.org
Subject: Re: Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update
 failed, ret: -22 on -current MP domU (amd64)
Date: Tue, 14 Feb 2012 14:58:15 -0600

 I just saw this panic with today's -current.  Relevant(?) bits look
 like:

 ...
 pmap_kenter_pa: mapping already present
 evtchn_do_event: handler 0xffffffff80121b77 didn't lower ipl 8 7
 evtchn_do_event: handler 0xffffffff80121b77 didn't lower ipl 8 7
 xpq_flush_queue: 1 entries (0 successful)
 0x000000017e476de0: 0x0000000190e00007
 panic: HYPERVISOR_mmu_update failed, ret: -22

 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 rip ffffffff80134575 cs e030 rflags 246 cr2 7f7ff78002b4 cpl 6 rsp ffffa0005f9ff710
 Stopped in pid 19481.1 (sh) at netbsd:breakpoint+0x5:  leave
 breakpoint() at netbsd:breakpoint+0x5
 vpanic() at netbsd:vpanic+0x1f2
 printf_nolog() at netbsd:printf_nolog
 xpq_queue_machphys_update() at netbsd:xpq_queue_machphys_update
 pmap_enter_ma() at netbsd:pmap_enter_ma+0xb74
 pmap_enter() at netbsd:pmap_enter+0x35
 uvm_fault_internal() at netbsd:uvm_fault_internal+0xf17
 trap() at netbsd:trap+0x5f5
 --- trap (number 1600061548) ---
 65736961720034:
 ds          ffea
 es          f750
 fs          100
 gs          e180
 rdi         0
 rsi         d
 rbp         ffffa0005f9ff710
 rbx         104
 rdx         0
 rcx         8
 rax         1
 r8          ffffa00004559000
 r9          1
 r10         0
 r11         ffffa0000c761380
 r12         ffffffff804b7350    copyright+0x40090
 r13         ffffa0005f9ff750
 r14         ffffffea
 r15         2
 rip         ffffffff80134575    breakpoint+0x5
 cs          e030
 rflags      246
 rsp         ffffa0005f9ff710
 ss          e02b
 netbsd:breakpoint+0x5:  leave
 db{2}> 
 db{2}> 

 The relevant bits of 'xm dmesg' look like:

 (XEN) domain.c:509:d9 Attempt to change CR4 flags 00002620 -> 00000620
 (XEN) domain.c:509:d9 Attempt to change CR4 flags 00002620 -> 00000620
 (XEN) domain.c:509:d9 Attempt to change CR4 flags 00002620 -> 00000620
 (XEN) mm.c:2012:d9 Bad type (saw 00000000e8000001 != exp
 0000000020000000) for mfn 190e00 (pfn 59a5b) 
 (XEN) mm.c:745:d9 Attempt to create linear p.t. with write perms

 If you need more info, just let me know... (this is a test vm, so can
 be killed at will/on demand..)

 Later...

 Greg Oster

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Greg Oster <oster@cs.usask.ca>
Cc: gnats-bugs@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org,
        riz@NetBSD.org
Subject: Re: Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update
 failed, ret: -22 on -current MP domU (amd64)
Date: Tue, 14 Feb 2012 22:02:43 +0100

 On Tue, Feb 14, 2012 at 02:58:15PM -0600, Greg Oster wrote:
 > 
 > I just saw this panic with today's -current.  Relevant(?) bits look
 > like:
 > 
 > ...
 > pmap_kenter_pa: mapping already present
 > evtchn_do_event: handler 0xffffffff80121b77 didn't lower ipl 8 7
 > evtchn_do_event: handler 0xffffffff80121b77 didn't lower ipl 8 7
 > xpq_flush_queue: 1 entries (0 successful)
 > 0x000000017e476de0: 0x0000000190e00007
 > panic: HYPERVISOR_mmu_update failed, ret: -22
 > 
 > fatal breakpoint trap in supervisor mode
 > trap type 1 code 0 rip ffffffff80134575 cs e030 rflags 246 cr2
 > 7f7ff78002b4 cpl 6 rsp ffffa0005f9ff710 Stopped in pid 19481.1 (sh) at
 > netbsd:breakpoint+0x5:  leave breakpoint() at netbsd:breakpoint+0x5
 > vpanic() at netbsd:vpanic+0x1f2
 > printf_nolog() at netbsd:printf_nolog
 > xpq_queue_machphys_update() at netbsd:xpq_queue_machphys_update
 > pmap_enter_ma() at netbsd:pmap_enter_ma+0xb74
 > pmap_enter() at netbsd:pmap_enter+0x35
 > uvm_fault_internal() at netbsd:uvm_fault_internal+0xf17
 > trap() at netbsd:trap+0x5f5
 > --- trap (number 1600061548) ---
 > 65736961720034:
 > ds          ffea
 > es          f750
 > fs          100
 > gs          e180
 > rdi         0
 > rsi         d
 > rbp         ffffa0005f9ff710
 > rbx         104
 > rdx         0
 > rcx         8
 > rax         1
 > r8          ffffa00004559000
 > r9          1
 > r10         0
 > r11         ffffa0000c761380
 > r12         ffffffff804b7350    copyright+0x40090
 > r13         ffffa0005f9ff750
 > r14         ffffffea
 > r15         2
 > rip         ffffffff80134575    breakpoint+0x5
 > cs          e030
 > rflags      246
 > rsp         ffffa0005f9ff710
 > ss          e02b
 > netbsd:breakpoint+0x5:  leave
 > db{2}> 
 > db{2}> 
 > 
 > relevant bits of 'xm dmesg' looks like:
 > 
 > (XEN) domain.c:509:d9 Attempt to change CR4 flags 00002620 -> 00000620
 > (XEN) domain.c:509:d9 Attempt to change CR4 flags 00002620 -> 00000620
 > (XEN) domain.c:509:d9 Attempt to change CR4 flags 00002620 -> 00000620
 > (XEN) mm.c:2012:d9 Bad type (saw 00000000e8000001 != exp
 > 0000000020000000) for mfn 190e00 (pfn 59a5b) 
 > (XEN) mm.c:745:d9 Attempt to create linear p.t. with write perms
 > 
 > If you need more info, just let me know... (this is a test vm, so can
 > be killed at will/on demand..)

 Yes, that's the same problem. Is it with i386 or amd64 ?

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Greg Oster <oster@cs.usask.ca>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org,
 riz@NetBSD.org
Subject: Re: Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update
 failed, ret: -22 on -current MP domU (amd64)
Date: Tue, 14 Feb 2012 15:08:42 -0600

 On Tue, 14 Feb 2012 22:02:43 +0100
 Manuel Bouyer <bouyer@antioche.eu.org> wrote:

 > On Tue, Feb 14, 2012 at 02:58:15PM -0600, Greg Oster wrote:
 > > 
 [snip]
 > > 
 > > relevant bits of 'xm dmesg' looks like:
 > > 
 > > (XEN) domain.c:509:d9 Attempt to change CR4 flags 00002620 -> 00000620
 > > (XEN) domain.c:509:d9 Attempt to change CR4 flags 00002620 -> 00000620
 > > (XEN) domain.c:509:d9 Attempt to change CR4 flags 00002620 -> 00000620
 > > (XEN) mm.c:2012:d9 Bad type (saw 00000000e8000001 != exp
 > > 0000000020000000) for mfn 190e00 (pfn 59a5b)
 > > (XEN) mm.c:745:d9 Attempt to create linear p.t. with write perms
 > > 
 > > If you need more info, just let me know... (this is a test vm, so
 > > can be killed at will/on demand..)
 > 
 > Yes, that's the same problem. Is it with i386 or amd64 ?

 amd64 (on a Q6600 Intel box)

 Later...

 Greg Oster

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, riz@NetBSD.org
Subject: Re: port-xen/45975 (panic: HYPERVISOR_mmu_update failed, ret: -22
 during heavy activity)
Date: Fri, 17 Feb 2012 13:12:46 +0100

 On Mon, Feb 13, 2012 at 05:37:38PM +0100, Manuel Bouyer wrote:
 > All errors are of the same type, and looks similar to what you've
 > reported first. It means we're trying to use a page as a page table
 > which is already mapped read/write. So it's not the same problem
 > as the bug I corrected (I should have checked more carefully), where
 > Xen was telling me that a domU is trying to map read/write a page
 > already used as page table (so it's the opposite !).
 > 
 > Today I ran into the issue you're describing now. So the patch I
 > proposed is correct and fixes an issue; but it's not this one ...

 I think I found where the problem comes from:
 in uvm, uvm_km_pgremove_intrsafe() returns pages to the free list.
 But because it uses pmap_extract() to find them, the mapping of these
 pages in pmap_kernel() is removed *after* calling
 uvm_km_pgremove_intrsafe().
 So there is a window where pages have been returned to the free list but
 are still mapped in pmap_kernel(); another CPU can allocate and map such
 a page during this time (remember a CPU can be preempted by the
 hypervisor, so the window can be quite long).
 I confirmed this by adding an (expensive) check in uvm_pagefree() looking
 for existing mappings of a page.
 Multiple mappings of the same physical page with different
 attributes are not a problem on real hardware, so this is a problem only for
 Xen.
 One trivial fix is to call pmap_kremove() from uvm_km_pgremove_intrsafe()
 just after pmap_extract() has been done (and remove it from callers).
 I'm testing this now.
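
 The ordering problem can be shown with a small single-threaded model.
 This is only a sketch: the toy_* names are invented, the two flags
 stand in for "mapped R/W in pmap_kernel()" and "on the free list", and
 the midpoint check marks the place where a preempted virtual CPU would
 leave the page exposed to another CPU.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_page {
	bool mapped_rw;		/* still mapped read/write in the kernel pmap */
	bool on_free_list;	/* already handed back to the allocator */
};

/* True when another CPU could allocate a page that is still mapped R/W. */
bool
toy_hazard(const struct toy_page *pg)
{
	return pg->on_free_list && pg->mapped_rw;
}

/* Old order: free first, unmap later. Returns whether a window existed. */
bool
toy_free_then_unmap(struct toy_page *pg)
{
	bool window;

	assert(pg->mapped_rw && !pg->on_free_list);
	pg->on_free_list = true;	/* like uvm_pagefree() */
	window = toy_hazard(pg);	/* a preempted vCPU can sit here */
	pg->mapped_rw = false;		/* like pmap_kremove() in the caller */
	return window;
}

/* Proposed order: unmap first, then free. Returns whether a window existed. */
bool
toy_unmap_then_free(struct toy_page *pg)
{
	bool window;

	assert(pg->mapped_rw && !pg->on_free_list);
	pg->mapped_rw = false;		/* like pmap_kremove() right after pmap_extract() */
	window = toy_hazard(pg);	/* nothing is exposed at this point */
	pg->on_free_list = true;	/* like uvm_pagefree() */
	return window;
}
```

 Both orders end in the same state; only the first one passes through
 a state where the page is free and still mapped.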

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, riz@NetBSD.org
Subject: Re: port-xen/45975 (panic: HYPERVISOR_mmu_update failed, ret: -22
 during heavy activity)
Date: Fri, 17 Feb 2012 16:30:26 +0100

 --3MwIy2ne0vdjdPXF
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 On Fri, Feb 17, 2012 at 01:12:46PM +0100, Manuel Bouyer wrote:
 > I think I found where the problem comes from:
 > in uvm, uvm_km_pgremove_intrsafe() will return a page to the free list.
 > But as it uses pmap_extract(), the mapping of the pages in pmap_kernel()
 > is removed *after* calling uvm_km_pgremove_intrsafe().
 > So there is a window where pages are returned to the free list, but
 > are still mapped in pmap_kernel(); another CPU can allocate and map this
 > page in this time (remember a CPU can be preempted by the hypervisor,
 > so the window can be quite long).
 > I confirmed this by adding an (expensive) check in uvm_pagefree() looking
 > for existing mappings of a page.
 > Multiple mappings of the same physical page with different
 > attributes are not a problem on real hardware, so this is a problem only for
 > Xen.
 > One trivial fix is to call pmap_kremove() from uvm_km_pgremove_intrsafe()
 > just after pmap_extract() has been done (and remove it from callers).
 > I'm testing this now

 See attached patch. With it I couldn't make my test domUs panic (an
 i386PAE with 4 virtual CPUs on a dual-core box, and an amd64 with 8
 virtual CPUs on an 8-core box); several build.sh -j<as_appropriate>
 release runs have completed.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

 --3MwIy2ne0vdjdPXF
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename=diff

 Index: uvm/uvm_km.c
 ===================================================================
 RCS file: /cvsroot/src/sys/uvm/uvm_km.c,v
 retrieving revision 1.119
 diff -u -p -u -r1.119 uvm_km.c
 --- uvm/uvm_km.c	4 Feb 2012 17:56:17 -0000	1.119
 +++ uvm/uvm_km.c	17 Feb 2012 15:26:30 -0000
 @@ -472,12 +472,18 @@ uvm_km_pgremove_intrsafe(struct vm_map *
  		if (!pmap_extract(pmap_kernel(), start, &pa)) {
  			continue;
  		}
 +#ifdef __PMAP_NEED_UNMAP_BEFORE_FREE
 +		pmap_kremove(start, PAGE_SIZE);
 +#endif
  		pg = PHYS_TO_VM_PAGE(pa);
  		KASSERT(pg);
  		KASSERT(pg->uobject == NULL && pg->uanon == NULL);
  		KASSERT((pg->flags & PG_BUSY) == 0);
  		uvm_pagefree(pg);
  	}
 +#ifndef __PMAP_NEED_UNMAP_BEFORE_FREE
 +	pmap_kremove(start, end - start);
 +#endif
  }

  #if defined(DEBUG)
 @@ -670,7 +676,6 @@ uvm_km_free(struct vm_map *map, vaddr_t 
  		 * remove it after.  See comment below about KVA visibility.
  		 */
  		uvm_km_pgremove_intrsafe(map, addr, addr + size);
 -		pmap_kremove(addr, size);
  	}

  	/*
 @@ -747,7 +752,6 @@ again:
  			} else {
  				uvm_km_pgremove_intrsafe(kernel_map, va,
  				    va + size);
 -				pmap_kremove(va, size);
  				vmem_free(kmem_va_arena, va, size);
  				return ENOMEM;
  			}
 @@ -783,7 +787,6 @@ uvm_km_kmem_free(vmem_t *vm, vmem_addr_t
  	}
  #endif /* PMAP_UNMAP_POOLPAGE */
  	uvm_km_pgremove_intrsafe(kernel_map, addr, addr + size);
 -	pmap_kremove(addr, size);
  	pmap_update(pmap_kernel());

  	vmem_free(vm, addr, size);
 Index: uvm/uvm_kmguard.c
 ===================================================================
 RCS file: /cvsroot/src/sys/uvm/uvm_kmguard.c,v
 retrieving revision 1.9
 diff -u -p -u -r1.9 uvm_kmguard.c
 --- uvm/uvm_kmguard.c	5 Feb 2012 11:08:06 -0000	1.9
 +++ uvm/uvm_kmguard.c	17 Feb 2012 15:26:30 -0000
 @@ -180,7 +180,6 @@ uvm_kmguard_free(struct uvm_kmguard *kg,
  	 */

  	uvm_km_pgremove_intrsafe(kernel_map, va, va + PAGE_SIZE * 2);
 -	pmap_kremove(va, PAGE_SIZE * 2);
  	pmap_update(pmap_kernel());

  	/*
 Index: uvm/uvm_map.c
 ===================================================================
 RCS file: /cvsroot/src/sys/uvm/uvm_map.c,v
 retrieving revision 1.312
 diff -u -p -u -r1.312 uvm_map.c
 --- uvm/uvm_map.c	28 Jan 2012 00:00:06 -0000	1.312
 +++ uvm/uvm_map.c	17 Feb 2012 15:26:30 -0000
 @@ -2246,7 +2246,6 @@ uvm_unmap_remove(struct vm_map *map, vad
  			if ((entry->flags & UVM_MAP_KMAPENT) == 0) {
  				uvm_km_pgremove_intrsafe(map, entry->start,
  				    entry->end);
 -				pmap_kremove(entry->start, len);
  			}
  		} else if (UVM_ET_ISOBJ(entry) &&
  			   UVM_OBJ_IS_KERN_OBJECT(entry->object.uvm_obj)) {
 Index: arch/x86/include/pmap.h
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/x86/include/pmap.h,v
 retrieving revision 1.49
 diff -u -p -u -r1.49 pmap.h
 --- arch/x86/include/pmap.h	4 Dec 2011 16:24:13 -0000	1.49
 +++ arch/x86/include/pmap.h	17 Feb 2012 15:26:30 -0000
 @@ -296,6 +298,14 @@ void		pmap_tlb_intr(void);
  #define PMAP_GROWKERNEL		/* turn on pmap_growkernel interface */
  #define PMAP_FORK		/* turn on pmap_fork interface */

 +#ifdef XEN
 +/*
 + * If a free vm_page is allocated for a PDP, it will be rejected
 + * by Xen if it still has some R/W mapping.
 + */
 +#define __PMAP_NEED_UNMAP_BEFORE_FREE
 +#endif
 +
  /*
   * Do idle page zero'ing uncached to avoid polluting the cache.
   */

 --3MwIy2ne0vdjdPXF--

From: Greg Oster <oster@cs.usask.ca>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org,
 riz@NetBSD.org
Subject: Re: port-xen/45975 (panic: HYPERVISOR_mmu_update failed, ret: -22
 during heavy activity)
Date: Fri, 17 Feb 2012 11:26:06 -0600

 On Fri, 17 Feb 2012 16:30:26 +0100
 Manuel Bouyer <bouyer@antioche.eu.org> wrote:

 > On Fri, Feb 17, 2012 at 01:12:46PM +0100, Manuel Bouyer wrote:
 > > I think I found where the problem comes from:
 > > in uvm, uvm_km_pgremove_intrsafe() will return a page to the free
 > > list. But as it uses pmap_extract(), the mapping of the pages in
 > > pmap_kernel() is removed *after* calling uvm_km_pgremove_intrsafe().
 > > So there is a window where pages are returned to the free list, but
 > > are still mapped in pmap_kernel(); another CPU can allocate and map
 > > this page in this time (remember a CPU can be preempted by the
 > > hypervisor, so the window can be quite long).
 > > I confirmed this by adding an (expensive) check in uvm_pagefree()
 > > looking for existing mappings of a page.
 > > Multiple mappings of the same physical page with different
 > > attributes are not a problem on real hardware, so this is a problem
 > > only for Xen.
 > > One trivial fix is to call pmap_kremove() from
 > > uvm_km_pgremove_intrsafe() just after pmap_extract() has been done
 > > (and remove it from callers). I'm testing this now
 > 
 > See attached patch. I couldn't make my test domUs (a i386PAE with 4
 > virtual CPU on a dual-core box, and a amd64 with 8 virtual CPUs on
 > a 8-core box), several build.sh -j<as_appropriate> release have
 > completed.

 With this patch I was able to do a successful 'build.sh -j8' build on an
 amd64 DOMU that was previously failing.

 Later...

 Greg Oster

From: "Manuel Bouyer" <bouyer@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/45975 CVS commit: src/sys/arch
Date: Fri, 17 Feb 2012 18:40:21 +0000

 Module Name:	src
 Committed By:	bouyer
 Date:		Fri Feb 17 18:40:20 UTC 2012

 Modified Files:
 	src/sys/arch/x86/include: cpu.h pmap.h
 	src/sys/arch/x86/x86: cpu.c pmap.c
 	src/sys/arch/xen/include: hypervisor.h intr.h
 	src/sys/arch/xen/x86: cpu.c x86_xpmap.c xen_ipi.c xen_pmap.c

 Log Message:
 Apply patch proposed in PR port-xen/45975 (this does not solve the exact
 problem reported here but is part of the solution):
 xen_kpm_sync() is not working as expected,
 leading to races between CPUs.
 1 the check (xpq_cpu != &x86_curcpu) is always false because we
   have different x86_curcpu symbols with different addresses in the kernel.
   Fortunately, all addresses disassemble to the same code.
   Because of this we always use the code intended for bootstrap, which doesn't
   use cross-calls or lock.

 2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
   which causes it to sleep and pmap.c doesn't like that. It triggers this
   KASSERT() in pmap_unmap_ptes():
   KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
 3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
   needs to know on which CPU a pmap is loaded *now*:
   pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
   to a new pmap, leaving a window where a pmap is still in a CPU's
   ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
   by the hypervisor at any time, it can be large enough to let another
   CPU free the PTP and reuse it as a normal page.

 To fix 2), avoid cross-calls and IPIs completely, and instead
 use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
 It's safe because we just need to update the table page, a tlbflush IPI will
 happen later. As a side effect, we don't need a different code for bootstrap,
 fixing 1). The mutex added to struct cpu needs a small headers reorganisation.

 To fix 3), introduce a pm_xen_ptp_cpus which is updated from
 cpu_load_pmap(), with the ci_kpm_mtx mutex held. Checking it with
 ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.

 While there I removed the unused pmap_is_active() function;
 and added some more details to DIAGNOSTIC panics.


 To generate a diff of this commit:
 cvs rdiff -u -r1.47 -r1.48 src/sys/arch/x86/include/cpu.h
 cvs rdiff -u -r1.49 -r1.50 src/sys/arch/x86/include/pmap.h
 cvs rdiff -u -r1.96 -r1.97 src/sys/arch/x86/x86/cpu.c
 cvs rdiff -u -r1.164 -r1.165 src/sys/arch/x86/x86/pmap.c
 cvs rdiff -u -r1.36 -r1.37 src/sys/arch/xen/include/hypervisor.h
 cvs rdiff -u -r1.33 -r1.34 src/sys/arch/xen/include/intr.h
 cvs rdiff -u -r1.80 -r1.81 src/sys/arch/xen/x86/cpu.c
 cvs rdiff -u -r1.38 -r1.39 src/sys/arch/xen/x86/x86_xpmap.c
 cvs rdiff -u -r1.9 -r1.10 src/sys/arch/xen/x86/xen_ipi.c
 cvs rdiff -u -r1.16 -r1.17 src/sys/arch/xen/x86/xen_pmap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Jeff Rizzo <riz@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: Greg Oster <oster@cs.usask.ca>, bouyer@NetBSD.org
Subject: Re: port-xen/45975 (panic: HYPERVISOR_mmu_update failed, ret: -22
 during heavy activity)
Date: Sat, 18 Feb 2012 09:33:36 -0800

 On 2/17/12 9:30 AM, Greg Oster wrote:
 >   With this patch I was able to do a successful 'build.sh -j8' build on an
 >   amd64 DOMU that was previously failing.
 >
 >

 With the second patch (and the first), things are definitely _more_ 
 stable, though I still get crashes (with, unfortunately, no backtrace 
 info, because the console is garbled) doing a -j 16/-j 24 build on an 
 8-vcpu amazon EC2 instance.  On another dom0 where I was seeing this 
 particular problem, it seems to be gone.


From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Jeff Rizzo <riz@netbsd.org>
Cc: gnats-bugs@netbsd.org, Greg Oster <oster@cs.usask.ca>
Subject: Re: port-xen/45975 (panic: HYPERVISOR_mmu_update failed, ret: -22
 during heavy activity)
Date: Sat, 18 Feb 2012 19:27:56 +0100

 On Sat, Feb 18, 2012 at 09:33:36AM -0800, Jeff Rizzo wrote:
 > On 2/17/12 9:30 AM, Greg Oster wrote:
 > >  With this patch I was able to do a successful 'build.sh -j8' build on an
 > >  amd64 DOMU that was previously failing.
 > >
 > >
 > 
 > With the second patch (and the first), things are definitely _more_
 > stable, though I still get crashes (with, unfortunately, no
 > backtrace info, because the console is garbled) doing a -j 16/-j 24
 > build on an 8-vcpu amazon EC2 instance.  On another dom0 where I was
 > seeing this particular problem, it seems to be gone.

 Do you know the hypervisor version on amazon EC2 ?
 We have fixed a few Xen bugs related to PDP recursive mappings;
 AFAIK they've been integrated upstream, but maybe Amazon doesn't have them
 all?

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Jeff Rizzo <riz@netbsd.org>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@netbsd.org, Greg Oster <oster@cs.usask.ca>
Subject: Re: port-xen/45975 (panic: HYPERVISOR_mmu_update failed, ret: -22
 during heavy activity)
Date: Sun, 19 Feb 2012 09:18:31 -0800

 On 2/18/12 10:27 AM, Manuel Bouyer wrote:
 > On Sat, Feb 18, 2012 at 09:33:36AM -0800, Jeff Rizzo wrote:
 >> On 2/17/12 9:30 AM, Greg Oster wrote:
 >>>   With this patch I was able to do a successful 'build.sh -j8' build on an
 >>>   amd64 DOMU that was previously failing.
 >>>
 >>>
 >> With the second patch (and the first), things are definitely _more_
 >> stable, though I still get crashes (with, unfortunately, no
 >> backtrace info, because the console is garbled) doing a -j 16/-j 24
 >> build on an 8-vcpu amazon EC2 instance.  On another dom0 where I was
 >> seeing this particular problem, it seems to be gone.
 > Do you know the hypervisor version on amazon EC2 ?
 > We have fixed a few Xen bugs related to PDP recursive mappings;
 > AFAIK they've been integrated upstream, but maybe Amazon doesn't have them
 > all?
 >

 I have seen everything from 3.0 to 3.4;  most usually 3.4.

 +j

From: "Manuel Bouyer" <bouyer@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/45975 CVS commit: src/sys/uvm
Date: Mon, 20 Feb 2012 19:14:24 +0000

 Module Name:	src
 Committed By:	bouyer
 Date:		Mon Feb 20 19:14:24 UTC 2012

 Modified Files:
 	src/sys/uvm: uvm_km.c uvm_kmguard.c uvm_map.c

 Log Message:
 When using uvm_km_pgremove_intrsafe() make sure mappings are removed
 before returning the pages to the free pool. Otherwise, under Xen,
 a page which still has a writable mapping could be allocated for
 a PDP by another CPU and the hypervisor would refuse it (this is
 PR port-xen/45975).
 For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
 and do pmap_kremove()/uvm_pagefree() in batches of (at most) 16 entries
 (as suggested by Chuck Silvers on tech-kern@, see also
 http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
 followups).


 To generate a diff of this commit:
 cvs rdiff -u -r1.121 -r1.122 src/sys/uvm/uvm_km.c
 cvs rdiff -u -r1.9 -r1.10 src/sys/uvm/uvm_kmguard.c
 cvs rdiff -u -r1.314 -r1.315 src/sys/uvm/uvm_map.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Jeff Rizzo" <riz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/45975 CVS commit: [netbsd-6] src/sys
Date: Wed, 22 Feb 2012 18:56:50 +0000

 Module Name:	src
 Committed By:	riz
 Date:		Wed Feb 22 18:56:49 UTC 2012

 Modified Files:
 	src/sys/arch/x86/include [netbsd-6]: cpu.h pmap.h
 	src/sys/arch/x86/x86 [netbsd-6]: cpu.c pmap.c
 	src/sys/arch/xen/include [netbsd-6]: hypervisor.h intr.h
 	src/sys/arch/xen/x86 [netbsd-6]: cpu.c x86_xpmap.c xen_ipi.c xen_pmap.c
 	src/sys/uvm [netbsd-6]: uvm_km.c uvm_kmguard.c uvm_map.c

 Log Message:
 Pull up following revision(s) (requested by bouyer in ticket #29):
 	sys/arch/xen/x86/x86_xpmap.c: revision 1.39
 	sys/arch/xen/include/hypervisor.h: revision 1.37
 	sys/arch/xen/include/intr.h: revision 1.34
 	sys/arch/xen/x86/xen_ipi.c: revision 1.10
 	sys/arch/x86/x86/cpu.c: revision 1.97
 	sys/arch/x86/include/cpu.h: revision 1.48
 	sys/uvm/uvm_map.c: revision 1.315
 	sys/arch/x86/x86/pmap.c: revision 1.165
 	sys/arch/xen/x86/cpu.c: revision 1.81
 	sys/arch/x86/x86/pmap.c: revision 1.167
 	sys/arch/xen/x86/cpu.c: revision 1.82
 	sys/arch/x86/x86/pmap.c: revision 1.168
 	sys/arch/xen/x86/xen_pmap.c: revision 1.17
 	sys/uvm/uvm_km.c: revision 1.122
 	sys/uvm/uvm_kmguard.c: revision 1.10
 	sys/arch/x86/include/pmap.h: revision 1.50
 Apply patch proposed in PR port-xen/45975 (this does not solve the exact
 problem reported here but is part of the solution):
 xen_kpm_sync() is not working as expected,
 leading to races between CPUs.
 1 the check (xpq_cpu != &x86_curcpu) is always false because we
   have different x86_curcpu symbols with different addresses in the kernel.
   Fortunately, all addresses disassemble to the same code.
   Because of this we always use the code intended for bootstrap, which doesn't
   use cross-calls or lock.
 2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
   which causes it to sleep and pmap.c doesn't like that. It triggers this
   KASSERT() in pmap_unmap_ptes():
   KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
 3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
   needs to know on which CPU a pmap is loaded *now*:
   pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
   to a new pmap, leaving a window where a pmap is still in a CPU's
   ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
   by the hypervisor at any time, it can be large enough to let another
   CPU free the PTP and reuse it as a normal page.
 To fix 2), avoid cross-calls and IPIs completely, and instead
 use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
 It's safe because we just need to update the table page, a tlbflush IPI will
 happen later. As a side effect, we don't need a different code for bootstrap,
 fixing 1). The mutex added to struct cpu needs a small headers reorganisation.
 To fix 3), introduce a pm_xen_ptp_cpus which is updated from
 cpu_load_pmap(), with the ci_kpm_mtx mutex held. Checking it with
 ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
 While there I removed the unused pmap_is_active() function;
 and added some more details to DIAGNOSTIC panics.
 When using uvm_km_pgremove_intrsafe() make sure mappings are removed
 before returning the pages to the free pool. Otherwise, under Xen,
 a page which still has a writable mapping could be allocated for
 a PDP by another CPU and the hypervisor would refuse it (this is
 PR port-xen/45975).
 For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
 and do pmap_kremove()/uvm_pagefree() in batches of (at most) 16 entries
 (as suggested by Chuck Silvers on tech-kern@, see also
 http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
 followups).
 Avoid early use of xen_kpm_sync(); locks are not available at this time.
 Don't call cpu_init() twice.
 Makes LOCKDEBUG kernels boot again
 Revert pmap_pte_flush() -> xpq_flush_queue() in previous.


 To generate a diff of this commit:
 cvs rdiff -u -r1.47 -r1.47.2.1 src/sys/arch/x86/include/cpu.h
 cvs rdiff -u -r1.49 -r1.49.2.1 src/sys/arch/x86/include/pmap.h
 cvs rdiff -u -r1.96 -r1.96.8.1 src/sys/arch/x86/x86/cpu.c
 cvs rdiff -u -r1.164 -r1.164.2.1 src/sys/arch/x86/x86/pmap.c
 cvs rdiff -u -r1.36.2.1 -r1.36.2.2 src/sys/arch/xen/include/hypervisor.h
 cvs rdiff -u -r1.33 -r1.33.8.1 src/sys/arch/xen/include/intr.h
 cvs rdiff -u -r1.80 -r1.80.2.1 src/sys/arch/xen/x86/cpu.c
 cvs rdiff -u -r1.38 -r1.38.2.1 src/sys/arch/xen/x86/x86_xpmap.c
 cvs rdiff -u -r1.9 -r1.9.2.1 src/sys/arch/xen/x86/xen_ipi.c
 cvs rdiff -u -r1.16 -r1.16.2.1 src/sys/arch/xen/x86/xen_pmap.c
 cvs rdiff -u -r1.120 -r1.120.2.1 src/sys/uvm/uvm_km.c
 cvs rdiff -u -r1.9 -r1.9.2.1 src/sys/uvm/uvm_kmguard.c
 cvs rdiff -u -r1.313 -r1.313.2.1 src/sys/uvm/uvm_map.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, riz@NetBSD.org
Subject: Re: port-xen/45975 (panic: HYPERVISOR_mmu_update failed, ret: -22
 during heavy activity)
Date: Sat, 25 Feb 2012 13:25:48 +0100

 Hi,
 this PR could be closed now, couldn't it?

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

State-Changed-From-To: feedback->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Fri, 02 Mar 2012 07:51:01 +0000
State-Changed-Why:
previous feedback was received a couple weeks back


State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Fri, 02 Mar 2012 07:51:21 +0000
State-Changed-Why:
is this fully fixed now?


From: riz@NetBSD.org
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/45975
Date: Mon, 26 Mar 2012 13:51:41 -0700

 I just got this same panic on a fresh 6.0_BETA *i386* EC2 instance (actually
 two separate instances);  I'm not sure if this is the same, but it looks
 similar to my eyes.

 Both hypervisors had the same version:  
 hypervisor0 at mainbus0: Xen version 3.1.2-128.1.10.el5


 Here's the full console log; I've heard that I may be able to get better
 console access on EC2 now, but I haven't figured out how yet.


 i-2a30074e
 2012-03-26T20:14:44+0000
 Xen Minimal OS!
   start_info: 0xa01000(VA)
     nr_pages: 0x26700
   shared_inf: 0xdee40000(MA)
      pt_base: 0xa04000(VA)
 nr_pt_frames: 0x9
     mfn_list: 0x967000(VA)
    mod_start: 0x0(VA)
      mod_len: 0
        flags: 0x0
     cmd_line:  root=/dev/sda1 ro 4
   stack:      0x946780-0x966780
 MM: Init
       _text: 0x0(VA)
      _etext: 0x61e65(VA)
    _erodata: 0x76000(VA)
      _edata: 0x7b6d4(VA)
 stack start: 0x946780(VA)
        _end: 0x966d34(VA)
   start_pfn: a10
     max_pfn: 26700
 Mapping memory range 0xc00000 - 0x26700000
 setting 0x0-0x76000 readonly
 skipped 0x1000
 MM: Initialise page allocator for b3e000(b3e000)-0(26700000)
 MM: done
 Demand map pfns at 26701000-36701000.
 Heap resides at 36702000-76702000.
 Initialising timer interface
 Initialising console ... done.
 gnttab_table mapped at 0x26701000.
 Initialising scheduler
 Thread "Idle": pointer: 0x36702008, stack: 0xbf0000
 Initialising xenbus
 Thread "xenstore": pointer: 0x36702478, stack: 0x26600000
 Dummy main: start_info=0x966880
 Thread "main": pointer: 0x367028e8, stack: 0x26610000
 "main" "root=/dev/sda1" "ro" "4" 
 vbd 2049 is hd0
 ******************* BLKFRONT for device/vbd/2049 **********


 backend at /local/domain/0/backend/vbd/967/2049
 Failed to read /local/domain/0/backend/vbd/967/2049/feature-barrier.
 Failed to read /local/domain/0/backend/vbd/967/2049/feature-flush-cache.
 2097152 sectors of 0 bytes
 **************************
 vbd 2050 is hd1
 ******************* BLKFRONT for device/vbd/2050 **********


 backend at /local/domain/0/backend/vbd/967/2050
 Failed to read /local/domain/0/backend/vbd/967/2050/feature-barrier.
 Failed to read /local/domain/0/backend/vbd/967/2050/feature-flush-cache.
 10485760 sectors of 0 bytes
 **************************
   Booting 'NetBSD AMI'

 root (hd0)
  Filesystem type is ext2fs, using whole disk
 kernel /boot/netbsd root=xbd1

 xc_dom_probe_bzimage_kernel: kernel is not a bzImage
 close blk: backend at /local/domain/0/backend/vbd/967/2049
 close blk: backend at /local/domain/0/backend/vbd/967/2050
 Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
     2006, 2007, 2008, 2009, 2010, 2011, 2012
     The NetBSD Foundation, Inc.  All rights reserved.
 Copyright (c) 1982, 1986, 1989, 1991, 1993
     The Regents of the University of California.  All rights reserved.

 NetBSD 6.0_BETA (EC2) #3: Sun Mar 25 21:02:35 PDT 2012
 	riz@breadfruit.tastylime.net:/home/riz/obj.i386/sys/arch/i386/compile/EC2
 total memory = 615 MB
 avail memory = 596 MB
 mainbus0 (root)
 hypervisor0 at mainbus0: Xen version 3.1.2-128.1.10.el5
 vcpu0 at hypervisor0: Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz, id 0x1067a
 xenbus0 at hypervisor0: Xen Virtual Bus Interface
 xencons0 at hypervisor0: Xen Virtual Console Driver
 npx0 at hypervisor0: using exception 16
 xennet0 at xenbus0 id 0: Xen Virtual Network Interface
 xennet0: MAC address 12:31:39:14:5d:6f
 xbd0 at xenbus0 id 2049: Xen Virtual Block Device Interface
 xbd1 at xenbus0 id 2050: Xen Virtual Block Device Interface
 balloon0 at xenbus0 id 0: Xen Balloon driver
 balloon0: current reservation: 629760 KiB
 xennet0: using RX copy mode
 balloon0: current reservation: 157440 pages => target: 157440 pages
 boot device: xbd1
 root on xbd1a dumps on xbd1b
 root file system type: ffs
 xpq_flush_queue: 3 entries (0 successful) on cpu0 (0)
 cpu0 (0):
   0x00000003f4bea000: 0x00000003fa2a9001
   0x00000003f4bea008: 0x000000026ab68001
   0x00000003f4bea010: 0x000000026ab67001
 panic: HYPERVISOR_mmu_update failed, ret: -22

 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 eip c0136674 cs 9 eflags 282 cr2 36fc0000 ilevel 8
 Stopped in pid 1.1 (init) at    netbsd:breakpoint+0x4:  popl    %ebp
 breakpoint(c0491d6a,c04f45a0,c04c7424,d8d25a78,c04f46a4,8,d8d25a5c,c04ff970,3,3) at netbsd:breakpoint+0x4
 vpanic(c04c7424,d8d25a78,d8d25a9c,c13c9800,8,1,d8d25acc,c04312e8,c04c7424,ffffffea) at netbsd:vpanic+0x218
 panic(c04c7424,ffffffea,3,6ab67001,2,2,d8d25adc,c03f861c,c133be18,c150de8c) at netbsd:panic+0x18
 xpq_flush_queue(c04cfbea,0,d8d25b8c,c03f92d4,c150de5c,0,0,bf7ff000,6,f4bea018) at netbsd:xpq_flush_queue+0x188
 xpq_queue_tlb_flush(6,c04872f4,0,c071f000,f4bea010,c071f000,d8d25b7c,c013565d,f4bea010,3) at netbsd:xpq_queue_tlb_flush+0x19
 tlbflush(f4bea010,3,6ab67001,2,0,bfdfc000,0,3,0,0) at netbsd:tlbflush+0x1a
 cpu_load_pmap(c150ef08,c04f2b00,d8d25c0c,c03fa726,c150de58,d8d25bd8,0,0,ffffffff,ffffffff) at netbsd:cpu_load_pmap+0xfd
 pmap_load(c049bd64,d8d25d2c,c0100848,c01d3d6e,c049bd64,bf7ffff5,b,0,ffffffff,ffffffff) at netbsd:pmap_load+0x1b0
 do_pmap_load(c01d3d6e,c049bd64,bf7ffff5,b,0,ffffffff,ffffffff,0,b0717,c14f9800) at netbsd:do_pmap_load+0x16
 copyout(c14f9800,7ca000,c0633200,0,c010006d,0,0,0,0,0) at netbsd:copyout+0x48
 ds          d8d20011
 es          c04c0011    copyright+0x345b1
 fs          31
 gs          d8d20011
 edi         d8d25a78
 esi         c04c7424    copyright+0x3b9c4
 ebp         d8d25a0c
 ebx         104
 edx         ffffffff
 ecx         0
 eax         1
 eip         c0136674    breakpoint+0x4
 cs          9
 eflags      282
 esp         d8d25a0c
 ss          11
 netbsd:breakpoint+0x4:  popl    %ebp
 db{0}> 

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, riz@NetBSD.org
Subject: Re: port-xen/45975
Date: Sun, 1 Apr 2012 16:41:51 +0200

 On Mon, Mar 26, 2012 at 08:55:02PM +0000, riz@NetBSD.org wrote:
 >  Both hypervisors had the same version:  
 >  hypervisor0 at mainbus0: Xen version 3.1.2-128.1.10.el5

 This would need to be confirmed, but I fear the problem is in the
 hypervisor: a lot of bugs related to recursive mappings have been
 fixed in Xen, and I'm almost sure 3.1.2 does not have those fixes. The last
 3.1 version is 3.1.4; I don't know if it's possible to get Amazon to
 run at least this version ...

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: "Aaron J. Grier" <agrier@poofygoof.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-xen/45975
Date: Wed, 4 Jul 2012 16:12:23 -0700

 was this ever pulled up to NetBSD-6?  I'm seeing this with an amd64 domU
 on my xen 4.1.2 system (with NetBSD-6 as dom0) during a build.sh -j2.
 for comparison, an i386 domU runs to completion.

 * dom0

 NetBSD pythagoras.poofy.goof.com 6.0_BETA2 NetBSD 6.0_BETA2 (XEN3_DOM0) #2: Mon Jun 11 16:27:07 PDT 2012 agrier@pythagoras.poofy.goof.com:/disk/teraraid/obj/amd64/disk/teraraid/NetBSD/6/src/sys/arch/amd64/compile/XEN3_DOM0 amd64

 * domU

 NetBSD milo.pythagoras.poofy.goof.com 6.0_BETA2 NetBSD 6.0_BETA2 (XEN3_DOMU) #2: Mon Jun 11 16:30:47 PDT 2012 agrier@pythagoras.poofy.goof.com:/disk/teraraid/obj/amd64/disk/teraraid/NetBSD/6/src/sys/arch/amd64/compile/XEN3_DOMU amd64

 * the domU panic

 xpq_flush_queue: 1 entries (0 successful) on cpu0 (0)
 cpu0 (0):
   0x0000000000000000: 0x0000000000000000
  kpm_pdir[254]: 0x189c5f027
 cpu1 (1):
  kpm_pdir[254]: 0x199dba027
 panic: HYPERVISOR_mmu_update failed, ret: -22

 cpu0: Begin traceback...
 printf_nolog() at netbsd:printf_nolog
 xpq_queue_machphys_update() at netbsd:xpq_queue_machphys_update
 pmap_free_ptp() at netbsd:pmap_free_ptp+0xef
 pmap_remove() at netbsd:pmap_remove+0x25b
 uvm_unmap_remove() at netbsd:uvm_unmap_remove+0x256
 uvm_unmap1() at netbsd:uvm_unmap1+0x35
 uvmspace_exec() at netbsd:uvmspace_exec+0xb5
 execve_runproc() at netbsd:execve_runproc+0x51c
 execve1() at netbsd:execve1+0x33
 syscall() at netbsd:syscall+0xc4
 cpu0: End traceback...

 dump to dev 142,1 not possible
 rebooting...

 dom0 xl dmesg:

 (XEN) domain.c:652:d30 Attempt to change CR4 flags 00000660 -> 00000620
 (XEN) mm.c:658:d30 Could not get page ref for pfn 0
 (XEN) mm.c:3459:d30 Could not get page for normal update

 -- 
   Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, riz@NetBSD.org
Subject: Re: port-xen/45975
Date: Thu, 5 Jul 2012 10:31:51 +0200

 On Thu, Jul 05, 2012 at 01:20:04AM +0000, Aaron J. Grier wrote:
 > The following reply was made to PR port-xen/45975; it has been noted by GNATS.
 > 
 > From: "Aaron J. Grier" <agrier@poofygoof.com>
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: port-xen/45975
 > Date: Wed, 4 Jul 2012 16:12:23 -0700
 > 
 >  was this ever pulled up to NetBSD-6?

 It was pulled up on 2012/02/22 (ticket #29)

 > I'm seeing this with an amd64 domU
 >  on my xen 4.1.2 system (with NetBSD-6 as dom0) during a build.sh -j2.
 >  for comparison, an i386 domU runs to completion.
 >  
 >  * dom0
 >  
 >  NetBSD pythagoras.poofy.goof.com 6.0_BETA2 NetBSD 6.0_BETA2 (XEN3_DOM0) #2: Mon Jun 11 16:27:07 PDT 2012 agrier@pythagoras.poofy.goof.com:/disk/teraraid/obj/amd64/disk/teraraid/NetBSD/6/src/sys/arch/amd64/compile/XEN3_DOM0 amd64
 >  
 >  * domU
 >  
 >  NetBSD milo.pythagoras.poofy.goof.com 6.0_BETA2 NetBSD 6.0_BETA2 (XEN3_DOMU) #2: Mon Jun 11 16:30:47 PDT 2012 agrier@pythagoras.poofy.goof.com:/disk/teraraid/obj/amd64/disk/teraraid/NetBSD/6/src/sys/arch/amd64/compile/XEN3_DOMU amd64
 >  
 >  * the domU panic
 >  
 >  xpq_flush_queue: 1 entries (0 successful) on cpu0 (0)
 >  cpu0 (0):
 >    0x0000000000000000: 0x0000000000000000

 This looks bogus. xpmap_ptetomach() returned 0 as PTE's machine address.


 >   kpm_pdir[254]: 0x189c5f027
 >  cpu1 (1):
 >   kpm_pdir[254]: 0x199dba027
 >  panic: HYPERVISOR_mmu_update failed, ret: -22
 >  
 >  cpu0: Begin traceback...
 >  printf_nolog() at netbsd:printf_nolog
 >  xpq_queue_machphys_update() at netbsd:xpq_queue_machphys_update
 >  pmap_free_ptp() at netbsd:pmap_free_ptp+0xef
 >  pmap_remove() at netbsd:pmap_remove+0x25b
 >  uvm_unmap_remove() at netbsd:uvm_unmap_remove+0x256
 >  uvm_unmap1() at netbsd:uvm_unmap1+0x35
 >  uvmspace_exec() at netbsd:uvmspace_exec+0xb5
 >  execve_runproc() at netbsd:execve_runproc+0x51c
 >  execve1() at netbsd:execve1+0x33
 >  syscall() at netbsd:syscall+0xc4
 >  cpu0: End traceback...

 So either something did race and already freed this pdes[], or the p2m table
 is corrupted. It looks like yet another problem.

 Could you see if a HEAD kernel has the same issue ?

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

State-Changed-From-To: feedback->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 20 May 2017 19:24:11 +0000
State-Changed-Why:
*crickets*


State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 20 May 2017 19:24:27 +0000
State-Changed-Why:
what's the current state of this PR?


From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: netbsd-bugs@NetBSD.org, gnats-admin@NetBSD.org, dholland@NetBSD.org,
        riz@NetBSD.org
Subject: Re: port-xen/45975 (panic: HYPERVISOR_mmu_update failed, ret: -22
 during heavy activity)
Date: Sat, 20 May 2017 21:44:00 +0200

 On Sat, May 20, 2017 at 07:24:28PM +0000, dholland@NetBSD.org wrote:
 > Synopsis: panic: HYPERVISOR_mmu_update failed, ret: -22 during heavy activity
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: dholland@NetBSD.org
 > State-Changed-When: Sat, 20 May 2017 19:24:27 +0000
 > State-Changed-Why:
 > what's the current state of this PR?

 AFAIK it's fixed, as long as the domU is running on a hypervisor
 which doesn't have bugs related to recursive mappings. Anything newer
 than 3.x should be safe.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 20 May 2017 20:54:51 +0000
State-Changed-Why:
xen bugs aren't our problem, so it's fixed.


>Unformatted:

$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.