NetBSD Problem Report #48739

From www@NetBSD.org  Sat Apr 12 13:02:06 2014
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 8FF38A5813
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 12 Apr 2014 13:02:06 +0000 (UTC)
Message-Id: <20140412130204.1C347A5825@mollari.NetBSD.org>
Date: Sat, 12 Apr 2014 13:02:04 +0000 (UTC)
From: erplefoo@gmail.com
Reply-To: erplefoo@gmail.com
To: gnats-bugs@NetBSD.org
Subject: Reproducible panic in ld_virtio.c on NetBSD/amd64 guest running under qemu on CentOS 6.5
X-Send-Pr-Version: www-1.0

>Number:         48739
>Notify-List:    khorben@defora.org
>Category:       kern
>Synopsis:       Reproducible panic in ld_virtio.c on NetBSD/amd64 guest running under qemu on CentOS 6.5
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Apr 12 13:05:00 +0000 2014
>Closed-Date:    Tue Nov 17 23:32:25 +0000 2015
>Last-Modified:  Tue Nov 17 23:32:25 +0000 2015
>Originator:     Sean Davis
>Release:        netbsd-6 as of 20140410
>Organization:
tinkering guild
>Environment:
NetBSD ansible.endersgame.net 6.1_STABLE NetBSD 6.1_STABLE (ANSIBLE-$Revision: 1.44 $) #4: Mon Apr  7 10:07:54 UTC 2014  root@ansible.endersgame.net:/mnt/ld2a/build/obj/sys/arch/amd64/compile/ANSIBLE amd64

>Description:
Every time I run something IO-heavy (installing world from sets, bonnie++,
extracting a very large tarball, etc) the system will panic. The panics point
to virtio; I have included a collection at the end of this PR.

These panics also take place when the system is set for IDE storage rather than
VirtIO.

In an effort to pin it down, I wrote a small program to malloc and memset 4GB.
This reproduces the issue as long as swap is configured - without swap, I see
UVM kill the process as expected and the system remains operational. Relevant
output is below, followed by the kernel configuration currently in use. I have
tried it with GENERIC with the same results. I've not been able to get a crash
dump, but would be happy to provide any requested information from DDB as long
as somebody can point me to the right commands.

Note: the dmesg output shows 4094 MB RAM because I configured 4095 MB in the
hypervisor to see if that made a difference. When the system is configured
for only 1024 MB RAM, the panic does not occur. At first I attributed this to
the "Other OS" template specifying a 32-bit bus, but the same thing happens
when the VM is switched to a Red Hat template, which specifies a 64-bit bus.

The qemu version is quite old:
[dive@vmhost1 ~]$ rpm -qa|grep qemu
gpxe-roms-qemu-0.9.7-6.10.el6.noarch
qemu-img-0.12.1.2-2.415.el6_5.6.x86_64
qemu-kvm-0.12.1.2-2.415.el6_5.6.x86_64


1) when no swap is configured:
UVM: pid 958 (nbpanic), uid 0 killed: out of swap

2) when swap is configured:
[ssh session] - note: without being root, I got the UVM kill; hence the sudo.
dive@ansible ~ $ sudo ./nbpanic
trying 4294967296 bytes
malloc(4294967296)
malloc(4294967296) gave us 0x7f7ef7700000
memset(0x7f7ef7700000,1,4294967296)
Connection to ansible closed.

[virtual machine console]
uvm_fault(0xffffffff804887e0, 0xffff800047c23000, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff802cc7d4 cs 8 rflags 10206 cr2  ffff800047c23ff8 cpl 8 rsp fffffe80043f27b8
panic: trap
cpu0: Begin traceback...
printf_nolog() at netbsd:printf_nolog
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
ld_virtio_start() at netbsd:ld_virtio_start+0x177
ldstart() at netbsd:ldstart+0x6f
ldstrategy() at netbsd:ldstrategy+0x104
bdev_strategy() at netbsd:bdev_strategy+0x47
spec_strategy() at netbsd:spec_strategy+0x2e
VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x33
swstrategy() at netbsd:swstrategy+0xc2
bdev_strategy() at netbsd:bdev_strategy+0x47
spec_strategy() at netbsd:spec_strategy+0x2e
VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x33
uvm_swap_io() at netbsd:uvm_swap_io+0x11f
swapcluster_flush() at netbsd:swapcluster_flush+0x49
uvm_pageout() at netbsd:uvm_pageout+0x31b
cpu0: End traceback...

dump to dev 19,17 not possible
rebooting...
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010, 2011, 2012
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 6.1_STABLE (ANSIBLE-$Revision: 1.44 $) #4: Mon Apr  7 10:07:54 UTC 2014
        root@ansible.endersgame.net:/mnt/ld2a/build/obj/sys/arch/amd64/compile/ANSIBLE
total memory = 4094 MB
avail memory = 3970 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
oVirt oVirt Node (6-5.el6.centos.11.2)
mainbus0 (root)
mainbus0: Intel MP Specification (Version 1.4) (BOCHSCPU 0.1         )
cpu0 at mainbus0 apid 0: Westmere E56xx/L56xx/X56xx (Nehalem-C), id 0x206c1
cpu1 at mainbus0 apid 1: Westmere E56xx/L56xx/X56xx (Nehalem-C), id 0x206c1
cpu2 at mainbus0 apid 2: Westmere E56xx/L56xx/X56xx (Nehalem-C), id 0x206c1
cpu3 at mainbus0 apid 3: Westmere E56xx/L56xx/X56xx (Nehalem-C), id 0x206c1
mpbios: bus 0 is type PCI   
mpbios: bus 1 is type ISA   
ioapic0 at mainbus0 apid 0: pa 0xfec00000, version 11, 24 pins
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0: vendor 0x8086 product 0x1237 (rev. 0x02)
pcib0 at pci0 dev 1 function 0: vendor 0x8086 product 0x7000 (rev. 0x00)
piixide0 at pci0 dev 1 function 1: Intel 82371SB IDE Interface (PIIX3) (rev. 0x00)
piixide0: bus-master DMA support present
piixide0: primary channel wired to compatibility mode
piixide0: primary channel interrupting at ioapic0 pin 14
atabus0 at piixide0 channel 0
piixide0: secondary channel wired to compatibility mode
piixide0: secondary channel interrupting at ioapic0 pin 15
atabus1 at piixide0 channel 1
vendor 0x8086 product 0x7020 (USB serial bus, revision 0x01) at pci0 dev 1 function 2 not configured
piixpm0 at pci0 dev 1 function 3: vendor 0x8086 product 0x7113 (rev. 0x03)
timecounter: Timecounter "piixpm0" frequency 3579545 Hz quality 1000
piixpm0: 24-bit timer
piixpm0: interrupting at ioapic0 pin 9
iic0 at piixpm0: I2C bus
vga0 at pci0 dev 2 function 0: vendor 0x1b36 product 0x0100 (rev. 0x04)
wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
drm at vga0 not configured
virtio0 at pci0 dev 3 function 0
virtio0: Virtio Network Device (rev. 0x00)
vioif0 at virtio0: Ethernet address 00:1a:4a:10:d4:25
virtio0: allocated 20480 byte for virtqueue 0 for rx, size 256
virtio0: using 8192 byte (512 entries) indirect descriptors
virtio0: allocated 81920 byte for virtqueue 1 for tx, size 256
virtio0: using 69632 byte (4352 entries) indirect descriptors
virtio0: allocated 8192 byte for virtqueue 2 for control, size 64
virtio0: interrupting at ioapic0 pin 11
virtio1 at pci0 dev 4 function 0
virtio1: Virtio Console Device (rev. 0x00)
virtio1: no matching child driver; not configured
virtio2 at pci0 dev 5 function 0
virtio2: Virtio Block Device (rev. 0x00)
ld0 at virtio2
virtio2: allocated 45056 byte for virtqueue 0 for I/O request, size 128
virtio2: using 36864 byte (2304 entries) indirect descriptors
ld0: 200 GB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 419430400 sectors
virtio2: interrupting at ioapic0 pin 10
virtio3 at pci0 dev 6 function 0
virtio3: Virtio Block Device (rev. 0x00)
ld1 at virtio3
virtio3: allocated 45056 byte for virtqueue 0 for I/O request, size 128
virtio3: using 36864 byte (2304 entries) indirect descriptors
ld1: 20480 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 41943040 sectors
virtio3: interrupting at ioapic0 pin 10
virtio4 at pci0 dev 7 function 0
virtio4: Virtio Block Device (rev. 0x00)
ld2 at virtio4
virtio4: allocated 45056 byte for virtqueue 0 for I/O request, size 128
virtio4: using 36864 byte (2304 entries) indirect descriptors
ld2: 40960 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 83886080 sectors
virtio4: interrupting at ioapic0 pin 11
virtio5 at pci0 dev 8 function 0
virtio5: Virtio Network Device (rev. 0x00)
vioif1 at virtio5: Ethernet address 00:1a:4a:10:d4:0e
virtio5: allocated 20480 byte for virtqueue 0 for rx, size 256
virtio5: using 8192 byte (512 entries) indirect descriptors
virtio5: allocated 81920 byte for virtqueue 1 for tx, size 256
virtio5: using 69632 byte (4352 entries) indirect descriptors
virtio5: allocated 8192 byte for virtqueue 2 for control, size 64
virtio5: interrupting at ioapic0 pin 11
virtio6 at pci0 dev 9 function 0
virtio6: Virtio Block Device (rev. 0x00)
ld3 at virtio6
virtio6: allocated 45056 byte for virtqueue 0 for I/O request, size 128
virtio6: using 36864 byte (2304 entries) indirect descriptors
ld3: 40960 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 83886080 sectors
virtio6: interrupting at ioapic0 pin 10
virtio7 at pci0 dev 10 function 0
virtio7: Virtio Block Device (rev. 0x00)
ld4 at virtio7
virtio7: allocated 45056 byte for virtqueue 0 for I/O request, size 128
virtio7: using 36864 byte (2304 entries) indirect descriptors
ld4: 40960 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 83886080 sectors
virtio7: interrupting at ioapic0 pin 10
isa0 at pcib0
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
attimer0 at isa0 port 0x40-0x43
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
atapibus at piixide0 not configured
boot device: ld1
root on ld1a dumps on ld1b
/: replaying log to memory
root file system type: ffs
/: replaying log to disk
/mnt/wd-spindle-0: replaying log to disk
/mnt/pliant-ssd-0: replaying log to disk
/mnt/ocz-ssd-0: replaying log to disk
/mnt/wd-spindle-1: replaying log to disk
Accounting started



Another panic, from when it was running bonnie++ rather than my program:
uvm_fault(0xffffffff80e0ce20, 0xffff80008e8ec000, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff8084c104 cs 8 rflags 10206 cr2  ffff80008e8ecff8 cpl 8 rsp fffffe810f8ee3d8
panic: trap
cpu0: Begin traceback...
printf_nolog() at netbsd:printf_nolog
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
ld_virtio_start() at netbsd:ld_virtio_start+0x177
ldstart() at netbsd:ldstart+0x6f
ldstrategy() at netbsd:ldstrategy+0x104
bdev_strategy() at netbsd:bdev_strategy+0x47
spec_strategy() at netbsd:spec_strategy+0x2e
VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x33
genfs_do_io() at netbsd:genfs_do_io+0x1a6
genfs_gop_write() at netbsd:genfs_gop_write+0x55
genfs_do_putpages() at netbsd:genfs_do_putpages+0xbe5
VOP_PUTPAGES() at netbsd:VOP_PUTPAGES+0x3a
ffs_write() at netbsd:ffs_write+0x2f9
VOP_WRITE() at netbsd:VOP_WRITE+0x37
vn_write() at netbsd:vn_write+0xf9
dofilewrite() at netbsd:dofilewrite+0x7d
sys_write() at netbsd:sys_write+0x62
syscall() at netbsd:syscall+0xc4
cpu0: End traceback...

I tried with and without ACPI; the trace was the same.

Kernel config:
# NetBSD 6 oVirt/KVM Kernel Configuration
#
# Minimal configuration.
#
# $egnet: ANSIBLE,v 1.44 2014/04/07 10:04:55 dive Exp $

machine amd64 x86

ident "ANSIBLE-$Revision: 1.44 $"

maxusers 64

makeoptions COPTS="-O2 -fno-omit-frame-pointer"

### BEGIN XXX
makeoptions DEBUG="-g"
options DDB
options DDB_HISTORY_SIZE=1024
options DDB_COMMANDONENTER="trace;show registers"
options INSECURE
### END XXX

options AIO
options BUFQ_FCFS
options BUFQ_DISKSORT
options COMPAT_43
options COREDUMP
options CPU_IN_CKSUM
options EXEC_ELF64
options EXEC_SCRIPT
options FILEASSOC
options HOSTZEROBROADCAST=0
options INET
options IPFILTER_LOG
options MPBIOS
options MPBIOS_SCANPCI
options MQUEUE
options MTRR
options MULTIPROCESSOR
options NTP
options P1003_1B_SEMAPHORE
options PAX_ASLR=0
options PAX_MPROTECT=0
options PCKBD_CNATTACH_MAY_FAIL
options PFIL_HOOKS
options PTRACE
options RFC2292
options RTC_OFFSET=0
options SCHED_4BSD
options SYSVMSG
options SYSVSEM
options SYSVSHM
options USER_VA0_DISABLE_DEFAULT=1
options VCONS_DRAW_INTR
options VERIFIED_EXEC_FP_SHA512
options VERIFIED_EXEC_FP_SHA256
options VGA_POST
options VMSWAP
options WAPBL
options WSDISPLAY_COMPAT_PCVT
options WSDISPLAY_COMPAT_SYSCONS
options WSEMUL_VT100
options WS_KERNEL_FG=WSCOL_GREEN
options secmodel_bsd44

file-system FFS
file-system PTYFS
file-system UNION

config netbsd root on ? type ?

mainbus0 at root
cpu* at mainbus?
ioapic* at mainbus? apid ?
pci* at mainbus? bus ?
pci* at pchb? bus ?
pchb* at pci? dev ? function ?
pcib* at pci? dev ? function ?
isa0 at pcib?
com0 at isa? port 0x3f8 irq 4
com1 at isa? port 0x2f8 irq 3
pckbc* at isa?
pckbd* at pckbc?
pms* at pckbc?
vga* at pci? dev ? function ?
wsdisplay* at vga? console ?
wsdisplay* at wsemuldisplaydev?
wskbd* at pckbd? console ?
wsmouse* at pms? mux 0
attimer0 at isa?
piixpm* at pci? dev ? function ?
iic* at piixpm?
virtio* at pci? dev ? function ?
viomb* at virtio?
ld* at virtio?
vioif* at virtio?
piixide* at pci? dev ? function ? flags 0x0000
atabus* at piixide? channel ?
wd* at atabus? drive ? flags 0x0000

pseudo-device accf_data
pseudo-device accf_http
pseudo-device bpfilter
pseudo-device bridge
pseudo-device clockctl
pseudo-device cpuctl
pseudo-device crypto
pseudo-device drvctl
pseudo-device fss
pseudo-device ipfilter
pseudo-device ksyms
pseudo-device loop
pseudo-device pty
pseudo-device rnd
pseudo-device swcrypto
pseudo-device tap
pseudo-device tun
pseudo-device veriexec 1
pseudo-device wsfont
pseudo-device wsmux

>How-To-Repeat:
Enable a swap partition, then do something that pushes memory usage into swap. I don't think that's the only case that triggers it, but it's the one I can reproduce.

A simple C program to malloc 4GB and then memset it to 1 reproduces this on my test system, but only with a swap device configured - without one, UVM kills the process as expected.
>Fix:
None known. Running without swap seems to avoid it, and it "feels" most likely with amounts of RAM near or above 4GB: it happens more often with 8GB than with 4GB minus 1MB, and doesn't seem to happen with 1GB.

>Release-Note:

>Audit-Trail:
From: Sean Davis <erplefoo@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/48739: Reproducible panic in ld_virtio.c on NetBSD/amd64
 guest running under qemu on CentOS 6.5
Date: Sat, 12 Apr 2014 08:34:24 -0500

 --20cf30223c9132474d04f6d882c3
 Content-Type: text/plain; charset=UTF-8

 Another data point, which shows it happening as root but not as a normal
 user. This time with 8GB RAM and no swap enabled:

 [ssh session]

 dive@ansible ~ $ ./nbpanic
 trying 8589934592 bytes
 malloc(8589934592)
 malloc(8589934592) gave us 0x7f7df7700000
 memset(0x7f7df7700000,1,8589934592)
 Killed
 dive@ansible ~ $ sudo ./nbpanic

 We trust you have received the usual lecture from the local System
 Administrator. It usually boils down to these three things:

     #1) Respect the privacy of others.
     #2) Think before you type.
     #3) With great power comes great responsibility.

 Password:
 trying 8589934592 bytes
 malloc(8589934592)
 malloc(8589934592) gave us 0x7f7df7700000
 memset(0x7f7df7700000,1,8589934592)


 [VM console]

 UVM: pid 588 (nbpanic), uid 1000 killed: out of swap
 fatal page fault in supervisor mode
 trap type 6 code 2 rip ffffffff802cc7df cs 8 rflags 10246 cr2
 ffff80008e900000 cpl 6 rsp fffffe811021eb00
 panic: trap
 cpu0: Begin traceback...
 printf_nolog() at netbsd:printf_nolog
 startlwp() at netbsd:startlwp
 alltraps() at netbsd:alltraps+0x96
 ld_virtio_vq_done1() at netbsd:ld_virtio_vq_done1+0x77
 ld_virtio_vq_done() at netbsd:ld_virtio_vq_done+0x3d
 virtio_vq_intr() at netbsd:virtio_vq_intr+0x75
 virtio_intr() at netbsd:virtio_intr+0x38
 intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
 cpu0: End traceback...

 dump to dev 19,17 not possible
 rebooting...



 On Sat, Apr 12, 2014 at 8:05 AM, <gnats-admin@netbsd.org> wrote:

 > Thank you very much for your problem report.
 > It has the internal identification `kern/48739'.
 > The individual assigned to look at your
 > report is: kern-bug-people.
 >
 > >Category:       kern
 > >Responsible:    kern-bug-people
 > >Synopsis:       Reproducible panic in ld_virtio.c on NetBSD/amd64 guest
 > running under qemu on CentOS 6.5
 > >Arrival-Date:   Sat Apr 12 13:05:00 +0000 2014
 >
 >


 -- 
 Sean

 --20cf30223c9132474d04f6d882c3--

From: Pierre Pronchery <khorben@defora.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/48739: Reproducible panic in ld_virtio.c on NetBSD/amd64
 guest running under qemu on CentOS 6.5
Date: Thu, 02 Jul 2015 04:53:19 +0200

 Hi,

 I'm apparently hit by the same thing:

 > /netbsd: uvm_fault(0xffffffff810b16c0, 0xffff80008e900000, 1) -> e              
 > /netbsd: fatal page fault in supervisor mode                      
 > /netbsd: trap type 6 code 0 rip ffffffff809cce74 cs 8 rflags 10206 cr2 ffff80008e900ff8 ilevel 8 rsp fffffe810f5097c0
 > /netbsd: curlwp 0xfffffe8216316860 pid 26301.1 lowest kstack 0xfffffe810f5072c0 
 > /netbsd: panic: trap                      
 > /netbsd: cpu1: Begin traceback...                      
 > /netbsd: vpanic() at netbsd:vpanic+0x13c                      
 > /netbsd: snprintf() at netbsd:snprintf                      
 > /netbsd: startlwp() at netbsd:startlwp                      
 > /netbsd: alltraps() at netbsd:alltraps+0x96                      
 > /netbsd: ld_virtio_start() at netbsd:ld_virtio_start+0x14b                      
 > /netbsd: ldstart() at netbsd:ldstart+0x6b                      
 > /netbsd: ldstrategy() at netbsd:ldstrategy+0x101                      
 > /netbsd: bdev_strategy() at netbsd:bdev_strategy+0x5b                           
 > /netbsd: spec_strategy() at netbsd:spec_strategy+0x2c                           
 > /netbsd: VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x33                      
 > /netbsd: genfs_do_io() at netbsd:genfs_do_io+0x195                      
 > /netbsd: genfs_gop_write() at netbsd:genfs_gop_write+0x52                       
 > /netbsd: genfs_do_putpages() at netbsd:genfs_do_putpages+0xbec                  
 > /netbsd: VOP_PUTPAGES() at netbsd:VOP_PUTPAGES+0x3a                      
 > /netbsd: ffs_write() at netbsd:ffs_write+0x354                      
 > /netbsd: VOP_WRITE() at netbsd:VOP_WRITE+0x37                      
 > /netbsd: vn_write() at netbsd:vn_write+0xec                      
 > /netbsd: dofilewrite() at netbsd:dofilewrite+0x97                      
 > /netbsd: sys_write() at netbsd:sys_write+0x5f                      
 > /netbsd: syscall() at netbsd:syscall+0x9a                      
 > /netbsd: --- syscall (number 4) ---   

 I am running NetBSD 7.0_RC1 with the tickets "841, 847, 849, 853, 855"
 applied.

 It seems to happen right after this call:

 > 363         bus_dmamap_sync(vsc->sc_dmat, vr->vr_payload,
 > 364                         0, bp->b_bcount,
 > 365                         isread?BUS_DMASYNC_PREREAD:BUS_DMASYNC_PREWRITE);

 > ffffffff80609537:       e8 ef e6 c1 ff          callq  ffffffff80227c2b <bus_dmamap_sync>
 > ffffffff8060953c:       49 8b 76 20             mov    0x20(%r14),%rsi

 The crash happened repeatedly, either while running "rsync -a" to
 retrieve a couple of GB of remote data, or while running "git checkout"
 locally on the netbsd-src.git repository (so an I/O-intensive situation).
 This VM has 8 GB RAM and as much swap, running on "Joyent SmartDC HVM
 (7.20150420T105949Z)" with 2 CPUs configured.

 Let me know if more details are required to diagnose this.

 HTH,
 -- 
 khorben

From: Dustin Marquess <dmarquess@gmail.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/48739
Date: Tue, 25 Aug 2015 00:41:35 -0500

 I'm seeing this exact same issue running the latest netbsd-7 snapshot
 from nyftp.netbsd.org.

 Details:

 Host: IBM x3755 M3 (4 x AMD Opteron 6220 [32 cores total]), 64GB RAM
 Host OS: FreeBSD 11-CURRENT + bhyve

 6 vCPUs passed to VM and 8GB RAM passed to VM.  8GB of swap enabled on
 virtual disk.  Running build.sh will eventually panic.

State-Changed-From-To: open->feedback
State-Changed-By: christos@NetBSD.org
State-Changed-When: Tue, 27 Oct 2015 22:01:02 -0400
State-Changed-Why:
should be fixed on head


State-Changed-From-To: feedback->closed
State-Changed-By: khorben@NetBSD.org
State-Changed-When: Tue, 17 Nov 2015 23:32:25 +0000
State-Changed-Why:
I am not the original submitter, but being affected by the same problem and acknowledging the fix, I believe I can close this myself. Feel free to re-open if necessary.

I tested on an amd64 GENERIC kernel, as well as an i386 MODULAR kernel.


>Unformatted:
