NetBSD Problem Report #51516
From Manuel.Bouyer@lip6.fr Wed Sep 28 14:00:44 2016
Return-Path: <Manuel.Bouyer@lip6.fr>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id D48B97A111
for <gnats-bugs@gnats.NetBSD.org>; Wed, 28 Sep 2016 14:00:44 +0000 (UTC)
Message-Id: <20160928140039.9FB2EA832@armandeche.soc.lip6.fr>
Date: Wed, 28 Sep 2016 16:00:39 +0200 (MEST)
From: Manuel.Bouyer@lip6.fr
Reply-To: Manuel.Bouyer@lip6.fr
To: gnats-bugs@NetBSD.org
Subject: kernel trap 34: mem address not aligned in pmap_page_protect
X-Send-Pr-Version: 3.95
>Number: 51516
>Category: port-sparc
>Synopsis: kernel trap 34: mem address not aligned in pmap_page_protect
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-sparc-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Sep 28 14:05:00 +0000 2016
>Last-Modified: Tue Oct 04 17:55:00 +0000 2016
>Originator: Manuel Bouyer
>Release: NetBSD 7.0_STABLE as of Sep 25
>Organization:
>Environment:
System: NetBSD samba.lip6.fr 7.0_STABLE NetBSD 7.0_STABLE (GENERIC_SUN4U) #0: Sun Sep 25 14:56:01 CEST 2016 bouyer@hop:/dsk/l1/misc/bouyer/tmp/sparc/obj/dsk/l1/misc/bouyer/netbsd-7/src/sys/arch/sparc/compile/GENERIC_SUN4U sparc
Architecture: sparc
Machine: sparc
>Description:
While pbulk building pbulk-medium, the build stops in emacs24.
First, in the build process a bootstrap-emacs process is stuck in
what looks like an infinite loop in the kernel (top shows 100% CPU),
related to poll(2):
db{0}> tr/t 0t11133
trace: pid 11133 lid 2 at 0x303c3a20
sleepq_block(0, 1, 5fdddf4, 0, 1ca6800, 5fddd20) at netbsd:sleepq_block+0xb8
sel_do_scan(0, 5fddd20, 1, 2, 1, 3a48e00) at netbsd:sel_do_scan+0x44c
pollcommon(16, 441c88, 1, 0, 0, 8) at netbsd:pollcommon+0xbc
sys_poll(5fddd20, 303c3de0, 303c3dd8, ad4000, 9cc, 1c632a4) at netbsd:sys_poll+0x64
syscall(303c3ed0, 303c3f58, fe1d52cc, 5fddd20, 0, fe1d52d0) at netbsd:syscall+0x380
?(441c88, 1, ffffffff, ad4000, fd9bfda8, ad4000) at 1010d40
Then, after killing the process with kill -9 (a plain kill doesn't kill it),
the system panics with (I guess while pbulk removes the build tree
or cleans up depends):
login: trap type 0x34: cpu 0, pc=1415130 npc=1415134 pstate=0xffffffffff82000e<AM,PRIV,IE>
kernel trap 34: mem address not aligned
Stopped in pid 0.9 (system) at netbsd:pmap_page_protect+0x250: ld [%l5 + 0x8], %i4
db{0}> tr
genfs_do_putpages(3310230, 3310230, 7fffffff, ffffe000, 51d, 0) at netbsd:genfs_do_putpages+0xa10
genfs_putpages(31837868, 5c8c260, 5d85cc8, 31837cb0, 0, 0) at netbsd:genfs_putpages+0x24
VOP_PUTPAGES(5d82b28, 0, 0, 0, 0, a) at netbsd:VOP_PUTPAGES+0x44
uvm_vnp_setsize(5d82b28, 0, 0, 0, 0, 0) at netbsd:uvm_vnp_setsize+0x8c
ffs_truncate(5d82b28, 0, 0, 0, ffffffff, 0) at netbsd:ffs_truncate+0x574
ufs_truncate(5d82b28, ffffffff, 0, ffffffff, 3a5f5c0, 3da5000) at netbsd:ufs_truncate+0x1cc
ufs_inactive(0, 20012, 31837b7c, 3da5000, 5d85cc8, 5d82b28) at netbsd:ufs_inactive+0x1c8
VOP_INACTIVE(5d82b28, 31837bef, 4e5d7c0, 3a2d578, 31837bec, ffffffff) at netbsd:VOP_INACTIVE+0x2c
vrelel(5d82b28, 0, 4e5d7c0, 0, 5d82b3c, 1) at netbsd:vrelel+0x380
ufs_remove(0, 5c8c260, 5d85cc8, 31837cb0, 5d82b28, 5c93af8) at netbsd:ufs_remove+0xac
VOP_REMOVE(5c93af8, 5d82b28, 31837d50, 0, 0, 2) at netbsd:VOP_REMOVE+0x30
do_sys_unlinkat.isra.3(0, ff41a22c, 0, 0, 3daf400, 5d82b28) at netbsd:do_sys_unlinkat.isra.3+0x18c
syscall(31837ed0, 31837f48, ff7049f8, 4e5d7c0, 0, ff7049fc) at netbsd:syscall+0x380
?(ff41a22c, 0, 4, ff402050, ff41a1d0, ff41a1d0) at 1010d40
This is reproducible.
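For readers relating the trap message to the faulting instruction: on SPARC, `ld` loads a 32-bit word, so the effective address `%l5 + 0x8` has to be a multiple of 4, and trap type 0x34 is raised when it is not. A minimal Python sketch of that check follows. The function name and the sample values are hypothetical, since the actual value of %l5 is not recorded in this report.

```python
def word_load_is_aligned(base: int, offset: int, size: int = 4) -> bool:
    """Return True if base + offset is a multiple of size.

    size must be a power of two (4 for a SPARC word load).
    """
    return ((base + offset) & (size - 1)) == 0


# Made-up register values, for illustration only (not taken from the PR).
print(word_load_is_aligned(0x1000, 0x8))  # base is word aligned
print(word_load_is_aligned(0x1002, 0x8))  # base is off by two bytes
```

An odd or half-word value in %l5 at this point would suggest that the pointer being followed was corrupted rather than merely stale.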
>How-To-Repeat:
run pbulk with only pbulk-medium in the pkg list.
Maybe building only emacs24 would be enough, I didn't try.
>Fix:
>Audit-Trail:
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: port-sparc-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: re: port-sparc/51516: kernel trap 34: mem address not aligned in pmap_page_protect
Date: Thu, 29 Sep 2016 05:04:10 +1000
> db{0}> tr/t 0t11133
> trace: pid 11133 lid 2 at 0x303c3a20
> sleepq_block(0, 1, 5fdddf4, 0, 1ca6800, 5fddd20) at netbsd:sleepq_block+0xb8
> sel_do_scan(0, 5fddd20, 1, 2, 1, 3a48e00) at netbsd:sel_do_scan+0x44c
> pollcommon(16, 441c88, 1, 0, 0, 8) at netbsd:pollcommon+0xbc
> sys_poll(5fddd20, 303c3de0, 303c3dd8, ad4000, 9cc, 1c632a4) at netbsd:sys_poll+0x64
> syscall(303c3ed0, 303c3f58, fe1d52cc, 5fddd20, 0, fe1d52d0) at netbsd:syscall+0x380
> ?(441c88, 1, ffffffff, ad4000, fd9bfda8, ad4000) at 1010d40
>
> Then, after killing -9 the process (a kill doesn't kill it),
> the system panics with (I guess while pbulk removes the build tree
> or cleans up depends):
>
> login: trap type 0x34: cpu 0, pc=1415130 npc=1415134 pstate=0xffffffffff82000e<AM,PRIV,IE>
> kernel trap 34: mem address not aligned
> Stopped in pid 0.9 (system) at netbsd:pmap_page_protect+0x250: ld [%l5 + 0x8], %i4
> db{0}> tr
> genfs_do_putpages(3310230, 3310230, 7fffffff, ffffe000, 51d, 0) at netbsd:genfs_do_putpages+0xa10
> genfs_putpages(31837868, 5c8c260, 5d85cc8, 31837cb0, 0, 0) at netbsd:genfs_putpages+0x24
> VOP_PUTPAGES(5d82b28, 0, 0, 0, 0, a) at netbsd:VOP_PUTPAGES+0x44
> uvm_vnp_setsize(5d82b28, 0, 0, 0, 0, 0) at netbsd:uvm_vnp_setsize+0x8c
> ffs_truncate(5d82b28, 0, 0, 0, ffffffff, 0) at netbsd:ffs_truncate+0x574
> ufs_truncate(5d82b28, ffffffff, 0, ffffffff, 3a5f5c0, 3da5000) at netbsd:ufs_truncate+0x1cc
> ufs_inactive(0, 20012, 31837b7c, 3da5000, 5d85cc8, 5d82b28) at netbsd:ufs_inactive+0x1c8
> VOP_INACTIVE(5d82b28, 31837bef, 4e5d7c0, 3a2d578, 31837bec, ffffffff) at netbsd:VOP_INACTIVE+0x2c
> vrelel(5d82b28, 0, 4e5d7c0, 0, 5d82b3c, 1) at netbsd:vrelel+0x380
> ufs_remove(0, 5c8c260, 5d85cc8, 31837cb0, 5d82b28, 5c93af8) at netbsd:ufs_remove+0xac
> VOP_REMOVE(5c93af8, 5d82b28, 31837d50, 0, 0, 2) at netbsd:VOP_REMOVE+0x30
> do_sys_unlinkat.isra.3(0, ff41a22c, 0, 0, 3daf400, 5d82b28) at netbsd:do_sys_unlinkat.isra.3+0x18c
> syscall(31837ed0, 31837f48, ff7049f8, 4e5d7c0, 0, ff7049fc) at netbsd:syscall+0x380
> ?(ff41a22c, 0, 4, ff402050, ff41a1d0, ff41a1d0) at 1010d40
> Stopped in pid 0.9 (system) at netbsd:pmap_page_protect+0x250: ld [%l5 + 0x8], %i4
from ddb please "p $l5". can you map pmap_page_protect+0x250
back to a specific line number?
thanks.
.mrg.
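One way to prepare the requested symbol+offset lookups is to turn the ddb output into gdb `info line *(symbol+offset)` queries, to be run against a kernel built with -g (for example netbsd.gdb). The Python sketch below only builds the command strings; the helper name `gdb_line_queries` and the regular expression are assumptions for illustration, not part of any NetBSD tool. Symbols containing dots (such as do_sys_unlinkat.isra.3) are deliberately skipped, because they would need extra quoting in a gdb expression.

```python
import re

# Matches the "netbsd:symbol+0xoffset" part that ddb prints for a frame.
# Only plain C identifiers are accepted; dotted compiler-generated names
# are not matched and are therefore skipped.
FRAME_RE = re.compile(r"netbsd:\s*([A-Za-z_]\w*)\+(0x[0-9a-fA-F]+)")


def gdb_line_queries(ddb_text: str) -> list[str]:
    """Build one gdb 'info line' command per symbol+offset found in ddb_text."""
    return [f"info line *({sym}+{off})" for sym, off in FRAME_RE.findall(ddb_text)]


sample = "Stopped in pid 0.9 (system) at netbsd:pmap_page_protect+0x250: ld"
for command in gdb_line_queries(sample):
    print(command)
```

The printed commands can be pasted into a gdb session; whether gdb resolves them to a source line still depends on the debug information in the kernel image.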
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: port-sparc-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: port-sparc/51516: kernel trap 34: mem address not aligned in
pmap_page_protect
Date: Tue, 4 Oct 2016 15:30:23 +0200
On Wed, Sep 28, 2016 at 07:05:00PM +0000, matthew green wrote:
> > Stopped in pid 0.9 (system) at netbsd:pmap_page_protect+0x250: ld [%l5 + 0x8], %i4
>
> from ddb please "p $l5". can you map pmap_page_protect+0x250
> back to a specific line number?
Well, after rebuilding a kernel with -g, another emacs24 build would hang
with 100% kernel in top, but I could kill it without triggering a panic.
Another run, this time, panicked while building emacs24 without
intervention on my part:
login: cpu0: data fault: pc=131f4c8 rpc=57f3a7ec addr=65206000
kernel trap 30: data access exception
Stopped in pid 0.38 (system) at netbsd:mutex_oncpu.part.0+0x8: ld [%g1 + 0xc], %g2
db{0}> p %g1
131f4c8
db{0}> p %g2
131f4c8
db{0}> tr
vfs_vnode_iterator_next(5866b28, 1169a20, 30283c68, 1000, 8000, 4b75210) at netbsd:vfs_vnode_iterator_next+0x4c
ffs_sync(11, 3, 3a47ec0, 0, 0, 5c20458) at netbsd:ffs_sync+0xfc
VFS_SYNC(3da5000, 3, 3a47ec0, 1ca8338, 3da5024, 3da5000) at netbsd:VFS_SYNC+0x24
sync_fsync(0, 12, 3d5f2a0, 1, 3da5000, 3f23c50) at netbsd:sync_fsync+0x6c
VOP_FSYNC(3f23c50, 3a47ec0, 8, 0, 0, 0) at netbsd:VOP_FSYNC+0x48
sched_sync(1c6b208, 1cba124, 3a35c70, 1, 0, 57f3a7eb) at netbsd:sched_sync+0x148
lwp_trampoline(f0075db8, fffa3cf8, 111800, 1106c8, fffa3df8, 1) at netbsd:lwp_trampoline+0x8
db{0}> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
20234 2 3 0 80 6d052a0 bootstrap-emacs select
20234 1 2 0 40000 4628aa0 bootstrap-emacs
20083 1 3 0 80 4628020 sh wait
1187 1 3 0 80 6d057e0 gmake wait
[...]
0 40 3 0 200 3a6a560 physiod physiod
0 39 3 0 200 3d5f000 aiodoned aiodoned
0 > 38 7 0 200 3d5f2a0 ioflush
0 37 3 0 200 3a6a800 pgdaemon pgdaemon
0 34 3 0 200 3a6a2c0 atapibus0 sccomp
[...]
db{0}> tr/a 4628aa0
trace: pid 20234 lid 1 at 0x302c3c80
preempt(1888800, 1000000, 20000, 4628aa0, 0, fb5) at netbsd:preempt+0x48
trap(302c3ed0, fffffffe, ff7d739c, ff82000a, 42dc1b8, b188) at netbsd:trap+0x724
?(fe351e74, 7fff, ff7afc00, 0, 0, 138) at 1010be0
From gdb on netbsd.gdb I could get:
(gdb) x/i vfs_vnode_iterator_next+0x40
0x15efca0 <vfs_vnode_iterator_next+64>: ld [ %i0 + 0x78 ], %g2
(gdb)
0x15efca4 <vfs_vnode_iterator_next+68>: st %g2, [ %g1 ]
(gdb)
0x15efca8 <vfs_vnode_iterator_next+72>: clr [ %i0 + 0x14 ]
(gdb)
0x15efcac <vfs_vnode_iterator_next+76>: call 0x13673e0 <mutex_enter>
0x15efcb0 <vfs_vnode_iterator_next+80>: ld [ %i5 ], %o0
(gdb)
0x15efcb4 <vfs_vnode_iterator_next+84>: ld [ %i5 + 0x48 ], %g1
(gdb)
0x15efcb8 <vfs_vnode_iterator_next+88>: mov %i5, %o1
(vfs_vnode_iterator_next+0x4c is the call to mutex_enter, as expected),
but gdb couldn't find a matching line number:
(gdb) l *(vfs_vnode_iterator_next+0x4c)
(gdb)
Maybe because it's a sparc gdb on a GENERIC_SUN4U kernel?
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: port-sparc-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: port-sparc/51516: kernel trap 34: mem address not aligned in
pmap_page_protect
Date: Tue, 4 Oct 2016 19:51:20 +0200
On Tue, Oct 04, 2016 at 03:30:23PM +0200, Manuel Bouyer wrote:
> On Wed, Sep 28, 2016 at 07:05:00PM +0000, matthew green wrote:
> > > Stopped in pid 0.9 (system) at netbsd:pmap_page_protect+0x250: ld [%l5 + 0x8], %i4
> >
> > from ddb please "p $l5". can you map pmap_page_protect+0x250
> > back to a specific line number?
>
> Well, after rebuilding a kernel with -g, another emacs24 build would hang
> with 100% kernel in top, but I could kill it without triggering a panic.
>
> Another run, this time, panicked while building emacs24 without
> intervention on my part:
Another one. I had to kill the bootstrap-emacs process, and
after some time I got (pbulk had already moved to the next package):
cpu0: data fault: pc=15de7a4 rpc=3fff00000000 addr=10474000
kernel trap 30: data access exception
Stopped in pid 6194.1 (rm) at netbsd:uvm_pagefree+0x1e4: ld [%g1 + %o0], %g2
db{0}> p %g1
15de7a4
db{0}> p %g2
15de7a4
db{0}> p %o0
15de7a4
db{0}> p $o0
e7b8000
db{0}> p $g1
1cbd590
db{0}> p $g2
0
db{0}> tr
genfs_do_putpages(0, 0, 7fffffff, ffffe000, 317b36e0, 1) at netbsd:genfs_do_putpages+0xc88
genfs_putpages(317b3868, 500d5a0, 501b720, 317b3cb0, 0, 0) at netbsd:genfs_putpages+0x24
VOP_PUTPAGES(50162c0, 0, 0, 0, 0, a) at netbsd:VOP_PUTPAGES+0x44
uvm_vnp_setsize(50162c0, 0, 0, 0, 0, 0) at netbsd:uvm_vnp_setsize+0x8c
ffs_truncate(50162c0, 0, 0, 0, ffffffff, 0) at netbsd:ffs_truncate+0x574
ufs_truncate(50162c0, ffffffff, 0, ffffffff, 3a5f5c0, 3da5000) at netbsd:ufs_truncate+0x1cc
ufs_inactive(0, 20012, 317b3b7c, 3da5000, 501b720, 50162c0) at netbsd:ufs_inactive+0x1c8
VOP_INACTIVE(50162c0, 317b3bef, 45d4560, 3a2cdf8, 317b3bec, ffffffff) at netbsd:VOP_INACTIVE+0x2c
vrelel(50162c0, 0, 45d4560, 0, 50162d4, 1) at netbsd:vrelel+0x380
ufs_remove(0, 500d5a0, 501b720, 317b3cb0, 50162c0, 5011e80) at netbsd:ufs_remove+0xac
VOP_REMOVE(5011e80, 50162c0, 317b3d50, 0, 0, 2) at netbsd:VOP_REMOVE+0x30
do_sys_unlinkat.isra.3(0, ff40f5dc, 0, 0, 3daf000, 50162c0) at netbsd:do_sys_unlinkat.isra.3+0x18c
syscall(317b3ed0, 317b3f48, ff7049f8, 45d4560, 0, ff7049fc) at netbsd:syscall+0x380
?(ff40f5dc, 0, 3, ff402050, ff40f580, ff40f580) at 1010d40
(gdb) l *(genfs_do_putpages+0xc88)
(gdb) x/i *(genfs_do_putpages+0xc88)
0x119c7c8 <genfs_do_putpages+3208>: call 0x15de5c0 <uvm_pagefree>
0x119c7cc <genfs_do_putpages+3212>: st %i0, [ %fp + -260 ]
As the box doesn't panic when building other packages, I guess it's related
to what's happening in this emacs process stuck in a kernel loop ...
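As a cross-check of the register dump above, the faulting load is `ld [%g1 + %o0], %g2`, and the `$g1` and `$o0` values can be added and compared with the `addr=` field of the data fault message. The Python sketch below does that arithmetic. It assumes 8 KiB pages (taken here as the sun4u base page size) and 32-bit address wrap-around; both are assumptions of this sketch, and the helper names are made up for illustration.

```python
PAGE_SIZE = 0x2000  # assumption: 8 KiB base pages


def effective_address(base: int, index: int) -> int:
    """Sum of the two address registers, truncated to 32 bits (assumed)."""
    return (base + index) & 0xFFFFFFFF


def page_base(addr: int, page_size: int = PAGE_SIZE) -> int:
    """Round addr down to the start of its page (page_size is a power of two)."""
    return addr & ~(page_size - 1)


g1 = 0x1CBD590  # from "p $g1"
o0 = 0xE7B8000  # from "p $o0"

ea = effective_address(g1, o0)
print(hex(ea))             # 0x10475590
print(hex(page_base(ea)))  # 0x10474000
```

Under those assumptions the page base equals the reported addr=10474000, which is consistent with the fault coming from this load rather than from an unrelated access.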
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--