NetBSD Problem Report #51516

From Manuel.Bouyer@lip6.fr  Wed Sep 28 14:00:44 2016
Return-Path: <Manuel.Bouyer@lip6.fr>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id D48B97A111
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 28 Sep 2016 14:00:44 +0000 (UTC)
Message-Id: <20160928140039.9FB2EA832@armandeche.soc.lip6.fr>
Date: Wed, 28 Sep 2016 16:00:39 +0200 (MEST)
From: Manuel.Bouyer@lip6.fr
Reply-To: Manuel.Bouyer@lip6.fr
To: gnats-bugs@NetBSD.org
Subject: kernel trap 34: mem address not aligned in pmap_page_protect
X-Send-Pr-Version: 3.95

>Number:         51516
>Category:       port-sparc
>Synopsis:       kernel trap 34: mem address not aligned in pmap_page_protect
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-sparc-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Sep 28 14:05:00 +0000 2016
>Last-Modified:  Tue Oct 04 17:55:00 +0000 2016
>Originator:     Manuel Bouyer
>Release:        NetBSD 7.0_STABLE as of Sep 25
>Organization:
>Environment:
System: NetBSD samba.lip6.fr 7.0_STABLE NetBSD 7.0_STABLE (GENERIC_SUN4U) #0: Sun Sep 25 14:56:01 CEST 2016  bouyer@hop:/dsk/l1/misc/bouyer/tmp/sparc/obj/dsk/l1/misc/bouyer/netbsd-7/src/sys/arch/sparc/compile/GENERIC_SUN4U sparc
Architecture: sparc
Machine: sparc
>Description:
	During a pbulk build of pbulk-medium, the build stops in emacs24.
	First, during the build, a bootstrap-emacs process gets stuck in
	what looks like an infinite loop in the kernel (top shows 100% CPU),
	related to poll(2):
db{0}> tr/t 0t11133
trace: pid 11133 lid 2 at 0x303c3a20
sleepq_block(0, 1, 5fdddf4, 0, 1ca6800, 5fddd20) at netbsd:sleepq_block+0xb8
sel_do_scan(0, 5fddd20, 1, 2, 1, 3a48e00) at netbsd:sel_do_scan+0x44c
pollcommon(16, 441c88, 1, 0, 0, 8) at netbsd:pollcommon+0xbc
sys_poll(5fddd20, 303c3de0, 303c3dd8, ad4000, 9cc, 1c632a4) at netbsd:sys_poll+0x64
syscall(303c3ed0, 303c3f58, fe1d52cc, 5fddd20, 0, fe1d52d0) at netbsd:syscall+0x380
?(441c88, 1, ffffffff, ad4000, fd9bfda8, ad4000) at 1010d40

	Then, after a kill -9 of the process (a plain kill doesn't kill it),
	the system panics (I guess while pbulk removes the build tree
	or cleans up dependencies) with:

login: trap type 0x34: cpu 0, pc=1415130 npc=1415134 pstate=0xffffffffff82000e<AM,PRIV,IE>
kernel trap 34: mem address not aligned
Stopped in pid 0.9 (system) at  netbsd:pmap_page_protect+0x250: ld [%l5 + 0x8], %i4
db{0}> tr
genfs_do_putpages(3310230, 3310230, 7fffffff, ffffe000, 51d, 0) at netbsd:genfs_do_putpages+0xa10
genfs_putpages(31837868, 5c8c260, 5d85cc8, 31837cb0, 0, 0) at netbsd:genfs_putpages+0x24
VOP_PUTPAGES(5d82b28, 0, 0, 0, 0, a) at netbsd:VOP_PUTPAGES+0x44
uvm_vnp_setsize(5d82b28, 0, 0, 0, 0, 0) at netbsd:uvm_vnp_setsize+0x8c
ffs_truncate(5d82b28, 0, 0, 0, ffffffff, 0) at netbsd:ffs_truncate+0x574
ufs_truncate(5d82b28, ffffffff, 0, ffffffff, 3a5f5c0, 3da5000) at netbsd:ufs_truncate+0x1cc
ufs_inactive(0, 20012, 31837b7c, 3da5000, 5d85cc8, 5d82b28) at netbsd:ufs_inactive+0x1c8
VOP_INACTIVE(5d82b28, 31837bef, 4e5d7c0, 3a2d578, 31837bec, ffffffff) at netbsd:VOP_INACTIVE+0x2c
vrelel(5d82b28, 0, 4e5d7c0, 0, 5d82b3c, 1) at netbsd:vrelel+0x380
ufs_remove(0, 5c8c260, 5d85cc8, 31837cb0, 5d82b28, 5c93af8) at netbsd:ufs_remove+0xac
VOP_REMOVE(5c93af8, 5d82b28, 31837d50, 0, 0, 2) at netbsd:VOP_REMOVE+0x30
do_sys_unlinkat.isra.3(0, ff41a22c, 0, 0, 3daf400, 5d82b28) at netbsd:do_sys_unlinkat.isra.3+0x18c
syscall(31837ed0, 31837f48, ff7049f8, 4e5d7c0, 0, ff7049fc) at netbsd:syscall+0x380
?(ff41a22c, 0, 4, ff402050, ff41a1d0, ff41a1d0) at 1010d40

This is reproducible.
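
As background on the trap itself: on SPARC, `ld` loads a 32-bit word, and the effective address (here `%l5 + 0x8`) has to be 4-byte aligned; otherwise the CPU raises trap 0x34, mem_address_not_aligned. The sketch below only illustrates that check. The `%l5` values in it are hypothetical, since the real register contents were not captured in this report, and the function names are made up for the example. The 32-bit masking reflects the AM bit shown in the reported pstate.

```python
WORD_SIZE = 4  # bytes read by the SPARC `ld` (load word) instruction


def effective_address(base: int, offset: int) -> int:
    """Address accessed by `ld [base + offset], reg`.

    The result is truncated to 32 bits because the reported pstate has
    the AM (address mask) bit set.
    """
    return (base + offset) & 0xFFFFFFFF


def is_misaligned(base: int, offset: int, size: int = WORD_SIZE) -> bool:
    """True if the access would raise mem_address_not_aligned."""
    return effective_address(base, offset) % size != 0


if __name__ == "__main__":
    # Hypothetical %l5 values, NOT taken from the crash.
    for l5 in (0x5D82B28, 0x5D82B2A, 0x5D82B2D):
        state = "would trap" if is_misaligned(l5, 0x8) else "is aligned"
        print(f"%l5={l5:#x}: ld [%l5 + 0x8] {state}")
```

A pointer that fails this check in the kernel is usually a sign of a corrupted or stale structure rather than of a bad instruction, which is why the actual value of `%l5` matters.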

>How-To-Repeat:
	run pbulk with only pbulk-medium in the pkg list.
	Maybe building only emacs24 would be enough, I didn't try.

>Fix:


>Audit-Trail:
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: port-sparc-maintainer@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: port-sparc/51516: kernel trap 34: mem address not aligned in pmap_page_protect
Date: Thu, 29 Sep 2016 05:04:10 +1000

 > db{0}> tr/t 0t11133
 > trace: pid 11133 lid 2 at 0x303c3a20
 > sleepq_block(0, 1, 5fdddf4, 0, 1ca6800, 5fddd20) at netbsd:sleepq_block+0xb8
 > sel_do_scan(0, 5fddd20, 1, 2, 1, 3a48e00) at netbsd:sel_do_scan+0x44c
 > pollcommon(16, 441c88, 1, 0, 0, 8) at netbsd:pollcommon+0xbc
 > sys_poll(5fddd20, 303c3de0, 303c3dd8, ad4000, 9cc, 1c632a4) at netbsd:sys_poll+0x64
 > syscall(303c3ed0, 303c3f58, fe1d52cc, 5fddd20, 0, fe1d52d0) at netbsd:syscall+0x380
 > ?(441c88, 1, ffffffff, ad4000, fd9bfda8, ad4000) at 1010d40
 >
 > 	Then, after killing -9 the process (a kill doesn't kill it),
 > 	the system panics with (I guess while pbulk removes the build tree
 > 	or cleans up depends):
 >
 > login: trap type 0x34: cpu 0, pc=1415130 npc=1415134 pstate=0xffffffffff82000e<AM,PRIV,IE>
 > kernel trap 34: mem address not aligned
 > Stopped in pid 0.9 (system) at  netbsd:pmap_page_protect+0x250: ld [%l5 + 0x8], %i4
 > db{0}> tr
 > genfs_do_putpages(3310230, 3310230, 7fffffff, ffffe000, 51d, 0) at netbsd:genfs_do_putpages+0xa10
 > genfs_putpages(31837868, 5c8c260, 5d85cc8, 31837cb0, 0, 0) at netbsd:genfs_putpages+0x24
 > VOP_PUTPAGES(5d82b28, 0, 0, 0, 0, a) at netbsd:VOP_PUTPAGES+0x44
 > uvm_vnp_setsize(5d82b28, 0, 0, 0, 0, 0) at netbsd:uvm_vnp_setsize+0x8c
 > ffs_truncate(5d82b28, 0, 0, 0, ffffffff, 0) at netbsd:ffs_truncate+0x574
 > ufs_truncate(5d82b28, ffffffff, 0, ffffffff, 3a5f5c0, 3da5000) at netbsd:ufs_truncate+0x1cc
 > ufs_inactive(0, 20012, 31837b7c, 3da5000, 5d85cc8, 5d82b28) at netbsd:ufs_inactive+0x1c8
 > VOP_INACTIVE(5d82b28, 31837bef, 4e5d7c0, 3a2d578, 31837bec, ffffffff) at netbsd:VOP_INACTIVE+0x2c
 > vrelel(5d82b28, 0, 4e5d7c0, 0, 5d82b3c, 1) at netbsd:vrelel+0x380
 > ufs_remove(0, 5c8c260, 5d85cc8, 31837cb0, 5d82b28, 5c93af8) at netbsd:ufs_remove+0xac
 > VOP_REMOVE(5c93af8, 5d82b28, 31837d50, 0, 0, 2) at netbsd:VOP_REMOVE+0x30
 > do_sys_unlinkat.isra.3(0, ff41a22c, 0, 0, 3daf400, 5d82b28) at netbsd:do_sys_unlinkat.isra.3+0x18c
 > syscall(31837ed0, 31837f48, ff7049f8, 4e5d7c0, 0, ff7049fc) at netbsd:syscall+0x380
 > ?(ff41a22c, 0, 4, ff402050, ff41a1d0, ff41a1d0) at 1010d40

 > Stopped in pid 0.9 (system) at  netbsd:pmap_page_protect+0x250: ld [%l5 + 0x8], %i4

 from ddb please "p $l5".  can you map pmap_page_protect+0x250
 back to a specific line number?

 thanks.


 .mrg.

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: port-sparc-maintainer@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Subject: Re: port-sparc/51516: kernel trap 34: mem address not aligned in
 pmap_page_protect
Date: Tue, 4 Oct 2016 15:30:23 +0200

 On Wed, Sep 28, 2016 at 07:05:00PM +0000, matthew green wrote:
 >  > Stopped in pid 0.9 (system) at  netbsd:pmap_page_protect+0x250: ld [%l5 + 0x8], %i4
 >  
 >  from ddb please "p $l5".  can you map pmap_page_protect+0x250
 >  back to a specific line number?

 Well, after rebuilding a kernel with -g, another emacs24 build hung
 with 100% kernel time in top, but I could kill it without triggering a panic.

 Another run, this time, panicked while building emacs24 without
 any intervention on my part:
 login: cpu0: data fault: pc=131f4c8 rpc=57f3a7ec addr=65206000
 kernel trap 30: data access exception
 Stopped in pid 0.38 (system) at netbsd:mutex_oncpu.part.0+0x8:  ld [%g1 + 0xc], %g2
 db{0}> p %g1
          131f4c8
 db{0}> p %g2
          131f4c8
 db{0}> tr
 vfs_vnode_iterator_next(5866b28, 1169a20, 30283c68, 1000, 8000, 4b75210) at netbsd:vfs_vnode_iterator_next+0x4c
 ffs_sync(11, 3, 3a47ec0, 0, 0, 5c20458) at netbsd:ffs_sync+0xfc
 VFS_SYNC(3da5000, 3, 3a47ec0, 1ca8338, 3da5024, 3da5000) at netbsd:VFS_SYNC+0x24
 sync_fsync(0, 12, 3d5f2a0, 1, 3da5000, 3f23c50) at netbsd:sync_fsync+0x6c
 VOP_FSYNC(3f23c50, 3a47ec0, 8, 0, 0, 0) at netbsd:VOP_FSYNC+0x48
 sched_sync(1c6b208, 1cba124, 3a35c70, 1, 0, 57f3a7eb) at netbsd:sched_sync+0x148
 lwp_trampoline(f0075db8, fffa3cf8, 111800, 1106c8, fffa3df8, 1) at netbsd:lwp_trampoline+0x8
 db{0}> ps
 PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
 20234    2 3   0        80            6d052a0    bootstrap-emacs select
 20234    1 2   0     40000            4628aa0    bootstrap-emacs
 20083    1 3   0        80            4628020                 sh wait
 1187     1 3   0        80            6d057e0              gmake wait
 [...]
 0       40 3   0       200            3a6a560            physiod physiod
 0       39 3   0       200            3d5f000           aiodoned aiodoned
 0    >  38 7   0       200            3d5f2a0            ioflush
 0       37 3   0       200            3a6a800           pgdaemon pgdaemon
 0       34 3   0       200            3a6a2c0          atapibus0 sccomp
 [...]
 db{0}> tr/a 4628aa0
 trace: pid 20234 lid 1 at 0x302c3c80
 preempt(1888800, 1000000, 20000, 4628aa0, 0, fb5) at netbsd:preempt+0x48
 trap(302c3ed0, fffffffe, ff7d739c, ff82000a, 42dc1b8, b188) at netbsd:trap+0x724

 ?(fe351e74, 7fff, ff7afc00, 0, 0, 138) at 1010be0

 From gdb on netbsd.gdb I could get:
 (gdb) x/i vfs_vnode_iterator_next+0x40
    0x15efca0 <vfs_vnode_iterator_next+64>:      ld  [ %i0 + 0x78 ], %g2
 (gdb) 
    0x15efca4 <vfs_vnode_iterator_next+68>:      st  %g2, [ %g1 ]
 (gdb) 
    0x15efca8 <vfs_vnode_iterator_next+72>:      clr  [ %i0 + 0x14 ]
 (gdb) 
    0x15efcac <vfs_vnode_iterator_next+76>:      call  0x13673e0 <mutex_enter>
    0x15efcb0 <vfs_vnode_iterator_next+80>:      ld  [ %i5 ], %o0
 (gdb) 
    0x15efcb4 <vfs_vnode_iterator_next+84>:      ld  [ %i5 + 0x48 ], %g1
 (gdb) 
    0x15efcb8 <vfs_vnode_iterator_next+88>:      mov  %i5, %o1

 (vfs_vnode_iterator_next+0x4c is the call to mutex_enter, as expected).
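
The offset arithmetic behind that statement can be checked mechanically. The sketch below is only an illustration: it takes the addresses and decimal `<symbol+N>` offsets from the gdb listing above, derives the start address of vfs_vnode_iterator_next from them, and looks up which instruction sits at +0x4c. The helper names (`symbol_base`, `resolve`) are made up for this example and are not gdb or ddb features.

```python
# (address, decimal offset, instruction) copied from the gdb x/i listing.
LISTING = [
    (0x15EFCA0, 64, "ld  [ %i0 + 0x78 ], %g2"),
    (0x15EFCA4, 68, "st  %g2, [ %g1 ]"),
    (0x15EFCA8, 72, "clr  [ %i0 + 0x14 ]"),
    (0x15EFCAC, 76, "call  0x13673e0 <mutex_enter>"),
    (0x15EFCB0, 80, "ld  [ %i5 ], %o0"),
]


def symbol_base(listing):
    """Start address of the function; every line must agree on it."""
    bases = {addr - off for addr, off, _ in listing}
    if len(bases) != 1:
        raise ValueError("listing lines disagree on the symbol base")
    return bases.pop()


def resolve(listing, offset):
    """Return (address, instruction) found at symbol+offset."""
    target = symbol_base(listing) + offset
    for addr, _, insn in listing:
        if addr == target:
            return addr, insn
    raise KeyError(hex(target))


if __name__ == "__main__":
    addr, insn = resolve(LISTING, 0x4C)  # offset reported by the ddb trace
    print(f"base={symbol_base(LISTING):#x}  +0x4c={addr:#x}  {insn}")
```

Note that 0x4c is 76 decimal, which is why the ddb offset lines up with the `<vfs_vnode_iterator_next+76>` line in gdb's output.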

 but gdb couldn't find a matching line number:
 (gdb) l *(vfs_vnode_iterator_next+0x4c)
 (gdb) 

 Maybe because it's a sparc gdb used on a GENERIC_SUN4U kernel?

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 years of experience will always make the difference
 --

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: port-sparc-maintainer@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Subject: Re: port-sparc/51516: kernel trap 34: mem address not aligned in
 pmap_page_protect
Date: Tue, 4 Oct 2016 19:51:20 +0200

 On Tue, Oct 04, 2016 at 03:30:23PM +0200, Manuel Bouyer wrote:
 > On Wed, Sep 28, 2016 at 07:05:00PM +0000, matthew green wrote:
 > >  > Stopped in pid 0.9 (system) at  netbsd:pmap_page_protect+0x250: ld [%l5 + 0x8], %i4
 > >  
 > >  from ddb please "p $l5".  can you map pmap_page_protect+0x250
 > >  back to a specific line number?
 > 
 > Well, after rebuilding a kernel with -g, another emacs24 build hung
 > with 100% kernel time in top, but I could kill it without triggering a panic.
 > 
 > Another run, this time, panicked while building emacs24 without
 > any intervention on my part:

 Another one. I had to kill the bootstrap-emacs process, and
 after some time (pbulk had already moved on to the next package) I got:
 cpu0: data fault: pc=15de7a4 rpc=3fff00000000 addr=10474000
 kernel trap 30: data access exception
 Stopped in pid 6194.1 (rm) at   netbsd:uvm_pagefree+0x1e4:      ld [%g1 + %o0], %g2
 db{0}> p %g1
          15de7a4
 db{0}> p %g2
          15de7a4
 db{0}> p %o0
          15de7a4
 db{0}> p $o0
          e7b8000
 db{0}> p $g1
          1cbd590
 db{0}> p $g2
                0
 db{0}> tr
 genfs_do_putpages(0, 0, 7fffffff, ffffe000, 317b36e0, 1) at netbsd:genfs_do_putpages+0xc88
 genfs_putpages(317b3868, 500d5a0, 501b720, 317b3cb0, 0, 0) at netbsd:genfs_putpages+0x24
 VOP_PUTPAGES(50162c0, 0, 0, 0, 0, a) at netbsd:VOP_PUTPAGES+0x44
 uvm_vnp_setsize(50162c0, 0, 0, 0, 0, 0) at netbsd:uvm_vnp_setsize+0x8c
 ffs_truncate(50162c0, 0, 0, 0, ffffffff, 0) at netbsd:ffs_truncate+0x574
 ufs_truncate(50162c0, ffffffff, 0, ffffffff, 3a5f5c0, 3da5000) at netbsd:ufs_truncate+0x1cc
 ufs_inactive(0, 20012, 317b3b7c, 3da5000, 501b720, 50162c0) at netbsd:ufs_inactive+0x1c8
 VOP_INACTIVE(50162c0, 317b3bef, 45d4560, 3a2cdf8, 317b3bec, ffffffff) at netbsd:VOP_INACTIVE+0x2c
 vrelel(50162c0, 0, 45d4560, 0, 50162d4, 1) at netbsd:vrelel+0x380
 ufs_remove(0, 500d5a0, 501b720, 317b3cb0, 50162c0, 5011e80) at netbsd:ufs_remove+0xac
 VOP_REMOVE(5011e80, 50162c0, 317b3d50, 0, 0, 2) at netbsd:VOP_REMOVE+0x30
 do_sys_unlinkat.isra.3(0, ff40f5dc, 0, 0, 3daf000, 50162c0) at netbsd:do_sys_unlinkat.isra.3+0x18c
 syscall(317b3ed0, 317b3f48, ff7049f8, 45d4560, 0, ff7049fc) at netbsd:syscall+0x380
 ?(ff40f5dc, 0, 3, ff402050, ff40f580, ff40f580) at 1010d40

 (gdb) l *(genfs_do_putpages+0xc88)
 (gdb) x/i *(genfs_do_putpages+0xc88)
    0x119c7c8 <genfs_do_putpages+3208>:  call  0x15de5c0 <uvm_pagefree>
    0x119c7cc <genfs_do_putpages+3212>:  st  %i0, [ %fp + -260 ]
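
A quick sanity check on the register dump above, as a sketch under stated assumptions: the `%`-prefixed prints all returned the faulting pc (15de7a4), so only the `$g1` and `$o0` values are used here. Assuming an 8 KB page size and that the `addr=` field of the data fault line is reported at page granularity (both assumptions, not confirmed in this report), the effective address of `ld [%g1 + %o0], %g2` falls in the faulting page.

```python
PAGE_SIZE = 8192  # assumption: 8 KB base page size on this machine

G1 = 0x1CBD590           # db{0}> p $g1
O0 = 0xE7B8000           # db{0}> p $o0
FAULT_ADDR = 0x10474000  # "addr=" field of the data fault line


def effective_address(base: int, index: int) -> int:
    """Address accessed by `ld [base + index], reg` (32-bit kernel)."""
    return (base + index) & 0xFFFFFFFF


def page_base(addr: int, page_size: int = PAGE_SIZE) -> int:
    """Round an address down to the start of its page."""
    return addr & ~(page_size - 1)


ea = effective_address(G1, O0)

if __name__ == "__main__":
    print(f"effective address: {ea:#x}")
    print(f"page base:         {page_base(ea):#x}")
    print(f"matches fault:     {page_base(ea) == FAULT_ADDR}")
```

If those assumptions hold, the faulting access is the load at uvm_pagefree+0x1e4 itself, with `%o0` (e7b8000) looking more like a large index or stray pointer than a small structure offset.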

 As the box doesn't panic when building other packages, I guess it's related
 to what's happening in this emacs process stuck in a kernel loop ...

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 years of experience will always make the difference
 --
