NetBSD Problem Report #45677

From gson@gson.org  Fri Dec  2 19:53:04 2011
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id E216363D6F9
	for <gnats-bugs@gnats.NetBSD.org>; Fri,  2 Dec 2011 19:53:03 +0000 (UTC)
Message-Id: <20111202195300.12E8F75E3F@guava.gson.org>
Date: Fri,  2 Dec 2011 21:53:00 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@gnats.NetBSD.org
Subject: The stress_killer test sometimes does
X-Send-Pr-Version: 3.95

>Number:         45677
>Category:       kern
>Synopsis:       The stress_killer test sometimes does
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Dec 02 19:55:00 +0000 2011
>Closed-Date:    Sat Nov 03 14:54:54 +0000 2012
>Last-Modified:  Sat Nov 03 14:54:54 +0000 2012
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current
>Organization:

>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:

The logs on both the TNF automated test server and my own show a
number of recent incidents where a kernel panic has occurred while
running the "stress_killer" ATF test.

Below are the log file URLs and backtraces of the most recent ones.
All of them (!) have different panic messages, but all the ones that
have valid backtraces show the pool_cache*() functions being involved.
Since the virtual machines in question are running with 32 MB of memory,
memory pressure may play a role.
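
To check a test.log for this pattern without reading every traceback by
hand, something like the following rough sketch may help. It is not part
of the test infrastructure; the function names traceback_functions() and
involves_pool() are made up for illustration, and the frame format is
assumed to be the "... at netbsd:function+0xoffset" form seen in the logs
below. A truncated backtrace (for example one ending in "Bad frame
pointer") may contain no pool_*() frame at all and will not match.

```python
import re

# Matches the symbol in a ddb traceback frame such as
#   pool_get(c0cc0740,1,...) at netbsd:pool_get+0x35f
# The "+0x..." offset is optional (e.g. "at netbsd:printf_nolog").
FRAME_RE = re.compile(r"\bat netbsd:([A-Za-z_][A-Za-z0-9_]*)")


def traceback_functions(log_text):
    """Return one list of function names per Begin/End traceback block."""
    blocks = []
    current = None  # None means we are outside a traceback block
    for line in log_text.splitlines():
        if "Begin traceback..." in line:
            current = []
        elif "End traceback..." in line:
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None:
            match = FRAME_RE.search(line)
            if match:
                current.append(match.group(1))
    return blocks


def involves_pool(functions):
    """True if any frame is a pool_*() function (includes pool_cache_*())."""
    return any(name.startswith("pool_") for name in functions)
```

Feeding the contents of a downloaded test.log to traceback_functions()
and calling involves_pool() on each returned list gives a quick yes/no
per panic.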


http://releng.NetBSD.org/b5reports/i386/build/2011.11.16.14.24.43/test.log

    stress_killer: panic: kernel diagnostic assertion "(pp->pr_curpage == NULL && pp->pr_nitems == 0) || (pp->pr_curpage != NULL && pp->pr_nitems > 0)" failed: file "/bracket/i386/work/2011.11.16.14.24.43/src/sys/kern/subr_pool.c", line 1551
cpu0: Begin traceback...
kern_assert(c0b54b10,c0b54b51,c0be6228,c0be61e8,60f,ffffffff,c5108a2c,c09013aa,c0cb6c80,c0cc0740) at netbsd:kern_assert+0x23
pool_update_curpage(c0cc07b4,c53af000,1,c0523d6b,c508e410,c0ba8803,1,c067db6b,4,c0cc07b8) at netbsd:pool_update_curpage+0x56
pool_get(c0cc0740,1,c5108afc,c09374e7,0,c57af0c4,c57af070,c0887908,c53b5c00,c53a8efc) at netbsd:pool_get+0x35f
pool_cache_get_slow(0,1,c57af070,1,c50e2f0c,0,c5108b3c,c0891a4c,c57af070,c57af0c4) at netbsd:pool_cache_get_slow+0x19a
pool_cache_get_paddr(c0cc0740,1,0,c50fd594,c53a8efc,c57af070,c5108b8c,c0895faf,c500f654,0) at netbsd:pool_cache_get_paddr+0x1e9
uvm_mapent_alloc(c500f654,0,1,0,c509788a,c5099a50,0,c53a40d8,c5097084,c4de42b0) at netbsd:uvm_mapent_alloc+0x42
uvmspace_fork(c4de427c,1,0,c5099a50,c5099a50,c4ff0bfc,c5108c5c,c052f322,c4ff0bfc,c5099a50) at netbsd:uvmspace_fork+0x199
uvm_proc_fork(c4ff0bfc,c5099a50,0,28,0,0,c07c2205,c57a8d40,c5108c40,0) at netbsd:uvm_proc_fork+0x1f
fork1(c50f3aa0,0,14,0,0,0,0,c5108d1c,0,c5108d48) at netbsd:fork1+0x3d2
sys_fork(c50f3aa0,c5108cf4,c5108d1c,308,c067b1f1,c4de5924,c067a4c6,c4de427c,1b04000,c4de5984) at netbsd:sys_fork+0x50
syscall(c5108d48,bbab00b3,ab,bfbf001f,bbab001f,bbbad6c4,0,bfbfec28,bbbab598,bbbad6c4) at netbsd:syscall+0xa1
cpu0: End traceback...


http://releng.NetBSD.org/b5reports/i386/build/2011.11.03.20.46.41/test.log

    stress_killer: panic: kernel diagnostic assertion "!pmap_extract(pmap_kernel(), va, NULL)" failed: file "/bracket/i386/work/2011.11.03.20.46.41/src/sys/uvm/uvm_km.c", line 707
cpu0: Begin traceback...
kern_assert(c0b4ee74,c0b4eeb5,c0bf5b44,c0bf5840,2c3,c5035a0c,c5035a1c,c067684e,c4de589c,101002) at netbsd:kern_assert+0x23
uvm_km_alloc_poolpage_cache(c0cba040,1,c5035a3c,c08fbbda,c4e90440,140,c5035a5c,c0520a5b,c4e90590,c0ba2933) at netbsd:uvm_km_alloc_poolpage_cache+0x146
pool_grow(c4088974,c40946c0,c5035adc,c0795736,c4094734,c4e90440,1,c0795736,c4088df4,c4088978) at netbsd:pool_grow+0x2a
pool_get(c4088900,1,c5035b2c,c0796746,0,c4de1b40,1,c0796746,0,c4f9b528) at netbsd:pool_get+0x6d
pool_cache_get_slow(0,1,c5035b7c,c0797f8f,0,1,0,c0790163,c1019800,c) at netbsd:pool_cache_get_slow+0x19a
pool_cache_get_paddr(c4088900,1,0,c08a65e2,c4de1bcc,0,0,c4e90500,c,1) at netbsd:pool_cache_get_paddr+0x1e9
sigactsinit(c50d16d4,0,0,19,0,0,c07bcf25,c4efd800,c5035c40,0) at netbsd:sigactsinit+0x33
fork1(c4ee12c0,0,14,0,0,0,0,c5035d1c,0,c5035d48) at netbsd:fork1+0x36c
sys_fork(c4ee12c0,c5035cf4,c5035d1c,308,c0676061,c4de583c,c0675336,c4de41a8,15bb000,c4de589c) at netbsd:sys_fork+0x50
syscall(c5035d48,bbab00b3,ab,bfbf001f,bbab001f,bbbad6c4,0,bfbfec28,bbbab598,bbbad6c4) at netbsd:syscall+0xa1
cpu0: End traceback...


http://www.gson.org/netbsd/bugs/build/build/2011.10.11.15.13.08/test.log

    stress_killer: panic: kernel diagnostic assertion "(kmflags & (KM_SLEEP|KM_NOSLEEP)) != 0" failed: file "/bracket/i386/work/2011.10.11.15.13.08/src/sys/kern/subr_kmem.c", line 149
cpu0: Begin traceback...
kern_assert(c0b4a374,c0b4a3b5,c0bdab38,c0bdaaa8,95,c50af8ec,c08b360d,0,c4ea9098,c50af8dc) at netbsd:kern_assert+0x23
kmem_poolpage_alloc(1,0,c08b3187,0,c409d160,c50af90c,c0790cac,c4074c00,c54eb730,0) at netbsd:kmem_poolpage_alloc+0xbf
Bad frame pointer: 0xc0793c9a
cpu0: End traceback...


http://www.gson.org/netbsd/bugs/build/build/2011.11.29.03.50.32/test.log

stress_killer: panic: kernel diagnostic assertion "pcg->pcg_size == PCG_NOBJECTS_NORMAL" failed: file "/bracket/i386/work/2011.11.29.03.50.32/src/sys/kern/subr_pool.c", line 2316
cpu0: Begin traceback...
kern_assert(c0b5b05c,c0b5b20d,c0bee4f4,c0bedeac,90c,c40ba020,c0c89eb0,c053701a,c40ba020,0) at netbsd:kern_assert+0x23
pool_cache_invalidate_groups(c0cc8c7c,c0c9f700,c4c7bbcc,c05244a8,c0c9f700,0,c0bef917,0,c0c2f570,ffffffff) at netbsd:pool_cache_invalidate_groups+0xc5
pool_cache_invalidate(c0cc8b80,ffffffff,c4c7bbcc,0,b37,c0c9f700,c4c7bc1c,c07aaf69,c0c9f704,c0c9f700) at netbsd:pool_cache_invalidate+0x8c
pool_reclaim(c0cc8b80,0,c4c7bc5c,c08af4ec,c0d0c37c,c079ff0b,c0cc8b80,0,c0d0bfc0,1) at netbsd:pool_reclaim+0x78
pool_drain_end(c0cc8b80,b37,0,13000,0,9,f14278db,54d80ac0,0,bbbd6000) at netbsd:pool_drain_end+0x38
uvm_pageout(c40ba020,e66000,e6f000,0,c0100307,0,0,0,0,0) at netbsd:uvm_pageout+0x536
cpu0: End traceback...


http://www.gson.org/netbsd/bugs/build/build/2011.12.01.00.34.05/test.log

Reader / writer lock error: rw_vector_enter: locking against myself

lock address : 0x00000000c4cc40d8
current cpu  :			0
current lwp  : 0x00000000c5671d40
owner/count  : 0x00000000c5671d40 flags	   : 0x0000000000000004

panic: lock error
cpu0: Begin traceback...
printf_nolog(c0bedde9,c0bb368b,c0afcf85,c0bb1e87,c4cc40d8,0,c5671d40,0,c5671d40,c5671d40) at netbsd:printf_nolog
lockdebug_abort(c4cc40d8,c0c8ad30,c0afcf85,c0bb1e87,c068011e,c4cc40d8,c50097f8,c054a874,ffffffff,1a2e) at netbsd:lockdebug_abort+0x2b
rw_abort(ffffffff,1a2e,c5009788,0,0,c10d58ac,c110dbd0,0,c06804d0,1) at netbsd:rw_abort+0x29
rw_vector_enter(c4cc40d8,0,c500982c,0,c5671d40,c4cc57b4,c54b2370,c500984c,c067f74b,c4cc40d4) at netbsd:rw_vector_enter+0x32b
vm_map_lock_read(c4cc40d4,c0ffc048,c4cd2b04,c500989c,c08ab38b,c40af080,c4000,0,c08928b3,c4cc40d8) at netbsd:vm_map_lock_read+0x23
uvm_fault_internal(c4cc40d4,10000,1,0,c067f74b,0,c0b1e8d8,112a063,38,6) at netbsd:uvm_fault_internal+0x84
trap() at netbsd:trap+0x322
--- trap (number 6) ---
pool_cache_get_slow(0,1,c5671d40,c40a8180,c40a8000,c40a80fc,c5009b3c,c07a2086,0,c506b64c) at netbsd:pool_cache_get_slow+0x86
pool_cache_get_paddr(c0cc8b80,1,0,c5204528,c4fd136c,c5204528,c5009b8c,c089c98f,0,bfc00000) at netbsd:pool_cache_get_paddr+0x1e9
uvmspace_alloc(0,bfc00000,1001,c053cc24,c506d852,c5204528,0,c4fc8098,c506d04c,c4fc8098) at netbsd:uvmspace_alloc+0x23
uvmspace_fork(c4cc40d4,1,0,c5204528,c5204528,c4fd136c,c5009c5c,c0531a42,c4fd136c,c5204528) at netbsd:uvmspace_fork+0x29
uvm_proc_fork(c4fd136c,c5204528,0,19,0,0,c566f2a0,c5671d40,c5009c5c,c01004d4) at netbsd:uvm_proc_fork+0x1f
fork1(c5671d40,0,14,0,0,0,0,c5009d1c,0,c5009d48) at netbsd:fork1+0x3d2
sys_fork(c5671d40,c5009cf4,c5009d1c,308,c067f931,c4cc5998,c067ec06,c4cc40d4,1387000,c4cc57b4) at netbsd:sys_fork+0x50
syscall(c5009d48,bbab00b3,ab,bfbf001f,bbab001f,bbbad6c4,0,bfbfec28,bbbab598,bbbad6c4) at netbsd:syscall+0xa1
cpu0: End traceback...

>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/45677: The stress_killer test sometimes does
Date: Sat, 11 Feb 2012 01:13:28 +0200

 More similar crashes:

   http://releng.netbsd.org/b5reports/i386/build/2012.02.10.09.17.49/test.log
   http://releng.netbsd.org/b5reports/i386/build/2012.02.03.22.18.05/test.log
   http://www.gson.org/netbsd/bugs/build/build/2012.02.05.17.34.34/test.log

 A couple of these happened while running the stress_long test
 rather than the stress_killer one, but it looks like the same bug.
 As before, the panic message varies, but the pool_* functions
 are always involved.

 I managed to get a crash dump from the last crash in the list above.
 The last console messages from that one were:

       stress_long: panic: pool_get: pool '(null)': pr_itemsperpage is zero, pool not initialized?
   cpu0: Begin traceback...
   printf_nolog(c0c21b9c,0,c43b788c,c08d42a6,3f,c1044f64,c43b78dc,c08d2464,c0d46ae0,0) at netbsd:printf_nolog
   pool_get(c0d09480,2,c0c629a8,c06adf1b,0,c0cf3740,4,c06addc5,6,c43b78d8) at netbsd:pool_get+0x535
   pool_cache_put_slow(6,c0cfab70,c43b793c,c06ab3c8,4000000,0,0,1c84000,c10c0000,0) at netbsd:pool_cache_put_slow+0x1c2
   pool_cache_put_paddr(c0d00240,c11bc000,ffffffff,c07c84df,98,0,c43b7a6c,c0d1d1a0,c134adc4,c43b7a30) at netbsd:pool_cache_put_paddr+0x107
   kmem_intr_free(c11bc000,2000,c43b7bd4,c10c9b40,bff04254,16d5103,0,c153a000,c0cf3740,0) at netbsd:kmem_intr_free+0x6f
   ufs_readdir(c43b7a98,2000,1001,c43b7ab0,c12be01c,2000,c43b7abc,c07c845e,c12be008,c0b7c760) at netbsd:ufs_readdir+0x35f
   VOP_READDIR(c134adc4,c43b7bd4,c10c9b40,c43b7bf8,0,0,c43b7b2c,c08c1b03,c11e8004,bbb2d000) at netbsd:VOP_READDIR+0x44
   getcwd_common(c12be008,c11a66e0,c43b7c70,c1459400,200,1,c153a000,1,5,0) at netbsd:getcwd_common+0x3ad
   sys___getcwd(c153a000,c43b7cf4,c43b7d1c,0,c06ab5ae,c11e9754,c43b0010,c15e3400,c06ab5ae,c43b7cf8) at netbsd:sys___getcwd+0xaa
   syscall(c43b7d48,bbba00b3,ab,bfbf001f,bbba001f,bfbfebc0,bfbfe756,bfbfe728,bbbab598,bfbfe756) at netbsd:syscall+0xad
   cpu0: End traceback...

 In case anyone would like to poke around in the crash dump,
 it can be downloaded as part of a hard disk image from

   http://www.gson.org/netbsd/bugs/45677/wd0.img

 The image can then be booted in qemu:

   qemu -snapshot -nographic -hda wd0.img

 Then log in as root (no password) and type

   cd /var/crash
   gunzip netbsd*
   gdb netbsd.0
   target kvm netbsd.0.core
   where

 No debug symbols, unfortunately.
 -- 
 Andreas Gustafsson, gson@gson.org

State-Changed-From-To: open->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Sat, 03 Nov 2012 14:54:54 +0000
State-Changed-Why:
No panics in babylon5 i386 test logs since 2012.02.21.01.47.50, and
none in the gson.org testbed logs since 2012.03.12.21.35.10, so 
presumably fixed, maybe by src/sys/netinet/rfc6056.c 1.5.


>Unformatted:
