NetBSD Problem Report #38497

From martin@aprisoft.de  Thu Apr 24 15:37:10 2008
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id E696D63B898
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 24 Apr 2008 15:37:09 +0000 (UTC)
Message-Id: <20080424153706.22E2BAF5824@emmas.aprisoft.de>
Date: Thu, 24 Apr 2008 17:37:06 +0200 (CEST)
From: martin@duskware.de
Reply-To: martin@duskware.de
To: gnats-bugs@gnats.NetBSD.org
Subject: Out of memory allocating ksiginfo
X-Send-Pr-Version: 3.95

>Number:         38497
>Category:       kern
>Synopsis:       Out of memory allocating ksiginfo
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    yamt
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Apr 24 15:40:00 +0000 2008
>Last-Modified:  Fri May 02 13:45:01 +0000 2008
>Originator:     Martin Husemann
>Release:        NetBSD 4.99.61
>Organization:
The NetBSD Foundation
>Environment:
System: NetBSD nelly.aprisoft.de 4.99.61 NetBSD 4.99.61 (NELLY.MP) #14: Thu Apr 24 15:31:41 CEST 2008 martin@emmas.aprisoft.de:/nelly/usr/src/sys/arch/sparc64/compile/NELLY.MP sparc64
Architecture: sparc64
Machine: sparc64
>Description:

While stressing NFS a bit and running top in another window,
this happened:

Out of memory allocating ksiginfo for pid 211
Mutex error: lockdebug_wantlock: acquiring sleep lock from interrupt context

lock address : 0x000000000d047f80 type     :     sleep/adaptive
shared holds :                  0 exclusive:                  0
shares wanted:                  0 exclusive:                  0
current cpu  :                  0 last held:                  1
current lwp  : 0x000000000d04f180 last held: 000000000000000000
last locked  : 0x000000000117b388 unlocked : 0x0000000001140c50
initialized  : 0x00000000010ec704
owner field  : 000000000000000000 wait/spin:                0/0

Turnstile chain at 0x14744e0.
=> No active turnstile for this lock.

panic: LOCKDEBUG
Stopped in pid 0.5 (system) at  netbsd:cpu_Debugger+0x4:        nop
db{0}> bt
lockdebug_abort1(1487780, 1487780, 11cc650, 11e8308, 1, e0017330) at netbsd:lockdebug_abort1+0x7c
mutex_vector_enter(d047f80, 1140c04, 11cc400, d04f180, 146f880, 11e6000) at netbsd:mutex_vector_enter+0x264
ip_drain(1, 111accc, 11cc400, 146f800, d04f180, 1) at netbsd:ip_drain+0x8
m_reclaim(14073b8, 11c5688, d044eb0, 2, d04f180, 1410000) at netbsd:m_reclaim+0x50
pool_reclaim(d04b8c0, 11e88a8, 215, 11e90a0, d04f180, 3fff) at netbsd:pool_reclaim+0x14
pool_reclaim_callback(d04ba28, d04b8c0, 0, 11e7448, d04f180, 43b86e8) at netbsd:pool_reclaim_callback+0x30
callback_run_roundrobin(0, 0, 144c258, e00172f8, 0, ffffffffffffffff) at netbsd:callback_run_roundrobin+0xa0
uvm_map_prepare(144c240, 39c6000, 40000, 0, ffffffffffffffff, 40000) at netbsd:uvm_map_prepare+0x1a4
uvm_map(c, e0017438, 40000, 0, ffffffffffffffff, e0017330) at netbsd:uvm_map+0xb8
km_vacache_alloc(144c240, 2, 144c448, d04f180, 144c448, 11e6000) at netbsd:km_vacache_alloc+0x4c
pool_grow(144c398, 2, 13f2, 11dcfd0, 0, 11d2800) at netbsd:pool_grow+0x24
pool_get(144c398, 2, 11cc400, 11e8000, d04f180, 1410000) at netbsd:pool_get+0x144
uvm_km_alloc_poolpage_cache(0, 0, d04b970, d04f180, d04b970, 3fff) at netbsd:uvm_km_alloc_poolpage_cache+0x30
pool_grow(d04b8c0, 0, 11cc400, 11e8000, d04f180, a) at netbsd:pool_grow+0x24
pool_get(d04b8c0, 0, d04ba58, d04f180, 1033d60, d738000) at netbsd:pool_get+0x144
pool_cache_get_slow(0, e001798c, e0017980, 0, 0, 0) at netbsd:pool_cache_get_slow+0x204
pool_cache_get_paddr(d04b8c0, 0, 0, 11e8000, d04f180, 1) at netbsd:pool_cache_get_paddr+0x154
m_get(0, 1, 144c8f8, d04f180, 11d2800, 11e6000) at netbsd:m_get+0x1c
m_gethdr(1, 1, 50, 1410190, 0, 11d2800) at netbsd:m_gethdr+0x8
hme_get(39f0800, 2f, 9a6041, 11e8000, d04f180, 1410000) at netbsd:hme_get+0x8
hme_read(39f0800, 2f, 9a6041, 1ffe8c06244, 8100, 3fff) at netbsd:hme_read+0x58
hme_rint(39f0800, 0, 0, 2, d04f180, a) at netbsd:hme_rint+0xe4
hme_intr(39f0800, 0, e0017ed0, d04f180, 1033d60, d738000) at netbsd:hme_intr+0x54
sparc_interrupt(14877c0, 1107790, 11cc400, 11e8000, d04f180, 0) at netbsd:sparc_interrupt+0x23c
mutex_vector_exit(14711e0, d71bd08, 14711e0, 11e8000, d04f180, 1) at netbsd:mutex_vector_exit+0xf4
timer_intr(0, d047f40, 144c8f8, d04f180, 11d2800, 11e6000) at netbsd:timer_intr+0x130
softint_thread(d02e0c0, d04f180, 11e5400, 11e5000, 11e5000, 11d2800) at netbsd:softint_thread+0xd0
lwp_trampoline(f005eaf0, fffb1cf8, 110000, 10ee98, fffb1df8, 1) at netbsd:lwp_trampoline+0x8
db{0}> mach cpu 1
db{1}> bt
VOP_LOCK(f35a9a0, 10002, fffffffffffffff8, 0, 40e20cea, 0) at netbsd:VOP_LOCK+0x28
vn_lock(f35a9a0, 20002, 0, 17, 20b000, 109400) at netbsd:vn_lock+0xb4
vn_write(eef9940, eef9940, f333bf0, d048b40, 1, 40411e58) at netbsd:vn_write+0x94
dofilewrite(16, eef9940, 20bc30, 2ff, 1, 1) at netbsd:dofilewrite+0x60
sys_write(1, f333dc0, f333e00, ffffffffbf3b765b, 0, 0) at netbsd:sys_write+0x60
syscall_plain(f333ed0, 3, 40b3c2cc, 19, 40b3c2cc, 800) at netbsd:syscall_plain+0x120
?(1, 20bc30, 2ff, 0, ffffffffffffb590, 7) at 0x10092fc
db{1}> mach cpu 0
db{0}> ps /l
 PID         LID S     FLAGS       STRUCT LWP *               NAME WAIT
 570           1 3        84           110c18c0                 as piperd
 387           1 3         4            fbc8aa0                cc1 netio
 460           1 3        84            fbc9c20                 cc wait
 405           1 3        84            fbc98a0                 sh wait
 566           1 3        84           110c1c40                 as piperd
 497           1 2         4            fbc8020                cc1
 562           1 3         4            fbc83a0                 as uvn_fp2
 397           1 3        84            fbc8e20                 as piperd
 526           1 3        84            fbc91a0                 cc wait
 496           1 2         4            f364000                cc1
 371           1 3        84            f364a80                 cc wait
 436           1 3        84            fbc9520                 cc wait
 494           1 3        84            f365180                 sh wait
 391           1 3        84            f1783c0                 sh wait
 226           1 3        84            f364700                 sh wait
 427           1 3        84            f364e00               make select
 297           1 7  20000004            f178740                top
 392           1 3        84            f365500               qvwm select
 324           1 2         4            f365880             xclock
 97            1 3        84            f365c00               tcsh pause
 96            1 3        84            f178040               tcsh pause
 360           1 3        84            f1798c0                ssh select
 351           1 3        84            e3da020                 sh wait
 346           1 3        84            f179c40               qvwm select
 348           1 3        84            f178ac0               rxvt select
 342           1 3        84            f178e40               rxvt select
 335           1 2         4            f1791c0              xload
 338           1 3        84            f179540                 sh wait
 313           1 3        84            e3da3a0            XFree86 select
 318           1 3        84            e3db8a0              xinit wait
 300           1 3        84            e3db1a0          ssh-agent select
 293           1 3        84            e3da720                 sh wait
 292           1 3        84            d05a020              getty tty
 280           1 3        84            e3daaa0              getty tty
 283           1 3        84            e3dae20              getty tty
 285           1 3        84            d05e040              login wait
 276           1 3        84            e3db520               cron nanoslp
 273           1 3        84            e3dbc20              inetd kqueue
 230           1 3        84            e06fc00               sshd select
 237           1 3         4            e06ea80             upsmon vm_map
 231           1 3        84            e06f500             upsmon piperd
 245           1 3         4            e06f180               upsd vm_map
 227           1 3        84            e06e000           apcsmart select
 211           1 3        84            e06e380               ntpd pause
 202           1 3         4            e06e700              mserv vm_map
 87            1 3         4            e06f880            syslogd vm_map
 1             1 3        84            d05ee40               init wait
>0            34 3       204            e06ee00            swapiod swapiod
              33 3       204            d05e3c0        vmem_rehash vmem_rehash
              32 3       204            d05e740           aiodoned aiodoned
              31 3       204            d05f540            ioflush syncer
              30 3       204            d05eac0           pgdaemon pgdaemon
              29 3       204            d05fc40              nfsio nfsiod
              28 3       204            d05f8c0              nfsio nfsiod
              27 3       204            d05f1c0              nfsio nfsiod
              26 3       204            d05a3a0              nfsio vm_map
              17 3       204            d05a720           scsibus0 sccomp
              16 3       204            d05aaa0            xcall/1 xcall
              15 1       204            d05ae20          softser/1
              14 1       204            d05b1a0          softclk/1
              13 1       204            d05b520          softbio/1
              12 1       204            d05b8a0          softnet/1
              11 1       205            d05bc20             idle/1
              10 3       204            d04e000           pmfevent pmfevent
               9 3       204            d04e380            cachegc cachegc
               8 3       204            d04e700              vrele vrele
               7 3       204            d04ea80            xcall/0 xcall
               6 1       204            d04ee00          softser/0
           >   5 7  20000204            d04f180          softclk/0
               4 1       204            d04f500          softbio/0
               3 1       204            d04f880          softnet/0
               2 1       205            d04fc00             idle/0
               1 3       204            140f5a0            swapper schedule
db{0}>
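
Read bottom-up, the cpu 0 backtrace shows a receive interrupt from hme(4) preempting the softclk/0 soft interrupt thread. hme_read() asks m_gethdr() for an mbuf, the pool cache falls through to pool_grow(), the page allocation in turn needs kernel VA (km_vacache_alloc() -> uvm_map()), and uvm_map_prepare() appears to run the registered reclaim callbacks. That reaches pool_reclaim() -> m_reclaim() -> ip_drain(), which then tries to take an adaptive (sleep) mutex, presumably softnet_lock given the workaround discussed below, while still in interrupt context, and LOCKDEBUG panics. The userland C sketch below models only that rule. Every identifier in it (toy_wantlock, toy_pool_get, toy_nic_intr, intr_depth, ...) is hypothetical; it is not NetBSD's lockdebug or pool code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy model of the LOCKDEBUG rule; not NetBSD code. */
enum lock_kind { LOCK_SPIN, LOCK_ADAPTIVE };

struct toy_lock {
    const char *name;
    enum lock_kind kind;
};

/* > 0 while "running" in hardware interrupt context (like hme_intr). */
static int intr_depth;

/* Returns false (where the kernel would panic) if a sleep lock is wanted from interrupt context. */
static bool toy_wantlock(const struct toy_lock *lk)
{
    if (lk->kind == LOCK_ADAPTIVE && intr_depth > 0) {
        fprintf(stderr, "toy lockdebug: acquiring sleep lock %s from interrupt context\n",
                lk->name);
        return false;
    }
    return true;
}

static struct toy_lock toy_softnet_lock = { "softnet_lock", LOCK_ADAPTIVE };

/* Stand-in for a drain hook that, like ip_drain before the fix, takes softnet_lock. */
static bool toy_ip_drain(void)
{
    return toy_wantlock(&toy_softnet_lock);
}

/* Stand-in allocator: when the pool is empty it reclaims via the drain hook
 * (collapsing pool_grow -> ... -> pool_reclaim -> m_reclaim -> ip_drain). */
static bool toy_pool_get(bool pool_empty)
{
    if (pool_empty)
        return toy_ip_drain();
    return true;
}

/* Stand-in receive interrupt handler allocating an mbuf (hme_intr -> ... -> m_gethdr). */
static bool toy_nic_intr(bool pool_empty)
{
    bool ok;

    intr_depth++;
    ok = toy_pool_get(pool_empty);
    intr_depth--;
    assert(intr_depth >= 0);
    return ok;
}
```

With a well-stocked pool the interrupt path never reaches the drain hook; only the "pool empty" case, as in this report, hits the sleep lock from interrupt context. The same reclaim from thread context passes the check.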


>How-To-Repeat:

Possibly: run a parallel build over NFS while top is running in another window.

>Fix:
n/a

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->yamt
Responsible-Changed-By: yamt@NetBSD.org
Responsible-Changed-When: Fri, 25 Apr 2008 00:33:25 +0000
Responsible-Changed-Why:
mine.


From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/38497 (Out of memory allocating ksiginfo)
Date: Mon, 28 Apr 2008 16:02:07 +0100

 On Fri, Apr 25, 2008 at 12:33:28AM +0000, yamt@NetBSD.org wrote:

 > Synopsis: Out of memory allocating ksiginfo
 > 
 > Responsible-Changed-From-To: kern-bug-people->yamt
 > Responsible-Changed-By: yamt@NetBSD.org
 > Responsible-Changed-When: Fri, 25 Apr 2008 00:33:25 +0000
 > Responsible-Changed-Why:
 > mine.

 I have a workaround for the panic: don't take softnet_lock in protocol drain
 routines. Should I commit it or are you working on this?

 Thanks,
 Andrew
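
A toy model of the workaround: the drain hook simply stops acquiring softnet_lock, so reaching it from interrupt context no longer trips the sleep-lock check. The trade-off is that the hook no longer serializes against softnet processing, which is what the next reply asks about. All names below (drain_locked, drain_unlocked, run_in_intr, reass_queue_len, ...) are hypothetical; reass_queue_len merely stands in for the cached data a drain routine such as ip_drain() frees, and this is not the committed code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; not NetBSD code. */
enum lock_kind { LOCK_SPIN, LOCK_ADAPTIVE };
struct toy_lock { enum lock_kind kind; int held; };

static int intr_depth;          /* > 0 while in hardware interrupt context */
static int lock_violations;     /* sleep-lock acquisitions from interrupt context */
static struct toy_lock softnet_lock = { LOCK_ADAPTIVE, 0 };
static int reass_queue_len = 3; /* stand-in for cached data the drain hook frees */

static void toy_lock_enter(struct toy_lock *lk)
{
    if (lk->kind == LOCK_ADAPTIVE && intr_depth > 0)
        lock_violations++;      /* LOCKDEBUG would panic at this point */
    lk->held = 1;
}

static void toy_lock_exit(struct toy_lock *lk)
{
    assert(lk->held);
    lk->held = 0;
}

/* Before: the drain hook serializes against softnet by taking softnet_lock. */
static void drain_locked(void)
{
    toy_lock_enter(&softnet_lock);
    reass_queue_len = 0;
    toy_lock_exit(&softnet_lock);
}

/* Workaround: the drain hook frees cached data without touching softnet_lock. */
static void drain_unlocked(void)
{
    reass_queue_len = 0;
}

/* Run a drain hook as if called from a receive interrupt. */
static void run_in_intr(void (*drain)(void))
{
    intr_depth++;
    drain();
    intr_depth--;
}
```

In this model drain_unlocked() is safe from interrupt context only because nothing else touches reass_queue_len concurrently; in the kernel that assumption is exactly what needs checking.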

From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: yamt@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        martin@duskware.de
Subject: Re: kern/38497 (Out of memory allocating ksiginfo)
Date: Tue, 29 Apr 2008 00:22:56 +0900 (JST)

 >  I have a workaround for the panic: don't take softnet_lock in protocol drain
 >  routines. Should I commit it or are you working on this?
 >  
 >  Thanks,
 >  Andrew

 I have not done anything yet.  Please feel free to commit any
 workaround or fix.

 Is the workaround safe?
 E.g., what happens if a softnet handler blocks and another LWP calls
 its drain routine?

 YAMAMOTO Takashi

From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/38497 CVS commit: src/sys/netinet
Date: Fri,  2 May 2008 13:40:33 +0000 (UTC)

 Module Name:	src
 Committed By:	ad
 Date:		Fri May  2 13:40:33 UTC 2008

 Modified Files:
 	src/sys/netinet: if_arp.c ip_input.c tcp_subr.c

 Log Message:
 PR kern/38497 Out of memory allocating ksiginfo

 Work around: don't acquire softnet_lock in protocol drain routines.


 To generate a diff of this commit:
 cvs rdiff -r1.135 -r1.136 src/sys/netinet/if_arp.c
 cvs rdiff -r1.269 -r1.270 src/sys/netinet/ip_input.c
 cvs rdiff -r1.230 -r1.231 src/sys/netinet/tcp_subr.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:

Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.