NetBSD Problem Report #38497
From martin@aprisoft.de Thu Apr 24 15:37:10 2008
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id E696D63B898
for <gnats-bugs@gnats.NetBSD.org>; Thu, 24 Apr 2008 15:37:09 +0000 (UTC)
Message-Id: <20080424153706.22E2BAF5824@emmas.aprisoft.de>
Date: Thu, 24 Apr 2008 17:37:06 +0200 (CEST)
From: martin@duskware.de
Reply-To: martin@duskware.de
To: gnats-bugs@gnats.NetBSD.org
Subject: Out of memory allocating ksiginfo
X-Send-Pr-Version: 3.95
>Number: 38497
>Category: kern
>Synopsis: Out of memory allocating ksiginfo
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: yamt
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Apr 24 15:40:00 +0000 2008
>Last-Modified: Fri May 02 13:45:01 +0000 2008
>Originator: Martin Husemann
>Release: NetBSD 4.99.61
>Organization:
The NetBSD Foundation
>Environment:
System: NetBSD nelly.aprisoft.de 4.99.61 NetBSD 4.99.61 (NELLY.MP) #14: Thu Apr 24 15:31:41 CEST 2008 martin@emmas.aprisoft.de:/nelly/usr/src/sys/arch/sparc64/compile/NELLY.MP sparc64
Architecture: sparc64
Machine: sparc64
>Description:
While stressing NFS a bit and running top in another window,
this happened:
Out of memory allocating ksiginfo for pid 211
Mutex error: lockdebug_wantlock: acquiring sleep lock from interrupt context
lock address : 0x000000000d047f80 type : sleep/adaptive
shared holds : 0 exclusive: 0
shares wanted: 0 exclusive: 0
current cpu : 0 last held: 1
current lwp : 0x000000000d04f180 last held: 000000000000000000
last locked : 0x000000000117b388 unlocked : 0x0000000001140c50
initialized : 0x00000000010ec704
owner field : 000000000000000000 wait/spin: 0/0
Turnstile chain at 0x14744e0.
=> No active turnstile for this lock.
panic: LOCKDEBUG
Stopped in pid 0.5 (system) at netbsd:cpu_Debugger+0x4: nop
db{0}> bt
lockdebug_abort1(1487780, 1487780, 11cc650, 11e8308, 1, e0017330) at netbsd:lockdebug_abort1+0x7c
mutex_vector_enter(d047f80, 1140c04, 11cc400, d04f180, 146f880, 11e6000) at netbsd:mutex_vector_enter+0x264
ip_drain(1, 111accc, 11cc400, 146f800, d04f180, 1) at netbsd:ip_drain+0x8
m_reclaim(14073b8, 11c5688, d044eb0, 2, d04f180, 1410000) at netbsd:m_reclaim+0x50
pool_reclaim(d04b8c0, 11e88a8, 215, 11e90a0, d04f180, 3fff) at netbsd:pool_reclaim+0x14
pool_reclaim_callback(d04ba28, d04b8c0, 0, 11e7448, d04f180, 43b86e8) at netbsd:pool_reclaim_callback+0x30
callback_run_roundrobin(0, 0, 144c258, e00172f8, 0, ffffffffffffffff) at netbsd:callback_run_roundrobin+0xa0
uvm_map_prepare(144c240, 39c6000, 40000, 0, ffffffffffffffff, 40000) at netbsd:uvm_map_prepare+0x1a4
uvm_map(c, e0017438, 40000, 0, ffffffffffffffff, e0017330) at netbsd:uvm_map+0xb8
km_vacache_alloc(144c240, 2, 144c448, d04f180, 144c448, 11e6000) at netbsd:km_vacache_alloc+0x4c
pool_grow(144c398, 2, 13f2, 11dcfd0, 0, 11d2800) at netbsd:pool_grow+0x24
pool_get(144c398, 2, 11cc400, 11e8000, d04f180, 1410000) at netbsd:pool_get+0x144
uvm_km_alloc_poolpage_cache(0, 0, d04b970, d04f180, d04b970, 3fff) at netbsd:uvm_km_alloc_poolpage_cache+0x30
pool_grow(d04b8c0, 0, 11cc400, 11e8000, d04f180, a) at netbsd:pool_grow+0x24
pool_get(d04b8c0, 0, d04ba58, d04f180, 1033d60, d738000) at netbsd:pool_get+0x144
pool_cache_get_slow(0, e001798c, e0017980, 0, 0, 0) at netbsd:pool_cache_get_slow+0x204
pool_cache_get_paddr(d04b8c0, 0, 0, 11e8000, d04f180, 1) at netbsd:pool_cache_get_paddr+0x154
m_get(0, 1, 144c8f8, d04f180, 11d2800, 11e6000) at netbsd:m_get+0x1c
m_gethdr(1, 1, 50, 1410190, 0, 11d2800) at netbsd:m_gethdr+0x8
hme_get(39f0800, 2f, 9a6041, 11e8000, d04f180, 1410000) at netbsd:hme_get+0x8
hme_read(39f0800, 2f, 9a6041, 1ffe8c06244, 8100, 3fff) at netbsd:hme_read+0x58
hme_rint(39f0800, 0, 0, 2, d04f180, a) at netbsd:hme_rint+0xe4
hme_intr(39f0800, 0, e0017ed0, d04f180, 1033d60, d738000) at netbsd:hme_intr+0x54
sparc_interrupt(14877c0, 1107790, 11cc400, 11e8000, d04f180, 0) at netbsd:sparc_interrupt+0x23c
mutex_vector_exit(14711e0, d71bd08, 14711e0, 11e8000, d04f180, 1) at netbsd:mutex_vector_exit+0xf4
timer_intr(0, d047f40, 144c8f8, d04f180, 11d2800, 11e6000) at netbsd:timer_intr+0x130
softint_thread(d02e0c0, d04f180, 11e5400, 11e5000, 11e5000, 11d2800) at netbsd:softint_thread+0xd0
lwp_trampoline(f005eaf0, fffb1cf8, 110000, 10ee98, fffb1df8, 1) at netbsd:lwp_trampoline+0x8
db{0}> mach cpu 1
db{1}> bt
VOP_LOCK(f35a9a0, 10002, fffffffffffffff8, 0, 40e20cea, 0) at netbsd:VOP_LOCK+0x28
vn_lock(f35a9a0, 20002, 0, 17, 20b000, 109400) at netbsd:vn_lock+0xb4
vn_write(eef9940, eef9940, f333bf0, d048b40, 1, 40411e58) at netbsd:vn_write+0x94
dofilewrite(16, eef9940, 20bc30, 2ff, 1, 1) at netbsd:dofilewrite+0x60
sys_write(1, f333dc0, f333e00, ffffffffbf3b765b, 0, 0) at netbsd:sys_write+0x60
syscall_plain(f333ed0, 3, 40b3c2cc, 19, 40b3c2cc, 800) at netbsd:syscall_plain+0x120
?(1, 20bc30, 2ff, 0, ffffffffffffb590, 7) at 0x10092fc
db{1}> mach cpu 0
db{0}> ps /l
PID LID S FLAGS STRUCT LWP * NAME WAIT
570 1 3 84 110c18c0 as piperd
387 1 3 4 fbc8aa0 cc1 netio
460 1 3 84 fbc9c20 cc wait
405 1 3 84 fbc98a0 sh wait
566 1 3 84 110c1c40 as piperd
497 1 2 4 fbc8020 cc1
562 1 3 4 fbc83a0 as uvn_fp2
397 1 3 84 fbc8e20 as piperd
526 1 3 84 fbc91a0 cc wait
496 1 2 4 f364000 cc1
371 1 3 84 f364a80 cc wait
436 1 3 84 fbc9520 cc wait
494 1 3 84 f365180 sh wait
391 1 3 84 f1783c0 sh wait
226 1 3 84 f364700 sh wait
427 1 3 84 f364e00 make select
297 1 7 20000004 f178740 top
392 1 3 84 f365500 qvwm select
324 1 2 4 f365880 xclock
97 1 3 84 f365c00 tcsh pause
96 1 3 84 f178040 tcsh pause
360 1 3 84 f1798c0 ssh select
351 1 3 84 e3da020 sh wait
346 1 3 84 f179c40 qvwm select
348 1 3 84 f178ac0 rxvt select
342 1 3 84 f178e40 rxvt select
335 1 2 4 f1791c0 xload
338 1 3 84 f179540 sh wait
313 1 3 84 e3da3a0 XFree86 select
318 1 3 84 e3db8a0 xinit wait
300 1 3 84 e3db1a0 ssh-agent select
293 1 3 84 e3da720 sh wait
292 1 3 84 d05a020 getty tty
280 1 3 84 e3daaa0 getty tty
283 1 3 84 e3dae20 getty tty
285 1 3 84 d05e040 login wait
276 1 3 84 e3db520 cron nanoslp
273 1 3 84 e3dbc20 inetd kqueue
230 1 3 84 e06fc00 sshd select
237 1 3 4 e06ea80 upsmon vm_map
231 1 3 84 e06f500 upsmon piperd
245 1 3 4 e06f180 upsd vm_map
227 1 3 84 e06e000 apcsmart select
211 1 3 84 e06e380 ntpd pause
202 1 3 4 e06e700 mserv vm_map
87 1 3 4 e06f880 syslogd vm_map
1 1 3 84 d05ee40 init wait
>0 34 3 204 e06ee00 swapiod swapiod
33 3 204 d05e3c0 vmem_rehash vmem_rehash
32 3 204 d05e740 aiodoned aiodoned
31 3 204 d05f540 ioflush syncer
30 3 204 d05eac0 pgdaemon pgdaemon
29 3 204 d05fc40 nfsio nfsiod
28 3 204 d05f8c0 nfsio nfsiod
27 3 204 d05f1c0 nfsio nfsiod
26 3 204 d05a3a0 nfsio vm_map
17 3 204 d05a720 scsibus0 sccomp
16 3 204 d05aaa0 xcall/1 xcall
15 1 204 d05ae20 softser/1
14 1 204 d05b1a0 softclk/1
13 1 204 d05b520 softbio/1
12 1 204 d05b8a0 softnet/1
11 1 205 d05bc20 idle/1
10 3 204 d04e000 pmfevent pmfevent
9 3 204 d04e380 cachegc cachegc
8 3 204 d04e700 vrele vrele
7 3 204 d04ea80 xcall/0 xcall
6 1 204 d04ee00 softser/0
> 5 7 20000204 d04f180 softclk/0
4 1 204 d04f500 softbio/0
3 1 204 d04f880 softnet/0
2 1 205 d04fc00 idle/0
1 3 204 140f5a0 swapper schedule
db{0}>
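Reading the cpu 0 trace bottom-up: hme_intr runs in interrupt context, m_get finds the mbuf pool empty, the pool code tries to grow, the reclaim callbacks fire, and m_reclaim calls ip_drain, which then tries to take a sleep/adaptive mutex. The following is only a userland sketch of that shape, not kernel code. Every name in it (sleep_lock_enter, drain, alloc_buffer, rx_interrupt) is hypothetical and stands in for the corresponding step in the trace; the -1 return marks the point where a LOCKDEBUG kernel would panic instead.

```c
#include <stdbool.h>

/* Hypothetical model; none of these names are NetBSD kernel APIs. */
enum ctx { CTX_THREAD, CTX_INTERRUPT };

struct sleep_lock {
    bool held;
};

static enum ctx current_ctx = CTX_THREAD;
static struct sleep_lock net_lock;

/*
 * Take a lock that may sleep. Returns 0 on success, or -1 if the
 * caller is in interrupt context and therefore must not sleep.
 * A LOCKDEBUG kernel panics at this point instead of returning.
 */
static int sleep_lock_enter(struct sleep_lock *l)
{
    if (current_ctx == CTX_INTERRUPT)
        return -1;
    l->held = true;
    return 0;
}

static void sleep_lock_exit(struct sleep_lock *l)
{
    l->held = false;
}

/* Stand-in for a protocol drain routine that takes a sleep lock. */
static int drain(void)
{
    if (sleep_lock_enter(&net_lock) != 0)
        return -1;
    /* ... release cached protocol data here ... */
    sleep_lock_exit(&net_lock);
    return 0;
}

/* Stand-in for an allocator that reclaims memory when its pool is empty. */
int alloc_buffer(bool pool_empty)
{
    if (pool_empty)
        return drain();
    return 0;
}

/* Stand-in for a receive interrupt handler that allocates a buffer. */
int rx_interrupt(bool pool_empty)
{
    enum ctx saved = current_ctx;
    int rv;

    current_ctx = CTX_INTERRUPT;
    rv = alloc_buffer(pool_empty);
    current_ctx = saved;
    return rv;
}
```

In this model alloc_buffer(true) from thread context and rx_interrupt(false) both succeed; only rx_interrupt(true) fails, which matches the report: the problem shows up only when memory runs short while an interrupt handler is allocating.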
>How-To-Repeat:
Run a parallel build over NFS while top is running in another window?
>Fix:
n/a
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->yamt
Responsible-Changed-By: yamt@NetBSD.org
Responsible-Changed-When: Fri, 25 Apr 2008 00:33:25 +0000
Responsible-Changed-Why:
mine.
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/38497 (Out of memory allocating ksiginfo)
Date: Mon, 28 Apr 2008 16:02:07 +0100
On Fri, Apr 25, 2008 at 12:33:28AM +0000, yamt@NetBSD.org wrote:
> Synopsis: Out of memory allocating ksiginfo
>
> Responsible-Changed-From-To: kern-bug-people->yamt
> Responsible-Changed-By: yamt@NetBSD.org
> Responsible-Changed-When: Fri, 25 Apr 2008 00:33:25 +0000
> Responsible-Changed-Why:
> mine.
I have a workaround for the panic: don't take softnet_lock in protocol drain
routines. Should I commit it or are you working on this?
Thanks,
Andrew
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: yamt@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
martin@duskware.de
Subject: Re: kern/38497 (Out of memory allocating ksiginfo)
Date: Tue, 29 Apr 2008 00:22:56 +0900 (JST)
> I have a workaround for the panic: don't take softnet_lock in protocol drain
> routines. Should I commit it or are you working on this?
>
> Thanks,
> Andrew
I have not done anything yet. Please feel free to commit any
workaround or fix.
Is the workaround safe?
E.g., what happens if a softnet LWP blocks and another LWP calls
its drain routine?
YAMAMOTO Takashi
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/38497 CVS commit: src/sys/netinet
Date: Fri, 2 May 2008 13:40:33 +0000 (UTC)
Module Name: src
Committed By: ad
Date: Fri May 2 13:40:33 UTC 2008
Modified Files:
src/sys/netinet: if_arp.c ip_input.c tcp_subr.c
Log Message:
PR kern/38497 Out of memory allocating ksiginfo
Work around: don't acquire softnet_lock in protocol drain routines.
To generate a diff of this commit:
cvs rdiff -r1.135 -r1.136 src/sys/netinet/if_arp.c
cvs rdiff -r1.269 -r1.270 src/sys/netinet/ip_input.c
cvs rdiff -r1.230 -r1.231 src/sys/netinet/tcp_subr.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
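
The log message states the workaround in one line and the diffs are not reproduced here. For illustration only, one common way to let a drain routine run safely from a context that must not sleep is a try-lock: if the queue is busy, skip the drain and let the next reclaim pass try again. The sketch below is userland C11 under that assumption. The names frag_queue, fq_trylock, fq_unlock and fq_drain are hypothetical, and this is not claimed to be what the commit does.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a queue guarded by a non-blocking try-lock. */
struct frag_queue {
    atomic_flag busy;   /* set while someone is modifying the queue */
    size_t nfrags;      /* number of queued fragments */
};

/* Try to take the queue without blocking; true means we own it. */
static bool fq_trylock(struct frag_queue *q)
{
    /* test_and_set returns the previous value: false means it was free. */
    return !atomic_flag_test_and_set_explicit(&q->busy,
        memory_order_acquire);
}

static void fq_unlock(struct frag_queue *q)
{
    atomic_flag_clear_explicit(&q->busy, memory_order_release);
}

/*
 * Drain that never blocks. Returns the number of fragments freed,
 * or 0 if the queue was busy and the drain was skipped.
 */
size_t fq_drain(struct frag_queue *q)
{
    size_t freed;

    if (!fq_trylock(q))
        return 0;   /* busy: give up now, a later reclaim can retry */
    freed = q->nfrags;
    q->nfrags = 0;
    fq_unlock(q);
    return freed;
}
```

With a queue initialised as { ATOMIC_FLAG_INIT, 3 }, fq_drain frees all three fragments; if another context already holds the flag, fq_drain returns 0 and leaves the queue untouched. This also bears on the safety question raised earlier in the trail: with a try-lock, a drain that races with a blocked owner backs off instead of touching data the owner is in the middle of changing.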
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.