NetBSD Problem Report #52858
From martin@duskware.de Sun Dec 24 20:26:42 2017
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id BC3527A188
for <gnats-bugs@gnats.NetBSD.org>; Sun, 24 Dec 2017 20:26:42 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: kernel lock up
X-Send-Pr-Version: 3.95
>Number: 52858
>Category: kern
>Synopsis: kernel lock up
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: riastradh
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Dec 24 20:30:00 +0000 2017
>Closed-Date: Sun Jul 26 16:18:37 +0000 2020
>Last-Modified: Sun Jul 26 16:18:37 +0000 2020
>Originator: Martin Husemann
>Release: NetBSD 8.99.9
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD thirdstage.duskware.de 8.99.9 NetBSD 8.99.9 (MODULAR) #40: Sat Dec 23 21:52:55 CET 2017 martin@thirdstage.duskware.de:/usr/src/sys/arch/sparc64/compile/MODULAR sparc64
Architecture: sparc64
Machine: sparc64
>Description:
My machine "randomly" locked up (this happened only once, and I have no idea how to reproduce it).
The ddb backtrace shows:
intr_list_handler(10473bc10, 7, e0047620, 8000000000000000, 1042d60, 10473bb30) at netbsd:intr_list_handler+0x10
sparc_interrupt(1, 7, e00476d0, 8000000000000000, 6, e0048000) at netbsd:sparc_interrupt+0x294
sparc_interrupt(103b915b0, 70000000001, ff070000000001, 18b8400, 6, e0048000) at netbsd:sparc_interrupt+0x294
pool_grow(103b915b0, 2, 18d8800, 18b8400, 0, 2000) at netbsd:pool_grow+0x508
pool_catchup(103b91500, 103b915b1, 18e1800, 18e0c00, 8e7, 105823a20) at netbsd:pool_catchup+0x20
pool_get(18e0d00, 2, 105b1f000, 103b915b0, 105120780, 103b91500) at netbsd:pool_get+0x550
pool_cache_get_slow(103b91740, 7, e0047bb8, 104a41bc0, 2, 103b91500) at netbsd:pool_cache_get_slow+0x1b8
pool_cache_get_paddr(103b91500, 2, 104a41bc0, 1858730, 7, 103b91740) at netbsd:pool_cache_get_paddr+0x298
bge_newbuf_std(1046a2000, 105, 104a41b20, 104a2bed8, 1ce9000, 1046a2828) at netbsd:bge_newbuf_std+0x190
bge_intr(1046a2000, 6, 60000, 1cf34c000, 600e, 6) at netbsd:bge_intr+0xbfc
intr_biglock_wrapper(103b4e548, 0, e0047ed0, 18e0c00, 1042dc0, 103b91500) at netbsd:intr_biglock_wrapper+0x10
sparc_interrupt(1c9e098, 105823a20, ff070000000001, 18d2800, 0, 103ae8280) at netbsd:sparc_interrupt+0x294
>How-To-Repeat:
n/a
>Fix:
n/a
>Release-Note:
>Audit-Trail:
From: coypu@sdf.org
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/52858: kernel lock up
Date: Mon, 25 Dec 2017 02:45:10 +0000
It looks like we can spin forever in pool_catchup if a PR_WAITOK
allocation is sleeping and a PR_NOWAIT allocation follows it.
Scenario: single CPU, architecture without kpreemption:
[lwp #1]
   |
 [ ?? ]
   |
[pool_grow with PR_WAITOK
[set PR_GROWING
[allocation, decide to sleep
   |
 zzZzzZ                          [lwp #2]
                                    |
                                  [ ?? ]
                                    |
                                 [pool_catchup with PR_NOWAIT
                                 [see PR_GROWING already set,
                                 [spin forever returning ERESTART
                                 [(nothing ever preempts me or
                                  increases the pool items)
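
The scenario above can be modelled in userland with a small C sketch.
This is a toy model of the hypothesis only, not the subr_pool.c code:
every name in it (toy_pool, toy_pool_grow, toy_pool_catchup,
toy_scenario, the TOY_* constants) is made up for illustration, and the
retry loop is capped at max_spins so that the demonstration terminates,
whereas the loop described above would not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's return codes and flags. */
enum { TOY_OK = 0, TOY_ERESTART = 1 };
enum { TOY_WAITOK = 1, TOY_NOWAIT = 2 };

struct toy_pool {
	bool growing;       /* models PR_GROWING */
	unsigned nitems;    /* items currently available */
	unsigned minitems;  /* level the catch-up loop tries to reach */
};

/*
 * Assumed behaviour: if another lwp is already growing the pool, the
 * caller is told to retry. A WAITOK caller would sleep first; this toy
 * model only exercises the NOWAIT path, which cannot sleep.
 */
static int
toy_pool_grow(struct toy_pool *pp, int flags)
{
	(void)flags;
	if (pp->growing)
		return TOY_ERESTART;
	pp->growing = true;
	pp->nitems++;       /* pretend the backing allocation succeeded */
	pp->growing = false;
	return TOY_OK;
}

/*
 * Retry until the pool has enough items. Returns the number of
 * ERESTART retries, capped at max_spins.
 */
static unsigned
toy_pool_catchup(struct toy_pool *pp, unsigned max_spins)
{
	unsigned spins = 0;

	while (pp->nitems < pp->minitems && spins < max_spins) {
		if (toy_pool_grow(pp, TOY_NOWAIT) == TOY_ERESTART)
			spins++;
	}
	return spins;
}

/*
 * lwp #1 has set the growing flag and gone to sleep (or not); lwp #2
 * then enters the catch-up loop on the same CPU without preemption.
 */
static unsigned
toy_scenario(bool lwp1_asleep_in_grow)
{
	struct toy_pool p = {
		.growing = lwp1_asleep_in_grow,
		.nitems = 0,
		.minitems = 2,
	};

	return toy_pool_catchup(&p, 1000);
}
```

With the flag left set by the sleeping lwp, toy_scenario(true) uses up
all 1000 retries without adding a single item, because nothing in the
loop can clear the flag. toy_scenario(false) finishes with no retries.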
Responsible-Changed-From-To: kern-bug-people->riastradh
Responsible-Changed-By: maya@NetBSD.org
Responsible-Changed-When: Mon, 25 Dec 2017 02:57:20 +0000
Responsible-Changed-Why:
Ping, you might know about this. Also, note that christos added a change to get correct ERESTART behaviour following your commit.
From: Ryota Ozaki <ozaki-r@netbsd.org>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/52858: kernel lock up
Date: Tue, 26 Dec 2017 18:12:10 +0900
FYI: similar backtrace here (on amd64 though):
http://mail-index.netbsd.org/source-changes-d/2017/12/26/msg009751.html
ozaki-r
State-Changed-From-To: open->feedback
State-Changed-By: prlw1@NetBSD.org
State-Changed-When: Sun, 26 Jul 2020 15:28:17 +0000
State-Changed-Why:
The bug discussed in the link ozaki-r added above looks very similar and
was fixed by
http://cvsweb.netbsd.org/cgi-bin/cvsweb.cgi/src/sys/kern/subr_pool.c#rev1.220
Issue resolved?
State-Changed-From-To: feedback->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Sun, 26 Jul 2020 16:18:37 +0000
State-Changed-Why:
I never had a way to trigger it reliably and have not seen it
again, so I have no way to verify the fix.
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.