NetBSD Problem Report #58964
From www@netbsd.org Mon Jan 6 15:24:51 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits)
client-signature RSA-PSS (2048 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id D50321A9238
for <gnats-bugs@gnats.NetBSD.org>; Mon, 6 Jan 2025 15:24:51 +0000 (UTC)
Message-Id: <20250106152450.2FF641A923B@mollari.NetBSD.org>
Date: Mon, 6 Jan 2025 15:24:50 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: uvm: missing wakeup on uvmexp.free
X-Send-Pr-Version: www-1.0
>Number: 58964
>Category: kern
>Synopsis: uvm: missing wakeup on uvmexp.free
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: chs
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Jan 06 15:25:06 +0000 2025
>Last-Modified: Mon Jan 06 15:26:14 +0000 2025
>Originator: Taylor R Campbell
>Release: 10.1
>Organization:
The NetBageSleepDaemon Foundation
>Environment:
>Description:
I found my bulk build machine hanging today, with lots of processes waiting on flt_pmfail2, flt_pmfail1, flt_noram5, plpg, tstile, and various zfs condvars.
I entered ddb. A sampling of the output:
db{0}> ps
...
24652 24652 3  0   0 ffff80ace8f2c480 sh              tstile
10978 10978 3  1 180 ffff80abc3a61a00 sh              wait
14020 14020 3  0 180 ffff80aba5153540 sh              wait
 8120  8120 3  1   0 ffff80abb5c2b0c0 sh              flt_pmfail2
 3599  3599 3  0   0 ffff80ad78b6f540 sh              flt_pmfail1
13915 13915 3  0 180 ffff80abb4992980 sh              pipe_rd
 7909  7909 3 10   0 ffff80abb7e9e580 pkg_admin       flt_noram5
19573 19573 3 19   0 ffff80b168ecc8c0 sh              flt_noram5
...
    0 13053 3 22 200 ffff80b1a9ca3340 zio_write_issue zio_data_buf_983
    0 10086 3 23 200 ffff80ac379152c0 zio_write_issue plpg
    0 24427 3  8 200 ffff80af3710e040 zio_write_issue zio_data_buf_983
...
    0  1630 3 11 200 ffff80b0945d0a80 zio_write_issue zio_data_buf_983
    0 16826 3 19 200 ffff80b15d39d740 zio_write_issue plpg
    0 17874 3  2 200 ffff80ab83eb81c0 zio_write_issue zio_data_buf_983
...
    0   891 3  2 240 ffff80abb8d30bc0 ioflush         tstile
    0   890 3 23 200 ffff80abb8d30780 pgdaemon        &tx->tx_quiesce_
...
db{0}> show all tstiles
  PID   LID COMMAND WAITING-FOR     WAIT-CHANNEL
23365 23365 sync              0 ffff80a95d5dbf80
24652 24652 sh                0 ffff80a95d5dbf80
    0   891 system            0 ffff80aed792bd40
I then woke everything waiting on uvmexp.free and the system began to make progress again:
db{0}> call wakeup(uvmexp+0x10)
7
db{0}> continue
After that, nothing was stuck on flt_pmfail2, flt_pmfail1, flt_noram5, plpg, or tstile any more, according to `ps -Alww | grep ...'.
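The symptom above (sleepers that resume as soon as someone calls wakeup on the free-page counter) is consistent with a lost or missing wakeup. The following is only a toy, single-threaded model of that failure mode in C, included for illustration. It is not NetBSD code: `toy_pool`, `toy_alloc`, `toy_free`, `toy_wakeup`, and `toy_sleepers` are hypothetical names, and the `do_wakeup` flag is an assumption used to model a free path that forgets to wake its waiters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names come from the NetBSD sources. */
#define MAX_WAITERS 8

struct toy_pool {
	int free;		/* stands in for a free-page counter */
	int want[MAX_WAITERS];	/* pages each sleeper needs; 0 = not sleeping */
};

/*
 * Try to take npages.  On failure, record the caller as sleeping on the
 * counter and return false; the caller only retries when it is woken.
 */
static bool
toy_alloc(struct toy_pool *p, int slot, int npages)
{
	assert(slot >= 0 && slot < MAX_WAITERS);
	if (p->free >= npages) {
		p->free -= npages;
		p->want[slot] = 0;
		return true;
	}
	p->want[slot] = npages;
	return false;
}

/* Wake every sleeper so it retries; return how many got unstuck. */
static int
toy_wakeup(struct toy_pool *p)
{
	int woken = 0;

	for (int i = 0; i < MAX_WAITERS; i++) {
		int n = p->want[i];

		if (n != 0 && toy_alloc(p, i, n))
			woken++;
	}
	return woken;
}

/*
 * Return pages to the pool.  do_wakeup == false models the suspected bug:
 * the counter goes up, but nobody tells the sleepers.
 */
static int
toy_free(struct toy_pool *p, int npages, bool do_wakeup)
{
	p->free += npages;
	return do_wakeup ? toy_wakeup(p) : 0;
}

/* Count the waiters that are still asleep. */
static int
toy_sleepers(const struct toy_pool *p)
{
	int n = 0;

	for (int i = 0; i < MAX_WAITERS; i++)
		if (p->want[i] != 0)
			n++;
	return n;
}
```

In this model, two callers that fail `toy_alloc` stay asleep after `toy_free(&p, 4, false)` even though enough pages are free, and a later manual `toy_wakeup(&p)` unblocks both. That mirrors the manual wakeup issued from ddb above, but says nothing about where the real wakeup is being missed.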
>How-To-Repeat:
no idea
>Fix:
Yes, please!
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->chs
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Mon, 06 Jan 2025 15:26:14 +0000
Responsible-Changed-Why:
Can I trouble you to take a look?
>Unformatted: