NetBSD Problem Report #33278

From jld@panix.com  Tue Apr 18 02:51:26 2006
Return-Path: <jld@panix.com>
Received: from mail3.panix.com (mail3.panix.com [166.84.1.74])
	by narn.netbsd.org (Postfix) with ESMTP id 34E2463B8A0
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 18 Apr 2006 02:51:26 +0000 (UTC)
Message-Id: <200604180251.k3I2pPb23386@byzantium.nyc.access.net>
Date: Mon, 17 Apr 2006 22:51:25 -0400 (EDT)
From: jld@panix.com
Reply-To: jld@panix.com
To: gnats-bugs@netbsd.org
Subject: lockup with uvm_fault sleeping on "flt_pmfail2" and pagedaemon running but not helping
X-Send-Pr-Version: 3.95

>Number:         33278
>Category:       kern
>Synopsis:       A process repeatedly sleeps in uvm_fault on "flt_pmfail2"; pagedaemon is repeatedly woken up and does not help it; and nothing else can run.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Apr 18 02:55:00 +0000 2006
>Last-Modified:  Fri Apr 21 00:40:01 +0000 2006
>Originator:     Jed Davis
>Release:        NetBSD 3.0
>Organization:
PANIX Public Access Internet and UNIX, NYC
>Environment:
System: NetBSD mailproc1.panix.com 3.0 NetBSD 3.0 (PANIX-APPLIANCE) #0: Wed Mar 22 20:57:32 EST 2006  root@trinity.nyc.access.net:/devel/netbsd/3.0/src/sys/arch/i386/compile/PANIX-APPLIANCE i386
Architecture: i386
Machine: i386
>Description:

The host in question runs NetBSD 3.0/i386, is diskless, and has an NFS
swap file that is only rarely used.  Earlier today it locked up in an
interesting way: it answered ping but did nothing else.  Breaking into
the kernel debugger repeatedly, or setting a breakpoint on ltsleep(),
showed that it was alternating between two things.  One was a user
command that had taken a page fault and kept sleeping with wait message
"flt_pmfail2", which appears to happen only when the pmap_enter() call
that would resolve the fault fails.  The other was the pagedaemon, which
was being woken from the sleep at the top of the loop in uvm_pageout(),
doing something (it wasn't clear what), then waking the user process and
going back to sleep.  Clearly the pagedaemon wasn't fixing whatever the
faulting process's problem was, because the process kept failing the
pmap_enter() and going back to sleep.

And the weird thing is that it wasn't out of RAM: uvmexp reported
30054 pages free, and 14709 of 131072 swap pages in use.  The full
"show uvm" output was:

Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
  250200 VM pages: 118266 active, 72662 inactive, 1856 wired, 30054 free
  min  10% (25) anon, 5% (12) file, 5% (12) exec
  max  90% (230) anon, 10% (25) file, 30% (76) exec
  pages  171710 anon, 17940 file, 11042 exec
  freemin=64, free-target=85, inactive-target=63642, wired-max=83400
  faults=857377139, traps=1028644388, intrs=206793316, ctxswitch=594529707
  softint=197099867, syscalls=-1658109657, swapins=343, swapouts=363
  fault counts:
    noram=21795, noanon=0, pgwait=28, pgrele=0
    ok relocks(total)=4302(4326), anget(retrys)=210566639(3935), amapcopy=120469760
    neighbor anon/obj pg=170692371/1524327373, gets(lock/unlock)=444367593/391
    cases: anon=132483510, anoncow=69967129, obj=378986201, prcopy=65381150, przero=229724299
  daemon and swap counts:
    woke=30080168, revs=739, scans=1438018, obscans=137186, anscans=14713
    busy=0, freed=151899, reactivate=540466, deactivate=1706185
    pageouts=986, pending=0, nswget=3386
    nswapdev=1, nanon=365567, nanonneeded=365567 nfreeanon=192581
    swpages=131072, swpginuse=14709, swpgonly=11331 paging=0

>How-To-Repeat:

No obvious way to reproduce, and this hasn't been occurring often enough
to be a pain, yet.  In any case, since the host is diskless, I can't get
a core.  My hope is that I've gathered enough information that a problem
might be visible from inspection, for someone else at least.

>Fix:

Rebooting the box makes for a passable, if rather suboptimal, workaround.

r1.45 of uvm_bio.c was mentioned on a mailing list recently, and it 
looks like it might be related, but I don't know if it is.

>Audit-Trail:
From: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/33278: lockup with uvm_fault sleeping on "flt_pmfail2" and
 pagedaemon running but not helping
Date: Fri, 21 Apr 2006 09:38:26 +0900

 > And the weird thing is that it wasn't out of ram -- uvmexp reported
 > 30054 pages free, and 14709 of 131072 swap pages in use.  Full "show
 > uvm" output was:

 kmem_map starvation?

 > r1.45 of uvm_bio.c was mentioned on a mailing list recently, and it 
 > looks like it might be related, but I don't know if it is.

 i don't think it's related.

 YAMAMOTO Takashi
