NetBSD Problem Report #55491
From martin@aprisoft.de Wed Jul 15 04:32:15 2020
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id ADF901A9213
for <gnats-bugs@gnats.NetBSD.org>; Wed, 15 Jul 2020 04:32:15 +0000 (UTC)
Message-Id: <20200715043205.C75C75CC80A@emmas.aprisoft.de>
Date: Wed, 15 Jul 2020 06:32:05 +0200 (CEST)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: pgdaemon endless loop
X-Send-Pr-Version: 3.95
>Number: 55491
>Category: port-sh3
>Synopsis: pgdaemon endless loop
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: port-sh3-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Jul 15 04:35:00 +0000 2020
>Last-Modified: Fri Feb 19 18:35:01 +0000 2021
>Originator: Martin Husemann
>Release: NetBSD 9.99.69
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD last-of-the-heroes.aprisoft.de 9.99.69 NetBSD 9.99.69 (GENERIC) #63: Tue Jul 14 11:32:20 CEST 2020 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/landisk/compile/GENERIC landisk
Architecture: sh3el
Machine: landisk
>Description:
Running full ATF tests on this machine hangs:
atf/atf-c/check_test (818/874): 10 test cases
build_c_o: [3.081994s] Passed.
build_cpp: [1.743369s] Passed.
build_cxx_o:
db> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
25802 25802 3 0 40000 8ccb96c0 cc1plus flt_noram5
24275 24275 3 0 80 8ce750c0 c++ wait
21692 21692 3 0 80 8e805d00 check_test wait
24928 24928 3 0 80 8d000700 check_test wait
[...]
0 26773 5 0 200 8cd000c0 (zombie)
0 7119 3 0 200 8d793640 fss0 fssbs
0 124 3 0 200 8fe529c0 physiod physiod
0 106 3 0 200 8fe21400 pooldrain pooldrain
0 105 3 0 200 8fe21100 ioflush syncer
0 > 104 7 0 200 8fe52cc0 pgdaemon
0 100 3 0 200 8fee9a00 usb2 usbevt
0 99 3 0 200 8fee9700 usb1 usbevt
0 98 3 0 200 8fe526c0 usb0 usbevt
0 97 3 0 200 8fe523c0 npfgc0 npfgcw
[...]
db> show uvmexp
Current UVM status:
pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12, ncolors=1
14958 VM pages: 5770 active, 2786 inactive, 2213 wired, 11 free
pages 2460 anon, 4723 file, 3585 exec
freemin=74, free-target=98, wired-max=4986
resv-pg=1, resv-kernel=5
bootpages=239, poolpages=3805
faults=60765971, traps=158990659, intrs=9052432, ctxswitch=31345702
softint=2311240, syscalls=158990408
fault counts:
noram=409, noanon=0, pgwait=15, pgrele=0
ok relocks(total)=372218(372541), anget(retrys)=16874866(20771), amapcopy=11516323
neighbor anon/obj pg=12090708/117784454, gets(lock/unlock)=28832568/351945
cases: anon=10829102, anoncow=6045719, obj=24998180, prcopy=3833877, przero=14753627
daemon and swap counts:
woke=1302, revs=1263, scans=1040082, obscans=329196, anscans=69546
busy=286, freed=398741, reactivate=124785, deactivate=1253914
pageouts=6698, pending=96001, nswget=37390
nswapdev=1, swpgavail=65531
swpages=65531, swpginuse=1903, swpgonly=1578, paging=0
db> c
Stopped in pid 0.104 (system) at netbsd:cpu_Debugger+0x2: rts
db> show uvmexp
Current UVM status:
pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12, ncolors=1
14958 VM pages: 5770 active, 2786 inactive, 2213 wired, 11 free
pages 2460 anon, 4723 file, 3585 exec
freemin=74, free-target=98, wired-max=4986
resv-pg=1, resv-kernel=5
bootpages=239, poolpages=3805
faults=60765971, traps=158990660, intrs=9054426, ctxswitch=31345702
softint=2311240, syscalls=158990408
fault counts:
noram=409, noanon=0, pgwait=15, pgrele=0
ok relocks(total)=372218(372541), anget(retrys)=16874866(20771), amapcopy=11516323
neighbor anon/obj pg=12090708/117784454, gets(lock/unlock)=28832568/351945
cases: anon=10829102, anoncow=6045719, obj=24998180, prcopy=3833877, przero=14753627
daemon and swap counts:
woke=1302, revs=1263, scans=1040082, obscans=329196, anscans=69546
busy=286, freed=398741, reactivate=124785, deactivate=1253914
pageouts=6698, pending=96001, nswget=37390
nswapdev=1, swpgavail=65531
swpages=65531, swpginuse=1903, swpgonly=1578, paging=0
>How-To-Repeat:
n/a
>Fix:
n/a
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->port-sh3-maintainer
Responsible-Changed-By: rin@NetBSD.org
Responsible-Changed-When: Fri, 02 Oct 2020 09:00:38 +0000
Responsible-Changed-Why:
sh3 specific
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-sh3/55491: pmap corruption
Date: Fri, 19 Feb 2021 19:31:27 +0100
When the endless loop in this PR happens, pmap_page_protect() has been
called and is in the "default:" (remove-all) case. It loops over
all entries in the page's pv list in
while ((pv = SLIST_FIRST(&pvh->pvh_head)) != NULL)
and tries to pmap_remove() each VA, but in the bug case there is a VA
in the list for which no mapping exists, so that entry is never taken
off the list and the loop never terminates.
I have no idea yet how this VA gets into the list (or what happened
to it/the mapping).
Martin
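The failure mode described in the message above can be illustrated with a small userland model. This is only a sketch under stated assumptions, not the sh3 pmap code: the model_* functions, the pv_mapped flag and the iteration cap are hypothetical names introduced for illustration, and the model assumes that a pv entry is unlinked only as a side effect of removing an existing mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical, simplified stand-ins for the real pmap structures. */
struct pv_entry {
	uintptr_t pv_va;	/* virtual address recorded in the list */
	bool pv_mapped;		/* model only: does a mapping exist for pv_va? */
	SLIST_ENTRY(pv_entry) pv_link;
};

struct pv_head {
	SLIST_HEAD(, pv_entry) pvh_head;
};

/* Add an entry to the head of the list (model helper). */
static void
model_pv_add(struct pv_head *pvh, uintptr_t va, bool mapped)
{
	struct pv_entry *pv;

	assert(pvh != NULL);
	pv = malloc(sizeof(*pv));
	if (pv == NULL)
		abort();
	pv->pv_va = va;
	pv->pv_mapped = mapped;
	SLIST_INSERT_HEAD(&pvh->pvh_head, pv, pv_link);
}

/*
 * Model of pmap_remove() for one VA. Assumption: the pv entry is
 * unlinked only when a mapping is actually torn down. Without a
 * mapping, the list is left untouched.
 */
static void
model_pmap_remove(struct pv_head *pvh, struct pv_entry *pv)
{
	assert(pvh != NULL && pv != NULL);
	if (!pv->pv_mapped)
		return;		/* stale entry: nothing is removed */
	SLIST_REMOVE(&pvh->pvh_head, pv, pv_entry, pv_link);
	free(pv);
}

/*
 * Model of the remove-all loop. The cap exists only so the model
 * terminates; the loop quoted in the PR has no such cap. Returns the
 * number of iterations, or -1 if the cap was reached.
 */
static int
model_remove_all(struct pv_head *pvh, int max_iter)
{
	struct pv_entry *pv;
	int n = 0;

	assert(pvh != NULL);
	while ((pv = SLIST_FIRST(&pvh->pvh_head)) != NULL) {
		if (n++ >= max_iter)
			return -1;	/* head never changed: would spin forever */
		model_pmap_remove(pvh, pv);
	}
	return n;
}
```

With two mapped entries, model_remove_all() empties the list in two iterations. Once an entry with pv_mapped set to false sits at the head, SLIST_FIRST() keeps returning it and the call returns -1 at the cap, which corresponds to the endless loop seen in pgdaemon.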
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.