NetBSD Problem Report #55491

From martin@aprisoft.de  Wed Jul 15 04:32:15 2020
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id ADF901A9213
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 15 Jul 2020 04:32:15 +0000 (UTC)
Message-Id: <20200715043205.C75C75CC80A@emmas.aprisoft.de>
Date: Wed, 15 Jul 2020 06:32:05 +0200 (CEST)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: pgdaemon endless loop
X-Send-Pr-Version: 3.95

>Number:         55491
>Category:       port-sh3
>Synopsis:       pgdaemon endless loop
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-sh3-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 15 04:35:00 +0000 2020
>Last-Modified:  Fri Feb 19 18:35:01 +0000 2021
>Originator:     Martin Husemann
>Release:        NetBSD 9.99.69
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD last-of-the-heroes.aprisoft.de 9.99.69 NetBSD 9.99.69 (GENERIC) #63: Tue Jul 14 11:32:20 CEST 2020 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/landisk/compile/GENERIC landisk
Architecture: sh3el
Machine: landisk
>Description:

Running full ATF tests on this machine hangs:

atf/atf-c/check_test (818/874): 10 test cases
    build_c_o: [3.081994s] Passed.
    build_cpp: [1.743369s] Passed.
    build_cxx_o: 

db> ps
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
25802 25802 3   0     40000           8ccb96c0            cc1plus flt_noram5
24275 24275 3   0        80           8ce750c0                c++ wait
21692 21692 3   0        80           8e805d00         check_test wait
24928 24928 3   0        80           8d000700         check_test wait
[...]
0     26773 5   0       200           8cd000c0           (zombie)
0     7119 3   0       200           8d793640               fss0 fssbs
0      124 3   0       200           8fe529c0            physiod physiod
0      106 3   0       200           8fe21400          pooldrain pooldrain
0      105 3   0       200           8fe21100            ioflush syncer
0    > 104 7   0       200           8fe52cc0           pgdaemon
0      100 3   0       200           8fee9a00               usb2 usbevt
0       99 3   0       200           8fee9700               usb1 usbevt
0       98 3   0       200           8fe526c0               usb0 usbevt
0       97 3   0       200           8fe523c0             npfgc0 npfgcw
[...]
db> show uvmexp
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12, ncolors=1
  14958 VM pages: 5770 active, 2786 inactive, 2213 wired, 11 free
  pages  2460 anon, 4723 file, 3585 exec
  freemin=74, free-target=98, wired-max=4986
  resv-pg=1, resv-kernel=5
  bootpages=239, poolpages=3805
  faults=60765971, traps=158990659, intrs=9052432, ctxswitch=31345702
   softint=2311240, syscalls=158990408
  fault counts:
    noram=409, noanon=0, pgwait=15, pgrele=0
    ok relocks(total)=372218(372541), anget(retrys)=16874866(20771), amapcopy=11516323
    neighbor anon/obj pg=12090708/117784454, gets(lock/unlock)=28832568/351945
    cases: anon=10829102, anoncow=6045719, obj=24998180, prcopy=3833877, przero=14753627
  daemon and swap counts:
    woke=1302, revs=1263, scans=1040082, obscans=329196, anscans=69546
    busy=286, freed=398741, reactivate=124785, deactivate=1253914
    pageouts=6698, pending=96001, nswget=37390
    nswapdev=1, swpgavail=65531
    swpages=65531, swpginuse=1903, swpgonly=1578, paging=0
db> c
Stopped in pid 0.104 (system) at        netbsd:cpu_Debugger+0x2:        rts
db> show uvmexp
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12, ncolors=1
  14958 VM pages: 5770 active, 2786 inactive, 2213 wired, 11 free
  pages  2460 anon, 4723 file, 3585 exec
  freemin=74, free-target=98, wired-max=4986
  resv-pg=1, resv-kernel=5
  bootpages=239, poolpages=3805
  faults=60765971, traps=158990660, intrs=9054426, ctxswitch=31345702
   softint=2311240, syscalls=158990408
  fault counts:
    noram=409, noanon=0, pgwait=15, pgrele=0
    ok relocks(total)=372218(372541), anget(retrys)=16874866(20771), amapcopy=11516323
    neighbor anon/obj pg=12090708/117784454, gets(lock/unlock)=28832568/351945
    cases: anon=10829102, anoncow=6045719, obj=24998180, prcopy=3833877, przero=14753627
  daemon and swap counts:
    woke=1302, revs=1263, scans=1040082, obscans=329196, anscans=69546
    busy=286, freed=398741, reactivate=124785, deactivate=1253914
    pageouts=6698, pending=96001, nswget=37390
    nswapdev=1, swpgavail=65531
    swpages=65531, swpginuse=1903, swpgonly=1578, paging=0



>How-To-Repeat:
See above (run the full ATF tests).

>Fix:
n/a

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->port-sh3-maintainer
Responsible-Changed-By: rin@NetBSD.org
Responsible-Changed-When: Fri, 02 Oct 2020 09:00:38 +0000
Responsible-Changed-Why:
sh3 specific


From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-sh3/55491: pmap corruption
Date: Fri, 19 Feb 2021 19:31:27 +0100

 When the endless loop in this PR happens

 	pmap_page_protect()

 is called and ends up in the "default:" remove-all case. It loops over
 all page entries in

 	while ((pv = SLIST_FIRST(&pvh->pvh_head)) != NULL)

 and tries to pmap_remove() each VA, but in the bug case there is a VA
 in the list for which no mapping exists.

 I have no idea yet how this VA gets into the list (or what happened
 to it/the mapping).

 Martin
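
The loop described in the message above can be modelled in userland to show why a stale list entry would keep it from terminating. This is only a sketch under stated assumptions, not the sh3 pmap code: every `toy_*` name and `TOY_NPTE` is hypothetical, and the early return in `toy_remove()` is an assumed behaviour (nothing to remove when no mapping exists, so the pv list is left untouched). The iteration cap exists only so the model can report the hang; the loop quoted above has no such cap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

#define TOY_NPTE 8		/* hypothetical: size of the toy page table */

/* Toy pv entry: one virtual address that supposedly maps the page. */
struct toy_pv {
	uintptr_t va;
	SLIST_ENTRY(toy_pv) link;
};
SLIST_HEAD(toy_pvh, toy_pv);

/* Toy page table: toy_valid[va] says whether va currently has a mapping. */
static bool toy_valid[TOY_NPTE];

/* Record va on the pv list; "mapped" decides whether a mapping exists too. */
static void
toy_enter(struct toy_pvh *pvh, uintptr_t va, bool mapped)
{
	struct toy_pv *pv = malloc(sizeof(*pv));

	assert(pv != NULL && va < TOY_NPTE);
	pv->va = va;
	SLIST_INSERT_HEAD(pvh, pv, link);
	toy_valid[va] = mapped;
}

/*
 * Assumed behaviour: with no mapping for va there is nothing to remove,
 * so return early and leave the pv list untouched.
 */
static void
toy_remove(struct toy_pvh *pvh, uintptr_t va)
{
	struct toy_pv *pv;

	if (!toy_valid[va])
		return;
	toy_valid[va] = false;
	SLIST_FOREACH(pv, pvh, link) {
		if (pv->va == va) {
			SLIST_REMOVE(pvh, pv, toy_pv, link);
			free(pv);
			return;
		}
	}
}

/*
 * Remove-all loop in the shape quoted above. Returns true if the list was
 * emptied, false if max_iter removal attempts were not enough.
 */
static bool
toy_remove_all(struct toy_pvh *pvh, unsigned max_iter)
{
	struct toy_pv *pv;

	while ((pv = SLIST_FIRST(pvh)) != NULL) {
		if (max_iter-- == 0)
			return false;
		toy_remove(pvh, pv->va);
	}
	return true;
}
```

With only mapped entries on the list, `toy_remove_all()` empties it. Once an entry is added with `toy_enter(&pvh, va, false)`, the head never changes and the function gives up after `max_iter` attempts, which corresponds to pgdaemon spinning in the real loop.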

>Unformatted:
