NetBSD Problem Report #55421

From martin@duskware.de  Fri Jun 26 08:01:52 2020
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id CAEE91A9217
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 26 Jun 2020 08:01:52 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: kernel stalls on pager_map, not allowing any userland processes to run
X-Send-Pr-Version: 3.95

>Number:         55421
>Category:       kern
>Synopsis:       kernel stalls on pager_map, not allowing any userland processes to run
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Jun 26 08:05:00 +0000 2020
>Last-Modified:  Fri Jun 26 08:25:01 +0000 2020
>Originator:     Martin Husemann
>Release:        NetBSD 9.99.68
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD gethsemane.duskware.de 9.99.68 NetBSD 9.99.68 (GETHSEMANE) #53: Fri Jun 26 06:59:52 CEST 2020 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/macppc/compile/GETHSEMANE macppc
Architecture: powerpc
Machine: macppc
>Description:

Updating userland on this machine stalled during set extraction, while tar
was unpacking the sets:

[ 5371.8397401] load: 0.00  cmd: tar 1078 [pager_map] 1.03u 2.12s 0% 4224k
db{0}> ps
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
2299  2299 3   1         0            f1283c0               cron uvnfp2
2113  2113 3   1         0            f0cf680               cron uvnfp2
1741  1741 3   1         0           10668400               cron uvnfp2
1752  1752 3   1         0           10668d00               cron uvnfp2
1483  1483 3   0         0           1157d940               cron uvnfp2
1575  1575 3   1         0           1157d640               cron uvnfp2
1633  1633 3   1         0           10eb0700               cron uvnfp2
1325  1325 3   1         0           1080c940               cron uvnfp2
1235  1235 3   1         0           10eb0400               cron biowait
1078  1078 3   0   1000000           10668700                tar pager_map
972    972 3   0        80           1080c040               tcsh pause
958    958 3   1        c0           10eb0d00              login wait
960    960 3   1        80           10a0c380              getty nanoslp
714    714 3   1        80           10668a00              getty nanoslp
829    829 3   1        80           102e93c0              getty nanoslp
828    828 3   0        80           10668100              getty nanoslp
895    895 3   0        80           1157d040               cron nanoslp
815    815 3   1        80           10eb0a00              inetd kqueue
858    858 3   1        80           10acf9c0               sshd select
852    852 3   1        80           10acfcc0             powerd kqueue
698    698 3   0   1000000           10eb0100               ntpd biowait
738    738 3   0        80           1080cc40            syslogd kqueue
337    337 3   1        80           10acf3c0             dhcpcd poll
336    336 3   1        80           10acf0c0             dhcpcd poll
335    335 3   0        80           10a0cc80             dhcpcd poll
334    334 3   1        80           10a0c680             dhcpcd poll
194    194 3   1        80           1080c640            wdogctl nanoslp
1        1 3   0        80           10294380               init wait
0      198 3   1       200           102e96c0            physiod physiod
0      165 3   1       240           102e9cc0            ioflush pager_map
0      163 3   1       200           1021a3c0          pooldrain pooldrain
0       31 3   1       240           102e99c0           pgdaemon pgdaemon
0      125 3   1       200           102e90c0          swwreboot swwreboot
0      124 3   0       200           10294980          atapibus0 sccomp
0      122 3   0       200           5f8a9c80               usb1 usbevt
0      121 3   0       200           10294c80               usb0 usbevt
0      119 3   0       200           10294680             npfgc0 npfgcw
0      118 3   0       200           10294080            rt_free rt_free
0      117 3   0       200           10267c40              unpgc unpgc
0      116 3   0       200           10267940    key_timehandler key_timehandler
0      115 3   1       200           10267640    icmp6_wqinput/1 icmp6_wqinput
0      114 3   0       200           10267340    icmp6_wqinput/0 icmp6_wqinput
0      113 3   0       200           10267040          nd6_timer nd6_timer
0      112 3   1       200           1021fd00    carp6_wqinput/1 carp6_wqinput
0      111 3   0       200           1021fa00    carp6_wqinput/0 carp6_wqinput
0      110 3   1       200           1021f700     carp_wqinput/1 carp_wqinput
0      109 3   0       200           1021acc0     carp_wqinput/0 carp_wqinput
0      108 3   1       200           1021a6c0     icmp_wqinput/1 icmp_wqinput
0      107 3   0       200           1021a0c0     icmp_wqinput/0 icmp_wqinput
0      106 3   0       200           1021f400           rt_timer rt_timer
0      105 3   0       200           1021a9c0        vmem_rehash vmem_rehash
0      104 3   0       200           1021f100          entbutler entropy
0       30 3   0       280           5f8a9980           fw0probe ieee1394
0       29 3   0       200           5f8a9680         usbtask-dr usbtsk
0       28 3   0       200           5f8a9380         usbtask-hc usbtsk
0       27 3   0       200           5f8a9080            atabus2 atath
0       26 3   1       240           5f8e2c40            atabus1 atath
0       25 3   0       240           5f8e2940            atabus0 atath
0       24 3   0       200           5f8e2640               iic1 iicintr
0       23 3   0       200           5f8e2340                pmu wait
0       22 3   0       200           5f8e2040               iic0 iicintr
0       21 3   1       200           5f8f1d00            xcall/1 xcall
0       20 1   1       200           5f8f1a00          softser/1
0    >  19 7   1       200           5f8f1700          softclk/1
0       18 1   1       200           5f8f1400          softbio/1
0       17 1   1       200           5f8f1100          softnet/1
0    >  16 1   1       201           5f909cc0             idle/1
0       15 3   0       200           5f9099c0             sysmon smtaskq
0       14 3   0       200           5f9096c0         pmfsuspend pmfsuspend
0       13 3   0       200           5f9093c0           pmfevent pmfevent
0       12 3   0       200           5f9090c0         sopendfree sopendfr
0       11 3   1       200           5fb1ec80            iflnkst iflnkst
0       10 3   0       200           5fb1e980           nfssilly nfssilly
0        9 3   0       200           5fb1e680             vdrain vdrain
0        8 3   0       200           5fb1e380          modunload mod_unld
0        7 3   0       200           5fb1e080            xcall/0 xcall
0        6 1   0       200           5fb28c40          softser/0
0        5 1   0       200           5fb28940          softclk/0
0        4 3   0       200           5fb28640          softbio/0 tstile
0        3 1   0       200           5fb28340          softnet/0
0    >   2 1   0       201           5fb28040             idle/0
0        0 3   0       200             ccc540            swapper uvm
db{0}> sh uvmexp
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12, ncolors=1
  376111 VM pages: 255899 active, 85393 inactive, 2899 wired, 12180 free
  pages  6814 anon, 332580 file, 4799 exec
  freemin=512, free-target=682, wired-max=125370
  resv-pg=1, resv-kernel=10
  bootpages=8111, poolpages=18967
  faults=251967, traps=257241, intrs=1739388, ctxswitch=366741
   softint=399627, syscalls=0
  fault counts:
    noram=0, noanon=0, pgwait=0, pgrele=0
    ok relocks(total)=1532(1532), anget(retrys)=33487(0), amapcopy=18577
    neighbor anon/obj pg=26623/261522, gets(lock/unlock)=71570/1541
    cases: anon=19255, anoncow=14232, obj=58563, prcopy=12998, przero=30179
  daemon and swap counts:
    woke=133, revs=133, scans=370647, obscans=358696, anscans=0
    busy=12, freed=358696, reactivate=143, deactivate=488602
    pageouts=0, pending=0, nswget=0
    nswapdev=1, swpgavail=1152017
    swpages=1152017, swpginuse=0, swpgonly=0, paging=0
db{0}> reboot
swwdog: 60 second timer expired
[ 5476.7749240] panic: watchdog timer expired
[ 5476.7749240] cpu0: Begin traceback...
[ 5476.7749240] 0x1000fdf0: at vpanic+0x12c
[ 5476.7749240] 0x1000fe20: at panic+0x50
[ 5476.7749240] 0x1000fe60: at swwdog_panic+0x90
[ 5476.7749240] 0x1000fe70: at callout_softclock+0x418
[ 5476.7749240] 0x1000feb0: at softint_dispatch+0x140
[ 5476.7749240] 0x1000ff20: at softint_fast_dispatch+0xdc
[ 5476.7749240] saved LR(0xfb3ffb79) is invalid.cpu0: End traceback...


I've been told this usually points at driver issues (I/O requests that
never complete). In this case the only file system involved was ffs on a
plain old ATA disk; the full dmesg is at

	http://netbsd.org/~martin/macppc-atf/dmesg.txt
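
As a rough aid for reading the `ps` listing above, the wait channels can be
tallied with a short script. This is only a sketch under stated assumptions:
the function name tally_wait_channels is hypothetical (it is not part of ddb
or NetBSD), the parser assumes eight whitespace-separated columns once ddb's
`>` "on CPU" marker is dropped, and the sample is a seven-line excerpt of the
listing rather than the full output.

```python
from collections import Counter

# Excerpt copied from the ddb `ps` output above (not the full listing).
DDB_PS_EXCERPT = """\
2299  2299 3   1         0            f1283c0               cron uvnfp2
1235  1235 3   1         0           10eb0400               cron biowait
1078  1078 3   0   1000000           10668700                tar pager_map
698    698 3   0   1000000           10eb0100               ntpd biowait
0      165 3   1       240           102e9cc0            ioflush pager_map
0    >  19 7   1       200           5f8f1700          softclk/1
0        4 3   0       200           5fb28640          softbio/0 tstile
"""


def tally_wait_channels(ps_text):
    """Count LWPs per wait channel in ddb `ps` output (hypothetical helper).

    Assumed columns: PID LID S CPU FLAGS LWP-address NAME WAIT.
    LWPs without a wait channel (running or idle) have only seven
    columns and are skipped, as is the header line.
    """
    counts = Counter()
    for line in ps_text.splitlines():
        # Drop the ">" marker that ddb prints for LWPs currently on a CPU.
        fields = [f for f in line.split() if f != ">"]
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        counts[fields[7]] += 1
    return counts


if __name__ == "__main__":
    for wchan, n in tally_wait_channels(DDB_PS_EXCERPT).most_common():
        print(f"{wchan:12s} {n}")
```

Feeding it the complete listing instead of the excerpt groups the sleeping
LWPs by wait channel, which makes it easier to see how many are waiting on
uvnfp2, biowait and pager_map.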

>How-To-Repeat:
Not sure; it has only happened once so far.
>Fix:
n/a

>Audit-Trail:
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org
Cc: Martin Husemann <martin@NetBSD.org>
Subject: Re: kern/55421: kernel stalls on pager_map, not allowing any userland
 processes to run
Date: Fri, 26 Jun 2020 17:22:39 +0900

 I observed similar problems on macppc, and they occurred often for me.
 As a workaround, I turned off __HAVE_FAST_SOFTINTS:

 ----
 Index: src/sys/arch/powerpc/include/intr.h
 ===================================================================
 RCS file: /home/netbsd/src/sys/arch/powerpc/include/intr.h,v
 retrieving revision 1.17
 diff -p -u -r1.17 intr.h
 --- src/sys/arch/powerpc/include/intr.h	16 Apr 2020 23:29:52 -0000	1.17
 +++ src/sys/arch/powerpc/include/intr.h	26 Jun 2020 08:14:56 -0000
 @@ -34,7 +34,10 @@ __KERNEL_RCSID(0, "$NetBSD: intr.h,v 1.1
   #ifndef _POWERPC_INTR_MACHDEP_H_
   #define _POWERPC_INTR_MACHDEP_H_

 +#if 0
 +/* XXX kern/55421, port-vax/55415 */
   #define	__HAVE_FAST_SOFTINTS	1
 +#endif


   /* Interrupt priority `levels'. */
 ----

 Since then I've never observed similar stalls. I wonder whether
 this is related to port-vax/55415 or not:

 http://gnats.netbsd.org/55415

 (vax is also a port with __HAVE_FAST_SOFTINTS enabled). However, I have
 not yet examined the problem in detail...

 Thanks,
 rin
