NetBSD Problem Report #48432

From www@NetBSD.org  Sat Dec  7 15:36:20 2013
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id AA0F6A641B
	for <gnats-bugs@gnats.NetBSD.org>; Sat,  7 Dec 2013 15:36:20 +0000 (UTC)
Message-Id: <20131207153618.E258DA644F@mollari.NetBSD.org>
Date: Sat,  7 Dec 2013 15:36:18 +0000 (UTC)
From: jcarr@poethecat.com
Reply-To: jcarr@poethecat.com
To: gnats-bugs@NetBSD.org
Subject: System hangs under light load
X-Send-Pr-Version: www-1.0

>Number:         48432
>Category:       port-sun3
>Synopsis:       System hangs under light load
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    port-sun3-maintainer
>State:          feedback
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Dec 07 15:40:00 +0000 2013
>Closed-Date:    
>Last-Modified:  Sun Jan 20 20:55:51 +0000 2019
>Originator:     John Carr
>Release:        NetBSD/sun3-6.1.2
>Organization:
>Environment:
NetBSD moonbeam 6.1.2 NetBSD 6.1.2 (GENERIC) sun3

>Description:
(Noticed the same issue on an upgrade from NetBSD-5.2.1 to 6.1.2 as well as on a new install of 6.1.2. Also seen on 6.0, 6.1, and 6.1.1.)

The system essentially just hangs: it becomes unable to create new processes, with no syslog output. Existing processes continue to run, but any new ones (cron instances, shells in screen, etc.) spawn and then hang without finishing; they also cannot be killed. Within a fairly short time the system grinds to a halt as processes back up and nothing will run. I've reproduced this on a system running nothing more than screen with top in one window, and it has also happened in single-user mode after running a few commands in a row. I have not yet found anything in common between the occurrences. The only recourse is to break into the debugger and sync, halt, or reboot.

The hang is easy to reproduce, though not by any consistent sequence of steps. I have yet to see the system survive more than about 10 minutes of steady use (editing files, man, ftp, etc.) before it hangs. In practice the system is not usable for much more than logging in and running top.

My guess would be that this is a kernel issue, but I'm not sure what info to provide. More than happy to do so.
>How-To-Repeat:
Use the system.
>Fix:
N/A

>Release-Note:

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-sun3/48432: System hangs under light load
Date: Sat, 7 Dec 2013 16:34:49 +0000

 On Sat, Dec 07, 2013 at 03:40:00PM +0000, jcarr@poethecat.com wrote:
  > My guess would be that this is a kernel issue, but I'm not sure
  > what info to provide. More than happy to do so.

 First thing that comes to mind: is it unable to create processes (that
 is, things hang in fork) or are processes unable to exit, so that
 after a while the process table fills?

 Can you get ps output, perhaps from ddb?

 -- 
 David A. Holland
 dholland@netbsd.org

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org, para@netbsd.org
Cc: port-sun3-maintainer@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: port-sun3/48432: System hangs under light load
Date: Sun, 08 Dec 2013 11:30:01 +1100

 i wonder if this is another symptom of vmem/kmem changes in netbsd-6.

 Lars?


 .mrg.

From: Lars Heidieker <lars@heidieker.de>
To: gnats-bugs@NetBSD.org, port-sun3-maintainer@netbsd.org, 
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, jcarr@poethecat.com
Cc: 
Subject: Re: port-sun3/48432: System hangs under light load
Date: Sun, 08 Dec 2013 16:57:50 +0100

 On 12/08/2013 01:35 AM, matthew green wrote:
 > The following reply was made to PR port-sun3/48432; it has been noted by GNATS.
 > 
 > From: matthew green <mrg@eterna.com.au>
 > To: gnats-bugs@NetBSD.org, para@netbsd.org
 > Cc: port-sun3-maintainer@netbsd.org, gnats-admin@netbsd.org,
 >     netbsd-bugs@netbsd.org
 > Subject: re: port-sun3/48432: System hangs under light load
 > Date: Sun, 08 Dec 2013 11:30:01 +1100
 > 
 >  i wonder if this is another symptom of vmem/kmem changes in netbsd-6.
 >  
 >  Lars?
 >  
 >  
 >  .mrg.
 >  
 > 

 On a sun3 I could imagine the very limited KVA being a problem;
 not so on sun3x.
 It could be anything, though, most likely something port-specific.
 Can you get some backtraces and a dmesg?

 Lars

From: John Carr <jcarr@poethecat.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-sun3/48432: System hangs under light load
Date: Sun, 08 Dec 2013 10:54:22 -0500

 On 12/07/2013 11:35 AM, David Holland wrote:
 >   On Sat, Dec 07, 2013 at 03:40:00PM +0000, jcarr@poethecat.com wrote:
 >    > My guess would be that this is a kernel issue, but I'm not sure
 >    > what info to provide. More than happy to do so.
 >   
 >   First thing that comes to mind: is it unable to create processes (that
 >   is, things hang in fork) or are processes unable to exit, so that
 >   after a while the process table fills?
 >   
 >   Can you get ps output, perhaps from ddb?

 Sure thing. It's below. I'll leave it at the debugger prompt for later 
 troubleshooting.


 Stopped in pid 0.2 (system) at  netbsd:cpu_Debugger+0x6: unlk    a6
 db> ps
 PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
 1081     1 3   0         0            e542600               cron vm_map
 1112     1 3   0        80            e543680               sshd netio
 539      1 3   0         0            e543c00               sshd vm_map
 970      1 3   0         0            e542b80               cron vm_map
 527      1 3   0         0            e4ea060               cron vm_map
 941      1 3   0         0            e5428c0               cron vm_map
 767      1 3   0        80            e341900             pickup kqueue
 758      1 3   0   1000000            e5433c0                csh vm_map
 945      1 3   0        80            e542e40               sshd select
 1008     1 3   0         0            e543940               cron vm_map
 903      1 3   0         0            e31f0a0              login vm_map
 722      1 3   0         0            e543100                csh vm_map
 558      1 3   0        80            e4ea5e0                top ttyout
 457      1 3   0        80            e4eb3a0                csh pause
 467      1 3   0         0            e4ea320       screen-4.0.3 vm_map
 468      1 3   0        80            e4eae20       screen-4.0.3 pause
 445      1 3   0        80            e4ea8a0                csh pause
 66       1 3   0        80            e4eb660                 su wait
 451      1 3   0        80            e4eb0e0                 sh wait
 452      1 3   0        80            e4eab60              login wait
 392      1 3   0        80            e340300            telnetd select
 389      1 3   0        80            e31ede0              getty ttyraw
 382      1 3   0        80            e340040               qmgr kqueue
 375      1 3   0         0            e4ebbe0               cron vm_map
 364      1 3   0        80            e4eb920              inetd kqueue
 346      1 3   0        80            e340e00             master kqueue
 117      1 3   0        80            e341640            syslogd kqueue
 1        1 3   0        80            e31e2e0               init wait
 0       36 3   0       200            e3405c0              nfsio nfsiod
 0       35 3   0       200            e340880              nfsio nfsiod
 0       34 3   0       200            e340b40              nfsio nfsiod
 0       33 3   0       200            e3410c0              nfsio nfsiod
 0       32 3   0       200            e341380            physiod physiod
 0       31 3   0       200            e31eb20           aiodoned aiodoned
 0       30 3   0       200            e31e5a0            ioflush syncer
 0       29 3   0       200            e31e860           pgdaemon pgdaemon
 0       26 3   0       200            e31e020              unpgc unpgc
 0       25 3   0       200            e341bc0        vmem_rehash vmem_rehash
 0       16 3   0       200            e31f360           scsibus0 sccomp
 0       15 3   0       200            e31f620         pmfsuspend pmfsuspend
 0       14 3   0       200            e31f8e0           pmfevent pmfevent
 0       13 3   0       200            e31fba0         sopendfree sopendfr
 0       12 3   0       200            e304000           nfssilly nfssilly
 0       11 3   0       200            e3042c0            cachegc cachegc
 0       10 3   0       200            e304580              vrele vrele
 0        9 3   0       200            e304840             vdrain vdrain
 0        8 3   0       200            e304b00          modunload mod_unld
 0        7 3   0       200            e304dc0            xcall/0 xcall
 0        6 1   0       200            e305080          softser/0
 0        5 1   0       200            e305340          softclk/0
 0        4 1   0       200            e305600          softbio/0
 0        3 1   0       200            e3058c0          softnet/0
 0    >   2 7   0       201            e305b80             idle/0
 0        1 3   0       200            e1b5b20            swapper uvm
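
The WAIT column in the listing above is the quickest way to see where things are stuck: 11 of the 28 user processes are sleeping on vm_map. A minimal sketch of tallying that column, assuming the ddb output has been captured as plain text (the function name and the shortened sample are illustrative, not part of any NetBSD tool):

```python
from collections import Counter

def tally_wait_channels(ps_text):
    """Count LWPs per wait channel in captured ddb 'ps' output."""
    counts = Counter()
    for line in ps_text.splitlines():
        # ddb marks the current LWP with a standalone '>'; drop that token.
        fields = [f for f in line.split() if f != ">"]
        if len(fields) < 7 or not fields[0].isdigit():
            continue  # header line or anything that is not a process row
        # Rows for runnable/on-CPU LWPs have no WAIT column at all.
        wait = fields[7] if len(fields) > 7 else "(none)"
        counts[wait] += 1
    return counts

# Shortened sample taken from the listing in this report.
SAMPLE = """\
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
1081     1 3   0         0            e542600               cron vm_map
1112     1 3   0        80            e543680               sshd netio
539      1 3   0         0            e543c00               sshd vm_map
758      1 3   0   1000000            e5433c0                csh vm_map
0        6 1   0       200            e305080          softser/0
0    >   2 7   0       201            e305b80             idle/0
0        1 3   0       200            e1b5b20            swapper uvm
"""

if __name__ == "__main__":
    for channel, n in tally_wait_channels(SAMPLE).most_common():
        print(f"{channel:10s} {n}")
```

A large share of processes on a single channel such as vm_map points at contention or exhaustion in the VM map code rather than at a full process table.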

From: John Carr <jcarr@poethecat.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-sun3/48432: System hangs under light load
Date: Sun, 08 Dec 2013 11:10:27 -0500

 On 12/08/2013 11:00 AM, Lars Heidieker wrote:
 >   
 >   for a sun3 I could think of the kva being very limited to be a problem,
 >   not so for sun3x.
 >   could be anything most likely something port specific.
 >   can you get some back traces and a dmesg?

 Sure thing. At the point of the 'ps' in the previous email, the system 
 is completely wedged (even the screen session isn't responding), which 
 is a bit worse than usual.  bt and dmesg are below:


 db> bt
 cpu_Debugger(?)
 zs_abort(0) + 6
 zstty_stint(e32fdcc,0) + 8a
 zsc_intr_hard(e32fdc0) + a8
 zshard(e32fdc0) + c
 isr_autovec(0,1,e00457a,e303800,20000e00) + 6e
 m68k_compat_13_sigreturn13_stub(e305b80) + 68
 lwp_trampoline() + e

 db> dmesg
 Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
      2006, 2007, 2008, 2009, 2010, 2011, 2012
      The NetBSD Foundation, Inc.  All rights reserved.
 Copyright (c) 1982, 1986, 1989, 1991, 1993
      The Regents of the University of California.  All rights reserved.

 NetBSD 6.1.2 (GENERIC)
 Model: sun3 60
 fpu: mc68882
 total memory = 24576 KB
 avail memory = 21984 KB
 timecounter: Timecounters tick every 10.000 msec
 mainbus0 (root)
 obio0 at mainbus0
 zsc0 at obio0 addr 0x0 ipl 6: (softpri 3)
 kbd0 at zsc0 channel 0: baud rate 1200
 ms0 at zsc0 channel 1: baud rate 1200
 zsc1 at obio0 addr 0x20000 ipl 6: (softpri 3)
 zstty0 at zsc1 channel 0 (console i/o)
 zstty1 at zsc1 channel 1
 eeprom0 at obio0 addr 0x40000
 oclock0 at obio0 addr 0x60000 ipl 5: intersil7170
 memerr0 at obio0 addr 0x80000 ipl 7: (Parity memory)
 intreg0 at obio0 addr 0xa0000
 le0 at obio0 addr 0x120000 ipl 3: address 08:00:01:01:73:55
 le0: 8 receive buffers, 2 transmit buffers
 si0 at obio0 addr 0x140000 ipl 2: options=0xf
 scsibus0 at si0: 8 targets, 8 luns per target
 obmem0 at mainbus0
 cgfour0 at obmem0 addr 0xff300000 (1152x900)
 enabling interrupts
 timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
 scsibus0: waiting 2 seconds for devices to settle...
 sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST32155N, 0594> disk fixed
 sd0: 2049 MB, 4177 cyl, 8 head, 125 sec, 512 bytes/sect x 4197405 sectors
 sd0: async, 8-bit transfers
 boot device: sd0a
 root on sd0a dumps on sd0b
 root file system type: ffs
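
The two memory lines in the boot messages above give a rough sense of how small this machine is. A minimal sketch of extracting them, assuming the dmesg text has been saved as a string (the function name is hypothetical). Note that these figures describe physical memory only; they say nothing directly about kernel virtual address space, which is the resource suspected earlier in this thread.

```python
import re

def memory_summary(dmesg_text):
    """Return (total_kb, avail_kb, reserved_kb) from NetBSD boot messages."""
    values = {}
    pattern = r"^\s*(total|avail) memory = (\d+) KB"
    for kind, kb in re.findall(pattern, dmesg_text, re.M):
        values[kind] = int(kb)
    total, avail = values["total"], values["avail"]
    # The difference is memory not available for general use after boot.
    return total, avail, total - avail

# The two relevant lines from the dmesg in this report.
SAMPLE = """\
 total memory = 24576 KB
 avail memory = 21984 KB
"""

if __name__ == "__main__":
    total, avail, reserved = memory_summary(SAMPLE)
    print(f"total={total} KB avail={avail} KB reserved={reserved} KB")
```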

State-Changed-From-To: open->feedback
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Sun, 20 Jan 2019 20:55:51 +0000
State-Changed-Why:
we did have some kva starvation problems in -6 that were resolved
by -7 and some fixes were never back ported.  netbsd-6 is now EOL,
but i believe this is fixed in supported releases.

can you check or confirm?  thanks.


>Unformatted:
