NetBSD Problem Report #48432
From www@NetBSD.org Sat Dec 7 15:36:20 2013
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id AA0F6A641B
for <gnats-bugs@gnats.NetBSD.org>; Sat, 7 Dec 2013 15:36:20 +0000 (UTC)
Message-Id: <20131207153618.E258DA644F@mollari.NetBSD.org>
Date: Sat, 7 Dec 2013 15:36:18 +0000 (UTC)
From: jcarr@poethecat.com
Reply-To: jcarr@poethecat.com
To: gnats-bugs@NetBSD.org
Subject: System hangs under light load
X-Send-Pr-Version: www-1.0
>Number: 48432
>Category: port-sun3
>Synopsis: System hangs under light load
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: port-sun3-maintainer
>State: feedback
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Dec 07 15:40:00 +0000 2013
>Closed-Date:
>Last-Modified: Sun Jan 20 20:55:51 +0000 2019
>Originator: John Carr
>Release: NetBSD/sun3-6.1.2
>Organization:
>Environment:
NetBSD moonbeam 6.1.2 NetBSD 6.1.2 (GENERIC) sun3
>Description:
(Noticed the same issue on an upgrade from NetBSD-5.2.1 to 6.1.2 as well as on a new install of 6.1.2. Also seen on 6.0, 6.1, and 6.1.1.)
The system essentially just hangs: it becomes unable to create new processes, and there is no syslog output. Existing processes continue to run, but any new ones (cron instances, shells in screen, etc.) spawn and then hang without finishing. They also cannot be killed. Within a fairly short time the system grinds to a halt as processes back up and nothing will run. I've reproduced this on a system running nothing more than screen with top in one window. It has also happened in single-user mode when running a few commands in a row. I haven't yet found a common factor in how or when this happens. The only recourse is to break to the debugger and sync, halt, or reboot.
It is easy to reproduce, though not by any consistent sequence of steps. I have yet to see the system survive more than about 10 minutes of steady use (editing files, man, ftp, etc.) before it hangs. In effect, it is not usable for much more than login and top.
My guess would be that this is a kernel issue, but I'm not sure what info to provide. More than happy to do so.
>How-To-Repeat:
Use the system.
>Fix:
N/A
>Release-Note:
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-sun3/48432: System hangs under light load
Date: Sat, 7 Dec 2013 16:34:49 +0000
On Sat, Dec 07, 2013 at 03:40:00PM +0000, jcarr@poethecat.com wrote:
> My guess would be that this is a kernel issue, but I'm not sure
> what info to provide. More than happy to do so.
First thing that comes to mind: is it unable to create processes (that
is, things hang in fork) or are processes unable to exit, so that
after a while the process table fills?
Can you get ps output, perhaps from ddb?
--
David A. Holland
dholland@netbsd.org
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org, para@netbsd.org
Cc: port-sun3-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: re: port-sun3/48432: System hangs under light load
Date: Sun, 08 Dec 2013 11:30:01 +1100
i wonder if this is another symptom of vmem/kmem changes in netbsd-6.
Lars?
.mrg.
From: Lars Heidieker <lars@heidieker.de>
To: gnats-bugs@NetBSD.org, port-sun3-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, jcarr@poethecat.com
Cc:
Subject: Re: port-sun3/48432: System hangs under light load
Date: Sun, 08 Dec 2013 16:57:50 +0100
On 12/08/2013 01:35 AM, matthew green wrote:
> The following reply was made to PR port-sun3/48432; it has been noted by GNATS.
>
> From: matthew green <mrg@eterna.com.au>
> To: gnats-bugs@NetBSD.org, para@netbsd.org
> Cc: port-sun3-maintainer@netbsd.org, gnats-admin@netbsd.org,
> netbsd-bugs@netbsd.org
> Subject: re: port-sun3/48432: System hangs under light load
> Date: Sun, 08 Dec 2013 11:30:01 +1100
>
> i wonder if this is another symptom of vmem/kmem changes in netbsd-6.
>
> Lars?
>
>
> .mrg.
>
>
for a sun3 I could think of the kva being very limited to be a problem,
not so for sun3x.
could be anything most likely something port specific.
can you get some back traces and a dmesg?
Lars
From: John Carr <jcarr@poethecat.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-sun3/48432: System hangs under light load
Date: Sun, 08 Dec 2013 10:54:22 -0500
On 12/07/2013 11:35 AM, David Holland wrote:
> On Sat, Dec 07, 2013 at 03:40:00PM +0000, jcarr@poethecat.com wrote:
> > My guess would be that this is a kernel issue, but I'm not sure
> > what info to provide. More than happy to do so.
>
> First thing that comes to mind: is it unable to create processes (that
> is, things hang in fork) or are processes unable to exit, so that
> after a while the process table fills?
>
> Can you get ps output, perhaps from ddb?
Sure thing. It's below. I'll leave it at the debugger prompt for later
troubleshooting.
Stopped in pid 0.2 (system) at netbsd:cpu_Debugger+0x6: unlk a6
db> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
1081 1 3 0 0 e542600 cron vm_map
1112 1 3 0 80 e543680 sshd netio
539 1 3 0 0 e543c00 sshd vm_map
970 1 3 0 0 e542b80 cron vm_map
527 1 3 0 0 e4ea060 cron vm_map
941 1 3 0 0 e5428c0 cron vm_map
767 1 3 0 80 e341900 pickup kqueue
758 1 3 0 1000000 e5433c0 csh vm_map
945 1 3 0 80 e542e40 sshd select
1008 1 3 0 0 e543940 cron vm_map
903 1 3 0 0 e31f0a0 login vm_map
722 1 3 0 0 e543100 csh vm_map
558 1 3 0 80 e4ea5e0 top ttyout
457 1 3 0 80 e4eb3a0 csh pause
467 1 3 0 0 e4ea320 screen-4.0.3 vm_map
468 1 3 0 80 e4eae20 screen-4.0.3 pause
445 1 3 0 80 e4ea8a0 csh pause
66 1 3 0 80 e4eb660 su wait
451 1 3 0 80 e4eb0e0 sh wait
452 1 3 0 80 e4eab60 login wait
392 1 3 0 80 e340300 telnetd select
389 1 3 0 80 e31ede0 getty ttyraw
382 1 3 0 80 e340040 qmgr kqueue
375 1 3 0 0 e4ebbe0 cron vm_map
364 1 3 0 80 e4eb920 inetd kqueue
346 1 3 0 80 e340e00 master kqueue
117 1 3 0 80 e341640 syslogd kqueue
1 1 3 0 80 e31e2e0 init wait
0 36 3 0 200 e3405c0 nfsio nfsiod
0 35 3 0 200 e340880 nfsio nfsiod
0 34 3 0 200 e340b40 nfsio nfsiod
0 33 3 0 200 e3410c0 nfsio nfsiod
0 32 3 0 200 e341380 physiod physiod
0 31 3 0 200 e31eb20 aiodoned aiodoned
0 30 3 0 200 e31e5a0 ioflush syncer
0 29 3 0 200 e31e860 pgdaemon pgdaemon
0 26 3 0 200 e31e020 unpgc unpgc
0 25 3 0 200 e341bc0 vmem_rehash vmem_rehash
0 16 3 0 200 e31f360 scsibus0 sccomp
0 15 3 0 200 e31f620 pmfsuspend pmfsuspend
0 14 3 0 200 e31f8e0 pmfevent pmfevent
0 13 3 0 200 e31fba0 sopendfree sopendfr
0 12 3 0 200 e304000 nfssilly nfssilly
0 11 3 0 200 e3042c0 cachegc cachegc
0 10 3 0 200 e304580 vrele vrele
0 9 3 0 200 e304840 vdrain vdrain
0 8 3 0 200 e304b00 modunload mod_unld
0 7 3 0 200 e304dc0 xcall/0 xcall
0 6 1 0 200 e305080 softser/0
0 5 1 0 200 e305340 softclk/0
0 4 1 0 200 e305600 softbio/0
0 3 1 0 200 e3058c0 softnet/0
0 > 2 7 0 201 e305b80 idle/0
0 1 3 0 200 e1b5b20 swapper uvm
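One way to read the listing above against the fork-versus-exit question is to tally the WAIT column for the user processes. The following is a minimal sketch and not part of the original report; the function name wait_channels and the idea of skipping PID 0 (kernel threads) are assumptions made for illustration, and the input is simply the user-process rows copied from the ps output above.

```python
from collections import Counter

# User-process rows copied from the ddb `ps` output above.
# Columns: PID LID S CPU FLAGS STRUCT-LWP NAME WAIT
DDB_PS = """\
1081 1 3 0 0 e542600 cron vm_map
1112 1 3 0 80 e543680 sshd netio
539 1 3 0 0 e543c00 sshd vm_map
970 1 3 0 0 e542b80 cron vm_map
527 1 3 0 0 e4ea060 cron vm_map
941 1 3 0 0 e5428c0 cron vm_map
767 1 3 0 80 e341900 pickup kqueue
758 1 3 0 1000000 e5433c0 csh vm_map
945 1 3 0 80 e542e40 sshd select
1008 1 3 0 0 e543940 cron vm_map
903 1 3 0 0 e31f0a0 login vm_map
722 1 3 0 0 e543100 csh vm_map
558 1 3 0 80 e4ea5e0 top ttyout
457 1 3 0 80 e4eb3a0 csh pause
467 1 3 0 0 e4ea320 screen-4.0.3 vm_map
468 1 3 0 80 e4eae20 screen-4.0.3 pause
445 1 3 0 80 e4ea8a0 csh pause
66 1 3 0 80 e4eb660 su wait
451 1 3 0 80 e4eb0e0 sh wait
452 1 3 0 80 e4eab60 login wait
392 1 3 0 80 e340300 telnetd select
389 1 3 0 80 e31ede0 getty ttyraw
382 1 3 0 80 e340040 qmgr kqueue
375 1 3 0 0 e4ebbe0 cron vm_map
364 1 3 0 80 e4eb920 inetd kqueue
346 1 3 0 80 e340e00 master kqueue
117 1 3 0 80 e341640 syslogd kqueue
1 1 3 0 80 e31e2e0 init wait
"""


def wait_channels(ps_text):
    """Count user processes per wait channel (hypothetical helper)."""
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split()
        # Rows without a WAIT column (fewer than 8 fields) are skipped.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        # PID 0 rows are kernel threads; leave them out of the tally.
        if fields[0] == "0":
            continue
        counts[fields[-1]] += 1
    return counts


if __name__ == "__main__":
    for channel, n in wait_channels(DDB_PS).most_common():
        print(f"{channel:8s} {n}")
```

On this listing the tally puts vm_map well ahead of every other wait channel, and the vm_map rows are also the ones whose FLAGS value lacks the 80 seen on the other sleeping user processes.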
From: John Carr <jcarr@poethecat.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-sun3/48432: System hangs under light load
Date: Sun, 08 Dec 2013 11:10:27 -0500
On 12/08/2013 11:00 AM, Lars Heidieker wrote:
>
> for a sun3 I could think of the kva being very limited to be a problem,
> not so for sun3x.
> could be anything most likely something port specific.
> can you get some back traces and a dmesg?
Sure thing. At the point of the 'ps' in the previous email, the system
is completely wedged (even the screen session isn't responding), which
is a bit worse than usual. bt and dmesg are below:
db> bt
cpu_Debugger(?)
zs_abort(0) + 6
zstty_stint(e32fdcc,0) + 8a
zsc_intr_hard(e32fdc0) + a8
zshard(e32fdc0) + c
isr_autovec(0,1,e00457a,e303800,20000e00) + 6e
m68k_compat_13_sigreturn13_stub(e305b80) + 68
lwp_trampoline() + e
db> dmesg
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
NetBSD 6.1.2 (GENERIC)
Model: sun3 60
fpu: mc68882
total memory = 24576 KB
avail memory = 21984 KB
timecounter: Timecounters tick every 10.000 msec
mainbus0 (root)
obio0 at mainbus0
zsc0 at obio0 addr 0x0 ipl 6: (softpri 3)
kbd0 at zsc0 channel 0: baud rate 1200
ms0 at zsc0 channel 1: baud rate 1200
zsc1 at obio0 addr 0x20000 ipl 6: (softpri 3)
zstty0 at zsc1 channel 0 (console i/o)
zstty1 at zsc1 channel 1
eeprom0 at obio0 addr 0x40000
oclock0 at obio0 addr 0x60000 ipl 5: intersil7170
memerr0 at obio0 addr 0x80000 ipl 7: (Parity memory)
intreg0 at obio0 addr 0xa0000
le0 at obio0 addr 0x120000 ipl 3: address 08:00:01:01:73:55
le0: 8 receive buffers, 2 transmit buffers
si0 at obio0 addr 0x140000 ipl 2: options=0xf
scsibus0 at si0: 8 targets, 8 luns per target
obmem0 at mainbus0
cgfour0 at obmem0 addr 0xff300000 (1152x900)
enabling interrupts
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
scsibus0: waiting 2 seconds for devices to settle...
sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST32155N, 0594> disk fixed
sd0: 2049 MB, 4177 cyl, 8 head, 125 sec, 512 bytes/sect x 4197405 sectors
sd0: async, 8-bit transfers
boot device: sd0a
root on sd0a dumps on sd0b
root file system type: ffs
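For reference, the memory and disk figures in the dmesg above can be sanity-checked with a few lines of arithmetic. This is only an illustrative sketch: the constant names are made up here, and the numbers describe physical memory, not the kernel virtual address space (kva) discussed earlier in the thread, so they do not by themselves confirm or rule out kva exhaustion.

```python
# Figures copied from the dmesg output above.
TOTAL_KB = 24576          # "total memory = 24576 KB"
AVAIL_KB = 21984          # "avail memory = 21984 KB"
SECTORS = 4197405         # sd0 sector count
BYTES_PER_SECTOR = 512    # sd0 sector size

total_mb = TOTAL_KB // 1024                     # physical RAM in MB
reserved_kb = TOTAL_KB - AVAIL_KB               # RAM not available after boot
disk_mb = SECTORS * BYTES_PER_SECTOR // 2**20   # disk size in MB (rounded down)

if __name__ == "__main__":
    print(f"RAM: {total_mb} MB, of which {reserved_kb} KB is unavailable after boot")
    print(f"sd0: {disk_mb} MB")
```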
State-Changed-From-To: open->feedback
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Sun, 20 Jan 2019 20:55:51 +0000
State-Changed-Why:
we did have some kva starvation problems in -6 that were resolved
by -7 and some fixes were never back ported. netbsd-6 is now EOL,
but i believe this is fixed in supported releases.
can you check or confirm? thanks.
>Unformatted: