NetBSD Problem Report #51665
From martin@duskware.de Sun Nov 27 11:38:32 2016
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id DA51C7A26A
for <gnats-bugs@gnats.NetBSD.org>; Sun, 27 Nov 2016 11:38:32 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: shark locks up during nightly run
X-Send-Pr-Version: 3.95
>Number: 51665
>Category: port-arm
>Synopsis: shark locks up during nightly run
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-arm-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Nov 27 11:40:00 +0000 2016
>Closed-Date: Thu Oct 19 13:09:19 +0000 2017
>Last-Modified: Thu Oct 19 13:09:19 +0000 2017
>Originator: Martin Husemann
>Release: NetBSD 7.99.42
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD drowsy.duskware.de 7.99.42 NetBSD 7.99.42 (GENERIC) #28: Tue Nov 22 12:31:52 CET 2016 martin@martins.aprisoft.de:/ssd/src/sys/arch/shark/compile/GENERIC shark
Architecture: earmv4
Machine: shark
>Description:
A shark with 64MB memory installed reproducibly locks up during the nightly
security run. An identical machine with 96MB RAM works fine.
The only difference between the machines is:
NetBSD 7.99.42 (GENERIC) #28: Tue Nov 22 12:31:52 CET 2016
martin@martins.aprisoft.de:/ssd/src/sys/arch/shark/compile/GENERIC
-total memory = 65536 KB
-avail memory = 58916 KB
+total memory = 98304 KB
+avail memory = 91012 KB
sysctl_createv: sysctl_create(machine_arch) returned 17
timecounter: Timecounters tick every 15.625 msec
mainbus0 (root)
@@ -32,7 +32,7 @@
openprom at ofbus0 not configured
options at ofbus0 not configured
aliases at ofbus0 not configured
-memory@e000000 at ofbus0 not configured
+memory@f000000 at ofbus0 not configured
mmu at ofbus0 not configured
ofbus2 at ofbus0 (vlbus)
ofisa0 at ofbus2 (isa)
@@ -52,7 +52,7 @@
scr0 at ofisascr0
com1 at ofisa0 (ir@i2f8): ns16550a, working fifo
cs0 at ofisa0 (ethernet@i300): CRUS,CS8900
-cs0: CS8900 rev. F, address 08:00:2b:81:62:5e, media UTP
+cs0: CS8900 rev. F, address 08:00:2b:81:65:72, media UTP
joy0 at ofisa0 (game@i201): ESST,game
joy0: joystick not connected
midi@i330 at ofisa0 not configured
@@ -90,13 +90,11 @@
clock: hz=64 stathz = 0 profhz = 0
timecounter: Timecounter "i8253" frequency 1193182 Hz quality 100
wd0 at atabus0 drive 0
-wd0: <IBM-DBCA-206480>
-wd0: drive supports 16-sector PIO transfers, LBA addressing
-wd0: 6194 MB, 13424 cyl, 15 head, 63 sec, 512 bytes/sect x 12685680 sectors
-wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
+wd0: <SAMSUNG HM120IC>
+wd0: drive supports 16-sector PIO transfers, LBA48 addressing
+wd0: 111 GB, 232581 cyl, 16 head, 63 sec, 512 bytes/sect x 234441648 sectors
+wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
boot device: wd0
root on wd0a dumps on wd0b
Full dmesg of the "low memory" machine:
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
NetBSD 7.99.42 (GENERIC) #28: Tue Nov 22 12:31:52 CET 2016
martin@martins.aprisoft.de:/ssd/src/sys/arch/shark/compile/GENERIC
total memory = 65536 KB
avail memory = 58916 KB
sysctl_createv: sysctl_create(machine_arch) returned 17
timecounter: Timecounters tick every 15.625 msec
mainbus0 (root)
cpu0 at mainbus0 core 0: SA-110 step S (SA-1 V4 core)
cpu0: DC enabled IC enabled WB enabled EABT
cpu0: 16KB/32B 32-way L1 VIVT Instruction cache
cpu0: 16KB/32B 32-way write-back L1 VIVT Data cache
ofbus0 (root)
ofbus1 at ofbus0 (packages)
client-services at ofbus1 not configured
terminal-emulator at ofbus1 not configured
stringio at ofbus1 not configured
deblocker at ofbus1 not configured
obp-tftp at ofbus1 not configured
ufs-file-system at ofbus1 not configured
fat-file-system at ofbus1 not configured
iso9660-file-system at ofbus1 not configured
disk-label at ofbus1 not configured
dropin-file-system at ofbus1 not configured
sound.wav at ofbus1 not configured
chosen at ofbus0 not configured
openprom at ofbus0 not configured
options at ofbus0 not configured
aliases at ofbus0 not configured
memory@e000000 at ofbus0 not configured
mmu at ofbus0 not configured
ofbus2 at ofbus0 (vlbus)
ofisa0 at ofbus2 (isa)
dma-controller@i00 at ofisa0 not configured
interrupt-controller@i20 at ofisa0 not configured
timer@i40 at ofisa0 not configured
configuration@i15c at ofisa0 not configured
com0 at ofisa0 (serial@i3f8): ns16550a, working fifo
com0: console
lpt0 at ofisa0 (parallel@i378)
pckbc0 at ofisa0 (8042@i60)
power@i380 at ofisa0 not configured
ofbus3 at ofisa0 (gpio@i3e0)
eeprom at ofbus3 not configured
ofrtc0 at ofisa0 (rtc@i70): rtc
ofisascr0 at ofisa0 (scr@i24)
scr0 at ofisascr0
com1 at ofisa0 (ir@i2f8): ns16550a, working fifo
cs0 at ofisa0 (ethernet@i300): CRUS,CS8900
cs0: CS8900 rev. F, address 08:00:2b:81:62:5e, media UTP
joy0 at ofisa0 (game@i201): ESST,game
joy0: joystick not connected
midi@i330 at ofisa0 not configured
ess0 at ofisa0 (sound@i220): ESST,es1887-codec
ess0: ESS Technology ES1887 [version 0x688b]
ess0: audio1 interrupting at irq 9
ess0: audio2 interrupting at irq 15
audio0 at ess0: full duplex, playback, capture, mmap, independent
opl0 at ess0: model OPL3
midi0 at opl0: ESS Yamaha OPL3
wdc0 at ofisa0 (ide@i1f0)
atabus0 at wdc0 channel 0
pci at ofbus2 not configured
igsfb0 at ofbus2 (display@it3b0): IGS CyberPro 2010 at 0x06000000
unable to find font Gallant 12x22
igsfb0: 2MB, 1024x768, 8bpp
igsfb0: using 8bpp for X
wsdisplay0 at igsfb0 kbdmux 1
wsmux1: connecting to wsdisplay0
ofrom0 at ofbus0 (flash@7000000): 0x7000000-0x707ffff
ofrom1 at ofbus0 (romcard@10000000): 0x10000000-0x10ffffff
ofbus4 at ofbus0 (cpus)
cpu@0 at ofbus4 not configured
ofbus5 at ofbus0 (udp)
nfs at ofbus5 not configured
spl_masks[0]=ffffffff
spl_masks[1]=ffffffff
spl_masks[2]=ffffffff
spl_masks[3]=ffffffff
spl_masks[4]=ffffffff
spl_masks[5]=ffffbf5f
spl_masks[6]=ffff3d5f
spl_masks[7]=ffff3d47
timecounter: Timecounter "clockinterrupt" frequency 64 Hz quality 0
clock: hz=64 stathz = 0 profhz = 0
timecounter: Timecounter "i8253" frequency 1193182 Hz quality 100
wd0 at atabus0 drive 0
wd0: <IBM-DBCA-206480>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 6194 MB, 13424 cyl, 15 head, 63 sec, 512 bytes/sect x 12685680 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
boot device: wd0
root on wd0a dumps on wd0b
>How-To-Repeat:
s/a
>Fix:
n/a
>Release-Note:
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-arm/51665: shark locks up during nightly run
Date: Sun, 27 Nov 2016 20:14:56 +0000
On Sun, Nov 27, 2016 at 11:40:00AM +0000, martin@NetBSD.org wrote:
> A shark with 64MB memory installed reproducibly locks up during the nightly
> security run. An identical machine with 96MB RAM works fine.
What FS(es) are you using? Also, can you get into ddb, and if so,
what wchans are involved?
--
David A. Holland
dholland@netbsd.org
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-arm/51665: shark locks up during nightly run
Date: Mon, 28 Nov 2016 09:17:28 +0100
On Sun, Nov 27, 2016 at 08:15:02PM +0000, David Holland wrote:
> What FS(es) are you using? Also, can you get into ddb, and if so,
> what wchans are involved?
Yeah, sorry, I forgot to add some important details.
On the (working) 96M machine:
file system: /dev/rwd0c
format FFSv1
endian little-endian
magic 11954 time Mon Nov 28 07:50:02 2016
superblock location 8192 id [ 4c4ad59b 16a2beb3 ]
cylgrp dynamic inodes 4.4BSD sblock FFSv2 fslevel 4
nbfree 6838896 ndir 9473 nifree 14130460 nffree 18293
ncg 610 size 57561588 blocks 56663655
bsize 16384 shift 14 mask 0xffffc000
fsize 2048 shift 11 mask 0xfffff800
frag 8 shift 3 fsbtodb 2
bpg 11796 fpg 94368 ipg 23296
minfree 5% optim time maxcontig 4 maxbpg 4096
and it is mounted with the log option.
On the non-working 64M machine:
file system: /dev/rwd0c
format FFSv1
endian little-endian
magic 11954 time Mon Nov 28 09:14:43 2016
superblock location 8192 id [ 0 0 ]
cylgrp dynamic inodes 4.4BSD sblock FFSv2 fslevel 4
nbfree 370206 ndir 3231 nifree 1373859 nffree 18669
ncg 195 size 5869867 blocks 5688887
bsize 8192 shift 13 mask 0xffffe000
fsize 1024 shift 10 mask 0xfffffc00
frag 8 shift 3 fsbtodb 1
bpg 3780 fpg 30240 ipg 7296
minfree 5% optim time maxcontig 8 maxbpg 2048
and it is also mounted with "log".
AFAIK shark cannot boot from FFSv2 (the firmware loads the kernel directly
from the filesystem).
I cannot break into ddb when it hangs.
Martin
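For readers decoding the two dumpfs listings above, the size-related fields
are tied together by simple power-of-two arithmetic. The following is only a
sketch of that arithmetic, not code from dumpfs or the kernel: the helper
names are hypothetical, and the 512-byte sector size is taken from the
"512 bytes/sect" line in the dmesg.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: log2 of a power of two (e.g. 8192 -> 13). */
static int
ilog2_u32(uint32_t v)
{
    int n = 0;

    while (v > 1) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Mask that clears the offset within a block or fragment of this size. */
static uint32_t
size_mask(uint32_t size)
{
    return ~(size - 1);
}

/* "frag": number of fragments per block. */
static uint32_t
frags_per_block(uint32_t bsize, uint32_t fsize)
{
    return bsize / fsize;
}

/* "fsbtodb": shift converting a fragment number to a sector number. */
static int
fsbtodb_shift(uint32_t fsize, uint32_t secsize)
{
    return ilog2_u32(fsize / secsize);
}
```

With the 64M machine's values (bsize 8192, fsize 1024) this gives shift 13,
mask 0xffffe000, frag 8 and fsbtodb 1; with the 96M machine's values
(bsize 16384, fsize 2048) it gives shift 14, mask 0xffffc000, frag 8 and
fsbtodb 2, matching both listings.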
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-arm/51665: shark locks up during nightly run
Date: Tue, 3 Jan 2017 18:08:05 +0000
On Mon, Nov 28, 2016 at 08:20:01AM +0000, Martin Husemann wrote:
> file system: /dev/rwd0c
                        ^
Is that normal for shark?
> superblock location 8192 id [ 0 0 ]
                           ^^^^^^^^^^
This (on the small fs) seems odd, since newfs unconditionally does:
sblock.fs_id[0] = (long)tv.tv_sec; /* XXXfvdl huh? */
sblock.fs_id[1] = arc4random() & INT32_MAX;
There isn't any obvious reason why having zero there would cause a
problem, but it's also the only obvious thing so far to chase after.
(Except for "deadlock because out of memory", like other people have
seen recently; but if that were all I'd expect you to be able to get
into ddb.)
--
David A. Holland
dholland@netbsd.org
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-arm/51665: shark locks up during nightly run
Date: Sat, 14 Jan 2017 09:52:16 +0100
On Tue, Jan 03, 2017 at 06:10:01PM +0000, David Holland wrote:
> > file system: /dev/rwd0c
>                         ^
> Is that normal for shark?
Yes, no MBR.
> > superblock location 8192 id [ 0 0 ]
>                            ^^^^^^^^^^
> This (on the small fs) seems odd, since newfs unconditionally does:
>
> sblock.fs_id[0] = (long)tv.tv_sec; /* XXXfvdl huh? */
> sblock.fs_id[1] = arc4random() & INT32_MAX;
This disk is ancient, so I guess it comes from back when these
#define fs_old_headswitch fs_id[0]
#define fs_old_trkseek fs_id[1]
were still used (but useless).
There is no easy way to update it with in-tree tools, is there?
Martin
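The aliasing described in this message can be sketched in a few lines of C.
This is an illustration only: "struct mini_sblock" and both helper functions
are hypothetical stand-ins, the real superblock has many more fields, and the
assumption that an old newfs left both legacy tuning values at zero is a
guess based on the "id [ 0 0 ]" output, not something confirmed in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for the FFS superblock. */
struct mini_sblock {
    int32_t fs_id[2];    /* file system id in current layouts */
};

/* Aliases as quoted in the thread: the old fields share the same storage. */
#define fs_old_headswitch fs_id[0]
#define fs_old_trkseek    fs_id[1]

/*
 * Assumption: an old newfs stored zero in both legacy tuning fields.
 * Through the aliases above, that also zeroes what is now the id.
 */
static void
mini_old_newfs(struct mini_sblock *sb)
{
    sb->fs_old_headswitch = 0;
    sb->fs_old_trkseek = 0;
}

/* Returns 1 when the id reads as "[ 0 0 ]", as on the 64M machine. */
static int
mini_id_is_unset(const struct mini_sblock *sb)
{
    return sb->fs_id[0] == 0 && sb->fs_id[1] == 0;
}
```

Under these assumptions, a superblock written through the old field names
reads back with a zero id, which would be consistent with the dumpfs output
from the older disk.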
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-arm/51665: shark locks up during nightly run
Date: Tue, 17 Jan 2017 18:43:11 +0100
Now this is funny...
I updated the machine to -current and it survived three nightly runs.
But: now I can make it lock up on demand by running:
cd /usr/tests/bin/ps
atf-run | atf-report
and it hangs here:
Tests root: /usr/tests/bin/ps
t_ps (1/1): 8 test cases
default_columns: [0.478048s] Passed.
duplicate_column: [0.071451s] Passed.
minus_O:
No break into ddb, does not answer pings.
The other shark (with more memory) runs the test just fine:
Tests root: /usr/tests/bin/ps
t_ps (1/1): 8 test cases
default_columns: [17.447618s] Passed.
duplicate_column: [3.776681s] Passed.
minus_O: [6.962482s] Passed.
minus_o: [8.462016s] Passed.
override_heading_all_null: [6.037955s] Passed.
override_heading_embedded_specials: [7.169616s] Passed.
override_heading_simple: [7.478057s] Passed.
override_heading_some_null: [7.134459s] Passed.
[64.606009s]
Summary for 1 test programs:
8 passed test cases.
0 failed test cases.
0 expected failed test cases.
0 skipped test cases.
While updating I noticed that besides the memory installed, another difference
is the very slow root disk.
Martin
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-arm/51665: shark locks up during nightly run
Date: Tue, 17 Jan 2017 18:44:25 +0000
On Sat, Jan 14, 2017 at 08:55:01AM +0000, Martin Husemann wrote:
> On Tue, Jan 03, 2017 at 06:10:01PM +0000, David Holland wrote:
> > > file system: /dev/rwd0c
> >                         ^
> > Is that normal for shark?
>
> Yes, no MBR.
Using the "all" partition (whether it's c or d) isn't usually
normal... and I dimly recall that it can cause problems, but not
what.
> > > superblock location 8192 id [ 0 0 ]
> >                            ^^^^^^^^^^
> > This (on the small fs) seems odd, since newfs unconditionally does:
> >
> > sblock.fs_id[0] = (long)tv.tv_sec; /* XXXfvdl huh? */
> > sblock.fs_id[1] = arc4random() & INT32_MAX;
>
> This disk is ancient, so I guess it comes from back when these
>
> #define fs_old_headswitch fs_id[0]
> #define fs_old_trkseek fs_id[1]
>
> were still used (but useless).
>
> There is no easy way to update it with in-tree tools, is there?
No, but it shouldn't actually matter.
--
David A. Holland
dholland@netbsd.org
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-arm/51665: shark locks up during nightly run
Date: Tue, 17 Jan 2017 18:48:57 +0000
On Tue, Jan 17, 2017 at 05:45:00PM +0000, Martin Husemann wrote:
> Now this is funny...
>
> I updated the machine to -current and it survived three nightly runs.
> But: now I can make it lock up on demand by running:
>
> cd /usr/tests/bin/ps
> atf-run | atf-report
>
> and it hangs here:
>
> Tests root: /usr/tests/bin/ps
>
> t_ps (1/1): 8 test cases
> default_columns: [0.478048s] Passed.
> duplicate_column: [0.071451s] Passed.
> minus_O:
>
>
> No break into ddb, does not answer pings.
Bizarre...
Does it also die without atf-run? That would make it possible to
insert debug prints. (Otherwise atf will just hide them from you for
atf reasons.)
> While updating I noticed that besides the memory installed,
> another difference is the very slow root disk.
That's consistent with it tickling some kind of race, but not very
enlightening :-(
--
David A. Holland
dholland@netbsd.org
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-arm/51665: shark locks up during nightly run
Date: Tue, 17 Jan 2017 20:04:15 +0100
On Tue, Jan 17, 2017 at 06:45:01PM +0000, David Holland wrote:
> Using the "all" partition (whether it's c or d) isn't usually
> normal... and I dimly recall that it can cause problems, but not
> what.
Oh, I thought you were asking about 'c' vs. 'd'.
The 'a' partition starts at offset 0, and I likely just typoed the command;
the dumpfs output for wd0a is identical (of course) - and fstab has / on wd0a.
And for | atf-report - of course now I can't make it hang any more; really
seems to be some kind of race.
Martin
State-Changed-From-To: open->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Thu, 19 Oct 2017 13:09:19 +0000
State-Changed-Why:
The hard lockups did go away.
Main changes: a new power supply, various pmap fixes
>Unformatted: