NetBSD Problem Report #51665

From martin@duskware.de  Sun Nov 27 11:38:32 2016
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id DA51C7A26A
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 27 Nov 2016 11:38:32 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: shark locks up during nightly run
X-Send-Pr-Version: 3.95

>Number:         51665
>Category:       port-arm
>Synopsis:       shark locks up during nightly run
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-arm-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Nov 27 11:40:00 +0000 2016
>Closed-Date:    Thu Oct 19 13:09:19 +0000 2017
>Last-Modified:  Thu Oct 19 13:09:19 +0000 2017
>Originator:     Martin Husemann
>Release:        NetBSD 7.99.42
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD drowsy.duskware.de 7.99.42 NetBSD 7.99.42 (GENERIC) #28: Tue Nov 22 12:31:52 CET 2016 martin@martins.aprisoft.de:/ssd/src/sys/arch/shark/compile/GENERIC shark
Architecture: earmv4
Machine: shark
>Description:

A shark with 64MB memory installed reproducibly locks up during the nightly
security run. An identical machine with 96MB RAM works fine.

The only difference between the machines is:


 NetBSD 7.99.42 (GENERIC) #28: Tue Nov 22 12:31:52 CET 2016
 	martin@martins.aprisoft.de:/ssd/src/sys/arch/shark/compile/GENERIC
-total memory = 65536 KB
-avail memory = 58916 KB
+total memory = 98304 KB
+avail memory = 91012 KB
 sysctl_createv: sysctl_create(machine_arch) returned 17
 timecounter: Timecounters tick every 15.625 msec
 mainbus0 (root)
@@ -32,7 +32,7 @@
 openprom at ofbus0 not configured
 options at ofbus0 not configured
 aliases at ofbus0 not configured
-memory@e000000 at ofbus0 not configured
+memory@f000000 at ofbus0 not configured
 mmu at ofbus0 not configured
 ofbus2 at ofbus0 (vlbus)
 ofisa0 at ofbus2 (isa)
@@ -52,7 +52,7 @@
 scr0 at ofisascr0
 com1 at ofisa0 (ir@i2f8): ns16550a, working fifo
 cs0 at ofisa0 (ethernet@i300): CRUS,CS8900
-cs0: CS8900 rev. F, address 08:00:2b:81:62:5e, media UTP
+cs0: CS8900 rev. F, address 08:00:2b:81:65:72, media UTP
 joy0 at ofisa0 (game@i201): ESST,game
 joy0: joystick not connected
 midi@i330 at ofisa0 not configured
@@ -90,13 +90,11 @@
 clock: hz=64 stathz = 0 profhz = 0
 timecounter: Timecounter "i8253" frequency 1193182 Hz quality 100
 wd0 at atabus0 drive 0
-wd0: <IBM-DBCA-206480>
-wd0: drive supports 16-sector PIO transfers, LBA addressing
-wd0: 6194 MB, 13424 cyl, 15 head, 63 sec, 512 bytes/sect x 12685680 sectors
-wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
+wd0: <SAMSUNG HM120IC>
+wd0: drive supports 16-sector PIO transfers, LBA48 addressing
+wd0: 111 GB, 232581 cyl, 16 head, 63 sec, 512 bytes/sect x 234441648 sectors
+wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
 boot device: wd0
 root on wd0a dumps on wd0b


Full dmesg of the "low memory" machine:

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 7.99.42 (GENERIC) #28: Tue Nov 22 12:31:52 CET 2016
	martin@martins.aprisoft.de:/ssd/src/sys/arch/shark/compile/GENERIC
total memory = 65536 KB
avail memory = 58916 KB
sysctl_createv: sysctl_create(machine_arch) returned 17
timecounter: Timecounters tick every 15.625 msec
mainbus0 (root)
cpu0 at mainbus0 core 0: SA-110 step S (SA-1 V4 core)
cpu0: DC enabled IC enabled WB enabled EABT
cpu0: 16KB/32B 32-way L1 VIVT Instruction cache
cpu0: 16KB/32B 32-way write-back L1 VIVT Data cache
ofbus0 (root)
ofbus1 at ofbus0 (packages)
client-services at ofbus1 not configured
terminal-emulator at ofbus1 not configured
stringio at ofbus1 not configured
deblocker at ofbus1 not configured
obp-tftp at ofbus1 not configured
ufs-file-system at ofbus1 not configured
fat-file-system at ofbus1 not configured
iso9660-file-system at ofbus1 not configured
disk-label at ofbus1 not configured
dropin-file-system at ofbus1 not configured
sound.wav at ofbus1 not configured
chosen at ofbus0 not configured
openprom at ofbus0 not configured
options at ofbus0 not configured
aliases at ofbus0 not configured
memory@e000000 at ofbus0 not configured
mmu at ofbus0 not configured
ofbus2 at ofbus0 (vlbus)
ofisa0 at ofbus2 (isa)
dma-controller@i00 at ofisa0 not configured
interrupt-controller@i20 at ofisa0 not configured
timer@i40 at ofisa0 not configured
configuration@i15c at ofisa0 not configured
com0 at ofisa0 (serial@i3f8): ns16550a, working fifo
com0: console
lpt0 at ofisa0 (parallel@i378)
pckbc0 at ofisa0 (8042@i60)
power@i380 at ofisa0 not configured
ofbus3 at ofisa0 (gpio@i3e0)
eeprom at ofbus3 not configured
ofrtc0 at ofisa0 (rtc@i70): rtc
ofisascr0 at ofisa0 (scr@i24)
scr0 at ofisascr0
com1 at ofisa0 (ir@i2f8): ns16550a, working fifo
cs0 at ofisa0 (ethernet@i300): CRUS,CS8900
cs0: CS8900 rev. F, address 08:00:2b:81:62:5e, media UTP
joy0 at ofisa0 (game@i201): ESST,game
joy0: joystick not connected
midi@i330 at ofisa0 not configured
ess0 at ofisa0 (sound@i220): ESST,es1887-codec
ess0: ESS Technology ES1887 [version 0x688b]
ess0: audio1 interrupting at irq 9
ess0: audio2 interrupting at irq 15
audio0 at ess0: full duplex, playback, capture, mmap, independent
opl0 at ess0: model OPL3
midi0 at opl0: ESS Yamaha OPL3
wdc0 at ofisa0 (ide@i1f0)
atabus0 at wdc0 channel 0
pci at ofbus2 not configured
igsfb0 at ofbus2 (display@it3b0): IGS CyberPro 2010 at 0x06000000
unable to find font Gallant 12x22
igsfb0: 2MB, 1024x768, 8bpp
igsfb0: using 8bpp for X
wsdisplay0 at igsfb0 kbdmux 1
wsmux1: connecting to wsdisplay0
ofrom0 at ofbus0 (flash@7000000): 0x7000000-0x707ffff
ofrom1 at ofbus0 (romcard@10000000): 0x10000000-0x10ffffff
ofbus4 at ofbus0 (cpus)
cpu@0 at ofbus4 not configured
ofbus5 at ofbus0 (udp)
nfs at ofbus5 not configured
spl_masks[0]=ffffffff
spl_masks[1]=ffffffff
spl_masks[2]=ffffffff
spl_masks[3]=ffffffff
spl_masks[4]=ffffffff
spl_masks[5]=ffffbf5f
spl_masks[6]=ffff3d5f
spl_masks[7]=ffff3d47
timecounter: Timecounter "clockinterrupt" frequency 64 Hz quality 0
clock: hz=64 stathz = 0 profhz = 0
timecounter: Timecounter "i8253" frequency 1193182 Hz quality 100
wd0 at atabus0 drive 0
wd0: <IBM-DBCA-206480>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 6194 MB, 13424 cyl, 15 head, 63 sec, 512 bytes/sect x 12685680 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
boot device: wd0
root on wd0a dumps on wd0b


>How-To-Repeat:
s/a

>Fix:
n/a

>Release-Note:

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-arm/51665: shark locks up during nightly run
Date: Sun, 27 Nov 2016 20:14:56 +0000

 On Sun, Nov 27, 2016 at 11:40:00AM +0000, martin@NetBSD.org wrote:
  > A shark with 64MB memory installed reproducably locks up during the nightly
  > security run. An identical machine with 96MB ram works fine.

 What FS(es) are you using? Also, can you get into ddb, and if so,
 what wchans are involved?

 -- 
 David A. Holland
 dholland@netbsd.org

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-arm/51665: shark locks up during nightly run
Date: Mon, 28 Nov 2016 09:17:28 +0100

 On Sun, Nov 27, 2016 at 08:15:02PM +0000, David Holland wrote:
 >  What FS(es) are you using? Also, can you get into ddb, and if so,
 >  what wchans are involved?

 Yeah, sorry, I forgot to add some important details.

 On the (working) 96M machine:

 file system: /dev/rwd0c
 format  FFSv1
 endian  little-endian
 magic   11954           time    Mon Nov 28 07:50:02 2016
 superblock location     8192    id      [ 4c4ad59b 16a2beb3 ]
 cylgrp  dynamic inodes  4.4BSD  sblock  FFSv2   fslevel 4
 nbfree  6838896 ndir    9473    nifree  14130460        nffree  18293
 ncg     610     size    57561588        blocks  56663655
 bsize   16384   shift   14      mask    0xffffc000
 fsize   2048    shift   11      mask    0xfffff800
 frag    8       shift   3       fsbtodb 2
 bpg     11796   fpg     94368   ipg     23296
 minfree 5%      optim   time    maxcontig 4     maxbpg  4096

 and it is mounted with the log option.

 On the non-working 64M machine:

 file system: /dev/rwd0c
 format  FFSv1
 endian  little-endian
 magic   11954           time    Mon Nov 28 09:14:43 2016
 superblock location     8192    id      [ 0 0 ]
 cylgrp  dynamic inodes  4.4BSD  sblock  FFSv2   fslevel 4
 nbfree  370206  ndir    3231    nifree  1373859 nffree  18669
 ncg     195     size    5869867 blocks  5688887
 bsize   8192    shift   13      mask    0xffffe000
 fsize   1024    shift   10      mask    0xfffffc00
 frag    8       shift   3       fsbtodb 1
 bpg     3780    fpg     30240   ipg     7296
 minfree 5%      optim   time    maxcontig 8     maxbpg  2048

 and it is also mounted with "log".

 AFAIK shark cannot boot from FFSv2 (the firmware loads the kernel directly
 from the file system).

 I cannot break into ddb when it hangs.

 Martin

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-arm/51665: shark locks up during nightly run
Date: Tue, 3 Jan 2017 18:08:05 +0000

 On Mon, Nov 28, 2016 at 08:20:01AM +0000, Martin Husemann wrote:
  >  file system: /dev/rwd0c
                           ^
 Is that normal for shark?

  >  superblock location     8192    id      [ 0 0 ]
                                     ^^^^^^^^^^^^^^^
 This (on the small fs) seems odd, since newfs unconditionally does:

         sblock.fs_id[0] = (long)tv.tv_sec;      /* XXXfvdl huh? */
         sblock.fs_id[1] = arc4random() & INT32_MAX;

 There isn't any obvious reason why having zero there would cause a
 problem, but it's also the only obvious thing so far to chase after.

 (Except for "deadlock because out of memory", like other people have
 seen recently; but if that were all I'd expect you to be able to get
 into ddb.)

 -- 
 David A. Holland
 dholland@netbsd.org

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-arm/51665: shark locks up during nightly run
Date: Sat, 14 Jan 2017 09:52:16 +0100

 On Tue, Jan 03, 2017 at 06:10:01PM +0000, David Holland wrote:
 >   >  file system: /dev/rwd0c
 >                            ^
 >  Is that normal for shark?

 Yes, no MBR.

 >   >  superblock location     8192    id      [ 0 0 ]
 >                                      ^^^^^^^^^^^^^^^
 >  This (on the small fs) seems odd, since newfs unconditionally does:
 >  
 >          sblock.fs_id[0] = (long)tv.tv_sec;      /* XXXfvdl huh? */
 >          sblock.fs_id[1] = arc4random() & INT32_MAX;

 This disk is ancient, so I guess it comes from back when these

 #define      fs_old_headswitch       fs_id[0]
 #define      fs_old_trkseek  fs_id[1]

 were still used (but useless).

 There is no easy way to update it with in-tree tools, is there?

 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-arm/51665: shark locks up during nightly run
Date: Tue, 17 Jan 2017 18:43:11 +0100

 Now this is funny...

 I updated the machine to -current and it survived three nightly runs.
 But: now I can make it lock up on demand by running:

 	cd /usr/tests/bin/ps
 	atf-run | atf-report

 and it hangs here:

 Tests root: /usr/tests/bin/ps

 t_ps (1/1): 8 test cases
     default_columns: [0.478048s] Passed.
     duplicate_column: [0.071451s] Passed.
     minus_O: 


 No break into ddb, does not answer pings.

 The other shark (with more memory) runs the test just fine:

 Tests root: /usr/tests/bin/ps

 t_ps (1/1): 8 test cases
     default_columns: [17.447618s] Passed.
     duplicate_column: [3.776681s] Passed.
     minus_O: [6.962482s] Passed.
     minus_o: [8.462016s] Passed.
     override_heading_all_null: [6.037955s] Passed.
     override_heading_embedded_specials: [7.169616s] Passed.
     override_heading_simple: [7.478057s] Passed.
     override_heading_some_null: [7.134459s] Passed.
 [64.606009s]

 Summary for 1 test programs:
     8 passed test cases.
     0 failed test cases.
     0 expected failed test cases.
     0 skipped test cases.


 While updating I noticed that besides the memory installed, another difference
 is the very slow root disk.

 Martin

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-arm/51665: shark locks up during nightly run
Date: Tue, 17 Jan 2017 18:44:25 +0000

 On Sat, Jan 14, 2017 at 08:55:01AM +0000, Martin Husemann wrote:
  >  On Tue, Jan 03, 2017 at 06:10:01PM +0000, David Holland wrote:
  >  >   >  file system: /dev/rwd0c
  >  >                            ^
  >  >  Is that normal for shark?
  >  
  >  Yes, no MBR.

 Using the "all" partition (whether it's c or d) isn't usually
 normal... and I dimly recall that it can cause problems, but not
 what.

  >  >   >  superblock location     8192    id      [ 0 0 ]
  >  >                                      ^^^^^^^^^^^^^^^
  >  >  This (on the small fs) seems odd, since newfs unconditionally does:
  >  >  
  >  >          sblock.fs_id[0] = (long)tv.tv_sec;      /* XXXfvdl huh? */
  >  >          sblock.fs_id[1] = arc4random() & INT32_MAX;
  >  
  >  This disk is ancient, so I guess it comes from back when these
  >  
  >  #define      fs_old_headswitch       fs_id[0]
  >  #define      fs_old_trkseek  fs_id[1]
  >  
  >  were still used (but useless).
  >  
  >  There is no easy way to update it with in-tree tools, isn't it?

 No, but it shouldn't actually matter.

 -- 
 David A. Holland
 dholland@netbsd.org

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-arm/51665: shark locks up during nightly run
Date: Tue, 17 Jan 2017 18:48:57 +0000

 On Tue, Jan 17, 2017 at 05:45:00PM +0000, Martin Husemann wrote:
  >  Now this is funny...
  >  
  >  I updated the machine to -current and it survived three nightly runs.
  >  But: now I can make it lock up on demand by running:
  >  
  >  	cd /usr/tests/bin/ps
  >  	atf-run | atf-report
  >  
  >  and it hangs here:
  >  
  >  Tests root: /usr/tests/bin/ps
  >  
  >  t_ps (1/1): 8 test cases
  >      default_columns: [0.478048s] Passed.
  >      duplicate_column: [0.071451s] Passed.
  >      minus_O: 
  >  
  >  
  >  No break into ddb, does not answer pings.

 Bizarre...

 Does it also die without atf-run? That would make it possible to
 insert debug prints. (Otherwise atf will just hide them from you for
 atf reasons.)

  > While updating I noticed that besides the memory installed,
  > another difference is the very slow root disk.

 That's consistent with it tickling some kind of race, but not very
 enlightening :-(

 -- 
 David A. Holland
 dholland@netbsd.org

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-arm/51665: shark locks up during nightly run
Date: Tue, 17 Jan 2017 20:04:15 +0100

 On Tue, Jan 17, 2017 at 06:45:01PM +0000, David Holland wrote:
 >  Using the "all" partition (whether it's c or d) isn't usually
 >  normal... and I dimly recall that it can cause problems, but not
 >  what.

 Oh, I thought you were asking about 'c' vs. 'd'.

 The 'a' partition starts at offset 0, and I likely just mistyped the command;
 the dumpfs output for wd0a is identical (of course) - and fstab has / on wd0a.

 And as for atf-run | atf-report - of course now I can't make it hang any more;
 it really seems to be some kind of race.

 Martin

State-Changed-From-To: open->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Thu, 19 Oct 2017 13:09:19 +0000
State-Changed-Why:
The hard lockups went away.
Main changes: a new power supply and various pmap fixes.


>Unformatted:

