NetBSD Problem Report #53628

From louis@zabrico.com  Mon Sep 24 15:36:08 2018
Return-Path: <louis@zabrico.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 7C3787A183
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 24 Sep 2018 15:36:08 +0000 (UTC)
Message-Id: <20180924153557.3D4E03A71B@maat.zabrico.com>
Date: Mon, 24 Sep 2018 11:35:57 -0400 (EDT)
From: louis@zabrico.com
Reply-To: louis@zabrico.com
To: gnats-bugs@NetBSD.org
Subject: Regular panic. Possibly twa driver.
X-Send-Pr-Version: 3.95

>Number:         53628
>Category:       port-i386
>Synopsis:       Regular panic. Possibly twa driver.
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    port-i386-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Sep 24 15:40:00 +0000 2018
>Last-Modified:  Wed Oct 10 13:20:00 +0000 2018
>Originator:     Louis Guillaume
>Release:        NetBSD 8.0_STABLE
>Environment:
System: NetBSD maat.zabrico.com 8.0_STABLE NetBSD 8.0_STABLE (GENERIC) #4: Fri Aug 31 04:07:15 EDT 2018 louis@maat.zabrico.com:/usr/obj/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
I've been getting recurring panics on NetBSD-8/i386 since upgrading from netbsd-7. The kernel is GENERIC, built locally without modification. A dmesg excerpt is at the bottom of this message.

Unfortunately the dump device is not set up well here: swap is on RAIDframe on wedges, which I'm starting to believe was a bad idea. The logs show the message "dump device bad". A photo of the panic message is available here:

  http://zabrico.com/~louis/IMG_20180921_081608.jpg

Also noteworthy: I'm using the twa driver with RAIDframe on GPT wedges, and LVM with FFSv2 file systems mounted with the "log" (WAPBL) option. Others have suggested a bug in the twa driver.

See also:

  http://mail-index.netbsd.org/port-i386/2018/09/22/msg003744.html
  http://mail-index.netbsd.org/current-users/2017/11/15/msg032649.html  

Please let me know if I can provide more details.

--
Louis   




NetBSD 8.0_STABLE (GENERIC) #4: Fri Aug 31 04:07:15 EDT 2018
         louis@xxxxxxxxx:/usr/obj/sys/arch/i386/compile/GENERIC
total memory = 2047 MB
avail memory = 1994 MB
timecounter: Timecounters tick every 10.000 msec
Kernelized RAIDframe activated
running cgd selftest aes-xts-256 aes-xts-512 done
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100 
To Be Filled By O.E.M. To Be Filled By O.E.M. (To Be Filled By O.E.M.)
mainbus0 (root)
ACPI: RSDP 0x00000000000F5D10 000014 (v00 ACPIAM)
ACPI: RSDT 0x000000007FFF0000 000030 (v01 A M I  OEMRSDT  09000230 MSFT 00000097)
ACPI: FACP 0x000000007FFF0200 000081 (v02 A M I  OEMFACP  09000230 MSFT 00000097)
ACPI: DSDT 0x000000007FFF0400 002B73 (v01 0AAZB  0AAZB015 00000015 MSFT 0100000D)
ACPI: FACS 0x000000007FFFF000 000040
ACPI: APIC 0x000000007FFF0300 000084 (v01 A M I  OEMAPIC  09000230 MSFT 00000097)
ACPI: OEMB 0x000000007FFFF040 000055 (v01 A M I  OEMBIOS  09000230 MSFT 00000097)
ACPI: Executed 1 blocks of module-level executable AML code
ACPI: 1 ACPI AML tables successfully acquired and loaded
ioapic0 at mainbus0 apid 8: pa 0xfec00000, version 0x20, 24 pins
ioapic1 at mainbus0 apid 9: pa 0xfec80000, version 0x20, 24 pins
ioapic2 at mainbus0 apid 10: pa 0xfec80400, version 0x20, 24 pins
cpu0 at mainbus0 apid 0
cpu0: Intel(R) XEON(TM) CPU 2.40GHz, id 0xf24
cpu0: package 0, core 0, smt 0
cpu1 at mainbus0 apid 1
cpu1: Intel(R) XEON(TM) CPU 2.40GHz, id 0xf24
cpu1: package 0, core 0, smt 1
cpu2 at mainbus0 apid 6
cpu2: Intel(R) XEON(TM) CPU 2.40GHz, id 0xf24
cpu2: package 3, core 0, smt 0
cpu3 at mainbus0 apid 7
cpu3: Intel(R) XEON(TM) CPU 2.40GHz, id 0xf24
cpu3: package 3, core 0, smt 1
acpi0 at mainbus0: Intel ACPICA 20170303
acpi0: X/RSDT: OemId <A M I ,OEMRSDT ,09000230>, AslId <MSFT,00000097>
acpi0: SCI interrupting at int 9
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x43 irq 0
pckbc1 at acpi0 (PS2K, PNP0303) (kbd port): io 0x60,0x64 irq 1
pckbc2 at acpi0 (PS2M, PNP0F03) (aux port): irq 12
pcppi1 at acpi0 (SPKR, PNP0800): io 0x61
midi0 at pcppi1: PC speaker
sysbeep0 at pcppi1
COPR (PNP0C04) at acpi0 not configured
SYSR (PNP0C02) at acpi0 not configured
FWH (INT0800) at acpi0 not configured
OSYS (PNP0C02) at acpi0 not configured
SYSM (PNP0C01) at acpi0 not configured
acpibut0 at acpi0 (SPBT, PNP0C0C-255): ACPI Power Button
apm0 at acpi0: Power Management spec V1.2
pckbd0 at pckbc1 (kbd slot)
pckbc1: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard
attimer1: attached to pcppi1
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0: vendor 8086 product 2540 (rev. 0x03)
ppb0 at pci0 dev 2 function 0: vendor 8086 product 2543 (rev. 0x03)
pci1 at ppb0 bus 2
pci1: i/o space, memory space enabled
vendor 8086 product 1461 (interrupt system, IO(x) APIC, revision 0x03) at pci1 dev 28 function 0 not configured
ppb1 at pci1 dev 29 function 0: vendor 8086 product 1460 (rev. 0x03)
pci2 at ppb1 bus 4
pci2: i/o space, memory space enabled
wm0 at pci2 dev 1 function 0: Intel i82545EM 1000BASE-T Ethernet (rev. 0x01)
wm0: interrupting at ioapic2 pin 0
wm0: 64-bit 100MHz PCIX bus
wm0: 64 words (6 address bits) MicroWire EEPROM
wm0: Ethernet address 00:e0:81:23:f8:37
wm0: 0x201c02<LOCK_EECD,IOH_VALID,BUS64,PCIX,WOL>
makphy0 at wm0 phy 1: Marvell 88E1011 Gigabit PHY, rev. 3
makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
vendor 8086 product 1461 (interrupt system, IO(x) APIC, revision 0x03) at pci1 dev 30 function 0 not configured
ppb2 at pci1 dev 31 function 0: vendor 8086 product 1460 (rev. 0x03)
pci3 at ppb2 bus 3
pci3: i/o space, memory space enabled
twa0 at pci3 dev 3 function 0: 3ware 9550SX series (rev. 0x00)
twa0: interrupting at ioapic1 pin 0
twa0: AEN 0x0053: INFO: Battery capacity test needed:
twa0: 8 ports, Firmware FE9X 3.08.00.022, BIOS BE9X 3.08.00.004
twa0: Monitor BL9X 3.01.00.006, PCB Rev 032 , Achip 1.70    , Pchip 1.60
twa0: port 0: ST2000LM003 HN-M201RAD                   1907729 MB
twa0: port 1: ST2000LM003 HN-M201RAD                   1907729 MB
twa0: port 6: ST31000340AS                             953869 MB
twa0: port 7: WDC WD20EZRX-00D8PB0                     1907729 MB
twa0: AMCC    9550SX-8LP DISK 3.08AGC11854F975D0005898
ld0 at twa0 unit 0
ld0: 1862 GB, 243151 cyl, 255 head, 63 sec, 512 bytes/sect x 3906228224 sectors
twa0: AMCC    9550SX-8LP DISK 3.08CH117795F9744600A1E6
ld1 at twa0 unit 1
ld1: 1862 GB, 243151 cyl, 255 head, 63 sec, 512 bytes/sect x 3906228224 sectors
twa0: AMCC    9550SX-8LP DISK 3.086QJ04D9TFF0414003736
ld2 at twa0 unit 2
ld2: 931 GB, 121575 cyl, 255 head, 63 sec, 512 bytes/sect x 1953103872 sectors
twa0: AMCC    9550SX-8LP DISK 3.08M205246162E1F500C060
ld3 at twa0 unit 3
ld3: 1862 GB, 243151 cyl, 255 head, 63 sec, 512 bytes/sect x 3906228224 sectors
uhci0 at pci0 dev 29 function 0: vendor 8086 product 2482 (rev. 0x02)
uhci0: interrupting at ioapic0 pin 16
usb0 at uhci0: USB revision 1.0
ppb3 at pci0 dev 30 function 0: vendor 8086 product 244e (rev. 0x42)
pci4 at ppb3 bus 1
pci4: i/o space, memory space enabled
fxp0 at pci4 dev 1 function 0: i82551 Ethernet (rev. 0x10)
fxp0: interrupting at ioapic0 pin 17
fxp0: Ethernet address 00:e0:81:23:f8:36
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vga0 at pci4 dev 2 function 0: vendor 1002 product 4752 (rev. 0x27)
wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation), using wskbd0
wsmux1: connecting to wsdisplay0
drm at vga0 not configured
ichlpcib0 at pci0 dev 31 function 0: vendor 8086 product 2480 (rev. 0x02)
timecounter: Timecounter "ichlpcib0" frequency 3579545 Hz quality 1000
ichlpcib0: 24-bit timer
tco0 at ichlpcib0: TCO (watchdog) timer configured.
tco0: Min/Max interval 2/37 seconds
piixide0 at pci0 dev 31 function 1: Intel 82801CA IDE Controller (ICH3) (rev. 0x02)
piixide0: bus-master DMA support present
piixide0: primary channel configured to compatibility mode
piixide0: primary channel interrupting at ioapic0 pin 14
atabus0 at piixide0 channel 0
piixide0: secondary channel configured to compatibility mode
piixide0: secondary channel interrupting at ioapic0 pin 15
atabus1 at piixide0 channel 1
ichsmb0 at pci0 dev 31 function 3: vendor 8086 product 2483 (rev. 0x02)
ichsmb0: interrupting at ioapic0 pin 17
iic0 at ichsmb0: I2C bus
isa0 at ichlpcib0
acpicpu0 at cpu0: ACPI CPU
acpicpu0: C1: HLT, lat   0 us, pow     0 mW
acpicpu0: T0: I/O, lat   1 us, pow     0 mW, 100 %
acpicpu0: T1: I/O, lat   1 us, pow     0 mW,  88 %
acpicpu0: T2: I/O, lat   1 us, pow     0 mW,  76 %
acpicpu0: T3: I/O, lat   1 us, pow     0 mW,  64 %
acpicpu0: T4: I/O, lat   1 us, pow     0 mW,  52 %
acpicpu0: T5: I/O, lat   1 us, pow     0 mW,  40 %
acpicpu0: T6: I/O, lat   1 us, pow     0 mW,  28 %
acpicpu0: T7: I/O, lat   1 us, pow     0 mW,  16 % 
acpicpu1 at cpu1: ACPI CPU
acpicpu2 at cpu2: ACPI CPU      
acpicpu3 at cpu3: ACPI CPU
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0     
twa0: AEN 0x0001: INFO: Controller reset occurred: resets=1
ld2: GPT GUID: 80f0fc79-31c3-42e8-a266-1df5c78996da
dk0 at ld2: "pv1", 1953103616 blocks at 128, type: raidframe
IPsec: Initialized Security Association Processing.
uhub0 at usb0: vendor 8086 (0x8086) UHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
ld0: GPT GUID: 0e9593b3-65ff-493c-bb2b-cfc6bf839c6d
dk1 at ld0: "boot1", 524288 blocks at 128, type: ffs    
dk2 at ld0: "disk1", 3905703680 blocks at 524416, type: raidframe
ld1: GPT GUID: fc0d50d8-d5aa-4a48-b731-2b91560e3c54
dk3 at ld1: "boot0", 524288 blocks at 128, type: ffs      
dk4 at ld1: "disk0", 3905703680 blocks at 524416, type: raidframe
ld3: GPT GUID: 84ec71e6-0701-4f85-bd25-a466d9e63719
dk5 at ld3: "7c79f4aa-075b-4c61-a8c5-352e492b395a", 3906227968 blocks at 128, type: raidframe
raid2: RAID Level 1     
raid2: Components: /dev/dk4 /dev/dk2
raid2: Total Sectors: 3905703616 (1907081 MB)
raid2: GPT GUID: d771ba77-891e-430e-84f3-18e03b655f6c
dk6 at raid2: "raid2a", 4194304 blocks at 128, type: ffs
dk7 at raid2: "swap", 8388608 blocks at 4194432, type: swap
dk8 at raid2: "pv0", 3893120543 blocks at 12583040, type: raidframe
boot device: dk6
root on dk6 dumps on dk7


>How-To-Repeat:
>Fix:

>Audit-Trail:
From: coypu@sdf.org
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-i386/53628: Regular panic. Possibly twa driver.
Date: Sat, 6 Oct 2018 01:46:47 +0000

 Some people have suggested it might be caused by ld having been made
 MPSAFE while twa is not MPSAFE.

 Here's a diff that marks ld as not MPSAFE, so that you have a usable
 system until a proper fix is found. I've also uploaded a netbsd-8 kernel
 built with this change here, for your convenience:
 https://ftp.netbsd.org/pub/NetBSD/misc/maya/netbsd8-ld-mpsafe

 digest sha512 /home/fly/obj8/sys/arch/amd64/compile/GENERIC/netbsd
 SHA512 (/home/fly/obj8/sys/arch/amd64/compile/GENERIC/netbsd) = 765d8dcd773674cf75d99158f800d36ef523ce3045e1f2ee67518fefe14865271cea1d769016d7d4485071f24f651796cc98e03fe190758df5f2012291ada802


 diff --git a/sys/dev/ld.c b/sys/dev/ld.c
 index 62d470c15d3c..1be2666a9ac3 100644
 --- a/sys/dev/ld.c
 +++ b/sys/dev/ld.c
 @@ -92,7 +92,7 @@ const struct bdevsw ld_bdevsw = {
  	.d_dump = lddump,
  	.d_psize = ldsize,
  	.d_discard = lddiscard,
 -	.d_flag = D_DISK | D_MPSAFE
 +	.d_flag = D_DISK
  };

  const struct cdevsw ld_cdevsw = {
 @@ -107,7 +107,7 @@ const struct cdevsw ld_cdevsw = {
  	.d_mmap = nommap,
  	.d_kqfilter = nokqfilter,
  	.d_discard = lddiscard,
 -	.d_flag = D_DISK | D_MPSAFE
 +	.d_flag = D_DISK
  };

  static struct	dkdriver lddkdriver = {

From: coypu@sdf.org
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-i386/53628: Regular panic. Possibly twa driver.
Date: Sat, 6 Oct 2018 02:36:41 +0000

 Whoops, I've re-uploaded it, now built for i386 :)
 (I hope you didn't use the amd64 one before I noticed and deleted it.)

 you can change kernels with:
 cp /netbsd /onetbsd # so dropping to boot prompt and typing 'boot onetbsd' still works for sure
 cp new-kernel /netbsd

From: Louis Guillaume <louis@zabrico.com>
To: gnats-bugs@NetBSD.org, port-i386-maintainer@netbsd.org,
        gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: port-i386/53628: Regular panic. Possibly twa driver.
Date: Sat, 6 Oct 2018 16:15:41 -0400

 This is great, thank you! Everything seems to be working well so far.
 If everything still works in a week or so, I'll follow up to ask about
 pulling the patch up to netbsd-8, if that's OK.

 Louis


From: Michael van Elst <mlelstv@serpens.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-i386/53628 Regular panic. Possibly twa driver.
Date: Wed, 10 Oct 2018 15:19:32 +0200

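 The ld driver is supposed to be MPSAFE; just removing the flag is a bit
 too heavy-handed.

 Maybe it is enough to take the kernel lock only around calls into the
 backend's ioctl hook, and only when the backend is not marked LDF_MPSAFE: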

 Index: ld.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/ld.c,v
 retrieving revision 1.104
 diff -u -r1.104 ld.c
 --- ld.c        28 Oct 2017 03:47:24 -0000      1.104
 +++ ld.c        10 Oct 2018 13:17:52 -0000
 @@ -369,7 +369,11 @@
                 return (error);

         if (sc->sc_ioctl) {
 +               if ((sc->sc_flags & LDF_MPSAFE) == 0)
 +                       KERNEL_LOCK(1, curlwp);
                 error = (*sc->sc_ioctl)(sc, cmd, addr, flag, 0);
 +               if ((sc->sc_flags & LDF_MPSAFE) == 0)
 +                       KERNEL_UNLOCK_ONE(curlwp);
                 if (error != EPASSTHROUGH)
                         return (error);
         }
 @@ -388,7 +392,11 @@
         struct ld_softc *sc = device_private(self);

         if (sc->sc_ioctl) {
 +               if ((sc->sc_flags & LDF_MPSAFE) == 0)
 +                       KERNEL_LOCK(1, curlwp);
                 error = (*sc->sc_ioctl)(sc, DIOCCACHESYNC, NULL, 0, poll);
 +               if ((sc->sc_flags & LDF_MPSAFE) == 0)
 +                       KERNEL_UNLOCK_ONE(curlwp);
                 if (error != 0)
                         device_printf(self, "unable to flush cache\n");
         }


 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."
