NetBSD Problem Report #49401

From gson@gson.org  Tue Nov 18 15:40:10 2014
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 2DA82A65E9
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 18 Nov 2014 15:40:10 +0000 (UTC)
Message-Id: <20141118154000.30D477483F6@guava.gson.org>
Date: Tue, 18 Nov 2014 17:40:00 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@gnats.NetBSD.org
Subject: Crash dumps stall on amd64 machine
X-Send-Pr-Version: 3.95

>Number:         49401
>Category:       port-amd64
>Synopsis:       Crash dumps stall on amd64 machine
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    port-amd64-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Nov 18 15:45:00 +0000 2014
>Last-Modified:  Thu Dec 04 20:10:01 +0000 2014
>Originator:     Andreas Gustafsson
>Release:        NetBSD 6.1.4
>Organization:
>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:

One of my machines running NetBSD/amd64 6.1.4 has suffered kernel
panics on two recent occasions, and in both cases, it failed to write
a crash dump.  In the past, dumping has been slow as described in PR
38970, but in these two cases, the dump stalled completely, making no
progress in 20 hours.

When I looked at the console after the most recent crash, it said:

  dumping to dev 0,1 offset 33743591
  dump ahcisata0 port 2: device present, speed: 3.0Gb/s
  ahcisata0: BSY never cleared, TD 0x80
  16292 16291 16290 16289 16288 

with the cursor positioned after the "16288 ".  When I looked again 20
hours later, the dump had made no further progress.
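
To put a rough figure on how far the dump got, the countdown line can be
read mechanically.  The sketch below is only illustrative: it assumes
that each number printed is the count of megabytes still to be written
(an assumption about the dump code, not something confirmed in this
report), and the helper name dump_progress is hypothetical.

```python
def dump_progress(console_line):
    """Return (first, last, written) for a dump countdown line.

    Assumption: each whitespace-separated number is the count of
    megabytes still to be dumped at the time it was printed.
    """
    counts = [int(tok) for tok in console_line.split()]
    if not counts:
        raise ValueError("no countdown values found")
    first, last = counts[0], counts[-1]
    # Difference between the first and last value printed.
    return first, last, first - last


first, last, written = dump_progress("16292 16291 16290 16289 16288 ")
print(f"first {first}, last {last}, difference {written}")
```

Under that assumption, the dump stalled almost immediately, with nearly
all of it still to be written.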

Earlier, the same machine was suffering from a problem where the same
error message "ahcisata0: BSY never cleared, TD 0x80" would occur
during boot as reported in PR 48214.  That was worked around by moving
the disk from a 6 Gbps SATA port to a 3 Gbps one, and the disk was
later replaced for unrelated reasons.

Here are the disk related parts of the dmesg:

  ahcisata0 at pci0 dev 31 function 2: vendor 0x8086 product 0x1c02 (rev. 0x05)
  ahcisata0: interrupting at ioapic0 pin 19
  ahcisata0: 64-bit DMA
  ahcisata0: AHCI revision 1.30, 6 ports, 32 slots, CAP 0xe730ff65<SXS,EMS,PSC,SSC,PMD,ISS=0x3=Gen3,SCLO,SAL,SALP,SSNTF,SNCQ,S64A>
  atabus0 at ahcisata0 channel 0
  atabus1 at ahcisata0 channel 1
  atabus2 at ahcisata0 channel 2
  atabus3 at ahcisata0 channel 3
  atabus4 at ahcisata0 channel 4
  atabus5 at ahcisata0 channel 5
[...]
  ahcisata0 port 1: device present, speed: 1.5Gb/s
  ahcisata0 port 2: device present, speed: 3.0Gb/s
  ahcisata0 port 3: device present, speed: 3.0Gb/s
  ahcisata0 port 4: device present, speed: 3.0Gb/s
  atapibus0 at atabus1: 1 targets
  cd0 at atapibus0 drive 0: <HL-DT-ST DVDRAM GH24NS95, KQHD1M40649, RN01> cdrom removable
  cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
  cd0(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
  wd0 at atabus2 drive 0
  wd0: <WDC WD1600AAJS-60M0A0>
  wd0: drive supports 16-sector PIO transfers, LBA48 addressing
  wd0: 149 GB, 310101 cyl, 16 head, 63 sec, 512 bytes/sect x 312581808 sectors
  wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
  wd0(ahcisata0:2:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100) (using DMA)
  wd1 at atabus3 drive 0
  wd1: <ST31500541AS>
  wd1: drive supports 16-sector PIO transfers, LBA48 addressing
  wd1: 1397 GB, 2907021 cyl, 16 head, 63 sec, 512 bytes/sect x 2930277168 sectors
  wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
  wd1(ahcisata0:3:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
  wd2 at atabus4 drive 0
  wd2: <INTEL SSDSA2M080G2GC>
  wd2: drive supports 16-sector PIO transfers, LBA48 addressing
  wd2: 76319 MB, 155061 cyl, 16 head, 63 sec, 512 bytes/sect x 156301488 sectors
  wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
  wd2(ahcisata0:4:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)

>How-To-Repeat:

>Fix:

>Audit-Trail:
From: Soren Jacobsen <snj@blef.org>
To: gnats-bugs@NetBSD.org
Cc: bouyer@NetBSD.org
Subject: Re: port-amd64/49401: Crash dumps stall on amd64 machine
Date: Thu, 4 Dec 2014 12:08:19 -0800

 The fix to PR kern/41095 never got pulled up to netbsd-6.  Different
 symptoms, but probably relevant here.  Manuel?

 Soren
