NetBSD Problem Report #51979
From martin@duskware.de Fri Feb 17 18:55:12 2017
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id ED4A27A21A
for <gnats-bugs@gnats.NetBSD.org>; Fri, 17 Feb 2017 18:55:12 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: kernel crash dump to wd0b failed
X-Send-Pr-Version: 3.95
>Number: 51979
>Category: kern
>Synopsis: kernel crash dump to wd0b failed
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: jdolecek
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Feb 17 19:00:00 +0000 2017
>Closed-Date: Sat Jun 16 10:41:55 +0000 2018
>Last-Modified: Sat Jun 16 10:41:55 +0000 2018
>Originator: Martin Husemann
>Release: NetBSD 7.99.60
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD night-owl.duskware.de 7.99.60 NetBSD 7.99.60 (NIGHT-OWL) #483: Fri Feb 17 13:49:36 CET 2017 martin@night-owl.duskware.de:/usr/src/sys/arch/amd64/compile/NIGHT-OWL amd64
Architecture: x86_64
Machine: amd64
>Description:
My kernel crashed but I could not get a dump written:
(manual transcript:)
dumping to dev 0,1 (offset=4481543, size=1012867):
dump panic: wddump: polled command has been queued
..
Dmesg fragment for wd0 is:
ahcisata0 at pci0 dev 31 function 2: vendor 8086 product 3b29 (rev. 0x05)
ahcisata0: interrupting at ioapic0 pin 19
ahcisata0: 64-bit DMA
ahcisata0: AHCI revision 1.30, 4 ports, 32 slots, CAP 0xff20ff63<SXS,EMS,PSC,SSC,PMD,ISS=0x2=Gen2,SCLO,SAL,SALP,SSS,SMPS,SSNTF,SNCQ,S64A>
atabus0 at ahcisata0 channel 0
atabus1 at ahcisata0 channel 1
atabus2 at ahcisata0 channel 4
atabus3 at ahcisata0 channel 5
ahcisata0 port 0: device present, speed: 3.0Gb/s
ahcisata0 port 4: device present, speed: 1.5Gb/s
acpiacad0: AC adapter online.
wd0 at atabus0 drive 0
wd0: <Hitachi HTS545050B9A300>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 465 GB, 969021 cyl, 16 head, 63 sec, 512 bytes/sect x 976773168 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(ahcisata0:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
It used to work until fairly recently (after maya fixed the off-by-one in sparse dumps).
>How-To-Repeat:
n/a
>Fix:
n/a
>Release-Note:
>Audit-Trail:
From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/51979 CVS commit: [jdolecek-ncq] src/sys/dev
Date: Fri, 16 Jun 2017 20:40:49 +0000
Module Name: src
Committed By: jdolecek
Date: Fri Jun 16 20:40:49 UTC 2017
Modified Files:
src/sys/dev/ata [jdolecek-ncq]: ata.c ata_wdc.c atavar.h wd.c
src/sys/dev/ic [jdolecek-ncq]: ahcisata_core.c mvsata.c siisata.c wdc.c
src/sys/dev/scsipi [jdolecek-ncq]: atapi_wdc.c
Log Message:
adjust reset channel and dump paths
- channel reset now always kills the active transfer, even on the dump path,
but no longer touches the queued waiting transfers; the kill_xfer hook is
also always called, so that the HBA can free any private xfer resources and
thus the dump request has a chance to work
- kill_xfer routines now always call ata_deactivate_xfer(); added KASSERT()s
to ata_free_xfer() to expect deactivated xfer
- when called during channel reset before dump, ata_kill_active() drops
any queued waiting transfers without processing
- do not (re)queue any transfers in wddone() when dumping
- kill AT_RST_NOCMD flag
This should also hopefully fix the 'polled command has been queued' panic
as reported in:
PR kern/11811 by John Hawkinson
PR kern/47041 by Taylor R Campbell
PR kern/51979 by Martin Husemann
dump tested working with piixide(4) and ahci(4). mvsata(4) dump times out,
but otherwise tested working, will be fixed separately. siisata(4) mechanically
changed and not tested.
To generate a diff of this commit:
cvs rdiff -u -r1.132.8.8 -r1.132.8.9 src/sys/dev/ata/ata.c
cvs rdiff -u -r1.105.6.3 -r1.105.6.4 src/sys/dev/ata/ata_wdc.c
cvs rdiff -u -r1.92.8.8 -r1.92.8.9 src/sys/dev/ata/atavar.h
cvs rdiff -u -r1.428.2.15 -r1.428.2.16 src/sys/dev/ata/wd.c
cvs rdiff -u -r1.57.6.12 -r1.57.6.13 src/sys/dev/ic/ahcisata_core.c
cvs rdiff -u -r1.35.6.10 -r1.35.6.11 src/sys/dev/ic/mvsata.c
cvs rdiff -u -r1.30.4.15 -r1.30.4.16 src/sys/dev/ic/siisata.c
cvs rdiff -u -r1.283.2.4 -r1.283.2.5 src/sys/dev/ic/wdc.c
cvs rdiff -u -r1.123.4.4 -r1.123.4.5 src/sys/dev/scsipi/atapi_wdc.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
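As a rough illustration of the behaviour the log message above describes, here is a toy userland model in C. It is only a sketch under stated assumptions: the names toy_channel, toy_xfer, toy_reset_channel and toy_exec are hypothetical and do not exist in src/sys/dev/ata, the real code panics where this sketch returns TOY_REJECTED, and the actual commit involves HBA-specific kill_xfer hooks that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names come from the NetBSD tree. */
#define TOY_MAX_WAITING 8

struct toy_xfer {
	int id;
	bool polled;		/* polled (dump-style) command: must never be queued */
};

struct toy_channel {
	struct toy_xfer *active;			/* transfer currently on the hardware */
	struct toy_xfer *waiting[TOY_MAX_WAITING];	/* queued waiting transfers */
	size_t nwaiting;
	bool dumping;
};

enum toy_result { TOY_STARTED, TOY_QUEUED, TOY_REJECTED };

/*
 * Channel reset: the active transfer is always killed.  A normal reset
 * leaves the waiting queue alone; a reset done before a dump also drops
 * the waiting transfers without processing them.
 */
static void
toy_reset_channel(struct toy_channel *ch, bool for_dump)
{
	assert(ch != NULL);
	ch->active = NULL;
	if (for_dump) {
		ch->nwaiting = 0;
		ch->dumping = true;
	}
}

/*
 * Submit a transfer.  It starts immediately only if the channel is idle
 * and nothing is waiting.  A polled transfer that would have to wait is
 * rejected (the stand-in for the "polled command has been queued" panic).
 */
static enum toy_result
toy_exec(struct toy_channel *ch, struct toy_xfer *x)
{
	assert(ch != NULL && x != NULL);
	if (ch->active == NULL && ch->nwaiting == 0) {
		ch->active = x;
		return TOY_STARTED;
	}
	if (x->polled || ch->nwaiting == TOY_MAX_WAITING)
		return TOY_REJECTED;
	ch->waiting[ch->nwaiting++] = x;
	return TOY_QUEUED;
}
```

In this model a polled transfer submitted while another transfer is active is rejected, whereas after toy_reset_channel(ch, true) the channel is empty and the same polled transfer starts at once, which is the situation the commit aims for on the dump path.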
State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Wed, 26 Jul 2017 17:21:08 +0000
State-Changed-Why:
any luck? not sure whether a fix that might also apply to a PR as old
as 11811 is likely to be the same issue as one where it used to work
recentish... but worth a try I guess.
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51979 (kernel crash dump to wd0b failed)
Date: Thu, 27 Jul 2017 09:08:31 +0200
On Wed, Jul 26, 2017 at 05:21:09PM +0000, dholland@NetBSD.org wrote:
> any luck?
Well, the crash dump worked on other (later) instances, so it will
be tricky to test for real, also: the branch isn't merged yet, is it?
Martin
Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Sat, 07 Oct 2017 17:48:45 +0000
Responsible-Changed-Why:
Committed possible fix.
State-Changed-From-To: feedback->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sat, 16 Jun 2018 10:41:55 +0000
State-Changed-Why:
Assumed fixed.
>Unformatted: