NetBSD Problem Report #52606

From martin@duskware.de  Mon Oct  9 10:19:25 2017
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id EF5157A208
	for <gnats-bugs@gnats.NetBSD.org>; Mon,  9 Oct 2017 10:19:24 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: cmdide transfers never finish
X-Send-Pr-Version: 3.95

>Number:         52606
>Category:       kern
>Synopsis:       cmdide transfers never finish
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    jdolecek
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Oct 09 10:20:00 +0000 2017
>Closed-Date:    Sun Oct 22 13:22:09 +0000 2017
>Last-Modified:  Sun Oct 22 13:22:09 +0000 2017
>Originator:     Martin Husemann
>Release:        NetBSD 8.99.2
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD setting-sun.duskware.de 8.99.2 NetBSD 8.99.2 (SETTINGSUN) #1: Fri Sep 15 14:34:34 CEST 2017 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/sparc64/compile/SETTINGSUN sparc64
Architecture: sparc64
Machine: sparc64

(can't boot the new kernel, so the above is from an older one)

>Description:

I have:

cmdide0 at pci1 dev 3 function 0: CMD Technology PCI0646 (rev. 0x03)
cmdide0: bus-master DMA support present
cmdide0: primary channel configured to native-PCI mode
cmdide0: using ivec 1820 for native-PCI interrupt
atabus0 at cmdide0 channel 0
cmdide0: secondary channel configured to native-PCI mode
atabus1 at cmdide0 channel 1
[..]
wd0 at atabus0 drive 0
wd0: <Maxtor 32049H2>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 19541 MB, 39704 cyl, 16 head, 63 sec, 512 bytes/sect x 40021632 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(cmdide0:0:0): using PIO mode 4, DMA mode 2 (using DMA)
wd1 at atabus1 drive 0
wd1: <WDC WD205AA>
wd1: drive supports 16-sector PIO transfers, LBA addressing
wd1: 19569 MB, 39761 cyl, 16 head, 63 sec, 512 bytes/sect x 40079088 sectors
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 4 (Ultra/66)
wd1(cmdide0:1:0): using PIO mode 4, DMA mode 2 (using DMA)

and the system hangs idle at mountroot(); apparently the cmdide transfers
never finish.

>How-To-Repeat:
Boot -current on a Sun U5

>Fix:
n/a

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Mon, 09 Oct 2017 22:02:31 +0000
Responsible-Changed-Why:
My changes broke this.


From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52606 CVS commit: src/sys/dev/scsipi
Date: Tue, 10 Oct 2017 21:37:49 +0000

 Module Name:	src
 Committed By:	jdolecek
 Date:		Tue Oct 10 21:37:49 UTC 2017

 Modified Files:
 	src/sys/dev/scsipi: atapi_wdc.c

 Log Message:
 revert the logic in wdc_atapi_intr() for wdc_wait_for_unbusy() to what it
 was before NCQ merge; it got broken during the effort to remove ch_status
 and ch_error on the branch

 fixes atapi timeouts in vbox and with real hardware reported separately
 by Abhinav Upadhyay, Paul Goyette, Chavdar Ivanov, and Rares
 Aioanei; with a bit of luck it could also fix PR kern/52605 and/or PR
 kern/52606 by Martin Husemann


 To generate a diff of this commit:
 cvs rdiff -u -r1.127 -r1.128 src/sys/dev/scsipi/atapi_wdc.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/52606
Date: Wed, 11 Oct 2017 07:06:44 +0200

 With a -current kernel it gets a tiny bit further:

 root on raid0a dumps on raid0b
 root file system type: ffs
 kern.module.path=/stand/sparc64/8.99.4/modules
 Wed Oct 11 06:30:53 MEST 2017
 Not checking /: fs_passno = 0 in /etc/fstab

 then it hangs endlessly and ddb ps shows:

 0       42 3   0       200          100c16ce0            raidio0 biowait

 Martin

From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52606 CVS commit: src/sys/dev/ata
Date: Sat, 14 Oct 2017 13:15:14 +0000

 Module Name:	src
 Committed By:	jdolecek
 Date:		Sat Oct 14 13:15:14 UTC 2017

 Modified Files:
 	src/sys/dev/ata: wd.c

 Log Message:
 only call drive reset with AT_POLL when the command itself was
 polled, so that the logic for AT_POLL matches how e.g. ata_dmaerr() is
 called; this was the original intent of the change in 1.428.2.25,
 to make the error handling safe wrt. polled xfers

 this is a stopgap fix for the ATA channel wedge after a DMA error, as reported
 by Martin Husemann in PR kern/52606 and PR kern/52605

 the problem happened because ata_reset_channel() was called once in
 ata_dmaerr() with flags == 0, which froze the channel and set the flag to
 reset via the thread; then ata_reset_channel() was called via
 wdc_drive_reset() with AT_POLL, which just executed the reset and cleared
 the flag without clearing the extra freeze; that logic will be refactored
 in a separate commit
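
 The sequence described above can be pictured with a small toy model. This
 is only a sketch under assumptions: toy_channel, toy_reset_channel(),
 toy_reset_thread() and TOY_POLL are hypothetical stand-ins, not the real
 definitions in sys/dev/ata, and the behaviour of a second non-polled call
 while a reset is already pending is assumed for illustration.

```c
#include <stdbool.h>

/*
 * Toy model only: these names are hypothetical stand-ins, not the real
 * struct ata_channel, ata_reset_channel() or AT_POLL from sys/dev/ata.
 */
#define TOY_POLL 0x1

struct toy_channel {
	int  freeze_cnt;    /* transfers run only while this is 0 */
	bool reset_pending; /* a reset was handed to the worker thread */
};

/* Non-polled: freeze and defer to the thread. Polled: reset right away. */
void
toy_reset_channel(struct toy_channel *ch, int flags)
{
	if ((flags & TOY_POLL) == 0) {
		/* Assumed: a second deferred request does not freeze again. */
		if (!ch->reset_pending) {
			ch->freeze_cnt++;
			ch->reset_pending = true;
		}
		return;
	}
	/* The bug being modeled: the flag is cleared, the freeze is kept. */
	ch->reset_pending = false;
}

/* Worker thread: thaws only if it still sees a pending reset. */
void
toy_reset_thread(struct toy_channel *ch)
{
	if (!ch->reset_pending)
		return;
	ch->reset_pending = false;
	ch->freeze_cnt--;
}

/* Replays the error path and returns the freeze count left behind. */
int
toy_error_path(struct toy_channel *ch, int drive_reset_flags)
{
	toy_reset_channel(ch, 0);                 /* role of ata_dmaerr() */
	toy_reset_channel(ch, drive_reset_flags); /* role of wdc_drive_reset() */
	toy_reset_thread(ch);
	return ch->freeze_cnt;
}
```

 In this model, calling toy_error_path() with TOY_POLL on a fresh channel
 leaves freeze_cnt at 1 (the wedge), while calling it with 0 lets the
 thread thaw the channel, which is the mixed sequence the stopgap avoids
 for non-polled commands.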


 To generate a diff of this commit:
 cvs rdiff -u -r1.430 -r1.431 src/sys/dev/ata/wd.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/52606
Date: Sun, 15 Oct 2017 09:38:31 +0200

 On Wed, Oct 11, 2017 at 07:06:44AM +0200, Martin Husemann wrote:
 > With a -current kernel it gets a tiny bit further:
 > 
 > root on raid0a dumps on raid0b
 > root file system type: ffs
 > kern.module.path=/stand/sparc64/8.99.4/modules
 > Wed Oct 11 06:30:53 MEST 2017
 > Not checking /: fs_passno = 0 in /etc/fstab
 > 
 > then it hangs endlessly and ddb ps shows:
 > 
 > 0       42 3   0       200          100c16ce0            raidio0 biowait

 Exactly the same still happens with a -current kernel as of a few minutes
 ago.

 Martin

State-Changed-From-To: open->analyzed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Thu, 19 Oct 2017 19:53:13 +0000
State-Changed-Why:
The cmdide driver shares the queue between the two channels, and the queue
code doesn't account for this case. I'll need to figure out a solution for this.


From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52606 CVS commit: src/sys/dev/pci
Date: Thu, 19 Oct 2017 20:11:38 +0000

 Module Name:	src
 Committed By:	jdolecek
 Date:		Thu Oct 19 20:11:38 UTC 2017

 Modified Files:
 	src/sys/dev/pci: cmdide.c pciidevar.h

 Log Message:
 replace the check for the shared channel of cmdide(4) with a flag in the
 product array, rather than a switch inside the attach routine

 XXX judging from product name, Silicon Image 0680 might be newer than 0649
 XXX and hence have actually independent channels, but I don't have the hw
 XXX so keeping as-is

 no functional change, just to improve visibility in the course of fixing
 PR kern/52606
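
 A minimal sketch of the table-driven approach, assuming a per-product
 flags field. The structure, the PROD_SHARED_CHANNELS flag name, and the
 product IDs and names below are made up for illustration and do not
 mirror the real cmdide.c or pciidevar.h definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: both channels of this product cannot run at once. */
#define PROD_SHARED_CHANNELS 0x01

struct prod_desc {
	unsigned short id;    /* placeholder product ID, not a real PCI ID */
	unsigned int   flags; /* per-product quirks */
	const char    *name;
};

/* The quirk sits next to the product entry instead of in a switch. */
static const struct prod_desc prod_table[] = {
	{ 0x0001, PROD_SHARED_CHANNELS, "example chip, shared channels" },
	{ 0x0002, 0,                    "example chip, independent channels" },
};

/* Returns the table entry for a product ID, or NULL if it is unknown. */
const struct prod_desc *
prod_lookup(unsigned short id)
{
	size_t i;

	for (i = 0; i < sizeof(prod_table) / sizeof(prod_table[0]); i++) {
		if (prod_table[i].id == id)
			return &prod_table[i];
	}
	return NULL;
}

/* What an attach routine would ask, with no per-product switch needed. */
bool
prod_channels_shared(unsigned short id)
{
	const struct prod_desc *pd = prod_lookup(id);

	return pd != NULL && (pd->flags & PROD_SHARED_CHANNELS) != 0;
}
```

 Keeping the quirk in the table means a reader can see which products are
 treated as shared by scanning one array, which is the visibility gain the
 log message refers to.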


 To generate a diff of this commit:
 cvs rdiff -u -r1.39 -r1.40 src/sys/dev/pci/cmdide.c
 cvs rdiff -u -r1.47 -r1.48 src/sys/dev/pci/pciidevar.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52606 CVS commit: src/sys
Date: Fri, 20 Oct 2017 07:06:08 +0000

 Module Name:	src
 Committed By:	jdolecek
 Date:		Fri Oct 20 07:06:08 UTC 2017

 Modified Files:
 	src/sys/arch/acorn32/eb7500atx: rside.c
 	src/sys/arch/acorn32/mainbus: wdc_pioc.c
 	src/sys/arch/acorn32/podulebus: icside.c rapide.c simide.c
 	src/sys/arch/amiga/dev: efa.c wdc_acafh.c wdc_amiga.c wdc_buddha.c
 	    wdc_xsurf.c
 	src/sys/arch/arm/gemini: obio_wdc.c
 	src/sys/arch/atari/dev: wdc_mb.c
 	src/sys/arch/dreamcast/dev/g1: wdc_g1.c
 	src/sys/arch/evbarm/iq31244: wdc_obio.c
 	src/sys/arch/evbarm/tsarm: wdc_ts.c
 	src/sys/arch/evbppc/mpc85xx: wdc_obio.c
 	src/sys/arch/i386/pnpbios: pciide_pnpbios.c
 	src/sys/arch/landisk/dev: wdc_obio.c
 	src/sys/arch/mac68k/obio: wdc_obio.c
 	src/sys/arch/macppc/dev: kauai.c wdc_obio.c
 	src/sys/arch/mips/adm5120/dev: wdc_extio.c
 	src/sys/arch/mmeye/dev: wdc_mainbus.c
 	src/sys/arch/playstation2/dev: wdc_spd.c
 	src/sys/arch/prep/pnpbus: wdc_pnpbus.c
 	src/sys/dev/ata: ata.c ata_subr.c
 	src/sys/dev/ic: ahcisata_core.c ninjaata32.c siisata.c wdc.c wdc_upc.c
 	src/sys/dev/isa: wdc_isa.c
 	src/sys/dev/isapnp: wdc_isapnp.c
 	src/sys/dev/ofisa: wdc_ofisa.c
 	src/sys/dev/pci: artsata.c cmdide.c cypide.c pciide_common.c pdcsata.c
 	    satalink.c viaide.c
 	src/sys/dev/pcmcia: wdc_pcmcia.c
 	src/sys/dev/podulebus: dtide.c hcide.c
 	src/sys/dev/usb: umass_isdata.c

 Log Message:
 move ata_queue_alloc(1) and ata_queue_free() calls to ata_channel_init()
 and ata_channel_destroy() respectively, to make attachment code simpler,
 and to make it easier to spot special queue manipulation like cmdide(4)

 on topic of PR kern/52606
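
 The shape of this refactoring can be sketched as follows. Everything here
 is a hypothetical simplification: q_queue, q_channel, q_channel_init() and
 q_channel_destroy() are illustrative names, and reading the argument of
 ata_queue_alloc(1) as a number of queue openings is an assumption.

```c
#include <stdlib.h>

/* Hypothetical, simplified queue and channel; not the kernel structures. */
struct q_queue {
	int openings; /* assumed meaning of the allocation argument */
};

struct q_channel {
	struct q_queue *queue;
};

/*
 * The channel init routine owns queue allocation, so bus front ends no
 * longer repeat it. Returns 0 on success, -1 if allocation fails.
 */
int
q_channel_init(struct q_channel *ch)
{
	ch->queue = calloc(1, sizeof(*ch->queue));
	if (ch->queue == NULL)
		return -1;
	ch->queue->openings = 1;
	return 0;
}

/* The matching destroy routine releases what init allocated. */
void
q_channel_destroy(struct q_channel *ch)
{
	free(ch->queue);
	ch->queue = NULL;
}
```

 With allocation and release paired in one place, any front end that still
 touches the queue pointer directly stands out as a special case.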


 To generate a diff of this commit:
 cvs rdiff -u -r1.15 -r1.16 src/sys/arch/acorn32/eb7500atx/rside.c
 cvs rdiff -u -r1.29 -r1.30 src/sys/arch/acorn32/mainbus/wdc_pioc.c
 cvs rdiff -u -r1.33 -r1.34 src/sys/arch/acorn32/podulebus/icside.c
 cvs rdiff -u -r1.31 -r1.32 src/sys/arch/acorn32/podulebus/rapide.c
 cvs rdiff -u -r1.30 -r1.31 src/sys/arch/acorn32/podulebus/simide.c
 cvs rdiff -u -r1.14 -r1.15 src/sys/arch/amiga/dev/efa.c
 cvs rdiff -u -r1.5 -r1.6 src/sys/arch/amiga/dev/wdc_acafh.c \
     src/sys/arch/amiga/dev/wdc_xsurf.c
 cvs rdiff -u -r1.39 -r1.40 src/sys/arch/amiga/dev/wdc_amiga.c
 cvs rdiff -u -r1.9 -r1.10 src/sys/arch/amiga/dev/wdc_buddha.c
 cvs rdiff -u -r1.7 -r1.8 src/sys/arch/arm/gemini/obio_wdc.c
 cvs rdiff -u -r1.39 -r1.40 src/sys/arch/atari/dev/wdc_mb.c
 cvs rdiff -u -r1.2 -r1.3 src/sys/arch/dreamcast/dev/g1/wdc_g1.c
 cvs rdiff -u -r1.10 -r1.11 src/sys/arch/evbarm/iq31244/wdc_obio.c
 cvs rdiff -u -r1.10 -r1.11 src/sys/arch/evbarm/tsarm/wdc_ts.c
 cvs rdiff -u -r1.5 -r1.6 src/sys/arch/evbppc/mpc85xx/wdc_obio.c
 cvs rdiff -u -r1.32 -r1.33 src/sys/arch/i386/pnpbios/pciide_pnpbios.c
 cvs rdiff -u -r1.9 -r1.10 src/sys/arch/landisk/dev/wdc_obio.c
 cvs rdiff -u -r1.28 -r1.29 src/sys/arch/mac68k/obio/wdc_obio.c
 cvs rdiff -u -r1.37 -r1.38 src/sys/arch/macppc/dev/kauai.c
 cvs rdiff -u -r1.60 -r1.61 src/sys/arch/macppc/dev/wdc_obio.c
 cvs rdiff -u -r1.9 -r1.10 src/sys/arch/mips/adm5120/dev/wdc_extio.c
 cvs rdiff -u -r1.5 -r1.6 src/sys/arch/mmeye/dev/wdc_mainbus.c
 cvs rdiff -u -r1.28 -r1.29 src/sys/arch/playstation2/dev/wdc_spd.c
 cvs rdiff -u -r1.14 -r1.15 src/sys/arch/prep/pnpbus/wdc_pnpbus.c
 cvs rdiff -u -r1.139 -r1.140 src/sys/dev/ata/ata.c
 cvs rdiff -u -r1.3 -r1.4 src/sys/dev/ata/ata_subr.c
 cvs rdiff -u -r1.58 -r1.59 src/sys/dev/ic/ahcisata_core.c
 cvs rdiff -u -r1.19 -r1.20 src/sys/dev/ic/ninjaata32.c
 cvs rdiff -u -r1.34 -r1.35 src/sys/dev/ic/siisata.c
 cvs rdiff -u -r1.287 -r1.288 src/sys/dev/ic/wdc.c
 cvs rdiff -u -r1.30 -r1.31 src/sys/dev/ic/wdc_upc.c
 cvs rdiff -u -r1.60 -r1.61 src/sys/dev/isa/wdc_isa.c
 cvs rdiff -u -r1.43 -r1.44 src/sys/dev/isapnp/wdc_isapnp.c
 cvs rdiff -u -r1.35 -r1.36 src/sys/dev/ofisa/wdc_ofisa.c
 cvs rdiff -u -r1.27 -r1.28 src/sys/dev/pci/artsata.c
 cvs rdiff -u -r1.40 -r1.41 src/sys/dev/pci/cmdide.c
 cvs rdiff -u -r1.31 -r1.32 src/sys/dev/pci/cypide.c
 cvs rdiff -u -r1.64 -r1.65 src/sys/dev/pci/pciide_common.c
 cvs rdiff -u -r1.28 -r1.29 src/sys/dev/pci/pdcsata.c
 cvs rdiff -u -r1.54 -r1.55 src/sys/dev/pci/satalink.c
 cvs rdiff -u -r1.85 -r1.86 src/sys/dev/pci/viaide.c
 cvs rdiff -u -r1.125 -r1.126 src/sys/dev/pcmcia/wdc_pcmcia.c
 cvs rdiff -u -r1.29 -r1.30 src/sys/dev/podulebus/dtide.c
 cvs rdiff -u -r1.26 -r1.27 src/sys/dev/podulebus/hcide.c
 cvs rdiff -u -r1.35 -r1.36 src/sys/dev/usb/umass_isdata.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52606 CVS commit: src/sys/dev/pci
Date: Sun, 22 Oct 2017 13:13:56 +0000

 Module Name:	src
 Committed By:	jdolecek
 Date:		Sun Oct 22 13:13:55 UTC 2017

 Modified Files:
 	src/sys/dev/pci: cmdide.c pciidevar.h

 Log Message:
 do not share the queue between the non-independent channels; instead make
 sure only one of the channels is ever active on the same controller

 fixes PR kern/52606 by Martin Husemann, thanks for report and testing
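
 The idea of letting only one channel be active per controller can be
 illustrated with a small sketch. It is not the code from cmdide.c:
 ctl_controller, ctl_try_start() and ctl_done() are hypothetical names,
 and the locking a real driver would need is left out.

```c
#include <stdbool.h>

#define CTL_IDLE (-1)

/* Hypothetical per-controller state; each channel keeps its own queue. */
struct ctl_controller {
	int active; /* index of the channel running a transfer, or CTL_IDLE */
};

/*
 * A channel asks for the controller before starting a transfer. It gets
 * it only when no channel is currently active; otherwise the request
 * stays on that channel's own queue to be retried later.
 */
bool
ctl_try_start(struct ctl_controller *ctl, int chan)
{
	if (ctl->active != CTL_IDLE)
		return false;
	ctl->active = chan;
	return true;
}

/* Called when the transfer on chan completes; frees the controller. */
void
ctl_done(struct ctl_controller *ctl, int chan)
{
	if (ctl->active == chan)
		ctl->active = CTL_IDLE;
}
```

 Starting with active set to CTL_IDLE, channel 0 can start, channel 1 is
 refused until ctl_done() is called for channel 0, and then channel 1 can
 start in turn.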


 To generate a diff of this commit:
 cvs rdiff -u -r1.42 -r1.43 src/sys/dev/pci/cmdide.c
 cvs rdiff -u -r1.48 -r1.49 src/sys/dev/pci/pciidevar.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: analyzed->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sun, 22 Oct 2017 13:22:09 +0000
State-Changed-Why:
Problem fixed. Thanks for report and testing!



>Unformatted:
