NetBSD Problem Report #59023

From john@lily.zia.io  Wed Jan 22 09:07:08 2025
Return-Path: <john@lily.zia.io>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
	 client-signature RSA-PSS (2048 bits) client-digest SHA256)
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 007B61A923A
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 22 Jan 2025 09:07:07 +0000 (UTC)
Message-Id: <20250122070920.8CF8D43BA74@lily.zia.io>
Date: Wed, 22 Jan 2025 07:09:18 +0000 (UTC)
From: john@ziaspace.com
Reply-To: john@ziaspace.com
To: gnats-bugs@NetBSD.org
Subject: IDE can't do DMA in NetBSD 10
X-Send-Pr-Version: 3.95

>Number:         59023
>Category:       port-cobalt
>Synopsis:       IDE can't do DMA in NetBSD 10
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-cobalt-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jan 22 09:10:00 +0000 2025
>Last-Modified:  Mon Feb 17 19:05:02 +0000 2025
>Originator:     john@ziaspace.com
>Release:        NetBSD 10.1_STABLE
>Organization:

>Environment:


System: NetBSD lily.zia.io 10.1_STABLE NetBSD 10.1_STABLE (LILY) #0: Wed Jan 22 05:16:42 UTC 2025 john@r7900.zia.io:/usr/obj-cobalt/sys/arch/cobalt/compile/LILY cobalt
Architecture: mipsel
Machine: cobalt
>Description:

With NetBSD <= 9, booting a GENERIC kernel works fine.
Booting a NetBSD 10 GENERIC kernel gives:

...
[   4.0398597] wd0 at atabus0 drive 0
[   4.0498926] wd0: <SATA SSD>
[   4.0498926] wd0: 476 GB, 992277 cyl, 16 head, 63 sec, 512 bytes/sect x 1000215216 sectors
[   5.0499003] swwdog0: software watchdog initialized
viaide0:0:0: lost interrupt
[  15.0598369]  type: ata tc_bcount: 512 tc_skip: 0
viaide0:0:0: bus-master DMA error: missing interrupt, status=0x20
[  15.0727954] wd0: excessive DMA errors - 4 in last 6 transfers
[  15.0727954] wd0d: DMA error reading fsbn 1000215215 (wd0 bn 1000215215; cn 992276 tn 15 sn 62), xfer 1f30, retry 0
[  16.0898364] wd0: soft error (corrected) xfer 1f30
[  16.0898364] WARNING: 3 errors while detecting hardware; check system log.
[  16.1031654] boot device: wd0
[  16.1031654] root on wd0a dumps on wd0b
viaide0:0:0: lost interrupt
[  26.1098372]  type: ata tc_bcount: 512 tc_skip: 0
viaide0:0:0: bus-master DMA error: missing interrupt, status=0x20
[  26.1227792] wd0d: DMA error reading fsbn 1000215215 (wd0 bn 1000215215; cn 992276 tn 15 sn 62), xfer 1f30, retry 0
[  26.6198364] wd0: soft error (corrected) xfer 1f30
viaide0:0:0: lost interrupt
[  36.6198364]  type: ata tc_bcount: 512 tc_skip: 0
viaide0:0:0: bus-master DMA error: missing interrupt, status=0x20
[  36.6327738] wd0d: DMA error reading fsbn 1000215215 (wd0 bn 1000215215; cn 992276 tn 15 sn 62), xfer 1f30, retry 0
[  37.1298364] wd0: soft error (corrected) xfer 1f30
[  37.1298364] root file system type: ffs
[  37.1435006] kern.module.path=/stand/cobalt/10.1/modules
viaide0:0:0: lost interrupt
...

and so on.
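
For reference, the failing block number can be checked against the geometry the drive reports. The sketch below uses only the numbers from the log above; the function name lba_to_chs is a hypothetical helper, and it assumes the kernel prints a zero-based sector number, which is what the log values suggest. It shows that bn 1000215215 is the last sector of the disk.

```python
# Geometry as reported by wd0 in the dmesg above.
CYLS, HEADS, SECS, BYTES_PER_SECTOR = 992277, 16, 63, 512
TOTAL_SECTORS = 1000215216


def lba_to_chs(lba, heads=HEADS, secs=SECS):
    """Convert a block number to (cylinder, head, sector), sector zero-based."""
    cyl, rem = divmod(lba, heads * secs)
    head, sec = divmod(rem, secs)
    return cyl, head, sec


# The reported geometry accounts for every sector on the drive.
assert CYLS * HEADS * SECS == TOTAL_SECTORS

# Block from the "DMA error reading fsbn" line.
print(lba_to_chs(1000215215))                      # (992276, 15, 62)
print(1000215215 == TOTAL_SECTORS - 1)             # True: last sector
print(TOTAL_SECTORS * BYTES_PER_SECTOR // 2**30)   # 476
```

The result agrees with the "cn 992276 tn 15 sn 62" and "476 GB" values in the log.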
>How-To-Repeat:

Boot a NetBSD 10 GENERIC kernel on a Cobalt RaQ 2 with a drive that supports DMA.
>Fix:

Workaround: compiling a kernel with "wd* at atabus? drive ? flags 0x0ff0" gives a working system.
Not sure how to fix the broken DMA.
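
To make the flags value less opaque, here is a minimal sketch of how it decodes, assuming the layout described in wd(4) as I read it: three 4-bit fields (PIO, DMA, UltraDMA from the low bits up), where the top bit of a field must be set for the field to take effect, the low three bits give the mode, and 0xf in the DMA or UltraDMA field means "disable". The function name decode_wd_flags is hypothetical and not part of any NetBSD tool.

```python
def decode_wd_flags(flags):
    """Decode the low 12 bits of a wd(4) 'flags' value (layout assumed from wd(4))."""
    result = {}
    for name, shift in (("PIO", 0), ("DMA", 4), ("UDMA", 8)):
        nibble = (flags >> shift) & 0xF
        if nibble & 0x8 == 0:
            # Top bit clear: this field is not forced, the driver picks the mode.
            result[name] = "default"
        elif nibble == 0xF and name != "PIO":
            # All ones in the DMA or UltraDMA field disables that transfer class.
            result[name] = "disabled"
        else:
            result[name] = f"mode {nibble & 0x7}"
    return result


print(decode_wd_flags(0x0FF0))
# {'PIO': 'default', 'DMA': 'disabled', 'UDMA': 'disabled'}
```

Under that reading, 0x0ff0 leaves the PIO mode up to the driver and turns off both DMA and UltraDMA, which would match the "using PIO mode 4" line seen with this setting.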

I've had issues with IDE on Amiga and hpcarm machines, too, although I 
can't say with any certainty that this is related.

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-cobalt/59023: IDE can't do DMA in NetBSD 10
Date: Thu, 23 Jan 2025 12:43:42 +0100

 On Wed, Jan 22, 2025 at 09:10:00AM +0000, john@ziaspace.com wrote:
 > With NetBSD <= 9, booting a GENERIC kernel works fine.
 > Booting a NetBSD 10 GENERIC kernel gives:
 > 
 > ...
 > [   4.0398597] wd0 at atabus0 drive 0
 > [   4.0498926] wd0: <SATA SSD>
 > [   4.0498926] wd0: 476 GB, 992277 cyl, 16 head, 63 sec, 512 bytes/sect x 1000215216 sectors
 > [   5.0499003] swwdog0: software watchdog initialized
 > viaide0:0:0: lost interrupt

 Can you show full dmesg output from -9 and -10 on that machine?
 Especially the interrupt details of the sata controller are interesting
 (or any errors).

 Martin

From: John Klos <john@klos.com>
To: gnats-bugs@netbsd.org
Cc: port-cobalt-maintainer@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Subject: Re: port-cobalt/59023: IDE can't do DMA in NetBSD 10
Date: Sun, 2 Feb 2025 22:08:00 +0000 (UTC)

 > > Booting a NetBSD 10 GENERIC kernel gives:
 > > ...
 > > [   4.0398597] wd0 at atabus0 drive 0
 > > [   4.0498926] wd0: <SATA SSD>
 > > [   4.0498926] wd0: 476 GB, 992277 cyl, 16 head, 63 sec, 512 bytes/sect x 1000215216 sectors
 > > [   5.0499003] swwdog0: software watchdog initialized
 > > viaide0:0:0: lost interrupt
 >
 > Can you show full dmesg output from -9 and -10 on that machine?
 > Especially the interrupt details of the sata controller are interesting
 > (or any errors).

 Here's NetBSD 10.1 with "wd* at atabus? drive ? flags 0x0ff0":

 [     1.000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
 [     1.000000]     2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
 [     1.000000]     2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023,
 [     1.000000]     2024, 2025
 [     1.000000]     The NetBSD Foundation, Inc.  All rights reserved.
 [     1.000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
 [     1.000000]     The Regents of the University of California.  All rights reserved.

 [     1.000000] NetBSD 10.1_STABLE (LILY) #0: Wed Jan 22 05:16:42 UTC 2025
 [     1.000000] 	john@r7900.zia.io:/usr/obj-cobalt/sys/arch/cobalt/compile/LILY
 [     1.000000] Cobalt RaQ 2
 [     1.000000] total memory = 256 MB
 [     1.000000] avail memory = 247 MB
 [     1.000000] timecounter: Timecounters tick every 10.000 msec
 [     1.000000] Kernelized RAIDframe activated
 [     1.000000] mainbus0 (root)
 [     1.000000] com0 at mainbus0 addr 0x1c800000 level 3: st16650a, 32-byte FIFO
 [     1.000000] com0: console
 [     1.000000] cpu0 at mainbus0: QED RM5200 CPU (0x28a0) Rev. 10.0 with built-in FPU Rev. 10.0
 [     1.000000] cpu0: 48 TLB entries, 16MB max page size
 [     1.000000] cpu0: 32KB/32B 2-way set-associative L1 instruction cache
 [     1.000000] cpu0: 32KB/32B 2-way set-associative write-back L1 data cache
 [     1.000000] mcclock0 at mainbus0 addr 0x10000070: mc146818 compatible time-of-day clock
 [     1.000000] lcdpanel0 at mainbus0 addr 0x1f000000
 [     1.000000] gt0 at mainbus0 addr 0x14000000
 [     1.000000] pci0 at gt0
 [     1.000000] pci0: i/o space, memory space enabled, rd/line, wr/inv ok
 [     1.000000] pchb0 at pci0 dev 0 function 0: Galileo GT-64111 System Controller, rev 1
 [     1.000000] tlp0 at pci0 dev 7 function 0: DECchip 21143 Ethernet, pass 4.1
 [     1.000000] tlp0: interrupting at level 1
 [     1.000000] tlp0: Ethernet address 00:10:e0:00:3f:58
 [     1.000000] lxtphy0 at tlp0 phy 1: LXT970 10/100 media interface, rev. 3
 [     1.000000] lxtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 [     1.000000] siop0 at pci0 dev 8 function 0: Symbios Logic 53c860 (ultra scsi)
 [     1.000000] siop0: interrupting at irq 4
 [     1.000000] scsibus0 at siop0: 8 targets, 8 luns per target
 [     1.000000] pcib0 at pci0 dev 9 function 0
 [     1.000000] pcib0: VIA Technologies VT82C586 PCI-ISA Bridge, rev 39
 [     1.000000] viaide0 at pci0 dev 9 function 1
 [     1.000000] viaide0: VIA Technologies VT82C586 (Apollo VP) ATA33 controller
 [     1.000000] viaide0: bus-master DMA support present
 [     1.000000] viaide0: primary channel configured to compatibility mode
 [     1.000000] viaide0: primary channel interrupting at irq 14
 [     1.000000] atabus0 at viaide0 channel 0
 [     1.000000] viaide0: secondary channel configured to compatibility mode
 [     1.000000] viaide0: secondary channel interrupting at irq 15
 [     1.000000] atabus1 at viaide0 channel 1
 [     1.000000] VIA Technologies VT83C572 USB Controller (USB serial bus, UHCI, revision 0x02) at pci0 dev 9 function 2 not configured
 [     1.000000] tlp1 at pci0 dev 12 function 0: DECchip 21143 Ethernet, pass 4.1
 [     1.000000] tlp1: interrupting at level 2
 [     1.000000] tlp1: Ethernet address 00:10:e0:00:3f:7d
 [     1.000000] lxtphy1 at tlp1 phy 1: LXT970 10/100 media interface, rev. 3
 [     1.000000] lxtphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 [     1.000000] WARNING: system needs entropy for security; see entropy(7)
 [     1.000000] timecounter: Timecounter "mips3_cp0_counter" frequency 125000000 Hz quality 100
 [     1.000003] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
 [     1.000003] scsibus0: waiting 2 seconds for devices to settle...
 [     4.039857] wd0 at atabus0 drive 0
 [     4.049888] wd0: <SATA SSD>
 [     4.049888] wd0: drive supports 16-sector PIO transfers, LBA48 addressing
 [     4.049888] wd0: 476 GB, 992277 cyl, 16 head, 63 sec, 512 bytes/sect x 1000215216 sectors
 [     4.061445] wd0: 32-bit data port
 [     4.061445] wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
 [     4.061445] wd0(viaide0:0:0): using PIO mode 4
 [     5.049880] swwdog0: software watchdog initialized
 [     5.059936] boot device: wd0
 [     5.069877] root on wd0e dumps on wd0b
 [     5.089901] kern.module.path=/stand/cobalt/10.1/modules
 [     7.609879] entropy: best effort
 [    14.529872] entropy: ready

 Here are examples of errors that don't result in proper downgrading:

 Sep 29 23:48:26 lily /netbsd: [ 618.4316018] autoconfiguration error: viaide0:0:0: lost interrupt
 Sep 29 23:48:26 lily /netbsd: [ 618.4316018]    type: ata tc_bcount: 32768 tc_skip: 0
 Sep 29 23:48:26 lily /netbsd: [ 618.4316018] autoconfiguration error: viaide0:0:0: bus-master DMA error: missing interrupt, status=0x20
 Sep 29 23:48:26 lily /netbsd: [ 618.4447539] wd0e: DMA error reading fsbn 962465728 of 962465728-962465791 (wd0 bn 966682560; cn 959010 tn 7 sn 39), xfer 1f88, retry 0
 Sep 29 23:48:26 lily /netbsd: [ 618.9416160] wd0: soft error (corrected) xfer 1f88


 I don't have a dmesg from NetBSD 9, and my memory may be wrong about which 
 version of NetBSD lost the ability to downgrade properly. I will try to 
 test that soon.

 I do have a dmesg from NetBSD 6, where the downgrade happens successfully:

 pmap_steal_memory: seg 0: 0x474 0x474 0xfffe 0xfffe
 pmap_steal_memory: seg 0: 0x4aa 0x4aa 0xfffe 0xfffe
 pmap_steal_memory: seg 0: 0x4ac 0x4ac 0xfffe 0xfffe
 Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
      2006, 2007, 2008, 2009, 2010, 2011, 2012
      The NetBSD Foundation, Inc.  All rights reserved.
 Copyright (c) 1982, 1986, 1989, 1991, 1993
      The Regents of the University of California.  All rights reserved.

 NetBSD 6.1.5 (GENERIC)
 Cobalt RaQ 2
 total memory = 256 MB
 avail memory = 246 MB
 timecounter: Timecounters tick every 10.000 msec
 mainbus0 (root)
 com0 at mainbus0 addr 0x1c800000 level 3: st16650a, working fifo
 com0: console
 cpu0 at mainbus0: QED RM5200 CPU (0x28a0) Rev. 10.0 with built-in FPU Rev. 10.0
 cpu0: 48 TLB entries, 16MB max page size
 cpu0: 32KB/32B 2-way set-associative L1 instruction cache
 cpu0: 32KB/32B 2-way set-associative write-back L1 data cache
 mcclock0 at mainbus0 addr 0x10000070: mc146818 compatible time-of-day clock
 panel0 at mainbus0 addr 0x1f000000
 gt0 at mainbus0 addr 0x14000000
 pci0 at gt0
 pci0: i/o space, memory space enabled, rd/line, wr/inv ok
 pchb0 at pci0 dev 0 function 0: Galileo GT-64111 System Controller, rev 1
 tlp0 at pci0 dev 7 function 0: DECchip 21143 Ethernet, pass 4.1
 tlp0: interrupting at level 1
 tlp0: Ethernet address 00:10:e0:00:3f:58
 lxtphy0 at tlp0 phy 1: LXT970 10/100 media interface, rev. 3
 lxtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 siop0 at pci0 dev 8 function 0: Symbios Logic 53c860 (ultra scsi)
 siop0: interrupting at irq 4
 scsibus0 at siop0: 8 targets, 8 luns per target
 pcib0 at pci0 dev 9 function 0
 pcib0: VIA Technologies VT82C586 PCI-ISA Bridge, rev 39
 viaide0 at pci0 dev 9 function 1
 viaide0: VIA Technologies VT82C586 (Apollo VP) ATA33 controller
 viaide0: bus-master DMA support present
 viaide0: primary channel configured to compatibility mode
 viaide0: primary channel interrupting at irq 14
 atabus0 at viaide0 channel 0
 viaide0: secondary channel configured to compatibility mode
 viaide0: secondary channel interrupting at irq 15
 atabus1 at viaide0 channel 1
 VIA Technologies VT83C572 USB Controller (USB serial bus, revision 0x02) at pci0 dev 9 function 2 not configured
 tlp1 at pci0 dev 12 function 0: DECchip 21143 Ethernet, pass 4.1
 tlp1: interrupting at level 2
 tlp1: Ethernet address 00:10:e0:00:3f:7d
 lxtphy1 at tlp1 phy 1: LXT970 10/100 media interface, rev. 3
 lxtphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
 timecounter: Timecounter "mips3_cp0_counter" frequency 125000000 Hz quality 100
 scsibus0: waiting 2 seconds for devices to settle...
 wd0 at atabus0 drive 0
 wd0: <SATA SSD>
 wd0: drive supports 16-sector PIO transfers, LBA48 addressing
 wd0: 476 GB, 992277 cyl, 16 head, 63 sec, 512 bytes/sect x 1000215216 sectors
 wd0: 32-bit data port
 wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
 wd0(viaide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
 Kernelized RAIDframe activated
 boot device: wd0
 root on wd0a dumps on wd0b
 viaide0:0:0: lost interrupt
  	type: ata tc_bcount: 512 tc_skip: 0
 viaide0:0:0: bus-master DMA error: missing interrupt, status=0x20
 wd0: transfer error, downgrading to Ultra-DMA mode 1
 wd0(viaide0:0:0): using PIO mode 4, Ultra-DMA mode 1 (using DMA)
 wd0a: DMA error reading fsbn 0 (wd0 bn 1071104; cn 1062 tn 9 sn 41), retrying
 wd0: soft error (corrected)
 viaide0:0:0: lost interrupt
  	type: ata tc_bcount: 8192 tc_skip: 0
 viaide0:0:0: bus-master DMA error: missing interrupt, status=0x20
 wd0: transfer error, downgrading to PIO mode 4
 wd0(viaide0:0:0): using PIO mode 4
 wd0a: DMA error reading fsbn 16 of 16-31 (wd0 bn 1071120; cn 1062 tn 9 sn 57), retrying
 wd0: soft error (corrected)
 root file system type: ffs
 pid 1(init): ABI set to O32 (e_flags=0x1007)

From: "Jonathan A. Kollasch" <jakllsch@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/59023 CVS commit: src/sys
Date: Mon, 17 Feb 2025 19:01:05 +0000

 Module Name:	src
 Committed By:	jakllsch
 Date:		Mon Feb 17 19:01:04 UTC 2025

 Modified Files:
 	src/sys/arch/i386/conf: LEGACY
 	src/sys/dev/ata: ata.c files.ata

 Log Message:
 Restore ATA DMA mode downgrade support everywhere; it's a necessary part
 of any system supporting (parallel) ATA DMA.  There is hardware out there,
 including cobalt, macppc, and sparc64 where this functionality is
 necessary to avoid non-functional disks, either in as-shipped hardware
 configurations or with add-in cards, or perhaps just with compromised
 IDE/PATA cables.

 Should address:
   PR 58767
   PR 59023
   PR 59078

 If anyone really insists on not having this support they can now turn it
 off themselves with `options ATA_NO_DOWNGRADE_MODE`


 To generate a diff of this commit:
 cvs rdiff -u -r1.4 -r1.5 src/sys/arch/i386/conf/LEGACY
 cvs rdiff -u -r1.170 -r1.171 src/sys/dev/ata/ata.c
 cvs rdiff -u -r1.32 -r1.33 src/sys/dev/ata/files.ata

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:
