NetBSD Problem Report #59023
From john@lily.zia.io Wed Jan 22 09:07:08 2025
Return-Path: <john@lily.zia.io>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
client-signature RSA-PSS (2048 bits) client-digest SHA256)
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 007B61A923A
for <gnats-bugs@gnats.NetBSD.org>; Wed, 22 Jan 2025 09:07:07 +0000 (UTC)
Message-Id: <20250122070920.8CF8D43BA74@lily.zia.io>
Date: Wed, 22 Jan 2025 07:09:18 +0000 (UTC)
From: john@ziaspace.com
Reply-To: john@ziaspace.com
To: gnats-bugs@NetBSD.org
Subject: IDE can't do DMA in NetBSD 10
X-Send-Pr-Version: 3.95
>Number: 59023
>Category: port-cobalt
>Synopsis: IDE can't do DMA in NetBSD 10
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-cobalt-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Jan 22 09:10:00 +0000 2025
>Last-Modified: Mon Feb 17 19:05:02 +0000 2025
>Originator: john@ziaspace.com
>Release: NetBSD 10.1_STABLE
>Organization:
>Environment:
System: NetBSD lily.zia.io 10.1_STABLE NetBSD 10.1_STABLE (LILY) #0: Wed Jan 22 05:16:42 UTC 2025 john@r7900.zia.io:/usr/obj-cobalt/sys/arch/cobalt/compile/LILY cobalt
Architecture: mipsel
Machine: cobalt
>Description:
With NetBSD <= 9, booting a GENERIC kernel works fine.
Booting a NetBSD 10 GENERIC kernel gives:
...
[ 4.0398597] wd0 at atabus0 drive 0
[ 4.0498926] wd0: <SATA SSD>
[ 4.0498926] wd0: 476 GB, 992277 cyl, 16 head, 63 sec, 512 bytes/sect x 1000215216 sectors
[ 5.0499003] swwdog0: software watchdog initialized
viaide0:0:0: lost interrupt
[ 15.0598369] type: ata tc_bcount: 512 tc_skip: 0
viaide0:0:0: bus-master DMA error: missing interrupt, status=0x20
[ 15.0727954] wd0: excessive DMA errors - 4 in last 6 transfers
[ 15.0727954] wd0d: DMA error reading fsbn 1000215215 (wd0 bn 1000215215; cn 992276 tn 15 sn 62), xfer 1f30, retry 0
[ 16.0898364] wd0: soft error (corrected) xfer 1f30
[ 16.0898364] WARNING: 3 errors while detecting hardware; check system log.
[ 16.1031654] boot device: wd0
[ 16.1031654] root on wd0a dumps on wd0b
viaide0:0:0: lost interrupt
[ 26.1098372] type: ata tc_bcount: 512 tc_skip: 0
viaide0:0:0: bus-master DMA error: missing interrupt, status=0x20
[ 26.1227792] wd0d: DMA error reading fsbn 1000215215 (wd0 bn 1000215215; cn 992276 tn 15 sn 62), xfer 1f30, retry 0
[ 26.6198364] wd0: soft error (corrected) xfer 1f30
viaide0:0:0: lost interrupt
[ 36.6198364] type: ata tc_bcount: 512 tc_skip: 0
viaide0:0:0: bus-master DMA error: missing interrupt, status=0x20
[ 36.6327738] wd0d: DMA error reading fsbn 1000215215 (wd0 bn 1000215215; cn 992276 tn 15 sn 62), xfer 1f30, retry 0
[ 37.1298364] wd0: soft error (corrected) xfer 1f30
[ 37.1298364] root file system type: ffs
[ 37.1435006] kern.module.path=/stand/cobalt/10.1/modules
viaide0:0:0: lost interrupt
...
and so on.
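For reference, the cn/tn/sn values in the error lines follow from the block number and the geometry the drive reports (16 heads, 63 sectors per track). The failing block, 1000215215, is the last of the drive's 1000215216 sectors. Below is a minimal Python sketch of that arithmetic, assuming the zero-based sector numbering that the logged values imply. The name lba_to_chs is made up for illustration and is not a kernel interface.

```python
HEADS = 16      # "16 head" from the wd0 attach line
SECTORS = 63    # "63 sec" from the wd0 attach line

def lba_to_chs(bn, heads=HEADS, sectors=SECTORS):
    """Split an absolute block number into (cylinder, track, sector)."""
    cn, rem = divmod(bn, heads * sectors)   # whole cylinders, remainder within one
    tn, sn = divmod(rem, sectors)           # track (head) and zero-based sector
    return cn, tn, sn

# Block number taken from the "DMA error reading fsbn" lines above.
print(lba_to_chs(1000215215))   # (992276, 15, 62), matching "cn 992276 tn 15 sn 62"
```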
>How-To-Repeat:
Boot a GENERIC kernel with a drive that supports DMA.
>Fix:
Compiling a kernel with "wd* at atabus? drive ? flags 0x0ff0" (which disables DMA and Ultra-DMA, leaving PIO) gives a working system.
I'm not sure how to fix the broken DMA.
I've had issues with IDE on Amiga and hpcarm machines, too, although I
can't say with any certainty that this is related.
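The flags value can be read nibble by nibble. The sketch below follows the encoding as described in wd(4): the lowest nibble selects the PIO mode, the next the DMA mode, and the third the Ultra-DMA mode; the high bit of a nibble marks the setting as in use, and 0xf disables DMA or Ultra-DMA. It is an illustration in Python, not driver code, and decode_wd_flags is a hypothetical name.

```python
def decode_wd_flags(flags):
    """Decode a wd(4) 'flags' value into PIO/DMA/Ultra-DMA settings (illustrative only)."""
    def nibble(shift, can_disable):
        n = (flags >> shift) & 0xF
        if can_disable and n == 0xF:
            return "disabled"            # 0xf disables DMA or Ultra-DMA
        if n & 0x8:
            return f"mode {n & 0x7}"     # high bit set: force the mode in the low 3 bits
        return "default"                 # high bit clear: use what the drive reports
    return {
        "pio": nibble(0, False),
        "dma": nibble(4, True),
        "udma": nibble(8, True),
    }

# The workaround value from this report.
print(decode_wd_flags(0x0ff0))
```

With 0x0ff0 this reports the PIO setting as "default" and both DMA classes as "disabled", which agrees with the "using PIO mode 4" line in the NetBSD 10.1 dmesg for the workaround kernel.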
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-cobalt/59023: IDE can't do DMA in NetBSD 10
Date: Thu, 23 Jan 2025 12:43:42 +0100
On Wed, Jan 22, 2025 at 09:10:00AM +0000, john@ziaspace.com wrote:
> With NetBSD <= 9, booting a GENERIC kernel works fine.
> Booting a NetBSD 10 GENERIC kernel gives:
>
> ...
> [ 4.0398597] wd0 at atabus0 drive 0
> [ 4.0498926] wd0: <SATA SSD>
> [ 4.0498926] wd0: 476 GB, 992277 cyl, 16 head, 63 sec, 512 bytes/sect x 1000215216 sectors
> [ 5.0499003] swwdog0: software watchdog initialized
> viaide0:0:0: lost interrupt
Can you show full dmesg output from -9 and -10 on that machine?
Especially the interrupt details of the sata controller are interesting
(or any errors).
Martin
From: John Klos <john@klos.com>
To: gnats-bugs@netbsd.org
Cc: port-cobalt-maintainer@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: port-cobalt/59023: IDE can't do DMA in NetBSD 10
Date: Sun, 2 Feb 2025 22:08:00 +0000 (UTC)
> > Booting a NetBSD 10 GENERIC kernel gives:
> > ...
> > [ 4.0398597] wd0 at atabus0 drive 0
> > [ 4.0498926] wd0: <SATA SSD>
> > [ 4.0498926] wd0: 476 GB, 992277 cyl, 16 head, 63 sec, 512 bytes/sect x 1000215216 sectors
> > [ 5.0499003] swwdog0: software watchdog initialized
> > viaide0:0:0: lost interrupt
>
> Can you show full dmesg output from -9 and -10 on that machine?
> Especially the interrupt details of the sata controller are interesting
> (or any errors).
Here's NetBSD 10.1 with "wd* at atabus? drive ? flags 0x0ff0":
[ 1.000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
[ 1.000000] 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
[ 1.000000] 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023,
[ 1.000000] 2024, 2025
[ 1.000000] The NetBSD Foundation, Inc. All rights reserved.
[ 1.000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[ 1.000000] The Regents of the University of California. All rights reserved.
[ 1.000000] NetBSD 10.1_STABLE (LILY) #0: Wed Jan 22 05:16:42 UTC 2025
[ 1.000000] john@r7900.zia.io:/usr/obj-cobalt/sys/arch/cobalt/compile/LILY
[ 1.000000] Cobalt RaQ 2
[ 1.000000] total memory = 256 MB
[ 1.000000] avail memory = 247 MB
[ 1.000000] timecounter: Timecounters tick every 10.000 msec
[ 1.000000] Kernelized RAIDframe activated
[ 1.000000] mainbus0 (root)
[ 1.000000] com0 at mainbus0 addr 0x1c800000 level 3: st16650a, 32-byte FIFO
[ 1.000000] com0: console
[ 1.000000] cpu0 at mainbus0: QED RM5200 CPU (0x28a0) Rev. 10.0 with built-in FPU Rev. 10.0
[ 1.000000] cpu0: 48 TLB entries, 16MB max page size
[ 1.000000] cpu0: 32KB/32B 2-way set-associative L1 instruction cache
[ 1.000000] cpu0: 32KB/32B 2-way set-associative write-back L1 data cache
[ 1.000000] mcclock0 at mainbus0 addr 0x10000070: mc146818 compatible time-of-day clock
[ 1.000000] lcdpanel0 at mainbus0 addr 0x1f000000
[ 1.000000] gt0 at mainbus0 addr 0x14000000
[ 1.000000] pci0 at gt0
[ 1.000000] pci0: i/o space, memory space enabled, rd/line, wr/inv ok
[ 1.000000] pchb0 at pci0 dev 0 function 0: Galileo GT-64111 System Controller, rev 1
[ 1.000000] tlp0 at pci0 dev 7 function 0: DECchip 21143 Ethernet, pass 4.1
[ 1.000000] tlp0: interrupting at level 1
[ 1.000000] tlp0: Ethernet address 00:10:e0:00:3f:58
[ 1.000000] lxtphy0 at tlp0 phy 1: LXT970 10/100 media interface, rev. 3
[ 1.000000] lxtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
[ 1.000000] siop0 at pci0 dev 8 function 0: Symbios Logic 53c860 (ultra scsi)
[ 1.000000] siop0: interrupting at irq 4
[ 1.000000] scsibus0 at siop0: 8 targets, 8 luns per target
[ 1.000000] pcib0 at pci0 dev 9 function 0
[ 1.000000] pcib0: VIA Technologies VT82C586 PCI-ISA Bridge, rev 39
[ 1.000000] viaide0 at pci0 dev 9 function 1
[ 1.000000] viaide0: VIA Technologies VT82C586 (Apollo VP) ATA33 controller
[ 1.000000] viaide0: bus-master DMA support present
[ 1.000000] viaide0: primary channel configured to compatibility mode
[ 1.000000] viaide0: primary channel interrupting at irq 14
[ 1.000000] atabus0 at viaide0 channel 0
[ 1.000000] viaide0: secondary channel configured to compatibility mode
[ 1.000000] viaide0: secondary channel interrupting at irq 15
[ 1.000000] atabus1 at viaide0 channel 1
[ 1.000000] VIA Technologies VT83C572 USB Controller (USB serial bus, UHCI, revision 0x02) at pci0 dev 9 function 2 not configured
[ 1.000000] tlp1 at pci0 dev 12 function 0: DECchip 21143 Ethernet, pass 4.1
[ 1.000000] tlp1: interrupting at level 2
[ 1.000000] tlp1: Ethernet address 00:10:e0:00:3f:7d
[ 1.000000] lxtphy1 at tlp1 phy 1: LXT970 10/100 media interface, rev. 3
[ 1.000000] lxtphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
[ 1.000000] WARNING: system needs entropy for security; see entropy(7)
[ 1.000000] timecounter: Timecounter "mips3_cp0_counter" frequency 125000000 Hz quality 100
[ 1.000003] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
[ 1.000003] scsibus0: waiting 2 seconds for devices to settle...
[ 4.039857] wd0 at atabus0 drive 0
[ 4.049888] wd0: <SATA SSD>
[ 4.049888] wd0: drive supports 16-sector PIO transfers, LBA48 addressing
[ 4.049888] wd0: 476 GB, 992277 cyl, 16 head, 63 sec, 512 bytes/sect x 1000215216 sectors
[ 4.061445] wd0: 32-bit data port
[ 4.061445] wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
[ 4.061445] wd0(viaide0:0:0): using PIO mode 4
[ 5.049880] swwdog0: software watchdog initialized
[ 5.059936] boot device: wd0
[ 5.069877] root on wd0e dumps on wd0b
[ 5.089901] kern.module.path=/stand/cobalt/10.1/modules
[ 7.609879] entropy: best effort
[ 14.529872] entropy: ready
Here are examples of errors that don't result in a proper downgrade:
Sep 29 23:48:26 lily /netbsd: [ 618.4316018] autoconfiguration error: viaide0:0:0: lost interrupt
Sep 29 23:48:26 lily /netbsd: [ 618.4316018] type: ata tc_bcount: 32768 tc_skip: 0
Sep 29 23:48:26 lily /netbsd: [ 618.4316018] autoconfiguration error: viaide0:0:0: bus-master DMA error: missing interrupt, status=0x20
Sep 29 23:48:26 lily /netbsd: [ 618.4447539] wd0e: DMA error reading fsbn 962465728 of 962465728-962465791 (wd0 bn 966682560; cn 959010 tn 7 sn 39), xfer 1f88, retry 0
Sep 29 23:48:26 lily /netbsd: [ 618.9416160] wd0: soft error (corrected) xfer 1f88
I don't have a dmesg from NetBSD 9, and my memory may be wrong about which
version of NetBSD lost the ability to downgrade properly. I will try to
test that soon.
I do have a dmesg from NetBSD 6, where the downgrade happens successfully:
pmap_steal_memory: seg 0: 0x474 0x474 0xfffe 0xfffe
pmap_steal_memory: seg 0: 0x4aa 0x4aa 0xfffe 0xfffe
pmap_steal_memory: seg 0: 0x4ac 0x4ac 0xfffe 0xfffe
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
NetBSD 6.1.5 (GENERIC)
Cobalt RaQ 2
total memory = 256 MB
avail memory = 246 MB
timecounter: Timecounters tick every 10.000 msec
mainbus0 (root)
com0 at mainbus0 addr 0x1c800000 level 3: st16650a, working fifo
com0: console
cpu0 at mainbus0: QED RM5200 CPU (0x28a0) Rev. 10.0 with built-in FPU Rev. 10.0
cpu0: 48 TLB entries, 16MB max page size
cpu0: 32KB/32B 2-way set-associative L1 instruction cache
cpu0: 32KB/32B 2-way set-associative write-back L1 data cache
mcclock0 at mainbus0 addr 0x10000070: mc146818 compatible time-of-day clock
panel0 at mainbus0 addr 0x1f000000
gt0 at mainbus0 addr 0x14000000
pci0 at gt0
pci0: i/o space, memory space enabled, rd/line, wr/inv ok
pchb0 at pci0 dev 0 function 0: Galileo GT-64111 System Controller, rev 1
tlp0 at pci0 dev 7 function 0: DECchip 21143 Ethernet, pass 4.1
tlp0: interrupting at level 1
tlp0: Ethernet address 00:10:e0:00:3f:58
lxtphy0 at tlp0 phy 1: LXT970 10/100 media interface, rev. 3
lxtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
siop0 at pci0 dev 8 function 0: Symbios Logic 53c860 (ultra scsi)
siop0: interrupting at irq 4
scsibus0 at siop0: 8 targets, 8 luns per target
pcib0 at pci0 dev 9 function 0
pcib0: VIA Technologies VT82C586 PCI-ISA Bridge, rev 39
viaide0 at pci0 dev 9 function 1
viaide0: VIA Technologies VT82C586 (Apollo VP) ATA33 controller
viaide0: bus-master DMA support present
viaide0: primary channel configured to compatibility mode
viaide0: primary channel interrupting at irq 14
atabus0 at viaide0 channel 0
viaide0: secondary channel configured to compatibility mode
viaide0: secondary channel interrupting at irq 15
atabus1 at viaide0 channel 1
VIA Technologies VT83C572 USB Controller (USB serial bus, revision 0x02) at pci0 dev 9 function 2 not configured
tlp1 at pci0 dev 12 function 0: DECchip 21143 Ethernet, pass 4.1
tlp1: interrupting at level 2
tlp1: Ethernet address 00:10:e0:00:3f:7d
lxtphy1 at tlp1 phy 1: LXT970 10/100 media interface, rev. 3
lxtphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
timecounter: Timecounter "mips3_cp0_counter" frequency 125000000 Hz quality 100
scsibus0: waiting 2 seconds for devices to settle...
wd0 at atabus0 drive 0
wd0: <SATA SSD>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 476 GB, 992277 cyl, 16 head, 63 sec, 512 bytes/sect x 1000215216 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(viaide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
Kernelized RAIDframe activated
boot device: wd0
root on wd0a dumps on wd0b
viaide0:0:0: lost interrupt
type: ata tc_bcount: 512 tc_skip: 0
viaide0:0:0: bus-master DMA error: missing interrupt, status=0x20
wd0: transfer error, downgrading to Ultra-DMA mode 1
wd0(viaide0:0:0): using PIO mode 4, Ultra-DMA mode 1 (using DMA)
wd0a: DMA error reading fsbn 0 (wd0 bn 1071104; cn 1062 tn 9 sn 41), retrying
wd0: soft error (corrected)
viaide0:0:0: lost interrupt
type: ata tc_bcount: 8192 tc_skip: 0
viaide0:0:0: bus-master DMA error: missing interrupt, status=0x20
wd0: transfer error, downgrading to PIO mode 4
wd0(viaide0:0:0): using PIO mode 4
wd0a: DMA error reading fsbn 16 of 16-31 (wd0 bn 1071120; cn 1062 tn 9 sn 57), retrying
wd0: soft error (corrected)
root file system type: ffs
pid 1(init): ABI set to O32 (e_flags=0x1007)
From: "Jonathan A. Kollasch" <jakllsch@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59023 CVS commit: src/sys
Date: Mon, 17 Feb 2025 19:01:05 +0000
Module Name: src
Committed By: jakllsch
Date: Mon Feb 17 19:01:04 UTC 2025
Modified Files:
src/sys/arch/i386/conf: LEGACY
src/sys/dev/ata: ata.c files.ata
Log Message:
Restore ATA DMA mode downgrade support everywhere; it's a necessary part
of any system supporting (parallel) ATA DMA. There is hardware out there,
including cobalt, macppc, and sparc64 where this functionality is
necessary to avoid non-functional disks, either in as-shipped hardware
configurations or with add-in cards, or perhaps just with compromised
IDE/PATA cables.
Should address:
PR 58767
PR 59023
PR 59078
If anyone really insists on not having this support they can now turn it
off themselves with `options ATA_NO_DOWNGRADE_MODE`
To generate a diff of this commit:
cvs rdiff -u -r1.4 -r1.5 src/sys/arch/i386/conf/LEGACY
cvs rdiff -u -r1.170 -r1.171 src/sys/dev/ata/ata.c
cvs rdiff -u -r1.32 -r1.33 src/sys/dev/ata/files.ata
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
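The downgrade sequence visible in the NetBSD 6 dmesg earlier in this PR (Ultra-DMA mode 2, then Ultra-DMA mode 1, then PIO mode 4) can be modelled as below. This is a Python sketch inferred from those log lines only. It is not the code in src/sys/dev/ata/ata.c, and the function name and dict fields are hypothetical.

```python
def downgrade(mode):
    """Return the next slower transfer mode after a DMA error, or None if already PIO-only.

    mode is a dict such as {"udma": 2, "pio": 4}; "udma" is None when DMA is off.
    The ">= 2" threshold is an assumption taken from the log, where Ultra-DMA
    mode 1 was followed directly by PIO.
    """
    udma = mode["udma"]
    if udma is None:
        return None                          # already PIO-only, nothing left to try
    if udma >= 2:
        return {**mode, "udma": udma - 1}    # step down one Ultra-DMA mode
    return {**mode, "udma": None}            # give up on DMA, keep the PIO mode

# Walk the ladder starting from the mode the NetBSD 6 kernel first selected.
mode = {"udma": 2, "pio": 4}
while mode is not None:
    print(mode)
    mode = downgrade(mode)
```

Without a step like this, a controller that never delivers the DMA completion interrupt keeps retrying at the same mode, which is the repeating "lost interrupt" pattern in the NetBSD 10 logs above.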
>Unformatted: