NetBSD Problem Report #53776
From hf@spg.tu-darmstadt.de Tue Dec 11 14:12:07 2018
Return-Path: <hf@spg.tu-darmstadt.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id AE16A7A19B
for <gnats-bugs@gnats.NetBSD.org>; Tue, 11 Dec 2018 14:12:07 +0000 (UTC)
Message-Id: <201812111411.wBBEBx33010427@Gstoder.nt.e-technik.tu-darmstadt.de>
Date: Tue, 11 Dec 2018 15:11:59 +0100 (CET)
From: Hauke Fath <hf@spg.tu-darmstadt.de>
Reply-To: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: Hauke Fath <hf@spg.tu-darmstadt.de>
Subject: wm(4) goes catatonic during large transfers
X-Send-Pr-Version: 3.95
>Number: 53776
>Category: kern
>Synopsis: wm(4) goes catatonic during large transfers
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: msaitoh
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Dec 11 14:15:00 +0000 2018
>Last-Modified: Fri Sep 03 09:10:00 +0000 2021
>Originator: Hauke Fath
>Release: NetBSD 8.99.26
>Organization:
Technische Universitaet Darmstadt
>Environment:
System: NetBSD Gstoder 8.99.26 NetBSD 8.99.26 (GA-MA770-UD3-$Revision$) #0: Fri Nov 23 16:14:26 CET 2018 hf@Hochstuhl:/var/obj/netbsd-builds/developer/amd64/sys/arch/amd64/compile/GA-MA770-UD3 amd64
Architecture: x86_64
Machine: amd64
>Description:
On an older AMD machine, an Intel gigabit Ethernet interface
wm0 at pci2 dev 7 function 0: Intel i82541PI 1000BASE-T Ethernet (rev. 0x05)
wm0: interrupting at ioapic0 pin 21
wm0: 32-bit 33MHz PCI bus
wm0: 64 words (6 address bits) MicroWire EEPROM
wm0: Ethernet address 00:0e:0c:d8:3b:df
wm0: 0x220402<LOCK_EECD,IOH_VALID,ASF_FIRM,WOL>
igphy0 at wm0 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
goes catatonic during large network transfers - typically pkg
updates of firefox, rust, clang, or scp'ing an ISO image.
Since the machine uses YP and NFS, it becomes unusable, and
the only way out is a hard reset. Log entries from the kernel
and dhcpcd(8) indicate that the interface is flapping:
/netbsd: [ 288847.9198854] wm0: device timeout (txfree 3968 txsfree 0 txnext 3031)
/netbsd: [ 288848.5300993] wm0: link state DOWN (was UP)
syslogd[342]: last message repeated 2 times
dhcpcd[303]: wm0: carrier lost
Swapping the NIC for another, known-good Intel card
does not make a difference; the cards work fine in other
(Linux) machines.
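The log entries quoted above can be picked out of a larger syslog mechanically. The following is a minimal sketch in Python, not part of the original report: the names scan, TIMEOUT_RE, LINK_RE and SAMPLE are made up for illustration, and the patterns are modeled only on the two kernel lines shown in this report, so other wm(4) message variants may not match.

```python
import re

# Patterns modeled on the kernel lines quoted in this report (assumption:
# other wm(4) messages may be formatted differently).
TIMEOUT_RE = re.compile(
    r"(?P<ifname>wm\d+): device timeout "
    r"\(txfree (?P<txfree>\d+) txsfree (?P<txsfree>\d+) txnext (?P<txnext>\d+)\)"
)
LINK_RE = re.compile(
    r"(?P<ifname>wm\d+): link state (?P<new>UP|DOWN) \(was (?P<old>UP|DOWN)\)"
)


def scan(lines):
    """Return (timeouts, link_changes) found in an iterable of log lines."""
    timeouts, link_changes = [], []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            # Keep the three counters as integers for later comparison.
            timeouts.append({
                "ifname": m["ifname"],
                "txfree": int(m["txfree"]),
                "txsfree": int(m["txsfree"]),
                "txnext": int(m["txnext"]),
            })
            continue
        m = LINK_RE.search(line)
        if m:
            # Recorded as (interface, old state, new state).
            link_changes.append((m["ifname"], m["old"], m["new"]))
    return timeouts, link_changes


# Sample input copied from the log excerpt in this report.
SAMPLE = [
    "/netbsd: [ 288847.9198854] wm0: device timeout (txfree 3968 txsfree 0 txnext 3031)",
    "/netbsd: [ 288848.5300993] wm0: link state DOWN (was UP)",
    "dhcpcd[303]: wm0: carrier lost",
]
```

Calling scan(SAMPLE) yields one timeout record for wm0 and one UP-to-DOWN link change; the dhcpcd line is ignored. Collecting the counters this way may help when comparing several occurrences of the hang.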
>How-To-Repeat:
Transfer multi-megabyte data over the above wm(4) NIC.
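For anyone who wants a repeatable bulk transfer without involving pkg, scp or NFS, the following is a minimal sketch in Python. It is not from the original report, and it is only an assumption that a plain TCP stream triggers the same hang. The names send_bulk, start_sink and CHUNK are hypothetical.

```python
import socket
import threading

CHUNK = 64 * 1024  # bytes per send/recv call; an arbitrary choice


def send_bulk(host, port, total_bytes):
    """Stream total_bytes of zeros to host:port over TCP; return bytes sent."""
    payload = bytes(CHUNK)
    sent = 0
    with socket.create_connection((host, port)) as s:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            s.sendall(payload[:n])
            sent += n
    return sent


def start_sink(host="127.0.0.1", port=0):
    """Start a one-shot TCP sink that discards data.

    Returns (bound_port, thread, result); result["received"] holds the
    byte count once the thread has finished.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    bound_port = srv.getsockname()[1]
    result = {"received": 0}

    def run():
        conn, _ = srv.accept()
        with conn:
            while True:
                data = conn.recv(CHUNK)
                if not data:  # peer closed the connection
                    break
                result["received"] += len(data)
        srv.close()

    t = threading.Thread(target=run, daemon=True)
    t.start()
    return bound_port, t, result
```

To exercise the NIC rather than the loopback interface, the sink would be started on a second machine, bound to an address reachable via wm0, and send_bulk pointed at it with a size of a few hundred megabytes.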
>Fix:
Yes, please.
>Release-Note:
>Audit-Trail:
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: Robert Nestor <rnestor@mac.com>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53776: wm(4) goes catatonic during large transfers
Date: Tue, 11 Dec 2018 16:29:38 +0100
On Tue, 11 Dec 2018 09:19:20 -0600, Robert Nestor wrote:
> I’ve seen what I think is the same thing on my system that also uses
> the WM interface. I think I tracked it down to a problem in NFS
> though. I was able to fix it on my system by adding
> “-r=1024,-w=1024,tcp” to my NFS.
...ouch. I remember doing this last for a 3com PCMCIA card. Even 68k
Macs are fine with 4096. ;)
> This is with an 8.0 amd64 system.
> I think the key is the “tcp” parameter. I tried “udp” and that
> didn’t help.
You have a point in that those downloads end up on an nfs mounted
server volume via udp. But I have seen the same problem with scp(1) at
least once.
Cheerio,
hauke
--
The ASCII Ribbon Campaign Hauke Fath
() No HTML/RTF in email Institut für Nachrichtentechnik
/\ No Word docs in email TU Darmstadt
Respect for open standards Ruf +49-6151-16-21344
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53776: wm(4) goes catatonic during large transfers
Date: Fri, 14 Feb 2020 13:41:37 +0100
[Send to gnats-bugs, too]
FTR, this problem is very much alive in netbsd-9, and quite annoying, too.
If it is of any help, I could provide a NIC - we have a stack of them here.
Cheerio,
hauke
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/53776: wm(4) goes catatonic during large transfers
Date: Thu, 22 Oct 2020 16:51:14 +0200
On 2020-10-22 14:26, Hauke Fath wrote:
> Since I had to boot the machine with a -current kernel for unrelated
> reasons, I ventured to check - and indeed, I could download the Firefox
> 82 source tarball (320 MB) without any problems. So, I consider the
> problem fixed.
Ignore that, it just took a little longer until the machine went off the
net.
The bug is alive and well in -current.
Cheerio,
Hauke
--
The ASCII Ribbon Campaign Hauke Fath
() No HTML/RTF in email Institut für Nachrichtentechnik
/\ No Word docs in email TU Darmstadt
Respect for open standards Ruf +49-6151-16-21344
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53776: wm(4) goes catatonic during large transfers
Date: Tue, 24 Aug 2021 16:57:11 +0100
I just tried NetBSD-9.99.88/amd64 on an old working server which
had no issues running Ubuntu 20, and observed this bug. Maybe the
clue is that it too has an igphy? (My other wm0s tend to have makphy
or ihphy and don't have this bug.)
wm1 at pci1 dev 0 function 1: 82575EB dual-1000baseT Ethernet (rev. 0x02)
wm1: for TX and RX interrupting at msix1 vec 0 affinity to 1
wm1: for TX and RX interrupting at msix1 vec 1 affinity to 2
wm1: for TX and RX interrupting at msix1 vec 2 affinity to 3
wm1: for TX and RX interrupting at msix1 vec 3 affinity to 4
wm1: for LINK interrupting at msix1 vec 4
wm1: PCI-Express bus
wm1: 16384 words (16 address bits) SPI EEPROM, version 2.10, Image Unique ID e1e00000
wm1: Ethernet address 00:1e:67:42:41:b5
wm1: Copper
wm1: 0x274440<SPI,IOH_VALID,PCIE,NEWQUEUE,ASF_FIRM,ARC_SUBSYS,WOL>
igphy1 at wm1 phy 1: i82566 10/100/1000 media interface, rev. 0
igphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53776: wm(4) goes catatonic during large transfers
Date: Wed, 25 Aug 2021 17:32:21 +0100
The igphy hunch appears to be correct:
uc> disable igphy
igphy* disabled
ukphy1 at wm1 phy 1: OUI 0x005500, model 0x0039, rev. 0
ukphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
and the computer is much happier.
Responsible-Changed-From-To: kern-bug-people->msaitoh
Responsible-Changed-By: msaitoh@NetBSD.org
Responsible-Changed-When: Thu, 26 Aug 2021 01:42:28 +0000
Responsible-Changed-Why:
mine.
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53776 (wm(4) goes catatonic during large transfers)
Date: Fri, 27 Aug 2021 09:32:08 +0100
Tricky: I warm booted back, leaving igphy1 in place to see if it would
break, but it is still happy. Will try a cold boot later...
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53776 (wm(4) goes catatonic during large transfers)
Date: Fri, 3 Sep 2021 10:08:54 +0100
I haven't seen the device timeout / txfree messages any more, even without
changing anything. Tested by netbooting, making the root mount computer
busy, and transferring xdebug.tar.xz.
>Unformatted: