NetBSD Problem Report #53776

From hf@spg.tu-darmstadt.de  Tue Dec 11 14:12:07 2018
Return-Path: <hf@spg.tu-darmstadt.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id AE16A7A19B
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 11 Dec 2018 14:12:07 +0000 (UTC)
Message-Id: <201812111411.wBBEBx33010427@Gstoder.nt.e-technik.tu-darmstadt.de>
Date: Tue, 11 Dec 2018 15:11:59 +0100 (CET)
From: Hauke Fath <hf@spg.tu-darmstadt.de>
Reply-To: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: Hauke Fath <hf@spg.tu-darmstadt.de>
Subject: wm(4) goes catatonic during large transfers
X-Send-Pr-Version: 3.95

>Number:         53776
>Category:       kern
>Synopsis:       wm(4) goes catatonic during large transfers
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    msaitoh
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Dec 11 14:15:00 +0000 2018
>Last-Modified:  Fri Sep 03 09:10:00 +0000 2021
>Originator:     Hauke Fath
>Release:        NetBSD 8.99.26
>Organization:
Technische Universitaet Darmstadt
>Environment:


System: NetBSD Gstoder 8.99.26 NetBSD 8.99.26 (GA-MA770-UD3-$Revision$) #0: Fri Nov 23 16:14:26 CET 2018 hf@Hochstuhl:/var/obj/netbsd-builds/developer/amd64/sys/arch/amd64/compile/GA-MA770-UD3 amd64
Architecture: x86_64
Machine: amd64
>Description:

	On an older AMD machine, an Intel gigabit Ethernet interface

wm0 at pci2 dev 7 function 0: Intel i82541PI 1000BASE-T Ethernet (rev. 0x05)
wm0: interrupting at ioapic0 pin 21
wm0: 32-bit 33MHz PCI bus
wm0: 64 words (6 address bits) MicroWire EEPROM
wm0: Ethernet address 00:0e:0c:d8:3b:df
wm0: 0x220402<LOCK_EECD,IOH_VALID,ASF_FIRM,WOL>
igphy0 at wm0 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto

	goes catatonic during large network transfers - typically pkg
	updates of firefox, rust, clang, or scp'ing an iso image.

	Since the machine uses YP and NFS, it becomes unusable, and
	the only way out is a hard reset. Log entries from dhcpcd(8)
	indicate that the interface is flapping:

/netbsd: [ 288847.9198854] wm0: device timeout (txfree 3968 txsfree 0 txnext 3031)
/netbsd: [ 288848.5300993] wm0: link state DOWN (was UP)
syslogd[342]: last message repeated 2 times
dhcpcd[303]: wm0: carrier lost
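
	A minimal sketch, not part of the original report, of how log
	lines like the ones above could be scanned for wm(4) device
	timeouts. The function name and the regular expression are
	illustrative assumptions modelled only on the single timeout
	message quoted here; other driver versions may word it differently.

```python
import re

# Pattern modelled on the one "device timeout" line quoted in this PR.
# It is an assumption that other occurrences follow the same layout.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+(?P<ifname>wm\d+): device timeout "
    r"\(txfree (?P<txfree>\d+) txsfree (?P<txsfree>\d+) txnext (?P<txnext>\d+)\)"
)


def parse_wm_timeout(line):
    """Return the fields of a wm(4) timeout log line as a dict, or None."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    return {
        "uptime": float(fields["uptime"]),    # seconds since boot
        "ifname": fields["ifname"],           # e.g. "wm0"
        "txfree": int(fields["txfree"]),
        "txsfree": int(fields["txsfree"]),
        "txnext": int(fields["txnext"]),
    }
```

	Feeding each syslog line through parse_wm_timeout() and keeping
	the non-None results gives a list of timeout events that can be
	lined up against the times of the large transfers.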

	Swapping the NIC for another, known-good Intel card
	does not make a difference; the cards work fine in other
	(Linux) machines.


>How-To-Repeat:

	Transfer multi-megabyte data over the above wm(4) NIC.
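
	A hedged sketch of one way to generate such a transfer without
	involving NFS or scp(1): a plain TCP sender and a byte-counting
	sink. Nothing here comes from the original report; the function
	names, the chunk size and the use of zero-filled payloads are
	assumptions made for illustration.

```python
import socket

CHUNK = 64 * 1024  # bytes per send/recv call; an arbitrary choice


def sink(server_sock, result):
    """Accept one connection on server_sock and count bytes until EOF.

    The total is appended to the list `result` so the function can be
    run in a thread.
    """
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result.append(total)


def send_bulk(host, port, total_bytes):
    """Stream total_bytes of zero bytes to host:port over TCP."""
    payload = bytes(CHUNK)
    sent = 0
    with socket.create_connection((host, port)) as s:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            s.sendall(payload[:n])
            sent += n
    return sent
```

	To exercise the NIC, sink() would have to run on a different
	host than send_bulk(), so that the traffic actually crosses the
	wm(4) interface; a transfer of a few hundred megabytes matches
	the sizes mentioned in this report.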


>Fix:
	Yes, please.



>Release-Note:

>Audit-Trail:
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: Robert Nestor <rnestor@mac.com>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53776: wm(4) goes catatonic during large transfers 
Date: Tue, 11 Dec 2018 16:29:38 +0100

 On Tue, 11 Dec 2018 09:19:20 -0600, Robert Nestor wrote:
 > I’ve seen what I think is the same thing on my system that also uses
 > the WM interface.  I think I tracked it down to a problem in NFS
 > though.  I was able to fix it on my system by adding
 > “-r=1024,-w=1024,tcp” to my NFS.

 ...ouch. I remember doing this last for a 3com PCMCIA card. Even 68k
 Macs are fine with 4096.  ;)

 > This is with an 8.0 amd64 system.
 > I think the key is the “tcp” parameter.  I tried “udp” and that
 > didn’t help.

 You have a point in that those downloads end up on an nfs mounted
 server volume via udp. But I have seen the same problem with scp(1) at
 least once.

 Cheerio,
 hauke

 -- 
      The ASCII Ribbon Campaign                    Hauke Fath
 ()     No HTML/RTF in email            Institut für Nachrichtentechnik
 /\     No Word docs in email                     TU Darmstadt
      Respect for open standards              Ruf +49-6151-16-21344

From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53776: wm(4) goes catatonic during large transfers
Date: Fri, 14 Feb 2020 13:41:37 +0100

 [Sent to gnats-bugs, too]

 FTR, this problem is very much alive in netbsd-9, and quite annoying, too.

 If it is of any help, I could provide a NIC - we have a stack of them here.

 Cheerio,
 hauke


From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/53776: wm(4) goes catatonic during large transfers
Date: Thu, 22 Oct 2020 16:51:14 +0200

 On 2020-10-22 14:26, Hauke Fath wrote:
 > Since I had to boot the machine with a -current kernel for unrelated 
 > reasons, I ventured to check - and indeed, I could download the Firefox 
 > 82 source tarball (320 MB) without any problems. So, I consider the 
 > problem fixed.

 Ignore that, it just took a little longer until the machine went off the 
 net.

 The bug is alive and well in -current.

 Cheerio,
 Hauke

 -- 
       The ASCII Ribbon Campaign                    Hauke Fath
 ()     No HTML/RTF in email	        Institut für Nachrichtentechnik
 /\     No Word docs in email                     TU Darmstadt
       Respect for open standards              Ruf +49-6151-16-21344

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53776: wm(4) goes catatonic during large transfers
Date: Tue, 24 Aug 2021 16:57:11 +0100

 I just tried NetBSD-9.99.88/amd64 on an old working server which
 had no issues running ubuntu 20, and observed this bug. Maybe the
 clue is that it too has an igphy? (My other wm0s tend to have makphy
 or ihphy and don't have this bug.)

 wm1 at pci1 dev 0 function 1: 82575EB dual-1000baseT Ethernet (rev. 0x02)
 wm1: for TX and RX interrupting at msix1 vec 0 affinity to 1
 wm1: for TX and RX interrupting at msix1 vec 1 affinity to 2
 wm1: for TX and RX interrupting at msix1 vec 2 affinity to 3
 wm1: for TX and RX interrupting at msix1 vec 3 affinity to 4
 wm1: for LINK interrupting at msix1 vec 4
 wm1: PCI-Express bus
 wm1: 16384 words (16 address bits) SPI EEPROM, version 2.10, Image Unique ID e1e00000
 wm1: Ethernet address 00:1e:67:42:41:b5
 wm1: Copper
 wm1: 0x274440<SPI,IOH_VALID,PCIE,NEWQUEUE,ASF_FIRM,ARC_SUBSYS,WOL>
 igphy1 at wm1 phy 1: i82566 10/100/1000 media interface, rev. 0
 igphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53776: wm(4) goes catatonic during large transfers
Date: Wed, 25 Aug 2021 17:32:21 +0100

 The igphy hunch appears to be correct:

 uc> disable igphy
 igphy* disabled

 ukphy1 at wm1 phy 1: OUI 0x005500, model 0x0039, rev. 0
 ukphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto

 and the computer is much happier.

Responsible-Changed-From-To: kern-bug-people->msaitoh
Responsible-Changed-By: msaitoh@NetBSD.org
Responsible-Changed-When: Thu, 26 Aug 2021 01:42:28 +0000
Responsible-Changed-Why:
mine.


From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53776 (wm(4) goes catatonic during large transfers)
Date: Fri, 27 Aug 2021 09:32:08 +0100

 Tricky: I warm booted back, leaving igphy1 in place to see if it would
 break, but it is still happy. Will try a cold boot later...

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53776 (wm(4) goes catatonic during large transfers)
Date: Fri, 3 Sep 2021 10:08:54 +0100

 I haven't seen the device timeout / txfree messages any more, even without
 changing anything. Tested by netbooting, making the root mount computer
 busy, and transferring xdebug.tar.xz.

>Unformatted:
