NetBSD Problem Report #41140
From kardel@pip.acrys.com Sat Apr 4 15:56:06 2009
Return-Path: <kardel@pip.acrys.com>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id 4C5AC63B8A5
for <gnats-bugs@gnats.NetBSD.org>; Sat, 4 Apr 2009 15:56:06 +0000 (UTC)
Message-Id: <200904041453.n34ErrTJ002169@pip.acrys.com>
Date: Sat, 4 Apr 2009 16:53:53 +0200 (MEST)
From: kardel@netbsd.org
Reply-To: kardel@netbsd.org
To: gnats-bugs@gnats.NetBSD.org
Subject: 5-RC3 msk driver possibly broken
X-Send-Pr-Version: 3.95
>Number: 41140
>Category: kern
>Synopsis: ssh/bacula disconnect with errors when msk iface is used
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Apr 04 16:00:01 +0000 2009
>Closed-Date: Wed Mar 29 10:13:13 +0000 2017
>Last-Modified: Wed Mar 29 10:13:13 +0000 2017
>Originator: Frank Kardel
>Release: NetBSD 5.0_RC3-090330
>Organization:
>Environment:
NetBSD gaia.acrys.com 5.0_RC3 NetBSD 5.0_RC3 (GAIA) #2: Mon Mar 30 10:40:31 CEST 2009 kardel@gaia.acrys.com:/usr/obj/sys/arch/i386/compile/GAIA i386
Architecture: i386
Machine: i386
>Description:
High-volume ssh sessions (e.g. using rsync) break with:
- MAC corruption
- bad packet length (with a ridiculous length value in the millions)
Bacula backups fail with:
03-Apr 02:05 Orcus-sd JobId 15806: Fatal error: bsock.c:415 Packet size too big from "client:x.y.z.u:36643. Terminating connection.
Symptoms seem similar to PR #31178.
The failure also seems to become more likely the fewer mbufs are available.
It happens at a 100Mb link rate.
Bacula seems fine when using the elinkxl (ex*) driver: with ex* the full backup went through, while with msk* no full backup finished.
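Both error messages above suggest that the receiver read a length field from the stream and rejected it. The following is a minimal Python sketch, not taken from ssh or Bacula, of how a single flipped bit in a 4-byte big-endian length prefix produces a length "in the millions". The names read_length, is_sane, flip_bit and the MAX_PACKET limit are hypothetical and chosen only for illustration.

```python
import struct

# Hypothetical sanity limit, chosen only for this illustration.
MAX_PACKET = 1_000_000


def read_length(buf):
    """Interpret the first 4 bytes as an unsigned big-endian length."""
    (length,) = struct.unpack_from(">I", buf, 0)
    return length


def is_sane(length):
    """Reject empty or implausibly large packets."""
    return 0 < length <= MAX_PACKET


def flip_bit(buf, byte_index, bit):
    """Return a copy of buf with one bit inverted, simulating corruption."""
    out = bytearray(buf)
    out[byte_index] ^= 1 << bit
    return bytes(out)


# A well-formed frame: 4-byte length followed by a 12-byte payload.
payload = b"x" * 12
frame = struct.pack(">I", len(payload)) + payload

# Corrupt the lowest bit of the most significant length byte.
corrupted = flip_bit(frame, 0, 0)

intact_len = read_length(frame)
corrupt_len = read_length(corrupted)

print(intact_len, is_sane(intact_len))
print(corrupt_len, is_sane(corrupt_len))
```

A receiver that sees such a length can only drop the connection, which would match the observed behaviour if data is being altered somewhere below the application.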
dmesg snippets:
mainbus0 (root)
cpu0 at mainbus0 apid 0: Intel 686-class, 2831MHz, id 0x10677
cpu0: Enhanced SpeedStep (1244 mV) 800 MHz
cpu0: Enhanced SpeedStep frequencies available (MHz): 7200 6400 5600 4800 4000 3100 2300 1500 700
cpu1 at mainbus0 apid 3: Intel 686-class, 2831MHz, id 0x10677
cpu2 at mainbus0 apid 1: Intel 686-class, 2831MHz, id 0x10677
cpu3 at mainbus0 apid 2: Intel 686-class, 2831MHz, id 0x10677
ioapic0 at mainbus0 apid 4: pa 0xfec00000, version 20, 24 pins
acpi0 at mainbus0: Intel ACPICA 20080321
acpi0: X/RSDT: OemId <IntelR,AWRDACPI,42302e31>, AslId <AWRD,00000000>
acpi0: SCI interrupting at int 9
acpi0: fixed-feature power button present
...
mskc0 at pci3 dev 0 function 0mskc0: interrupt moderation is 0 us
, Yukon-2 EC rev. A3 (0x2): ioapic0 pin 19
msk0 at mskc0 port A: Ethernet address 00:xx:xx:xx:xx:xx
makphy0 at msk0 phy 0: Marvell 88E1111 Gigabit PHY, rev. 2
makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
mskc1 at pci5 dev 0 function 0mskc1: interrupt moderation is 0 us
, Yukon-2 EC rev. A3 (0x2): ioapic0 pin 17
msk1 at mskc1 port A: Ethernet address 00:xx:xx:xx:xx:xx
makphy1 at msk1 phy 0: Marvell 88E1111 Gigabit PHY, rev. 2
makphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
...
>How-To-Repeat:
Run rsync via ssh, or Bacula, for a high-volume data transfer over an msk* interface on a 4-CPU (Q9550) machine running 5.0_RC3. Connections will break when
protocol sanity checks (MAC, packet length) fail.
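To check for corruption independently of ssh and Bacula, one option is a small end-to-end checker that sends blocks together with their SHA-256 digests and verifies them on receipt. The sketch below is an assumption about how such a test could look, not something used in this report; send_blocks, recv_exact, verify_blocks and self_test are hypothetical names, and the self-test runs over a local socket pair rather than an msk interface.

```python
import hashlib
import socket
import struct
import threading

BLOCK_SIZE = 64 * 1024
DIGEST_LEN = hashlib.sha256().digest_size


def send_blocks(sock, blocks):
    """Send each block as: 4-byte length, data, SHA-256 digest of the data."""
    for block in blocks:
        header = struct.pack(">I", len(block))
        sock.sendall(header + block + hashlib.sha256(block).digest())
    sock.sendall(struct.pack(">I", 0))  # zero length marks end of stream


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf.extend(chunk)
    return bytes(buf)


def verify_blocks(sock):
    """Return (good, bad) block counts based on digest comparison."""
    good = bad = 0
    while True:
        (length,) = struct.unpack(">I", recv_exact(sock, 4))
        if length == 0:
            return good, bad
        block = recv_exact(sock, length)
        digest = recv_exact(sock, DIGEST_LEN)
        if hashlib.sha256(block).digest() == digest:
            good += 1
        else:
            bad += 1


def self_test(n_blocks=8):
    """Run sender and verifier over a local socket pair."""
    a, b = socket.socketpair()
    blocks = [bytes([i % 256]) * BLOCK_SIZE for i in range(n_blocks)]
    sender = threading.Thread(target=send_blocks, args=(a, blocks))
    sender.start()
    result = verify_blocks(b)
    sender.join()
    a.close()
    b.close()
    return result


if __name__ == "__main__":
    print(self_test())
```

To exercise the driver, the sender and verifier would have to run on two hosts with a TCP connection routed over the msk interface. Note that a corrupted length field desynchronises this simple framing, so only the first bad block is reliably located.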
>Fix:
Workaround only: ignore the two built-in msk interfaces and fall back to another NIC, e.g. ex*.
>Release-Note:
>Audit-Trail:
From: "Jean-Yves Migeon (NetBSD)" <jym@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/41140
Date: Thu, 09 Jul 2015 12:28:39 +0200
FWIW the problem still happens on NetBSD 6.1.5, amd64.
Got an Asus P6T recently with two Marvell LAN chips (Marvell 88E8056),
and occasionally (after about 5-10 min of connectivity) under high transfer
loads rsync errors out with a "corrupt packet received, disconnected."
No message in dmesg, vmstat -i seems pretty normal, and no error count
increased in netstat -i. I originally thought it was a RAM error, but
neither memtest nor any other program crashes. The chip + msk combination
seems to be the culprit.
Happens both with 1000baseT and 100baseTX media, although 100baseTX
seems to have a lower rate of failure.
Will report later with a -current kernel once I can reboot this machine.
--
Jean-Yves Migeon
jym@
State-Changed-From-To: open->closed
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Wed, 29 Mar 2017 10:13:13 +0000
State-Changed-Why:
timeout - hw not available - closed by submitter (me)
>Unformatted:
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.