NetBSD Problem Report #51057

From www@NetBSD.org  Sat Apr  9 09:47:49 2016
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id DDF6F7A497
	for <gnats-bugs@gnats.NetBSD.org>; Sat,  9 Apr 2016 09:47:49 +0000 (UTC)
Message-Id: <20160409094748.C365F7AAA1@mollari.NetBSD.org>
Date: Sat,  9 Apr 2016 09:47:48 +0000 (UTC)
From: cryintothebluesky@googlemail.com
Reply-To: cryintothebluesky@googlemail.com
To: gnats-bugs@NetBSD.org
Subject: hme(4) device driver bug when tcp4csum and udp4csum are enabled
X-Send-Pr-Version: www-1.0

>Number:         51057
>Category:       kern
>Synopsis:       hme(4) device driver bug when tcp4csum and udp4csum are enabled
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Apr 09 09:50:00 +0000 2016
>Last-Modified:  Mon Apr 11 16:00:01 +0000 2016
>Originator:     Roman
>Release:        7.0_STABLE
>Organization:
>Environment:
NetBSD ultra10 7.0_STABLE NetBSD 7.0_STABLE (GENERIC) #0: Sun Apr  3 17:13:43 BST 2016  roman@atom510:/opt/obj.sparc64/sys/arch/sparc64/compile/GENERIC sparc64
>Description:
All machines are on the same LAN, connected to a single
100Mbps switch. When I run scp from NetBSD-7 on amd64 to NetBSD-7 on
sparc64, sooner or later sshd on sparc64 exits with an error.

The way I manage to reproduce it:

1. Enable tcp4csum and udp4csum for hme0 on sparc64

2. Simulate heavy I/O on sparc64 by unpacking src.tgz which contains
pkgsrc, src and xsrc

3. Start scp for a 3GB file from amd64 to sparc64

Sooner or later sshd on sparc64 exits with an error.

Disabling tcp4csum and udp4csum for hme0 and repeating the above steps
always copies the entire 3GB file without any errors.

So I'm inclined to think it's either an hme0 hardware issue or a NetBSD
kernel bug.


ktrace shows the following:

   161      1 sshd     CALL  read(3,0xffffffffffff7310,0x4000)
   161      1 sshd     GIO   fd 3 read 56 bytes
       "QB\M-u\M-?J\M-V\^C\M-^\M-;i$\M^L\M-kWJ\^S\M-6\M^M\M-C\M-5;\M-u\M^OpV\^Z\M-o7\M-|\M-v8\M^G-dP\r\M^R\M-h7\M-15j\
        \M-)\^V)\M-UI\M-s\^Q\^E\M-S\^C\M^E?\M-02"
   161      1 sshd     RET   read 56/0x38
   161      1 sshd     CALL  __gettimeofday50(0xffffffffffff8710,0)
   161      1 sshd     RET   __gettimeofday50 0
   161      1 sshd     CALL  __sysctl(0xffffffffffff8608,2,0xffffffffffff99e0,0xffffffffffff8600,0,0)
   161      1 sshd     RET   __sysctl 0
   161      1 sshd     CALL  getpid
   161      1 sshd     RET   getpid 161/0xa1, 767/0x2ff
   161      1 sshd     CALL  __socket30(1,0x10000002,0)
   161      1 sshd     RET   __socket30 9
   161      1 sshd     CALL  connect(9,0xfffffffffca65218,0x6a)
   161      1 sshd     MISC  mbsoname: [/var/run/log]
   161      1 sshd     NAMI  "/var/run/log"
   161      1 sshd     RET   connect 0
   161      1 sshd     CALL  sendto(9,0xffffffffffff9048,0x53,0,0,0)
   161      1 sshd     MISC  msghdr: [name=0x0, namelen=0, iov=0x12d967b10, iovlen=1, control=0x0, controllen=0, flags=0]
   161      1 sshd     GIO   fd 9 wrote 83 bytes
       "<38>1 2016-04-07T15:28:58.973455+01:00 ultra10 sshd 161 - - Corrupted MAC on input."
   161      1 sshd     RET   sendto 83/0x53
   161      1 sshd     CALL  close(9)
   161      1 sshd     RET   close 0
   161      1 sshd     CALL  __gettimeofday50(0xffffffffffff89c0,0)
   161      1 sshd     RET   __gettimeofday50 0
   161      1 sshd     CALL  __sysctl(0xffffffffffff88b8,2,0xffffffffffff9c90,0xffffffffffff88b0,0,0)
   161      1 sshd     RET   __sysctl 0
   161      1 sshd     CALL  getpid
   161      1 sshd     RET   getpid 161/0xa1, 767/0x2ff
   161      1 sshd     CALL  __socket30(1,0x10000002,0)
   161      1 sshd     RET   __socket30 9
   161      1 sshd     CALL  connect(9,0xfffffffffca65218,0x6a)
   161      1 sshd     MISC  mbsoname: [/var/run/log]
   161      1 sshd     NAMI  "/var/run/log"
   161      1 sshd     RET   connect 0
   161      1 sshd     CALL  sendto(9,0xffffffffffff92f8,0x80,0,0,0)
   161      1 sshd     MISC  msghdr: [name=0x0, namelen=0, iov=0x12d967b10, iovlen=1, control=0x0, controllen=9, flags=0]
   161      1 sshd     GIO   fd 9 wrote 128 bytes
       "<34>1 2016-04-07T15:28:59.045004+01:00 ultra10 sshd 161 - - fatal: ssh_dispatch_run_fatal: message authenticat\
        ion code incorrect"
   161      1 sshd     RET   sendto 128/0x80
   161      1 sshd     CALL  close(9)
   161      1 sshd     RET   close 0


I assume that "Corrupted MAC on input" may have something to do with network card TCP checksumming.
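
To illustrate the suspicion above, here is a minimal sketch of the RFC 1071
Internet checksum that TCP and UDP rely on. It is not taken from the hme(4)
driver and does not model it; the function names and the sample payload are
made up for illustration, and a real TCP checksum also covers a pseudo-header
and the TCP header. The assumption being illustrated is that if the receive
checksum offload wrongly reports a damaged segment as good, the damaged bytes
reach sshd, where the SSH MAC check is the next thing that can notice.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above bit 15 back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def segment_is_valid(payload: bytes, checksum: int) -> bool:
    """Software check: recompute the checksum and compare (hypothetical helper)."""
    return internet_checksum(payload) == checksum


# Hypothetical payload standing in for a block of SSH ciphertext.
payload = b"example ssh ciphertext block"
good = internet_checksum(payload)

# Flip one bit, as a faulty DMA transfer or NIC might.
damaged = bytearray(payload)
damaged[5] ^= 0x01
damaged = bytes(damaged)

print(segment_is_valid(payload, good))   # intact segment passes
print(segment_is_valid(damaged, good))   # damaged segment is rejected
```

With a software check like this, the damaged segment would be dropped and
retransmitted by the sender. If the offload path skips or botches the
equivalent check, that would be consistent with the scp succeeding only when
tcp4csum and udp4csum are disabled, but this remains a guess.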

ultra10# cat /etc/ifconfig.hme0 
up
192.168.1.3 netmask 255.255.255.0 tcp4csum udp4csum

Others report similar issues:

https://tty1.net/blog/2014/ssh-corrupted-mac-on-input_en.html


>How-To-Repeat:

>Fix:

>Audit-Trail:
From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/51057: hme(4) device driver bug when tcp4csum and udp4csum
 are enabled
Date: Sat, 9 Apr 2016 09:32:35 -0500 (CDT)

 I see something similar on a machine with a qfe (4 x hme) sbus card that
 I use as a NAT firewall/router.  Both internal and external networks are
 on the qfe card.  If "tcp4csum-rx" is enabled on the internal interface,
 FTP connections from the local LAN are not forwarded to 'ftp-proxy'.

 There was one remote machine I connected to that would drop the SSH
 connection with similar messages to those reported here, but it was
 the only one that exhibited the problem.

 Lately, I experienced one particular web site that starts to work then
 fails and, once failed, fails for all hosts on my internal LAN for a
 couple of days, including one running redmond-OS.  The website works
 for hosts outside my LAN (via excruciatingly slow X11 forwarding over
 SSH).  I briefly disabled all tcp4csum for internal and external networks,
 but that didn't help.  I'll try again by rebooting the firewall so it
 comes up with those options disabled.

 -- 
 |/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
 |\ / jdbaker[snail]mylinuxisp[flyspeck]com    OpenBSD            FreeBSD
 | X  No HTML/proprietary data in email.   BSD just sits there and works!
 |/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645

From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/51057: hme(4) device driver bug when tcp4csum and udp4csum
 are enabled
Date: Sun, 10 Apr 2016 15:04:08 -0500 (CDT)

 I arranged to reboot my NAT router/firewall with all qfe/hme hardware
 assist options off.  The problem website no longer fails catastrophically,
 but neither does it work correctly.

 I have a PCI qfe card lying around somewhere, so I could potentially test
 the behavior in a non-SPARC system (macppc, i386, or amd64 fairly readily,
 prep if I can get my MTX604 to boot again).  I have to find it first.

 Back when I was using a qec/qe card (and the built-in le interface), I
 don't recall having any problems.

 -- 
 |/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
 |\ / jdbaker[snail]mylinuxisp[flyspeck]com    OpenBSD            FreeBSD
 | X  No HTML/proprietary data in email.   BSD just sits there and works!
 |/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org, cryintothebluesky@googlemail.com
Subject: re: kern/51057: hme(4) device driver bug when tcp4csum and udp4csum are enabled
Date: Mon, 11 Apr 2016 06:09:29 +1000

 FWIW, i am successfully using the PCI quad hme(4) in a sunblade 2500,
 although i am only using one interface so far.

 i have this memory of people saying quad hme(4) @ sbus was a problem
 a *long* time ago.


 .mrg.

From: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
To: matthew green <mrg@eterna.com.au>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org, cryintothebluesky@googlemail.com
Subject: re: kern/51057: hme(4) device driver bug when tcp4csum and udp4csum
 are enabled
Date: Sun, 10 Apr 2016 23:06:38 +0200

 On Mon, 11 Apr 2016 06:09:29 +1000, matthew green wrote:
 > i have this memory of people saying quad hme(4) @ sbus was a problem
 > a *long* time ago.

 I ran NetBSD on an SS20 with a quad hme as a router 24/7 for a few
 years. It was replaced because the SM71 cpus died, not because of the
 hme.

 FWIW,
 hauke

 -- 
 Hauke Fath                        <hauke@Espresso.Rhein-Neckar.DE>
 Ernst-Ludwig-Straße 15
 64625 Bensheim
 Germany

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org, cryintothebluesky@googlemail.com
Subject: re: kern/51057: hme(4) device driver bug when tcp4csum and udp4csum are enabled
Date: Mon, 11 Apr 2016 06:14:03 +1000

 hmm, i see there is an uncommitted patch in PR 24310 for hme(4).

 i don't see it being directly related, but it's the only thing i can
 see that isn't committed or clearly unrelated.


 .mrg.

From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/51057: hme(4) device driver bug when tcp4csum and udp4csum
 are enabled
Date: Mon, 11 Apr 2016 10:57:45 -0500 (CDT)

 As a (relatively) quick test, I replaced the SPARC/qfe-based router with
 a Soekris net4501, and the problem website works fine now.

 I have a few single hme@sbus cards I could replace up to three of the qfe
 ports with and see how they behave.

 -- 
 |/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
 |\ / jdbaker[snail]mylinuxisp[flyspeck]com    OpenBSD            FreeBSD
 | X  No HTML/proprietary data in email.   BSD just sits there and works!
 |/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645
