NetBSD Problem Report #51753
From www@NetBSD.org Fri Dec 30 09:48:58 2016
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 2F6947A318
for <gnats-bugs@gnats.NetBSD.org>; Fri, 30 Dec 2016 09:48:58 +0000 (UTC)
Message-Id: <20161230094856.AE0F57A344@mollari.NetBSD.org>
Date: Fri, 30 Dec 2016 09:48:56 +0000 (UTC)
From: marcotte@panix.com
Reply-To: marcotte@panix.com
To: gnats-bugs@NetBSD.org
Subject: tcp SACK causes SSH disconnects
X-Send-Pr-Version: www-1.0
>Number: 51753
>Category: kern
>Synopsis: tcp SACK causes SSH disconnects
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Dec 30 09:50:00 +0000 2016
>Closed-Date: Mon Jun 04 10:34:20 +0000 2018
>Last-Modified: Mon Jun 04 10:34:20 +0000 2018
>Originator: Brian Marcotte
>Release: 7.0
>Organization:
Public Access Networks, Corp.
>Environment:
NetBSD trinity.nyc.access.net 7.0.2 NetBSD 7.0.2 (PANIX-XEN-STD) #1: Mon Nov 21 12:57:01 EST 2016 root@juggler.panix.com:/misc/obj/misc/devel/netbsd/7.0.2/src/sys/arch/i386/compile/PANIX-XEN-STD i386
>Description:
Ever since we started upgrading to NetBSD-7, we've been getting weird
SSH disconnects:
client: Corrupted MAC on input. Disconnecting: Packet corrupt
server: panix5 sshd[23482]: error: Received disconnect from x.x.x.x:
2: Packet corrupt
It turns out that replacing only the kernel and keeping the NetBSD-6
userland is enough to cause the problem. SSH client/server versions
don't appear to matter.
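To see how often a given server is affected, the server-side message
quoted above can be counted in the sshd log. This is only a rough sketch:
the function name count_packet_corrupt is made up for illustration, and
the log location (for example /var/log/authlog) is an assumption that
depends on the local syslog configuration.

```shell
#!/bin/sh
# Sketch: count sshd "Packet corrupt" disconnects in a log file.
# count_packet_corrupt is a hypothetical helper name, not an existing tool.
# Usage: count_packet_corrupt LOGFILE
count_packet_corrupt() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so the exit status is deliberately ignored.
    grep -c 'sshd.*Received disconnect from .*Packet corrupt' "$1" || true
}
```

Comparing the count before and after a kernel change (for example
count_packet_corrupt /var/log/authlog, if that is where sshd logs) gives
a quick indication of whether the disconnects have stopped.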
I traced this down to a change in the kernel between 2013-Nov-12 and
2013-Nov-13. I suspect the problem is in one of these files committed
on that day:
sys/netinet/tcp_congctl.c 1.18
sys/netinet/tcp_congctl.h 1.7
sys/netinet/tcp_input.c 1.330
sys/netinet/tcp_sack.c 1.29
sys/netinet/tcp_subr.c 1.251
sys/netinet/tcp_var.h 1.171
The above commits added "cubic" congestion control but also moved SACK
code around.
>How-To-Repeat:
In our case, certain types of terminal output can cause the problem.
I can now get it to happen somewhat reliably by compiling a NetBSD
kernel.
Some other network problem may also be required to trigger this, as
I've not seen anyone else report it.
>Fix:
I don't know how to fix it, but turning off SACK seems to be a
workaround:
sysctl -w net.inet.tcp.sack.enable=0
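The sysctl -w setting above is lost on reboot. To keep the workaround,
the same line can be added to /etc/sysctl.conf, which NetBSD reads at
boot. A minimal sketch, assuming the default file location; the helper
name persist_sack_off is hypothetical, and the file is passed as an
argument so the function can be tried on a scratch file first. The
current value can be checked with "sysctl -n net.inet.tcp.sack.enable".

```shell
#!/bin/sh
# Sketch: make the SACK workaround persistent.
# persist_sack_off is a hypothetical helper name, not an existing tool.
# Usage: persist_sack_off [FILE]   (FILE defaults to /etc/sysctl.conf)
persist_sack_off() {
    conf=${1:-/etc/sysctl.conf}
    setting='net.inet.tcp.sack.enable=0'
    # -q: quiet, -x: match the whole line, -F: fixed string.
    # Skip the append if the exact line is already present.
    if grep -qxF "$setting" "$conf" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$setting" >> "$conf"
}
```

Running persist_sack_off as root with no argument appends the line to
/etc/sysctl.conf once; repeated runs leave the file unchanged.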
>Release-Note:
>Audit-Trail:
From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51753: tcp SACK causes SSH disconnects
Date: Fri, 30 Dec 2016 09:20:52 -0600 (CST)
This looks a lot like something I was seeing when working on some remote
systems. It was most likely to occur with high output traffic (such as
when running an update build of the system).
It also seemed to be affected by one or more intervening routers between
the client and server. A "direct" connection to the problem machine
would be unstable but SSH-ing to another host first and then connecting
to the problem system would be rock solid. That "other host" could be
another machine on the same LAN as the problem machine, also running
NetBSD-7/amd64, or another remote machine (not on same LAN) running
FreeBSD-10/amd64.
The problem machine was later replaced and the behavior has not been
observed on the replacement. Said problem machine is now in my possession
and did not exhibit the behavior when connected to my LAN.
There was a thread about this in one of the other mailing lists some
time back (either current-users@ or netbsd-users@), but I haven't been
able to locate it so far.
--
|/"\ John D. Baker, KN5UKS NetBSD Darwin/MacOS X
|\ / jdbaker[snail]mylinuxisp[flyspeck]com OpenBSD FreeBSD
| X No HTML/proprietary data in email. BSD just sits there and works!
|/ \ GPGkeyID: D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645
From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51753: tcp SACK causes SSH disconnects
Date: Fri, 30 Dec 2016 09:48:51 -0600 (CST)
On Fri, 30 Dec 2016, John D. Baker wrote:
> There was a thread about this in one of the other mailing lists some
> time back (either current-users@ or netbsd-users@), but I haven't been
> able to locate it so far.
I found the thread (started by the PR originator, actually), here:
https://mail-index.netbsd.org/netbsd-users/2016/01/18/msg017647.html
and my observation here:
https://mail-index.netbsd.org/netbsd-users/2016/01/18/msg017654.html
but it has only marginally more information than what I posted to this
PR before.
There was another, earlier thread I originated relating to the same
issue, but I still can't find it.
--
|/"\ John D. Baker, KN5UKS NetBSD Darwin/MacOS X
|\ / jdbaker[snail]mylinuxisp[flyspeck]com OpenBSD FreeBSD
| X No HTML/proprietary data in email. BSD just sits there and works!
|/ \ GPGkeyID: D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645
From: Nick Hudson <skrll@netbsd.org>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: kern/51753: tcp SACK causes SSH disconnects
Date: Fri, 30 Dec 2016 17:11:18 +0000
On 12/30/16 09:50, marcotte@panix.com wrote:
>> Number: 51753
>> Category: kern
>> Synopsis: tcp SACK causes SSH disconnects
Does this help?
Nick
Attachment: pr51753.diff
Index: sys/netinet/tcp_congctl.c
===================================================================
RCS file: /cvsroot/src/sys/netinet/tcp_congctl.c,v
retrieving revision 1.22
diff -u -p -r1.22 tcp_congctl.c
--- sys/netinet/tcp_congctl.c 13 Dec 2016 08:29:03 -0000 1.22
+++ sys/netinet/tcp_congctl.c 30 Dec 2016 17:10:27 -0000
@@ -707,7 +707,6 @@ tcp_newreno_fast_retransmit_newack(struc
 		tp->t_partialacks++;
 		TCP_TIMER_DISARM(tp, TCPT_REXMT);
 		tp->t_rtttime = 0;
-		tp->snd_nxt = th->th_ack;
 
 		if (TCP_SACK_ENABLED(tp)) {
 			/*
@@ -734,6 +733,7 @@ tcp_newreno_fast_retransmit_newack(struc
 			tp->t_flags |= TF_ACKNOW;
 			(void) tcp_output(tp);
 		} else {
+			tp->snd_nxt = th->th_ack;
 			/*
 			 * Set snd_cwnd to one segment beyond ACK'd offset
 			 * snd_una is not yet updated when we're called
From: Brian Marcotte <marcotte@panix.com>
To: Nick Hudson <skrll@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@NetBSD.org
Subject: Re: kern/51753: tcp SACK causes SSH disconnects
Date: Fri, 30 Dec 2016 13:18:02 -0500
> Does this help?
It may be helping, but I'll need another day to be sure. I'll let you
know over the weekend.
Thanks.
--
- Brian
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51753: tcp SACK causes SSH disconnects
Date: Sat, 31 Dec 2016 10:15:59 +0000
On Fri, Dec 30, 2016 at 03:25:01PM +0000, John D. Baker wrote:
> This looks a lot like something I was seeing when working on some remote
> systems. It was most likely to occur with high output traffic (such as
> when running an update build of the system).
I've seen this too; it's readily triggered by compiling, seems to be
correlated with output volume. I'd thought it was flaky hardware :-/
--
David A. Holland
dholland@netbsd.org
From: Brian Marcotte <marcotte@panix.com>
To: Nick Hudson <skrll@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gnats-bugs@NetBSD.org
Subject: Re: kern/51753: tcp SACK causes SSH disconnects
Date: Sun, 1 Jan 2017 10:18:49 -0500
> Subject: Re: kern/51753: tcp SACK causes SSH disconnects
> Does this help?
I've not seen any disconnects since I applied the patch. It appears
to resolve the problem.
Thank you.
--
- Brian
From: "Nick Hudson" <skrll@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/51753 CVS commit: src/sys/netinet
Date: Mon, 2 Jan 2017 09:29:38 +0000
Module Name: src
Committed By: skrll
Date: Mon Jan 2 09:29:38 UTC 2017
Modified Files:
src/sys/netinet: tcp_congctl.c
Log Message:
Restore behaviour to pre- tcp_congctl.c:1.18 for SACK. Further analysis
of the change is required.
OK kefren@
PR/51753 tcp SACK causes SSH disconnect
To generate a diff of this commit:
cvs rdiff -u -r1.22 -r1.23 src/sys/netinet/tcp_congctl.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/51753 CVS commit: [netbsd-7] src/sys/netinet
Date: Thu, 5 Jan 2017 08:08:46 +0000
Module Name: src
Committed By: martin
Date: Thu Jan 5 08:08:46 UTC 2017
Modified Files:
src/sys/netinet [netbsd-7]: tcp_congctl.c
Log Message:
Pull up following revision(s) (requested by skrll in ticket #1347):
sys/netinet/tcp_congctl.c: revision 1.23
Restore behaviour to pre- tcp_congctl.c:1.18 for SACK. Further analysis
of the change is required.
OK kefren@
PR/51753 tcp SACK causes SSH disconnect
To generate a diff of this commit:
cvs rdiff -u -r1.19 -r1.19.4.1 src/sys/netinet/tcp_congctl.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/51753 CVS commit: [netbsd-7-0] src/sys/netinet
Date: Thu, 5 Jan 2017 09:11:14 +0000
Module Name: src
Committed By: martin
Date: Thu Jan 5 09:11:14 UTC 2017
Modified Files:
src/sys/netinet [netbsd-7-0]: tcp_congctl.c
Log Message:
Pull up following revision(s) (requested by skrll in ticket #1347):
sys/netinet/tcp_congctl.c: revision 1.23
Restore behaviour to pre- tcp_congctl.c:1.18 for SACK. Further analysis
of the change is required.
OK kefren@
PR/51753 tcp SACK causes SSH disconnect
To generate a diff of this commit:
cvs rdiff -u -r1.19 -r1.19.8.1 src/sys/netinet/tcp_congctl.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: maya@NetBSD.org
State-Changed-When: Mon, 04 Jun 2018 09:34:29 +0000
State-Changed-Why:
Has this problem shown up since?
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51753 (tcp SACK causes SSH disconnects)
Date: Mon, 4 Jun 2018 11:37:29 +0200
I am not the original submitter, but I used to see this problem regularly,
and for me it has never happened again. Whatever that is worth.
Martin
State-Changed-From-To: feedback->closed
State-Changed-By: maya@NetBSD.org
State-Changed-When: Mon, 04 Jun 2018 10:34:20 +0000
State-Changed-Why:
Feedback received from martin, who had the same bug in the past. Feel free to report another bug or reply if you are seeing the problem again. Thanks for the report, and thanks to skrll for the fix.
>Unformatted: