NetBSD Problem Report #49390

From www@NetBSD.org  Thu Nov 13 16:32:39 2014
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 3880BA667B
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 13 Nov 2014 16:32:39 +0000 (UTC)
Message-Id: <20141113163237.5B666A66A4@mollari.NetBSD.org>
Date: Thu, 13 Nov 2014 16:32:37 +0000 (UTC)
From: uwe@NetBSD.org
Reply-To: uwe@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: Unsigned wraparound in tcp sequence number calculations on 64-bit machines
X-Send-Pr-Version: www-1.0

>Number:         49390
>Category:       kern
>Synopsis:       Unsigned wraparound in tcp sequence number calculations on 64-bit machines
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Nov 13 16:35:00 +0000 2014
>Last-Modified:  Thu Aug 13 03:25:01 +0000 2020
>Originator:     Valery Ushakov
>Release:        current
>Organization:
>Environment:
>Description:
tcp_output() mixes "long" and "uint32_t" when doing sequence number
calculations and on 64-bit that's a dangerous mix.

It looks like I forgot to report this when this was still fresh in my
memory, so I'd better file it now before I forgot even more details.
Apologies in advance for a bit of handwaving.

I actually encountered this problem while working on slirp-based NAT
in VirtualBox.  Slirp uses old BSD stack code.  In NetBSD the relevant
code is still mostly the same.  The bug that I did run into in slirp
is fixed in NetBSD and I haven't tried to trigger potentially buggy
code paths with other instances of this problem.  Someone with a
better clue should review it, but I think it should be fixed for
general correctness.

The bug that I ran into in slirp was fixed in NetBSD by rev. 1.112 of
tcp_output.c:

  http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/netinet/tcp_output.c.diff?r1=1.111&r2=1.112

  revision 1.112
  date: 2004-05-08 18:41:47 +0400;  author: chs;  state: Exp;  lines: +4 -4;
  work around an LP64 problem where we report an excessively large window
  due to incorrect mixing of types.

That problem was triggered by zero window probes by remote's persist
timer, when rcv_adv (owned by tcp_output - advertised right end of the
window) was "overtaken" by rvc_nxt by one byte - the probe accepted
into previously closed window (rcv_nxt is owned by tcp_input).  So

  /* 64 bit */ long win = (long)(tp->rcv_adv - tp->rcv_nxt);

became 4294967295L instead of -1 [gory details available upon request].


We still have more instances of code that does the same

  long delta = uint32_seq1 - uint32_seq2;


E.g. earlier in tcp_output() 

                /*
                 * "adv" is the amount we can increase the window,
                 * taking into account that we are limited by
                 * TCP_MAXWIN << tp->rcv_scale.
                 */
                long adv = min(win, (long)TCP_MAXWIN << tp->rcv_scale) -
                        (tp->rcv_adv - tp->rcv_nxt);

with rcv_adv and rcv_nxt again.


We also have it in tcp_seq.h in

  #define SEQ_SUB(a,b)    ((long)((a)-(b)))


I was reminded to file this b/c I accidentally stumbled upon this blog
post:

  http://blogmal.42.org/tidbits/tcp-bug.story

that seems directly relevant to the unfixed "long adv" mentioned
above.

>How-To-Repeat:
Source inspection.
>Fix:
There are two aspects to this.

One is the technicality of doing the correct casts.

The more interesting problem is verifying whether the surrounding code
is prepared to handle negative deltas.  It's obviously orthogonal to
the actual casting bug described here as on 32-bit machines there's no
bug.  We may formally ignore this aspect for now since fixing the
casts will make 64-bit code do the same thing that 32-bit code has
been doing all this time.  But it might be a good pretext for someone
with a clue to actually look at that "while there".

>Audit-Trail:
From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/49390: Unsigned wraparound in tcp sequence number
 calculations on 64-bit machines
Date: Thu, 13 Aug 2020 06:22:15 +0300

 Oops, looks like I forgot to reference this PR in the fix I've
 committed a while ago:

 RCS file: /cvsroot/src/sys/netinet/tcp_output.c,v
 [...]
 revision 1.207
 date: 2018-05-08 02:42:13 +0300;  author: uwe;  state: Exp;  lines: +19 -8;  commitid: q2ycosos011hhqBA;
 Fix unsigned wraparound on window size calculations.

 This is another instance where tp->rcv_adv - tp->rcv_nxt can wrap
 around after successful zero-window probe from the peer.  The first
 one was fixed by chs@ in revision 1.112 on 2004-05-08.

 While here, CSE and de-obfuscate the code a bit.

 -uwe

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.