NetBSD Problem Report #58508

From www@netbsd.org  Mon Jul 29 01:06:06 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits)
	 client-signature RSA-PSS (2048 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 32BCB1A923C
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 29 Jul 2024 01:06:06 +0000 (UTC)
Message-Id: <20240729010604.BD3C21A923E@mollari.NetBSD.org>
Date: Mon, 29 Jul 2024 01:06:04 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: experimental wg(4) queues LIFO, not FIFO, pending first handshake
X-Send-Pr-Version: www-1.0

>Number:         58508
>Category:       kern
>Synopsis:       experimental wg(4) queues LIFO, not FIFO, pending first handshake
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jul 29 01:10:00 +0000 2024
>Closed-Date:    Wed Oct 09 14:39:00 +0000 2024
>Last-Modified:  Wed Oct 09 14:39:00 +0000 2024
>Originator:     Taylor R Campbell
>Release:        current, 10
>Organization:
The NetWG Queuedation
>Environment:
>Description:
When there is no established session with a wg(4) peer, and something attempts to send a packet to the peer, wg(4) will initiate a session but the packet can't go out until a handshake completes.

During that time, wg(4) queued packets so that the moment the handshake completes, it can send data promptly to the peer (the responder is forbidden to send data until it has received the initiator's first data packet).

Currently, the queue has a single element, because normally the handshake takes a single round-trip to complete (initiator sends init, responder sends response, initiator can send data after that).  But it is effectively a LIFO queue, because wg(4) drops any additional outgoing packets.

This has the effect that if the handshake is delayed (e.g., because it has to wait for arp resolution), ping returns answers that look like:

64 bytes from 192.168.1.1: icmp_seq=0 ttl=64 time=4567.890 ms
64 bytes from 192.168.1.1: icmp_seq=5 ttl=64 time=7.706690 ms
64 bytes from 192.168.1.1: icmp_seq=6 ttl=64 time=1.490476 ms
...

Queueing older packets and dropping newer packets is a pretty weird thing to do.  It would probably be better to drop older packets and queue newer packets.  (It's not entirely clear to me it's even worthwhile to queue anything at all here -- ozaki-r's original draft would just send an empty keepalive packet once the session is established, and that worked too.  But it does potentially reduce some round-trips for higher-level applications.)

>How-To-Repeat:
ping
>Fix:
Change

        if (atomic_cas_ptr(&wgp->wgp_pending, NULL, m) != NULL)
                m_freem(m);

to

        if ((m = atomic_swap_ptr(&wgp->wgp_pending, m)) != NULL)
                m_freem(m);

more or less, in wg_output.

>Release-Note:

>Audit-Trail:
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/58508 CVS commit: src/sys/net
Date: Mon, 29 Jul 2024 02:33:27 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Jul 29 02:33:27 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Queue pending packet in FIFO order, not LIFO order.

 Sometimes the session takes a seconds to establish, for whatever
 reason.  It is better if the pending packet, which we queue up to
 send as soon as we get the responder's handshake response, is the
 most recent packet, rather than the first packet.

 That way, we don't wind up with a weird multi-second-delayed ping,
 followed by a bunch of dropped, followed by normal ping timings, or
 wind up sending the first TCP SYN instead of the most recent, or what
 have you.  Senders need to be prepared to retransmit anyway if
 packets are dropped.

 PR kern/58508: experimental wg(4) queues LIFO, not FIFO, pending
 first handshake


 To generate a diff of this commit:
 cvs rdiff -u -r1.114 -r1.115 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 29 Jul 2024 02:39:45 +0000
State-Changed-Why:
fixed in HEAD, needs pullup-10


From: Taylor R Campbell <riastradh@NetBSD.org>
To: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Cc: 
Subject: Re: kern/58508: experimental wg(4) queues LIFO, not FIFO, pending first handshake
Date: Mon, 29 Jul 2024 14:47:59 +0000

 On reflection, I realize that LIFO and FIFO are not the right terms
 here.

 A single-entry queue is always both LIFO (last-in/first-out) and
 FIFO (first-in/first-out).

 What changed is the choice of which packets to drop when we are asked
 to transmit new ones and the (single-entry) queue has filled up.  Do
 we drop the oldest packet, or the newest packet (or something else)?

 Previously I had chosen drop-newest (but not for any particular reason
 I can remember, and certainly not for any reason I wrote down).  Now
 the choice is drop-oldest, like arp/ndp does.

 Too late to fix the commit message, oh well.

State-Changed-From-To: needs-pullups->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 09 Oct 2024 14:39:00 +0000
State-Changed-Why:
fixed in HEAD, pulled up to 10
pullup-10 #934 https://releng.netbsd.org/cgi-bin/req-10.cgi?show=934


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.