NetBSD Problem Report #56252

From www@netbsd.org  Wed Jun 16 23:34:54 2021
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id C5A501A921F
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 16 Jun 2021 23:34:54 +0000 (UTC)
Message-Id: <20210616233453.6CC691A923C@mollari.NetBSD.org>
Date: Wed, 16 Jun 2021 23:34:53 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: wg(4) state machine has race conditions
X-Send-Pr-Version: www-1.0

>Number:         56252
>Category:       kern
>Synopsis:       wg(4) state machine has race conditions
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    riastradh
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jun 16 23:35:00 +0000 2021
>Closed-Date:    
>Last-Modified:  Wed Oct 09 14:33:26 +0000 2024
>Originator:     Taylor R Campbell
>Release:        9.99.x
>Organization:
The NetBSD Floundation
>Environment:
succumbing to climate change
>Description:
The wg(4) session establishment and rekeying state machine has race conditions, and the WireGuard specification it aims to follow isn't clear on how they are to be resolved.  Someone needs to figure out an appropriate resolution.
>How-To-Repeat:
atf-run /usr/tests/net/if_wg
>Fix:
Yes, please!

>Release-Note:

>Audit-Trail:
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/tests/net/if_wg
Date: Thu, 17 Jun 2021 12:16:09 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Thu Jun 17 12:16:09 UTC 2021

 Modified Files:
 	src/tests/net/if_wg: t_misc.sh

 Log Message:
 tests/net/if_wg: Mark as flaky (PR kern/56252).


 To generate a diff of this commit:
 cvs rdiff -u -r1.8 -r1.9 src/tests/net/if_wg/t_misc.sh

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56252: wg(4) state machine has race conditions
Date: Thu, 17 Jun 2021 16:17:20 +0300

 Duplicate of PR 55729?
 -- 
 Andreas Gustafsson, gson@gson.org

From: "Andreas Gustafsson" <gson@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/tests/net/if_wg
Date: Fri, 26 Nov 2021 20:02:35 +0000

 Module Name:	src
 Committed By:	gson
 Date:		Fri Nov 26 20:02:35 UTC 2021

 Modified Files:
 	src/tests/net/if_wg: t_misc.sh

 Log Message:
 The wg_handshake_timeout test case was failing because it contained
 atf_fail "failed to trigger PR kern/56252" without a corresponding
 atf_expect_fail "PR kern/56252", which makes no sense.  Since the
 test case does occasionally fail on real hardware, fix this by adding
 the atf_expect_fail rather than by removing the atf_fail.


 To generate a diff of this commit:
 cvs rdiff -u -r1.10 -r1.11 src/tests/net/if_wg/t_misc.sh

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Responsible-Changed-From-To: kern-bug-people->riastradh
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Fri, 26 Jul 2024 17:04:32 +0000
Responsible-Changed-Why:
mine


From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Sun, 28 Jul 2024 14:37:59 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:37:59 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Rework some details of internal session state machine.

 This way:

 - There is a clear transition between when a session is being set up,
   and when it is exposed to the data rx path (wg_handle_msg_data):
   atomic_store_release to set wgs->wgs_state to INIT_PASSIVE or
   ESTABLISHED.

   (The transition INIT_PASSIVE -> ESTABLISHED is immaterial to the
   data rx path, so that's just atomic_store_relaxed.  Similarly the
   transition to DESTROYING.)

 - There is a clear transition between when a session is being set up,
   and when it is exposed to the data tx path (wg_output):
   atomic_store_release to set wgp->wgp_session_stable to it.

 - Every path that reinitializes a session must go through
   wg_destroy_session via wg_put_index_session first.  This avoids
   races between session reuse and the data rx/tx paths.

 - Add a log message at the time of every state transition.

 Prompted by:

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.93 -r1.94 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Sun, 28 Jul 2024 14:38:19 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:38:19 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Fix logic to ensure session initiation is underway.

 Previously, wg_task_send_init_message would call
 wg_send_handshake_msg_init if either:

 (a) the stable session is UNKNOWN, meaning a session has not yet been
     established, either by us or by the peer (but it could be in
     progress); or

 (b) the stable session is not UNKNOWN but the unstable session is
     _not_ INIT_ACTIVE, meaning there is an established session and we
     are not currently initiating a new session.

 If wg_output (or wgintr) found no established session while there was
 already a session being initiated, we may only enter
 wg_task_send_init_message after the session is already established,
 and trigger spurious reinitiation.

 Instead, create a separate flag to indicate whether it is mandatory
 to rekey because limits have passed.  Then create a session only if:

 (a) the stable session is not ESTABLISHED, or
 (b) the mandatory rekey flag is not set,

 and clear the mandatory rekey flag.

 While here, arrange to do rekey-after-time on tx, not on callout.  If
 there's no data to tx, we shouldn't reinitiate a session -- we should
 stay quiet on the network.

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.94 -r1.95 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Sun, 28 Jul 2024 14:38:43 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:38:42 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Use callout_halt, not callout_stop.

 It's possible that callout_stop might work here, but let's simplify
 reasoning about it -- the timers in question only take the peer intr
 lock, so it's safe to wait for them while holding the peer lock in
 the handshake worker thread.

 We may have to undo the task bit but that will take a bit more
 analysis to determine.

 Prompted by (but probably won't fix anything in):

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.95 -r1.96 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Sun, 28 Jul 2024 14:39:00 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:39:00 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Omit needless pserialize_perform on transition to DESTROYING.

 A session can still be used when it is in the DESTROYING state, so
 there's no need to wait for users to drain here -- that's the whole
 point of a separate DESTROYING state.

 It is only the transition from DESTROYING back to UNKNOWN, after the
 session has been unpublished so no new users can begin, that requires
 waiting for all users to drain, and we already do that in
 wg_destroy_session.

 Prompted by (but won't fix anything in, because this is just a
 performance optimization):

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.96 -r1.97 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Sun, 28 Jul 2024 14:39:36 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:39:36 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Mark wgp_pending volatile to reflect its usage.

 Prompted by (but won't fix any part of):

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.98 -r1.99 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Sun, 28 Jul 2024 14:40:03 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:40:02 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Fix session destruction.

 Schedule destruction as soon as the session is created, to ensure key
 erasure within 2*reject-after-time seconds.  Previously, we would
 schedule destruction of the previous session 1 second after the next
 one has been established.  Combined with a failure to update the
 state machine on keepalive packets, this led to temporary deadlock
 scenarios.

 To keep it simple, there's just one callout which runs every
 reject-after-time seconds and erases keys in sessions older than
 reject-after-time, so if a session is established the moment after it
 runs, the keys might not be erased until (2-eps)*reject-after-time
 seconds.

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.99 -r1.100 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Sun, 28 Jul 2024 14:45:33 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:45:33 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Reject rx on sessions older than reject-after-time sec.

 Prompted by (but won't fix anything in):

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.100 -r1.101 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Sun, 28 Jul 2024 14:45:51 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:45:51 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): On rx of valid ciphertext, make sure to update state machine.

 Previously, we also required the plaintext to be a plausible-looking
 IP packet before updating the state machine.

 But keepalive packets are empty -- and if the peer initiated the
 session to rekey after last tx but had no more data to tx, it will
 send a keepalive to finish session initiation.

 If we didn't update the state machine in that case, we would stay in
 INIT_PASSIVE state unable to tx on the session, which would make
 things hang.

 So make sure to always update the state machine once we have accepted
 a packet as genuine, even if it's genuine garbage on the inside.

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.101 -r1.102 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Sun, 28 Jul 2024 14:46:17 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:46:16 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Make sure to update endpoint on keepalive packets too.

 Prompted by:

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.102 -r1.103 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/tests/net/if_wg
Date: Sun, 28 Jul 2024 14:46:33 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:46:33 UTC 2024

 Modified Files:
 	src/tests/net/if_wg: t_misc.sh

 Log Message:
 tests/net/if_wg/t_misc: Tweak timeouts in wg_handshake_timeout.

 Most of the timers in wg(4) have only 1sec resolution, which might be
 rounded in either direction, so make sure there's a 2sec buffer on
 either side of the event we care about (the point at which wg(4)
 decides to stop retrying handshake).

 Won't fix any bugs, but might make the tests slightly less flaky.

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions


 To generate a diff of this commit:
 cvs rdiff -u -r1.12 -r1.13 src/tests/net/if_wg/t_misc.sh

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/tests/net/if_wg
Date: Sun, 28 Jul 2024 14:46:44 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:46:44 UTC 2024

 Modified Files:
 	src/tests/net/if_wg: t_misc.sh

 Log Message:
 tests/net/if_wg/t_misc: Elaborate in wg_rekey debug messages.

 Helpful for following the test log when things go wrong.

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.13 -r1.14 src/tests/net/if_wg/t_misc.sh

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/tests/net/if_wg
Date: Sun, 28 Jul 2024 14:47:06 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:47:06 UTC 2024

 Modified Files:
 	src/tests/net/if_wg: t_misc.sh

 Log Message:
 wg(4): Tests should pass now.

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.14 -r1.15 src/tests/net/if_wg/t_misc.sh

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Sun, 28 Jul 2024 14:47:37 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:47:37 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Use 32-bit for times handled in rx/tx paths.

 The rx and tx paths require unlocked access to wgs_time_established
 (to decide whether it's time to rekey) and wgs_time_last_data_sent
 (to decide whether we need to reply to incoming data with a keepalive
 packet), so do it with atomic_load/store_*.

 On 32-bit platforms, we may not be able to do that on time_t.
 However, since sessions only last for a few minutes before
 reject-after-time kicks in and they are erased, 32 bits is plenty to
 record the durations that we need to record here, so this shouldn't
 introduce any new bugs even on hosts that exceed 136 years of uptime.

 Prompted by:

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.103 -r1.104 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Sun, 28 Jul 2024 14:47:58 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:47:58 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Make time_uptime32 work in netbsd<=10.

 This is the low 32 bits of time_uptime.

 Will simplify pullups to 10 for:

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.104 -r1.105 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Sun, 28 Jul 2024 14:48:14 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:48:13 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Fix quotation in comment.

 Prompted by:

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.105 -r1.106 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Sun, 28 Jul 2024 14:49:31 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:49:31 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Tidy up error branches.

 No functional change intended, except to add some log messages in
 failure cases.

 Cleanup after:

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.107 -r1.108 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Sun, 28 Jul 2024 14:50:31 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:50:31 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Delete temporary hacks to dump keys and packets.

 No longer useful for:

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.110 -r1.111 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Sun, 28 Jul 2024 14:55:30 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Jul 28 14:55:30 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Explain why gethexdump/puthexdump is there, and tidy.

 This way I will not be tempted to replace it by in-line calls to
 libkern hexdump.

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.111 -r1.112 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Mon, 29 Jul 2024 02:28:59 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Jul 29 02:28:58 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Put force_rekey state in the session, not the peer.

 That way, there is a time when one thread has exclusive access to the
 state, in wg_destroy_session under the peer lock, when we can clear
 the state without racing against the data tx path.

 This will work more reliably than the atomic_swap_uint I used before.

 Noted by kre@.

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.112 -r1.113 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Mon, 29 Jul 2024 02:33:44 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Jul 29 02:33:44 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Sprinkle comments into wg_swap_sessions.

 No functional change intended.

 Prompted by:

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.115 -r1.116 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Mon, 29 Jul 2024 02:33:58 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Jul 29 02:33:58 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): No need for atomic access to wgs_time_established in tx/rx.

 This is stable while the session is visible to the tx/rx paths -- it
 is initialized before the session is exposed to tx/rx, and doesn't
 change until the session is no longer used by any tx/rx path and has
 been recycled.

 When I sprinkled atomic access to wgs_time_established in if_wg.c
 rev. 1.104, it was a vestige of an uncommitted draft that did the
 transition from INIT_PASSIVE to ESTABLISHED in the tx path itself, in
 an attempt to enable prompter tx on the new session as soon as it is
 established.  This turned out to be unnecessary, so I reverted most
 of it, but forgot that wgs_time_established no longer needed atomic
 treatment.

 We could go back to using time_t and time_uptime, now that there's no
 need to do atomic loads and stores on these quantities. But there's
 no point in 64-bit arithmetic when the time differences are all
 guaranteed bounded by a few minutes, so keeping it 32-bit is probably
 a slight performance improvement on 32-bit systems.

 (In contrast, wgs_time_last_data_sent is both written and read in the
 tx path, which may run in parallel on multiple CPUs, so it still
 requires the atomic treatment.)

 Tidying up for:

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.116 -r1.117 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 29 Jul 2024 02:45:03 +0000
State-Changed-Why:
Aside from the race conditions in the spec (like what happens when two
peers simultaneously try to initiate), the protocol transitions we test
in the atf tests appear to be fixed in HEAD, and need pullup-10.


From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Mon, 29 Jul 2024 16:00:41 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Jul 29 16:00:41 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Fix typo in comment recently added.

 Comment added in the service of:

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.118 -r1.119 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Mon, 29 Jul 2024 16:01:13 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Jul 29 16:01:13 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Omit needless atomic_load.

 wgs_local_index is only ever written to while only one thread has
 access to it and it is not in the thmap -- before it is published in
 wg_get_session_index, and after it is unpublished in
 wg_destroy_session.  So no need for atomic_load -- it is stable if we
 observe it in thmap_get result.

 (Of course this is only for an assertion, which if tripped obviously
 indicates a violation of our assumptions.  But if that happens, well,
 in the worst case we'll see a weird assertion message claiming that
 the index is not equal to itself, which from which we can conclude
 there must have been a concurrent update, which is good enough to
 help diagnose that problem without any atomic_load.)

 Tidying some of the changes for:

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.119 -r1.120 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Mon, 29 Jul 2024 16:02:05 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Jul 29 16:02:05 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Deduplicate session establishment actions.

 The actions to

 (a) record the last handshake time,
 (b) clear some handshake state,
 (c) transmit first data if queued, or (if initiator) keepalive, and
 (d) begin destroying the old session,

 were formerly duplicated between wg_handle_msg_resp (for when we're
 the initiator) and wg_task_establish_session (for when we're the
 responder).

 Instead, let's factor this out into wg_swap_session so there's only
 one copy of the logic.

 This requires moving wg_update_endpoint_if_necessary a little earlier
 in wg_handle_msg_resp -- which should be done anyway so that the
 endpoint is updated _before_ the session is published for the data tx
 path to use.

 Other than moving wg_update_endpoint_if_necessary a little earlier,
 no functional change intended.

 Post-fix tidying for:

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.121 -r1.122 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Mon, 29 Jul 2024 18:43:11 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Jul 29 18:43:11 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Read wgs_state atomically in wg_get_stable_session.

 As noted in the comment above, it may concurrently transition from
 ESTABLISHED to DESTROYING.

 Post-fix tidying for:

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.122 -r1.123 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Mon, 29 Jul 2024 19:43:56 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Jul 29 19:43:56 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Force rekey on tx if session is older than reject-after-time.

 One more corner case for:

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.123 -r1.124 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Mon, 29 Jul 2024 19:45:56 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Jul 29 19:45:56 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Trigger session initiation in wgintr, not in wg_output.

 We have to look up the session in wgintr anyway, for
 wg_send_data_msg.  By triggering session initiation in wgintr instead
 of wg_output, we can skip the stable session lookup and reference in
 wg_output -- simpler that way.

 Post-fix tidying for:

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.125 -r1.126 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Mon, 29 Jul 2024 19:47:00 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Jul 29 19:47:00 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): When a session is established, send first packet directly.

 Like we would do with the keepalive packet, if we had to send that
 instead -- no need to defer it to the pktq.  Keep it simple.

 Post-fix tidying for:

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.127 -r1.128 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Mon, 29 Jul 2024 19:47:14 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Jul 29 19:47:13 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Sprinkle volatile on variables requiring atomic access.

 No functional change intended, since the relevant access is always
 done with atomic_* when it might race with concurrent access -- and
 really this should be _Atomic or something.  But for now our
 atomic_ops(9) API is still spelled with volatile, so we'll use that.

 Post-fix tidying for:

 PR kern/55729: net/if_wg/t_misc:wg_rekey test case fails
 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.


 To generate a diff of this commit:
 cvs rdiff -u -r1.128 -r1.129 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56252 CVS commit: src/sys/net
Date: Wed, 31 Jul 2024 00:25:47 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Wed Jul 31 00:25:47 UTC 2024

 Modified Files:
 	src/sys/net: if_wg.c

 Log Message:
 wg(4): Make a rule for who wins when both peers send INIT at once.

 The rule is that the peer with the numerically smaller public key
 hash, in little-endian, takes priority iff the low order bit of

 H(peer A pubkey) ^ H(peer B pubkey) ^ H(posix minutes as le64)

 is 0, and the peer with the lexicographically larger public key takes
 priority iff the low-order bit is 1.

 Another case of:

 PR kern/56252: wg(4) state machine has race conditions
 PR kern/58463: if_wg does not work when idle.

 This one is, as far as I can tell, simply a deadlock in the protocol
 of the whitepaper -- until both sides give up on the handshake and
 one of them (but not both) later decides to try sending data again.

 (But not related to our t_misc:wg_rekey test, as far as I can tell,
 and I haven't put enough thought into how to reliably trigger this
 race to write a new automatic test for it.)


 To generate a diff of this commit:
 cvs rdiff -u -r1.129 -r1.130 src/sys/net/if_wg.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: needs-pullups->open
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 09 Oct 2024 14:33:26 +0000
State-Changed-Why:
A workaround has been committed for the session establishment race, but
it's a bespoke NetBSDism that other implementations don't follow, so
I'll leave this open to track resolutions if ever adopted by other
implementations.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.