NetBSD Problem Report #59347

From www@netbsd.org  Wed Apr 23 15:54:12 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
	 client-signature RSA-PSS (2048 bits) client-digest SHA256)
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 58FD11A9239
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 23 Apr 2025 15:54:12 +0000 (UTC)
Message-Id: <20250423155410.688251A923E@mollari.NetBSD.org>
Date: Wed, 23 Apr 2025 15:54:10 +0000 (UTC)
From: andrew.cagney@gmail.com
Reply-To: andrew.cagney@gmail.com
To: gnats-bugs@NetBSD.org
Subject: the `-w` in `racoonctl establish-sa -w esp inet ${leftsubnet}/255 ${rightsubnet}/255 any` is racy 
X-Send-Pr-Version: www-1.0

>Number:         59347
>Category:       bin
>Synopsis:       the `-w` in `racoonctl establish-sa -w esp inet ${leftsubnet}/255 ${rightsubnet}/255 any` is racy
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Apr 23 15:55:00 +0000 2025
>Originator:     cagney
>Release:        10.1
>Organization:
>Environment:
NetBSD west 10.1 NetBSD 10.1 (GENERIC) #0: Mon Dec 16 13:08:11 UTC 2024  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64

>Description:
The documentation says:

             Specifying -w will make racoonctl wait until the SA is actually
             established or an error occurs.

however, it doesn't always work.  My hunch is that racoonctl:

- triggers an acquire
- tries to attach to racoon's socket

but sometimes the second step fails (times out).

First, here's a working `-w`.  As part of establishing the IPsec SA, racoon sends a request for an SPI to the kernel and then, while waiting for the response, polls for and gets the attach:

2025-04-23 15:41:03: DEBUG: pfkey GETSPI sent: ESP/Tunnel 192.1.2.23[500]->192.1.2.45[500] 
2025-04-23 15:41:03: DEBUG: pfkey getspi sent.
2025-04-23 15:41:03: DEBUG: [28] admin connection is polling events
2025-04-23 15:41:03: DEBUG: [28] admin connection established

however, here:

2025-04-23 05:09:25: DEBUG: call pfkey_send_getspi
2025-04-23 05:09:25: DEBUG: pfkey GETSPI sent: ESP/Tunnel 192.1.2.23[500]->192.1.2.45[500] 
2025-04-23 05:09:25: DEBUG: pfkey getspi sent.
2025-04-23 05:09:25: DEBUG: pk_recv: retry[0] recv() 
2025-04-23 05:09:25: DEBUG: got pfkey GETSPI message

the poll never happens and the `-w` is missed.

(as an aside, the working VM's host is significantly faster than the failing VM's host)
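
To tell the two cases apart in a captured racoon debug log, something like the following could be used. It is only a sketch: the function name `attach_won_race` is made up for illustration, and it assumes that the message strings quoted above ("pfkey getspi sent.", "admin connection established", "got pfkey GETSPI message") are stable and that their relative order is what distinguishes a working `-w` from a missed one, which is my hunch rather than something confirmed from the racoon source.

```python
import re

# Message fragments copied from the log excerpts in this report.
GETSPI_SENT = "pfkey getspi sent."
GETSPI_REPLY = "got pfkey GETSPI message"
ADMIN_ATTACHED = re.compile(r"\[\d+\] admin connection established")


def attach_won_race(log_lines):
    """Return True if the admin attach is logged after GETSPI was sent
    and before the GETSPI reply is logged; False otherwise.

    Hypothetical helper: the ordering rule is an assumption drawn from
    the two excerpts in this report, not from racoon's code.
    """
    sent = False
    for line in log_lines:
        if GETSPI_SENT in line:
            sent = True
        elif sent and ADMIN_ATTACHED.search(line):
            return True
        elif sent and GETSPI_REPLY in line:
            return False
    return False
```

Feeding it the first excerpt should classify the run as one where `-w` attached in time, and the second as one where it was missed.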


>How-To-Repeat:

>Fix:
The workaround is to ignore `-w` and instead either monitor the logs or probe racoon until the SA can be seen.
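
A rough sketch of the "probe until the SA can be seen" variant follows. The helper names `sa_visible` and `wait_for_sa` are made up for illustration. It assumes that `setkey -D` (run with sufficient privileges) prints each SA with the source and destination addresses on one line, optionally followed by a bracketed port, and it only checks one direction; the timeout and interval values are arbitrary.

```python
import subprocess
import time


def sa_visible(src, dst, dump_cmd=("setkey", "-D")):
    """Return True if some line of the SAD dump names both src and dst.

    Assumption: the dump puts "src dst" on one line, possibly with a
    trailing "[port]" on each address, which is stripped before comparing.
    """
    out = subprocess.run(dump_cmd, capture_output=True, text=True,
                         check=False).stdout
    for line in out.splitlines():
        addrs = [tok.split("[")[0] for tok in line.split()]
        if src in addrs and dst in addrs:
            return True
    return False


def wait_for_sa(probe, timeout=30.0, interval=0.5,
                clock=time.monotonic, sleep=time.sleep):
    """Call probe() repeatedly until it returns True or timeout expires."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

For the addresses in the logs above, usage would be along the lines of `wait_for_sa(lambda: sa_visible("192.1.2.23", "192.1.2.45"))`, run after `racoonctl establish-sa` without `-w`.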

Copyright © 1994-2025 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.