NetBSD Problem Report #59347
From www@netbsd.org Wed Apr 23 15:54:12 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
client-signature RSA-PSS (2048 bits) client-digest SHA256)
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 58FD11A9239
for <gnats-bugs@gnats.NetBSD.org>; Wed, 23 Apr 2025 15:54:12 +0000 (UTC)
Message-Id: <20250423155410.688251A923E@mollari.NetBSD.org>
Date: Wed, 23 Apr 2025 15:54:10 +0000 (UTC)
From: andrew.cagney@gmail.com
Reply-To: andrew.cagney@gmail.com
To: gnats-bugs@NetBSD.org
Subject: the `-w` in `racoonctl establish-sa -w esp inet ${leftsubnet}/255 ${rightsubnet}/255 any` is racy
X-Send-Pr-Version: www-1.0
>Number: 59347
>Category: bin
>Synopsis: the `-w` in `racoonctl establish-sa -w esp inet ${leftsubnet}/255 ${rightsubnet}/255 any` is racy
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Apr 23 15:55:00 +0000 2025
>Originator: cagney
>Release: 10.1
>Organization:
>Environment:
NetBSD west 10.1 NetBSD 10.1 (GENERIC) #0: Mon Dec 16 13:08:11 UTC 2024 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
The racoonctl(8) manual page says:
    Specifying -w will make racoonctl wait until the SA is actually
    established or an error occurs.
However, it doesn't always work. My hunch is that racoonctl:
- triggers an acquire, then
- tries to attach to racoon's admin socket,
but sometimes the second step fails (times out).
First, here is a working `-w`. As part of establishing the IPsec SA, racoon sends a GETSPI request to the kernel and then, while waiting for the response, polls for and accepts the admin connection:
2025-04-23 15:41:03: DEBUG: pfkey GETSPI sent: ESP/Tunnel 192.1.2.23[500]->192.1.2.45[500]
2025-04-23 15:41:03: DEBUG: pfkey getspi sent.
2025-04-23 15:41:03: DEBUG: [28] admin connection is polling events
2025-04-23 15:41:03: DEBUG: [28] admin connection established
However, in this failing run:
2025-04-23 05:09:25: DEBUG: call pfkey_send_getspi
2025-04-23 05:09:25: DEBUG: pfkey GETSPI sent: ESP/Tunnel 192.1.2.23[500]->192.1.2.45[500]
2025-04-23 05:09:25: DEBUG: pfkey getspi sent.
2025-04-23 05:09:25: DEBUG: pk_recv: retry[0] recv()
2025-04-23 05:09:25: DEBUG: got pfkey GETSPI message
the admin connection is never polled, so the `-w` wait is missed.
(As an aside, the working VM's host is significantly faster than the failing VM's host.)
>How-To-Repeat:
>Fix:
The workaround is to ignore `-w` and instead either monitor racoon's logs or repeatedly probe racoon until the SA can be seen.
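A minimal sketch of the probing workaround in POSIX sh. `wait_for` is a hypothetical helper (it is not part of racoon or racoonctl) that re-runs a probe command once per second until the probe succeeds or a timeout, in seconds, expires:

```shell
#!/bin/sh
# wait_for TIMEOUT CMD [ARGS...]
# Hypothetical helper, not part of racoon/racoonctl: run CMD, and if it
# fails, sleep one second and retry, until CMD exits 0 or TIMEOUT seconds
# of sleeping have passed.  CMD is always tried at least once.
# Returns 0 as soon as CMD succeeds, 1 on timeout.
# POSIX sh has no "local", so prefixed names avoid clobbering the caller.
wait_for() {
    _wf_limit=$1
    shift
    _wf_elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$_wf_elapsed" -ge "$_wf_limit" ]; then
            return 1
        fi
        sleep 1
        _wf_elapsed=$((_wf_elapsed + 1))
    done
    return 0
}
```

For example, the following assumes that `racoonctl show-sa esp` lists the new SA and that its output includes the peer address (192.1.2.45 in the logs above); that output format is an assumption and should be checked before relying on it:

    racoonctl establish-sa esp inet ${leftsubnet}/255 ${rightsubnet}/255 any
    wait_for 30 sh -c 'racoonctl show-sa esp | grep -q 192.1.2.45'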
Copyright © 1994-2025
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.