NetBSD Problem Report #44941

From Wolfgang.Stukenbrock@nagler-company.com  Fri May  6 12:39:13 2011
Return-Path: <Wolfgang.Stukenbrock@nagler-company.com>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id CD1EC63BBEC
	for <gnats-bugs@gnats.NetBSD.org>; Fri,  6 May 2011 12:39:13 +0000 (UTC)
Message-Id: <20110506123903.EA0DF1E80D1@test-s0.nagler-company.com>
Date: Fri,  6 May 2011 14:39:03 +0200 (CEST)
From: Wolfgang.Stukenbrock@nagler-company.com
Reply-To: Wolfgang.Stukenbrock@nagler-company.com
To: gnats-bugs@gnats.NetBSD.org
Cc: wgstuken@test-s0.nagler-company.com
Subject: racoon drops pfkey messages -> timeout
X-Send-Pr-Version: 3.95

>Number:         44941
>Category:       bin
>Synopsis:       racoon drops pfkey messages -> timeout
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri May 06 12:40:00 +0000 2011
>Originator:     Dr. Wolfgang Stukenbrock
>Release:        NetBSD 5.1
>Organization:
Dr. Nagler & Company GmbH
>Environment:


System: NetBSD e010 5.1 NetBSD 5.1 (NSW-svc-ISDN) #2: Thu May  5 13:12:45 CEST 2011  wgstuken@s012:/export/NetBSD-5.1/N+C-build/.OBJDIR_i386/export/NetBSD-5.1/src/sys/arch/i386/compile/NSW-svc-ISDN i386
Architecture: x86_64
Machine: amd64
>Description:
	While trying to connect a Windows 7 client to my system I ran into problems.
	The communication starts normally, but during rekeying the connection suddenly
	hangs.
	I turned on debugging and added some additional output to racoon in order to
	find the reason for this instability.
	I found the following scenario:
	While adding a new SA to the kernel, racoon seems to get another message from the client.
	The message is a "delete" message - in this case a delete with 0 SPI entries.
	OK - this is not very smart of Windows 7, but it triggers a bug in racoon.
	When this message is processed, pfkey_sadb_dump() is called. This routine sends a
	dump request to the kernel and then retrieves messages until nothing is left.
	In my case it first drops two messages (type = 2 and type = 3).
	This leads to a problem when the still-running add-SA tries to get its answer messages from the kernel,
	because the dump has already dropped them ...
	You can see the following message in the output:
	"ERROR: 172.16.65.151 give up to get IPsec-SA due to time up to wait"
	After this the tunnel from the Windows 7 client is dead.
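
	The drop described above can be modelled with a minimal sketch. It is not racoon
	code: struct sim_msg, the SIM_* constants and sim_dump_drop() are hypothetical names
	used only for illustration. The numeric values follow RFC 2367, where type 2 is
	SADB_UPDATE, type 3 is SADB_ADD and type 10 is SADB_DUMP. The sketch assumes that the
	dropped replies carry racoon's own pid.

```c
#include <assert.h>
#include <stddef.h>

/* Message type values as defined in RFC 2367 (names are hypothetical). */
enum { SIM_SADB_UPDATE = 2, SIM_SADB_ADD = 3, SIM_SADB_DUMP = 10 };

/* Reduced stand-in for struct sadb_msg: only the two fields that matter here. */
struct sim_msg {
	int type;
	int pid;
};

/*
 * Model of the dump loop: walk all pending messages and keep only dump
 * replies addressed to us. Everything else is discarded.
 * Returns how many discarded messages carried our own pid, i.e. replies
 * that another code path in the same process may still be waiting for.
 */
static size_t
sim_dump_drop(const struct sim_msg *pending, size_t n, int pid)
{
	size_t lost = 0;

	assert(pending != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (pending[i].type == SIM_SADB_DUMP && pending[i].pid == pid)
			continue;	/* wanted dump reply */
		if (pending[i].pid == pid)
			lost++;		/* our own reply, silently discarded */
	}
	return lost;
}
```

	With an UPDATE and an ADD reply queued ahead of the dump replies, the function
	reports two lost messages, which matches the two drops seen in the debug output.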
>How-To-Repeat:
	Not easy, because it is very sensitive to timing ....
>Fix:
	The problem is located in /usr/src/crypto/dist/ipsec-tools/src/racoon/pfkey.c.
	In pfkey_sadb_dump(), messages are dropped if they are not of type SADB_DUMP.
	Instead of being dropped, they should be forwarded as is normally done
	in pfkey_handler().
	There are several ways of doing this ...
	I assume that there is no way to figure out when the last message that belongs to the dump
	request has been received. In this case pfkey_sadb_dump() must keep polling, as
	currently implemented.
	So I recommend splitting the pfkey_handler() routine into two parts and reusing the lower part
	in pfkey_sadb_dump() to deliver the currently dropped messages.

	I'm not 100% sure whether the PID should be checked in pfkey_handler() too - as is done for the dump
	part. I'm also not sure whether delivering the messages immediately is OK, or whether this may create
	other synchronization problems in the implementation, so that the delivery should be delayed.
	In that case a larger change is required: a queue to hold these messages, which must be
	checked first at every place where we wait for additional kernel messages from the pfkey socket.
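
	A minimal sketch of such a queue, under stated assumptions: struct deferred_msg,
	struct deferred_queue, deferred_put() and deferred_get() are hypothetical names that do
	not exist in racoon, the fixed capacity is arbitrary, and reading from the pfkey socket
	itself is not modelled. It only illustrates the idea of parking messages during the dump
	and handing them out, oldest first, before the socket is read again.

```c
#include <assert.h>
#include <stddef.h>

#define DEFERRED_MAX 16	/* arbitrary capacity chosen for this sketch */

/* Reduced stand-in for a received PF_KEY message. */
struct deferred_msg {
	int type;
	int pid;
};

/* Fixed-size FIFO implemented as a ring buffer. */
struct deferred_queue {
	struct deferred_msg slot[DEFERRED_MAX];
	size_t head;	/* index of the oldest entry */
	size_t count;	/* number of stored entries */
};

/*
 * Park a message that arrived while dump replies were being collected.
 * Returns 0 on success, -1 if the queue is full.
 */
static int
deferred_put(struct deferred_queue *q, struct deferred_msg m)
{
	assert(q != NULL);
	if (q->count == DEFERRED_MAX)
		return -1;
	q->slot[(q->head + q->count) % DEFERRED_MAX] = m;
	q->count++;
	return 0;
}

/*
 * Fetch the oldest parked message. Returns 1 and fills *out if one was
 * available, 0 if the queue is empty (the caller would then fall back to
 * reading the pfkey socket).
 */
static int
deferred_get(struct deferred_queue *q, struct deferred_msg *out)
{
	assert(q != NULL && out != NULL);
	if (q->count == 0)
		return 0;
	*out = q->slot[q->head];
	q->head = (q->head + 1) % DEFERRED_MAX;
	q->count--;
	return 1;
}
```

	Every place that waits for a kernel reply would call deferred_get() before touching
	the socket. This preserves the arrival order of the messages, at the cost of touching
	every such wait point in pfkey.c.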

	For now I've added direct delivery, and it seems to work.
	I got an "IPsec-SA established: ESP/Tunnel 62.153.101.194[500]->172.16.65.151[500] spi=1221443210(0x48cdbe8a)"
	message in the middle of the dump processing and the connection is still alive.

	The following patch is what I've added to my 5.1 racoon:

--- pfkey.c	2011/05/06 11:16:17	1.1
+++ pfkey.c	2011/05/06 11:24:36
@@ -190,12 +190,13 @@
  *	0: success
  *	-1: fail
  */
+static int pfkey_handler_x(struct sadb_msg *msg);
+
 int
 pfkey_handler()
 {
 	struct sadb_msg *msg;
 	int len;
-	caddr_t mhp[SADB_EXT_MAX + 1];
 	int error = -1;

 	/* receive pfkey message. */
@@ -235,19 +236,32 @@

 		goto end;
 	}
+	if (pfkey_handler_x(msg) != 0) goto end;
+// remark: we assume that pfkey_align() will assign msg to mhp[0] in the separated code ....
+
+	error = 0;
+end:
+	if (msg)
+		racoon_free(msg);
+	return(error);
+}
+
+static int pfkey_handler_x(struct sadb_msg *msg)
+{
+	caddr_t mhp[SADB_EXT_MAX + 1];

 	/* check pfkey message. */
 	if (pfkey_align(msg, mhp)) {
 		plog(LLV_ERROR, LOCATION, NULL,
 			"libipsec failed pfkey align (%s)\n",
 			ipsec_strerror());
-		goto end;
+		return -1;
 	}
 	if (pfkey_check(mhp)) {
 		plog(LLV_ERROR, LOCATION, NULL,
 			"libipsec failed pfkey check (%s)\n",
 			ipsec_strerror());
-		goto end;
+		return -1;
 	}
 	msg = (struct sadb_msg *)mhp[0];

@@ -256,24 +270,20 @@
 		plog(LLV_ERROR, LOCATION, NULL,
 			"unknown PF_KEY message type=%u\n",
 			msg->sadb_msg_type);
-		goto end;
+		return -1;
 	}

 	if (pkrecvf[msg->sadb_msg_type] == NULL) {
 		plog(LLV_INFO, LOCATION, NULL,
 			"unsupported PF_KEY message %s\n",
 			s_pfkey_type(msg->sadb_msg_type));
-		goto end;
+		return -1;
 	}

 	if ((pkrecvf[msg->sadb_msg_type])(mhp) < 0)
-		goto end;
+		return -1;

-	error = 0;
-end:
-	if (msg)
-		racoon_free(msg);
-	return(error);
+	return(0);
 }

 /*
@@ -317,10 +327,17 @@

 		if (msg->sadb_msg_type != SADB_DUMP || msg->sadb_msg_pid != pid)
 		{
-		    plog(LLV_DEBUG, LOCATION, NULL,
-			 "discarding non-sadb dump msg %p, our pid=%i\n", msg, pid);
-		    plog(LLV_DEBUG, LOCATION, NULL,
-			 "type %i, pid %i\n", msg->sadb_msg_type, msg->sadb_msg_pid);
+		    if (msg->sadb_msg_pid == pid && pfkey_handler_x(msg) == 0)
+		    {
+		      plog(LLV_DEBUG, LOCATION, NULL,
+			   "successfull processed msg of type %i while collecting dump messages\n",
+			   msg->sadb_msg_type);
+		    } else {
+			plog(LLV_DEBUG, LOCATION, NULL,
+			     "discarding non-sadb dump msg %p, our pid=%i\n", msg, pid);
+			plog(LLV_DEBUG, LOCATION, NULL,
+			     "type %i, pid %i\n", msg->sadb_msg_type, msg->sadb_msg_pid);
+		    }
 		    continue;
 		}


>Unformatted:
