NetBSD Problem Report #47013

From root@net01.malaiwah.local  Thu Sep 27 19:35:07 2012
Return-Path: <root@net01.malaiwah.local>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 1F51B63B907
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 27 Sep 2012 19:35:07 +0000 (UTC)
Message-Id: <20120927193433.31127801F3@net01.malaiwah.local>
Date: Thu, 27 Sep 2012 19:34:32 +0000 (UTC)
From: support@net01.malaiwah.local
Reply-To: michel.belleau@malaiwah.com
To: gnats-bugs@gnats.NetBSD.org
Subject: carp device misbehaving with ipv6 alias (ip6_output failed: 65 / HOSTUNREACH)
X-Send-Pr-Version: 3.95

>Number:         47013
>Category:       kern
>Synopsis:       ipv4+ipv6 carp0 interface is trying to get MASTER status although master is still advertising
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Sep 27 19:40:00 +0000 2012
>Last-Modified:  Tue Oct 18 14:35:00 +0000 2016
>Originator:     Michel Belleau
>Release:        NetBSD 6.0_RC2
>Organization:
>Environment:
System: NetBSD net01.malaiwah.local 6.0_RC2 NetBSD 6.0_RC2 (SHEEVAPLUG) #3: Mon Sep 17 00:57:10 UTC 2012 root@60448eae-930e-41f9-acf1-32776a7758b9.local:/zones/60448eae-930e-41f9-acf1-32776a7758b9/data/src/sys/arch/evbarm/compile/obj/SHEEVAPLUG evbarm
Architecture: arm
Machine: evbarm
>Description:
	I have two servers configured with a carp interface that has both IPv4 and IPv6 addresses.
	The second server (advskew 100) tries to take over from the first one every few seconds; this appears to have started when I added the IPv6 alias to the carp interface.
	The network traces show that the first server really is advertising every second, yet the second server still tries to take over the interface IPs.

	sysctl values:
	# sysctl -a | grep net.inet.carp
	net.inet.carp.preempt = 0
	net.inet.carp.arpbalance = 0
	net.inet.carp.allow = 1
	net.inet.carp.log = 1

	tcpdump trace:
    14:28:01.821864 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:02.845864 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:03.869987 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:04.893894 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:05.918201 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:06.941902 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:07.966179 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:08.990252 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:10.013908 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:11.037912 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:12.061905 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:13.085945 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:14.109952 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:15.133944 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:15.953660 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:16.977089 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:18.001099 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:19.025135 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:20.049238 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:21.073148 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:22.097185 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:23.121147 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:24.145164 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:25.169164 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:26.193162 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:27.217174 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:28.241185 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:29.265196 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:30.289178 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:31.108400 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:32.132388 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:33.156677 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:34.180702 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:35.204388 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:36.228681 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:37.252399 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:38.276716 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:39.300696 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:40.324420 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:41.348657 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:42.372402 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:43.396424 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
    14:28:44.420417 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
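
As a rough cross-check of the trace above, the gaps between successive advertisements can be computed from the tcpdump timestamps. The following is a minimal sketch in Python and is not part of the original report. The function name `advert_gaps` is hypothetical, and the sketch assumes that every non-blank line starts with an `HH:MM:SS.ffffff` timestamp and that the capture does not cross midnight.

```python
from datetime import datetime

def advert_gaps(lines):
    """Return the gaps, in seconds, between consecutive tcpdump lines."""
    stamps = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # The first field is the tcpdump timestamp, e.g. 14:28:01.821864
        stamps.append(datetime.strptime(fields[0], "%H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

# Three consecutive lines taken from the trace above.
sample = [
    "14:28:14.109952 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36",
    "14:28:15.133944 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36",
    "14:28:15.953660 IP 192.168.15.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36",
]
for gap in advert_gaps(sample):
    print(f"{gap:.3f}")  # prints 1.024, then 0.820
```

Applied to the full trace, nearly all gaps come out at about 1.02 s; the two exceptions are shorter gaps of about 0.82 s, at 14:28:15 and 14:28:31.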

>How-To-Repeat:
	On a NetBSD 6.0 RC2 machine with the carp device enabled (I tested on ARM / SheevaPlug, but other ports may be affected as well), run the following, substituting your own carpdev and IP addresses:

	# ifconfig carp0 create
	# ifconfig carp0 vhid 3 pass mekmitasdigoat carpdev mvgbe0 192.168.15.3 netmask 255.255.255.128
	# ifconfig carp0 inet6 2607:fa48:6e63:eb05::2 alias

	Watch dmesg (or /var/log/messages) report this at regular intervals:
	carp0: ip6_output failed: 65
	carp0: ip6_output failed: 65
	carp0: ip6_output failed: 65
	carp0: ip6_output failed: 65
	carp0: ip6_output failed: 65
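
The number in these messages is a kernel errno value (65 is EHOSTUNREACH, as the subject line of this report says). A small sketch for decoding such lines on another machine follows; it is not from the original report. The table is hand-copied from the BSD errno numbering and should be checked against `<sys/errno.h>` on the target system, because errno numbers differ between operating systems and Python's own `errno` module reflects the machine the script runs on. The names `BSD_ERRNO` and `decode_carp_errors` are hypothetical.

```python
import re

# Subset of BSD errno numbers (assumption: verify against <sys/errno.h>
# on the target system before relying on it).
BSD_ERRNO = {
    50: "ENETDOWN",
    51: "ENETUNREACH",
    55: "ENOBUFS",
    64: "EHOSTDOWN",
    65: "EHOSTUNREACH",
}

# Matches the kernel message format shown above.
PATTERN = re.compile(r"(carp\d+): ip6_output failed: (\d+)")

def decode_carp_errors(lines):
    """Yield (interface, errno number, symbolic name) for each failure line."""
    for line in lines:
        m = PATTERN.search(line)
        if m:
            code = int(m.group(2))
            yield m.group(1), code, BSD_ERRNO.get(code, "unknown")

for iface, code, name in decode_carp_errors(["carp0: ip6_output failed: 65"]):
    print(iface, code, name)  # prints: carp0 65 EHOSTUNREACH
```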

	Furthermore, if you set up a second machine (I used a second SheevaPlug) the same way but with "advskew 100", you will find the same error messages in its dmesg, AND it will try to take over the carp IP address once in a while (BACKUP status becomes MASTER for a brief moment if you check "ifconfig carp0" continually).

	I enabled the sysctl carp log to find this out (from the second machine):

Sep 27 18:52:58 net02 /netbsd: carp0: INIT -> MASTER (preempting)
Sep 27 18:52:58 net02 /netbsd: carp0: state transition from: BACKUP -> to: MASTER
Sep 27 18:52:58 net02 /netbsd: carp0: ip6_output failed: 65
Sep 27 18:52:59 net02 /netbsd: carp0: MASTER -> BACKUP (more frequent advertisement received)
Sep 27 18:52:59 net02 /netbsd: carp0: state transition from: MASTER -> to: BACKUP
Sep 27 18:53:02 net02 /netbsd: carp0: INIT -> MASTER (preempting)
Sep 27 18:53:03 net02 /netbsd: carp0: state transition from: BACKUP -> to: MASTER
Sep 27 18:53:03 net02 /netbsd: carp0: ip6_output failed: 65
Sep 27 18:53:03 net02 /netbsd: carp0: MASTER -> BACKUP (more frequent advertisement received)
Sep 27 18:53:03 net02 /netbsd: carp0: state transition from: MASTER -> to: BACKUP
Sep 27 18:53:06 net02 /netbsd: carp0: INIT -> MASTER (preempting)
Sep 27 18:53:06 net02 /netbsd: carp0: state transition from: BACKUP -> to: MASTER
Sep 27 18:53:06 net02 /netbsd: carp0: ip6_output failed: 65
Sep 27 18:53:07 net02 /netbsd: carp0: MASTER -> BACKUP (more frequent advertisement received)
Sep 27 18:53:07 net02 /netbsd: carp0: state transition from: MASTER -> to: BACKUP
Sep 27 18:53:10 net02 /netbsd: carp0: INIT -> MASTER (preempting)
Sep 27 18:53:10 net02 /netbsd: carp0: state transition from: BACKUP -> to: MASTER
Sep 27 18:53:10 net02 /netbsd: carp0: ip6_output failed: 65
Sep 27 18:53:11 net02 /netbsd: carp0: MASTER -> BACKUP (more frequent advertisement received)
Sep 27 18:53:11 net02 /netbsd: carp0: state transition from: MASTER -> to: BACKUP
Sep 27 18:53:14 net02 /netbsd: carp0: INIT -> MASTER (preempting)
Sep 27 18:53:14 net02 /netbsd: carp0: state transition from: BACKUP -> to: MASTER
Sep 27 18:53:14 net02 /netbsd: carp0: ip6_output failed: 65
Sep 27 18:53:15 net02 /netbsd: carp0: MASTER -> BACKUP (more frequent advertisement received)
Sep 27 18:53:15 net02 /netbsd: carp0: state transition from: MASTER -> to: BACKUP
Sep 27 18:53:18 net02 /netbsd: carp0: INIT -> MASTER (preempting)
Sep 27 18:53:18 net02 /netbsd: carp0: state transition from: BACKUP -> to: MASTER
Sep 27 18:53:18 net02 /netbsd: carp0: ip6_output failed: 65
Sep 27 18:53:19 net02 /netbsd: carp0: MASTER -> BACKUP (more frequent advertisement received)
Sep 27 18:53:19 net02 /netbsd: carp0: state transition from: MASTER -> to: BACKUP

	It looks like every time the "ip6_output" error message appears, the interface tries to preempt and become the master; shortly thereafter it returns to the BACKUP state (in which it should have stayed from the start).
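
To put numbers on this pattern, the state-transition lines can be tallied from the syslog output. The sketch below is an illustration and is not part of the original report. `takeover_stats` is a hypothetical helper; it assumes the unwrapped `state transition from: X -> to: Y` line format shown above, and because syslog timestamps carry no year, it takes the year as a parameter (2012 is assumed here from the report date).

```python
import re
from datetime import datetime

TRANSITION = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ /netbsd: (carp\d+): "
    r"state transition from: (\w+) -> to: (\w+)"
)

def takeover_stats(lines, year=2012):
    """Return (takeover count, seconds spent as MASTER for each takeover)."""
    takeovers = 0
    durations = []
    became_master = None
    for line in lines:
        m = TRANSITION.match(line)
        if not m:
            continue  # not a state-transition line
        when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        old, new = m.group(3), m.group(4)
        if (old, new) == ("BACKUP", "MASTER"):
            takeovers += 1
            became_master = when
        elif (old, new) == ("MASTER", "BACKUP") and became_master is not None:
            durations.append((when - became_master).total_seconds())
            became_master = None
    return takeovers, durations

# Usage (path as used elsewhere in this report):
# with open("/var/log/messages") as f:
#     print(takeover_stats(f))
```

For the log excerpt above this counts six takeovers between 18:52:58 and 18:53:19, each reverted within about one second at the log's one-second resolution.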

>Fix:
	No known fix yet.

>Audit-Trail:
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/47013: carp device misbehaving with ipv6 alias (ip6_output
 failed: 65 / HOSTUNREACH)
Date: Sun, 30 Sep 2012 19:21:59 +0200

 --7JfCtLOvnd9MIVvH
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 On Thu, Sep 27, 2012 at 07:40:00PM +0000, support@net01.malaiwah.local wrote:
 > 	I have two servers configured with a carp interface that has both IPv4 and IPv6 addresses.
 > 	The second server (advskew 100) tries to take over from the first one every few seconds; this appears to have started when I added the IPv6 alias to the carp interface.
 > 	The network traces show that the first server really is advertising every second, yet the second server still tries to take over the interface IPs.

 I'm running with the attached hack on an IPv4+IPv6 setup. It seems that,
 for some reason, IPv6 advertisements are ignored. This won't work in an
 IPv6-only setup.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 years of experience will always make the difference
 --

 --7JfCtLOvnd9MIVvH
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="carp.diff"

 Index: ip_carp.c
 ===================================================================
 RCS file: /cvsroot/src/sys/netinet/ip_carp.c,v
 retrieving revision 1.26.10.2
 diff -u -r1.26.10.2 ip_carp.c
 --- ip_carp.c	9 Jun 2009 17:31:46 -0000	1.26.10.2
 +++ ip_carp.c	30 Sep 2012 17:18:07 -0000
 @@ -1041,7 +1041,7 @@
  		}
  	}
  #endif /* INET */
 -#ifdef INET6
 +#ifdef INET6_notyet
  	if (sc->sc_naddrs6) {
  		struct ip6_hdr *ip6;

 @@ -1445,7 +1445,7 @@
  			callout_schedule(&sc->sc_md_tmo, tvtohz(&tv));
  			break;
  #endif /* INET */
 -#ifdef INET6
 +#ifdef INET6_notyet
  		case AF_INET6:
  			callout_schedule(&sc->sc_md6_tmo, tvtohz(&tv));
  			break;
 @@ -1453,8 +1453,10 @@
  		default:
  			if (sc->sc_naddrs)
  				callout_schedule(&sc->sc_md_tmo, tvtohz(&tv));
 +#ifdef notyet
  			if (sc->sc_naddrs6)
  				callout_schedule(&sc->sc_md6_tmo, tvtohz(&tv));
 +#endif
  			break;
  		}
  		break;

 --7JfCtLOvnd9MIVvH--
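
For context on the two callouts touched by this diff (sc_md_tmo for IPv4 and sc_md6_tmo for IPv6): in the usual BSD CARP implementation, a BACKUP arms its master-down timer for three advertisement intervals plus a skew-dependent fraction, that is, 3 * advbase seconds plus advskew/256 seconds. That formula is an assumption drawn from general knowledge of the CARP code, not something stated in this report. The sketch below only does the arithmetic; the function name is hypothetical, and advbase = 1 is assumed because the report does not set it.

```python
from fractions import Fraction

def master_down_timeout(advbase=1, advskew=0):
    """Seconds a BACKUP waits without advertisements before promoting itself.

    Assumed formula: 3 * advbase + advskew / 256.
    """
    return 3 * advbase + Fraction(advskew, 256)

# The second machine in this report uses advskew 100.
print(float(master_down_timeout(advbase=1, advskew=100)))  # prints 3.390625
```

With advskew 100 this gives roughly 3.4 s, which is in line with the roughly 3 seconds between each fall back to BACKUP and the next takeover attempt in the submitter's log. That would fit an IPv6 master-down timer that never sees a usable IPv6 advertisement, but this is an interpretation, not a confirmed diagnosis.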

From: "Ignatios Souvatzis" <is@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/47013 CVS commit: src/sys/netinet
Date: Sat, 23 Jul 2016 12:19:08 +0000

 Module Name:	src
 Committed By:	is
 Date:		Sat Jul 23 12:19:08 UTC 2016

 Modified Files:
 	src/sys/netinet: ip_carp.c

 Log Message:
 Workaround for PR 47013 by bouyer@. Only works for mixed IPv4/IPv6
 environments, not for pure-IPv6 yet. A real fix is still needed.


 To generate a diff of this commit:
 cvs rdiff -u -r1.74 -r1.75 src/sys/netinet/ip_carp.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/47013 CVS commit: [netbsd-7] src/sys/netinet
Date: Sat, 27 Aug 2016 04:25:50 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Sat Aug 27 04:25:50 UTC 2016

 Modified Files:
 	src/sys/netinet [netbsd-7]: ip_carp.c

 Log Message:
 Pull up following revision(s) (requested by is in ticket #1208):
 	sys/netinet/ip_carp.c: revision 1.75
 Workaround for PR 47013 by bouyer@. Only works for mixed IPv4/IPv6
 environments, not for pure-IPv6 yet. A real fix is still needed.


 To generate a diff of this commit:
 cvs rdiff -u -r1.59.2.2 -r1.59.2.3 src/sys/netinet/ip_carp.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Manuel Bouyer" <bouyer@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/47013 CVS commit: [netbsd-6] src/sys/netinet
Date: Sun, 28 Aug 2016 10:49:45 +0000

 Module Name:	src
 Committed By:	bouyer
 Date:		Sun Aug 28 10:49:45 UTC 2016

 Modified Files:
 	src/sys/netinet [netbsd-6]: ip_carp.c

 Log Message:
 Pull up following revision(s) (requested by is in ticket #1393):
 	sys/netinet/ip_carp.c: revision 1.75
 Workaround for PR 47013 by bouyer@. Only works for mixed IPv4/IPv6
 environments, not for pure-IPv6 yet. A real fix is still needed.


 To generate a diff of this commit:
 cvs rdiff -u -r1.47.4.4 -r1.47.4.5 src/sys/netinet/ip_carp.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: Hauke Fath <hf@spg.tu-darmstadt.de>
Subject: Re: kern/47013: carp device misbehaving with ipv6 alias (ip6_output
 failed: 65 / HOSTUNREACH)
Date: Tue, 18 Oct 2016 16:33:42 +0200

 I find that, even with this PR's "workaround" patch in place, the 
 secondary host's attempts to take over the one interface that has an 
 IPv6 address continue:

 % fgrep carp0 /var/log/messages | wc -l
       299
 % fgrep carp0 /var/log/messages | tail -20
 Oct 18 15:47:19 secondary-router /netbsd: carp0: MASTER -> BACKUP (more frequent advertisement received)
 Oct 18 15:47:19 secondary-router /netbsd: carp0: state transition from: MASTER -> to: BACKUP
 Oct 18 15:47:22 secondary-router /netbsd: carp0: INIT -> MASTER (preempting)
 Oct 18 15:47:22 secondary-router /netbsd: carp0: state transition from: BACKUP -> to: MASTER
 Oct 18 15:47:26 secondary-router /netbsd: carp0: MASTER -> BACKUP (more frequent advertisement received)
 Oct 18 15:47:26 secondary-router /netbsd: carp0: state transition from: MASTER -> to: BACKUP
 Oct 18 15:47:30 secondary-router /netbsd: carp0: INIT -> MASTER (preempting)
 Oct 18 15:47:30 secondary-router /netbsd: carp0: state transition from: BACKUP -> to: MASTER
 Oct 18 16:02:12 secondary-router /netbsd: carp0: MASTER -> BACKUP (more frequent advertisement received)
 Oct 18 16:02:12 secondary-router /netbsd: carp0: state transition from: MASTER -> to: BACKUP
 Oct 18 16:02:15 secondary-router /netbsd: carp0: INIT -> MASTER (preempting)
 Oct 18 16:02:15 secondary-router /netbsd: carp0: state transition from: BACKUP -> to: MASTER
 Oct 18 16:20:28 secondary-router /netbsd: carp0: MASTER -> BACKUP (more frequent advertisement received)
 Oct 18 16:20:28 secondary-router /netbsd: carp0: state transition from: MASTER -> to: BACKUP
 Oct 18 16:20:32 secondary-router /netbsd: carp0: INIT -> MASTER (preempting)
 Oct 18 16:20:32 secondary-router /netbsd: carp0: state transition from: BACKUP -> to: MASTER
 Oct 18 16:24:30 secondary-router /netbsd: carp0: MASTER -> BACKUP (more frequent advertisement received)
 Oct 18 16:24:30 secondary-router /netbsd: carp0: state transition from: MASTER -> to: BACKUP
 Oct 18 16:24:34 secondary-router /netbsd: carp0: INIT -> MASTER (preempting)
 Oct 18 16:24:34 secondary-router /netbsd: carp0: state transition from: BACKUP -> to: MASTER
 %

 ... telling me the underlying problem is not fixed. Every now and then, 
 ssh sessions are dying on me.

 hauke
