NetBSD Problem Report #48472
From khym@azeotrope.org Mon Dec 23 04:40:00 2013
Return-Path: <khym@azeotrope.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "Postmaster NetBSD.org" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id E09F2A61B7
for <gnats-bugs@gnats.NetBSD.org>; Mon, 23 Dec 2013 04:40:00 +0000 (UTC)
Message-Id: <20131223043903.B47AC1C7A23@yerfable.azeotrope.org>
Date: Sun, 22 Dec 2013 22:39:03 -0600 (CST)
From: khym@azeotrope.org
Reply-To: khym@azeotrope.org
To: gnats-bugs@NetBSD.org
Subject: NetBSD may send ICMP_UNREACH_NEEDFRAG with "next MTU" larger than packet that caused it to be sent
X-Send-Pr-Version: 3.95
>Number: 48472
>Category: kern
>Synopsis: NetBSD may send ICMP_UNREACH_NEEDFRAG with "next MTU" larger than packet that caused it to be sent
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Dec 23 04:45:00 +0000 2013
>Originator: Dave Huang
>Release: NetBSD 6.99.17
>Organization:
Name: Dave Huang | Mammal, mammal / their names are called /
INet: khym@azeotrope.org | they raise a paw / the bat, the cat /
FurryMUCK: Dahan | dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 38 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++
>Environment:
System: NetBSD foxy.azeotrope.org 6.99.28 NetBSD 6.99.28 (FOXY) #20: Fri Dec 20 00:04:49 CST 2013 khym@vmbsd.azeotrope.org:/usr/obj.i386/sys/arch/i386/compile/FOXY i386
Architecture: i386
Machine: i386
>Description:
If you add a route to a destination and override its MTU with a value
smaller than the interface MTU, NetBSD uses the route MTU when
fragmenting packets that it forwards to that destination. If such a
packet has the Don't Fragment bit set, NetBSD drops it and sends an
ICMP destination unreachable message (fragmentation needed and DF
set). However, the next-hop MTU reported in that ICMP message is the
interface MTU rather than the smaller route MTU, which breaks path MTU
discovery.
This is discussed in the tech-net thread starting at
<http://mail-index.netbsd.org/tech-net/2013/12/19/msg004418.html>.
There's some disagreement about whether route MTUs should even be used
when forwarding packets; my understanding is that the concern stems
from PMTU discovery possibly using the routing table as a PMTU cache.
I agree that the PMTU cache should not affect packets being forwarded
through the router from another host, but I do think that if the
system admin manually adds a route with a smaller MTU, that MTU should
be honored when routing packets. However, there is agreement that it's
wrong for NetBSD to drop a packet for exceeding the MTU and then
report an MTU larger than that packet in the ICMP fragmentation needed
message.
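The mismatch described above can be illustrated with a minimal sketch.
The struct and function names below are hypothetical stand-ins, not
the kernel's real struct rtentry or struct ifnet, and the logic is a
simplification of the forwarding path rather than the actual code. It
only shows that the fit check and the reported value come from two
different sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for route/interface state (not kernel types). */
struct sketch_route {
	uint32_t route_mtu;	/* 0 means "no per-route override" */
	uint32_t if_mtu;	/* MTU of the outgoing interface */
};

/* MTU used to decide whether a forwarded packet fits. */
uint32_t
sketch_forward_mtu(const struct sketch_route *rt)
{
	assert(rt != NULL);
	return rt->route_mtu != 0 ? rt->route_mtu : rt->if_mtu;
}

/* True when the packet is dropped and answered with NEEDFRAG. */
bool
sketch_needs_frag_error(const struct sketch_route *rt, uint32_t pkt_len,
    bool df_set)
{
	return df_set && pkt_len > sketch_forward_mtu(rt);
}

/* Next-hop MTU as reported today, per this report: the interface MTU. */
uint32_t
sketch_reported_mtu_current(const struct sketch_route *rt)
{
	assert(rt != NULL);
	return rt->if_mtu;
}
```

With route_mtu = 1200 and if_mtu = 1500, a 1328-byte packet with DF set
is rejected against 1200, yet the value reported back is 1500.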
>How-To-Repeat:
On a NetBSD machine acting as a router:
# route add www.netbsd.org $my_gateway_ip -mtu 1200
(replacing $my_gateway_ip with the correct next hop gateway)
Then on another machine that routes through the above router,
$ ping -Ds 1300 www.netbsd.org
PING www.netbsd.org (149.20.53.86): 1300 data bytes
36 bytes from foxy.azeotrope.org (10.1.1.67): frag needed and DF set. Next MTU=1500 for icmp_seq=0
Note that Next MTU=1500, even though the packet sent is smaller than
1500. Next MTU should be 1200.
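To make the sizes concrete: the ping above sends 1300 data bytes plus
an 8-byte ICMP header and a 20-byte IPv4 header (assuming no IP
options), for 1328 bytes on the wire. The sketch below works through
that arithmetic and models a sender that only ever lowers its path MTU
estimate. The function names and the simplified update rule are
assumptions for illustration, not code from any real stack.

```c
#include <stdint.h>

enum {
	SKETCH_IPV4_HDR_LEN = 20,	/* IPv4 header without options */
	SKETCH_ICMP_HDR_LEN = 8		/* ICMP echo header */
};

/* Total IPv4 packet length for an ICMP echo with the given payload. */
uint32_t
sketch_echo_packet_len(uint32_t payload_len)
{
	return SKETCH_IPV4_HDR_LEN + SKETCH_ICMP_HDR_LEN + payload_len;
}

/*
 * Simplified sender-side update: the path MTU estimate only shrinks,
 * so a reported value that is not smaller leaves it unchanged.
 */
uint32_t
sketch_update_pmtu(uint32_t current_pmtu, uint32_t reported_mtu)
{
	return reported_mtu < current_pmtu ? reported_mtu : current_pmtu;
}
```

Under these assumptions, a sender whose estimate starts at 1500 stays
at 1500 after receiving Next MTU=1500 and keeps sending 1328-byte
packets that are dropped. Receiving Next MTU=1200 would let it drop to
1200.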
Linux does use the route MTU when routing and returns the route MTU in
the ICMP fragmentation needed packet. Tested on Debian Linux, kernel
2.6.32-5-686, by running the following on the router, then doing the
above ping test from a machine that routes through it:
ip route add 149.20.53.86 dev eth0 mtu 1200
It appears from a comment in FreeBSD's ip_input.c:ip_forward() that it
sends the smaller of the interface MTU and the route MTU in its ICMP
fragmentation needed packet, but I haven't confirmed that.
>Fix:
I think this patch will at least make NetBSD consistent. It's already
using the route MTU to determine whether a packet needs to be
fragmented or not; this will make it return the actual MTU used in
that determination.
Index: netinet/ip_input.c
===================================================================
RCS file: /cvsroot/src/sys/netinet/ip_input.c,v
retrieving revision 1.308
diff -u -r1.308 ip_input.c
--- netinet/ip_input.c 29 Jun 2013 21:06:58 -0000 1.308
+++ netinet/ip_input.c 20 Dec 2013 06:04:33 -0000
@@ -1335,7 +1335,8 @@
 		code = ICMP_UNREACH_NEEDFRAG;
 		if ((rt = rtcache_validate(&ipforward_rt)) != NULL)
-			destmtu = rt->rt_ifp->if_mtu;
+			destmtu = rt->rt_rmx.rmx_mtu ?
+			    rt->rt_rmx.rmx_mtu : rt->rt_ifp->if_mtu;
 #ifdef IPSEC
 		(void)ipsec4_forward(mcopy, &destmtu);
 #endif
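For reference, the selection made by the patch above can be written as
a standalone function. The sketch uses plain integers in place of
rt->rt_rmx.rmx_mtu and rt->rt_ifp->if_mtu, and the function names are
hypothetical. The second function is an illustrative alternative that
takes the smaller of the two values, along the lines of what the
FreeBSD comment mentioned earlier seems to describe. It is not taken
from FreeBSD's source.

```c
#include <stdint.h>

/* Same rule as the patched line: use the route MTU when it is set. */
uint32_t
sketch_destmtu_patch(uint32_t rmx_mtu, uint32_t if_mtu)
{
	return rmx_mtu ? rmx_mtu : if_mtu;
}

/*
 * Hypothetical alternative: report the smaller of the two MTUs,
 * treating a route MTU of 0 as "not set".
 */
uint32_t
sketch_destmtu_min(uint32_t rmx_mtu, uint32_t if_mtu)
{
	if (rmx_mtu == 0)
		return if_mtu;
	return rmx_mtu < if_mtu ? rmx_mtu : if_mtu;
}
```

The two rules agree whenever the route MTU is unset or no larger than
the interface MTU, which covers the case in this report (1200 versus
1500). They differ only if a route MTU larger than the interface MTU
is configured.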
>Unformatted: