NetBSD Problem Report #53664

From htodd@emily.i8u.org  Tue Oct  9 03:17:00 2018
Return-Path: <htodd@emily.i8u.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id A3C197A26E
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  9 Oct 2018 03:17:00 +0000 (UTC)
Message-Id: <20181009031656.4817B333BA5@emily.i8u.org>
Date: Mon,  8 Oct 2018 20:16:56 -0700 (PDT)
From: htodd@i8u.org
Reply-To: htodd@i8u.org
To: gnats-bugs@NetBSD.org
Subject: nfs client broken on netbsd-8
X-Send-Pr-Version: 3.95

>Number:         53664
>Category:       kern
>Synopsis:       nfs client broken on netbsd-8
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    maxv
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Oct 09 03:20:00 +0000 2018
>Closed-Date:    Thu Oct 25 16:00:28 +0000 2018
>Last-Modified:  Thu Oct 25 16:00:28 +0000 2018
>Originator:     htodd@i8u.org
>Release:        NetBSD 8.0_STABLE
>Organization:

>Environment:


System: NetBSD emily.i8u.org 8.0_STABLE NetBSD 8.0_STABLE (EMILY) #139: Mon Oct 8 20:00:22 PDT 2018 htodd@chris.i8u.org:/usr/obj/i386/sys/arch/i386/compile/EMILY i386
Architecture: i386
Machine: i386
>Description:
This also happens on netbsd-8-amd64.

My mount script is:
#!/bin/sh

if test "$(id -u)" -ne "0"; then
   echo "Run this script as root" 1>&2
   exit 1
fi

/sbin/mount_nfs -o ro mara:/mnt2/home/netbsd-cvs/netbsd-8/src /usr/src
/sbin/mount_nfs -o ro mara:/mnt2/home/netbsd-cvs/netbsd-8/xsrc /usr/xsrc

# /sbin/mount_nfs -o ro mara:/home/netbsd/pkgsrc-2014Q4/pkgsrc /usr/pkgsrc
# /sbin/mount_nfs mara:/home/netbsd/pkgsrc/pkgsrc-2012Q4/packages/All/i386-fw /usr/pkgsrc/packages/All
/sbin/mount_nfs -o ro mara:/mnt2/home/netbsd-cvs/pkgsrc-2018Q3/pkgsrc /usr/pkgsrc

The mount of pkgsrc hangs. rpcmount and showmount looked normal, so I decided to bisect the kernel using git.

The result is:
24fcafa1a5051620452bb42e7b433d462764340f is the first bad commit
commit 24fcafa1a5051620452bb42e7b433d462764340f
Author: martin <martin@netbsd.org>
Date:   Wed Oct 3 17:53:56 2018 +0000

    Pull up following revision(s) (requested by maxv in ticket #1045):

            sys/netinet/ip_reass.c: revision 1.19

    Hold ip_off and ip_len in the fragment entry, instead of always reading
    the associated mbuf (and converting to host order). This reduces the
    cache/TLB misses when processing long lists.

>How-To-Repeat:
build a kernel with the changes to ip_reass.c using:
/usr/src/build.sh -O /usr/obj/i386 -T /usr/obj/tools.i386 -U -m i386 -j$JOBS kernel=EMILY

>Fix:
Phooey, I forgot to revert just that commit. I'll add that to the pr later.

>Release-Note:

>Audit-Trail:
From: Hisashi T Fujinaka <htodd@twofifty.com>
To: gnats-bugs@NetBSD.org
Cc: maxv@netbsd.org
Subject: Re: kern/53664: nfs client broken on netbsd-8
Date: Mon, 8 Oct 2018 21:14:56 -0700 (PDT)

 I just confirmed that reverting that one commit fixes nfs.

 -- 
 Hisashi T Fujinaka - htodd@twofifty.com

Responsible-Changed-From-To: kern-bug-people->maxkv
Responsible-Changed-By: maya@NetBSD.org
Responsible-Changed-When: Tue, 09 Oct 2018 04:48:23 +0000
Responsible-Changed-Why:
Ping. any idea why this is failing?


Responsible-Changed-From-To: maxkv->maxv
Responsible-Changed-By: maya@NetBSD.org
Responsible-Changed-When: Tue, 09 Oct 2018 04:48:42 +0000
Responsible-Changed-Why:
Ping. any idea why this is failing?


From: Maxime Villard <max@m00nbsd.net>
To: gnats-bugs@NetBSD.org, maxv@NetBSD.org, netbsd-bugs@netbsd.org,
 gnats-admin@netbsd.org, maya@NetBSD.org, htodd@i8u.org,
 Martin Husemann <martin@duskware.de>
Cc: 
Subject: Re: kern/53664 (nfs client broken on netbsd-8)
Date: Tue, 9 Oct 2018 07:08:51 +0200

 Le 09/10/2018 à 06:48, maya@NetBSD.org a écrit :
 > Synopsis: nfs client broken on netbsd-8
 > 
 > Responsible-Changed-From-To: maxkv->maxv
 > Responsible-Changed-By: maya@NetBSD.org
 > Responsible-Changed-When: Tue, 09 Oct 2018 04:48:42 +0000
 > Responsible-Changed-Why:
 > Ping. any idea why this is failing?

 Martin can you revert my pull up until I investigate? Sorry.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53664 CVS commit: [netbsd-8] src/sys/netinet
Date: Tue, 9 Oct 2018 09:44:31 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Tue Oct  9 09:44:31 UTC 2018

 Modified Files:
 	src/sys/netinet [netbsd-8]: ip_reass.c

 Log Message:
 Back out the following from ticket #1045 by maxv:

 	sys/netinet/ip_reass.c		1.19

 Faster IPv4 packet reassembly - causes fallout, needs further investigation
 (see PR kern/53664)


 To generate a diff of this commit:
 cvs rdiff -u -r1.11.8.5 -r1.11.8.6 src/sys/netinet/ip_reass.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Martin Husemann <martin@duskware.de>
To: Maxime Villard <max@m00nbsd.net>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53664 (nfs client broken on netbsd-8)
Date: Sat, 13 Oct 2018 08:30:11 +0200

 On Sat, Oct 13, 2018 at 08:21:43AM +0200, Maxime Villard wrote:
 > htodd, can you test with the latest current before I re-send the pullup?
 > Just boot with a current kernel and see if nfs works.

 Just to make sure: default nfs mounts changed from UDP to TCP and I have
 not seen any NFS fallout with any -current version in the meantime,
 even with r1.20. Is the fallout specific to UDP maybe, or specific
 network devices or server settings?

 Martin

From: Maxime Villard <max@m00nbsd.net>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
 htodd@i8u.org
Cc: 
Subject: Re: kern/53664 (nfs client broken on netbsd-8)
Date: Sat, 13 Oct 2018 08:21:43 +0200

 Le 09/10/2018 à 06:48, maya@NetBSD.org a écrit :
 > Synopsis: nfs client broken on netbsd-8
 > 
 > Responsible-Changed-From-To: maxkv->maxv
 > Responsible-Changed-By: maya@NetBSD.org
 > Responsible-Changed-When: Tue, 09 Oct 2018 04:48:42 +0000
 > Responsible-Changed-Why:
 > Ping. any idea why this is failing?

 htodd, can you test with the latest current before I re-send the pullup?
 Just boot with a current kernel and see if nfs works.

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53664 (nfs client broken on netbsd-8)
Date: Sat, 13 Oct 2018 11:14:28 -0000 (UTC)

 martin@duskware.de (Martin Husemann) writes:

 > Just to make sure: default nfs mounts changed from UDP to TCP and I have
 > not seen any NFS fallout with any -current version in the meantime,
 > even with r1.20. Is the fallout specific to UDP maybe, or specific
 > network devices or server settings?

 It's specific to IP reassembly, and that's very rarely used with TCP
 but for NFS almost always with UDP.

 -- 
 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: Maxime Villard <max@m00nbsd.net>
To: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
 htodd@i8u.org
Cc: 
Subject: Re: kern/53664 (nfs client broken on netbsd-8)
Date: Sat, 13 Oct 2018 13:59:53 +0200

 Le 13/10/2018 à 13:15, Michael van Elst a écrit :
 > The following reply was made to PR kern/53664; it has been noted by GNATS.
 >
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc:
 > Subject: Re: kern/53664 (nfs client broken on netbsd-8)
 > Date: Sat, 13 Oct 2018 11:14:28 -0000 (UTC)
 >
 >   martin@duskware.de (Martin Husemann) writes:
 >
 >   > Just to make sure: default nfs mounts changed from UDP to TCP and I have
 >   > not seen any NFS fallout with any -current version in the meantime,
 >   > even with r1.20. Is the fallout specific to UDP maybe, or specific
 >   > network devices or server settings?
 >
 >   It's specific to IP reassembly, and that's very rarely used with TCP
 >   but for NFS almost always with UDP.

 No, the fallout is when someone reads ip_off on a reassembled packet, which
 very few places do in practice. Fragmented UDP worked correctly, that's how
 I did basic testing. So did TCP, etc.

 Probably htodd's configuration reads ip_off somewhere in a way that matters,
 and in all cases, it was a mistake (from the code before me, and also from me)
 not to explicitly reset ip_off to zero.

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: maxv@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
	htodd@i8u.org
Subject: Re: kern/53664 (nfs client broken on netbsd-8)
Date: Sat, 13 Oct 2018 14:25:15 +0200

 On Sat, Oct 13, 2018 at 12:05:01PM +0000, Maxime Villard wrote:
 >  Probably htodd's configuration reads ip_off somewhere in a way that matters,
 >  and in all cases, it was a mistake (from the code before me, and also from me)
 >  not to explicitly reset ip_off to zero.

 Yes, I was just trying to point out that either an explicit UDP mount
 with a current kernel in his configuration may be a better test, or
 testing a backport to -8.

 Martin

From: Hisashi T Fujinaka <htodd@twofifty.com>
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@NetBSD.org, maxv@NetBSD.org, gnats-admin@netbsd.org, 
    netbsd-bugs@netbsd.org
Subject: Re: kern/53664 (nfs client broken on netbsd-8)
Date: Sat, 13 Oct 2018 07:56:24 -0700 (PDT)

 On Sat, 13 Oct 2018, Martin Husemann wrote:

 > Yes, I was just trying to point out that either an explicit UDP mount
 > with a current kernel in his configuration may be a better test, or
 > testing a backport to -8.

 My use case was -current exports and -8 clients. Mounting on the client
 would fail on some mounts but not others. I only tried reverting the -8
 client patch.

 For that reason, I would think that Martin's idea, a proposed patch for
 -8, would be a better test.

 -- 
 Hisashi T Fujinaka - htodd@twofifty.com
 BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee

From: Martin Husemann <martin@duskware.de>
To: Hisashi T Fujinaka <htodd@twofifty.com>
Cc: maxv@NetBSD.org, gnats-bugs@netbsd.org, htodd@i8u.org
Subject: Re: kern/53664 (nfs client broken on netbsd-8)
Date: Sat, 13 Oct 2018 18:56:14 +0200

 --sdtB3X0nJg68CQEu
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 On Sat, Oct 13, 2018 at 03:00:01PM +0000, Hisashi T Fujinaka wrote:
 > The following reply was made to PR kern/53664; it has been noted by GNATS.
 >  For that reason, I would think that Martin's idea, a proposed patch for
 >  -8, would be a better test.

 Here is one that catches up to -current for that file

 Martin

 --sdtB3X0nJg68CQEu
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename=patch

 Index: ip_reass.c
 ===================================================================
 RCS file: /cvsroot/src/sys/netinet/ip_reass.c,v
 retrieving revision 1.11.8.6
 diff -c -u -p -r1.11.8.6 ip_reass.c
 --- ip_reass.c	9 Oct 2018 09:44:31 -0000	1.11.8.6
 +++ ip_reass.c	13 Oct 2018 16:54:01 -0000
 @@ -80,6 +80,8 @@ typedef struct ipfr_qent {
  	struct ip *		ipqe_ip;
  	struct mbuf *		ipqe_m;
  	bool			ipqe_mff;
 +	uint16_t		ipqe_off;
 +	uint16_t		ipqe_len;
  } ipfr_qent_t;

  TAILQ_HEAD(ipfr_qent_head, ipfr_qent);
 @@ -215,7 +217,7 @@ ip_nmbclusters_changed(void)
  struct mbuf *
  ip_reass(ipfr_qent_t *ipqe, ipfr_queue_t *fp, const u_int hash)
  {
 -	struct ip *ip = ipqe->ipqe_ip, *qip;
 +	struct ip *ip = ipqe->ipqe_ip;
  	const int hlen = ip->ip_hl << 2;
  	struct mbuf *m = ipqe->ipqe_m, *t;
  	int ipsecflags = m->m_flags & (M_DECRYPTED|M_AUTHIPHDR);
 @@ -230,16 +232,6 @@ ip_reass(ipfr_qent_t *ipqe, ipfr_queue_t
  	m->m_data += hlen;
  	m->m_len -= hlen;

 -#ifdef	notyet
 -	/* Make sure fragment limit is up-to-date. */
 -	CHECK_NMBCLUSTER_PARAMS();
 -
 -	/* If we have too many fragments, drop the older half. */
 -	if (ip_nfrags >= ip_maxfrags) {
 -		ip_reass_drophalf(void);
 -	}
 -#endif
 -
  	/*
  	 * We are about to add a fragment; increment frag count.
  	 */
 @@ -255,9 +247,9 @@ ip_reass(ipfr_qent_t *ipqe, ipfr_queue_t
  		 * never accept fragments  b) if maxfrag is -1, accept
  		 * all fragments without limitation.
  		 */
 -		if (ip_maxfragpackets < 0)
 -			;
 -		else if (ip_nfragpackets >= ip_maxfragpackets) {
 +		if (ip_maxfragpackets < 0) {
 +			/* no limit */
 +		} else if (ip_nfragpackets >= ip_maxfragpackets) {
  			goto dropfrag;
  		}
  		fp = malloc(sizeof(ipfr_queue_t), M_FTABLE, M_NOWAIT);
 @@ -285,7 +277,7 @@ ip_reass(ipfr_qent_t *ipqe, ipfr_queue_t
  	 * Find a segment which begins after this one does.
  	 */
  	TAILQ_FOREACH(q, &fp->ipq_fragq, ipqe_q) {
 -		if (ntohs(q->ipqe_ip->ip_off) > ntohs(ip->ip_off))
 +		if (q->ipqe_off > ipqe->ipqe_off)
  			break;
  	}
  	if (q != NULL) {
 @@ -295,39 +287,45 @@ ip_reass(ipfr_qent_t *ipqe, ipfr_queue_t
  	}

  	/*
 -	 * If there is a preceding segment, it may provide some of our
 -	 * data already.  If so, drop the data from the incoming segment.
 -	 * If it provides all of our data, drop us.
 +	 * Look at the preceding segment.
 +	 *
 +	 * If it provides some of our data already, in part or entirely, trim
 +	 * us or drop us.
 +	 *
 +	 * If a preceding segment exists, and was marked as the last segment,
 +	 * drop us.
  	 */
  	if (p != NULL) {
 -		i = ntohs(p->ipqe_ip->ip_off) + ntohs(p->ipqe_ip->ip_len) -
 -		    ntohs(ip->ip_off);
 +		i = p->ipqe_off + p->ipqe_len - ipqe->ipqe_off;
  		if (i > 0) {
 -			if (i >= ntohs(ip->ip_len)) {
 +			if (i >= ipqe->ipqe_len) {
  				goto dropfrag;
  			}
  			m_adj(ipqe->ipqe_m, i);
 -			ip->ip_off = htons(ntohs(ip->ip_off) + i);
 -			ip->ip_len = htons(ntohs(ip->ip_len) - i);
 +			ipqe->ipqe_off = ipqe->ipqe_off + i;
 +			ipqe->ipqe_len = ipqe->ipqe_len - i;
  		}
  	}
 +	if (p != NULL && !p->ipqe_mff) {
 +		goto dropfrag;
 +	}

  	/*
 -	 * While we overlap succeeding segments trim them or, if they are
 -	 * completely covered, dequeue them.
 +	 * Look at the segments that follow.
 +	 *
 +	 * If we cover them, in part or entirely, trim them or dequeue them.
 +	 *
 +	 * If a following segment exists, and we are marked as the last
 +	 * segment, drop us.
  	 */
  	while (q != NULL) {
 -		size_t end;
 -
 -		qip = q->ipqe_ip;
 -		end = ntohs(ip->ip_off) + ntohs(ip->ip_len);
 -		if (end <= ntohs(qip->ip_off)) {
 +		i = ipqe->ipqe_off + ipqe->ipqe_len - q->ipqe_off;
 +		if (i <= 0) {
  			break;
  		}
 -		i = end - ntohs(qip->ip_off);
 -		if (i < ntohs(qip->ip_len)) {
 -			qip->ip_len = htons(ntohs(qip->ip_len) - i);
 -			qip->ip_off = htons(ntohs(qip->ip_off) + i);
 +		if (i < q->ipqe_len) {
 +			q->ipqe_off = q->ipqe_off + i;
 +			q->ipqe_len = q->ipqe_len - i;
  			m_adj(q->ipqe_m, i);
  			break;
  		}
 @@ -339,6 +337,9 @@ ip_reass(ipfr_qent_t *ipqe, ipfr_queue_t
  		ip_nfrags--;
  		q = nq;
  	}
 +	if (q != NULL && !ipqe->ipqe_mff) {
 +		goto dropfrag;
 +	}

  insert:
  	/*
 @@ -351,12 +352,11 @@ insert:
  	}
  	next = 0;
  	TAILQ_FOREACH(q, &fp->ipq_fragq, ipqe_q) {
 -		qip = q->ipqe_ip;
 -		if (ntohs(qip->ip_off) != next) {
 +		if (q->ipqe_off != next) {
  			mutex_exit(&ipfr_lock);
  			return NULL;
  		}
 -		next += ntohs(qip->ip_len);
 +		next += q->ipqe_len;
  	}
  	p = TAILQ_LAST(&fp->ipq_fragq, ipfr_qent_head);
  	if (p->ipqe_mff) {
 @@ -402,6 +402,7 @@ insert:
  	 * header visible.
  	 */
  	ip->ip_len = htons((ip->ip_hl << 2) + next);
 +	ip->ip_off = htons(0);
  	ip->ip_src = fp->ipq_src;
  	ip->ip_dst = fp->ipq_dst;
  	free(fp, M_FTABLE);
 @@ -651,13 +652,6 @@ ip_reass_packet(struct mbuf **m0, struct
  		return EINVAL;
  	}

 -	/*
 -	 * Adjust total IP length to not reflect header and convert
 -	 * offset of this to bytes.  XXX: clobbers struct ip.
 -	 */
 -	ip->ip_len = htons(flen);
 -	ip->ip_off = htons(off);
 -
  	/* Look for queue of fragments of this datagram. */
  	mutex_enter(&ipfr_lock);
  	hash = IPREASS_HASH(ip->ip_src.s_addr, ip->ip_id);
 @@ -702,6 +696,8 @@ ip_reass_packet(struct mbuf **m0, struct
  	ipqe->ipqe_mff = mff;
  	ipqe->ipqe_m = m;
  	ipqe->ipqe_ip = ip;
 +	ipqe->ipqe_off = off;
 +	ipqe->ipqe_len = flen;

  	*m0 = ip_reass(ipqe, fp, hash);
  	if (*m0) {

 --sdtB3X0nJg68CQEu--

From: Hisashi T Fujinaka <htodd@twofifty.com>
To: gnats-bugs@NetBSD.org
Cc: maxv@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53664 (nfs client broken on netbsd-8)
Date: Sat, 13 Oct 2018 22:42:40 -0700 (PDT)

 The patch works for me on both netbsd8-i386 and netbsd8-amd64.

 -- 
 Hisashi T Fujinaka - htodd@twofifty.com
 BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee

From: Martin Husemann <martin@duskware.de>
To: Maxime Villard <max@m00nbsd.net>
Cc: Hisashi T Fujinaka <htodd@twofifty.com>, gnats-bugs@netbsd.org,
	htodd@i8u.org
Subject: Re: kern/53664 (nfs client broken on netbsd-8)
Date: Wed, 17 Oct 2018 07:14:25 +0200

 On Wed, Oct 17, 2018 at 07:13:46AM +0200, Maxime Villard wrote:
 > Given that the fix works can you pull it up? Thanks

 Sure.

From: Maxime Villard <max@m00nbsd.net>
To: Martin Husemann <martin@duskware.de>,
 Hisashi T Fujinaka <htodd@twofifty.com>
Cc: gnats-bugs@netbsd.org, htodd@i8u.org
Subject: Re: kern/53664 (nfs client broken on netbsd-8)
Date: Wed, 17 Oct 2018 07:13:46 +0200

 Le 13/10/2018 à 18:56, Martin Husemann a écrit :
 > On Sat, Oct 13, 2018 at 03:00:01PM +0000, Hisashi T Fujinaka wrote:
 >> The following reply was made to PR kern/53664; it has been noted by GNATS.
 >>   For that reason, I would think that Martin's idea, a proposed patch for
 >>   -8, would be a better test.
 > 
 > Here is one that catches up to -current for that file
 > 
 > Martin

 Given that the fix works can you pull it up? Thanks

State-Changed-From-To: open->closed
State-Changed-By: maxv@NetBSD.org
State-Changed-When: Thu, 25 Oct 2018 16:00:28 +0000
State-Changed-Why:
fixed and pulled up


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.