NetBSD Problem Report #57691

From schmonz@schmonz.com  Fri Nov 10 19:06:16 2023
Return-Path: <schmonz@schmonz.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 506FE1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 10 Nov 2023 19:06:16 +0000 (UTC)
Message-Id: <20231110190612.13198.qmail@miracle-dirt.schmonz.com>
Date: 10 Nov 2023 19:06:12 -0000
From: schmonz@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: NFS client regression with macOS 14 server
X-Send-Pr-Version: 3.95

>Number:         57691
>Category:       kern
>Synopsis:       NFS client regression with macOS 14 server
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    schmonz
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Nov 10 19:10:00 +0000 2023
>Closed-Date:    Sun Dec 10 18:26:12 +0000 2023
>Last-Modified:  Tue Dec 12 16:50:01 +0000 2023
>Originator:     Amitai Schleier
>Release:        NetBSD 9.3 and 10.0_BETA
>Organization:
Latent Agility
>Environment:
NetBSD netbsd9-amd64.pet-power-plant.local 9.3 NetBSD 9.3 (GENERIC) #0: Thu Aug  4 15:30:37 UTC 2022  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64

NetBSD netbsd10-arm64.magnetic-babysitter.local 10.0_BETA NetBSD 10.0_BETA (GENERIC64) #0: Wed Feb  1 19:00:10 UTC 2023  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/GENERIC64 evbarm
>Description:
After upgrading my NFS server from macOS 13 to 14, I started
encountering this error regularly on my netbsd-9 and -10 clients,
in ordinary pkgsrc usage:

    bmake: Cannot open `.' (Invalid argument)

Once this has happened in a subtree of the NFS mount, file operations
in that subtree all fail until the NetBSD client (a VM on the same
physical machine) has been rebooted.

I didn't have this interoperability problem with macOS 13's NFS
server, and I haven't been able to reproduce this problem with any
of the other OSes (FreeBSD, OpenBSD, Tribblix, many Linuxes) also
running in VMs on the same machine.

NFS in this state has become unusable for me.
>How-To-Repeat:
On either netbsd-9 or netbsd-10, with an NFS mount in $HOME/trees
from a macOS 14 host:

    cd $HOME/trees/pkgsrc-cvs/shells/oksh
    bmake

ktrace of that:
https://netbsd.schmonz.com/tmp/nfs/make-in-pkgsrc-shells-oksh-over-nfs-kdump.txt

tcpdump of "ls" after the problem has manifested:
https://netbsd.schmonz.com/tmp/nfs/ls-after-the-problem-tcpdump.txt

A smaller reproducer:

    ls trees/foo; touch trees/foo; ls trees/foo; rm trees/foo; ls trees
    ls: trees/foo: No such file or directory
    trees/foo
    ls: trees: Invalid argument
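For scripting the smaller reproducer without a shell, here is a minimal C sketch of the same sequence: stat, create, stat, unlink, then list the parent directory. It is an illustration, not part of the original report. The function name reproduce() and the fixed file name "foo" are assumptions. The report shows `ls trees` failing with "Invalid argument" (EINVAL) at the last step. So on an unfixed client against a macOS 14 server, this sketch would presumably return EINVAL there. On a local file system it should return 0.

```c
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Sketch of the shell reproducer from this PR:
 *   ls trees/foo; touch trees/foo; ls trees/foo; rm trees/foo; ls trees
 * "dir" plays the role of "trees". Returns 0 if every step after the
 * first succeeds, otherwise the errno of the first failing step.
 */
static int
reproduce(const char *dir)
{
	char path[1024];
	struct stat sb;
	DIR *dp;
	int fd, err;

	snprintf(path, sizeof(path), "%s/foo", dir);

	/* "ls trees/foo": expected to fail with ENOENT; result ignored. */
	(void)stat(path, &sb);

	/* "touch trees/foo" */
	fd = open(path, O_WRONLY | O_CREAT, 0644);
	if (fd == -1)
		return errno;
	(void)close(fd);

	/* "ls trees/foo" */
	if (stat(path, &sb) == -1)
		return errno;

	/* "rm trees/foo" */
	if (unlink(path) == -1)
		return errno;

	/* "ls trees": open and read the whole parent directory. */
	dp = opendir(dir);
	if (dp == NULL)
		return errno;
	for (;;) {
		/* readdir() returns NULL both at end of stream and on error,
		   so clear errno before each call to tell them apart. */
		errno = 0;
		if (readdir(dp) == NULL)
			break;
	}
	err = errno;
	(void)closedir(dp);
	return err;
}
```

Point it at a directory on the NFS mount, e.g. reproduce("/home/user/trees").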
>Fix:
none known

>Release-Note:

>Audit-Trail:

From: Jason Thorpe <thorpej@me.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 NetBSD Kernel Technical Discussion List <tech-kern@NetBSD.org>
Subject: Re: kern/57691: NFS client regression with macOS 14 server
Date: Fri, 8 Dec 2023 10:55:54 -0800

 > On Nov 10, 2023, at 11:10 AM, schmonz@netbsd.org <schmonz@NetBSD.org> wrote:
 >
 > tcpdump of "ls" after the problem has manifested:
 > https://netbsd.schmonz.com/tmp/nfs/ls-after-the-problem-tcpdump.txt

 The tcpdump shows:

 16:16:35.683996 IP 10.0.2.2.shilp > 10.0.2.15.exp2: Flags [P.], seq 1573:1693, ack 1508, win 65535, length 120: NFS reply xid 834502658 reply ok 116 readdir ERROR: READDIR/READDIRPLUS cookie is stale

 Looking at the code that issues READDIRPLUS in the kernel in
 nfs_readdirplusrpc():

                 if (nmp->nm_iflag & NFSMNT_SWAPCOOKIE) {
                         txdr_swapcookie3(uiop->uio_offset, tl);
                 } else {
                         txdr_cookie3(uiop->uio_offset, tl);
                 }
                 tl += 2;
                 *tl++ = dnp->n_cookieverf.nfsuquad[0];
                 *tl++ = dnp->n_cookieverf.nfsuquad[1];
                 *tl++ = txdr_unsigned(nmp->nm_readdirsize);

 I think the cookie verifier is wrong.  See:
 https://www.rfc-editor.org/rfc/rfc1813#page-81

       cookieverf
 This should be set to 0 on the first request to read a
 directory. On subsequent requests, it should be a
 cookieverf as returned by the server. The cookieverf
 must match that returned by the READDIRPLUS call in
 which the cookie was acquired.
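To make the rule concrete, here is a hypothetical model of a server that enforces it strictly. This is NOT macOS code; this report does not show how macOS 14 actually validates verifiers.

The model works as follows:

- An all-zero verifier is accepted only together with cookie 0.
- A non-zero verifier is accepted only if it matches the directory's current verifier, even when the cookie is 0.

Suppose such a server issues a new verifier whenever the directory changes (for example after a create or remove). Then a client that resends its stale cached verifier with cookie 0 is rejected with NFS3ERR_BAD_COOKIE (10003 in RFC 1813). That is the error tcpdump reports as "READDIR/READDIRPLUS cookie is stale". The function name check_cookieverf() is an assumption.

```c
#include <stdint.h>
#include <string.h>

#define NFS3_OK            0u
#define NFS3ERR_BAD_COOKIE 10003u	/* RFC 1813 */

/*
 * Hypothetical strict server-side validation of a READDIR/READDIRPLUS
 * request; NOT taken from macOS. current_verf is the verifier the
 * server would hand out for the directory as it is right now.
 */
static uint32_t
check_cookieverf(uint64_t cookie, const uint8_t verf[8],
    const uint8_t current_verf[8])
{
	static const uint8_t zero[8];	/* all-zero verifier */

	if (memcmp(verf, zero, sizeof(zero)) == 0) {
		/* Zero verifier: only valid when reading from the start. */
		return cookie == 0 ? NFS3_OK : NFS3ERR_BAD_COOKIE;
	}
	/* Non-zero verifier: must match the current one, even at cookie 0. */
	return memcmp(verf, current_verf, 8) == 0 ? NFS3_OK
	    : NFS3ERR_BAD_COOKIE;
}
```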

 Probably what’s going on is that the server is verifying the
 directory cookie more strictly than before.  Those two lines that pack
 the cookieverf should be inserting 0s if uio_offset is 0.
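Here is a minimal user-space sketch of what the suggested change amounts to for READDIRPLUS. It encodes, in order:

- the cookie;
- either zeros or the cached verifier;
- dircount and maxcount (nm_readdirsize and nm_rsize in the kernel code quoted above).

The struct dir_state and encode_readdirplus_tail() names are hypothetical stand-ins for the kernel's nfsnode and mbuf handling. The NFSMNT_SWAPCOOKIE path is omitted.

```c
#include <arpa/inet.h>	/* htonl() */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached state in the kernel's nfsnode. */
struct dir_state {
	uint32_t cookieverf[2];	/* last verifier from the server, XDR order */
};

/*
 * Encode the READDIRPLUS3args fields that follow the directory handle
 * (RFC 1813): cookie, cookieverf, dircount, maxcount, as 32-bit XDR
 * words. A zero verifier is sent when starting at offset 0.
 * Returns the number of words written (always 6).
 */
static size_t
encode_readdirplus_tail(uint32_t *tl, uint64_t offset,
    const struct dir_state *dnp, uint32_t dircount, uint32_t maxcount)
{
	uint32_t *start = tl;

	/* XDR unsigned hyper: most significant 32 bits first. */
	*tl++ = htonl((uint32_t)(offset >> 32));
	*tl++ = htonl((uint32_t)offset);

	if (offset == 0) {
		/* First request for this directory: verifier must be 0. */
		*tl++ = 0;
		*tl++ = 0;
	} else {
		/* Continuation: echo the verifier from the earlier reply. */
		*tl++ = dnp->cookieverf[0];
		*tl++ = dnp->cookieverf[1];
	}
	*tl++ = htonl(dircount);
	*tl++ = htonl(maxcount);
	return (size_t)(tl - start);
}
```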

 -- thorpej

From: Jason Thorpe <thorpej@me.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 NetBSD Kernel Technical Discussion List <tech-kern@NetBSD.org>
Subject: Re: kern/57691: NFS client regression with macOS 14 server
Date: Fri, 8 Dec 2023 11:17:49 -0800

 > On Dec 8, 2023, at 10:55 AM, Jason Thorpe <thorpej@me.com> wrote:
 >
 > Probably what’s going on is that the server is verifying the
 > directory cookie more strictly than before.  Those two lines that
 > pack the cookieverf should be inserting 0s if uio_offset is 0.

 Just confirmed by code inspection that FreeBSD always sends a 0 cookie
 verifier for uio_offset 0.

 <quote>
 		if (cookie.qval == 0) {
 			*tl++ = 0;
 			*tl++ = 0;
 		} else {
 </quote>

 (From their nfsrpc_readdirplus().)

 -- thorpej

From: "Amitai Schleier" <schmonz@schmonz.com>
To: "Jason Thorpe" <thorpej@me.com>
Cc: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
  netbsd-bugs@netbsd.org,
  "NetBSD Kernel Technical Discussion List" <tech-kern@NetBSD.org>
Subject: Re: kern/57691: NFS client regression with macOS 14 server
Date: 8 Dec 2023 22:25:32 -0500

 On 8 Dec 2023, at 14:17, Jason Thorpe wrote:

 >> On Dec 8, 2023, at 10:55 AM, Jason Thorpe <thorpej@me.com> wrote:
 >>
 >> Probably what’s going on is that the server is verifying the
 >> directory cookie more strictly than before.  Those two lines that
 >> pack the cookieverf should be inserting 0s if uio_offset is 0.
 >
 > Just confirmed by code inspection that FreeBSD always sends a 0 cookie
 > verifier for uio_offset 0.
 >
 > <quote>
 > 		if (cookie.qval == 0) {
 > 			*tl++ = 0;
 > 			*tl++ = 0;
 > 		} else {
 > </quote>

 Thank you for looking into this and giving me something to try. Neither 
 of the following diffs seems to make a difference, though (tested on my 
 10.0_RC1/aarch64 NFS client). Have I misunderstood something?


 Index: nfs/nfs_vnops.c
 ===================================================================
 RCS file: /cvsroot/src/sys/nfs/nfs_vnops.c,v
 retrieving revision 1.324
 diff -u -p -r1.324 nfs_vnops.c
 --- nfs/nfs_vnops.c	24 May 2022 06:28:02 -0000	1.324
 +++ nfs/nfs_vnops.c	9 Dec 2023 03:23:02 -0000
 @@ -2632,8 +2632,13 @@ nfs_readdirplusrpc(struct vnode *vp, str
   			txdr_cookie3(uiop->uio_offset, tl);
   		}
   		tl += 2;
 -		*tl++ = dnp->n_cookieverf.nfsuquad[0];
 -		*tl++ = dnp->n_cookieverf.nfsuquad[1];
 +		if (uiop->uio_offset == 0) {
 +			*tl++ = 0;
 +			*tl++ = 0;
 +		} else {
 +			*tl++ = dnp->n_cookieverf.nfsuquad[0];
 +			*tl++ = dnp->n_cookieverf.nfsuquad[1];
 +		}
   		*tl++ = txdr_unsigned(nmp->nm_readdirsize);
   		*tl = txdr_unsigned(nmp->nm_rsize);
   		nfsm_request(dnp, NFSPROC_READDIRPLUS, curlwp, cred);



 Index: nfs/nfs_vnops.c
 ===================================================================
 RCS file: /cvsroot/src/sys/nfs/nfs_vnops.c,v
 retrieving revision 1.324
 diff -u -p -r1.324 nfs_vnops.c
 --- nfs/nfs_vnops.c	24 May 2022 06:28:02 -0000	1.324
 +++ nfs/nfs_vnops.c	9 Dec 2023 03:12:56 -0000
 @@ -2632,6 +2632,10 @@ nfs_readdirplusrpc(struct vnode *vp, str
   			txdr_cookie3(uiop->uio_offset, tl);
   		}
   		tl += 2;
 +		if (uiop->uio_offset == 0) {
 +			dnp->n_cookieverf.nfsuquad[0] = 0;
 +			dnp->n_cookieverf.nfsuquad[1] = 0;
 +		}
   		*tl++ = dnp->n_cookieverf.nfsuquad[0];
   		*tl++ = dnp->n_cookieverf.nfsuquad[1];
   		*tl++ = txdr_unsigned(nmp->nm_readdirsize);

From: Jason Thorpe <thorpej@me.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 "schmonz@netbsd.org" <schmonz@NetBSD.org>
Subject: Re: kern/57691: NFS client regression with macOS 14 server
Date: Sat, 9 Dec 2023 08:11:38 -0800

 > On Dec 8, 2023, at 7:30 PM, Amitai Schleier <schmonz@schmonz.com> wrote:
 >
 > Thank you for looking into this and giving me something to try. Neither
 > of the following diffs seems to make a difference, though (tested on my
 > 10.0_RC1/aarch64 NFS client). Have I misunderstood something?

 No, it just may not have been sufficient.  But I just tried to reproduce
 this with a sparc client (in qemu, which happened to be the most
 convenient test client) against my MacBook Air:

 dhcp-194:thorpej$ sw_vers
 ProductName: macOS
 ProductVersion: 14.1.2
 BuildVersion: 23B92
 dhcp-194:thorpej$

 …and I can’t reproduce the problem.

 sparc-vm# df
 Filesystem                           512-blocks         Used        Avail %Cap Mounted on
 /dev/sd0a                              16257372      1359056     14085448   8% /
 kernfs                                        2            2            0 100% /kern
 ptyfs                                         2            2            0 100% /dev/pts
 procfs                                        8            8            0 100% /proc
 tmpfs                                     57280            8        57272   0% /var/shm
 192.168.1.194:/System/Volumes/Data   1942700360    692414544   1250285816  35% /mnt
 sparc-vm# ls -l /mnt
 ls: .DocumentRevisions-V100: Permission denied
 ls: .Spotlight-V100: Permission denied
 ls: .TemporaryItems: Permission denied
 total 0
 drwxr-xr-x      5 root  wheel      160 Dec  9 07:43 .PreviousSystemInformation
 drwxr-xr-x      7 root  wheel      224 Dec  9 07:35 .com.apple.templatemigration.boot-install
 drwx------  49341 root  wheel  1578912 Dec  9 08:02 .fseventsd
 drwxrwxr-x     29 root  80         928 Dec  9 07:43 Applications
 drwxr-xr-x     69 root  wheel     2208 Dec  9 07:45 Library
 drwxr-xr-x      3 root  wheel       96 Nov 16 18:27 MobileSoftwareUpdate
 drwxr-xr-x      3 root  wheel       96 Nov 18 10:13 System
 drwxr-xr-x      5 root  80         160 Dec  9 07:43 Users
 drwxr-xr-x      3 root  wheel       96 Dec  9 07:44 Volumes
 drwxr-xr-x      2 root  wheel       64 Jul 14  2022 cores
 drwxr-xr-x      2 root  wheel       64 Jul 14  2022 home
 drwxr-xr-x      2 root  wheel       64 Jul 14  2022 mnt
 drwxr-xr-x      7 root  wheel      224 May 20  2023 opt
 drwxr-xr-x      6 root  wheel      192 Dec  9 07:44 private
 drwxr-xr-x      2 root  wheel       64 Jul 14  2022 sw
 drwxr-xr-x      5 root  wheel      160 Nov 18 10:13 usr
 sparc-vm# uname -a
 NetBSD sparc-vm 10.99.10 NetBSD 10.99.10 (GENERIC) #11: Sun Dec  3 06:29:38 PST 2023  thorpej@the-ripe-vessel:/space/src/sys/arch/sparc/compile/GENERIC sparc
 sparc-vm#

 -- thorpej

From: Jason Thorpe <thorpej@me.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 "schmonz@netbsd.org" <schmonz@NetBSD.org>
Subject: Re: kern/57691: NFS client regression with macOS 14 server
Date: Sat, 9 Dec 2023 08:30:44 -0800

 > On Dec 9, 2023, at 8:11 AM, Jason Thorpe <thorpej@me.com> wrote:
 >
 > …and I can’t reproduce the problem.

 Actually, never mind, I can… I didn’t notice your “minimal reproducer”
 before … and that triggers the problem for me.

 -- thorpej

From: Jason Thorpe <thorpej@me.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 "schmonz@netbsd.org" <schmonz@NetBSD.org>
Subject: Re: kern/57691: NFS client regression with macOS 14 server
Date: Sat, 9 Dec 2023 08:37:37 -0800

 > On Dec 8, 2023, at 7:30 PM, Amitai Schleier <schmonz@schmonz.com> wrote:
 >
 > Thank you for looking into this and giving me something to try. Neither
 > of the following diffs seems to make a difference, though (tested on my
 > 10.0_RC1/aarch64 NFS client). Have I misunderstood something?
 >
 >
 > Index: nfs/nfs_vnops.c
 > ===================================================================
 > RCS file: /cvsroot/src/sys/nfs/nfs_vnops.c,v
 > retrieving revision 1.324
 > diff -u -p -r1.324 nfs_vnops.c
 > --- nfs/nfs_vnops.c	24 May 2022 06:28:02 -0000	1.324
 > +++ nfs/nfs_vnops.c	9 Dec 2023 03:23:02 -0000
 > @@ -2632,8 +2632,13 @@ nfs_readdirplusrpc(struct vnode *vp, str
 >  			txdr_cookie3(uiop->uio_offset, tl);
 >  		}
 >  		tl += 2;
 > -		*tl++ = dnp->n_cookieverf.nfsuquad[0];
 > -		*tl++ = dnp->n_cookieverf.nfsuquad[1];
 > +		if (uiop->uio_offset == 0) {
 > +			*tl++ = 0;
 > +			*tl++ = 0;
 > +		} else {
 > +			*tl++ = dnp->n_cookieverf.nfsuquad[0];
 > +			*tl++ = dnp->n_cookieverf.nfsuquad[1];
 > +		}
 >  		*tl++ = txdr_unsigned(nmp->nm_readdirsize);
 >  		*tl = txdr_unsigned(nmp->nm_rsize);
 >  		nfsm_request(dnp, NFSPROC_READDIRPLUS, curlwp, cred);

 Ok, this is the correct variation of the change, but you missed a
 spot… there’s also a similar bit of code for the READDIR RPC.  Add the
 same snippet there and try again (just look for “cookieverf”).
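Both RPCs carry the same (cookie, cookieverf) pair. One way to avoid missing a spot is to put the rule in a single helper that both encoders call. The sketch below does that for the READDIR3args fields that follow the directory handle (cookie, cookieverf, count, per RFC 1813). The names put_cookie_verf() and encode_readdir_tail() are hypothetical. This is not necessarily how revision 1.325 is structured.

```c
#include <arpa/inet.h>	/* htonl() */
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical shared helper: write cookie + cookieverf (4 XDR words)
 * and return the advanced pointer. cached[] is the verifier from the
 * server's last reply for this directory, already in XDR byte order.
 */
static uint32_t *
put_cookie_verf(uint32_t *tl, uint64_t cookie, const uint32_t cached[2])
{
	*tl++ = htonl((uint32_t)(cookie >> 32));	/* hyper, high word */
	*tl++ = htonl((uint32_t)cookie);		/* hyper, low word */
	/* RFC 1813: the verifier is 0 on the first request (cookie 0). */
	*tl++ = (cookie == 0) ? 0 : cached[0];
	*tl++ = (cookie == 0) ? 0 : cached[1];
	return tl;
}

/*
 * READDIR3args fields after the directory handle: cookie, cookieverf,
 * count. Returns the number of 32-bit words written (always 5).
 */
static size_t
encode_readdir_tail(uint32_t *buf, uint64_t cookie,
    const uint32_t cached[2], uint32_t count)
{
	uint32_t *tl = put_cookie_verf(buf, cookie, cached);

	*tl++ = htonl(count);
	return (size_t)(tl - buf);
}
```

A READDIRPLUS encoder would call put_cookie_verf() the same way and then append dircount and maxcount instead of count.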

 -- thorpej

Responsible-Changed-From-To: kern-bug-people->schmonz
Responsible-Changed-By: thorpej@NetBSD.org
Responsible-Changed-When: Sat, 09 Dec 2023 18:01:09 +0000
Responsible-Changed-Why:
Provided info to schmonz on how to fix.


State-Changed-From-To: open->analyzed
State-Changed-By: thorpej@NetBSD.org
State-Changed-When: Sat, 09 Dec 2023 18:01:09 +0000
State-Changed-Why:
Root cause understood.


From: "Amitai Schleier" <schmonz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57691 CVS commit: src/sys/nfs
Date: Sun, 10 Dec 2023 18:16:08 +0000

 Module Name:	src
 Committed By:	schmonz
 Date:		Sun Dec 10 18:16:08 UTC 2023

 Modified Files:
 	src/sys/nfs: nfs_vnops.c

 Log Message:
 NFS client: fix interop with macOS 14 servers.

 Symptom: a bunch of "Cannot open `.' (Invalid argument)".

 thorpej@ analysis and fix: on the first request to read a given
 directory, make sure READDIR and READDIRPLUS cookie verifiers are
 being set to 0. This is in RFC1813 and macOS must have gotten
 stricter about it.

 Verified on 10.0_RC1/aarch64 to fix the reproducers in PR kern/57691 as
 well as the original use case in which I met the bug: pkg_rr once again
 runs to completion.


 To generate a diff of this commit:
 cvs rdiff -u -r1.324 -r1.325 src/sys/nfs/nfs_vnops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: analyzed->closed
State-Changed-By: schmonz@NetBSD.org
State-Changed-When: Sun, 10 Dec 2023 18:26:12 +0000
State-Changed-Why:
Fixed in -r1.325 of nfs_vnops.c. Thanks again thorpej@!


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57691 CVS commit: [netbsd-10] src/sys/nfs
Date: Mon, 11 Dec 2023 12:32:40 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Mon Dec 11 12:32:40 UTC 2023

 Modified Files:
 	src/sys/nfs [netbsd-10]: nfs_vnops.c

 Log Message:
 Pull up following revision(s) (requested by schmonz in ticket #490):

 	sys/nfs/nfs_vnops.c: revision 1.325

 NFS client: fix interop with macOS 14 servers.

 Symptom: a bunch of "Cannot open `.' (Invalid argument)".
 thorpej@ analysis and fix: on the first request to read a given
 directory, make sure READDIR and READDIRPLUS cookie verifiers are
 being set to 0. This is in RFC1813 and macOS must have gotten
 stricter about it.

 Verified on 10.0_RC1/aarch64 to fix the reproducers in PR kern/57691 as
 well as the original use case in which I met the bug: pkg_rr once again
 runs to completion.


 To generate a diff of this commit:
 cvs rdiff -u -r1.324 -r1.324.4.1 src/sys/nfs/nfs_vnops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57691 CVS commit: [netbsd-9] src/sys/nfs
Date: Mon, 11 Dec 2023 12:34:43 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Mon Dec 11 12:34:43 UTC 2023

 Modified Files:
 	src/sys/nfs [netbsd-9]: nfs_vnops.c

 Log Message:
 Pull up following revision(s) (requested by schmonz in ticket #1778):

 	sys/nfs/nfs_vnops.c: revision 1.325

 NFS client: fix interop with macOS 14 servers.

 Symptom: a bunch of "Cannot open `.' (Invalid argument)".
 thorpej@ analysis and fix: on the first request to read a given
 directory, make sure READDIR and READDIRPLUS cookie verifiers are
 being set to 0. This is in RFC1813 and macOS must have gotten
 stricter about it.

 Verified on 10.0_RC1/aarch64 to fix the reproducers in PR kern/57691 as
 well as the original use case in which I met the bug: pkg_rr once again
 runs to completion.


 To generate a diff of this commit:
 cvs rdiff -u -r1.311 -r1.311.4.1 src/sys/nfs/nfs_vnops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57691 CVS commit: [netbsd-8] src/sys/nfs
Date: Tue, 12 Dec 2023 16:48:59 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Tue Dec 12 16:48:59 UTC 2023

 Modified Files:
 	src/sys/nfs [netbsd-8]: nfs_vnops.c

 Log Message:
 Pull up following revision(s) (requested by schmonz in ticket #1927):

 	sys/nfs/nfs_vnops.c: revision 1.325

 NFS client: fix interop with macOS 14 servers.

 Symptom: a bunch of "Cannot open `.' (Invalid argument)".
 thorpej@ analysis and fix: on the first request to read a given
 directory, make sure READDIR and READDIRPLUS cookie verifiers are
 being set to 0. This is in RFC1813 and macOS must have gotten
 stricter about it.

 Verified on 10.0_RC1/aarch64 to fix the reproducers in PR kern/57691 as
 well as the original use case in which I met the bug: pkg_rr once again
 runs to completion.


 To generate a diff of this commit:
 cvs rdiff -u -r1.310 -r1.310.4.1 src/sys/nfs/nfs_vnops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:

Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.