NetBSD Problem Report #57691
From schmonz@schmonz.com Fri Nov 10 19:06:16 2023
Return-Path: <schmonz@schmonz.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 506FE1A9239
for <gnats-bugs@gnats.NetBSD.org>; Fri, 10 Nov 2023 19:06:16 +0000 (UTC)
Message-Id: <20231110190612.13198.qmail@miracle-dirt.schmonz.com>
Date: 10 Nov 2023 19:06:12 -0000
From: schmonz@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: NFS client regression with macOS 14 server
X-Send-Pr-Version: 3.95
>Number: 57691
>Category: kern
>Synopsis: NFS client regression with macOS 14 server
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: schmonz
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Nov 10 19:10:00 +0000 2023
>Closed-Date: Sun Dec 10 18:26:12 +0000 2023
>Last-Modified: Tue Dec 12 16:50:01 +0000 2023
>Originator: Amitai Schleier
>Release: NetBSD 9.3 and 10.0_BETA
>Organization:
Latent Agility
>Environment:
NetBSD netbsd9-amd64.pet-power-plant.local 9.3 NetBSD 9.3 (GENERIC) #0: Thu Aug 4 15:30:37 UTC 2022 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
NetBSD netbsd10-arm64.magnetic-babysitter.local 10.0_BETA NetBSD 10.0_BETA (GENERIC64) #0: Wed Feb 1 19:00:10 UTC 2023 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/GENERIC64 evbarm
>Description:
After upgrading my NFS server from macOS 13 to 14, I started
encountering this error regularly on my netbsd-9 and -10 clients,
in ordinary pkgsrc usage:
bmake: Cannot open `.' (Invalid argument)
Once this has happened in a subtree of the NFS mount, file operations
in that subtree all fail until the NetBSD client (a VM on the same
physical machine) has been rebooted.
I didn't have this interoperability problem with macOS 13's NFS
server, and I haven't been able to reproduce this problem with any
of the other OSes (FreeBSD, OpenBSD, Tribblix, many Linuxes) also
running in VMs on the same machine.
NFS in this state has become unusable for me.
>How-To-Repeat:
On either netbsd-9 or netbsd-10, with an NFS mount in $HOME/trees
from a macOS 14 host:
cd $HOME/trees/pkgsrc-cvs/shells/oksh
bmake
ktrace of that:
https://netbsd.schmonz.com/tmp/nfs/make-in-pkgsrc-shells-oksh-over-nfs-kdump.txt
tcpdump of "ls" after the problem has manifested:
https://netbsd.schmonz.com/tmp/nfs/ls-after-the-problem-tcpdump.txt
A smaller reproducer:
ls trees/foo; touch trees/foo; ls trees/foo; rm trees/foo; ls trees
ls: trees/foo: No such file or directory
trees/foo
ls: trees: Invalid argument
>Fix:
none known
>Release-Note:
>Audit-Trail:
From: Jason Thorpe <thorpej@me.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org,
gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org,
NetBSD Kernel Technical Discussion List <tech-kern@NetBSD.org>
Subject: Re: kern/57691: NFS client regression with macOS 14 server
Date: Fri, 8 Dec 2023 10:55:54 -0800
> On Nov 10, 2023, at 11:10 AM, schmonz@netbsd.org <schmonz@NetBSD.org> wrote:
>
> tcpdump of "ls" after the problem has manifested:
> https://netbsd.schmonz.com/tmp/nfs/ls-after-the-problem-tcpdump.txt
The tcpdump shows:
16:16:35.683996 IP 10.0.2.2.shilp > 10.0.2.15.exp2: Flags [P.], seq 1573:1693, ack 1508, win 65535, length 120: NFS reply xid 834502658 reply ok 116 readdir ERROR: READDIR/READDIRPLUS cookie is stale
Looking at the code that issues READDIRPLUS in the kernel in
nfs_readdirplusrpc():
if (nmp->nm_iflag & NFSMNT_SWAPCOOKIE) {
txdr_swapcookie3(uiop->uio_offset, tl);
} else {
txdr_cookie3(uiop->uio_offset, tl);
}
tl += 2;
*tl++ = dnp->n_cookieverf.nfsuquad[0];
*tl++ = dnp->n_cookieverf.nfsuquad[1];
*tl++ = txdr_unsigned(nmp->nm_readdirsize);
I think the cookie verifier is wrong. See:
https://www.rfc-editor.org/rfc/rfc1813#page-81
cookieverf
This should be set to 0 on the first request to read a
directory. On subsequent requests, it should be a
cookieverf as returned by the server. The cookieverf
must match that returned by the READDIRPLUS call in
which the cookie was acquired.
Probably what’s going on is that the server is verifying the
directory cookie more strictly than before. Those two lines that pack
the cookieverf should be inserting 0s if uio_offset is 0.
-- thorpej
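
To make the hypothesis concrete, here is a sketch of the kind of check a stricter server might apply to an incoming READDIR or READDIRPLUS request. It is purely illustrative: `check_readdir_cookie` is a hypothetical function, the logic is an assumption about what "more strictly" could mean, and nothing here is taken from the macOS server. Only the NFS3ERR_BAD_COOKIE value (10003) and the 8-byte verifier size come from RFC 1813.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NFS3_OK             0
#define NFS3ERR_BAD_COOKIE  10003   /* value from RFC 1813 */
#define NFS3_COOKIEVERFSIZE 8       /* size from RFC 1813 */

/* Hypothetical strict server-side check, NOT real server code.
 * cookie       - directory cookie sent by the client
 * client_verf  - cookie verifier sent by the client
 * current_verf - verifier the server currently associates with the directory */
static int check_readdir_cookie(uint64_t cookie,
    const unsigned char client_verf[NFS3_COOKIEVERFSIZE],
    const unsigned char current_verf[NFS3_COOKIEVERFSIZE])
{
    static const unsigned char zero[NFS3_COOKIEVERFSIZE];

    assert(client_verf != NULL && current_verf != NULL);

    /* First request for a directory: the verifier is expected to be 0. */
    if (cookie == 0)
        return memcmp(client_verf, zero, NFS3_COOKIEVERFSIZE) == 0
            ? NFS3_OK : NFS3ERR_BAD_COOKIE;

    /* Continuation: the verifier must match what the server handed out. */
    return memcmp(client_verf, current_verf, NFS3_COOKIEVERFSIZE) == 0
        ? NFS3_OK : NFS3ERR_BAD_COOKIE;
}
```

Under this assumed check, a client that restarts a listing at cookie 0 but still sends a verifier cached from before the directory changed would be rejected, which would be consistent with the "cookie is stale" reply in the tcpdump.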
From: Jason Thorpe <thorpej@me.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org,
gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org,
NetBSD Kernel Technical Discussion List <tech-kern@NetBSD.org>
Subject: Re: kern/57691: NFS client regression with macOS 14 server
Date: Fri, 8 Dec 2023 11:17:49 -0800
> On Dec 8, 2023, at 10:55 AM, Jason Thorpe <thorpej@me.com> wrote:
>
> Probably what’s going on is that the server is verifying the
> directory cookie more strictly than before. Those two lines that pack
> the cookieverf should be inserting 0s if uio_offset is 0.
Just confirmed by code inspection that FreeBSD always sends a 0 cookie
verifier for uio_offset 0.
<quote>
if (cookie.qval == 0) {
*tl++ = 0;
*tl++ = 0;
} else {
</quote>
(From their nfsrpc_readdirplus().)
-- thorpej
From: "Amitai Schleier" <schmonz@schmonz.com>
To: "Jason Thorpe" <thorpej@me.com>
Cc: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org,
"NetBSD Kernel Technical Discussion List" <tech-kern@NetBSD.org>
Subject: Re: kern/57691: NFS client regression with macOS 14 server
Date: 8 Dec 2023 22:25:32 -0500
On 8 Dec 2023, at 14:17, Jason Thorpe wrote:
>> On Dec 8, 2023, at 10:55 AM, Jason Thorpe <thorpej@me.com> wrote:
>>
>> Probably what’s going on is that the server is verifying the
>> directory cookie more strictly than before. Those two lines that
>> pack the cookieverf should be inserting 0s if uio_offset is 0.
>
> Just confirmed by code inspection that FreeBSD always sends a 0 cookie
> verifier for uio_offset 0.
>
> <quote>
> if (cookie.qval == 0) {
> *tl++ = 0;
> *tl++ = 0;
> } else {
> </quote>
Thank you for looking into this and giving me something to try. Neither
of the following diffs seems to make a difference, though (tested on my
10.0_RC1/aarch64 NFS client). Have I misunderstood something?
Index: nfs/nfs_vnops.c
===================================================================
RCS file: /cvsroot/src/sys/nfs/nfs_vnops.c,v
retrieving revision 1.324
diff -u -p -r1.324 nfs_vnops.c
--- nfs/nfs_vnops.c 24 May 2022 06:28:02 -0000 1.324
+++ nfs/nfs_vnops.c 9 Dec 2023 03:23:02 -0000
@@ -2632,8 +2632,13 @@ nfs_readdirplusrpc(struct vnode *vp, str
txdr_cookie3(uiop->uio_offset, tl);
}
tl += 2;
- *tl++ = dnp->n_cookieverf.nfsuquad[0];
- *tl++ = dnp->n_cookieverf.nfsuquad[1];
+ if (uiop->uio_offset == 0) {
+ *tl++ = 0;
+ *tl++ = 0;
+ } else {
+ *tl++ = dnp->n_cookieverf.nfsuquad[0];
+ *tl++ = dnp->n_cookieverf.nfsuquad[1];
+ }
*tl++ = txdr_unsigned(nmp->nm_readdirsize);
*tl = txdr_unsigned(nmp->nm_rsize);
nfsm_request(dnp, NFSPROC_READDIRPLUS, curlwp, cred);
Index: nfs/nfs_vnops.c
===================================================================
RCS file: /cvsroot/src/sys/nfs/nfs_vnops.c,v
retrieving revision 1.324
diff -u -p -r1.324 nfs_vnops.c
--- nfs/nfs_vnops.c 24 May 2022 06:28:02 -0000 1.324
+++ nfs/nfs_vnops.c 9 Dec 2023 03:12:56 -0000
@@ -2632,6 +2632,10 @@ nfs_readdirplusrpc(struct vnode *vp, str
txdr_cookie3(uiop->uio_offset, tl);
}
tl += 2;
+ if (uiop->uio_offset == 0) {
+ dnp->n_cookieverf.nfsuquad[0] = 0;
+ dnp->n_cookieverf.nfsuquad[1] = 0;
+ }
*tl++ = dnp->n_cookieverf.nfsuquad[0];
*tl++ = dnp->n_cookieverf.nfsuquad[1];
*tl++ = txdr_unsigned(nmp->nm_readdirsize);
From: Jason Thorpe <thorpej@me.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org,
gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org,
"schmonz@netbsd.org" <schmonz@NetBSD.org>
Subject: Re: kern/57691: NFS client regression with macOS 14 server
Date: Sat, 9 Dec 2023 08:11:38 -0800
> On Dec 8, 2023, at 7:30 PM, Amitai Schleier <schmonz@schmonz.com> wrote:
>
> Thank you for looking into this and giving me something to try. Neither
> of the following diffs seems to make a difference, though (tested on my
> 10.0_RC1/aarch64 NFS client). Have I misunderstood something?
No, it just may not have been sufficient. But I just tried to reproduce
this with a sparc client (in qemu, which happened to be the most
convenient test client) against my MacBook Air:
dhcp-194:thorpej$ sw_vers
ProductName: macOS
ProductVersion: 14.1.2
BuildVersion: 23B92
dhcp-194:thorpej$
…and I can’t reproduce the problem.
sparc-vm# df
Filesystem                         512-blocks      Used      Avail %Cap Mounted on
/dev/sd0a                            16257372   1359056   14085448   8% /
kernfs                                      2         2          0 100% /kern
ptyfs                                       2         2          0 100% /dev/pts
procfs                                      8         8          0 100% /proc
tmpfs                                   57280         8      57272   0% /var/shm
192.168.1.194:/System/Volumes/Data 1942700360 692414544 1250285816  35% /mnt
sparc-vm# ls -l /mnt
ls: .DocumentRevisions-V100: Permission denied
ls: .Spotlight-V100: Permission denied
ls: .TemporaryItems: Permission denied
total 0
drwxr-xr-x 5 root wheel 160 Dec 9 07:43 .PreviousSystemInformation
drwxr-xr-x 7 root wheel 224 Dec 9 07:35 .com.apple.templatemigration.boot-install
drwx------ 49341 root wheel 1578912 Dec 9 08:02 .fseventsd
drwxrwxr-x 29 root 80 928 Dec 9 07:43 Applications
drwxr-xr-x 69 root wheel 2208 Dec 9 07:45 Library
drwxr-xr-x 3 root wheel 96 Nov 16 18:27 MobileSoftwareUpdate
drwxr-xr-x 3 root wheel 96 Nov 18 10:13 System
drwxr-xr-x 5 root 80 160 Dec 9 07:43 Users
drwxr-xr-x 3 root wheel 96 Dec 9 07:44 Volumes
drwxr-xr-x 2 root wheel 64 Jul 14 2022 cores
drwxr-xr-x 2 root wheel 64 Jul 14 2022 home
drwxr-xr-x 2 root wheel 64 Jul 14 2022 mnt
drwxr-xr-x 7 root wheel 224 May 20 2023 opt
drwxr-xr-x 6 root wheel 192 Dec 9 07:44 private
drwxr-xr-x 2 root wheel 64 Jul 14 2022 sw
drwxr-xr-x 5 root wheel 160 Nov 18 10:13 usr
sparc-vm# uname -a
NetBSD sparc-vm 10.99.10 NetBSD 10.99.10 (GENERIC) #11: Sun Dec 3 06:29:38 PST 2023 thorpej@the-ripe-vessel:/space/src/sys/arch/sparc/compile/GENERIC sparc
sparc-vm#
-- thorpej
From: Jason Thorpe <thorpej@me.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org,
gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org,
"schmonz@netbsd.org" <schmonz@NetBSD.org>
Subject: Re: kern/57691: NFS client regression with macOS 14 server
Date: Sat, 9 Dec 2023 08:30:44 -0800
> On Dec 9, 2023, at 8:11 AM, Jason Thorpe <thorpej@me.com> wrote:
>
> …and I can’t reproduce the problem.
Actually, never mind, I can… I didn’t notice your “minimal reproducer”
before … and that triggers the problem for me.
-- thorpej
From: Jason Thorpe <thorpej@me.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org,
gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org,
"schmonz@netbsd.org" <schmonz@NetBSD.org>
Subject: Re: kern/57691: NFS client regression with macOS 14 server
Date: Sat, 9 Dec 2023 08:37:37 -0800
> On Dec 8, 2023, at 7:30 PM, Amitai Schleier <schmonz@schmonz.com> wrote:
>
> Thank you for looking into this and giving me something to try. Neither
> of the following diffs seems to make a difference, though (tested on my
> 10.0_RC1/aarch64 NFS client). Have I misunderstood something?
>
>
> Index: nfs/nfs_vnops.c
> ===================================================================
> RCS file: /cvsroot/src/sys/nfs/nfs_vnops.c,v
> retrieving revision 1.324
> diff -u -p -r1.324 nfs_vnops.c
> --- nfs/nfs_vnops.c 24 May 2022 06:28:02 -0000 1.324
> +++ nfs/nfs_vnops.c 9 Dec 2023 03:23:02 -0000
> @@ -2632,8 +2632,13 @@ nfs_readdirplusrpc(struct vnode *vp, str
> txdr_cookie3(uiop->uio_offset, tl);
> }
> tl += 2;
> - *tl++ = dnp->n_cookieverf.nfsuquad[0];
> - *tl++ = dnp->n_cookieverf.nfsuquad[1];
> + if (uiop->uio_offset == 0) {
> + *tl++ = 0;
> + *tl++ = 0;
> + } else {
> + *tl++ = dnp->n_cookieverf.nfsuquad[0];
> + *tl++ = dnp->n_cookieverf.nfsuquad[1];
> + }
> *tl++ = txdr_unsigned(nmp->nm_readdirsize);
> *tl = txdr_unsigned(nmp->nm_rsize);
> nfsm_request(dnp, NFSPROC_READDIRPLUS, curlwp, cred);
Ok, this is the correct variation of the change, but you missed a
spot… there’s also a similar bit of code for the READDIR RPC. Add the
same snippet there and try again (just look for “cookieverf”).
-- thorpej
Responsible-Changed-From-To: kern-bug-people->schmonz
Responsible-Changed-By: thorpej@NetBSD.org
Responsible-Changed-When: Sat, 09 Dec 2023 18:01:09 +0000
Responsible-Changed-Why:
Provided info to schmonz on how to fix.
State-Changed-From-To: open->analyzed
State-Changed-By: thorpej@NetBSD.org
State-Changed-When: Sat, 09 Dec 2023 18:01:09 +0000
State-Changed-Why:
Root cause understood.
From: "Amitai Schleier" <schmonz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57691 CVS commit: src/sys/nfs
Date: Sun, 10 Dec 2023 18:16:08 +0000
Module Name: src
Committed By: schmonz
Date: Sun Dec 10 18:16:08 UTC 2023
Modified Files:
src/sys/nfs: nfs_vnops.c
Log Message:
NFS client: fix interop with macOS 14 servers.
Symptom: a bunch of "Cannot open `.' (Invalid argument)".
thorpej@ analysis and fix: on the first request to read a given
directory, make sure READDIR and READDIRPLUS cookie verifiers are
being set to 0. This is in RFC1813 and macOS must have gotten
stricter about it.
Verified on 10.0_RC1/aarch64 to fix the reproducers in PR kern/57691 as
well as the original use case in which I met the bug: pkg_rr once again
runs to completion.
To generate a diff of this commit:
cvs rdiff -u -r1.324 -r1.325 src/sys/nfs/nfs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: analyzed->closed
State-Changed-By: schmonz@NetBSD.org
State-Changed-When: Sun, 10 Dec 2023 18:26:12 +0000
State-Changed-Why:
Fixed in -r1.325 of nfs_vnops.c. Thanks again thorpej@!
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57691 CVS commit: [netbsd-10] src/sys/nfs
Date: Mon, 11 Dec 2023 12:32:40 +0000
Module Name: src
Committed By: martin
Date: Mon Dec 11 12:32:40 UTC 2023
Modified Files:
src/sys/nfs [netbsd-10]: nfs_vnops.c
Log Message:
Pull up following revision(s) (requested by schmonz in ticket #490):
sys/nfs/nfs_vnops.c: revision 1.325
NFS client: fix interop with macOS 14 servers.
Symptom: a bunch of "Cannot open `.' (Invalid argument)".
thorpej@ analysis and fix: on the first request to read a given
directory, make sure READDIR and READDIRPLUS cookie verifiers are
being set to 0. This is in RFC1813 and macOS must have gotten
stricter about it.
Verified on 10.0_RC1/aarch64 to fix the reproducers in PR kern/57691 as
well as the original use case in which I met the bug: pkg_rr once again
runs to completion.
To generate a diff of this commit:
cvs rdiff -u -r1.324 -r1.324.4.1 src/sys/nfs/nfs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57691 CVS commit: [netbsd-9] src/sys/nfs
Date: Mon, 11 Dec 2023 12:34:43 +0000
Module Name: src
Committed By: martin
Date: Mon Dec 11 12:34:43 UTC 2023
Modified Files:
src/sys/nfs [netbsd-9]: nfs_vnops.c
Log Message:
Pull up following revision(s) (requested by schmonz in ticket #1778):
sys/nfs/nfs_vnops.c: revision 1.325
NFS client: fix interop with macOS 14 servers.
Symptom: a bunch of "Cannot open `.' (Invalid argument)".
thorpej@ analysis and fix: on the first request to read a given
directory, make sure READDIR and READDIRPLUS cookie verifiers are
being set to 0. This is in RFC1813 and macOS must have gotten
stricter about it.
Verified on 10.0_RC1/aarch64 to fix the reproducers in PR kern/57691 as
well as the original use case in which I met the bug: pkg_rr once again
runs to completion.
To generate a diff of this commit:
cvs rdiff -u -r1.311 -r1.311.4.1 src/sys/nfs/nfs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57691 CVS commit: [netbsd-8] src/sys/nfs
Date: Tue, 12 Dec 2023 16:48:59 +0000
Module Name: src
Committed By: martin
Date: Tue Dec 12 16:48:59 UTC 2023
Modified Files:
src/sys/nfs [netbsd-8]: nfs_vnops.c
Log Message:
Pull up following revision(s) (requested by schmonz in ticket #1927):
sys/nfs/nfs_vnops.c: revision 1.325
NFS client: fix interop with macOS 14 servers.
Symptom: a bunch of "Cannot open `.' (Invalid argument)".
thorpej@ analysis and fix: on the first request to read a given
directory, make sure READDIR and READDIRPLUS cookie verifiers are
being set to 0. This is in RFC1813 and macOS must have gotten
stricter about it.
Verified on 10.0_RC1/aarch64 to fix the reproducers in PR kern/57691 as
well as the original use case in which I met the bug: pkg_rr once again
runs to completion.
To generate a diff of this commit:
cvs rdiff -u -r1.310 -r1.310.4.1 src/sys/nfs/nfs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.