NetBSD Problem Report #57134
From martin@duskware.de Fri Dec 23 19:31:08 2022
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 04DBC1A921F
for <gnats-bugs@gnats.NetBSD.org>; Fri, 23 Dec 2022 19:31:08 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: fictitious disklabel not created on vnd
X-Send-Pr-Version: 3.95
>Number: 57134
>Category: kern
>Synopsis: st_size of stat on vnd raw partition sometimes is 0, causing newfs to fail
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Dec 23 19:35:00 +0000 2022
>Closed-Date: Wed Mar 22 09:50:48 +0000 2023
>Last-Modified: Wed Mar 22 09:50:48 +0000 2023
>Originator: Martin Husemann
>Release: NetBSD 10.0_BETA
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD thirdstage.duskware.de 10.0_BETA NetBSD 10.0_BETA (MODULAR) #3: Thu Dec 22 09:34:53 CET 2022 martin@thirdstage.duskware.de:/usr/src/sys/arch/sparc64/compile/MODULAR sparc64
Architecture: sparc64
Machine: sparc64
>Description:
Some atf tests fail due to newfs failures, e.g. /usr/tests/dev/fss.
It does:
dd if=/dev/zero of=./image bs=32k count=64
vndconfig -c vnd0 ./image
newfs -I vnd0
and that errors out:
newfs: Unable to determine file system size
Disklabel is:
3 partitions:
# size offset fstype [fsize bsize cpg/sgs]
c: 4096 0 unused 0 0 # (Cyl. 0 - 1)
... so something did not create a fictitious label here.
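As a quick sanity check on the numbers above, the image written by dd is 64 blocks of 32 KiB. If the sector size is 512 bytes (an assumption, the report does not state it), that is exactly the 4096 sectors shown for partition `c`. The helper names `image_bytes` and `bytes_to_sectors` below are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of the image written by "dd bs=32k count=64". */
static uint64_t
image_bytes(void)
{
	return (uint64_t)32 * 1024 * 64;
}

/* Hypothetical helper: convert a byte count to whole sectors. */
static uint64_t
bytes_to_sectors(uint64_t bytes, uint32_t sector_size)
{
	assert(sector_size > 0);	/* 512 is assumed by the caller below */
	return bytes / sector_size;
}
```

Calling `bytes_to_sectors(image_bytes(), 512)` gives the sector count to compare with the `size` column of the label.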
>How-To-Repeat:
See above.
>Fix:
n/a
>Release-Note:
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/57134: fstat on vnd returns empty size
Date: Mon, 26 Dec 2022 13:33:30 +0100
The code runs via newfs.c:516
	fsi = opendisk(special, O_RDONLY, device, sizeof(device), 0);
	special = device;
	if (fsi < 0 || fstat(fsi, &sb) == -1)
		err(1, "%s: open for read", special);
	if (S_ISBLK(sb.st_mode)) {
		errx(1, "%s is a block device. use raw device",
		    special);
	}
the opendisk works (returns 3) and the fstat also works, but sb has
sb.st_size == 0 (and sb.st_dev == 0xab01, which also looks strange).
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/57134: fstat on vnd returns empty size
Date: Mon, 26 Dec 2022 13:40:15 +0100
What I forgot to mention: the fictitious label is properly created, just a bit
strange, which confused me initially (single cylinder tiny drive on the
small test image). The real issue is the device size stat delivering bogus
data.
This does not happen in -current for me on the same machine, which I find
highly confusing (there should be no relevant diffs at this point).
Martin
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57134: fstat on vnd returns empty size
Date: Mon, 26 Dec 2022 21:42:29 -0000 (UTC)
martin@duskware.de (Martin Husemann) writes:
> the opendisk works (returns 3) and the fstat also works, but sb has
> sb.st_size == 0 (and sb.st_dev == 0xab01, which also looks strange).
disklabel routines are device and platform-specific. Some have been
unified, but there is still lots of "individual" behaviour.
The default result for a non-existing disklabel is to set only
the raw partition to the whole disk and to make it of type BSDFFS
so that you can use the raw disk for a FFS filesystem. That's what
happens on sparc64.
Platforms that use the PC-style readdisklabel routine will define
the raw partition as unused and set partition 'a' as BSDFFS spanning
the whole disk.
st_dev == 0xab01 = 171.1. That's strange, it should be the device
where the filesystem containing /dev is located. Maybe a typo?
0xa801 would be /dev/dk1. st_rdev would be the device itself, e.g.
0800 for vnd0a.
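The 171.1 reading can be reproduced with a small decoder. This is a sketch that assumes the NetBSD dev_t layout (major number in bits 8 to 19, minor number split across the low 8 bits and bits 20 to 31). The names `nb_major`, `nb_minor` and `nb_makedev` are hypothetical stand-ins for the system's major(), minor() and makedev() macros, not the macros themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the major number, assuming it lives in bits 8..19. */
static uint32_t
nb_major(uint32_t dev)
{
	return (dev & 0x000fff00u) >> 8;
}

/* Extract the minor number, assuming low 8 bits plus bits 20..31. */
static uint32_t
nb_minor(uint32_t dev)
{
	return ((dev & 0xfff00000u) >> 12) | (dev & 0x000000ffu);
}

/* Inverse of the two functions above under the same assumed layout. */
static uint32_t
nb_makedev(uint32_t maj, uint32_t min)
{
	assert(maj <= 0xfffu);		/* 12 bits of major */
	assert(min <= 0xfffffu);	/* 20 bits of minor */
	return ((maj << 8) & 0x000fff00u) |
	    ((min << 12) & 0xfff00000u) |
	    (min & 0x000000ffu);
}
```

Under this layout 0xab01 decodes to major 171, minor 1, and 0xa801 to major 168, minor 1.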
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57134: fstat on vnd returns empty size
Date: Tue, 27 Dec 2022 09:04:54 +0100
I can now reproduce the same failure on evbarm running -current:
tp-start: 1672128177.136445, t_fss, 1
tc-start: 1672128177.136819, basic
tc-se:64+0 records in
tc-se:64+0 records out
tc-se:2097152 bytes transferred in 0.060 secs (34952533 bytes/sec)
tc-se:newfs: Unable to determine file system size
tc-se:mount_ffs: /dev/vnd0 on /tmp/atf-run.Y98zIZ/m1: incorrect super block
tc-se:fssconfig: /dev/rfss0: FSSIOCSET: Invalid argument
tc-se:mount_ffs: /dev/fss0 on /tmp/atf-run.Y98zIZ/m2: Device not configured
tc-se:sh: cannot open ./m2/text: no such file
tc-se:umount: /dev/fss0: not currently mounted
tc-se:fssconfig: /dev/rfss0: FSSIOCCLR: Device busy
tc-se:umount: /dev/vnd0: not currently mounted
tc-end: 1672128178.156306, basic, failed, Original data != (Original data != )
tp-end: 1672128178.168914, t_fss
NetBSD unpluged.duskware.de 10.99.2 NetBSD 10.99.2 (UNPLUGED) #470: Mon Dec 26 15:01:41 CET 2022 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/evbarm/compile/UNPLUGED evbarm
This is with / on NFS and /tmp on tmpfs (whereas the original report for
sparc64 was with / on FFSv2 and /tmp on tmpfs, but I guess the tmpfs is
the only relevant file system for atf runs).
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/57134: fstat on vnd returns empty size
Date: Tue, 27 Dec 2022 11:38:55 +0100
So I compared on a working -current machine running evbearmv7hf-eb
and the non-working -current evbarm machine (but with kernel and userland
built from exactly the same tree).
The difference is indeed the content of the fstat return value in sb:
non-working case:
(gdb) p sb
$16 = {st_dev = 2817, st_mode = 8608, st_ino = 26895769, st_nlink = 1,
st_uid = 0, st_gid = 5, st_rdev = 4866, st_atim = {tv_sec = 1672128063,
tv_nsec = 370078288}, st_mtim = {tv_sec = 1672128063,
tv_nsec = 370078288}, st_ctim = {tv_sec = 1672128063,
tv_nsec = 371967752}, st_birthtim = {tv_sec = -1, tv_nsec = -1},
st_size = 0, st_blocks = 0, st_blksize = 65536, st_flags = 0, st_gen = 0,
st_spare = {0, 0}}
(gdb) p special
$17 = 0x2bc78 <device> "/dev/rvnd0"
# ll /dev/rvnd0
crw-r----- 1 root operator 19, 2 Dec 27 09:01 /dev/rvnd0
working case:
$6 = {st_dev = 4096, st_mode = 8608, st_ino = 15746732, st_nlink = 1,
st_uid = 0, st_gid = 5, st_rdev = 4866, st_atim = {tv_sec = 1672128436,
tv_nsec = 527951165}, st_mtim = {tv_sec = 1672128436,
tv_nsec = 527951165}, st_ctim = {tv_sec = 1672128436,
tv_nsec = 527951165}, st_birthtim = {tv_sec = 1596518740,
tv_nsec = 687526124}, st_size = 2097152, st_blocks = 0,
st_blksize = 65536, st_flags = 0, st_gen = 0, st_spare = {0, 0}}
(gdb) p special
$17 = 0x2bc78 <device> "/dev/rvnd0"
# ll /dev/rvnd0
crw-r----- 1 root operator 19, 2 Dec 27 09:07 /dev/rvnd0
So something in GENERIC that is not in my evbarm kernel? Some missing COMPAT_*
or something most likely (but that still would not explain the sparc64 problem)
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57134: st_size of stat on vnd raw partition sometimes is 0,
causing newfs to fail
Date: Wed, 28 Dec 2022 13:11:30 +0100
The working vs. the non-working case on sparc64 is reproducible by
using a tmpfs /dev (non-working) vs. a FFSv2 /dev (working).
I tested another machine (aarch64) with /dev on tmpfs and it also fails.
So it seems to fail if the dev node for vnd0 is on NFS or tmpfs.
Martin
From: "J. Hannken-Illjes" <hannken@mailbox.org>
To: NetBSD GNATS <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/57134: st_size of stat on vnd raw partition sometimes is 0,
causing newfs to fail
Date: Sun, 1 Jan 2023 11:45:26 +0100
This has nothing to do with "st_size", I hope it is
undefined for devices but I'm not a POSIX expert.
The problem is sbin/fsck/partutil.c::getdiskinfo()
returning zero size as it assumes "vnd0" (without
partition character at end) is illegal for a
partitioned disk.
Please try the attached diff.
--
J. Hannken-Illjes - hannken@mailbox.org
Attachment: 001_getdiskinfo.diff
getdiskinfo
Change getdiskinfo() to no longer infer the partition from the device name.
Since 2016-06-16 we create disk devices "<type><unit>" as an alias
for "<type><unit><part>" where "<part>" is the raw partition.
These devices are treated as invalid partitions and a zero geometry
is returned.
Take the partition from "st_rdev" instead.
Should fix PR kern/57134: st_size of stat on vnd raw partition sometimes
is 0, causing newfs to fail
diff -r 70eda1015a53 -r 7705ad77c23e sbin/fsck/partutil.c
--- sbin/fsck/partutil.c
+++ sbin/fsck/partutil.c
@@ -155,9 +155,8 @@ getdiskinfo(const char *s, int fd, const
 	if (stat(s, &sb) == -1)
 		return 0;
-	ptn = strchr(s, '\0')[-1] - 'a';
-	if ((unsigned)ptn >= lp->d_npartitions ||
-	    (devminor_t)ptn != DISKPART(sb.st_rdev))
+	ptn = DISKPART(sb.st_rdev);
+	if (ptn < 0 || ptn >= lp->d_npartitions)
 		return 0;
 	pp = &lp->d_partitions[ptn];
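To illustrate why the old check rejected a name like "/dev/rvnd0": its last character is '0', and '0' - 'a' is negative, so after the unsigned cast it exceeds any partition count. The sketch below contrasts the two approaches. It assumes DISKPART() is the minor number modulo the maximum partition count and uses a hypothetical `SKETCH_MAXPARTITIONS` of 8 (the real value is port-specific). The function names are illustrative and are not taken from partutil.c.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAXPARTITIONS 8	/* assumed; the real value is port-specific */

/* Old approach: infer the partition index from the last character of the name. */
static int
ptn_from_name(const char *s, unsigned npartitions, unsigned minor_no)
{
	size_t len = strlen(s);
	int ptn;

	assert(len > 0);
	ptn = s[len - 1] - 'a';
	if ((unsigned)ptn >= npartitions ||
	    (unsigned)ptn != minor_no % SKETCH_MAXPARTITIONS)
		return -1;	/* the caller would report no geometry */
	return ptn;
}

/* New approach: take the partition index from the device minor number. */
static int
ptn_from_minor(unsigned minor_no, unsigned npartitions)
{
	unsigned ptn = minor_no % SKETCH_MAXPARTITIONS;

	if (ptn >= npartitions)
		return -1;
	return (int)ptn;
}
```

With a three-partition label and minor number 2, `ptn_from_name("/dev/rvnd0", 3, 2)` fails while `ptn_from_minor(2, 3)` yields partition index 2, the raw partition in the label shown in the report.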
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57134: st_size of stat on vnd raw partition sometimes is 0, causing newfs to fail
Date: Sun, 01 Jan 2023 20:42:53 +0700
Date: Sun, 1 Jan 2023 10:50:02 +0000 (UTC)
From: "J. Hannken-Illjes" <hannken@mailbox.org>
Message-ID: <20230101105002.EBBD91A923A@mollari.NetBSD.org>
| This has nothing to do with "st_size", I hope it is
| undefined for devices but I'm not a POSIX expert.
It is unspecified for devices, which allows implementations to do
whatever they like (and which means portable applications, of which
newfs is not intended to be one, should not depend upon the info).
| Should fix PR kern/57134: st_size of stat on vnd raw partition sometimes
| is 0, causing newfs to fail
It very likely will, as newfs uses the size from getdiskinfo() (from the
fsck sources) if st_size of the device is 0.
So that should fix the immediate problem.
It doesn't explain why st_size on a vnd device returns the underlying
file size when its inode is on a UFS2 filesystem, but doesn't when it
is on a tmpfs or NFS filesystem. That would be nice to discover.
kre
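The fallback described above can be sketched as follows. This is a minimal model, not the newfs source: the function name `choose_device_size` and its parameters are hypothetical, and the label size is assumed to be given in sectors.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Prefer st_size when it is non-zero, otherwise fall back to the size
 * derived from the disk label. Returns 0 when neither source has a size.
 */
static uint64_t
choose_device_size(uint64_t st_size, uint64_t label_sectors,
    uint32_t sector_size)
{
	assert(sector_size > 0);
	if (st_size != 0)
		return st_size;
	return label_sectors * sector_size;
}
```

When both inputs are zero the result is zero, which is presumably the situation behind "Unable to determine file system size".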
From: "J. Hannken-Illjes" <hannken@mailbox.org>
To: NetBSD GNATS <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/57134: st_size of stat on vnd raw partition sometimes is 0,
causing newfs to fail
Date: Sun, 1 Jan 2023 18:51:34 +0100
> On 1. Jan 2023, at 14:45, Robert Elz <kre@munnari.OZ.AU> wrote:
>
> It doesn't explain why st_size on a vnd device returns the underlying
> file size when its inode is on a UFS2 filesystem, but doesn't when it
> is on a tmpfs or NFS filesystem. That would be nice to discover.
It (vnode member v_size) gets set from spec_open() around line 1840
of miscfs/specfs/spec_vnops.c. While UFS returns vp->v_size on a
getattr call, tmpfs returns its internal size tn_size, which doesn't
get updated from spec_open().
Setting the size from spec_open() looks wrong, as a program configuring
a vnd device may miss the size change if it uses open -> configure ->
fstat with a single file descriptor.
--
J. Hannken-Illjes - hannken@mailbox.org
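The difference between the two getattr behaviours can be modelled in a few lines. This is a toy model under stated assumptions, not kernel code: `struct toy_vnode` and all function names are hypothetical, and `fs_private_size` merely stands in for a file system's own size field such as tmpfs's tn_size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_vnode {
	uint64_t v_size;		/* size cached in the vnode */
	uint64_t fs_private_size;	/* stands in for a private field like tn_size */
};

/* Models the open path: records the device size in the vnode only. */
static void
toy_spec_open(struct toy_vnode *vp, uint64_t device_size)
{
	assert(vp != NULL);
	vp->v_size = device_size;
}

/* Models a getattr that reports the vnode's cached size (UFS-like). */
static uint64_t
toy_getattr_vnode_size(const struct toy_vnode *vp)
{
	assert(vp != NULL);
	return vp->v_size;
}

/* Models a getattr that reports the file system's own field (tmpfs-like). */
static uint64_t
toy_getattr_private_size(const struct toy_vnode *vp)
{
	assert(vp != NULL);
	return vp->fs_private_size;
}
```

After `toy_spec_open()` the vnode-based getattr sees the device size, while the private-field getattr still reports zero because nothing updated that field.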
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57134: st_size of stat on vnd raw partition sometimes is 0, causing newfs to fail
Date: Sun, 1 Jan 2023 18:35:34 -0000 (UTC)
kre@munnari.OZ.AU (Robert Elz) writes:
> It doesn't explain why st_size on a vnd device returns the underlying
> file size when its inode is on a UFS2 filesystem, but doesn't when it
> is on a tmpfs or NFS filesystem. That would be nice to discover.
The size only exists in the cached vnode after opening the device
(determined by specfs, and only for real disks, not wedges that
don't support DIOCGPARTINFO). FFS returns data from the vnode for
stat() but tmpfs keeps its own size value in its own tmpfs_node
structure that specfs doesn't know about.
Not sure where the information gets lost with NFS.
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57134: st_size of stat on vnd raw partition sometimes is 0, causing newfs to fail
Date: Sun, 1 Jan 2023 18:42:51 -0000 (UTC)
hannken@mailbox.org ("J. Hannken-Illjes") writes:
> Setting the size from spec_open() looks wrong, as a program configuring
> a vnd device may miss the size change if it uses open -> configure ->
> fstat with a single file descriptor.
The disk or partition size is defined by the disklabel, and that
isn't expected to change while the device is open. That is probably
also the reason to fetch the size on open, only then does the device
know about the label.
From: "J. Hannken-Illjes" <hannken@mailbox.org>
To: NetBSD GNATS <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/57134: st_size of stat on vnd raw partition sometimes is 0,
causing newfs to fail
Date: Sun, 1 Jan 2023 19:50:12 +0100
> On 1. Jan 2023, at 19:45, Michael van Elst <mlelstv@serpens.de> wrote:
>
> The disk or partition size is defined by the disklabel, and that
> isn't expected to change while the device is open. That is probably
> also the reason to fetch the size on open, only then does the device
> know about the label.
This assumption is wrong for this sequence:
fd = open("/dev/rvnd0",)
ioctl(fd, VNDIOCSET,)
fstat(fd, &sbuf)
as the partition size changes AFTER open.
--
J. Hannken-Illjes - hannken@mailbox.org
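The ordering problem in that sequence can be shown with a toy model in which the size is captured once at open time. Everything here is hypothetical (`struct toy_unit`, `struct toy_handle` and the `toy_*` functions). It does not call the real vnd ioctl and only mimics the order of operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a vnd unit; 0 bytes means "not configured yet". */
struct toy_unit {
	uint64_t configured_bytes;
};

/* Toy stand-in for an open descriptor that caches the size at open time. */
struct toy_handle {
	const struct toy_unit *unit;
	uint64_t size_at_open;
};

static struct toy_handle
toy_open(const struct toy_unit *unit)
{
	struct toy_handle h;

	assert(unit != NULL);
	h.unit = unit;
	h.size_at_open = unit->configured_bytes;
	return h;
}

/* Plays the role of the configure ioctl: the unit gains a size. */
static void
toy_configure(struct toy_unit *unit, uint64_t bytes)
{
	assert(unit != NULL);
	unit->configured_bytes = bytes;
}

/* Plays the role of fstat(): reports the size cached at open time. */
static uint64_t
toy_fstat_size(const struct toy_handle *h)
{
	assert(h != NULL);
	return h->size_at_open;
}
```

A handle opened before `toy_configure()` keeps reporting zero, whereas a handle opened afterwards reports the configured size.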
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57134: st_size of stat on vnd raw partition sometimes is 0, causing newfs to fail
Date: Mon, 2 Jan 2023 06:20:54 -0000 (UTC)
hannken@mailbox.org ("J. Hannken-Illjes") writes:
> This assumption is wrong for this sequence:
>
> fd = open("/dev/rvnd0",)
> ioctl(fd, VNDIOCSET,)
> fstat(fd, &sbuf)
>
> as the partition size changes AFTER open.
Yes, sad behaviour of vnd that it uses an uninitialized unit to create
a real unit. Makes lots of things difficult and you should better
not rely on this detail.
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57134: st_size of stat on vnd raw partition sometimes is 0, causing newfs to fail
Date: Mon, 02 Jan 2023 15:43:34 +0700
Date: Mon, 2 Jan 2023 06:25:01 +0000 (UTC)
From: mlelstv@serpens.de (Michael van Elst)
Message-ID: <20230102062501.671A91A923A@mollari.NetBSD.org>
| > This assumption is wrong for this sequence:
| >
| > fd = open("/dev/rvnd0",)
| > ioctl(fd, VNDIOCSET,)
| > fstat(fd, &sbuf)
| >
| > as the partition size changes AFTER open.
newfs doesn't do that, so that should not be an issue for it.
If something else is changing the vnd size while newfs is running
we have a race condition that there's no way to avoid - but it is
kind of unbelievable that that would be happening while sysinst is
installing a system.
| Yes, sad behaviour of vnd that it uses an uninitialized unit to create
| a real unit. Makes lots of things difficult and you should better
| not rely on this detail.
Maybe it would be better for newfs to simply ignore st_size for all
devices (use it only when making a filesystem in a file) ?
That would be trivial to make happen, just need to set st_size to 0
in the code that handles devices (and calls getdiskinfo()).
kre
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57134: st_size of stat on vnd raw partition sometimes is 0, causing newfs to fail
Date: Mon, 2 Jan 2023 10:01:24 -0000 (UTC)
kre@munnari.OZ.AU (Robert Elz) writes:
> Maybe it would be better for newfs to simply ignore st_size for all
> devices (use it only when making a filesystem in a file) ?
It almost does. For all cases where st_size is invalid, it is 0 and that
value is ignored.
> That would be trivial to make happen, just need to set st_size to 0
> in the code that handles devices (and calls getdiskinfo()).
I think having st_size being valid all the time for disks would be
useful, in particular because Linux works like that.
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57134: st_size of stat on vnd raw partition sometimes is 0, causing newfs to fail
Date: Mon, 02 Jan 2023 17:57:49 +0700
Date: Mon, 2 Jan 2023 10:05:02 +0000 (UTC)
From: mlelstv@serpens.de (Michael van Elst)
Message-ID: <20230102100502.151AE1A923A@mollari.NetBSD.org>
| I think having st_size being valid all the time for disks would be
| useful, in particular because Linux works like that.
I have no objection (though even that doesn't mean that newfs needs
to use it) - but if we're going to do that, it needs to work regardless
of what the filesystem the device node is located on happens to be (so
tmpfs and nfs need changing, and lfs, cd9660 (if that supports devices at all),
zfs, udf, and whatever else we have will all need testing (or at a
minimum, code examination)). Then we need to verify that none of the
layering mechanisms (unionfs ...) change things.
kre
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57134: st_size of stat on vnd raw partition sometimes is 0,
causing newfs to fail
Date: Mon, 2 Jan 2023 16:57:32 +0100
On Sun, Jan 01, 2023 at 10:50:02AM +0000, J. Hannken-Illjes wrote:
> Please try the attached diff.
Not very surprisingly, this fixes the test issue.
Martin
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57134 CVS commit: src/sbin/fsck
Date: Mon, 2 Jan 2023 16:08:13 +0000
Module Name: src
Committed By: hannken
Date: Mon Jan 2 16:08:13 UTC 2023
Modified Files:
src/sbin/fsck: partutil.c
Log Message:
Change getdiskinfo() to no longer infer the partition from the device name.
Since 2016-06-16 we create disk devices "<type><unit>" as an alias
for "<type><unit><part>" where "<part>" is the raw partition.
These devices are treated as invalid partitions and a zero geometry
is returned.
Take the partition from "st_rdev" instead.
Fix for PR kern/57134: st_size of stat on vnd raw partition sometimes
is 0, causing newfs to fail
To generate a diff of this commit:
cvs rdiff -u -r1.17 -r1.18 src/sbin/fsck/partutil.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57134 CVS commit: [netbsd-9] src/sbin/fsck
Date: Tue, 3 Jan 2023 18:22:09 +0000
Module Name: src
Committed By: martin
Date: Tue Jan 3 18:22:09 UTC 2023
Modified Files:
src/sbin/fsck [netbsd-9]: partutil.c
Log Message:
Pull up following revision(s) (requested by hannken in ticket #1560):
sbin/fsck/partutil.c: revision 1.18
Change getdiskinfo() to no longer infer the partition from the device name.
Since 2016-06-16 we create disk devices "<type><unit>" as an alias
for "<type><unit><part>" where "<part>" is the raw partition.
These devices are treated as invalid partitions and a zero geometry
is returned.
Take the partition from "st_rdev" instead.
Fix for PR kern/57134: st_size of stat on vnd raw partition sometimes
is 0, causing newfs to fail
To generate a diff of this commit:
cvs rdiff -u -r1.15.18.2 -r1.15.18.3 src/sbin/fsck/partutil.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57134 CVS commit: [netbsd-10] src/sbin/fsck
Date: Tue, 3 Jan 2023 18:23:18 +0000
Module Name: src
Committed By: martin
Date: Tue Jan 3 18:23:18 UTC 2023
Modified Files:
src/sbin/fsck [netbsd-10]: partutil.c
Log Message:
Pull up following revision(s) (requested by hannken in ticket #32):
sbin/fsck/partutil.c: revision 1.18
Change getdiskinfo() to no longer infer the partition from the device name.
Since 2016-06-16 we create disk devices "<type><unit>" as an alias
for "<type><unit><part>" where "<part>" is the raw partition.
These devices are treated as invalid partitions and a zero geometry
is returned.
Take the partition from "st_rdev" instead.
Fix for PR kern/57134: st_size of stat on vnd raw partition sometimes
is 0, causing newfs to fail
To generate a diff of this commit:
cvs rdiff -u -r1.17 -r1.17.8.1 src/sbin/fsck/partutil.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Wed, 22 Mar 2023 09:46:29 +0000
State-Changed-Why:
Fix committed, pullup to -9 and -10 complete. Ok to close?
State-Changed-From-To: feedback->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Wed, 22 Mar 2023 09:50:48 +0000
State-Changed-Why:
yes, it is fixed
>Unformatted:
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.