NetBSD Problem Report #54969
From gson@gson.org Sun Feb 16 09:38:34 2020
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 745BF1A9213
for <gnats-bugs@gnats.NetBSD.org>; Sun, 16 Feb 2020 09:38:34 +0000 (UTC)
Message-Id: <20200216093828.779AB253FA3@guava.gson.org>
Date: Sun, 16 Feb 2020 11:38:28 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: Disk cache is no longer flushed on shutdown
X-Send-Pr-Version: 3.95
>Number: 54969
>Category: kern
>Synopsis: Disk cache is no longer flushed on shutdown
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Feb 16 09:40:00 +0000 2020
>Closed-Date: Sun Jul 25 09:46:13 +0000 2021
>Last-Modified: Sun Jul 25 09:46:13 +0000 2021
>Originator: Andreas Gustafsson
>Release: NetBSD-current, source date >= 2017.08.21.09.00.21, and -9
>Organization:
>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:
The disk controller on one of my systems is logging an error message
on every power-on, indicating that the controller's battery backed
cache still contains data from the previous time the system was
powered on:
POST Error: 1792-Drive Array Reports Valid Data Found in Array Accelerator
This means that from the controller's perspective, the system was not
shut down cleanly. But the system has in fact been shut down cleanly,
at least as far as the kernel is concerned, by running "halt -p".
By adding some printfs to the sd(4) driver, I found that sd_flush() is
not being called during the shutdown, and neither is sd_lastclose().
The serial console shows "detached" messages from a large number of
devices including the non-root disk sd1 (which was never mounted), but
the root disk sd0 is conspicuously absent:
Feb 9 05:32:24 hostname halt: halted by root
Feb 9 05:32:24 hostname syslogd[167]: Exiting on signal 15
[ 8086.8109260] syncing disks... done
[ 8086.9609971] sd1: detached
[ 8086.9910100] cd0: detached
[ 8087.0210241] brgphy3: detached
[ 8087.0610430] brgphy2: detached
[ 8087.0910569] brgphy1: detached
[ 8087.1310757] brgphy0: detached
[ 8087.1710944] atapibus0: detached
[ 8087.2011089] uhub5: detached
[ 8087.2411278] uhub3: detached
[ 8087.2711418] uhub2: detached
[ 8087.3111606] uhub1: detached
[ 8087.3411746] com1: detached
[ 8087.4312167] bnx3: detached
[ 8087.5212591] bnx2: detached
[ 8087.6113014] bnx1: detached
[ 8087.7013435] bnx0: detached
[ 8087.7313577] atabus1: detached
[ 8087.7757913] atabus0: detached
[ 8087.8122505] usb5: detached
[ 8087.8455835] usb4: detached
[ 8087.8789168] usb2: detached
[ 8087.9122492] usb1: detached
[ 8087.9455816] pci11: detached
[ 8087.9799570] pci10: detached
[ 8088.0143320] pci9: detached
[ 8088.0476650] pci8: detached
[ 8088.0809989] pci7: detached
[ 8088.1143312] pci6: detached
[ 8088.1476642] pci5: detached
[ 8088.1809984] pci4: detached
[ 8088.2143316] pci3: detached
[ 8088.2476648] pci2: detached
[ 8088.2809972] sysbeep0: detached
[ 8088.3184975] midi0: detached
[ 8088.3516482] ehci0: detached
[ 8088.3816623] uhci4: detached
[ 8088.4216811] uhci2: detached
[ 8088.4516952] uhci1: detached
[ 8088.4817099] ppb10: detached
[ 8088.5217278] pchb12: detached
[ 8088.5517425] pchb11: detached
[ 8088.5917607] pchb10: detached
[ 8088.6217747] pchb9: detached
[ 8088.6617935] pchb8: detached
[ 8088.6918073] pchb7: detached
[ 8088.7318261] pchb6: detached
[ 8088.7618402] pchb5: detached
[ 8088.8018594] pchb4: detached
[ 8088.8318730] pchb3: detached
[ 8088.8618871] pchb2: detached
[ 8088.9019059] pchb1: detached
[ 8088.9319202] ppb9: detached
[ 8088.9719389] ppb8: detached
[ 8089.0019528] ppb7: detached
[ 8089.0319669] ppb6: detached
[ 8089.0719858] ppb5: detached
[ 8089.1019997] ppb4: detached
[ 8089.1320139] ppb3: detached
[ 8089.1720326] ppb2: detached
[ 8089.2020466] ppb1: detached
[ 8089.2320607] pchb0: detached
[ 8089.2820849] The operating system has halted.
[ 8089.2820849] Please press any key to reboot.
This is an HP DL360 G7 server with a P410i disk controller and the
BBWC option. A full console log is at:
http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.02.15.12.45.05/test.log
By grepping historic logs from the TNF i386 testbed for the
corresponding "wd0: detached" messages, I found that they were present
until the following commit, and absent thereafter:
2017.08.21.09.00.21 hannken src/sys/kern/vfs_mount.c 1.67
2017.08.21.09.00.21 hannken src/sys/kern/vfs_vnode.c 1.98
2017.08.21.09.00.21 hannken src/sys/sys/vnode_impl.h 1.16
The commit message was "Change forced unmount to revert open device
vnodes to anonymous devices."
This issue looks like it has the potential to cause data loss. For
example, the HP system will presumably lose the cached data if powered
off long enough to drain the BBWC battery. The -9 branch is also
affected.
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/54969: Disk cache is no longer flushed on shutdown
Date: Sun, 16 Feb 2020 15:54:14 +0100
--Apple-Mail=_3DAA2E36-0C99-48D6-8DC3-621AEF12D17E
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
charset=us-ascii
> On 16. Feb 2020, at 10:40, Andreas Gustafsson <gson@gson.org> wrote:
<snip>
> By adding some printfs to the sd(4) driver, I found that sd_flush() is
> not being called during the shutdown, and neither is sd_lastclose().
On a first look I don't see an obvious problem with the commit.
Could you add some printfs to sddetach() and see if it gets
called during shutdown?
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig
--Apple-Mail=_3DAA2E36-0C99-48D6-8DC3-621AEF12D17E--
From: Andreas Gustafsson <gson@gson.org>
To: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/54969: Disk cache is no longer flushed on shutdown
Date: Mon, 17 Feb 2020 20:15:43 +0200
J. Hannken-Illjes wrote:
> Could you add some printfs to sddetach() and see if it gets
> called during shutdown?
Done. sddetach() is called for both sd0 and sd1, but the call for sd0
returns early because disk_begindetach() returns EBUSY.
--
Andreas Gustafsson, gson@gson.org
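The early return described above can be pictured with a small user-space model. This is only a sketch, not the sd(4) or disk(9) code: `model_disk`, `model_open`, `model_close` and `model_begindetach` are hypothetical names, and a single `openmask` stands in for the kernel's per-partition open bookkeeping. The point it illustrates is that an unforced detach is refused for as long as any partition of the disk is still open.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a disk with one open bit per partition. */
struct model_disk {
	uint32_t openmask;	/* bit n set: partition n is open */
	bool detached;
};

/* Mark partition 'part' (0 = a, 1 = b, ...) as open. */
static void
model_open(struct model_disk *dk, unsigned part)
{
	assert(part < 32);
	dk->openmask |= UINT32_C(1) << part;
}

/* Mark partition 'part' as closed. */
static void
model_close(struct model_disk *dk, unsigned part)
{
	assert(part < 32);
	dk->openmask &= ~(UINT32_C(1) << part);
}

/*
 * Try to detach. Without 'force' the request fails with EBUSY while
 * any partition is open. This mirrors the behaviour reported in the
 * thread, not the real implementation.
 */
static int
model_begindetach(struct model_disk *dk, bool force)
{
	if (dk->openmask != 0 && !force)
		return EBUSY;
	dk->openmask = 0;
	dk->detached = true;
	return 0;
}
```

In terms of the report: sd1 was never mounted, so nothing held it open and it detached, while sd0 still had an open partition and took the EBUSY path.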
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: Andreas Gustafsson <gson@gson.org>
Subject: Re: kern/54969: Disk cache is no longer flushed on shutdown
Date: Wed, 19 Feb 2020 14:31:20 +0100
--Apple-Mail=_092968F0-61F9-472B-9D3A-FC2BA4D40A76
Content-Type: multipart/mixed;
boundary="Apple-Mail=_CA4E0C13-7F83-4B6E-9110-F0183169C8C4"
--Apple-Mail=_CA4E0C13-7F83-4B6E-9110-F0183169C8C4
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
charset=us-ascii
Tried with sd0@vioscsi0 under qemu with some printfs and got:
$ shutdown -p now
...
unmounting 0xffff9d635e36a008 / (/dev/sd0a)...
forcefully unmounting / (/dev/sd0a)...
sdclose: dev=0x400 (unit 0)
dk_close: dev=0x400 error=0 openmask=c0 b0
sd0: detached
scsibus0: detached
With "halt -p" I see the problem from this PR as the swap
device sd0b doesn't get closed.
Please report with the attached diff holding the printfs ...
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig
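To help read the trace above, here is a hedged sketch of how a value such as dev=0x400 could be split into unit and partition. It assumes the device number keeps the major number in bits 8..19 and the minor number in the remaining bits, and that there are 16 partitions per disk on amd64; both are assumptions stated for illustration, and all `model_` names are hypothetical. In the trace, "openmask=c0 b0" is the character and block open mask printed by the debugging patch, and both being zero means nothing on the disk is open any more.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed number of partitions per disk on amd64; illustrative only. */
#define MODEL_MAXPARTITIONS 16

/* Major number: assumed to live in bits 8..19 of the device number. */
static unsigned
model_major(uint64_t dev)
{
	return (unsigned)((dev & 0x000fff00u) >> 8);
}

/* Minor number: assumed to be bits 20..31 shifted down, plus bits 0..7. */
static unsigned
model_minor(uint64_t dev)
{
	return (unsigned)(((dev & 0xfff00000u) >> 12) | (dev & 0x000000ffu));
}

/* Disk unit: the minor number divided by the partitions per disk. */
static unsigned
model_unit(uint64_t dev)
{
	return model_minor(dev) / MODEL_MAXPARTITIONS;
}

/* Partition letter: 'a' for partition 0, 'b' for partition 1, ... */
static char
model_partition_letter(uint64_t dev)
{
	assert(MODEL_MAXPARTITIONS <= 26);
	return (char)('a' + model_minor(dev) % MODEL_MAXPARTITIONS);
}
```

Under these assumptions 0x400 decodes to unit 0, partition a, which matches the "/dev/sd0a" and "(unit 0)" shown in the trace, and the swap partition sd0b would be 0x401.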
--Apple-Mail=_CA4E0C13-7F83-4B6E-9110-F0183169C8C4
Content-Disposition: attachment;
filename=sd.c.diff
Content-Type: application/octet-stream;
x-unix-mode=0644;
name="sd.c.diff"
Content-Transfer-Encoding: 7bit
diff -r 9cba3dc7e065 sys/dev/scsipi/sd.c
--- sys/dev/scsipi/sd.c Wed Feb 19 09:01:50 2020 +0100
+++ sys/dev/scsipi/sd.c Wed Feb 19 14:29:59 2020 +0100
@@ -520,6 +520,7 @@ sdopen(dev_t dev, int flag, int fmt, str
return (ENXIO);
dksc = &sd->sc_dksc;
+printf("sdopen: dev=0x%"PRIx64" (unit %d)\n", dev, SDUNIT(dev));
if (!device_is_active(dksc->sc_dev))
return (ENODEV);
@@ -541,6 +542,7 @@ sdopen(dev_t dev, int flag, int fmt, str
}
error = dk_open(dksc, dev, flag, fmt, l);
+printf("dk_open: dev=0x%"PRIx64" error=%d openmask=c%x b%x\n", dev, error, dksc->sc_dkdev.dk_copenmask, dksc->sc_dkdev.dk_bopenmask);
SC_DEBUG(periph, SCSIPI_DB3, ("open complete\n"));
@@ -598,11 +600,14 @@ sdclose(dev_t dev, int flag, int fmt, st
struct dk_softc *dksc;
int unit;
+printf("sdclose: dev=0x%"PRIx64" (unit %d)\n", dev, SDUNIT(dev));
unit = SDUNIT(dev);
sd = device_lookup_private(&sd_cd, unit);
dksc = &sd->sc_dksc;
- return dk_close(dksc, dev, flag, fmt, l);
+ int error = dk_close(dksc, dev, flag, fmt, l);
+printf("dk_close: dev=0x%"PRIx64" error=%d openmask=c%x b%x\n", dev, error, dksc->sc_dkdev.dk_copenmask, dksc->sc_dkdev.dk_bopenmask);
+ return error;
}
/*
--Apple-Mail=_CA4E0C13-7F83-4B6E-9110-F0183169C8C4--
--Apple-Mail=_092968F0-61F9-472B-9D3A-FC2BA4D40A76--
From: Andreas Gustafsson <gson@gson.org>
To: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/54969: Disk cache is no longer flushed on shutdown
Date: Wed, 19 Feb 2020 16:37:09 +0200
J. Hannken-Illjes wrote:
> Tried with sd0@vioscsi0 under qemu with some printfs and got:
>
> $ shutdown -p now
> ...
> unmounting 0xffff9d635e36a008 / (/dev/sd0a)...
> forcefully unmounting / (/dev/sd0a)...
> sdclose: dev=0x400 (unit 0)
> dk_close: dev=0x400 error=0 openmask=c0 b0
> sd0: detached
> scsibus0: detached
>
> With "halt -p" I see the problem from this PR as the swap
> device sd0b doesn't get closed.
>
> Please report with the attached diff holding the printfs ...
I can do that, but I'm not sure why it's needed as you seem to have
already reproduced the problem locally using "halt -p".
I may have caused some confusion by using the word "shutdown" in the
PR subject - sorry about that. I meant it as a reference to the
general action of shutting the system down, not as a reference to the
specific command shutdown(8). Also, I said "halt -p" in the PR, but
checking the logs, I see that the actual command used was a plain
"halt" without the -p option.
--
Andreas Gustafsson, gson@gson.org
From: Andreas Gustafsson <gson@gson.org>
To: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/54969: Disk cache is no longer flushed on shutdown
Date: Wed, 19 Feb 2020 21:39:48 +0200
J. Hannken-Illjes wrote:
> Please report with the attached diff holding the printfs ...
Console output from a test run with the patch is now at
http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.02.19.13.32.40/test.log
--
Andreas Gustafsson, gson@gson.org
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: Andreas Gustafsson <gson@gson.org>
Subject: Re: kern/54969: Disk cache is no longer flushed on shutdown
Date: Thu, 20 Feb 2020 09:50:48 +0100
--Apple-Mail=_E794A460-544A-495A-93E2-ABF6592D6DBE
Content-Type: multipart/mixed;
boundary="Apple-Mail=_97B3B872-94A5-4D88-995F-1564B709CEC8"
--Apple-Mail=_97B3B872-94A5-4D88-995F-1564B709CEC8
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
charset=us-ascii
> On 19. Feb 2020, at 15:40, Andreas Gustafsson <gson@gson.org> wrote:
<snip>
> Also, I said "halt -p" in the PR, but
> checking the logs, I see that the actual command used was a plain
> "halt" without the -p option.
Sorry, I missed that.
The attached diff restores the previous behaviour and destroys
all device vnodes once the mountlist is empty.
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig
--Apple-Mail=_97B3B872-94A5-4D88-995F-1564B709CEC8
Content-Disposition: attachment;
filename=vfs_mount.c.diff
Content-Type: application/octet-stream;
x-unix-mode=0644;
name="vfs_mount.c.diff"
Content-Transfer-Encoding: 7bit
diff -r 9cba3dc7e065 sys/kern/vfs_mount.c
--- sys/kern/vfs_mount.c Wed Feb 19 09:01:50 2020 +0100
+++ sys/kern/vfs_mount.c Thu Feb 20 09:47:23 2020 +0100
@@ -114,6 +114,8 @@ static struct vnode *vfs_vnode_iterator_
/* Root filesystem. */
vnode_t * rootvnode;
+extern struct mount *dead_rootmount;
+
/* Mounted filesystem list. */
static TAILQ_HEAD(mountlist, mountlist_entry) mountlist;
static kmutex_t mountlist_lock __cacheline_aligned;
@@ -1014,6 +1016,7 @@ bool
vfs_unmountall1(struct lwp *l, bool force, bool verbose)
{
struct mount *mp;
+ mount_iterator_t *iter;
bool any_error = false, progress = false;
uint64_t gen;
int error;
@@ -1048,6 +1051,24 @@ vfs_unmountall1(struct lwp *l, bool forc
if (any_error && verbose) {
printf("WARNING: some file systems would not unmount\n");
}
+
+ /* If the mountlist is empty destroy anonymous device vnodes. */
+ mountlist_iterator_init(&iter);
+ if (mountlist_iterator_next(iter) == NULL) {
+ struct vnode_iterator *marker;
+ vnode_t *vp;
+
+ vfs_vnode_iterator_init(dead_rootmount, &marker);
+ while ((vp = vfs_vnode_iterator_next(marker, NULL, NULL))) {
+ if (vp->v_type == VCHR || vp->v_type == VBLK)
+ vgone(vp);
+ else
+ vrele(vp);
+ }
+ vfs_vnode_iterator_destroy(marker);
+ }
+ mountlist_iterator_destroy(iter);
+
return progress;
}
--Apple-Mail=_97B3B872-94A5-4D88-995F-1564B709CEC8--
--Apple-Mail=_E794A460-544A-495A-93E2-ABF6592D6DBE--
From: Andreas Gustafsson <gson@gson.org>
To: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/54969: Disk cache is no longer flushed on shutdown
Date: Thu, 20 Feb 2020 17:27:12 +0200
J. Hannken-Illjes wrote:
> The attached diff restores the previous behaviour and destroys
> all device vnodes once the mountlist is empty.
With this patch, the "sd0: detached" message appears:
http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.02.20.08.31.17/test.log
--
Andreas Gustafsson, gson@gson.org
From: Andreas Gustafsson <gson@gson.org>
To: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/54969: Disk cache is no longer flushed on shutdown
Date: Sun, 12 Apr 2020 15:48:44 +0300
On Feb 20, J. Hannken-Illjes wrote:
> The attached diff restores the previous behaviour and destroys
> all device vnodes once the mountlist is empty.
Since the patch appears to work, could you commit it?
--
Andreas Gustafsson, gson@gson.org
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/54969 CVS commit: src/sys/kern
Date: Sun, 19 Apr 2020 13:26:18 +0000
Module Name: src
Committed By: hannken
Date: Sun Apr 19 13:26:18 UTC 2020
Modified Files:
src/sys/kern: vfs_mount.c
Log Message:
Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.
PR kern/54969: Disk cache is no longer flushed on shutdown
To generate a diff of this commit:
cvs rdiff -u -r1.78 -r1.79 src/sys/kern/vfs_mount.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->needs-pullups
State-Changed-By: gson@NetBSD.org
State-Changed-When: Mon, 20 Apr 2020 12:31:15 +0000
State-Changed-Why:
Confirmed fixed in -current, should be pulled up to -9.
State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: gson@NetBSD.org
State-Changed-When: Mon, 20 Apr 2020 14:21:52 +0000
State-Changed-Why:
Pullup to -9 requested.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/54969 CVS commit: [netbsd-9] src/sys/kern
Date: Wed, 22 Apr 2020 18:05:11 +0000
Module Name: src
Committed By: martin
Date: Wed Apr 22 18:05:11 UTC 2020
Modified Files:
src/sys/kern [netbsd-9]: vfs_mount.c
Log Message:
Pull up following revision(s) (requested by gson in ticket #839):
sys/kern/vfs_mount.c: revision 1.79
Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.
PR kern/54969: Disk cache is no longer flushed on shutdown
To generate a diff of this commit:
cvs rdiff -u -r1.70 -r1.70.4.1 src/sys/kern/vfs_mount.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Thu, 30 Apr 2020 06:59:36 +0000
State-Changed-Why:
Pullup done.
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/54969 CVS commit: src/sys/kern
Date: Fri, 1 May 2020 08:45:01 +0000
Module Name: src
Committed By: hannken
Date: Fri May 1 08:45:01 UTC 2020
Modified Files:
src/sys/kern: vfs_mount.c
Log Message:
Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:
forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36
Reopens PR kern/54969: Disk cache is no longer flushed on shutdown
To generate a diff of this commit:
cvs rdiff -u -r1.81 -r1.82 src/sys/kern/vfs_mount.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
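The ordering problem named in this log message can be sketched in user space. The model below is an illustration only: `model_dev` and `model_detach` are hypothetical names, and the `user` pointer is an assumed way of recording that raid0 is stacked on top of sd0 and sd1. It shows the constraint that a component disk should not go away while the device built on it is still live. It does not show how the kernel enforces this, or whether it does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device node; 'user' is the device stacked on top of it. */
struct model_dev {
	const char *name;
	const struct model_dev *user;	/* e.g. raid0 for sd0; NULL if none */
	bool detached;
};

/* Refuse to detach a device that a live device still depends on. */
static int
model_detach(struct model_dev *dev)
{
	assert(dev != NULL);
	if (dev->user != NULL && !dev->user->detached)
		return EBUSY;
	dev->detached = true;
	return 0;
}
```

With raid0 recorded as the user of sd0 and sd1, the order in the failure log (sd1, sd0, then a cache flush from raid0) is rejected by this model, and only raid0 first, then its components, succeeds.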
State-Changed-From-To: closed->open
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Fri, 01 May 2020 09:00:43 +0000
State-Changed-Why:
Fix reverted -- back to start ...
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/54969 CVS commit: [netbsd-9] src/sys/kern
Date: Fri, 1 May 2020 11:54:53 +0000
Module Name: src
Committed By: martin
Date: Fri May 1 11:54:53 UTC 2020
Modified Files:
src/sys/kern [netbsd-9]: vfs_mount.c
Log Message:
Pull up following revision(s) (requested by hannken in ticket #881):
sys/kern/vfs_mount.c: revision 1.82
Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:
forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36
Reopens PR kern/54969: Disk cache is no longer flushed on shutdown
To generate a diff of this commit:
cvs rdiff -u -r1.70.4.1 -r1.70.4.2 src/sys/kern/vfs_mount.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Jaromír Doleček <jaromir.dolecek@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: hannken@netbsd.org
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Fri, 17 Jul 2020 12:35:50 +0200
Any update on this? Rediscovered this still open PR when searching for
bnx(4) PRs.
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@netbsd.org
Cc: hannken@netbsd.org
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Fri, 17 Jul 2020 15:33:11 +0300
Jaromir Dolecek wrote:
> Any update on this?
It's still broken as can be seen from the lack of an "sd0: detached"
message among the shutdown console messages at the end of
http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.07.16.18.39.19/test.log
See also 55393, "System booted from USB panics on shutdown".
--
Andreas Gustafsson, gson@gson.org
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: Jaromír Doleček <jaromir.dolecek@gmail.com>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>,
hannken@netbsd.org
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Fri, 17 Jul 2020 17:11:18 +0200
--Apple-Mail=_35FDBF4E-E008-4953-A23F-733009D2C106
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
charset=utf-8
> On 17. Jul 2020, at 12:35, Jaromír Doleček <jaromir.dolecek@gmail.com> wrote:
>
> Any update on this? Rediscovered this still open PR when searching for
> bnx(4) PRs.
The current behaviour is the result of a longer discussion
"Fixing swap1_stop" in 2017, starting at
http://mail-index.netbsd.org/current-users/2017/08/03/msg032129.html
The submitter wants to halt without disabling swap and the
unclosed swap device prevents the root device from closing.
This is intended and using "shutdown" instead of "halt" works.
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig
--Apple-Mail=_35FDBF4E-E008-4953-A23F-733009D2C106--
From: Jaromír Doleček <jaromir.dolecek@gmail.com>
To: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, hannken@netbsd.org
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Fri, 17 Jul 2020 23:44:34 +0200
On Fri, 17 Jul 2020 at 17:11, J. Hannken-Illjes
<hannken@eis.cs.tu-bs.de> wrote:
>
> > On 17. Jul 2020, at 12:35, Jaromír Doleček <jaromir.dolecek@gmail.com> wrote:
> >
> > Any update on this? Rediscovered this still open PR when searching for
> > bnx(4) PRs.
>
> The current behaviour is the result of a longer discussion
> "Fixing swap1_stop" in 2017, starting at
>
> http://mail-index.netbsd.org/current-users/2017/08/03/msg032129.html
>
> The submitter wants to halt without disabling swap and the
> unclosed swap device prevents the root device from closing.
>
> This is intended and using "shutdown" instead of "halt" works.
I think it's the kernel's responsibility to ensure that all swap is
disabled and devices are properly closed, regardless of whether
userland managed to do this, i.e. regardless of whether the shutdown
was done via 'shutdown' or 'halt'.
At the time the shutdown code runs and devices are detached, no user
processes are running any longer. Swap can't be needed to finish
shutdown: the system state during the late shutdown phase is similar
to that during boot, which also doesn't have swap.
How hard would it be to change this so that swap is actually disabled
before detaching the physical devices?
Jaromir
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Fri, 31 Jul 2020 18:07:10 +0200
--Apple-Mail=_812D5C87-9AEB-4C62-8362-49E678AE2E3B
Content-Type: multipart/mixed;
boundary="Apple-Mail=_BAE26654-F34D-4CFA-B364-64074B45AFCA"
--Apple-Mail=_BAE26654-F34D-4CFA-B364-64074B45AFCA
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
charset=utf-8
> On 17. Jul 2020, at 23:45, Jaromír Doleček <jaromir.dolecek@gmail.com> wrote:
<snip>
> How hard it would be to change this so the swap is actually disabled
> before detaching the physical devices?
The attached diff should do what you want.
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig
--Apple-Mail=_BAE26654-F34D-4CFA-B364-64074B45AFCA
Content-Disposition: attachment;
filename=003_swapoff.diff
Content-Type: application/octet-stream;
x-unix-mode=0644;
name="003_swapoff.diff"
Content-Transfer-Encoding: 7bit
swapoff
Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.
Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)
diff -r 816c662719ab -r 8e737f88a4e6 sys/kern/vfs_mount.c
--- sys/kern/vfs_mount.c
+++ sys/kern/vfs_mount.c
@@ -94,6 +94,8 @@
#include <miscfs/genfs/genfs.h>
#include <miscfs/specfs/specdev.h>
+#include <uvm/uvm_swap.h>
+
enum mountlist_type {
ME_MOUNT,
ME_MARKER
@@ -1014,6 +1016,7 @@ bool
vfs_unmountall1(struct lwp *l, bool force, bool verbose)
{
struct mount *mp;
+ mount_iterator_t *iter;
bool any_error = false, progress = false;
uint64_t gen;
int error;
@@ -1048,6 +1051,13 @@ vfs_unmountall1(struct lwp *l, bool forc
if (any_error && verbose) {
printf("WARNING: some file systems would not unmount\n");
}
+ /* If the mountlist is empty it is time to remove swap. */
+ mountlist_iterator_init(&iter);
+ if (mountlist_iterator_next(iter) == NULL) {
+ uvm_swap_shutdown(l);
+ }
+ mountlist_iterator_destroy(iter);
+
return progress;
}
diff -r 816c662719ab -r 8e737f88a4e6 sys/uvm/uvm_swap.c
--- sys/uvm/uvm_swap.c
+++ sys/uvm/uvm_swap.c
@@ -1152,27 +1152,23 @@ again:
if ((sdp->swd_flags & (SWF_INUSE|SWF_ENABLE)) == 0)
continue;
#ifdef DEBUG
- printf("\nturning off swap on %s...",
- sdp->swd_path);
+ printf("\nturning off swap on %s...", sdp->swd_path);
#endif
+ /* Have to lock and reference vnode for swap_off(). */
if (vn_lock(vp = sdp->swd_vp, LK_EXCLUSIVE)) {
error = EBUSY;
- vp = NULL;
- } else
- error = 0;
- if (!error) {
+ } else {
+ vref(vp);
error = swap_off(l, sdp);
+ vput(vp);
mutex_enter(&uvm_swap_data_lock);
}
if (error) {
printf("stopping swap on %s failed "
"with error %d\n", sdp->swd_path, error);
- TAILQ_REMOVE(&spp->spi_swapdev, sdp,
- swd_next);
+ TAILQ_REMOVE(&spp->spi_swapdev, sdp, swd_next);
uvmexp.nswapdev--;
swaplist_trim();
- if (vp)
- vput(vp);
}
goto again;
}
--Apple-Mail=_BAE26654-F34D-4CFA-B364-64074B45AFCA--
--Apple-Mail=_812D5C87-9AEB-4C62-8362-49E678AE2E3B--
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@netbsd.org, "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, gson@gson.org (Andreas Gustafsson)
Subject: re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Sat, 01 Aug 2020 05:38:30 +1000
nice work. i like how it's fairly simple.
you probably need to care about VMSWAP option.
i'd like to see some tests performed where a system is rebooted
when it is full of ram and swap -- i fear that this will generate
useless IO that may hang. also, an option to skip it, maybe just
use RB_NOSYNC?
thanks!
.mrg.
From: Jason Thorpe <thorpej@me.com>
To: matthew green <mrg@eterna.com.au>
Cc: gnats-bugs@netbsd.org,
"J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>,
kern-bug-people@netbsd.org,
gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org,
Andreas Gustafsson <gson@gson.org>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Fri, 31 Jul 2020 15:35:00 -0700
> On Jul 31, 2020, at 12:38 PM, matthew green <mrg@eterna.com.au> wrote:
>
> nice work. i like how it's fairly simple.
>
> you probably need to care about VMSWAP option.
>
> i'd like to see some tests performed where a system is rebooted
> when it is full of ram and swap -- i fear that this will generate
> useless IO that may hang. also, an option to skip it, maybe just
> use RB_NOSYNC?
I think there should be an option to swap_off() to just toss the data,
rather than page it all back in. That would solve the problem you're
concerned about.
I was also thinking that disabling swap BEFORE unmounting all of the
file systems would be a good idea, because there might be file-backed
swap.
-- thorpej
From: Paul Goyette <paul@whooppee.com>
To: Jason Thorpe <thorpej@me.com>
Cc: matthew green <mrg@eterna.com.au>, gnats-bugs@netbsd.org,
"J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
Andreas Gustafsson <gson@gson.org>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Fri, 31 Jul 2020 15:43:45 -0700 (PDT)
On Fri, 31 Jul 2020, Jason Thorpe wrote:
<snip>
> I was also thinking that disabling swap BEFORE unmounting all of the
> file systems would be a good idea, because there might be file-backed
> swap.
Or two phases of disabling swap, one for file-backed and one for dev-
backed.
+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org |
+--------------------+--------------------------+-----------------------+
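The two-phase idea suggested above can be sketched as follows. This is a user-space illustration of the suggestion only, not something the thread says was implemented: `model_swap`, `model_swap_kind` and `model_swapoff_kind` are hypothetical names, and the example paths in the note below are made up. The first phase would run before the file systems are unmounted and the second after.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Whether a swap area lives in a file or directly on a device. */
enum model_swap_kind { MODEL_SWAP_FILE, MODEL_SWAP_DEVICE };

/* Hypothetical swap table entry. */
struct model_swap {
	const char *path;
	enum model_swap_kind kind;
	bool enabled;
};

/* Disable every enabled entry of the given kind; return how many. */
static size_t
model_swapoff_kind(struct model_swap *tab, size_t n, enum model_swap_kind kind)
{
	size_t count = 0;

	assert(tab != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (tab[i].enabled && tab[i].kind == kind) {
			tab[i].enabled = false;
			count++;
		}
	}
	return count;
}
```

A caller would invoke it with MODEL_SWAP_FILE while the file system holding, say, a swap file is still mounted, then unmount, then invoke it again with MODEL_SWAP_DEVICE so that a device such as /dev/sd0b is released last.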
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Sun, 2 Aug 2020 10:22:15 +0200
--Apple-Mail=_5B077801-13CC-4712-AB89-29D604245885
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
charset=us-ascii
On 31. Jul 2020, at 21:38, matthew green <mrg@eterna.com.au> wrote:
> nice work. i like how it's fairly simple.
>
> you probably need to care about VMSWAP option.
Already present, without VMSWAP sys/uvm/uvm_swapstub.c gets
built and it has an empty uvm_swap_shutdown().
> i'd like to see some tests performed where a system is rebooted
> when it is full of ram and swap -- i fear that this will generate
> useless IO that may hang.
When swap gets removed, all user processes besides the one running
halt or reboot are gone and all file systems are unmounted.
At this time there should be no data backed by swap.
> also, an option to skip it, maybe just
> use RB_NOSYNC?
With RB_NOSYNC it is already skipped: vfs_unmountall1() will
not run in this case.
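The ordering described above can be modelled in a few lines of user-space C. This is only a sketch: `MODEL_RB_NOSYNC`, `model_log` and `model_shutdown()` are hypothetical stand-ins, not the kernel's RB_NOSYNC flag, vfs_unmountall1() or uvm_swap_shutdown().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for RB_NOSYNC; the value is arbitrary. */
#define MODEL_RB_NOSYNC	0x1

#define MODEL_MAX_STEPS	4

/* Ordered record of what the modelled shutdown path did. */
struct model_log {
	const char *steps[MODEL_MAX_STEPS];
	size_t n;
};

void
model_log_step(struct model_log *log, const char *what)
{
	assert(log->n < MODEL_MAX_STEPS);
	log->steps[log->n++] = what;
}

/*
 * Swap removal is part of the unmount path, so skipping the unmount
 * path (the NOSYNC case) skips swap removal as well.
 */
void
model_shutdown(int howto, struct model_log *log)
{
	log->n = 0;
	if ((howto & MODEL_RB_NOSYNC) != 0)
		return;		/* unmount path not run: swap left alone */
	model_log_step(log, "unmount file systems");
	model_log_step(log, "turn off swap");
}
```

Because swap removal hangs off the unmount path in this model, no separate "skip swap" option is needed.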
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig
--Apple-Mail=_5B077801-13CC-4712-AB89-29D604245885--
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Sun, 2 Aug 2020 10:27:12 +0200
--Apple-Mail=_A1A3C738-4FF9-4FF8-B5AC-2AC3EA63D435
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
charset=us-ascii
> On 1. Aug 2020, at 00:35, Jason Thorpe <thorpej@me.com> wrote:
>
> I was also thinking that disabling swap BEFORE unmounting all of the
> file systems would be a good idea, because there might be file-backed
> swap.
This brings back the problems the initial thread was about. The goal
is not to cleanly shut down swap but to close the devices used for
swap so they receive a final cache sync.
It may be impossible to keep all data from tmpfs file systems in RAM ...
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig
--Apple-Mail=_A1A3C738-4FF9-4FF8-B5AC-2AC3EA63D435--
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/54969 CVS commit: src/sys
Date: Tue, 16 Feb 2021 09:56:32 +0000
Module Name: src
Committed By: hannken
Date: Tue Feb 16 09:56:32 UTC 2021
Modified Files:
src/sys/kern: vfs_mount.c
src/sys/uvm: uvm_swap.c
Log Message:
Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.
Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)
To generate a diff of this commit:
cvs rdiff -u -r1.85 -r1.86 src/sys/kern/vfs_mount.c
cvs rdiff -u -r1.200 -r1.201 src/sys/uvm/uvm_swap.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/54969 CVS commit: src/sys/uvm
Date: Fri, 19 Feb 2021 13:20:44 +0000
Module Name: src
Committed By: hannken
Date: Fri Feb 19 13:20:44 UTC 2021
Modified Files:
src/sys/uvm: uvm_swap.c
Log Message:
When turning off swap during reboot we have to lock with LK_RETRY
as regular files got reclaimed during unmount.
Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)
To generate a diff of this commit:
cvs rdiff -u -r1.201 -r1.202 src/sys/uvm/uvm_swap.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Greg A. Woods" <woods@planix.ca>
To: NetBSD-current Users's Discussion List <current-users@netbsd.org>,
NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: NetBSD Users's Discussion List <netbsd-users@netbsd.org>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Thu, 25 Mar 2021 11:14:48 -0700
--pgp-sign-Multipart_Thu_Mar_25_11:14:44_2021-1
Content-Type: text/plain; charset=US-ASCII
So, the reason I jumped from what I thought was a relatively stable
point in the main -current branch to a more recent version was primarily
because of what I believe is a problem related to PR# 54969.
I had noticed long fscks on large filesystems following normal clean
reboots and started investigating.
Maybe what remains an issue here is just related to dm(4) partitions, as
only my /dev/mapper partition(s) have had problems recently.
Unfortunately though this is still happening with 9.99.81 (2021-03-10).
(and both with GENERIC and XEN3_DOM0)
In any case I would say this is the single most critical, serious, and
important issue in current (and netbsd-9)! It totally kills system
reliability (though maybe only if one is using LVM).
Just for evidence, I added a bunch more printfs to the kernel and rc.d
scripts (and '-v' flags to fsck, mount, etc.) to help me see for myself
better what exactly is going on. This is the console after a truly
normal complete safe reboot using shutdown(8).
In this example all processes but the shutdown scripts should be dead.
The NFS mount on /more/work probably won't complete because I probably
started shutdown(8) without first doing "cd /" (and without doing "exec
shutdown"), and my CWD was probably on that NFS mount. This could maybe
be fixed by having reboot/halt/powerdown kill its parent process first,
and perhaps also doing chdir("/") too.
There's no excuse I can find for /build not unmounting though, and
definitely no excuse for '/' not unmounting either, though later '/'
is forcefully unmounted, and on reboot '/' appears to be clean. However
the forceful unmount of /build doesn't work, and it is NOT clean.
Note also that /build will sometimes unmount quickly and cleanly if it
hasn't been dirtied since the last boot, but it seems even creating one
file can leave it dirty on reboot.
Maybe what remains an issue here is just related to dm(4) partitions?
[Wed Mar 24 20:42:56 2021][ 715713.0781096] syncing disks... done
[Wed Mar 24 20:42:56 2021][ 715713.2081201] unmounted more.local:/vcs from /more/vcs, type nfs
[Wed Mar 24 20:42:56 2021][ 715713.2481211] unmount of /more/work (more.local:/work) failed with error 16
[Wed Mar 24 20:42:56 2021][ 715713.2581208] unmounted more.local:/home from /more/home, type nfs
[Wed Mar 24 20:42:56 2021][ 715713.2581208] unmounted more.local:/archive from /more/archive, type nfs
[Wed Mar 24 20:42:57 2021][ 715714.0781691] unmount of /build (/dev/mapper/scratch-build) failed with error 16
[Wed Mar 24 20:42:57 2021][ 715714.0781691] unmounted procfs from /proc, type procfs
[Wed Mar 24 20:42:57 2021][ 715714.0781691] unmounted ptyfs from /dev/pts, type ptyfs
[Wed Mar 24 20:42:57 2021][ 715714.0781691] unmounted kernfs from /kern, type kernfs
[Wed Mar 24 20:42:58 2021][ 715714.6782049] unmounted /dev/dk3 from /usr/pkg, type ffs
[Wed Mar 24 20:42:58 2021][ 715714.7282939] unmounted /dev/dk2 from /var, type ffs
[Wed Mar 24 20:42:58 2021][ 715714.8282434] unmount of / (/dev/dk0) failed with error 16
[Wed Mar 24 20:42:58 2021][ 715714.8282434] WARNING: some file systems would not unmount
[Wed Mar 24 20:42:58 2021][ 715714.8282434] unmount of /more/work (more.local:/work) failed with error 16
[Wed Mar 24 20:42:58 2021][ 715714.8282434] unmount of /build (/dev/mapper/scratch-build) failed with error 16
[Wed Mar 24 20:42:58 2021][ 715714.8282434] unmount of / (/dev/dk0) failed with error 16
[Wed Mar 24 20:42:58 2021][ 715714.8282434] WARNING: some file systems would not unmount
[Wed Mar 24 20:42:59 2021][ 715716.5383256] brgphy1: detached
[[ ... almost all the rest of devices detach ... ]]
[Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of /more/work (more.local:/work) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of /build (/dev/mapper/scratch-build) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of / (/dev/dk0) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5284461] WARNING: some file systems would not unmount
[Wed Mar 24 20:43:02 2021][ 715718.5284461] sd1: detached
[Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of /more/work (more.local:/work) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of /build (/dev/mapper/scratch-build) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of / (/dev/dk0) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5284461] WARNING: some file systems would not unmount
[Wed Mar 24 20:43:02 2021][ 715718.5284461] forcefully unmounting more.local:/work from /more/work...
[Wed Mar 24 20:43:02 2021][ 715718.5284461] forcefully unmounted more.local:/work from /more/work, type nfs
[Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of /build (/dev/mapper/scratch-build) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of / (/dev/dk0) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5284461] WARNING: some file systems would not unmount
[Wed Mar 24 20:43:02 2021][ 715718.5284461] forcefully unmounting /dev/mapper/scratch-build from /build...
[Wed Mar 24 20:43:02 2021][ 715718.5284461] forcefully unmounted /dev/mapper/scratch-build from /build, type ffs
[Wed Mar 24 20:43:02 2021][ 715718.5384534] unmount of / (/dev/dk0) failed with error 16
[Wed Mar 24 20:43:02 2021][ 715718.5384534] WARNING: some file systems would not unmount
[Wed Mar 24 20:43:02 2021][ 715718.5384534] forcefully unmounting /dev/dk0 from /...
[Wed Mar 24 20:43:02 2021][ 715718.5384534] forcefully unmounted /dev/dk0 from /, type ffs
[Wed Mar 24 20:43:02 2021][ 715718.5384534] unmounting done
[Wed Mar 24 20:43:02 2021][ 715718.5384534] turning off swap... done
[Wed Mar 24 20:43:02 2021][ 715718.5384534] dk0 at sd0 (/) deleted
[Wed Mar 24 20:43:02 2021][ 715718.5384534] sd0: detached
[Wed Mar 24 20:43:02 2021][ 715718.5384534] scsibus0: detached
[Wed Mar 24 20:43:02 2021][ 715718.7184994] mfi0: detached
[Wed Mar 24 20:43:02 2021][ 715718.7184994] pci8: detached
[Wed Mar 24 20:43:02 2021][ 715718.7184994] ppb7: detached
[Wed Mar 24 20:43:02 2021][ 715718.7184994] unmounting done
[Wed Mar 24 20:43:02 2021][ 715718.7184994] turning off swap... done
[Wed Mar 24 20:43:02 2021][ 715718.7184994] rebooting...
[[ ... why is "turning off swap" seen twice? .. ]]
[[ ... and then the reboot, until rc scripts say ... ]]
[Wed Mar 24 20:44:51 2021]Starting root file system check:
[Wed Mar 24 20:44:51 2021]/dev/rdk0: file system is clean; not checking
[Wed Mar 24 20:44:51 2021]start / wait fsck_ffs -p /dev/rdk0
[Wed Mar 24 20:44:52 2021]Starting file system checks:
[Wed Mar 24 20:44:52 2021]/dev/rdk2: file system is clean; not checking
[Wed Mar 24 20:44:52 2021]/dev/rdk3: file system is clean; not checking
[[ ... here I hit ^T on the console as it was taking too long ... ]]
[Wed Mar 24 20:44:58 2021][ 15.0201108] load: 0.08 cmd: sleep 345 [nanoslp] 0.00u 0.00s 0% 512k
[Wed Mar 24 20:44:58 2021]/dev/mapper/rscratch-build: phase 1: cyl group 24 of 345 (6%)
[Wed Mar 24 20:46:09 2021]/dev/mapper/rscratch-build: phase 1: cyl group 284 of 345 (82%)
[Wed Mar 24 20:49:30 2021]/dev/mapper/rscratch-build: 1400986 files, 36172587 used, 28347707 free (17403 frags, 3541288 blocks, 0.0% fragmentation)
[Wed Mar 24 20:49:30 2021]/dev/mapper/rscratch-build: MARKING FILE SYSTEM CLEAN
[Wed Mar 24 20:49:30 2021]start /var nowait fsck_ffs -p /dev/rdk2
[Wed Mar 24 20:49:30 2021]start /build nowait fsck_ffs -p /dev/mapper/rscratch-build
[Wed Mar 24 20:49:30 2021]done ffs: /dev/rdk2 (/var) = 0x0
[Wed Mar 24 20:49:30 2021]start /usr/pkg nowait fsck_ffs -p /dev/rdk3
[Wed Mar 24 20:49:30 2021]done ffs: /dev/rdk3 (/usr/pkg) = 0x0
[Wed Mar 24 20:49:30 2021]done ffs: /dev/mapper/rscratch-build (/build) = 0x0
[Wed Mar 24 20:49:30 2021]Script /etc/rc.d/fsck running
[Wed Mar 24 20:49:30 2021]Currently sourcing /etc/rc.d/fsck
[Wed Mar 24 20:49:30 2021]exec: mount_ffs -o rw /dev/dk2 /var
[Wed Mar 24 20:49:30 2021]exec: mount_ffs -o rw /dev/dk2 /var
[Wed Mar 24 20:49:30 2021]/dev/dk2 on /var type ffs (local, fsid: 0xa802/0x78b, reads: sync 1 async 0, writes: sync 2 async 0)
--
Greg A. Woods <gwoods@acm.org>
Kelowna, BC +1 250 762-7675 RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com> Avoncote Farms <woods@avoncote.ca>
--pgp-sign-Multipart_Thu_Mar_25_11:14:44_2021-1--
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: "Greg A. Woods" <woods@planix.ca>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Sun, 2 May 2021 18:32:14 +0200
--Apple-Mail=_74B2A6AC-7A14-4AD0-9BDC-83F105D678A2
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
charset=us-ascii
> On 25. Mar 2021, at 19:14, Greg A. Woods <woods@planix.ca> wrote:
>
> There's no excuse I can find for /build not unmounting though, and
> definitely no excuse for '/' not umounting either, though it later '/'
> is forcefully unmounted, and on reboot '/' appears to be clean. However
> the forceful unmount of /build doesn't work, and it is NOT clean.
Could you please attach more information:
- /etc/fstab
- wedge config
- lvm config
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig
--Apple-Mail=_74B2A6AC-7A14-4AD0-9BDC-83F105D678A2--
From: "Greg A. Woods" <woods@planix.ca>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Mon, 03 May 2021 22:59:45 -0700
--pgp-sign-Multipart_Mon_May__3_22:59:21_2021-1
Content-Type: text/plain; charset=US-ASCII
At Sun, 2 May 2021 18:32:14 +0200, "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de> wrote:
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
>
> [1 <text/plain; us-ascii (7bit)>]
> > On 25. Mar 2021, at 19:14, Greg A. Woods <woods@planix.ca>
> > wrote:
> > There's no excuse I can find for /build not unmounting though,
> > and definitely no excuse for '/' not umounting either, though it
> > later '/' is forcefully unmounted, and on reboot '/' appears to
> > be clean. However the forceful unmount of /build doesn't work,
> > and it is NOT clean.
>
> Could you please attach more information:
> - /etc/fstab
> - wedge config
> - lvm config
#
# NetBSD /etc/fstab
#
NAME=/ / ffs rw,log 1 1
NAME=swap none swap sw,dp 0 0
NAME=/var /var ffs rw,log 1 2
NAME=/usr/pkg /usr/pkg ffs rw,log 1 2
#
tmpfs /tmp tmpfs rw,-m=1777,-s=ram%25
kernfs /kern kernfs rw
ptyfs /dev/pts ptyfs rw
procfs /proc procfs rw
tmpfs /var/shm tmpfs rw,-m1777,-sram%25
#
/dev/cd0a /cdrom cd9660 ro,noauto
#
/dev/mapper/scratch-build /build ffs rw,log 1 2
#
more.local:/archive /more/archive nfs -b,-i,rw,nosuid,nodev
more.local:/home /more/home nfs -b,-i,rw,nosuid,nodev
more.local:/work /more/work nfs -b,-i,rw,nosuid,nodev
more.local:/vcs /more/vcs nfs -b,-i,rw,nosuid,nodev
# dkctl /dev/rsd0 listwedges
/dev/rsd0: 5 wedges:
dk0: /, 62914560 blocks at 2048, type: ffs
dk1: swap, 100663296 blocks at 62916608, type: swap
dk2: /var, 8388608 blocks at 176166912, type: ffs
dk3: /usr/pkg, 104857600 blocks at 184557568, type: ffs
dk4: LVM-vg0, 686282719 blocks at 289417216, type: ffs
# dkctl /dev/rsd1 listwedges
/dev/rsd1: 1 wedge:
dk5: scratchdisk0, 1141897216 blocks at 1024, type: ffs
22:53 [1.1770] # lvm pvdisplay
--- Physical volume ---
PV Name /dev/rdk5
VG Name scratch
PV Size 544.50 GiB / not usable 3.00 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 139391
Free PE 60031
Allocated PE 79360
PV UUID xxf5PJ-HL26-PbyQ-rHic-3ptB-t56f-S5NqXD
--- Physical volume ---
PV Name /dev/rdk4
VG Name vg0
PV Size 327.25 GiB / not usable 2.98 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 83774
Free PE 63805
Allocated PE 19969
PV UUID kcMuoP-L3c0-EdZp-lOjP-OFyA-xc6d-NpQsez
22:53 [1.1771] # lvm vgdisplay
--- Volume group ---
VG Name scratch
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 4
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 3
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 544.50 GiB
PE Size 4.00 MiB
Total PE 139391
Alloc PE / Size 79360 / 310.00 GiB
Free PE / Size 60031 / 234.50 GiB
VG UUID jEdo7q-pkhv-dG83-C10z-FysR-UBNy-cr0Fzc
--- Volume group ---
VG Name vg0
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 6
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 5
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 327.24 GiB
PE Size 4.00 MiB
Total PE 83774
Alloc PE / Size 19969 / 78.00 GiB
Free PE / Size 63805 / 249.24 GiB
VG UUID cM3M8T-x0nZ-uUJw-avMT-zgO9-Tbux-2LSDOa
22:53 [1.1772] # lvm lvdisplay
--- Logical volume ---
LV Name /dev/scratch/build
VG Name scratch
LV UUID myecD7-LUdo-x2m0-8jdg-gHHN-uQ0Y-lc76fS
LV Write Access read/write
LV Status available
# open 0
LV Size 250.00 GiB
Current LE 64000
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 0
Block device 169:1
--- Logical volume ---
LV Name /dev/scratch/fbsd-test.0
VG Name scratch
LV UUID hZbvoM-maqc-dur3-TFoT-2p76-Ru0p-BW8qO2
LV Write Access read/write
LV Status available
# open 0
LV Size 30.00 GiB
Current LE 7680
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 0
Block device 169:2
--- Logical volume ---
LV Name /dev/scratch/fbsd-test.1
VG Name scratch
LV UUID mx3cD7-fcJi-S8AF-w91y-c6gH-U4rW-zdRRzk
LV Write Access read/write
LV Status available
# open 0
LV Size 30.00 GiB
Current LE 7680
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 0
Block device 169:3
--- Logical volume ---
LV Name /dev/vg0/nbtest.root
VG Name vg0
LV UUID DpA5NL-8jdj-M8tM-0qB9-1GFM-eokA-u2tjv9
LV Write Access read/write
LV Status available
# open 0
LV Size 30.00 GiB
Current LE 7680
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 0
Block device 169:4
--- Logical volume ---
LV Name /dev/vg0/nbtest.swap
VG Name vg0
LV UUID XD9t0F-Cdca-EaN3-5NNi-lb6M-qwnK-syilBU
LV Write Access read/write
LV Status available
# open 0
LV Size 8.00 GiB
Current LE 2048
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 0
Block device 169:5
--- Logical volume ---
LV Name /dev/vg0/nbtest.var
VG Name vg0
LV UUID qybtJ1-Rt90-e2sa-XXiL-avZh-A82D-xglF25
LV Write Access read/write
LV Status available
# open 0
LV Size 10.00 GiB
Current LE 2560
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 0
Block device 169:6
--- Logical volume ---
LV Name /dev/vg0/nbtest.pkg
VG Name vg0
LV UUID 5WrGfp-4042-pmH9-Hc3K-JgJQ-l7mE-zclcNL
LV Write Access read/write
LV Status available
# open 0
LV Size 30.00 GiB
Current LE 7680
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 0
Block device 169:7
--- Logical volume ---
LV Name /dev/vg0/tinytest
VG Name vg0
LV UUID 4BjVA5-rBIG-8dhJ-oMyZ-uj7g-ouNJ-G2Jolb
LV Write Access read/write
LV Status available
# open 0
LV Size 4.00 MiB
Current LE 1
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 0
Block device 169:8
--
Greg A. Woods <gwoods@acm.org>
Kelowna, BC +1 250 762-7675 RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com> Avoncote Farms <woods@avoncote.ca>
--pgp-sign-Multipart_Mon_May__3_22:59:21_2021-1--
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: "Greg A. Woods" <woods@planix.ca>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Tue, 4 May 2021 11:57:05 +0200
--Apple-Mail=_39DD0DB7-6F29-4B35-94E0-A802F5308099
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
charset=us-ascii
> On 25. Mar 2021, at 19:15, Greg A. Woods <woods@planix.ca> wrote:
<snip>
> There's no excuse I can find for /build not unmounting though, and
> definitely no excuse for '/' not umounting either, though it later '/'
> is forcefully unmounted, and on reboot '/' appears to be clean. However
> the forceful unmount of /build doesn't work, and it is NOT clean.
>
> Note also that /build will sometimes unmount quickly and cleanly if it
> hasn't been dirtied since the last boot, but it seems even creating one
> file can leave it dirty on reboot.
>
> Maybe what remains an issue here is just related to dm(4) partitions?
<snip>
> [Wed Mar 24 20:42:58 2021][ 715714.8282434] unmount of /build (/dev/mapper/scratch-build) failed with error 16
> [Wed Mar 24 20:42:58 2021][ 715714.8282434] unmount of / (/dev/dk0) failed with error 16
> [Wed Mar 24 20:42:58 2021][ 715714.8282434] WARNING: some file systems would not unmount
> [Wed Mar 24 20:42:59 2021][ 715716.5383256] brgphy1: detached
>
> [[ ... almost all the rest of devices detach ... ]]
I'm quite sure one of them is "dm0". Once it detaches, dm(4) is no longer
backed by physical disks, but /build is still mounted, so from then on
even forced unmounts fail.
This problem occurs on dm(4) devices only.
Looking through sys/dev/dm/device-mapper.c it becomes clear that
dmopen() / dmclose() don't count opens and therefore dm_detach() will
unconditionally unconfigure dm(4).
As dm_detach() gets called during shutdown, dm(4) unconfigures too early.
The fix is to count device opens and prevent dm_detach() from succeeding
as long as devices are open. Once it succeeds during shutdown, the last
dm_detach() should call dm_destroy().
A short-term hack is to remove DVF_DETACH_SHUTDOWN from device-mapper.c
so dm_detach() doesn't run on shutdown:
CFATTACH_DECL3_NEW(dm, 0,
dm_match, dm_attach, dm_detach, NULL, NULL, NULL,
- DVF_DETACH_SHUTDOWN);
+ 0 /* DVF_DETACH_SHUTDOWN */);
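The longer-term idea (count opens, refuse to detach while the count is non-zero) can be sketched as a small user-space model. This is not the dm(4) driver: `model_dev`, `model_open()`, `model_close()`, `model_detach()` and `MODEL_DETACH_FORCE` are hypothetical names, with the last standing in for a forced-detach flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a "force" detach flag; value is arbitrary. */
#define MODEL_DETACH_FORCE	0x01

struct model_dev {
	unsigned open_count;	/* outstanding opens (e.g. a mounted fs) */
	bool attached;
};

int
model_open(struct model_dev *d)
{
	if (!d->attached)
		return ENXIO;	/* device is already gone */
	d->open_count++;
	return 0;
}

int
model_close(struct model_dev *d)
{
	assert(d->open_count > 0);	/* close without open is a caller bug */
	d->open_count--;
	return 0;
}

/* Refuse to detach a device that is still open unless forced. */
int
model_detach(struct model_dev *d, int flags)
{
	if (d->open_count != 0 && (flags & MODEL_DETACH_FORCE) == 0)
		return EBUSY;
	d->attached = false;
	return 0;
}
```

With a guard like this, a detach attempt made while /build is still mounted would fail with EBUSY and could be retried after the unmount, instead of pulling the backing device out from under the file system.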
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
--Apple-Mail=_39DD0DB7-6F29-4B35-94E0-A802F5308099--
From: "Greg A. Woods" <woods@planix.ca>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Tue, 04 May 2021 11:07:42 -0700
--pgp-sign-Multipart_Tue_May__4_11:07:26_2021-1
Content-Type: text/plain; charset=US-ASCII
At Tue, 4 May 2021 11:57:05 +0200, "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de> wrote:
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
>
> I'm quite sure one of them is "dm0" -- dm(4) is no longer backed
> with physical disks but /build is still mounted so from here on
> even forced unmounts fail.
>
> This problem occurs on dm(4) devices only.
>
> Looking through sys/dev/dm/device-mapper.c it becomes clear that
> dmopen() / dmclose() don't count opens and therefore dm_detach()
> will unconditionally unconfigure dm(4).
>
> As dm_detach() gets called during shutdown dm(4) unconfigures too
> early.
Excellent catch!
> Fix is to count device opens and prevent dm_detach() to succeed as
> long as devices are open. Once succeeding during shutdown it
> should dm_destroy() on last dm_detach().
>
>
> Short term hack is to remove DVF_DETACH_SHUTDOWN from
> device-mapper.c so dm_detach() doesn't run on shutdown:
>
> CFATTACH_DECL3_NEW(dm, 0, dm_match, dm_attach, dm_detach, NULL,
> NULL, NULL,
> - DVF_DETACH_SHUTDOWN);
> + 0 /* DVF_DETACH_SHUTDOWN */);
>
I'll give that a try, and I expect it to work -- it looks like this is
indeed the problem!
--
Greg A. Woods <gwoods@acm.org>
Kelowna, BC +1 250 762-7675 RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com> Avoncote Farms <woods@avoncote.ca>
--pgp-sign-Multipart_Tue_May__4_11:07:26_2021-1--
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: NetBSD GNATS <gnats-bugs@netbsd.org>
Cc: "Greg A. Woods" <woods@planix.ca>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Wed, 5 May 2021 15:51:08 +0200
--Apple-Mail=_02C4883B-991C-4B77-BB72-7F959434AA00
Content-Type: multipart/mixed;
boundary="Apple-Mail=_129BAF62-8E27-4159-8DB4-38C1B290CAA3"
--Apple-Mail=_129BAF62-8E27-4159-8DB4-38C1B290CAA3
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
charset=us-ascii
The attached diffs should fix the problem with device-mapper
devices getting detached too early.
Please report if this fix really works.
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig
--Apple-Mail=_129BAF62-8E27-4159-8DB4-38C1B290CAA3
Content-Disposition: attachment;
filename=002_dm_unit.diff
Content-Type: application/octet-stream;
x-unix-mode=0644;
name="002_dm_unit.diff"
Content-Transfer-Encoding: 7bit
dm_unit
Make sure the unit number of device-mapper devices matches our minor number.
diff -r 1e6b25a1c949 -r 497cf11564ea sys/dev/dm/dm_ioctl.c
--- sys/dev/dm/dm_ioctl.c
+++ sys/dev/dm/dm_ioctl.c
@@ -92,18 +92,11 @@
#include "netbsd-dm.h"
#include "dm.h"
+#include "ioconf.h"
static uint32_t sc_minor_num;
uint32_t dm_dev_counter;
-/* Generic cf_data for device-mapper driver */
-static struct cfdata dm_cfdata = {
- .cf_name = "dm",
- .cf_atname = "dm",
- .cf_fstate = FSTATE_STAR,
- .cf_unit = 0
-};
-
#define DM_REMOVE_FLAG(flag, name) do { \
prop_dictionary_get_uint32(dm_dict,DM_IOCTL_FLAGS,&flag); \
flag &= ~name; \
@@ -196,6 +189,7 @@ dm_dev_create_ioctl(prop_dictionary_t dm
int r;
uint32_t flags;
device_t devt;
+ cfdata_t cf;
flags = 0;
name = NULL;
@@ -214,7 +208,13 @@ dm_dev_create_ioctl(prop_dictionary_t dm
dm_dev_unbusy(dmv);
return EEXIST;
}
- if ((devt = config_attach_pseudo(&dm_cfdata)) == NULL) {
+ cf = kmem_alloc(sizeof(*cf), KM_SLEEP);
+ cf->cf_name = dm_cd.cd_name;
+ cf->cf_atname = dm_cd.cd_name;
+ cf->cf_unit = (uint64_t)atomic_inc_32_nv(&sc_minor_num);
+ cf->cf_fstate = FSTATE_NOTFOUND;
+ if ((devt = config_attach_pseudo(cf)) == NULL) {
+ kmem_free(cf, sizeof(*cf));
aprint_error("Unable to attach pseudo device dm/%s\n", name);
return (ENOMEM);
}
@@ -229,7 +229,7 @@ dm_dev_create_ioctl(prop_dictionary_t dm
if (name)
strlcpy(dmv->name, name, DM_NAME_LEN);
- dmv->minor = (uint64_t)atomic_inc_32_nv(&sc_minor_num);
+ dmv->minor = cf->cf_unit;
dmv->flags = 0; /* device flags are set when needed */
dmv->ref_cnt = 0;
dmv->event_nr = 0;
@@ -365,6 +365,8 @@ dm_dev_rename_ioctl(prop_dictionary_t dm
int
dm_dev_remove_ioctl(prop_dictionary_t dm_dict)
{
+ int error;
+ cfdata_t cf;
dm_dev_t *dmv;
const char *name, *uuid;
uint32_t flags, minor;
@@ -398,7 +400,11 @@ dm_dev_remove_ioctl(prop_dictionary_t dm
* This will call dm_detach routine which will actually removes
* device.
*/
- return config_detach(devt, DETACH_QUIET);
+ cf = device_cfdata(devt);
+ error = config_detach(devt, DETACH_QUIET);
+ if (error == 0)
+ kmem_free(cf, sizeof(*cf));
+ return error;
}
/*
--Apple-Mail=_129BAF62-8E27-4159-8DB4-38C1B290CAA3
Content-Disposition: attachment;
filename=003_dm_opencount.diff
Content-Type: application/octet-stream;
x-unix-mode=0644;
name="003_dm_opencount.diff"
Content-Transfer-Encoding: 7bit
dm_opencount
Track the number of cdev and bdev opens and fail dm_detach()
on open devices unless detach is forced.
PR kern/54969 (Disk cache is no longer flushed on shutdown)
diff -r 497cf11564ea -r 93e32176d61e sys/dev/dm/device-mapper.c
--- sys/dev/dm/device-mapper.c
+++ sys/dev/dm/device-mapper.c
@@ -260,8 +260,17 @@ dm_attach(device_t parent, device_t self
static int
dm_detach(device_t self, int flags)
{
+ bool busy;
dm_dev_t *dmv;
+ dmv = dm_dev_lookup(NULL, NULL, device_unit(self));
+ mutex_enter(&dmv->diskp->dk_openlock);
+ busy = (dmv->diskp->dk_openmask != 0 && (flags & DETACH_FORCE) == 0);
+ mutex_exit(&dmv->diskp->dk_openlock);
+ dm_dev_unbusy(dmv);
+ if (busy)
+ return EBUSY;
+
pmf_device_deregister(self);
/* Detach device from global device list */
@@ -334,6 +343,25 @@ dmdestroy(void)
static int
dmopen(dev_t dev, int flags, int mode, struct lwp *l)
{
+ dm_dev_t *dmv;
+ struct disk *dk;
+
+ dmv = dm_dev_lookup(NULL, NULL, minor(dev));
+ if (dmv) {
+ dk = dmv->diskp;
+ mutex_enter(&dk->dk_openlock);
+ switch (mode) {
+ case S_IFCHR:
+ dk->dk_copenmask |= 1;
+ break;
+ case S_IFBLK:
+ dk->dk_bopenmask |= 1;
+ break;
+ }
+ dk->dk_openmask = dk->dk_copenmask | dk->dk_bopenmask;
+ mutex_exit(&dk->dk_openlock);
+ dm_dev_unbusy(dmv);
+ }
aprint_debug("dm open routine called %" PRIu32 "\n", minor(dev));
return 0;
@@ -342,8 +370,27 @@ dmopen(dev_t dev, int flags, int mode, s
static int
dmclose(dev_t dev, int flags, int mode, struct lwp *l)
{
+ dm_dev_t *dmv;
+ struct disk *dk;
aprint_debug("dm close routine called %" PRIu32 "\n", minor(dev));
+
+ dmv = dm_dev_lookup(NULL, NULL, minor(dev));
+ if (dmv) {
+ dk = dmv->diskp;
+ mutex_enter(&dk->dk_openlock);
+ switch (mode) {
+ case S_IFCHR:
+ dk->dk_copenmask &= ~1;
+ break;
+ case S_IFBLK:
+ dk->dk_bopenmask &= ~1;
+ break;
+ }
+ dk->dk_openmask = dk->dk_copenmask | dk->dk_bopenmask;
+ mutex_exit(&dk->dk_openlock);
+ dm_dev_unbusy(dmv);
+ }
return 0;
}
--Apple-Mail=_129BAF62-8E27-4159-8DB4-38C1B290CAA3--
--Apple-Mail=_02C4883B-991C-4B77-BB72-7F959434AA00--
From: "Greg A. Woods" <woods@planix.ca>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Thu, 06 May 2021 14:38:11 -0700
At Wed, 5 May 2021 15:51:08 +0200, "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de> wrote:
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
>
> The attached diffs should fix the problem with device-mapper
> devices getting detached too early.
Thank you very much!
> Please report if this fix really works.
I would say that they do work, though I wasn't brave enough to leave the
filesystem really dirty -- I unmounted it manually, re-mounted it, then
copied one new file to it before rebooting. Perhaps I should have left
my console login shell sitting with its CWD in that filesystem just to
try to keep the umount from succeeding right away.
Here are the verbose shutdown messages from the kernel, showing that
/build unmounted right away without complaint and that the dm(4) devices
detached much later:
[ 1295581.6679937] unmounted /dev/mapper/scratch-build from /build, type ffs
[ 1295581.6679937] unmounted more.local:/vcs from /more/vcs, type nfs
[ 1295581.6679937] unmounted more.local:/work from /more/work, type nfs
[ 1295581.6779931] unmounted more.local:/home from /more/home, type nfs
[ 1295581.6779931] unmounted more.local:/archive from /more/archive, type nfs
[ 1295581.6779931] unmounted procfs from /proc, type procfs
[ 1295581.6779931] unmounted ptyfs from /dev/pts, type ptyfs
[ 1295581.6779931] unmounted kernfs from /kern, type kernfs
[ 1295581.9880145] unmounted /dev/dk3 from /usr/pkg, type ffs
[ 1295582.0080257] unmounted /dev/dk2 from /var, type ffs
[ 1295582.1180252] unmount of / (/dev/dk0) failed with error 16
[ 1295582.1180252] WARNING: some file systems would not unmount
[ 1295582.1180252] unmount of / (/dev/dk0) failed with error 16
[ 1295582.1180252] WARNING: some file systems would not unmount
[ 1295583.8281335] brgphy1: detached
[ 1295583.8481286] bnx1: detached
[ 1295585.5582807] brgphy0: detached
[ 1295585.5782645] bnx0: detached
[ 1295585.5782645] pci6: detached
[ 1295585.5782645] pci4: detached
[ 1295585.5782645] sd2: detached
[ 1295585.5782645] cd1: detached
[ 1295585.5782645] ppb5: detached
[ 1295585.5782645] ppb3: detached
[ 1295585.5782645] scsibus2: detached
[ 1295585.5782645] scsibus1: detached
[ 1295585.5782645] pci5: detached
[ 1295585.5782645] pci3: detached
[ 1295585.7582583] brgphy2: detached
[ 1295585.7882989] bnx2: detached
[ 1295585.7882989] ppb4: detached
[ 1295585.7882989] ppb2: detached
[ 1295585.7882989] uhub6: detached
[ 1295585.7882989] cd0: detached
[ 1295585.7882989] pci14: detached
[ 1295585.7882989] pci7: detached
[ 1295585.7882989] pci2: detached
[ 1295585.7882989] atapibus0: detached
[ 1295585.7882989] uhub4: detached
[ 1295585.7882989] uhub2: detached
[ 1295585.7882989] uhub1: detached
[ 1295585.7882989] uhub0: detached
[ 1295585.7882989] com1: detached
[ 1295585.7882989] ppb13: detached
[ 1295585.7982582] ppb6: detached
[ 1295585.7982582] ppb1: detached
[ 1295585.7982582] atabus0: detached
[ 1295585.7982582] usb3: detached
[ 1295585.7982582] usb2: detached
[ 1295585.7982582] usb1: detached
[ 1295585.7982582] usb0: detached
[ 1295585.7982582] pci13: detached
[ 1295585.7982582] pci12: detached
[ 1295585.7982582] pci11: detached
[ 1295585.7982582] pci10: detached
[ 1295585.7982582] pci9: detached
[ 1295585.7982582] pci1: detached
[ 1295585.7982582] uhci3: detached
[ 1295585.7982582] uhci2: detached
[ 1295585.7982582] uhci1: detached
[ 1295585.7982582] uhci0: detached
[ 1295585.7982582] ppb12: detached
[ 1295585.7982582] pchb7: detached
[ 1295585.7982582] pchb6: detached
[ 1295585.7982582] pchb5: detached
[ 1295585.7982582] pchb4: detached
[ 1295585.7982582] pchb3: detached
[ 1295585.7982582] pchb2: detached
[ 1295585.7982582] pchb1: detached
[ 1295585.7982582] ppb11: detached
[ 1295585.7982582] ppb10: detached
[ 1295585.7982582] ppb9: detached
[ 1295585.7982582] ppb8: detached
[ 1295585.7982582] ppb0: detached
[ 1295585.7982582] pchb0: detached
[ 1295585.7982582] ipmi_acpi0: detached
[ 1295585.7982582] dm7: detached
[ 1295585.7982582] dm6: detached
[ 1295585.7982582] dm5: detached
[ 1295585.7982582] dm4: detached
[ 1295585.7982582] dm3: detached
[ 1295585.7982582] dm2: detached
[ 1295585.7982582] dm1: detached
[ 1295585.7982582] dm0: detached
[ 1295585.7982582] cgd3: detached
[ 1295585.7982582] vnd3: detached
[ 1295585.7982582] cgd2: detached
[ 1295585.7982582] vnd2: detached
[ 1295585.7982582] cgd1: detached
[ 1295585.7982582] vnd1: detached
[ 1295585.7982582] cgd0: detached
[ 1295585.7982582] vnd0: detached
[ 1295585.7982582] dk5 at sd1 (scratchdisk0) deleted
[ 1295585.7982582] dk5: detached
[ 1295585.7982582] dk4 at sd0 (LVM-vg0) deleted
[ 1295585.7982582] dk4: detached
[ 1295585.7982582] dk3 at sd0 (/usr/pkg) deleted
[ 1295585.7982582] dk3: detached
For the record, here's a full log of a failed umount prior to applying
any patches:
[Mon Apr 5 11:35:12 2021][ 243874.4095581] syncing disks... done
[Mon Apr 5 11:35:12 2021][ 243874.5395716] unmounted more.local:/vcs from /more/vcs, type nfs
[Mon Apr 5 11:35:12 2021][ 243874.5395716] unmounted more.local:/work from /more/work, type nfs
[Mon Apr 5 11:35:12 2021][ 243874.5495735] unmounted more.local:/home from /more/home, type nfs
[Mon Apr 5 11:35:12 2021][ 243874.5495735] unmounted more.local:/archive from /more/archive, type nfs
[Mon Apr 5 11:35:13 2021][ 243875.7096504] unmount of /build (/dev/mapper/scratch-build) failed with error 16
[Mon Apr 5 11:35:13 2021][ 243875.7096504] unmounted procfs from /proc, type procfs
[Mon Apr 5 11:35:13 2021][ 243875.7096504] unmounted ptyfs from /dev/pts, type ptyfs
[Mon Apr 5 11:35:13 2021][ 243875.7096504] unmounted kernfs from /kern, type kernfs
[Mon Apr 5 11:35:14 2021][ 243875.7497417] unmounted /dev/dk3 from /usr/pkg, type ffs
[Mon Apr 5 11:35:14 2021][ 243875.7797304] unmounted /dev/dk2 from /var, type ffs
[Mon Apr 5 11:35:14 2021][ 243875.8097252] unmount of / (/dev/dk0) failed with error 16
[Mon Apr 5 11:35:14 2021][ 243875.8097252] WARNING: some file systems would not unmount
[Mon Apr 5 11:35:14 2021][ 243875.8097252] unmount of /build (/dev/mapper/scratch-build) failed with error 16
[Mon Apr 5 11:35:14 2021][ 243875.8097252] unmount of / (/dev/dk0) failed with error 16
[Mon Apr 5 11:35:14 2021][ 243875.8097252] WARNING: some file systems would not unmount
[Mon Apr 5 11:35:15 2021][ 243877.5197731] brgphy1: detached
[Mon Apr 5 11:35:15 2021][ 243877.5398270] bnx1: detached
[Mon Apr 5 11:35:17 2021][ 243879.2498901] brgphy0: detached
[Mon Apr 5 11:35:17 2021][ 243879.2699359] bnx0: detached
[Mon Apr 5 11:35:17 2021][ 243879.2699359] pci6: detached
[Mon Apr 5 11:35:17 2021][ 243879.2699359] pci4: detached
[Mon Apr 5 11:35:17 2021][ 243879.2699359] ppb5: detached
[Mon Apr 5 11:35:17 2021][ 243879.2699359] ppb3: detached
[Mon Apr 5 11:35:17 2021][ 243879.2699359] pci5: detached
[Mon Apr 5 11:35:17 2021][ 243879.2699359] pci3: detached
[Mon Apr 5 11:35:17 2021][ 243879.4599042] brgphy2: detached
[Mon Apr 5 11:35:17 2021][ 243879.4899784] bnx2: detached
[Mon Apr 5 11:35:17 2021][ 243879.4899784] ppb4: detached
[Mon Apr 5 11:35:17 2021][ 243879.4999078] ppb2: detached
[Mon Apr 5 11:35:17 2021][ 243879.4999078] uhub6: detached
[Mon Apr 5 11:35:17 2021][ 243879.4999078] uhub5: detached
[Mon Apr 5 11:35:17 2021][ 243879.4999078] entropy: cd0 detached as an entropy source
[Mon Apr 5 11:35:17 2021][ 243879.4999078] cd0: detached
[Mon Apr 5 11:35:17 2021][ 243879.4999078] pci14: detached
[Mon Apr 5 11:35:17 2021][ 243879.4999078] pci7: detached
[Mon Apr 5 11:35:17 2021][ 243879.4999078] pci2: detached
[Mon Apr 5 11:35:17 2021][ 243879.4999078] atapibus0: detached
[Mon Apr 5 11:35:17 2021][ 243879.4999078] uhub4: detached
[Mon Apr 5 11:35:17 2021][ 243879.4999078] uhub3: detached
[Mon Apr 5 11:35:17 2021][ 243879.4999078] uhub2: detached
[Mon Apr 5 11:35:17 2021][ 243879.4999078] uhub1: detached
[Mon Apr 5 11:35:17 2021][ 243879.4999078] uhub0: detached
[Mon Apr 5 11:35:17 2021][ 243879.4999078] com1: detached
[Mon Apr 5 11:35:17 2021][ 243879.4999078] ppb13: detached
[Mon Apr 5 11:35:17 2021][ 243879.4999078] ppb6: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] ppb1: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] atabus0: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] usb4: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] usb3: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] usb2: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] usb1: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] usb0: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] pci13: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] pci12: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] pci11: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] pci10: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] pci9: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] pci1: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] ehci0: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] uhci3: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] uhci2: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] uhci1: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] uhci0: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] ppb12: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] pchb7: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] pchb6: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] pchb5: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] pchb4: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] pchb3: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] pchb2: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] pchb1: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] ppb11: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] ppb10: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] ppb9: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] ppb8: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] ppb0: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] pchb0: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] ipmi_acpi0: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] dm5: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] dm4: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] dm3: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] dm2: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] dm1: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] dm0: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] cgd3: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] vnd3: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] cgd2: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] vnd2: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] cgd1: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] vnd1: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] cgd0: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] dk5 at sd1 (scratchdisk0) deleted
[Mon Apr 5 11:35:17 2021][ 243879.5099064] dk5: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] dk4 at sd0 (LVM-vg0) deleted
[Mon Apr 5 11:35:17 2021][ 243879.5099064] dk4: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] dk3 at sd0 (/usr/pkg) deleted
[Mon Apr 5 11:35:17 2021][ 243879.5099064] dk3: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] dk2 at sd0 (/var) deleted
[Mon Apr 5 11:35:17 2021][ 243879.5099064] dk2: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] dk1 at sd0 (swap) deleted
[Mon Apr 5 11:35:17 2021][ 243879.5099064] dk1: detached
[Mon Apr 5 11:35:17 2021][ 243879.5099064] unmount of /build (/dev/mapper/scratch-build) failed with error 16
[Mon Apr 5 11:35:18 2021][ 243879.5099064] unmount of / (/dev/dk0) failed with error 16
[Mon Apr 5 11:35:18 2021][ 243879.5099064] WARNING: some file systems would not unmount
[Mon Apr 5 11:35:18 2021][ 243879.5099064] entropy: sd1 detached as an entropy source
[Mon Apr 5 11:35:18 2021][ 243879.5099064] sd1: detached
[Mon Apr 5 11:35:18 2021][ 243879.5099064] unmount of /build (/dev/mapper/scratch-build) failed with error 16
[Mon Apr 5 11:35:18 2021][ 243879.5099064] unmount of / (/dev/dk0) failed with error 16
[Mon Apr 5 11:35:18 2021][ 243879.5099064] WARNING: some file systems would not unmount
[Mon Apr 5 11:35:18 2021][ 243879.5099064] forcefully unmounting /dev/mapper/scratch-build from /build...
[Mon Apr 5 11:35:18 2021][ 243879.5099064] forcefully unmounted /dev/mapper/scratch-build from /build, type ffs
[Mon Apr 5 11:35:18 2021][ 243879.5099064] unmount of / (/dev/dk0) failed with error 16
[Mon Apr 5 11:35:18 2021][ 243879.5199222] WARNING: some file systems would not unmount
[Mon Apr 5 11:35:18 2021][ 243879.5199222] forcefully unmounting /dev/dk0 from /...
[Mon Apr 5 11:35:18 2021][ 243879.5199222] forcefully unmounted /dev/dk0 from /, type ffs
[Mon Apr 5 11:35:18 2021][ 243879.5199222] unmounting done
[Mon Apr 5 11:35:18 2021][ 243879.5199222] turning off swap... done
[Mon Apr 5 11:35:18 2021][ 243879.5199222] dk0 at sd0 (/) deleted
[Mon Apr 5 11:35:18 2021][ 243879.5199222] entropy: sd0 detached as an entropy source
[Mon Apr 5 11:35:18 2021][ 243879.5199222] sd0: detached
[Mon Apr 5 11:35:18 2021][ 243879.5199222] scsibus0: detached
[Mon Apr 5 11:35:18 2021][ 243879.7099877] mfi0: detached
[Mon Apr 5 11:35:18 2021][ 243879.7099877] pci8: detached
[Mon Apr 5 11:35:18 2021][ 243879.7099877] ppb7: detached
[Mon Apr 5 11:35:18 2021][ 243879.7099877] unmounting done
[Mon Apr 5 11:35:18 2021][ 243879.7099877] turning off swap... done
[Mon Apr 5 11:35:18 2021][ 243879.7099877] rebooting...
What's clear there is that the dm(4) devices detach before /build is
properly, or even forcefully, unmounted.
What's not so clear is why the initial umount failed -- though I'm
guessing it was because some process was still sitting with its CWD on
that filesystem. Perhaps I'll try that one more time, on purpose, before
I get involved in doing other things that make me reluctant to reboot
again.
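As a side note on the "failed with error 16" lines in the logs above:
assuming the number printed is a plain errno value, 16 is EBUSY ("Device
busy") on NetBSD, which would be consistent with a process still holding
its CWD on the filesystem. A minimal sketch for decoding such a number
follows; the helper names are hypothetical and are not part of any
NetBSD tool.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: true when an error number copied from a
 * "failed with error N" log line equals this platform's EBUSY.
 */
static int
logged_error_is_ebusy(int logged)
{
	return logged == EBUSY;
}

/*
 * Hypothetical helper: the C library's message text for a logged
 * error number.  The exact wording differs between platforms.
 */
static const char *
logged_error_text(int logged)
{
	const char *msg = strerror(logged);

	assert(msg != NULL);
	return msg;
}
```

Calling logged_error_is_ebusy(16) is expected to return nonzero on
NetBSD, and logged_error_text(16) gives the matching message string.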
--
Greg A. Woods <gwoods@acm.org>
Kelowna, BC +1 250 762-7675 RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com> Avoncote Farms <woods@avoncote.ca>
From: "Greg A. Woods" <woods@planix.ca>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Thu, 06 May 2021 15:44:15 -0700
At Wed, 5 May 2021 15:51:08 +0200, "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de> wrote:
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
>
> Please report if this fix really works.
I think this reboot shows definitively that it really works.
I left a couple of processes with their CWD in the filesystem, one being
a nohup'ed sleep and the other being the console login shell, and here
we see /build having to be forcefully unmounted after all but one of the
dm(4) devices detach, with the relevant dm1 (and its underlying dk(4)
and sd(4) devices) detaching only after the unmount:
[ 3868.1813906] syncing disks... done
[ 3868.1913903] unmounted more.local:/vcs from /more/vcs, type nfs
[ 3868.1913903] unmounted more.local:/work from /more/work, type nfs
[ 3868.2514795] unmounted more.local:/home from /more/home, type nfs
[ 3868.2514795] unmounted more.local:/archive from /more/archive, type nfs
[ 3868.2614057] unmount of /build (/dev/mapper/scratch-build) failed with error 16
[ 3868.2614057] unmounted procfs from /proc, type procfs
[ 3868.2614057] unmounted ptyfs from /dev/pts, type ptyfs
[ 3868.2614057] unmounted kernfs from /kern, type kernfs
[ 3868.3313995] unmounted /dev/dk3 from /usr/pkg, type ffs
[ 3868.4114046] unmounted /dev/dk2 from /var, type ffs
[ 3868.5615043] unmount of / (/dev/dk0) failed with error 16
[ 3868.5615043] WARNING: some file systems would not unmount
[ 3868.5615043] unmount of /build (/dev/mapper/scratch-build) failed with error 16
[ 3868.5615043] unmount of / (/dev/dk0) failed with error 16
[ 3868.5615043] WARNING: some file systems would not unmount
[ 3870.2715341] brgphy1: detached
[ 3870.2915786] bnx1: detached
[ 3871.9916481] brgphy0: detached
[ 3872.0117375] bnx0: detached
[ 3872.0117375] pci6: detached
[ 3872.0117375] pci4: detached
[ 3872.0117375] sd2: detached
[ 3872.0117375] cd1: detached
[ 3872.0216581] ppb5: detached
[ 3872.0216581] ppb3: detached
[ 3872.0216581] scsibus2: detached
[ 3872.0216581] scsibus1: detached
[ 3872.0216581] pci5: detached
[ 3872.0216581] pci3: detached
[ 3872.2016539] brgphy2: detached
[ 3872.2216614] bnx2: detached
[ 3872.2316580] ppb4: detached
[ 3872.2316580] ppb2: detached
[ 3872.2316580] uhub6: detached
[ 3872.2316580] cd0: detached
[ 3872.2316580] pci14: detached
[ 3872.2316580] pci7: detached
[ 3872.2316580] pci2: detached
[ 3872.2316580] atapibus0: detached
[ 3872.2316580] uhub4: detached
[ 3872.2316580] uhub2: detached
[ 3872.2316580] uhub1: detached
[ 3872.2316580] uhub0: detached
[ 3872.2316580] com1: detached
[ 3872.2316580] ppb13: detached
[ 3872.2316580] ppb6: detached
[ 3872.2416575] ppb1: detached
[ 3872.2416575] atabus0: detached
[ 3872.2416575] usb3: detached
[ 3872.2416575] usb2: detached
[ 3872.2416575] usb1: detached
[ 3872.2416575] usb0: detached
[ 3872.2416575] pci13: detached
[ 3872.2416575] pci12: detached
[ 3872.2416575] pci11: detached
[ 3872.2416575] pci10: detached
[ 3872.2416575] pci9: detached
[ 3872.2416575] pci1: detached
[ 3872.2416575] uhci3: detached
[ 3872.2416575] uhci2: detached
[ 3872.2416575] uhci1: detached
[ 3872.2416575] uhci0: detached
[ 3872.2416575] ppb12: detached
[ 3872.2416575] pchb7: detached
[ 3872.2416575] pchb6: detached
[ 3872.2416575] pchb5: detached
[ 3872.2416575] pchb4: detached
[ 3872.2416575] pchb3: detached
[ 3872.2416575] pchb2: detached
[ 3872.2416575] pchb1: detached
[ 3872.2416575] ppb11: detached
[ 3872.2416575] ppb10: detached
[ 3872.2416575] ppb9: detached
[ 3872.2416575] ppb8: detached
[ 3872.2416575] ppb0: detached
[ 3872.2416575] pchb0: detached
[ 3872.2416575] ipmi_acpi0: detached
[ 3872.2416575] dm8: detached
[ 3872.2416575] dm7: detached
[ 3872.2416575] dm6: detached
[ 3872.2416575] dm5: detached
[ 3872.2416575] dm4: detached
[ 3872.2416575] dm3: detached
[ 3872.2416575] dm2: detached
[ 3872.2416575] cgd3: detached
[ 3872.2416575] vnd3: detached
[ 3872.2416575] cgd2: detached
[ 3872.2416575] vnd2: detached
[ 3872.2416575] cgd1: detached
[ 3872.2416575] vnd1: detached
[ 3872.2416575] cgd0: detached
[ 3872.2416575] vnd0: detached
[ 3872.2416575] dk4 at sd0 (LVM-vg0) deleted
[ 3872.2416575] dk4: detached
[ 3872.2416575] dk3 at sd0 (/usr/pkg) deleted
[ 3872.2416575] dk3: detached
[ 3872.2416575] dk2 at sd0 (/var) deleted
[ 3872.2416575] dk2: detached
[ 3872.2416575] dk1 at sd0 (swap) deleted
[ 3872.2416575] dk1: detached
[ 3872.2416575] unmount of /build (/dev/mapper/scratch-build) failed with error 16
[ 3872.2416575] unmount of / (/dev/dk0) failed with error 16
[ 3872.2416575] WARNING: some file systems would not unmount
[ 3872.2416575] forcefully unmounting /dev/mapper/scratch-build from /build...
[ 3872.2516597] forcefully unmounted /dev/mapper/scratch-build from /build, type ffs
[ 3872.2516597] unmount of / (/dev/dk0) failed with error 16
[ 3872.2516597] WARNING: some file systems would not unmount
[ 3872.2516597] dm1: detached
[ 3872.2516597] dk5 at sd1 (scratchdisk0) deleted
[ 3872.2516597] dk5: detached
[ 3872.2516597] unmount of / (/dev/dk0) failed with error 16
[ 3872.2516597] WARNING: some file systems would not unmount
[ 3872.2516597] sd1: detached
[ 3872.2516597] unmount of / (/dev/dk0) failed with error 16
[ 3872.2516597] WARNING: some file systems would not unmount
[ 3872.2516597] forcefully unmounting /dev/dk0 from /...
[ 3872.2616634] forcefully unmounted /dev/dk0 from /, type ffs
[ 3872.2616634] unmounting done
[ 3872.2616634] turning off swap... done
[ 3872.2616634] dk0 at sd0 (/) deleted
[ 3872.2616634] sd0: detached
[ 3872.2616634] scsibus0: detached
[ 3872.4517420] mfi0: detached
[ 3872.4517420] pci8: detached
[ 3872.4517420] ppb7: detached
[ 3872.4517420] unmounting done
[ 3872.4517420] turning off swap... done
[ 3872.4517420] rebooting...
(XEN) [2021-05-06 21:54:10.835] Hardware Dom0 shutdown: rebooting machine
--
Greg A. Woods <gwoods@acm.org>
Kelowna, BC +1 250 762-7675 RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com> Avoncote Farms <woods@avoncote.ca>
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/54969 CVS commit: src/sys/dev/dm
Date: Fri, 7 May 2021 09:54:43 +0000
Module Name: src
Committed By: hannken
Date: Fri May 7 09:54:43 UTC 2021
Modified Files:
src/sys/dev/dm: device-mapper.c
Log Message:
Track the number of cdev and bdev opens and fail dm_detach()
on open devices unless detach is forced.
PR kern/54969 (Disk cache is no longer flushed on shutdown)
To generate a diff of this commit:
cvs rdiff -u -r1.61 -r1.62 src/sys/dev/dm/device-mapper.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
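The rule described in the log message above can be modelled outside the
kernel. The following is only a user-space sketch of the idea in the
patch posted earlier in this PR, not the committed code: the struct, the
enum and MODEL_DETACH_FORCE are hypothetical stand-ins for struct disk,
the S_IFCHR/S_IFBLK modes and DETACH_FORCE, and the dk_openlock locking
is left out. As in the patch, each kind of open is recorded as a single
bit rather than as a count.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the S_IFCHR / S_IFBLK open modes. */
enum open_kind { OPEN_CHR, OPEN_BLK };

/* Hypothetical stand-in for the DETACH_FORCE flag. */
#define MODEL_DETACH_FORCE 0x01

/* Hypothetical model of the three open-mask fields used by the patch. */
struct disk_model {
	unsigned copenmask;	/* bit set while the character device is open */
	unsigned bopenmask;	/* bit set while the block device is open */
	unsigned openmask;	/* union of the two masks above */
};

/* Record an open, as dmopen() does in the patch. */
static void
model_open(struct disk_model *d, enum open_kind kind)
{
	assert(d != NULL);
	if (kind == OPEN_CHR)
		d->copenmask |= 1;
	else
		d->bopenmask |= 1;
	d->openmask = d->copenmask | d->bopenmask;
}

/* Record a close, as dmclose() does in the patch. */
static void
model_close(struct disk_model *d, enum open_kind kind)
{
	assert(d != NULL);
	if (kind == OPEN_CHR)
		d->copenmask &= ~1u;
	else
		d->bopenmask &= ~1u;
	d->openmask = d->copenmask | d->bopenmask;
}

/* Refuse to detach an open device unless the detach is forced. */
static int
model_detach(const struct disk_model *d, int flags)
{
	assert(d != NULL);
	if (d->openmask != 0 && (flags & MODEL_DETACH_FORCE) == 0)
		return EBUSY;
	return 0;
}
```

In this model a mounted filesystem corresponds to a block-device open, so
model_detach() returns EBUSY until the matching model_close(), which is
the ordering seen in the post-patch shutdown log: dm1 detaches only
after /build has been unmounted.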
State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 25 Jul 2021 02:35:30 +0000
State-Changed-Why:
Is this one finished? Looks like it
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@netbsd.org
Cc: dholland@NetBSD.org, hannken@NetBSD.org
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Sun, 25 Jul 2021 12:40:08 +0300
dholland@NetBSD.org wrote:
> Is this one finished? Looks like it
Going by the presence of "wd0: detached" messages in the logs of the
TNF i386 testbed, it was fixed on 2021-02-16, likely by this commit:
2021.02.16.09.56.32 hannken src/sys/kern/vfs_mount.c 1.86
2021.02.16.09.56.32 hannken src/sys/uvm/uvm_swap.c 1.201
--
Andreas Gustafsson, gson@gson.org
State-Changed-From-To: feedback->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Sun, 25 Jul 2021 09:46:13 +0000
State-Changed-Why:
The "wd0: detached" messages appear once more.
>Unformatted: