NetBSD Problem Report #54969

From gson@gson.org  Sun Feb 16 09:38:34 2020
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 745BF1A9213
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 16 Feb 2020 09:38:34 +0000 (UTC)
Message-Id: <20200216093828.779AB253FA3@guava.gson.org>
Date: Sun, 16 Feb 2020 11:38:28 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: Disk cache is no longer flushed on shutdown
X-Send-Pr-Version: 3.95

>Number:         54969
>Category:       kern
>Synopsis:       Disk cache is no longer flushed on shutdown
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Feb 16 09:40:00 +0000 2020
>Closed-Date:    
>Last-Modified:  Fri May 07 09:55:01 +0000 2021
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date >= 2017.08.21.09.00.21, and -9
>Organization:

>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:

The disk controller on one of my systems is logging an error message
on every power-on, indicating that the controller's battery-backed
cache still contains data from the previous time the system was
powered on:

  POST Error: 1792-Drive Array Reports Valid Data Found in Array Accelerator

This means that from the controller's perspective, the system was not
shut down cleanly.  But the system has in fact been shut down cleanly,
at least as far as the kernel is concerned, by running "halt -p".

By adding some printfs to the sd(4) driver, I found that sd_flush() is
not being called during the shutdown, and neither is sd_lastclose().

The serial console shows "detached" messages from a large number of
devices including the non-root disk sd1 (which was never mounted), but
the root disk sd0 is conspicuously absent:

  Feb  9 05:32:24 hostname halt: halted by root
  Feb  9 05:32:24 hostname syslogd[167]: Exiting on signal 15
  [ 8086.8109260] syncing disks... done
  [ 8086.9609971] sd1: detached
  [ 8086.9910100] cd0: detached
  [ 8087.0210241] brgphy3: detached
  [ 8087.0610430] brgphy2: detached
  [ 8087.0910569] brgphy1: detached
  [ 8087.1310757] brgphy0: detached
  [ 8087.1710944] atapibus0: detached
  [ 8087.2011089] uhub5: detached
  [ 8087.2411278] uhub3: detached
  [ 8087.2711418] uhub2: detached
  [ 8087.3111606] uhub1: detached
  [ 8087.3411746] com1: detached
  [ 8087.4312167] bnx3: detached
  [ 8087.5212591] bnx2: detached
  [ 8087.6113014] bnx1: detached
  [ 8087.7013435] bnx0: detached
  [ 8087.7313577] atabus1: detached
  [ 8087.7757913] atabus0: detached
  [ 8087.8122505] usb5: detached
  [ 8087.8455835] usb4: detached
  [ 8087.8789168] usb2: detached
  [ 8087.9122492] usb1: detached
  [ 8087.9455816] pci11: detached
  [ 8087.9799570] pci10: detached
  [ 8088.0143320] pci9: detached
  [ 8088.0476650] pci8: detached
  [ 8088.0809989] pci7: detached
  [ 8088.1143312] pci6: detached
  [ 8088.1476642] pci5: detached
  [ 8088.1809984] pci4: detached
  [ 8088.2143316] pci3: detached
  [ 8088.2476648] pci2: detached
  [ 8088.2809972] sysbeep0: detached
  [ 8088.3184975] midi0: detached
  [ 8088.3516482] ehci0: detached
  [ 8088.3816623] uhci4: detached
  [ 8088.4216811] uhci2: detached
  [ 8088.4516952] uhci1: detached
  [ 8088.4817099] ppb10: detached
  [ 8088.5217278] pchb12: detached
  [ 8088.5517425] pchb11: detached
  [ 8088.5917607] pchb10: detached
  [ 8088.6217747] pchb9: detached
  [ 8088.6617935] pchb8: detached
  [ 8088.6918073] pchb7: detached
  [ 8088.7318261] pchb6: detached
  [ 8088.7618402] pchb5: detached
  [ 8088.8018594] pchb4: detached
  [ 8088.8318730] pchb3: detached
  [ 8088.8618871] pchb2: detached
  [ 8088.9019059] pchb1: detached
  [ 8088.9319202] ppb9: detached
  [ 8088.9719389] ppb8: detached
  [ 8089.0019528] ppb7: detached
  [ 8089.0319669] ppb6: detached
  [ 8089.0719858] ppb5: detached
  [ 8089.1019997] ppb4: detached
  [ 8089.1320139] ppb3: detached
  [ 8089.1720326] ppb2: detached
  [ 8089.2020466] ppb1: detached
  [ 8089.2320607] pchb0: detached

  [ 8089.2820849] The operating system has halted.
  [ 8089.2820849] Please press any key to reboot.

This is an HP DL360 G7 server with a P410i disk controller and the
BBWC option.  A full console log is at:

  http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.02.15.12.45.05/test.log

By grepping historic logs from the TNF i386 testbed for the
corresponding "wd0: detached" messages, I found that they were present
until the following commit, and absent thereafter:

  2017.08.21.09.00.21 hannken src/sys/kern/vfs_mount.c 1.67
  2017.08.21.09.00.21 hannken src/sys/kern/vfs_vnode.c 1.98
  2017.08.21.09.00.21 hannken src/sys/sys/vnode_impl.h 1.16

The commit message was "Change forced unmount to revert open device
vnodes to anonymous devices."
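
The log check described above can be approximated in a few lines of C.
This is only a sketch: log_has_detached() is a hypothetical helper, not
part of any NetBSD tool, and it assumes console lines of the form
"[ timestamp] name: detached" as in the excerpt above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if the console log text contains a "<dev>: detached"
 * message for the given device name, e.g. "sd0" or "wd0".
 * (Hypothetical helper written for this illustration.)
 */
bool
log_has_detached(const char *log, const char *dev)
{
	char needle[64];
	const char *p;
	int n;

	assert(log != NULL && dev != NULL);
	n = snprintf(needle, sizeof(needle), "%s: detached", dev);
	if (n < 0 || (size_t)n >= sizeof(needle))
		return false;	/* device name too long for this sketch */

	for (p = log; (p = strstr(p, needle)) != NULL; p++) {
		/* Reject matches inside a longer name, e.g. "d0" in "sd0". */
		if (p == log || !isalnum((unsigned char)p[-1]))
			return true;
	}
	return false;
}
```

Run against the shutdown log quoted earlier, such a check would report
sd1 and cd0 as detached and sd0 as missing.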

This issue looks like it has the potential to cause data loss.  For
example, the HP system will presumably lose the cached data if powered
off long enough to drain the BBWC battery.  The -9 branch is also
affected.

>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54969: Disk cache is no longer flushed on shutdown
Date: Sun, 16 Feb 2020 15:54:14 +0100

 --Apple-Mail=_3DAA2E36-0C99-48D6-8DC3-621AEF12D17E
 Content-Transfer-Encoding: 7bit
 Content-Type: text/plain;
 	charset=us-ascii

 > On 16. Feb 2020, at 10:40, Andreas Gustafsson <gson@gson.org> wrote:
 <snip>
 > By adding some printfs to the sd(4) driver, I found that sd_flush() is
 > not being called during the shutdown, and neither is sd_lastclose().

 On a first look I don't see an obvious problem with the commit.

 Could you add some printfs to sddetach() and see if it gets
 called during shutdown?

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig

 --Apple-Mail=_3DAA2E36-0C99-48D6-8DC3-621AEF12D17E--

From: Andreas Gustafsson <gson@gson.org>
To: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/54969: Disk cache is no longer flushed on shutdown
Date: Mon, 17 Feb 2020 20:15:43 +0200

 J. Hannken-Illjes wrote:
 >  Could you add some printfs to sddetach() and see if it gets
 >  called during shutdown?

 Done.  sddetach() is called for both sd0 and sd1, but the call for sd0
 returns early because disk_begindetach() returns EBUSY.
 -- 
 Andreas Gustafsson, gson@gson.org
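
The early return can be pictured with a small user-space model.  All
names here (model_disk, model_begindetach, MODEL_DETACH_FORCE) are
hypothetical.  The sketch relies only on what the message states, that
the detach is refused with EBUSY, plus the assumption that the refusal
is caused by a partition that is still open.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for per-disk state (hypothetical, not the kernel's). */
struct model_disk {
	uint32_t openmask;	/* one bit per open partition */
	bool	 detached;	/* set once the detach has completed */
};

#define MODEL_DETACH_FORCE	0x01	/* hypothetical flag for this sketch */

/* Return 0 if the detach may proceed, EBUSY if something is still open. */
int
model_begindetach(const struct model_disk *dk, int flags)
{
	assert(dk != NULL);
	if (dk->openmask != 0 && (flags & MODEL_DETACH_FORCE) == 0)
		return EBUSY;
	return 0;
}

/* Detach the disk, or return early with the error from begindetach. */
int
model_detach(struct model_disk *dk, int flags)
{
	int error;

	if ((error = model_begindetach(dk, flags)) != 0)
		return error;	/* early return: nothing is flushed or logged */
	dk->openmask = 0;
	dk->detached = true;
	return 0;
}
```

In this model a disk with any bit set in openmask behaves like sd0 in
the report, and a disk with an empty mask behaves like sd1.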

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: Andreas Gustafsson <gson@gson.org>
Subject: Re: kern/54969: Disk cache is no longer flushed on shutdown
Date: Wed, 19 Feb 2020 14:31:20 +0100

 --Apple-Mail=_092968F0-61F9-472B-9D3A-FC2BA4D40A76
 Content-Type: multipart/mixed;
 	boundary="Apple-Mail=_CA4E0C13-7F83-4B6E-9110-F0183169C8C4"


 --Apple-Mail=_CA4E0C13-7F83-4B6E-9110-F0183169C8C4
 Content-Transfer-Encoding: 7bit
 Content-Type: text/plain;
 	charset=us-ascii

 Tried with sd0@vioscsi0 under qemu with some printfs and got:

 $ shutdown -p now
 ...
 unmounting 0xffff9d635e36a008 / (/dev/sd0a)...
 forcefully unmounting / (/dev/sd0a)...
 sdclose: dev=0x400 (unit 0)
 dk_close: dev=0x400 error=0 openmask=c0 b0
 sd0: detached
 scsibus0: detached

 With "halt -p" I see the problem from this PR as the swap
 device sd0b doesn't get closed.

 Please report with the attached diff holding the printfs ...
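
The dev values in the output above can be decoded by hand.  The sketch
below assumes the dev_t bit layout used by NetBSD's major() and minor()
macros and 16 partitions per disk as on amd64; the model_* names are
made up for illustration.  Under those assumptions dev=0x400 is
partition a of unit 0, and the swap partition sd0b would be dev 0x401,
occupying bit 1 of an open mask.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: partitions per disk on amd64. */
#define MODEL_MAXPARTITIONS	16

/* Major number: bits 8..19 of the device number (assumed layout). */
uint32_t
model_major(uint64_t dev)
{
	return ((uint32_t)dev & 0x000fff00) >> 8;
}

/* Minor number: bits 20..31 shifted down, combined with bits 0..7. */
uint32_t
model_minor(uint64_t dev)
{
	return (((uint32_t)dev & 0xfff00000) >> 12) |
	    ((uint32_t)dev & 0x000000ff);
}

/* Disk unit and partition index derived from the minor number. */
uint32_t
model_unit(uint64_t dev)
{
	return model_minor(dev) / MODEL_MAXPARTITIONS;
}

uint32_t
model_part(uint64_t dev)
{
	return model_minor(dev) % MODEL_MAXPARTITIONS;
}

/* Partition letter as used in names such as sd0a or sd0b. */
char
model_part_letter(uint64_t dev)
{
	uint32_t part = model_part(dev);

	assert(part < 26);
	return (char)('a' + part);
}

/* Bit that the partition would occupy in an open mask. */
uint32_t
model_part_bit(uint64_t dev)
{
	return UINT32_C(1) << model_part(dev);
}
```

With these helpers, model_unit(0x400) is 0, matching the "(unit 0)" in
the sdclose line.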

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig

 --Apple-Mail=_CA4E0C13-7F83-4B6E-9110-F0183169C8C4
 Content-Disposition: attachment;
 	filename=sd.c.diff
 Content-Type: application/octet-stream;
 	x-unix-mode=0644;
 	name="sd.c.diff"
 Content-Transfer-Encoding: 7bit

 diff -r 9cba3dc7e065 sys/dev/scsipi/sd.c
 --- sys/dev/scsipi/sd.c	Wed Feb 19 09:01:50 2020 +0100
 +++ sys/dev/scsipi/sd.c	Wed Feb 19 14:29:59 2020 +0100
 @@ -520,6 +520,7 @@ sdopen(dev_t dev, int flag, int fmt, str
  		return (ENXIO);
  	dksc = &sd->sc_dksc;

 +printf("sdopen: dev=0x%"PRIx64" (unit %d)\n", dev, SDUNIT(dev));
  	if (!device_is_active(dksc->sc_dev))
  		return (ENODEV);

 @@ -541,6 +542,7 @@ sdopen(dev_t dev, int flag, int fmt, str
  	}

  	error = dk_open(dksc, dev, flag, fmt, l);
 +printf("dk_open: dev=0x%"PRIx64" error=%d openmask=c%x b%x\n", dev, error, dksc->sc_dkdev.dk_copenmask, dksc->sc_dkdev.dk_bopenmask);

  	SC_DEBUG(periph, SCSIPI_DB3, ("open complete\n"));

 @@ -598,11 +600,14 @@ sdclose(dev_t dev, int flag, int fmt, st
  	struct dk_softc *dksc;
  	int unit;

 +printf("sdclose: dev=0x%"PRIx64" (unit %d)\n", dev, SDUNIT(dev));
  	unit = SDUNIT(dev);
  	sd = device_lookup_private(&sd_cd, unit);
  	dksc = &sd->sc_dksc;

 -	return dk_close(dksc, dev, flag, fmt, l);
 +	int error = dk_close(dksc, dev, flag, fmt, l);
 +printf("dk_close: dev=0x%"PRIx64" error=%d openmask=c%x b%x\n", dev, error, dksc->sc_dkdev.dk_copenmask, dksc->sc_dkdev.dk_bopenmask);
 +	return error;
  }

  /*

 --Apple-Mail=_CA4E0C13-7F83-4B6E-9110-F0183169C8C4--

 --Apple-Mail=_092968F0-61F9-472B-9D3A-FC2BA4D40A76--

From: Andreas Gustafsson <gson@gson.org>
To: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/54969: Disk cache is no longer flushed on shutdown
Date: Wed, 19 Feb 2020 16:37:09 +0200

 J. Hannken-Illjes wrote:
 > Tried with sd0@vioscsi0 under qemu with some printfs and got:
 > 
 > $ shutdown -p now
 > ...
 > unmounting 0xffff9d635e36a008 / (/dev/sd0a)...
 > forcefully unmounting / (/dev/sd0a)...
 > sdclose: dev=0x400 (unit 0)
 > dk_close: dev=0x400 error=0 openmask=c0 b0
 > sd0: detached
 > scsibus0: detached
 > 
 > With "halt -p" I see the problem from this PR as the swap
 > device sd0b doesn't get closed.
 > 
 > Please report with the attached diff holding the printfs ...

 I can do that, but I'm not sure why it's needed as you seem to have
 already reproduced the problem locally using "halt -p".

 I may have caused some confusion by using the word "shutdown" in the
 PR subject - sorry about that.  I meant it as a reference to the
 general action of shutting the system down, not as a reference to the
 specific command shutdown(8).  Also, I said "halt -p" in the PR, but
 checking the logs, I see that the actual command used was a plain
 "halt" without the -p option.
 -- 
 Andreas Gustafsson, gson@gson.org

From: Andreas Gustafsson <gson@gson.org>
To: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/54969: Disk cache is no longer flushed on shutdown
Date: Wed, 19 Feb 2020 21:39:48 +0200

 J. Hannken-Illjes wrote:
 > Please report with the attached diff holding the printfs ...

 Console output from a test run with the patch is now at

   http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.02.19.13.32.40/test.log

 -- 
 Andreas Gustafsson, gson@gson.org

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: Andreas Gustafsson <gson@gson.org>
Subject: Re: kern/54969: Disk cache is no longer flushed on shutdown
Date: Thu, 20 Feb 2020 09:50:48 +0100

 --Apple-Mail=_E794A460-544A-495A-93E2-ABF6592D6DBE
 Content-Type: multipart/mixed;
 	boundary="Apple-Mail=_97B3B872-94A5-4D88-995F-1564B709CEC8"


 --Apple-Mail=_97B3B872-94A5-4D88-995F-1564B709CEC8
 Content-Transfer-Encoding: 7bit
 Content-Type: text/plain;
 	charset=us-ascii

 > On 19. Feb 2020, at 15:40, Andreas Gustafsson <gson@gson.org> wrote:
 <snip>
 > Also, I said "halt -p" in the PR, but
 > checking the logs, I see that the actual command used was a plain
 > "halt" without the -p option.

 Sorry, I missed that.

 The attached diff restores the previous behaviour and destroys
 all device vnodes once the mountlist is empty.

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig
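
The effect of destroying device vnodes once the mount list is empty
can be modelled in user space.  The sketch uses hypothetical model_*
types rather than the kernel's vnode interfaces and only captures the
decision logic: nothing happens while any file system is mounted, and
afterwards only character and block device vnodes are destroyed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vnode types for this model. */
enum model_vtype { MODEL_VREG, MODEL_VDIR, MODEL_VCHR, MODEL_VBLK };

struct model_vnode {
	enum model_vtype type;
	bool gone;		/* set once the vnode has been destroyed */
};

/*
 * If no file systems remain mounted, destroy every character and
 * block device vnode in the array.  Returns the number destroyed.
 */
size_t
model_destroy_device_vnodes(size_t nmounts, struct model_vnode *vnodes,
    size_t nvnodes)
{
	size_t i, destroyed = 0;

	assert(vnodes != NULL || nvnodes == 0);
	if (nmounts != 0)
		return 0;	/* mount list not empty: leave vnodes alone */

	for (i = 0; i < nvnodes; i++) {
		if (vnodes[i].gone)
			continue;
		if (vnodes[i].type == MODEL_VCHR ||
		    vnodes[i].type == MODEL_VBLK) {
			vnodes[i].gone = true;
			destroyed++;
		}
	}
	return destroyed;
}
```

Destroying the last device vnode that refers to a disk is what, in this
picture, lets the disk see its final close and detach.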


 --Apple-Mail=_97B3B872-94A5-4D88-995F-1564B709CEC8
 Content-Disposition: attachment;
 	filename=vfs_mount.c.diff
 Content-Type: application/octet-stream;
 	x-unix-mode=0644;
 	name="vfs_mount.c.diff"
 Content-Transfer-Encoding: 7bit

 diff -r 9cba3dc7e065 sys/kern/vfs_mount.c
 --- sys/kern/vfs_mount.c	Wed Feb 19 09:01:50 2020 +0100
 +++ sys/kern/vfs_mount.c	Thu Feb 20 09:47:23 2020 +0100
 @@ -114,6 +114,8 @@ static struct vnode *vfs_vnode_iterator_
  /* Root filesystem. */
  vnode_t *			rootvnode;

 +extern struct mount		*dead_rootmount;
 +
  /* Mounted filesystem list. */
  static TAILQ_HEAD(mountlist, mountlist_entry) mountlist;
  static kmutex_t			mountlist_lock __cacheline_aligned;
 @@ -1014,6 +1016,7 @@ bool
  vfs_unmountall1(struct lwp *l, bool force, bool verbose)
  {
  	struct mount *mp;
 +	mount_iterator_t *iter;
  	bool any_error = false, progress = false;
  	uint64_t gen;
  	int error;
 @@ -1048,6 +1051,24 @@ vfs_unmountall1(struct lwp *l, bool forc
  	if (any_error && verbose) {
  		printf("WARNING: some file systems would not unmount\n");
  	}
 +
 +	/* If the mountlist is empty destroy anonymous device vnodes. */
 +	mountlist_iterator_init(&iter);
 +	if (mountlist_iterator_next(iter) == NULL) {
 +		struct vnode_iterator *marker;
 +		vnode_t *vp;
 +
 +		vfs_vnode_iterator_init(dead_rootmount, &marker);
 +		while ((vp = vfs_vnode_iterator_next(marker, NULL, NULL))) {
 +			if (vp->v_type == VCHR || vp->v_type == VBLK)
 +				vgone(vp);
 +			else
 +				vrele(vp);
 +		}
 +		vfs_vnode_iterator_destroy(marker);
 +	}
 +	mountlist_iterator_destroy(iter);
 +
  	return progress;
  }


 --Apple-Mail=_97B3B872-94A5-4D88-995F-1564B709CEC8--

 --Apple-Mail=_E794A460-544A-495A-93E2-ABF6592D6DBE--

From: Andreas Gustafsson <gson@gson.org>
To: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/54969: Disk cache is no longer flushed on shutdown
Date: Thu, 20 Feb 2020 17:27:12 +0200

 J. Hannken-Illjes wrote:
 > The attached diff restores the previous behaviour and destroys
 > all device vnodes once the mountlist is empty.

 With this patch, the "sd0: detached" message appears:

   http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.02.20.08.31.17/test.log

 -- 
 Andreas Gustafsson, gson@gson.org

From: Andreas Gustafsson <gson@gson.org>
To: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/54969: Disk cache is no longer flushed on shutdown
Date: Sun, 12 Apr 2020 15:48:44 +0300

 On Feb 20, J. Hannken-Illjes wrote:
 >  The attached diff restores the previous behaviour and destroys
 >  all device vnodes once the mountlist is empty.

 Since the patch appears to work, could you commit it?
 -- 
 Andreas Gustafsson, gson@gson.org

From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54969 CVS commit: src/sys/kern
Date: Sun, 19 Apr 2020 13:26:18 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Sun Apr 19 13:26:18 UTC 2020

 Modified Files:
 	src/sys/kern: vfs_mount.c

 Log Message:
 Destroy anonymous device vnodes on reboot once the last file system
 got unmounted and the mount list is empty.

 PR kern/54969: Disk cache is no longer flushed on shutdown


 To generate a diff of this commit:
 cvs rdiff -u -r1.78 -r1.79 src/sys/kern/vfs_mount.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->needs-pullups
State-Changed-By: gson@NetBSD.org
State-Changed-When: Mon, 20 Apr 2020 12:31:15 +0000
State-Changed-Why:
Confirmed fixed in -current, should be pulled up to -9.


State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: gson@NetBSD.org
State-Changed-When: Mon, 20 Apr 2020 14:21:52 +0000
State-Changed-Why:
Pullup to -9 requested.


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54969 CVS commit: [netbsd-9] src/sys/kern
Date: Wed, 22 Apr 2020 18:05:11 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Wed Apr 22 18:05:11 UTC 2020

 Modified Files:
 	src/sys/kern [netbsd-9]: vfs_mount.c

 Log Message:
 Pull up following revision(s) (requested by gson in ticket #839):

 	sys/kern/vfs_mount.c: revision 1.79

 Destroy anonymous device vnodes on reboot once the last file system
 got unmounted and the mount list is empty.

 PR kern/54969: Disk cache is no longer flushed on shutdown


 To generate a diff of this commit:
 cvs rdiff -u -r1.70 -r1.70.4.1 src/sys/kern/vfs_mount.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Thu, 30 Apr 2020 06:59:36 +0000
State-Changed-Why:
Pullup done.


From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54969 CVS commit: src/sys/kern
Date: Fri, 1 May 2020 08:45:01 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Fri May  1 08:45:01 UTC 2020

 Modified Files:
 	src/sys/kern: vfs_mount.c

 Log Message:
 Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
 disks before the raid:

   forcefully unmounting / (/dev/raid0a)...
   sd1: detached
   sd0: detached
   raid0: cache flush to component /dev/sd0a failed.
   raid0: cache flush to component /dev/sd1a failed.
   fatal page fault in supervisor mode
   Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

 Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


 To generate a diff of this commit:
 cvs rdiff -u -r1.81 -r1.82 src/sys/kern/vfs_mount.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
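
The failure in the log message above is an ordering problem: the
component disks were detached while raid0 still needed them.  As an
illustration only (this is not the approach taken in the tree, and all
names are hypothetical), a detach pass that respects such layering
could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record for this model. */
struct model_dev {
	const char *name;
	int	    user;	/* index of the device layered on top, or -1 */
	bool	    detached;
};

/*
 * Detach devices so that a device goes away only after the device
 * layered on top of it.  The indices of detached devices are stored
 * in order[]; the return value is the number of devices detached.
 */
size_t
model_detach_all(struct model_dev *devs, size_t n, size_t *order)
{
	size_t done = 0;
	bool progress = true;

	assert(devs != NULL && order != NULL);
	while (done < n && progress) {
		progress = false;
		for (size_t i = 0; i < n; i++) {
			int u = devs[i].user;

			if (devs[i].detached)
				continue;
			assert(u < (int)n);
			if (u >= 0 && !devs[u].detached)
				continue;	/* still used by a layered device */
			devs[i].detached = true;
			order[done++] = i;
			progress = true;
		}
	}
	return done;
}
```

For the configuration in the log (sd0 and sd1 as components of raid0),
this model detaches raid0 first and the two sd devices afterwards.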

State-Changed-From-To: closed->open
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Fri, 01 May 2020 09:00:43 +0000
State-Changed-Why:
Fix reverted -- back to start ...


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54969 CVS commit: [netbsd-9] src/sys/kern
Date: Fri, 1 May 2020 11:54:53 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Fri May  1 11:54:53 UTC 2020

 Modified Files:
 	src/sys/kern [netbsd-9]: vfs_mount.c

 Log Message:
 Pull up following revision(s) (requested by hannken in ticket #881):

 	sys/kern/vfs_mount.c: revision 1.82

 Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
 disks before the raid:

  forcefully unmounting / (/dev/raid0a)...
  sd1: detached
  sd0: detached
  raid0: cache flush to component /dev/sd0a failed.
  raid0: cache flush to component /dev/sd1a failed.
  fatal page fault in supervisor mode
  Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

 Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


 To generate a diff of this commit:
 cvs rdiff -u -r1.70.4.1 -r1.70.4.2 src/sys/kern/vfs_mount.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: =?UTF-8?B?SmFyb23DrXIgRG9sZcSNZWs=?= <jaromir.dolecek@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: hannken@netbsd.org
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Fri, 17 Jul 2020 12:35:50 +0200

 Any update on this? Rediscovered this still open PR when searching for
 bnx(4) PRs.

From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@netbsd.org
Cc: hannken@netbsd.org
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Fri, 17 Jul 2020 15:33:11 +0300

 Jaromir Dolecek wrote:
 >  Any update on this?

 It's still broken as can be seen from the lack of an "sd0: detached"
 message among the shutdown console messages at the end of

   http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.07.16.18.39.19/test.log

 See also 55393, "System booted from USB panics on shutdown".
 -- 
 Andreas Gustafsson, gson@gson.org

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: =?utf-8?B?SmFyb23DrXIgRG9sZcSNZWs=?= <jaromir.dolecek@gmail.com>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>,
 hannken@netbsd.org
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Fri, 17 Jul 2020 17:11:18 +0200

 --Apple-Mail=_35FDBF4E-E008-4953-A23F-733009D2C106
 Content-Transfer-Encoding: quoted-printable
 Content-Type: text/plain;
 	charset=utf-8

 > On 17. Jul 2020, at 12:35, Jaromír Doleček <jaromir.dolecek@gmail.com> wrote:
 >
 > Any update on this? Rediscovered this still open PR when searching for
 > bnx(4) PRs.

 The current behaviour is the result of a longer discussion
 "Fixing swap1_stop" in 2017, starting at

 http://mail-index.netbsd.org/current-users/2017/08/03/msg032129.html

 The submitter wants to halt without disabling swap and the
 unclosed swap device prevents the root device from closing.

 This is intended and using "shutdown" instead of "halt" works.

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig

 --Apple-Mail=_35FDBF4E-E008-4953-A23F-733009D2C106--

From: =?UTF-8?B?SmFyb23DrXIgRG9sZcSNZWs=?= <jaromir.dolecek@gmail.com>
To: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, hannken@netbsd.org
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Fri, 17 Jul 2020 23:44:34 +0200

 On Fri, 17 Jul 2020 at 17:11, J. Hannken-Illjes
 <hannken@eis.cs.tu-bs.de> wrote:
 >
 > > On 17. Jul 2020, at 12:35, Jaromír Doleček <jaromir.dolecek@gmail.com> wrote:
 > >
 > > Any update on this? Rediscovered this still open PR when searching for
 > > bnx(4) PRs.
 >
 > The current behaviour is the result of a longer discussion
 > "Fixing swap1_stop" in 2017, starting at
 >
 > http://mail-index.netbsd.org/current-users/2017/08/03/msg032129.html
 >
 > The submitter wants to halt without disabling swap and the
 > unclosed swap device prevents the root device from closing.
 >
 > This is intended and using "shutdown" instead of "halt" works.

 I think it's the kernel's responsibility to ensure that all swap is
 disabled and devices are properly closed, regardless of whether
 userland managed to do this, i.e. regardless of whether the shutdown
 was done via 'shutdown' or 'halt'.

 At the time the shutdown code runs and devices are detached, no user
 processes are running any longer. Swap can't be needed to finish
 shutdown: the system state during the late shutdown phase is similar
 to that during boot, which also doesn't have swap.

 How hard would it be to change this so that swap is actually disabled
 before detaching the physical devices?

 Jaromir

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Fri, 31 Jul 2020 18:07:10 +0200

 --Apple-Mail=_812D5C87-9AEB-4C62-8362-49E678AE2E3B
 Content-Type: multipart/mixed;
 	boundary="Apple-Mail=_BAE26654-F34D-4CFA-B364-64074B45AFCA"


 --Apple-Mail=_BAE26654-F34D-4CFA-B364-64074B45AFCA
 Content-Transfer-Encoding: quoted-printable
 Content-Type: text/plain;
 	charset=utf-8

 > On 17. Jul 2020, at 23:45, Jaromír Doleček <jaromir.dolecek@gmail.com> wrote:
 <snip>
 > How hard would it be to change this so the swap is actually disabled
 > before detaching the physical devices?

 The attached diff should do what you want.

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig
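
The ordering discussed in this exchange (unmount everything, then
remove swap, then detach) can be modelled as follows.  The sketch is a
user-space toy with hypothetical names; it assumes the root disk has
its file system on partition a (mask bit 0) and swap on partition b
(mask bit 1), as with sd0a and sd0b earlier in this PR.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical system state for this model. */
struct model_system {
	unsigned nmounts;	/* mounted file systems */
	uint32_t root_openmask;	/* open partitions on the root disk */
	bool	 swap_enabled;	/* swap active on partition b (bit 1) */
};

/*
 * Run a simplified shutdown.  Returns 0 if the root disk could be
 * detached afterwards, EBUSY if a partition was still open.
 */
int
model_shutdown(struct model_system *sys, bool kernel_swapoff)
{
	assert(sys != NULL);

	/* Forced unmount closes the root file system on partition a. */
	sys->nmounts = 0;
	sys->root_openmask &= ~UINT32_C(0x1);

	/* Proposed step: with an empty mount list, remove swap as well. */
	if (kernel_swapoff && sys->nmounts == 0 && sys->swap_enabled) {
		sys->swap_enabled = false;
		sys->root_openmask &= ~UINT32_C(0x2);
	}

	/* The disk can only be detached once nothing holds it open. */
	return sys->root_openmask != 0 ? EBUSY : 0;
}
```

Calling model_shutdown() with kernel_swapoff set to false corresponds
to a plain "halt" in this PR; setting it to true corresponds to the
kernel removing swap itself.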


 --Apple-Mail=_BAE26654-F34D-4CFA-B364-64074B45AFCA
 Content-Disposition: attachment;
 	filename=003_swapoff.diff
 Content-Type: application/octet-stream;
 	x-unix-mode=0644;
 	name="003_swapoff.diff"
 Content-Transfer-Encoding: 7bit

 swapoff

 Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
 locked and referenced across the call to swap_off() and finally
 use it from vfs_unmountall1() to remove swap after unmounting
 the last file system.

 Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)

 diff -r 816c662719ab -r 8e737f88a4e6 sys/kern/vfs_mount.c
 --- sys/kern/vfs_mount.c
 +++ sys/kern/vfs_mount.c
 @@ -94,6 +94,8 @@
  #include <miscfs/genfs/genfs.h>
  #include <miscfs/specfs/specdev.h>

 +#include <uvm/uvm_swap.h>
 +
  enum mountlist_type {
  	ME_MOUNT,
  	ME_MARKER
 @@ -1014,6 +1016,7 @@ bool
  vfs_unmountall1(struct lwp *l, bool force, bool verbose)
  {
  	struct mount *mp;
 +	mount_iterator_t *iter;
  	bool any_error = false, progress = false;
  	uint64_t gen;
  	int error;
 @@ -1048,6 +1051,13 @@ vfs_unmountall1(struct lwp *l, bool forc
  	if (any_error && verbose) {
  		printf("WARNING: some file systems would not unmount\n");
  	}
 +	/* If the mountlist is empty it is time to remove swap. */
 +	mountlist_iterator_init(&iter);
 +	if (mountlist_iterator_next(iter) == NULL) {
 +		uvm_swap_shutdown(l);
 +	}
 +	mountlist_iterator_destroy(iter);
 +
  	return progress;
  }

 diff -r 816c662719ab -r 8e737f88a4e6 sys/uvm/uvm_swap.c
 --- sys/uvm/uvm_swap.c
 +++ sys/uvm/uvm_swap.c
 @@ -1152,27 +1152,23 @@ again:
  			if ((sdp->swd_flags & (SWF_INUSE|SWF_ENABLE)) == 0)
  				continue;
  #ifdef DEBUG
 -			printf("\nturning off swap on %s...",
 -			    sdp->swd_path);
 +			printf("\nturning off swap on %s...", sdp->swd_path);
  #endif
 +			/* Have to lock and reference vnode for swap_off(). */
  			if (vn_lock(vp = sdp->swd_vp, LK_EXCLUSIVE)) {
  				error = EBUSY;
 -				vp = NULL;
 -			} else
 -				error = 0;
 -			if (!error) {
 +			} else {
 +				vref(vp);
  				error = swap_off(l, sdp);
 +				vput(vp);
  				mutex_enter(&uvm_swap_data_lock);
  			}
  			if (error) {
  				printf("stopping swap on %s failed "
  				    "with error %d\n", sdp->swd_path, error);
 -				TAILQ_REMOVE(&spp->spi_swapdev, sdp,
 -				    swd_next);
 +				TAILQ_REMOVE(&spp->spi_swapdev, sdp, swd_next);
  				uvmexp.nswapdev--;
  				swaplist_trim();
 -				if (vp)
 -					vput(vp);
  			}
  			goto again;
  		}

 --Apple-Mail=_BAE26654-F34D-4CFA-B364-64074B45AFCA--

 --Apple-Mail=_812D5C87-9AEB-4C62-8362-49E678AE2E3B--

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@netbsd.org, "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org, gson@gson.org (Andreas Gustafsson)
Subject: re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Sat, 01 Aug 2020 05:38:30 +1000

 nice work.  i like how it's fairly simple.

 you probably need to care about VMSWAP option.

 i'd like to see some tests performed where a system is rebooted
 when it is full of ram and swap -- i fear that this will generate
 useless IO that may hang.  also, an option to skip it, maybe just
 use RB_NOSYNC?

 thanks!


 .mrg.

From: Jason Thorpe <thorpej@me.com>
To: matthew green <mrg@eterna.com.au>
Cc: gnats-bugs@netbsd.org,
 "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>,
 kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 Andreas Gustafsson <gson@gson.org>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Fri, 31 Jul 2020 15:35:00 -0700

 > On Jul 31, 2020, at 12:38 PM, matthew green <mrg@eterna.com.au> wrote:
 >
 > nice work.  i like how it's fairly simple.
 >
 > you probably need to care about VMSWAP option.
 >
 > i'd like to see some tests performed where a system is rebooted
 > when it is full of ram and swap -- i fear that this will generate
 > useless IO that may hang.  also, an option to skip it, maybe just
 > use RB_NOSYNC?

 I think there should be an option to swap_off() to just toss the data,
 rather than page it all back in.  That would solve the problem you're
 concerned about.

 I was also thinking that disabling swap BEFORE unmounting all of the
 file systems would be a good idea, because there might be file-backed
 swap.

 -- thorpej

From: Paul Goyette <paul@whooppee.com>
To: Jason Thorpe <thorpej@me.com>
Cc: matthew green <mrg@eterna.com.au>, gnats-bugs@netbsd.org, 
    "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>, kern-bug-people@netbsd.org, 
    gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
    Andreas Gustafsson <gson@gson.org>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Fri, 31 Jul 2020 15:43:45 -0700 (PDT)

 On Fri, 31 Jul 2020, Jason Thorpe wrote:

 <snip>

 > I was also thinking that disabling swap BEFORE unmounting all of the
 > file systems would be a good idea, because there might be file-backed
 > swap.

 Or two phases of disabling swap, one for file-backed and one for dev-
 backed.
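
The two-phase idea can be sketched in plain userland C, not kernel code. The swap_entry table, the swap_kind enum and sketch_swapoff_kind() are hypothetical names invented for illustration; the real interfaces in uvm_swap.c differ.

```c
#include <stddef.h>

/* Hypothetical classification of a swap area, for illustration only. */
enum swap_kind { SWAP_FILE_BACKED, SWAP_DEV_BACKED };

struct swap_entry {
	const char *path;	/* a regular file or a block device */
	enum swap_kind kind;
	int enabled;		/* nonzero while the area is in use */
};

/*
 * Disable every enabled swap area of the given kind and return how
 * many were turned off.  A real implementation would call into the
 * swap code here instead of just clearing a flag.
 */
size_t
sketch_swapoff_kind(struct swap_entry *tab, size_t n, enum swap_kind kind)
{
	size_t i, off = 0;

	for (i = 0; i < n; i++) {
		if (tab[i].enabled && tab[i].kind == kind) {
			tab[i].enabled = 0;
			off++;
		}
	}
	return off;
}
```

Under this sketch, phase one would call sketch_swapoff_kind() with SWAP_FILE_BACKED before the file systems are unmounted, and phase two would call it with SWAP_DEV_BACKED afterwards.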



 +--------------------+--------------------------+-----------------------+
 | Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:     |
 | (Retired)          | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com     |
 | Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org   |
 +--------------------+--------------------------+-----------------------+

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Sun, 2 Aug 2020 10:22:15 +0200

 --Apple-Mail=_5B077801-13CC-4712-AB89-29D604245885
 Content-Transfer-Encoding: 7bit
 Content-Type: text/plain;
 	charset=us-ascii

 On 31. Jul 2020, at 21:38, matthew green <mrg@eterna.com.au> wrote:

 > nice work.  i like how it's fairly simple.
 > 
 > you probably need to care about VMSWAP option.

 Already present, without VMSWAP sys/uvm/uvm_swapstub.c gets
 built and it has an empty uvm_swap_shutdown().

 > i'd like to see some tests performed where a system is rebooted
 > when it is full of ram and swap -- i fear that this will generate
 > useless IO that may hang.

 When swap gets removed all user processes beside the one running
 halt or reboot are gone and all file systems are unmounted.

 At this time there should be no data backed by swap.

 > also, an option to skip it, maybe just
 > use RB_NOSYNC?

 With RB_NOSYNC it is already skipped, vfs_unmountall1() will
 not run in this case.
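
The ordering described in the answers above can be modelled with a small userland sketch. It only records which steps would run; SKETCH_RB_NOSYNC, sketch_trace and the sketch_* functions are hypothetical stand-ins, not the kernel's own code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag for this sketch; the kernel's RB_NOSYNC lives in <sys/reboot.h>. */
#define SKETCH_RB_NOSYNC 0x04

enum sketch_step { SKETCH_UNMOUNT_ALL, SKETCH_SWAP_OFF };

struct sketch_trace {
	enum sketch_step steps[4];
	size_t n;
};

static void
sketch_record(struct sketch_trace *t, enum sketch_step s)
{
	if (t->n < sizeof(t->steps) / sizeof(t->steps[0]))
		t->steps[t->n++] = s;
}

/* Models uvm_swap_shutdown(): a no-op when built without VMSWAP. */
static void
sketch_swap_shutdown(struct sketch_trace *t, bool have_vmswap)
{
	if (have_vmswap)
		sketch_record(t, SKETCH_SWAP_OFF);
}

/* Models the shutdown path: RB_NOSYNC skips unmounting and swap removal. */
void
sketch_shutdown(int howto, bool have_vmswap, struct sketch_trace *t)
{
	t->n = 0;
	if ((howto & SKETCH_RB_NOSYNC) != 0)
		return;
	sketch_record(t, SKETCH_UNMOUNT_ALL);	/* file systems first */
	sketch_swap_shutdown(t, have_vmswap);	/* then swap */
}
```

Calling sketch_shutdown(0, true, &t) records the unmount step followed by the swap step, while passing SKETCH_RB_NOSYNC records nothing.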

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig

 --Apple-Mail=_5B077801-13CC-4712-AB89-29D604245885--

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Sun, 2 Aug 2020 10:27:12 +0200

 --Apple-Mail=_A1A3C738-4FF9-4FF8-B5AC-2AC3EA63D435
 Content-Transfer-Encoding: quoted-printable
 Content-Type: text/plain;
 	charset=us-ascii

 > On 1. Aug 2020, at 00:35, Jason Thorpe <thorpej@me.com> wrote:
 >
 > I was also thinking that disabling swap BEFORE unmounting all of the
 > file systems would be a good idea, because there might be file-backed
 > swap.

 This brings back the problems the initial thread was about.  The goal
 is not to cleanly shut down swap but to close the devices used for
 swap so they receive a final cache sync.

 It may be impossible to keep all data from tmpfs file systems in RAM ...

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig

 --Apple-Mail=_A1A3C738-4FF9-4FF8-B5AC-2AC3EA63D435--

From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54969 CVS commit: src/sys
Date: Tue, 16 Feb 2021 09:56:32 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Tue Feb 16 09:56:32 UTC 2021

 Modified Files:
 	src/sys/kern: vfs_mount.c
 	src/sys/uvm: uvm_swap.c

 Log Message:
 Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
 locked and referenced across the call to swap_off() and finally
 use it from vfs_unmountall1() to remove swap after unmounting
 the last file system.

 Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


 To generate a diff of this commit:
 cvs rdiff -u -r1.85 -r1.86 src/sys/kern/vfs_mount.c
 cvs rdiff -u -r1.200 -r1.201 src/sys/uvm/uvm_swap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54969 CVS commit: src/sys/uvm
Date: Fri, 19 Feb 2021 13:20:44 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Fri Feb 19 13:20:44 UTC 2021

 Modified Files:
 	src/sys/uvm: uvm_swap.c

 Log Message:
 When turning off swap during reboot we have to lock with LK_RETRY
 as regular files got reclaimed during unmount.

 Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


 To generate a diff of this commit:
 cvs rdiff -u -r1.201 -r1.202 src/sys/uvm/uvm_swap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Greg A. Woods" <woods@planix.ca>
To: NetBSD-current Users's Discussion List <current-users@netbsd.org>,
    NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: NetBSD Users's Discussion List <netbsd-users@netbsd.org>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Thu, 25 Mar 2021 11:14:48 -0700

 --pgp-sign-Multipart_Thu_Mar_25_11:14:44_2021-1
 Content-Type: text/plain; charset=US-ASCII

 So, the reason I jumped from what I thought was a relatively stable
 point in the main -current branch to a more recent version was primarily
 because of what I believe is a problem related to PR# 54969.

 I had noticed long fscks on large filesystems following normal clean
 reboots and started investigating.

 Maybe what remains an issue here is just related to dm(4) partitions, as
 only my /dev/mapper partition(s) have had problems recently.

 Unfortunately though this is still happening with 9.99.81 (2021-03-10).
 (and both with GENERIC and XEN3_DOM0)

 In any case I would say this is the single most critical, serious, and
 important issue in current (and netbsd-9)!  It totally kills system
 reliability (though maybe only if one is using LVM).

 Just for evidence, I added a bunch more printfs to the kernel and rc.d
 scripts (and '-v' flags to fsck, mount, etc.) to help me see for myself
 better what exactly is going on.  This is the console after a truly
 normal complete safe reboot using shutdown(8).

 In this example all processes but the shutdown scripts should be dead.
 The NFS mount on /more/work probably won't complete because I probably
 started shutdown(8) without first doing "cd /" (and without doing "exec
 shutdown"), and my CWD was probably on that NFS mount.  This could maybe
 be fixed by having reboot/halt/powerdown kill its parent process first,
 and perhaps doing chdir("/") too.

 There's no excuse I can find for /build not unmounting though, and
 definitely no excuse for '/' not unmounting either, though later '/'
 is forcefully unmounted, and on reboot '/' appears to be clean.  However
 the forceful unmount of /build doesn't work, and it is NOT clean.

 Note also that /build will sometimes unmount quickly and cleanly if it
 hasn't been dirtied since the last boot, but it seems even creating one
 file can leave it dirty on reboot.

 Maybe what remains an issue here is just related to dm(4) partitions?


 [Wed Mar 24 20:42:56 2021][ 715713.0781096] syncing disks... done
 [Wed Mar 24 20:42:56 2021][ 715713.2081201] unmounted more.local:/vcs from /more/vcs, type nfs
 [Wed Mar 24 20:42:56 2021][ 715713.2481211] unmount of /more/work (more.local:/work) failed with error 16
 [Wed Mar 24 20:42:56 2021][ 715713.2581208] unmounted more.local:/home from /more/home, type nfs
 [Wed Mar 24 20:42:56 2021][ 715713.2581208] unmounted more.local:/archive from /more/archive, type nfs
 [Wed Mar 24 20:42:57 2021][ 715714.0781691] unmount of /build (/dev/mapper/scratch-build) failed with error 16
 [Wed Mar 24 20:42:57 2021][ 715714.0781691] unmounted procfs from /proc, type procfs
 [Wed Mar 24 20:42:57 2021][ 715714.0781691] unmounted ptyfs from /dev/pts, type ptyfs
 [Wed Mar 24 20:42:57 2021][ 715714.0781691] unmounted kernfs from /kern, type kernfs
 [Wed Mar 24 20:42:58 2021][ 715714.6782049] unmounted /dev/dk3 from /usr/pkg, type ffs
 [Wed Mar 24 20:42:58 2021][ 715714.7282939] unmounted /dev/dk2 from /var, type ffs
 [Wed Mar 24 20:42:58 2021][ 715714.8282434] unmount of / (/dev/dk0) failed with error 16
 [Wed Mar 24 20:42:58 2021][ 715714.8282434] WARNING: some file systems would not unmount
 [Wed Mar 24 20:42:58 2021][ 715714.8282434] unmount of /more/work (more.local:/work) failed with error 16
 [Wed Mar 24 20:42:58 2021][ 715714.8282434] unmount of /build (/dev/mapper/scratch-build) failed with error 16
 [Wed Mar 24 20:42:58 2021][ 715714.8282434] unmount of / (/dev/dk0) failed with error 16
 [Wed Mar 24 20:42:58 2021][ 715714.8282434] WARNING: some file systems would not unmount
 [Wed Mar 24 20:42:59 2021][ 715716.5383256] brgphy1: detached

 	[[ ... almost all the rest of devices detach ... ]]

 [Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of /more/work (more.local:/work) failed with error 16
 [Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of /build (/dev/mapper/scratch-build) failed with error 16
 [Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of / (/dev/dk0) failed with error 16
 [Wed Mar 24 20:43:02 2021][ 715718.5284461] WARNING: some file systems would not unmount
 [Wed Mar 24 20:43:02 2021][ 715718.5284461] sd1: detached
 [Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of /more/work (more.local:/work) failed with error 16
 [Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of /build (/dev/mapper/scratch-build) failed with error 16
 [Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of / (/dev/dk0) failed with error 16
 [Wed Mar 24 20:43:02 2021][ 715718.5284461] WARNING: some file systems would not unmount
 [Wed Mar 24 20:43:02 2021][ 715718.5284461] forcefully unmounting more.local:/work from /more/work...
 [Wed Mar 24 20:43:02 2021][ 715718.5284461] forcefully unmounted more.local:/work from /more/work, type nfs
 [Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of /build (/dev/mapper/scratch-build) failed with error 16
 [Wed Mar 24 20:43:02 2021][ 715718.5284461] unmount of / (/dev/dk0) failed with error 16
 [Wed Mar 24 20:43:02 2021][ 715718.5284461] WARNING: some file systems would not unmount
 [Wed Mar 24 20:43:02 2021][ 715718.5284461] forcefully unmounting /dev/mapper/scratch-build from /build...
 [Wed Mar 24 20:43:02 2021][ 715718.5284461] forcefully unmounted /dev/mapper/scratch-build from /build, type ffs
 [Wed Mar 24 20:43:02 2021][ 715718.5384534] unmount of / (/dev/dk0) failed with error 16
 [Wed Mar 24 20:43:02 2021][ 715718.5384534] WARNING: some file systems would not unmount
 [Wed Mar 24 20:43:02 2021][ 715718.5384534] forcefully unmounting /dev/dk0 from /...
 [Wed Mar 24 20:43:02 2021][ 715718.5384534] forcefully unmounted /dev/dk0 from /, type ffs
 [Wed Mar 24 20:43:02 2021][ 715718.5384534] unmounting done
 [Wed Mar 24 20:43:02 2021][ 715718.5384534] turning off swap... done
 [Wed Mar 24 20:43:02 2021][ 715718.5384534] dk0 at sd0 (/) deleted
 [Wed Mar 24 20:43:02 2021][ 715718.5384534] sd0: detached
 [Wed Mar 24 20:43:02 2021][ 715718.5384534] scsibus0: detached
 [Wed Mar 24 20:43:02 2021][ 715718.7184994] mfi0: detached
 [Wed Mar 24 20:43:02 2021][ 715718.7184994] pci8: detached
 [Wed Mar 24 20:43:02 2021][ 715718.7184994] ppb7: detached
 [Wed Mar 24 20:43:02 2021][ 715718.7184994] unmounting done
 [Wed Mar 24 20:43:02 2021][ 715718.7184994] turning off swap... done
 [Wed Mar 24 20:43:02 2021][ 715718.7184994] rebooting...

 	[[ ... why is "turning off swap" seen twice? ... ]]

 	[[ ... and then the reboot, until rc scripts say ... ]]

 [Wed Mar 24 20:44:51 2021]Starting root file system check:
 [Wed Mar 24 20:44:51 2021]/dev/rdk0: file system is clean; not checking
 [Wed Mar 24 20:44:51 2021]start / wait fsck_ffs -p /dev/rdk0


 [Wed Mar 24 20:44:52 2021]Starting file system checks:
 [Wed Mar 24 20:44:52 2021]/dev/rdk2: file system is clean; not checking
 [Wed Mar 24 20:44:52 2021]/dev/rdk3: file system is clean; not checking

 	[[ ... here I hit ^T on the console as it was taking too long ... ]]

 [Wed Mar 24 20:44:58 2021][  15.0201108] load: 0.08  cmd: sleep 345 [nanoslp] 0.00u 0.00s 0% 512k
 [Wed Mar 24 20:44:58 2021]/dev/mapper/rscratch-build: phase 1: cyl group 24 of 345 (6%)
 [Wed Mar 24 20:46:09 2021]/dev/mapper/rscratch-build: phase 1: cyl group 284 of 345 (82%)
 [Wed Mar 24 20:49:30 2021]/dev/mapper/rscratch-build: 1400986 files, 36172587 used, 28347707 free (17403 frags, 3541288 blocks, 0.0% fragmentation)
 [Wed Mar 24 20:49:30 2021]/dev/mapper/rscratch-build: MARKING FILE SYSTEM CLEAN
 [Wed Mar 24 20:49:30 2021]start /var nowait fsck_ffs -p /dev/rdk2
 [Wed Mar 24 20:49:30 2021]start /build nowait fsck_ffs -p /dev/mapper/rscratch-build
 [Wed Mar 24 20:49:30 2021]done ffs: /dev/rdk2 (/var) = 0x0
 [Wed Mar 24 20:49:30 2021]start /usr/pkg nowait fsck_ffs -p /dev/rdk3
 [Wed Mar 24 20:49:30 2021]done ffs: /dev/rdk3 (/usr/pkg) = 0x0
 [Wed Mar 24 20:49:30 2021]done ffs: /dev/mapper/rscratch-build (/build) = 0x0
 [Wed Mar 24 20:49:30 2021]Script /etc/rc.d/fsck running
 [Wed Mar 24 20:49:30 2021]Currently sourcing /etc/rc.d/fsck
 [Wed Mar 24 20:49:30 2021]exec: mount_ffs -o rw /dev/dk2 /var
 [Wed Mar 24 20:49:30 2021]exec: mount_ffs -o rw /dev/dk2 /var
 [Wed Mar 24 20:49:30 2021]/dev/dk2 on /var type ffs (local, fsid: 0xa802/0x78b, reads: sync 1 async 0, writes: sync 2 async 0)


 --
 					Greg A. Woods <gwoods@acm.org>

 Kelowna, BC     +1 250 762-7675           RoboHack <woods@robohack.ca>
 Planix, Inc. <woods@planix.com>     Avoncote Farms <woods@avoncote.ca>

 --pgp-sign-Multipart_Thu_Mar_25_11:14:44_2021-1--

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: "Greg A. Woods" <woods@planix.ca>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Sun, 2 May 2021 18:32:14 +0200

 --Apple-Mail=_74B2A6AC-7A14-4AD0-9BDC-83F105D678A2
 Content-Transfer-Encoding: 7bit
 Content-Type: text/plain;
 	charset=us-ascii

 > On 25. Mar 2021, at 19:14, Greg A. Woods <woods@planix.ca> wrote:
 > 
 > There's no excuse I can find for /build not unmounting though, and
 > definitely no excuse for '/' not unmounting either, though later '/'
 > is forcefully unmounted, and on reboot '/' appears to be clean.  However
 > the forceful unmount of /build doesn't work, and it is NOT clean.

 Could you please attach more information:
 - /etc/fstab
 - wedge config
 - lvm config

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig

 --Apple-Mail=_74B2A6AC-7A14-4AD0-9BDC-83F105D678A2--

From: "Greg A. Woods" <woods@planix.ca>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Mon, 03 May 2021 22:59:45 -0700

 --pgp-sign-Multipart_Mon_May__3_22:59:21_2021-1
 Content-Type: text/plain; charset=US-ASCII

 At Sun, 2 May 2021 18:32:14 +0200, "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de> wrote:
 Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
 >
 > > On 25. Mar 2021, at 19:14, Greg A. Woods <woods@planix.ca>
 > > wrote:
 > > There's no excuse I can find for /build not unmounting though,
 > > and definitely no excuse for '/' not unmounting either, though
 > > later '/' is forcefully unmounted, and on reboot '/' appears to
 > > be clean.  However the forceful unmount of /build doesn't work,
 > > and it is NOT clean.
 >
 > Could you please attach more information:
 > - /etc/fstab
 > - wedge config
 > - lvm config

 #
 #	NetBSD /etc/fstab
 #
 NAME=/				/		ffs	rw,log		 1 1
 NAME=swap			none		swap	sw,dp		 0 0
 NAME=/var			/var		ffs	rw,log		 1 2
 NAME=/usr/pkg			/usr/pkg	ffs	rw,log		 1 2
 #
 tmpfs				/tmp		tmpfs	rw,-m=1777,-s=ram%25
 kernfs				/kern		kernfs	rw
 ptyfs				/dev/pts	ptyfs	rw
 procfs				/proc		procfs	rw
 tmpfs				/var/shm	tmpfs	rw,-m1777,-sram%25
 #
 /dev/cd0a			/cdrom		cd9660	ro,noauto
 #
 /dev/mapper/scratch-build	/build		ffs	rw,log		1 2
 #
 more.local:/archive		/more/archive	nfs	-b,-i,rw,nosuid,nodev
 more.local:/home		/more/home	nfs	-b,-i,rw,nosuid,nodev
 more.local:/work		/more/work	nfs	-b,-i,rw,nosuid,nodev
 more.local:/vcs			/more/vcs	nfs	-b,-i,rw,nosuid,nodev



 # dkctl /dev/rsd0 listwedges
 /dev/rsd0: 5 wedges:
 dk0: /, 62914560 blocks at 2048, type: ffs
 dk1: swap, 100663296 blocks at 62916608, type: swap
 dk2: /var, 8388608 blocks at 176166912, type: ffs
 dk3: /usr/pkg, 104857600 blocks at 184557568, type: ffs
 dk4: LVM-vg0, 686282719 blocks at 289417216, type: ffs


 # dkctl /dev/rsd1 listwedges
 /dev/rsd1: 1 wedge:
 dk5: scratchdisk0, 1141897216 blocks at 1024, type: ffs


 22:53 [1.1770] # lvm pvdisplay
   --- Physical volume ---
   PV Name               /dev/rdk5
   VG Name               scratch
   PV Size               544.50 GiB / not usable 3.00 MiB
   Allocatable           yes
   PE Size               4.00 MiB
   Total PE              139391
   Free PE               60031
   Allocated PE          79360
   PV UUID               xxf5PJ-HL26-PbyQ-rHic-3ptB-t56f-S5NqXD

   --- Physical volume ---
   PV Name               /dev/rdk4
   VG Name               vg0
   PV Size               327.25 GiB / not usable 2.98 MiB
   Allocatable           yes
   PE Size               4.00 MiB
   Total PE              83774
   Free PE               63805
   Allocated PE          19969
   PV UUID               kcMuoP-L3c0-EdZp-lOjP-OFyA-xc6d-NpQsez




 22:53 [1.1771] # lvm vgdisplay
   --- Volume group ---
   VG Name               scratch
   System ID
   Format                lvm2
   Metadata Areas        1
   Metadata Sequence No  4
   VG Access             read/write
   VG Status             resizable
   MAX LV                0
   Cur LV                3
   Open LV               0
   Max PV                0
   Cur PV                1
   Act PV                1
   VG Size               544.50 GiB
   PE Size               4.00 MiB
   Total PE              139391
   Alloc PE / Size       79360 / 310.00 GiB
   Free  PE / Size       60031 / 234.50 GiB
   VG UUID               jEdo7q-pkhv-dG83-C10z-FysR-UBNy-cr0Fzc

   --- Volume group ---
   VG Name               vg0
   System ID
   Format                lvm2
   Metadata Areas        1
   Metadata Sequence No  6
   VG Access             read/write
   VG Status             resizable
   MAX LV                0
   Cur LV                5
   Open LV               0
   Max PV                0
   Cur PV                1
   Act PV                1
   VG Size               327.24 GiB
   PE Size               4.00 MiB
   Total PE              83774
   Alloc PE / Size       19969 / 78.00 GiB
   Free  PE / Size       63805 / 249.24 GiB
   VG UUID               cM3M8T-x0nZ-uUJw-avMT-zgO9-Tbux-2LSDOa




 22:53 [1.1772] # lvm lvdisplay
   --- Logical volume ---
   LV Name                /dev/scratch/build
   VG Name                scratch
   LV UUID                myecD7-LUdo-x2m0-8jdg-gHHN-uQ0Y-lc76fS
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                250.00 GiB
   Current LE             64000
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     0
   Block device           169:1

   --- Logical volume ---
   LV Name                /dev/scratch/fbsd-test.0
   VG Name                scratch
   LV UUID                hZbvoM-maqc-dur3-TFoT-2p76-Ru0p-BW8qO2
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                30.00 GiB
   Current LE             7680
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     0
   Block device           169:2

   --- Logical volume ---
   LV Name                /dev/scratch/fbsd-test.1
   VG Name                scratch
   LV UUID                mx3cD7-fcJi-S8AF-w91y-c6gH-U4rW-zdRRzk
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                30.00 GiB
   Current LE             7680
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     0
   Block device           169:3

   --- Logical volume ---
   LV Name                /dev/vg0/nbtest.root
   VG Name                vg0
   LV UUID                DpA5NL-8jdj-M8tM-0qB9-1GFM-eokA-u2tjv9
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                30.00 GiB
   Current LE             7680
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     0
   Block device           169:4

   --- Logical volume ---
   LV Name                /dev/vg0/nbtest.swap
   VG Name                vg0
   LV UUID                XD9t0F-Cdca-EaN3-5NNi-lb6M-qwnK-syilBU
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                8.00 GiB
   Current LE             2048
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     0
   Block device           169:5

   --- Logical volume ---
   LV Name                /dev/vg0/nbtest.var
   VG Name                vg0
   LV UUID                qybtJ1-Rt90-e2sa-XXiL-avZh-A82D-xglF25
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                10.00 GiB
   Current LE             2560
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     0
   Block device           169:6

   --- Logical volume ---
   LV Name                /dev/vg0/nbtest.pkg
   VG Name                vg0
   LV UUID                5WrGfp-4042-pmH9-Hc3K-JgJQ-l7mE-zclcNL
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                30.00 GiB
   Current LE             7680
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     0
   Block device           169:7

   --- Logical volume ---
   LV Name                /dev/vg0/tinytest
   VG Name                vg0
   LV UUID                4BjVA5-rBIG-8dhJ-oMyZ-uj7g-ouNJ-G2Jolb
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                4.00 MiB
   Current LE             1
   Segments               1
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     0
   Block device           169:8


 --
 					Greg A. Woods <gwoods@acm.org>

 Kelowna, BC     +1 250 762-7675           RoboHack <woods@robohack.ca>
 Planix, Inc. <woods@planix.com>     Avoncote Farms <woods@avoncote.ca>

 --pgp-sign-Multipart_Mon_May__3_22:59:21_2021-1--

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: "Greg A. Woods" <woods@planix.ca>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Tue, 4 May 2021 11:57:05 +0200

 --Apple-Mail=_39DD0DB7-6F29-4B35-94E0-A802F5308099
 Content-Transfer-Encoding: quoted-printable
 Content-Type: text/plain;
 	charset=us-ascii

 > On 25. Mar 2021, at 19:15, Greg A. Woods <woods@planix.ca> wrote:
 <snip>
 > There's no excuse I can find for /build not unmounting though, and
 > definitely no excuse for '/' not unmounting either, though later '/'
 > is forcefully unmounted, and on reboot '/' appears to be clean.  However
 > the forceful unmount of /build doesn't work, and it is NOT clean.
 >
 > Note also that /build will sometimes unmount quickly and cleanly if it
 > hasn't been dirtied since the last boot, but it seems even creating one
 > file can leave it dirty on reboot.
 >
 > Maybe what remains an issue here is just related to dm(4) partitions?
 <snip>
 > [Wed Mar 24 20:42:58 2021][ 715714.8282434] unmount of /build (/dev/mapper/scratch-build) failed with error 16
 > [Wed Mar 24 20:42:58 2021][ 715714.8282434] unmount of / (/dev/dk0) failed with error 16
 > [Wed Mar 24 20:42:58 2021][ 715714.8282434] WARNING: some file systems would not unmount
 > [Wed Mar 24 20:42:59 2021][ 715716.5383256] brgphy1: detached
 >
 > 	[[ ... almost all the rest of devices detach ... ]]

 I'm quite sure one of them is "dm0" -- dm(4) is no longer backed by
 physical disks but /build is still mounted, so from here on even forced
 unmounts fail.

 This problem occurs on dm(4) devices only.

 Looking through sys/dev/dm/device-mapper.c it becomes clear that
 dmopen() / dmclose() don't count opens and therefore dm_detach() will
 unconditionally unconfigure dm(4).

 As dm_detach() gets called during shutdown, dm(4) unconfigures too early.


 The fix is to count device opens and prevent dm_detach() from succeeding
 as long as devices are open.  Once it succeeds during shutdown, it should
 call dm_destroy() on the last dm_detach().

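
The open-counting approach can be sketched in userland C as follows. This is not the dm(4) code: struct sketch_dm and the sketch_dm_* functions are hypothetical, and a real driver would also need locking around the counter.

```c
#include <errno.h>

/* Hypothetical per-device state; the real dm(4) softc looks different. */
struct sketch_dm {
	unsigned int open_count;	/* opens not yet matched by a close */
	int configured;			/* nonzero until detach succeeds */
};

void
sketch_dm_init(struct sketch_dm *d)
{
	d->open_count = 0;
	d->configured = 1;
}

/* Count each successful open so detach can see the device is in use. */
int
sketch_dm_open(struct sketch_dm *d)
{
	if (!d->configured)
		return ENXIO;
	d->open_count++;
	return 0;
}

int
sketch_dm_close(struct sketch_dm *d)
{
	if (d->open_count == 0)
		return EINVAL;
	d->open_count--;
	return 0;
}

/* Refuse to detach while any open is outstanding. */
int
sketch_dm_detach(struct sketch_dm *d)
{
	if (d->open_count > 0)
		return EBUSY;
	d->configured = 0;
	return 0;
}
```

With this shape, a detach attempted while /build is still mounted would return EBUSY and leave the device configured; it would only succeed after the last close.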

 A short-term hack is to remove DVF_DETACH_SHUTDOWN from device-mapper.c
 so dm_detach() doesn't run on shutdown:

  CFATTACH_DECL3_NEW(dm, 0,
       dm_match, dm_attach, dm_detach, NULL, NULL, NULL,
 -     DVF_DETACH_SHUTDOWN);
 +     0 /* DVF_DETACH_SHUTDOWN */);

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

 --Apple-Mail=_39DD0DB7-6F29-4B35-94E0-A802F5308099--

From: "Greg A. Woods" <woods@planix.ca>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Tue, 04 May 2021 11:07:42 -0700

 --pgp-sign-Multipart_Tue_May__4_11:07:26_2021-1
 Content-Type: text/plain; charset=US-ASCII

 At Tue, 4 May 2021 11:57:05 +0200, "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de> wrote:
 Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
 >
 > I'm quite sure one of them is "dm0" -- dm(4) is no longer backed
 > with physical disks but /build is still mounted so from here on
 > even forced unmounts fail.
 >
 > This problem occurs on dm(4) devices only.
 >
 > Looking through sys/dev/dm/device-mapper.c it becomes clear that
 > dmopen() / dmclose() don't count opens and therefore dm_detach()
 > will unconditionally unconfigure dm(4).
 >
 > As dm_detach() gets called during shutdown dm(4) unconfigures too
 > early.

 Excellent catch!

 > Fix is to count device opens and prevent dm_detach() from succeeding
 > as long as devices are open.  Once it succeeds during shutdown, it
 > should call dm_destroy() on the last dm_detach().
 >
 >
 > Short term hack is to remove DVF_DETACH_SHUTDOWN from
 > device-mapper.c so dm_detach() doesn't run on shutdown:
 >
 >  CFATTACH_DECL3_NEW(dm, 0,
 >       dm_match, dm_attach, dm_detach, NULL, NULL, NULL,
 > -     DVF_DETACH_SHUTDOWN);
 > +     0 /* DVF_DETACH_SHUTDOWN */);
 >

 I'll give that a try, and I expect it to work -- it looks like this is
 indeed the problem!

 --
 					Greg A. Woods <gwoods@acm.org>

 Kelowna, BC     +1 250 762-7675           RoboHack <woods@robohack.ca>
 Planix, Inc. <woods@planix.com>     Avoncote Farms <woods@avoncote.ca>

 --pgp-sign-Multipart_Tue_May__4_11:07:26_2021-1--

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: NetBSD GNATS <gnats-bugs@netbsd.org>
Cc: "Greg A. Woods" <woods@planix.ca>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Wed, 5 May 2021 15:51:08 +0200

 --Apple-Mail=_02C4883B-991C-4B77-BB72-7F959434AA00
 Content-Type: multipart/mixed;
 	boundary="Apple-Mail=_129BAF62-8E27-4159-8DB4-38C1B290CAA3"


 --Apple-Mail=_129BAF62-8E27-4159-8DB4-38C1B290CAA3
 Content-Transfer-Encoding: 7bit
 Content-Type: text/plain;
 	charset=us-ascii

 The attached diffs should fix the problem with device-mapper
 devices getting detached too early.

 Please report if this fix really works.

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig

 --Apple-Mail=_129BAF62-8E27-4159-8DB4-38C1B290CAA3
 Content-Disposition: attachment;
 	filename=002_dm_unit.diff
 Content-Type: application/octet-stream;
 	x-unix-mode=0644;
 	name="002_dm_unit.diff"
 Content-Transfer-Encoding: 7bit

 dm_unit

 Make sure the unit number of device-mapper devices matches our minor number.

 diff -r 1e6b25a1c949 -r 497cf11564ea sys/dev/dm/dm_ioctl.c
 --- sys/dev/dm/dm_ioctl.c
 +++ sys/dev/dm/dm_ioctl.c
 @@ -92,18 +92,11 @@

  #include "netbsd-dm.h"
  #include "dm.h"
 +#include "ioconf.h"

  static uint32_t sc_minor_num;
  uint32_t dm_dev_counter;

 -/* Generic cf_data for device-mapper driver */
 -static struct cfdata dm_cfdata = {
 -	.cf_name = "dm",
 -	.cf_atname = "dm",
 -	.cf_fstate = FSTATE_STAR,
 -	.cf_unit = 0
 -};
 -
  #define DM_REMOVE_FLAG(flag, name) do {					\
  	prop_dictionary_get_uint32(dm_dict,DM_IOCTL_FLAGS,&flag);	\
  	flag &= ~name;							\
 @@ -196,6 +189,7 @@ dm_dev_create_ioctl(prop_dictionary_t dm
  	int r;
  	uint32_t flags;
  	device_t devt;
 +	cfdata_t cf;

  	flags = 0;
  	name = NULL;
 @@ -214,7 +208,13 @@ dm_dev_create_ioctl(prop_dictionary_t dm
  		dm_dev_unbusy(dmv);
  		return EEXIST;
  	}
 -	if ((devt = config_attach_pseudo(&dm_cfdata)) == NULL) {
 +	cf = kmem_alloc(sizeof(*cf), KM_SLEEP);
 +	cf->cf_name = dm_cd.cd_name;
 +	cf->cf_atname = dm_cd.cd_name;
 +	cf->cf_unit = (uint64_t)atomic_inc_32_nv(&sc_minor_num);
 +	cf->cf_fstate = FSTATE_NOTFOUND;
 +	if ((devt = config_attach_pseudo(cf)) == NULL) {
 +		kmem_free(cf, sizeof(*cf));
  		aprint_error("Unable to attach pseudo device dm/%s\n", name);
  		return (ENOMEM);
  	}
 @@ -229,7 +229,7 @@ dm_dev_create_ioctl(prop_dictionary_t dm
  	if (name)
  		strlcpy(dmv->name, name, DM_NAME_LEN);

 -	dmv->minor = (uint64_t)atomic_inc_32_nv(&sc_minor_num);
 +	dmv->minor = cf->cf_unit;
  	dmv->flags = 0;		/* device flags are set when needed */
  	dmv->ref_cnt = 0;
  	dmv->event_nr = 0;
 @@ -365,6 +365,8 @@ dm_dev_rename_ioctl(prop_dictionary_t dm
  int
  dm_dev_remove_ioctl(prop_dictionary_t dm_dict)
  {
 +	int error;
 +	cfdata_t cf;
  	dm_dev_t *dmv;
  	const char *name, *uuid;
  	uint32_t flags, minor;
 @@ -398,7 +400,11 @@ dm_dev_remove_ioctl(prop_dictionary_t dm
  	 * This will call dm_detach routine which will actually removes
  	 * device.
  	 */
 -	return config_detach(devt, DETACH_QUIET);
 +	cf = device_cfdata(devt);
 +	error = config_detach(devt, DETACH_QUIET);
 +	if (error == 0)
 +		kmem_free(cf, sizeof(*cf));
 +	return error;
  }

  /*

 --Apple-Mail=_129BAF62-8E27-4159-8DB4-38C1B290CAA3
 Content-Disposition: attachment;
 	filename=003_dm_opencount.diff
 Content-Type: application/octet-stream;
 	x-unix-mode=0644;
 	name="003_dm_opencount.diff"
 Content-Transfer-Encoding: 7bit

 dm_opencount

 Track the number of cdev and bdev opens and fail dm_detach()
 on open devices unless detach is forced.

 PR kern/54969 (Disk cache is no longer flushed on shutdown)

 diff -r 497cf11564ea -r 93e32176d61e sys/dev/dm/device-mapper.c
 --- sys/dev/dm/device-mapper.c
 +++ sys/dev/dm/device-mapper.c
 @@ -260,8 +260,17 @@ dm_attach(device_t parent, device_t self
  static int
  dm_detach(device_t self, int flags)
  {
 +	bool busy;
  	dm_dev_t *dmv;

 +	dmv = dm_dev_lookup(NULL, NULL, device_unit(self));
 +	mutex_enter(&dmv->diskp->dk_openlock);
 +	busy = (dmv->diskp->dk_openmask != 0 && (flags & DETACH_FORCE) == 0);
 +	mutex_exit(&dmv->diskp->dk_openlock);
 +	dm_dev_unbusy(dmv);
 +	if (busy)
 +		return EBUSY;
 +
  	pmf_device_deregister(self);

  	/* Detach device from global device list */
 @@ -334,6 +343,25 @@ dmdestroy(void)
  static int
  dmopen(dev_t dev, int flags, int mode, struct lwp *l)
  {
 +	dm_dev_t *dmv;
 +	struct disk *dk;
 +
 +	dmv = dm_dev_lookup(NULL, NULL, minor(dev));
 +	if (dmv) {
 +		dk = dmv->diskp;
 +		mutex_enter(&dk->dk_openlock);
 +		switch (mode) {
 +		case S_IFCHR:
 +			dk->dk_copenmask |= 1;
 +			break;
 +		case S_IFBLK:
 +			dk->dk_bopenmask |= 1;
 +			break;
 +		}
 +		dk->dk_openmask = dk->dk_copenmask | dk->dk_bopenmask;
 +		mutex_exit(&dk->dk_openlock);
 +		dm_dev_unbusy(dmv);
 +	}

  	aprint_debug("dm open routine called %" PRIu32 "\n", minor(dev));
  	return 0;
 @@ -342,8 +370,27 @@ dmopen(dev_t dev, int flags, int mode, s
  static int
  dmclose(dev_t dev, int flags, int mode, struct lwp *l)
  {
 +	dm_dev_t *dmv;
 +	struct disk *dk;

  	aprint_debug("dm close routine called %" PRIu32 "\n", minor(dev));
 +
 +	dmv = dm_dev_lookup(NULL, NULL, minor(dev));
 +	if (dmv) {
 +		dk = dmv->diskp;
 +		mutex_enter(&dk->dk_openlock);
 +		switch (mode) {
 +		case S_IFCHR:
 +			dk->dk_copenmask &= ~1;
 +			break;
 +		case S_IFBLK:
 +			dk->dk_bopenmask &= ~1;
 +			break;
 +		}
 +		dk->dk_openmask = dk->dk_copenmask | dk->dk_bopenmask;
 +		mutex_exit(&dk->dk_openlock);
 +		dm_dev_unbusy(dmv);
 +	}
  	return 0;
  }


 --Apple-Mail=_129BAF62-8E27-4159-8DB4-38C1B290CAA3--

 --Apple-Mail=_02C4883B-991C-4B77-BB72-7F959434AA00
 Content-Transfer-Encoding: 7bit
 Content-Disposition: attachment;
 	filename=signature.asc
 Content-Type: application/pgp-signature;
 	name=signature.asc
 Content-Description: Message signed with OpenPGP

 -----BEGIN PGP SIGNATURE-----

 iQEzBAEBCAAdFiEE2BL3ha7Xao4WUZVYKoaVJdNr+uEFAmCSoswACgkQKoaVJdNr
 +uEFzQgAgXaa6+Vi1Fs+aPTZpA2XaO6tNCeqOxkUcsLbxD8m2zTiUJ02FkoMaXZL
 1F4kM9cNJl9jFzy0L2GJCoJ6LB8XCyhu2yIOHhVWuCiN0rfAfEW25JnEcdcQAPas
 zvBEfo3p6a3pcWaZzj+suvN6fle7zAET4IUSVEnF0O5Xe1yS71SA/eSAZOqv+C9D
 1CFaWOt8E/CEEA/3s0D6X3g9euFM4kOFxq0SPLB3oZ6OQ6hmLDLtmkbH1Q8RjGh1
 rA0MiyLC1BfmsaiPnLJKSyYzEL3gEDVbc0pQCcxcfRW+aqxr8YybhmbaLYdvfiZe
 KtshrvD39mSAfIl+tx3hrkVlDqgmtw==
 =4lQO
 -----END PGP SIGNATURE-----

 --Apple-Mail=_02C4883B-991C-4B77-BB72-7F959434AA00--

From: "Greg A. Woods" <woods@planix.ca>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Thu, 06 May 2021 14:38:11 -0700

 --pgp-sign-Multipart_Thu_May__6_14:37:55_2021-1
 Content-Type: text/plain; charset=US-ASCII

 At Wed, 5 May 2021 15:51:08 +0200, "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de> wrote:
 Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
 >
 > The attached diffs should fix the problem with device-mapper
 > devices getting detached too early.

 Thank you very much!

 > Please report if this fix really works.

 I would say that they do work, though I wasn't brave enough to leave the
 filesystem really dirty.  I unmounted it manually, re-mounted it, then
 copied one new file to it before rebooting.  Perhaps I should have left
 my console login shell sitting with its CWD in that filesystem, just to
 delay the umount further.

 Here are the verbose shutdown messages from the kernel showing that
 /build unmounted right away without complaint, and the dm(4) devices
 detach much later:

 [ 1295581.6679937] unmounted /dev/mapper/scratch-build from /build, type ffs
 [ 1295581.6679937] unmounted more.local:/vcs from /more/vcs, type nfs
 [ 1295581.6679937] unmounted more.local:/work from /more/work, type nfs
 [ 1295581.6779931] unmounted more.local:/home from /more/home, type nfs
 [ 1295581.6779931] unmounted more.local:/archive from /more/archive, type nfs
 [ 1295581.6779931] unmounted procfs from /proc, type procfs
 [ 1295581.6779931] unmounted ptyfs from /dev/pts, type ptyfs
 [ 1295581.6779931] unmounted kernfs from /kern, type kernfs
 [ 1295581.9880145] unmounted /dev/dk3 from /usr/pkg, type ffs
 [ 1295582.0080257] unmounted /dev/dk2 from /var, type ffs
 [ 1295582.1180252] unmount of / (/dev/dk0) failed with error 16
 [ 1295582.1180252] WARNING: some file systems would not unmount
 [ 1295582.1180252] unmount of / (/dev/dk0) failed with error 16
 [ 1295582.1180252] WARNING: some file systems would not unmount
 [ 1295583.8281335] brgphy1: detached
 [ 1295583.8481286] bnx1: detached
 [ 1295585.5582807] brgphy0: detached
 [ 1295585.5782645] bnx0: detached
 [ 1295585.5782645] pci6: detached
 [ 1295585.5782645] pci4: detached
 [ 1295585.5782645] sd2: detached
 [ 1295585.5782645] cd1: detached
 [ 1295585.5782645] ppb5: detached
 [ 1295585.5782645] ppb3: detached
 [ 1295585.5782645] scsibus2: detached
 [ 1295585.5782645] scsibus1: detached
 [ 1295585.5782645] pci5: detached
 [ 1295585.5782645] pci3: detached
 [ 1295585.7582583] brgphy2: detached
 [ 1295585.7882989] bnx2: detached
 [ 1295585.7882989] ppb4: detached
 [ 1295585.7882989] ppb2: detached
 [ 1295585.7882989] uhub6: detached
 [ 1295585.7882989] cd0: detached
 [ 1295585.7882989] pci14: detached
 [ 1295585.7882989] pci7: detached
 [ 1295585.7882989] pci2: detached
 [ 1295585.7882989] atapibus0: detached
 [ 1295585.7882989] uhub4: detached
 [ 1295585.7882989] uhub2: detached
 [ 1295585.7882989] uhub1: detached
 [ 1295585.7882989] uhub0: detached
 [ 1295585.7882989] com1: detached
 [ 1295585.7882989] ppb13: detached
 [ 1295585.7982582] ppb6: detached
 [ 1295585.7982582] ppb1: detached
 [ 1295585.7982582] atabus0: detached
 [ 1295585.7982582] usb3: detached
 [ 1295585.7982582] usb2: detached
 [ 1295585.7982582] usb1: detached
 [ 1295585.7982582] usb0: detached
 [ 1295585.7982582] pci13: detached
 [ 1295585.7982582] pci12: detached
 [ 1295585.7982582] pci11: detached
 [ 1295585.7982582] pci10: detached
 [ 1295585.7982582] pci9: detached
 [ 1295585.7982582] pci1: detached
 [ 1295585.7982582] uhci3: detached
 [ 1295585.7982582] uhci2: detached
 [ 1295585.7982582] uhci1: detached
 [ 1295585.7982582] uhci0: detached
 [ 1295585.7982582] ppb12: detached
 [ 1295585.7982582] pchb7: detached
 [ 1295585.7982582] pchb6: detached
 [ 1295585.7982582] pchb5: detached
 [ 1295585.7982582] pchb4: detached
 [ 1295585.7982582] pchb3: detached
 [ 1295585.7982582] pchb2: detached
 [ 1295585.7982582] pchb1: detached
 [ 1295585.7982582] ppb11: detached
 [ 1295585.7982582] ppb10: detached
 [ 1295585.7982582] ppb9: detached
 [ 1295585.7982582] ppb8: detached
 [ 1295585.7982582] ppb0: detached
 [ 1295585.7982582] pchb0: detached
 [ 1295585.7982582] ipmi_acpi0: detached
 [ 1295585.7982582] dm7: detached
 [ 1295585.7982582] dm6: detached
 [ 1295585.7982582] dm5: detached
 [ 1295585.7982582] dm4: detached
 [ 1295585.7982582] dm3: detached
 [ 1295585.7982582] dm2: detached
 [ 1295585.7982582] dm1: detached
 [ 1295585.7982582] dm0: detached
 [ 1295585.7982582] cgd3: detached
 [ 1295585.7982582] vnd3: detached
 [ 1295585.7982582] cgd2: detached
 [ 1295585.7982582] vnd2: detached
 [ 1295585.7982582] cgd1: detached
 [ 1295585.7982582] vnd1: detached
 [ 1295585.7982582] cgd0: detached
 [ 1295585.7982582] vnd0: detached
 [ 1295585.7982582] dk5 at sd1 (scratchdisk0) deleted
 [ 1295585.7982582] dk5: detached
 [ 1295585.7982582] dk4 at sd0 (LVM-vg0) deleted
 [ 1295585.7982582] dk4: detached
 [ 1295585.7982582] dk3 at sd0 (/usr/pkg) deleted
 [ 1295585.7982582] dk3: detached




 For the record here's a full log of a failed umount prior to applying
 any patches:

 [Mon Apr  5 11:35:12 2021][ 243874.4095581] syncing disks... done
 [Mon Apr  5 11:35:12 2021][ 243874.5395716] unmounted more.local:/vcs from /more/vcs, type nfs
 [Mon Apr  5 11:35:12 2021][ 243874.5395716] unmounted more.local:/work from /more/work, type nfs
 [Mon Apr  5 11:35:12 2021][ 243874.5495735] unmounted more.local:/home from /more/home, type nfs
 [Mon Apr  5 11:35:12 2021][ 243874.5495735] unmounted more.local:/archive from /more/archive, type nfs
 [Mon Apr  5 11:35:13 2021][ 243875.7096504] unmount of /build (/dev/mapper/scratch-build) failed with error 16
 [Mon Apr  5 11:35:13 2021][ 243875.7096504] unmounted procfs from /proc, type procfs
 [Mon Apr  5 11:35:13 2021][ 243875.7096504] unmounted ptyfs from /dev/pts, type ptyfs
 [Mon Apr  5 11:35:13 2021][ 243875.7096504] unmounted kernfs from /kern, type kernfs
 [Mon Apr  5 11:35:14 2021][ 243875.7497417] unmounted /dev/dk3 from /usr/pkg, type ffs
 [Mon Apr  5 11:35:14 2021][ 243875.7797304] unmounted /dev/dk2 from /var, type ffs
 [Mon Apr  5 11:35:14 2021][ 243875.8097252] unmount of / (/dev/dk0) failed with error 16
 [Mon Apr  5 11:35:14 2021][ 243875.8097252] WARNING: some file systems would not unmount
 [Mon Apr  5 11:35:14 2021][ 243875.8097252] unmount of /build (/dev/mapper/scratch-build) failed with error 16
 [Mon Apr  5 11:35:14 2021][ 243875.8097252] unmount of / (/dev/dk0) failed with error 16
 [Mon Apr  5 11:35:14 2021][ 243875.8097252] WARNING: some file systems would not unmount
 [Mon Apr  5 11:35:15 2021][ 243877.5197731] brgphy1: detached
 [Mon Apr  5 11:35:15 2021][ 243877.5398270] bnx1: detached
 [Mon Apr  5 11:35:17 2021][ 243879.2498901] brgphy0: detached
 [Mon Apr  5 11:35:17 2021][ 243879.2699359] bnx0: detached
 [Mon Apr  5 11:35:17 2021][ 243879.2699359] pci6: detached
 [Mon Apr  5 11:35:17 2021][ 243879.2699359] pci4: detached
 [Mon Apr  5 11:35:17 2021][ 243879.2699359] ppb5: detached
 [Mon Apr  5 11:35:17 2021][ 243879.2699359] ppb3: detached
 [Mon Apr  5 11:35:17 2021][ 243879.2699359] pci5: detached
 [Mon Apr  5 11:35:17 2021][ 243879.2699359] pci3: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4599042] brgphy2: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4899784] bnx2: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4899784] ppb4: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4999078] ppb2: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4999078] uhub6: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4999078] uhub5: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4999078] entropy: cd0 detached as an entropy source
 [Mon Apr  5 11:35:17 2021][ 243879.4999078] cd0: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4999078] pci14: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4999078] pci7: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4999078] pci2: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4999078] atapibus0: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4999078] uhub4: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4999078] uhub3: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4999078] uhub2: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4999078] uhub1: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4999078] uhub0: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4999078] com1: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4999078] ppb13: detached
 [Mon Apr  5 11:35:17 2021][ 243879.4999078] ppb6: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] ppb1: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] atabus0: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] usb4: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] usb3: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] usb2: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] usb1: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] usb0: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] pci13: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] pci12: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] pci11: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] pci10: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] pci9: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] pci1: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] ehci0: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] uhci3: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] uhci2: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] uhci1: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] uhci0: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] ppb12: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] pchb7: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] pchb6: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] pchb5: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] pchb4: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] pchb3: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] pchb2: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] pchb1: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] ppb11: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] ppb10: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] ppb9: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] ppb8: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] ppb0: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] pchb0: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] ipmi_acpi0: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] dm5: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] dm4: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] dm3: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] dm2: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] dm1: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] dm0: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] cgd3: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] vnd3: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] cgd2: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] vnd2: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] cgd1: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] vnd1: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] cgd0: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] dk5 at sd1 (scratchdisk0) deleted
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] dk5: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] dk4 at sd0 (LVM-vg0) deleted
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] dk4: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] dk3 at sd0 (/usr/pkg) deleted
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] dk3: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] dk2 at sd0 (/var) deleted
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] dk2: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] dk1 at sd0 (swap) deleted
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] dk1: detached
 [Mon Apr  5 11:35:17 2021][ 243879.5099064] unmount of /build (/dev/mapper/scratch-build) failed with error 16
 [Mon Apr  5 11:35:18 2021][ 243879.5099064] unmount of / (/dev/dk0) failed with error 16
 [Mon Apr  5 11:35:18 2021][ 243879.5099064] WARNING: some file systems would not unmount
 [Mon Apr  5 11:35:18 2021][ 243879.5099064] entropy: sd1 detached as an entropy source
 [Mon Apr  5 11:35:18 2021][ 243879.5099064] sd1: detached
 [Mon Apr  5 11:35:18 2021][ 243879.5099064] unmount of /build (/dev/mapper/scratch-build) failed with error 16
 [Mon Apr  5 11:35:18 2021][ 243879.5099064] unmount of / (/dev/dk0) failed with error 16
 [Mon Apr  5 11:35:18 2021][ 243879.5099064] WARNING: some file systems would not unmount
 [Mon Apr  5 11:35:18 2021][ 243879.5099064] forcefully unmounting /dev/mapper/scratch-build from /build...
 [Mon Apr  5 11:35:18 2021][ 243879.5099064] forcefully unmounted /dev/mapper/scratch-build from /build, type ffs
 [Mon Apr  5 11:35:18 2021][ 243879.5099064] unmount of / (/dev/dk0) failed with error 16
 [Mon Apr  5 11:35:18 2021][ 243879.5199222] WARNING: some file systems would not unmount
 [Mon Apr  5 11:35:18 2021][ 243879.5199222] forcefully unmounting /dev/dk0 from /...
 [Mon Apr  5 11:35:18 2021][ 243879.5199222] forcefully unmounted /dev/dk0 from /, type ffs
 [Mon Apr  5 11:35:18 2021][ 243879.5199222] unmounting done
 [Mon Apr  5 11:35:18 2021][ 243879.5199222] turning off swap... done
 [Mon Apr  5 11:35:18 2021][ 243879.5199222] dk0 at sd0 (/) deleted
 [Mon Apr  5 11:35:18 2021][ 243879.5199222] entropy: sd0 detached as an entropy source
 [Mon Apr  5 11:35:18 2021][ 243879.5199222] sd0: detached
 [Mon Apr  5 11:35:18 2021][ 243879.5199222] scsibus0: detached
 [Mon Apr  5 11:35:18 2021][ 243879.7099877] mfi0: detached
 [Mon Apr  5 11:35:18 2021][ 243879.7099877] pci8: detached
 [Mon Apr  5 11:35:18 2021][ 243879.7099877] ppb7: detached
 [Mon Apr  5 11:35:18 2021][ 243879.7099877] unmounting done
 [Mon Apr  5 11:35:18 2021][ 243879.7099877] turning off swap... done
 [Mon Apr  5 11:35:18 2021][ 243879.7099877] rebooting...


 What's clear there is that the dm(4) devices do detach before /build is
 properly, or even forcefully, unmounted.
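
 The ordering claim can also be checked mechanically against a saved
 console log.  The following is only a sketch: the function names are
 made up for illustration, and the search strings (for example
 "dm0: detached" and "unmounted /dev/mapper/scratch-build") are
 assumptions about which lines matter.  Which dm unit actually backs
 /build is not shown in the log above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Index of the first line containing needle, or -1 if there is none. */
int
first_line_with(const char *const *lines, size_t n, const char *needle)
{
	assert(lines != NULL && needle != NULL);
	for (size_t i = 0; i < n; i++) {
		if (strstr(lines[i], needle) != NULL)
			return (int)i;
	}
	return -1;
}

/*
 * Returns 1 if the detach message appears before the unmount message,
 * 0 if it appears after it, and -1 if either message is missing.
 */
int
detached_before_unmounted(const char *const *lines, size_t n,
    const char *detach_needle, const char *unmount_needle)
{
	int d = first_line_with(lines, n, detach_needle);
	int u = first_line_with(lines, n, unmount_needle);

	if (d < 0 || u < 0)
		return -1;
	return d < u ? 1 : 0;
}
```

 Because "unmounted" is also a substring of "forcefully unmounted", one
 needle covers both the clean and the forced case, while the
 "unmount of ... failed" lines do not match.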

 What's not so clear is why the initial umount failed, though I'm
 guessing it was because some process was still sitting with its CWD on it.

 Perhaps I'll try that one more time, just to trick it, before I get
 involved in doing other things that make me reluctant to reboot again.

 --
 					Greg A. Woods <gwoods@acm.org>

 Kelowna, BC     +1 250 762-7675           RoboHack <woods@robohack.ca>
 Planix, Inc. <woods@planix.com>     Avoncote Farms <woods@avoncote.ca>

 --pgp-sign-Multipart_Thu_May__6_14:37:55_2021-1
 Content-Type: application/pgp-signature
 Content-Transfer-Encoding: 7bit
 Content-Description: OpenPGP Digital Signature

 -----BEGIN PGP SIGNATURE-----

 iF0EABECAB0WIQRuK6dmwVAucmRxuh9mfXG3eL/0fwUCYJRhtgAKCRBmfXG3eL/0
 f4fhAJ4na4boIu/s3KUl/qNDKOlZH6A6zwCgt9g7LpdLZE4QGwfW/A34sfviX8Y=
 =b9Pf
 -----END PGP SIGNATURE-----

 --pgp-sign-Multipart_Thu_May__6_14:37:55_2021-1--

From: "Greg A. Woods" <woods@planix.ca>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
Date: Thu, 06 May 2021 15:44:15 -0700

 --pgp-sign-Multipart_Thu_May__6_15:44:03_2021-1
 Content-Type: text/plain; charset=US-ASCII

 At Wed, 5 May 2021 15:51:08 +0200, "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de> wrote:
 Subject: Re: kern/54969 (Disk cache is no longer flushed on shutdown)
 >
 > Please report if this fix really works.

 I think this reboot shows definitively that it really works.

 I left a couple of processes with their CWD in the filesystem, one being
 a nohup'ed sleep and the other being the console login shell.  Here we
 see /build having to be forcefully unmounted after all but one of the
 dm(4) devices detach, with the relevant dm1 (and its underlying dk(4)
 and sd(4) devices) detaching only after the unmount:

 [ 3868.1813906] syncing disks... done
 [ 3868.1913903] unmounted more.local:/vcs from /more/vcs, type nfs
 [ 3868.1913903] unmounted more.local:/work from /more/work, type nfs
 [ 3868.2514795] unmounted more.local:/home from /more/home, type nfs
 [ 3868.2514795] unmounted more.local:/archive from /more/archive, type nfs
 [ 3868.2614057] unmount of /build (/dev/mapper/scratch-build) failed with error 16
 [ 3868.2614057] unmounted procfs from /proc, type procfs
 [ 3868.2614057] unmounted ptyfs from /dev/pts, type ptyfs
 [ 3868.2614057] unmounted kernfs from /kern, type kernfs
 [ 3868.3313995] unmounted /dev/dk3 from /usr/pkg, type ffs
 [ 3868.4114046] unmounted /dev/dk2 from /var, type ffs
 [ 3868.5615043] unmount of / (/dev/dk0) failed with error 16
 [ 3868.5615043] WARNING: some file systems would not unmount
 [ 3868.5615043] unmount of /build (/dev/mapper/scratch-build) failed with error 16
 [ 3868.5615043] unmount of / (/dev/dk0) failed with error 16
 [ 3868.5615043] WARNING: some file systems would not unmount
 [ 3870.2715341] brgphy1: detached
 [ 3870.2915786] bnx1: detached
 [ 3871.9916481] brgphy0: detached
 [ 3872.0117375] bnx0: detached
 [ 3872.0117375] pci6: detached
 [ 3872.0117375] pci4: detached
 [ 3872.0117375] sd2: detached
 [ 3872.0117375] cd1: detached
 [ 3872.0216581] ppb5: detached
 [ 3872.0216581] ppb3: detached
 [ 3872.0216581] scsibus2: detached
 [ 3872.0216581] scsibus1: detached
 [ 3872.0216581] pci5: detached
 [ 3872.0216581] pci3: detached
 [ 3872.2016539] brgphy2: detached
 [ 3872.2216614] bnx2: detached
 [ 3872.2316580] ppb4: detached
 [ 3872.2316580] ppb2: detached
 [ 3872.2316580] uhub6: detached
 [ 3872.2316580] cd0: detached
 [ 3872.2316580] pci14: detached
 [ 3872.2316580] pci7: detached
 [ 3872.2316580] pci2: detached
 [ 3872.2316580] atapibus0: detached
 [ 3872.2316580] uhub4: detached
 [ 3872.2316580] uhub2: detached
 [ 3872.2316580] uhub1: detached
 [ 3872.2316580] uhub0: detached
 [ 3872.2316580] com1: detached
 [ 3872.2316580] ppb13: detached
 [ 3872.2316580] ppb6: detached
 [ 3872.2416575] ppb1: detached
 [ 3872.2416575] atabus0: detached
 [ 3872.2416575] usb3: detached
 [ 3872.2416575] usb2: detached
 [ 3872.2416575] usb1: detached
 [ 3872.2416575] usb0: detached
 [ 3872.2416575] pci13: detached
 [ 3872.2416575] pci12: detached
 [ 3872.2416575] pci11: detached
 [ 3872.2416575] pci10: detached
 [ 3872.2416575] pci9: detached
 [ 3872.2416575] pci1: detached
 [ 3872.2416575] uhci3: detached
 [ 3872.2416575] uhci2: detached
 [ 3872.2416575] uhci1: detached
 [ 3872.2416575] uhci0: detached
 [ 3872.2416575] ppb12: detached
 [ 3872.2416575] pchb7: detached
 [ 3872.2416575] pchb6: detached
 [ 3872.2416575] pchb5: detached
 [ 3872.2416575] pchb4: detached
 [ 3872.2416575] pchb3: detached
 [ 3872.2416575] pchb2: detached
 [ 3872.2416575] pchb1: detached
 [ 3872.2416575] ppb11: detached
 [ 3872.2416575] ppb10: detached
 [ 3872.2416575] ppb9: detached
 [ 3872.2416575] ppb8: detached
 [ 3872.2416575] ppb0: detached
 [ 3872.2416575] pchb0: detached
 [ 3872.2416575] ipmi_acpi0: detached
 [ 3872.2416575] dm8: detached
 [ 3872.2416575] dm7: detached
 [ 3872.2416575] dm6: detached
 [ 3872.2416575] dm5: detached
 [ 3872.2416575] dm4: detached
 [ 3872.2416575] dm3: detached
 [ 3872.2416575] dm2: detached
 [ 3872.2416575] cgd3: detached
 [ 3872.2416575] vnd3: detached
 [ 3872.2416575] cgd2: detached
 [ 3872.2416575] vnd2: detached
 [ 3872.2416575] cgd1: detached
 [ 3872.2416575] vnd1: detached
 [ 3872.2416575] cgd0: detached
 [ 3872.2416575] vnd0: detached
 [ 3872.2416575] dk4 at sd0 (LVM-vg0) deleted
 [ 3872.2416575] dk4: detached
 [ 3872.2416575] dk3 at sd0 (/usr/pkg) deleted
 [ 3872.2416575] dk3: detached
 [ 3872.2416575] dk2 at sd0 (/var) deleted
 [ 3872.2416575] dk2: detached
 [ 3872.2416575] dk1 at sd0 (swap) deleted
 [ 3872.2416575] dk1: detached
 [ 3872.2416575] unmount of /build (/dev/mapper/scratch-build) failed with error 16
 [ 3872.2416575] unmount of / (/dev/dk0) failed with error 16
 [ 3872.2416575] WARNING: some file systems would not unmount
 [ 3872.2416575] forcefully unmounting /dev/mapper/scratch-build from /build...
 [ 3872.2516597] forcefully unmounted /dev/mapper/scratch-build from /build, type ffs
 [ 3872.2516597] unmount of / (/dev/dk0) failed with error 16
 [ 3872.2516597] WARNING: some file systems would not unmount
 [ 3872.2516597] dm1: detached
 [ 3872.2516597] dk5 at sd1 (scratchdisk0) deleted
 [ 3872.2516597] dk5: detached
 [ 3872.2516597] unmount of / (/dev/dk0) failed with error 16
 [ 3872.2516597] WARNING: some file systems would not unmount
 [ 3872.2516597] sd1: detached
 [ 3872.2516597] unmount of / (/dev/dk0) failed with error 16
 [ 3872.2516597] WARNING: some file systems would not unmount
 [ 3872.2516597] forcefully unmounting /dev/dk0 from /...
 [ 3872.2616634] forcefully unmounted /dev/dk0 from /, type ffs
 [ 3872.2616634] unmounting done
 [ 3872.2616634] turning off swap... done
 [ 3872.2616634] dk0 at sd0 (/) deleted
 [ 3872.2616634] sd0: detached
 [ 3872.2616634] scsibus0: detached
 [ 3872.4517420] mfi0: detached
 [ 3872.4517420] pci8: detached
 [ 3872.4517420] ppb7: detached
 [ 3872.4517420] unmounting done
 [ 3872.4517420] turning off swap... done
 [ 3872.4517420] rebooting...
 (XEN) [2021-05-06 21:54:10.835] Hardware Dom0 shutdown: rebooting machine
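
 The interleaving in this log (plain unmount attempts, a round of
 detaches, then one forced unmount, then more detaches) looks like a
 loop that repeats until no step makes progress.  The toy model below
 is a sketch of that reading only.  The loop shape, the toy_ names, and
 the single file system on a single device are all assumptions, not
 kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_event { EV_NONE, EV_DETACH, EV_UNMOUNT };

struct toy_system {
	bool mounted;         /* file system still mounted on the device */
	bool attached;        /* device still attached */
	bool fs_busy;         /* some process keeps the file system busy */
	bool refuse_if_open;  /* true models the patched dm_detach() */
	enum toy_event first; /* which of detach/unmount happened first */
};

static void
toy_note(struct toy_system *s, enum toy_event ev)
{
	if (s->first == EV_NONE)
		s->first = ev;
}

/* Returns true on progress; a plain unmount fails while busy. */
bool
toy_unmount(struct toy_system *s, bool force)
{
	if (!s->mounted || (s->fs_busy && !force))
		return false;
	s->mounted = false;
	toy_note(s, EV_UNMOUNT);
	return true;
}

/* Returns true on progress; a mounted file system keeps the device open. */
bool
toy_detach(struct toy_system *s)
{
	if (!s->attached || (s->refuse_if_open && s->mounted))
		return false;
	s->attached = false;
	toy_note(s, EV_DETACH);
	return true;
}

/* Repeat unmount, detach, forced unmount until nothing makes progress. */
enum toy_event
toy_shutdown(struct toy_system *s)
{
	assert(s != NULL);
	s->first = EV_NONE;
	while (toy_unmount(s, false) || toy_detach(s) || toy_unmount(s, true))
		continue;
	return s->first;
}
```

 With refuse_if_open false and a busy file system, the detach step wins
 the first round, as in the pre-patch log.  With refuse_if_open true the
 detach is refused until the forced unmount has happened, matching the
 dm1 ordering above.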



 --
 					Greg A. Woods <gwoods@acm.org>

 Kelowna, BC     +1 250 762-7675           RoboHack <woods@robohack.ca>
 Planix, Inc. <woods@planix.com>     Avoncote Farms <woods@avoncote.ca>

 --pgp-sign-Multipart_Thu_May__6_15:44:03_2021-1
 Content-Type: application/pgp-signature
 Content-Transfer-Encoding: 7bit
 Content-Description: OpenPGP Digital Signature

 -----BEGIN PGP SIGNATURE-----

 iF0EABECAB0WIQRuK6dmwVAucmRxuh9mfXG3eL/0fwUCYJRxOAAKCRBmfXG3eL/0
 f+wYAKCPGlT0WmH7UQe6LCAof9I8foZrTACfajB64psSESwde8rdlwSEk4WfzAs=
 =b8EV
 -----END PGP SIGNATURE-----

 --pgp-sign-Multipart_Thu_May__6_15:44:03_2021-1--

From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54969 CVS commit: src/sys/dev/dm
Date: Fri, 7 May 2021 09:54:43 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Fri May  7 09:54:43 UTC 2021

 Modified Files:
 	src/sys/dev/dm: device-mapper.c

 Log Message:
 Track the number of cdev and bdev opens and fail dm_detach()
 on open devices unless detach is forced.

 PR kern/54969 (Disk cache is no longer flushed on shutdown)
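
 For readers who do not want to read the diff, the idea can be sketched
 in user space as follows.  This is a simplified model, not the
 committed code: the toy_ names and the TOY_DETACH_FORCE value are
 hypothetical, and the locking that the patch does under dk_openlock
 is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_DETACH_FORCE 0x01 /* hypothetical value */

enum toy_mode { TOY_CHR, TOY_BLK };

struct toy_disk {
	unsigned copenmask; /* bit 0 set while the character device is open */
	unsigned bopenmask; /* bit 0 set while the block device is open */
	unsigned openmask;  /* union of the two masks above */
};

/* Record an open of the character or block device. */
void
toy_open(struct toy_disk *dk, enum toy_mode mode)
{
	assert(dk != NULL);
	if (mode == TOY_CHR)
		dk->copenmask |= 1;
	else
		dk->bopenmask |= 1;
	dk->openmask = dk->copenmask | dk->bopenmask;
}

/* Record a close of the character or block device. */
void
toy_close(struct toy_disk *dk, enum toy_mode mode)
{
	assert(dk != NULL);
	if (mode == TOY_CHR)
		dk->copenmask &= ~1u;
	else
		dk->bopenmask &= ~1u;
	dk->openmask = dk->copenmask | dk->bopenmask;
}

/* Refuse to detach an open device unless the detach is forced. */
int
toy_detach(const struct toy_disk *dk, int flags)
{
	assert(dk != NULL);
	if (dk->openmask != 0 && (flags & TOY_DETACH_FORCE) == 0)
		return EBUSY;
	return 0;
}
```

 Note that a mask records whether each device type is open, not how
 many times.  The sketch therefore assumes the close routine is called
 only for the last close of each type.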


 To generate a diff of this commit:
 cvs rdiff -u -r1.61 -r1.62 src/sys/dev/dm/device-mapper.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.