NetBSD Problem Report #52301

From dholland@netbsd.org  Thu Jun 15 06:10:56 2017
Return-Path: <dholland@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 6FEF47A269
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 15 Jun 2017 06:10:56 +0000 (UTC)
Message-Id: <20170615061056.337D284CD8@mail.netbsd.org>
Date: Thu, 15 Jun 2017 06:10:56 +0000 (UTC)
From: dholland@NetBSD.org
Reply-To: dholland@NetBSD.org
To: gnats-bugs@NetBSD.org
Cc: coypu@sdf.org
Subject: lfs deadlock between lfs_fsync and lfs_create
X-Send-Pr-Version: 3.95

>Number:         52301
>Category:       kern
>Synopsis:       lfs deadlock between lfs_fsync and lfs_create
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jun 15 06:15:00 +0000 2017
>Closed-Date:    Sun Jun 20 22:24:42 +0000 2021
>Last-Modified:  Sun Jun 20 22:24:42 +0000 2021
>Originator:     David A. Holland
>Release:        NetBSD 7.99.75 (20170603)
>Organization:
>Environment:
System: NetBSD loggy 7.99.75 NetBSD 7.99.75 (GENERIC) #66: Sat Jun 10 05:26:28
+IDT 2017  fly@loggy:/home/fly/obj/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:

lfs can deadlock as follows:

1. process A calls lfs_fsync
   - which calls lfs_vflush
   - which takes the seglock and calls lfs_segwrite
2. meanwhile process B calls lfs_create
   - which starts a dirop
   - and then down inside lfs_valloc blocks waiting for the seglock.
3. now process A in lfs_segwrite calls lfs_writer_enter
   - which blocks waiting for dirops to clear
   - but they can't.

We found this by analyzing a crashdump, and confirmed it in the source
of a 7.99.75 tree from 20170603.

>How-To-Repeat:

See above.

>Fix:

Not obvious.

I wonder if this was introduced by the vnode cache changes... but even
if so reverting those isn't the answer.

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->maya
Responsible-Changed-By: maya@NetBSD.org
Responsible-Changed-When: Thu, 15 Jun 2017 13:59:19 +0000
Responsible-Changed-Why:
Mine


From: coypu@sdf.org
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/52301: lfs deadlock between lfs_fsync and lfs_create
Date: Thu, 15 Jun 2017 14:20:29 +0000

 --bg08WKrSYDhXBjb5
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 I am torture testing the following diff.
 It boots, let's see how it holds up :-)

 --bg08WKrSYDhXBjb5
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="lfs12.diff"

 It isn't safe to drain dirops with seglock held, this leads
 to a deadlock. drain before grabbing seglock.

 Index: lfs_segment.c
 ===================================================================
 RCS file: /cvsroot/src/sys/ufs/lfs/lfs_segment.c,v
 retrieving revision 1.271
 diff -u -r1.271 lfs_segment.c
 --- lfs_segment.c	12 Jun 2017 15:02:32 -0000	1.271
 +++ lfs_segment.c	15 Jun 2017 14:18:13 -0000
 @@ -603,7 +603,6 @@
  	SEGUSE *segusep;
  	int do_ckp, did_ckp, error;
  	unsigned n, segleft, maxseg, sn, i, curseg;
 -	int writer_set = 0;
  	int dirty;
  	int redo;
  	SEGSUM *ssp;
 @@ -628,6 +627,8 @@
  	if (do_ckp)
  		flags &= ~SEGM_SINGLE;

 +	lfs_writer_enter(fs, "lfs segwrite");
 +
  	lfs_seglock(fs, flags | (do_ckp ? SEGM_CKP : 0));
  	sp = fs->lfs_sp;
  	if (sp->seg_flags & (SEGM_CLEAN | SEGM_CKP))
 @@ -653,11 +654,7 @@
  				break;
  			}

 -			if (do_ckp || fs->lfs_dirops == 0) {
 -				if (!writer_set) {
 -					lfs_writer_enter(fs, "lfs writer");
 -					writer_set = 1;
 -				}
 +			if (do_ckp) {
  				error = lfs_writevnodes(fs, mp, sp, VN_DIROP);
  				if (um_error == 0)
  					um_error = error;
 @@ -806,8 +803,7 @@

  	/* Note Ifile no longer needs to be written */
  	fs->lfs_doifile = 0;
 -	if (writer_set)
 -		lfs_writer_leave(fs);
 +	lfs_writer_leave(fs);

  	/*
  	 * If we didn't write the Ifile, we didn't really do anything.

 --bg08WKrSYDhXBjb5--

From: "Maya Rashish" <maya@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52301 CVS commit: src/sys/ufs/lfs
Date: Thu, 15 Jun 2017 14:37:31 +0000

 Module Name:	src
 Committed By:	maya
 Date:		Thu Jun 15 14:37:31 UTC 2017

 Modified Files:
 	src/sys/ufs/lfs: lfs_segment.c

 Log Message:
 It isn't safe to drain dirops with seglock held, it'll deadlock if there
 are any dirops. drain before grabbing seglock.

 lfs_dirops == 0 is always true (as we already drained dirops), so omit
 that part of the comparison.

 Fixes a lot of LFS deadlocks. PR kern/52301

 Many thanks to dholland for help analyzing coredumps


 To generate a diff of this commit:
 cvs rdiff -u -r1.271 -r1.272 src/sys/ufs/lfs/lfs_segment.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: coypu@sdf.org
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/52301: lfs deadlock between lfs_fsync and lfs_create
Date: Thu, 15 Jun 2017 14:43:42 +0000

 Now I'm unconditionally lfs_writer_enter'ing, whereas previously it did
 so if there were no dirops pending (in a racy check) or if doing a
 checkpoint. This feels more correct but I am not sure yet. Certainly it
 feels slower, but at least it doesn't deadlock all the time.

From: coypu@sdf.org
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/52301: lfs deadlock between lfs_fsync and lfs_create
Date: Mon, 19 Jun 2017 12:18:10 +0000

 lfs_fcntl:

             case LFCNRECLAIM:
                 /*
                  * Flush dirops and write Ifile, allowing empty segments
                  * to be immediately reclaimed.
                  */
                 lfs_writer_enter(fs, "pndirop");
                 off = lfs_sb_getoffset(fs);
                 lfs_seglock(fs, SEGM_FORCE_CKP | SEGM_CKP);
                 lfs_flush_dirops(fs);
                 LFS_CLEANERINFO(cip, fs, bp);
                 oclean = lfs_ci_getclean(fs, cip);
                 LFS_SYNC_CLEANERINFO(cip, fs, bp, 1);
                 lfs_segwrite(ap->a_vp->v_mount, SEGM_FORCE_CKP);
                 fs->lfs_sp->seg_flags |= SEGM_PROT;
                 lfs_segunlock(fs);
                 lfs_writer_leave(fs);

From: "Maya Rashish" <maya@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52301 CVS commit: src/sys/ufs/lfs
Date: Wed, 26 Jul 2017 15:07:27 +0000

 Module Name:	src
 Committed By:	maya
 Date:		Wed Jul 26 15:07:27 UTC 2017

 Modified Files:
 	src/sys/ufs/lfs: lfs_segment.c

 Log Message:
 Revert r1.272 fix to PR kern/52301, the performance hit is making things
 unusable.


 To generate a diff of this commit:
 cvs rdiff -u -r1.272 -r1.273 src/sys/ufs/lfs/lfs_segment.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: PR/52301 CVS commit: src/sys/ufs/lfs
Date: Wed, 26 Jul 2017 17:15:10 +0000

 On Wed, Jul 26, 2017 at 03:10:01PM +0000, Maya Rashish wrote:
  >  Log Message:
  >  Revert r1.272 fix to PR kern/52301, the performance hit is making things
  >  unusable.

 I wonder if we should put that fix in -8 though, at least for now...

 (pretty sure this issue's a behavioral regression relative to -7)

 -- 
 David A. Holland
 dholland@netbsd.org

From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52301 CVS commit: [netbsd-8] src
Date: Mon, 30 Oct 2017 09:29:04 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Mon Oct 30 09:29:04 UTC 2017

 Modified Files:
 	src/sbin/fsck_lfs [netbsd-8]: inode.c lfs.c pass6.c segwrite.c
 	src/sys/ufs/lfs [netbsd-8]: lfs.h lfs_accessors.h lfs_alloc.c
 	    lfs_balloc.c lfs_bio.c lfs_extern.h lfs_inode.c lfs_inode.h
 	    lfs_itimes.c lfs_pages.c lfs_rename.c lfs_segment.c lfs_subr.c
 	    lfs_syscalls.c lfs_vfsops.c lfs_vnops.c ulfs_inode.c ulfs_inode.h
 	    ulfs_lookup.c ulfs_quota2.c ulfs_readwrite.c ulfs_vnops.c

 Log Message:
 Pull up following revision(s) (requested by maya in ticket #330):
 	sbin/fsck_lfs/inode.c: 1.69
 	sbin/fsck_lfs/lfs.c: 1.73
 	sbin/fsck_lfs/pass6.c: 1.50
 	sbin/fsck_lfs/segwrite.c: 1.46
 	sys/ufs/lfs/lfs.h: 1.202-1.203
 	sys/ufs/lfs/lfs_accessors.h: 1.48
 	sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
 	sys/ufs/lfs/lfs_balloc.c: 1.94
 	sys/ufs/lfs/lfs_bio.c: 1.141
 	sys/ufs/lfs/lfs_extern.h: 1.113
 	sys/ufs/lfs/lfs_inode.c: 1.156-1.157
 	sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
 	sys/ufs/lfs/lfs_itimes.c: 1.20
 	sys/ufs/lfs/lfs_pages.c: 1.13-1.15
 	sys/ufs/lfs/lfs_rename.c: 1.22
 	sys/ufs/lfs/lfs_segment.c: 1.270-1.275
 	sys/ufs/lfs/lfs_subr.c: 1.94-1.97
 	sys/ufs/lfs/lfs_syscalls.c: 1.175
 	sys/ufs/lfs/lfs_vfsops.c: 1.360
 	sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
 	sys/ufs/lfs/ulfs_inode.c: 1.20
 	sys/ufs/lfs/ulfs_inode.h: 1.24
 	sys/ufs/lfs/ulfs_lookup.c: 1.41
 	sys/ufs/lfs/ulfs_quota2.c: 1.31
 	sys/ufs/lfs/ulfs_readwrite.c: 1.24
 	sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
 Update inode member i_flag --> i_state to keep up with kernel changes
 Move definition of IN_ALLMOD near the flag it's a mask for.
 Now we can see that it doesn't match all the flags, but changing that will
 require more careful thought.
 Correct confusion between i_flag and i_flags
 These will have to be renamed.
 Spotted by Riastradh, thanks!
 Add an XXX about the missing flags so it's not buried in a commit
 message.
 now the XXX count for LFS is 260
 Rename i_flag to i_state.
 The similarity to i_flags has previously caused errors.
 Use continue to denote the no-op loop to match netbsd style
 newline for extra clarity.
 It isn't safe to drain dirops with seglock held, it'll deadlock if there
 are any dirops. drain before grabbing seglock.
 lfs_dirops == 0 is always true (as we already drained dirops), so omit
 that part of the comparison.
 Fixes a lot of LFS deadlocks. PR kern/52301
 Many thanks to dholland for help analyzing coredumps
 Ifdef out KDASSERT which fires on my machine.
 Deduplicate sanity check that seglock is held on segunlock
 Revert r1.272 fix to PR kern/52301, the performance hit is making things
 unusable.
 change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
 XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
 appropriate?
 fix buffer overflow/KASSERT when cookies are supplied
 lfs no longer uses the ffs-style struct direct, use the correct minimum
 size
 from dholland
 XXX more wrong
 Consistently use {,UN}MARK_VNODE macros rather than function calls.
 Not much point doing anything after a panic call
 Ask some question about the code in a XXX comment
 XXX question our double-flushing of dirops
 Fix typo in comment


 To generate a diff of this commit:
 cvs rdiff -u -r1.68 -r1.68.8.1 src/sbin/fsck_lfs/inode.c
 cvs rdiff -u -r1.72 -r1.72.6.1 src/sbin/fsck_lfs/lfs.c
 cvs rdiff -u -r1.49 -r1.49.8.1 src/sbin/fsck_lfs/pass6.c
 cvs rdiff -u -r1.45 -r1.45.8.1 src/sbin/fsck_lfs/segwrite.c
 cvs rdiff -u -r1.201 -r1.201.6.1 src/sys/ufs/lfs/lfs.h
 cvs rdiff -u -r1.47 -r1.47.8.1 src/sys/ufs/lfs/lfs_accessors.h
 cvs rdiff -u -r1.135 -r1.135.6.1 src/sys/ufs/lfs/lfs_alloc.c
 cvs rdiff -u -r1.92 -r1.92.6.1 src/sys/ufs/lfs/lfs_balloc.c \
     src/sys/ufs/lfs/lfs_subr.c
 cvs rdiff -u -r1.139 -r1.139.4.1 src/sys/ufs/lfs/lfs_bio.c
 cvs rdiff -u -r1.111 -r1.111.10.1 src/sys/ufs/lfs/lfs_extern.h
 cvs rdiff -u -r1.155 -r1.155.6.1 src/sys/ufs/lfs/lfs_inode.c
 cvs rdiff -u -r1.19 -r1.19.6.1 src/sys/ufs/lfs/lfs_inode.h
 cvs rdiff -u -r1.19 -r1.19.10.1 src/sys/ufs/lfs/lfs_itimes.c
 cvs rdiff -u -r1.11.6.1 -r1.11.6.2 src/sys/ufs/lfs/lfs_pages.c
 cvs rdiff -u -r1.21 -r1.21.10.1 src/sys/ufs/lfs/lfs_rename.c
 cvs rdiff -u -r1.269 -r1.269.6.1 src/sys/ufs/lfs/lfs_segment.c
 cvs rdiff -u -r1.174 -r1.174.4.1 src/sys/ufs/lfs/lfs_syscalls.c
 cvs rdiff -u -r1.359 -r1.359.4.1 src/sys/ufs/lfs/lfs_vfsops.c
 cvs rdiff -u -r1.315 -r1.315.2.1 src/sys/ufs/lfs/lfs_vnops.c
 cvs rdiff -u -r1.19 -r1.19.2.1 src/sys/ufs/lfs/ulfs_inode.c
 cvs rdiff -u -r1.22 -r1.22.10.1 src/sys/ufs/lfs/ulfs_inode.h
 cvs rdiff -u -r1.40 -r1.40.6.1 src/sys/ufs/lfs/ulfs_lookup.c
 cvs rdiff -u -r1.30 -r1.30.6.1 src/sys/ufs/lfs/ulfs_quota2.c
 cvs rdiff -u -r1.23 -r1.23.6.1 src/sys/ufs/lfs/ulfs_readwrite.c
 cvs rdiff -u -r1.48 -r1.48.4.1 src/sys/ufs/lfs/ulfs_vnops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Responsible-Changed-From-To: maya->kern-bug-people
Responsible-Changed-By: maya@NetBSD.org
Responsible-Changed-When: Mon, 13 Nov 2017 19:24:51 +0000
Responsible-Changed-Why:


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: Konrad Schroder <perseant@netbsd.org>
Subject: Re: kern/52301: lfs deadlock between lfs_fsync and lfs_create
Date: Mon, 14 Jan 2019 00:50:43 +0000

 On Thu, Jun 15, 2017 at 06:15:00AM +0000, dholland@NetBSD.org wrote:
  > 1. process A calls lfs_fsync
  >    - which calls lfs_vflush
  >    - which takes the seglock and calls lfs_segwrite
  > 2. meanwhile process B calls lfs_create
  >    - which starts a dirop
  >    - and then down inside lfs_valloc blocks waiting for the seglock.
  > 3. now process A in lfs_segwrite calls lfs_writer_enter
  >    - which blocks waiting for dirops to clear
  >    - but they can't.

 So I was just looking into fixing this by having lfs_writer_enter
 temporarily release the seglock, which would allow lfs_create to
 finish. However, this led me to discover something I'd either not
 realized or forgotten: lfs_seglock isn't just a lock, it's closely
 tied to the segment writing mechanism; to wit, taking lfs_seglock is
 not just a locking action but also beginning to write out a segment,
 and releasing it means waiting until it's done; you can't release it
 temporarily without compromising core design invariants.

 But.

 Why, then, is lfs_valloc taking the seglock? It seems like this means
 that every call to lfs_valloc will write out its own (partial) segment
 containing just that allocation; this may technically be correct in
 the sense of not corrupting the volume, but it's certainly not
 desirable.

 (Even if this segment collects other pending writes, which it might --
 didn't see the logic for that but didn't look very carefully -- it
 still puts every newly created file in a different segment which will
 be unfortunate for things like untar.)

 This behavior seems to have been introduced for locking reasons a long
 time ago and for the moment I'm not clear on what those were or
 whether they're still relevant.

 Am I wrong somewhere?

 Cc: perseant@ in case he's awake and remembers anything helpful...

 -- 
 David A. Holland
 dholland@netbsd.org

From: Konrad Schroder <perseant@hhhh.org>
To: David Holland <dholland-bugs@netbsd.org>, gnats-bugs@netbsd.org
Cc: Konrad Schroder <perseant@netbsd.org>
Subject: Re: kern/52301: lfs deadlock between lfs_fsync and lfs_create
Date: Fri, 18 Jan 2019 08:56:53 -0800

 My recollection is that lfs_valloc takes the segment lock because 
 otherwise the Ifile free list could become corrupted on disk if a 
 segment was being written at the same time.  The segment lock doesn't 
 cause a segment to be written when it is released, per se, so there's no 
 harm in using it to prevent this.

 The handling of dirops needs to be thought out very carefully.  The easy 
 way would be to prohibit dirops from beginning when a segment write is 
 in progress.  This would indicate that the locking order in your point 
 #2 below is wrong.  It might end up starving dirops, however.

 A better approach might be to note, when taking the segment lock, 
 whether dirops are ready to be written, and only prohibit new dirops in 
 that case...but if the segment lock was taken when no dirops are ready, 
 the segment writer wouldn't write any VDIROP vnodes and wouldn't wait 
 for dirops to complete before doing anything.

 Take care,

 -Konrad

 On 1/13/19 4:50 PM, David Holland wrote:
 > On Thu, Jun 15, 2017 at 06:15:00AM +0000, dholland@NetBSD.org wrote:
 >   > 1. process A calls lfs_fsync
 >   >    - which calls lfs_vflush
 >   >    - which takes the seglock and calls lfs_segwrite
 >   > 2. meanwhile process B calls lfs_create
 >   >    - which starts a dirop
 >   >    - and then down inside lfs_valloc blocks waiting for the seglock.
 >   > 3. now process A in lfs_segwrite calls lfs_writer_enter
 >   >    - which blocks waiting for dirops to clear
 >   >    - but they can't.
 >
 > So I was just looking into fixing this by having lfs_writer_enter
 > temporarily release the seglock, which would allow lfs_create to
 > finish. However, this led me to discover something I'd either not
 > realized or forgotten: lfs_seglock isn't just a lock, it's closely
 > tied to the segment writing mechanism; to wit, taking lfs_seglock is
 > not just a locking action but also beginning to write out a segment,
 > and releasing it means waiting until it's done; you can't release it
 > temporarily without compromising core design invariants.
 >
 > But.
 >
 > Why, then, is lfs_valloc taking the seglock? It seems like this means
 > that every call to lfs_valloc will write out its own (partial) segment
 > containing just that allocation; this may technically be correct in
 > the sense of not corrupting the volume, but it's certainly not
 > desirable.
 >
 > (Even if this segment collects other pending writes, which it might --
 > didn't see the logic for that but didn't look very carefully -- it
 > still puts every newly created file in a different segment which will
 > be unfortunate for things like untar.)
 >
 > This behavior seems to have been introduced for locking reasons a long
 > time ago and for the moment I'm not clear on what those were or
 > whether they're still relevant.
 >
 > Am I wrong somewhere?
 >
 > Cc: perseant@ in case he's awake and remembers anything helpful...
 >

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52301 CVS commit: src/sys/ufs/lfs
Date: Sun, 23 Feb 2020 08:40:37 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Feb 23 08:40:37 UTC 2020

 Modified Files:
 	src/sys/ufs/lfs: lfs_extern.h lfs_segment.c lfs_subr.c

 Log Message:
 Break deadlock in PR kern/52301.

 The lock order is lfs_writer -> lfs_seglock.  The problem in 52301 is
 that lfs_segwrite violates this lock order by sometimes doing
 lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
 opportunistically, when there are no dirops pending.  Both cases can
 deadlock, because dirops sometimes take the seglock (lfs_truncate,
 lfs_valloc, lfs_vfree):

 (a) There may be dirops pending, and they may be waiting for the
 seglock, so we can't wait for them to complete while holding the
 seglock.

 (b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
 may change by the time lfs_writer_enter acquires lfs_lock.

 To resolve this in each case:

 (a) Do lfs_writer_enter before lfs_seglock, since we will need it
 unconditionally anyway.  The worst performance impact of this should
 be that some dirops get delayed a little bit.

 (b) Create a new lfs_writer_tryenter to use at this point so that the
 test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
 atomically under lfs_lock.


 To generate a diff of this commit:
 cvs rdiff -u -r1.115 -r1.116 src/sys/ufs/lfs/lfs_extern.h
 cvs rdiff -u -r1.284 -r1.285 src/sys/ufs/lfs/lfs_segment.c
 cvs rdiff -u -r1.98 -r1.99 src/sys/ufs/lfs/lfs_subr.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: maya@NetBSD.org
State-Changed-When: Tue, 21 Apr 2020 16:50:54 +0000
State-Changed-Why:
Hi david, do you find the latest fix satisfactory?


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52301 CVS commit: [netbsd-9] src
Date: Mon, 17 Aug 2020 10:30:23 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Mon Aug 17 10:30:23 UTC 2020

 Modified Files:
 	src/lib/libp2k [netbsd-9]: p2k.c
 	src/sbin/fsck_lfs [netbsd-9]: pass1.c
 	src/sys/arch/i386/stand/efiboot/bootx64 [netbsd-9]: Makefile
 	src/sys/rump/fs/lib/liblfs [netbsd-9]: Makefile
 	src/sys/ufs/lfs [netbsd-9]: lfs.h lfs_accessors.h lfs_alloc.c
 	    lfs_balloc.c lfs_bio.c lfs_debug.c lfs_extern.h lfs_inode.c
 	    lfs_inode.h lfs_pages.c lfs_rename.c lfs_segment.c lfs_subr.c
 	    lfs_vfsops.c lfs_vnops.c
 	src/usr.sbin/dumplfs [netbsd-9]: dumplfs.c

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #1050):

 	sys/ufs/lfs/lfs_subr.c: revision 1.101
 	sys/ufs/lfs/lfs_subr.c: revision 1.102
 	sys/ufs/lfs/lfs_inode.c: revision 1.158
 	sys/ufs/lfs/lfs_inode.h: revision 1.25
 	sys/ufs/lfs/lfs_balloc.c: revision 1.95
 	sys/ufs/lfs/lfs_pages.c: revision 1.21
 	sys/ufs/lfs/lfs_vnops.c: revision 1.330
 	sys/ufs/lfs/lfs_alloc.c: revision 1.140 (patch)
 	sys/ufs/lfs/lfs_alloc.c: revision 1.141 (patch)
 	lib/libp2k/p2k.c: revision 1.72
 	sys/ufs/lfs/lfs.h: revision 1.205
 	sys/ufs/lfs/lfs.h: revision 1.206
 	sys/ufs/lfs/lfs_segment.c: revision 1.284
 	sys/ufs/lfs/lfs.h: revision 1.207
 	sys/ufs/lfs/lfs_segment.c: revision 1.285
 	sys/ufs/lfs/lfs_debug.c: revision 1.55
 	sys/ufs/lfs/lfs_rename.c: revision 1.23
 	usr.sbin/dumplfs/dumplfs.c: revision 1.65
 	sys/ufs/lfs/lfs_vfsops.c: revision 1.371
 	sys/arch/i386/stand/efiboot/bootx64/Makefile: revision 1.3
 	sys/ufs/lfs/lfs_vfsops.c: revision 1.372
 	sys/ufs/lfs/lfs_vfsops.c: revision 1.373
 	sbin/fsck_lfs/pass1.c: revision 1.46
 	sys/ufs/lfs/lfs_vnops.c: revision 1.326
 	sys/ufs/lfs/lfs_vnops.c: revision 1.327
 	sys/ufs/lfs/lfs_vfsops.c: revision 1.375 (patch)
 	sys/ufs/lfs/lfs_vnops.c: revision 1.328
 	sys/ufs/lfs/lfs_subr.c: revision 1.98
 	sys/ufs/lfs/lfs_extern.h: revision 1.116
 	sys/ufs/lfs/lfs_vnops.c: revision 1.329
 	sys/ufs/lfs/lfs_subr.c: revision 1.99
 	sys/ufs/lfs/lfs_extern.h: revision 1.117
 	sys/ufs/lfs/lfs_accessors.h: revision 1.49
 	sys/ufs/lfs/lfs_extern.h: revision 1.118
 	sys/rump/fs/lib/liblfs/Makefile: revision 1.15
 	sys/ufs/lfs/lfs_bio.c: revision 1.146 (patch)
 	sys/ufs/lfs/lfs_bio.c: revision 1.147
 	sys/ufs/lfs/lfs_subr.c: revision 1.100

 Fix kassert in lfs by initializing vp first.

 Use a marker node to iterate lfs_dchainhd / i_lfs_dchain.

 I believe elements can be removed while the lock is dropped,
 including the next node we're hanging on to.

 Just use VOP_BWRITE for lfs_bwrite_log.
 Hope this doesn't cause trouble with vfs_suspend.

 Teach lfs to transition ro<->rw.

 Prevent new dirops while we issue lfs_flush_dirops.

 lfs_flush_dirops assumes (by KASSERT((ip->i_state & IN_ADIROP) == 0))
 that vnodes on the dchain will not become involved in active dirops
 even while holding no other locks (lfs_lock, v_interlock), so we must
 set lfs_writer here.  All other callers already set lfs_writer.

 We set fs->lfs_writer++ without explicitly doing lfs_writer_enter
 because
 (a) we already waited for the dirops to drain, and
 (b) we hold lfs_lock and cannot drop it before setting lfs_writer.

 Assert lfs_writer where I think we can now prove it.

 Serialize access to the splay tree with lfs_lock.

 Change some cheap KDASSERT into KASSERT.

 Take a reference and fix assertions in lfs_flush_dirops.
 Fixes panic:
 KASSERT((ip->i_state & IN_ADIROP) == 0) at lfs_vnops.c:1670
 lfs_flush_dirops
 lfs_check
 lfs_setattr
 VOP_SETATTR
 change_mode
 sys_fchmod
 syscall

 This assertion -- and the assertion that vp->v_uflag has VU_DIROP set
 -- is valid only until we release lfs_lock, because we may race with
 lfs_unmark_dirop which will remove the nodes and change the flags.

 Further, vp itself is valid only as long as it is referenced, which it
 is as long as it's on the dchain, but lfs_unmark_dirop drops the
 dchain's reference.

 Don't lfs_writer_enter while holding v_interlock.

 There's no need to lfs_writer_enter at all here, as far as I can see.
 lfs_flush_fs will do it for us.

 Break deadlock in PR kern/52301.

 The lock order is lfs_writer -> lfs_seglock.  The problem in 52301 is
 that lfs_segwrite violates this lock order by sometimes doing
 lfs_seglock -> lfs_writer, either (a) when doing a checkpoint or (b),
 opportunistically, when there are no dirops pending.  Both cases can
 deadlock, because dirops sometimes take the seglock (lfs_truncate,
 lfs_valloc, lfs_vfree):
 (a) There may be dirops pending, and they may be waiting for the
 seglock, so we can't wait for them to complete while holding the
 seglock.
 (b) The test for fs->lfs_dirops == 0 happens unlocked, and the state
 may change by the time lfs_writer_enter acquires lfs_lock.

 To resolve this in each case:
 (a) Do lfs_writer_enter before lfs_seglock, since we will need it
 unconditionally anyway.  The worst performance impact of this should
 be that some dirops get delayed a little bit.
 (b) Create a new lfs_writer_tryenter to use at this point so that the
 test for fs->lfs_dirops == 0 and the acquisition of lfs_writer happen
 atomically under lfs_lock.

 Initialize/destroy lfs_allclean_wakeup in modcmd, not lfs_mountfs.

 Fixes reloading lfs.kmod.

 In lfs_update, hold lfs_writer around lfs_vflush.

 Otherwise, we might do
 lfs_vflush
 -> lfs_seglock
 -> lfs_segwait(SEGM_CKP)
    -> lfs_writer_enter
 which is the reverse of the lfs_writer -> lfs_seglock ordering.

 Call lfs_orphan in lfs_rename while we're still in the dirop.
 lfs_writer_enter can't fail; keep it simple and don't pretend it can.

 Assert that mtsleep can't fail either -- it doesn't catch signals and
 there's no timeout.

 Teach LFS_ORPHAN_NEXTFREE about lfs64.

 Dust off the orphan detection code and try to make it work.

 Fix !DIAGNOSTIC compile

 Fix userland references to LFS_ORPHAN_NEXTFREE.

 Forgot to grep for these or do a full distribution build, oops!

 Fix missing <sys/evcnt.h> by removing the evcnts instead.

 Just wanted to confirm that a race might happen, and indeed it did.
 These serve little diagnostic value otherwise.

 OR into bp->b_cflags; don't overwrite.

 CTASSERT lfs on-disk structure sizes.

 Avoid misaligned access to lfs64 on-disk records in memory.
 lfs64 directory entries are only 32-bit aligned in order to conserve
 space in directory blocks, and we had a hack to stuff a 64-bit inode
 in them.  This replaces the hack by __aligned(4) __packed, and goes
 further:

 1. It's not clear that all the other lfs64 data structures are 64-bit
    aligned on disk to begin with.  We can go through these later and
    upgrade them from
         struct foo64 {
                 ...
         } __aligned(4) __packed;
         union foo {
                 struct foo64 f64;
                 ...
         };
    to
         struct foo64 {
                 ...
         };
         union foo {
                 struct foo64 f64 __aligned(8);
                 ...
         } __aligned(4) __packed;
    if we really want to take advantage of 64-bit memory accesses.
    However, the __aligned(4) __packed must remain on the union
    because:
 2. We access even the lfs32 data structures via a union that has
    lfs64 members, and it turns out that compilers will assume access
    through a union with 64-bit aligned members implies the whole
    union has 64-bit alignment, even if we're only accessing a 32-bit
    aligned member.

 Fix clang build after packed lfs64 accessor change.

 Suppress spurious address-of-packed error in rump lfs too.


 To generate a diff of this commit:
 cvs rdiff -u -r1.70 -r1.70.14.1 src/lib/libp2k/p2k.c
 cvs rdiff -u -r1.45 -r1.45.18.1 src/sbin/fsck_lfs/pass1.c
 cvs rdiff -u -r1.1.26.1 -r1.1.26.2 \
     src/sys/arch/i386/stand/efiboot/bootx64/Makefile
 cvs rdiff -u -r1.14 -r1.14.22.1 src/sys/rump/fs/lib/liblfs/Makefile
 cvs rdiff -u -r1.204 -r1.204.4.1 src/sys/ufs/lfs/lfs.h
 cvs rdiff -u -r1.48 -r1.48.12.1 src/sys/ufs/lfs/lfs_accessors.h
 cvs rdiff -u -r1.137 -r1.137.8.1 src/sys/ufs/lfs/lfs_alloc.c
 cvs rdiff -u -r1.94 -r1.94.10.1 src/sys/ufs/lfs/lfs_balloc.c
 cvs rdiff -u -r1.142 -r1.142.6.1 src/sys/ufs/lfs/lfs_bio.c
 cvs rdiff -u -r1.54 -r1.54.22.1 src/sys/ufs/lfs/lfs_debug.c
 cvs rdiff -u -r1.114 -r1.114.4.1 src/sys/ufs/lfs/lfs_extern.h
 cvs rdiff -u -r1.157 -r1.157.10.1 src/sys/ufs/lfs/lfs_inode.c
 cvs rdiff -u -r1.23 -r1.23.10.1 src/sys/ufs/lfs/lfs_inode.h
 cvs rdiff -u -r1.15 -r1.15.8.1 src/sys/ufs/lfs/lfs_pages.c
 cvs rdiff -u -r1.22 -r1.22.10.1 src/sys/ufs/lfs/lfs_rename.c
 cvs rdiff -u -r1.278 -r1.278.4.1 src/sys/ufs/lfs/lfs_segment.c
 cvs rdiff -u -r1.97 -r1.97.8.1 src/sys/ufs/lfs/lfs_subr.c
 cvs rdiff -u -r1.365 -r1.365.2.1 src/sys/ufs/lfs/lfs_vfsops.c
 cvs rdiff -u -r1.324 -r1.324.2.1 src/sys/ufs/lfs/lfs_vnops.c
 cvs rdiff -u -r1.64 -r1.64.4.1 src/usr.sbin/dumplfs/dumplfs.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 20 Jun 2021 22:24:42 +0000
State-Changed-Why:
riastradh fixed this properly some time back, and while lfs dirops still
need to be redesigned, keeping this PR open (especially in feedback) doesn't
do anything towards making that happen.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.