NetBSD Problem Report #52301

From dholland@netbsd.org  Thu Jun 15 06:10:56 2017
Return-Path: <dholland@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 6FEF47A269
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 15 Jun 2017 06:10:56 +0000 (UTC)
Message-Id: <20170615061056.337D284CD8@mail.netbsd.org>
Date: Thu, 15 Jun 2017 06:10:56 +0000 (UTC)
From: dholland@NetBSD.org
Reply-To: dholland@NetBSD.org
To: gnats-bugs@NetBSD.org
Cc: coypu@sdf.org
Subject: lfs deadlock between lfs_fsync and lfs_create
X-Send-Pr-Version: 3.95

>Number:         52301
>Category:       kern
>Synopsis:       lfs deadlock between lfs_fsync and lfs_create
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jun 15 06:15:00 +0000 2017
>Last-Modified:  Mon Nov 13 19:24:51 +0000 2017
>Originator:     David A. Holland
>Release:        NetBSD 7.99.75 (20170603)
>Organization:
>Environment:
System: NetBSD loggy 7.99.75 NetBSD 7.99.75 (GENERIC) #66: Sat Jun 10 05:26:28
+IDT 2017  fly@loggy:/home/fly/obj/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:

lfs can deadlock as follows:

1. process A calls lfs_fsync
   - which calls lfs_vflush
   - which takes the seglock and calls lfs_segwrite
2. meanwhile process B calls lfs_create
   - which starts a dirop
   - and then down inside lfs_valloc blocks waiting for the seglock.
3. now process A in lfs_segwrite calls lfs_writer_enter
   - which blocks waiting for dirops to clear
   - but they can't.

We found this by analyzing a crashdump, and confirmed it in the source
of a 7.99.75 tree from 20170603.

>How-To-Repeat:

See above.

>Fix:

Not obvious.

I wonder if this was introduced by the vnode cache changes... but even
if so reverting those isn't the answer.

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->maya
Responsible-Changed-By: maya@NetBSD.org
Responsible-Changed-When: Thu, 15 Jun 2017 13:59:19 +0000
Responsible-Changed-Why:
Mine


From: coypu@sdf.org
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/52301: lfs deadlock between lfs_fsync and lfs_create
Date: Thu, 15 Jun 2017 14:20:29 +0000

 --bg08WKrSYDhXBjb5
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 I am torture testing the following diff.
 It boots, let's see how it holds up :-)

 --bg08WKrSYDhXBjb5
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="lfs12.diff"

 It isn't safe to drain dirops with seglock held, this leads
 to a deadlock. drain before grabbing seglock.

 Index: lfs_segment.c
 ===================================================================
 RCS file: /cvsroot/src/sys/ufs/lfs/lfs_segment.c,v
 retrieving revision 1.271
 diff -u -r1.271 lfs_segment.c
 --- lfs_segment.c	12 Jun 2017 15:02:32 -0000	1.271
 +++ lfs_segment.c	15 Jun 2017 14:18:13 -0000
 @@ -603,7 +603,6 @@
  	SEGUSE *segusep;
  	int do_ckp, did_ckp, error;
  	unsigned n, segleft, maxseg, sn, i, curseg;
 -	int writer_set = 0;
  	int dirty;
  	int redo;
  	SEGSUM *ssp;
 @@ -628,6 +627,8 @@
  	if (do_ckp)
  		flags &= ~SEGM_SINGLE;

 +	lfs_writer_enter(fs, "lfs segwrite");
 +
  	lfs_seglock(fs, flags | (do_ckp ? SEGM_CKP : 0));
  	sp = fs->lfs_sp;
  	if (sp->seg_flags & (SEGM_CLEAN | SEGM_CKP))
 @@ -653,11 +654,7 @@
  				break;
  			}

 -			if (do_ckp || fs->lfs_dirops == 0) {
 -				if (!writer_set) {
 -					lfs_writer_enter(fs, "lfs writer");
 -					writer_set = 1;
 -				}
 +			if (do_ckp) {
  				error = lfs_writevnodes(fs, mp, sp, VN_DIROP);
  				if (um_error == 0)
  					um_error = error;
 @@ -806,8 +803,7 @@

  	/* Note Ifile no longer needs to be written */
  	fs->lfs_doifile = 0;
 -	if (writer_set)
 -		lfs_writer_leave(fs);
 +	lfs_writer_leave(fs);

  	/*
  	 * If we didn't write the Ifile, we didn't really do anything.

 --bg08WKrSYDhXBjb5--

From: "Maya Rashish" <maya@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52301 CVS commit: src/sys/ufs/lfs
Date: Thu, 15 Jun 2017 14:37:31 +0000

 Module Name:	src
 Committed By:	maya
 Date:		Thu Jun 15 14:37:31 UTC 2017

 Modified Files:
 	src/sys/ufs/lfs: lfs_segment.c

 Log Message:
 It isn't safe to drain dirops with seglock held, it'll deadlock if there
 are any dirops. drain before grabbing seglock.

 lfs_dirops == 0 is always true (as we already drained dirops), so omit
 that part of the comparison.

 Fixes a lot of LFS deadlocks. PR kern/52301

 Many thanks to dholland for help analyzing coredumps


 To generate a diff of this commit:
 cvs rdiff -u -r1.271 -r1.272 src/sys/ufs/lfs/lfs_segment.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: coypu@sdf.org
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/52301: lfs deadlock between lfs_fsync and lfs_create
Date: Thu, 15 Jun 2017 14:43:42 +0000

 Now I'm unconditionally lfs_writer_enter'ing, whereas previously it did
 so if there were no dirops pending (in a racy check) or if doing a
 checkpoint. This feels more correct but I am not sure yet. Certainly it
 feels slower, but at least it doesn't deadlock all the time.

From: coypu@sdf.org
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/52301: lfs deadlock between lfs_fsync and lfs_create
Date: Mon, 19 Jun 2017 12:18:10 +0000

 lfs_fcntl:

             case LFCNRECLAIM:
                 /*
                  * Flush dirops and write Ifile, allowing empty segments
                  * to be immediately reclaimed.
                  */
                 lfs_writer_enter(fs, "pndirop");
                 off = lfs_sb_getoffset(fs);
                 lfs_seglock(fs, SEGM_FORCE_CKP | SEGM_CKP);
                 lfs_flush_dirops(fs);
                 LFS_CLEANERINFO(cip, fs, bp);
                 oclean = lfs_ci_getclean(fs, cip);
                 LFS_SYNC_CLEANERINFO(cip, fs, bp, 1);
                 lfs_segwrite(ap->a_vp->v_mount, SEGM_FORCE_CKP);
                 fs->lfs_sp->seg_flags |= SEGM_PROT;
                 lfs_segunlock(fs);
                 lfs_writer_leave(fs);

From: "Maya Rashish" <maya@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52301 CVS commit: src/sys/ufs/lfs
Date: Wed, 26 Jul 2017 15:07:27 +0000

 Module Name:	src
 Committed By:	maya
 Date:		Wed Jul 26 15:07:27 UTC 2017

 Modified Files:
 	src/sys/ufs/lfs: lfs_segment.c

 Log Message:
 Revert r1.272 fix to PR kern/52301, the performance hit is making things
 unusable.


 To generate a diff of this commit:
 cvs rdiff -u -r1.272 -r1.273 src/sys/ufs/lfs/lfs_segment.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: PR/52301 CVS commit: src/sys/ufs/lfs
Date: Wed, 26 Jul 2017 17:15:10 +0000

 On Wed, Jul 26, 2017 at 03:10:01PM +0000, Maya Rashish wrote:
  >  Log Message:
  >  Revert r1.272 fix to PR kern/52301, the performance hit is making things
  >  unusable.

 I wonder if we should put that fix in -8 though, at least for now...

 (pretty sure this issue's a behavioral regression relative to -7)

 -- 
 David A. Holland
 dholland@netbsd.org

From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52301 CVS commit: [netbsd-8] src
Date: Mon, 30 Oct 2017 09:29:04 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Mon Oct 30 09:29:04 UTC 2017

 Modified Files:
 	src/sbin/fsck_lfs [netbsd-8]: inode.c lfs.c pass6.c segwrite.c
 	src/sys/ufs/lfs [netbsd-8]: lfs.h lfs_accessors.h lfs_alloc.c
 	    lfs_balloc.c lfs_bio.c lfs_extern.h lfs_inode.c lfs_inode.h
 	    lfs_itimes.c lfs_pages.c lfs_rename.c lfs_segment.c lfs_subr.c
 	    lfs_syscalls.c lfs_vfsops.c lfs_vnops.c ulfs_inode.c ulfs_inode.h
 	    ulfs_lookup.c ulfs_quota2.c ulfs_readwrite.c ulfs_vnops.c

 Log Message:
 Pull up following revision(s) (requested by maya in ticket #330):
 	sbin/fsck_lfs/inode.c: 1.69
 	sbin/fsck_lfs/lfs.c: 1.73
 	sbin/fsck_lfs/pass6.c: 1.50
 	sbin/fsck_lfs/segwrite.c: 1.46
 	sys/ufs/lfs/lfs.h: 1.202-1.203
 	sys/ufs/lfs/lfs_accessors.h: 1.48
 	sys/ufs/lfs/lfs_alloc.c: 1.136-1.137
 	sys/ufs/lfs/lfs_balloc.c: 1.94
 	sys/ufs/lfs/lfs_bio.c: 1.141
 	sys/ufs/lfs/lfs_extern.h: 1.113
 	sys/ufs/lfs/lfs_inode.c: 1.156-1.157
 	sys/ufs/lfs/lfs_inode.h: 1.20, 1.21, 1.23
 	sys/ufs/lfs/lfs_itimes.c: 1.20
 	sys/ufs/lfs/lfs_pages.c: 1.13-1.15
 	sys/ufs/lfs/lfs_rename.c: 1.22
 	sys/ufs/lfs/lfs_segment.c: 1.270-1.275
 	sys/ufs/lfs/lfs_subr.c: 1.94-1.97
 	sys/ufs/lfs/lfs_syscalls.c: 1.175
 	sys/ufs/lfs/lfs_vfsops.c: 1.360
 	sys/ufs/lfs/lfs_vnops.c: 1.316-1.321
 	sys/ufs/lfs/ulfs_inode.c: 1.20
 	sys/ufs/lfs/ulfs_inode.h: 1.24
 	sys/ufs/lfs/ulfs_lookup.c: 1.41
 	sys/ufs/lfs/ulfs_quota2.c: 1.31
 	sys/ufs/lfs/ulfs_readwrite.c: 1.24
 	sys/ufs/lfs/ulfs_vnops.c: 1.49-1.50
 Update inode member i_flag --> i_state to keep up with kernel changes
 Move definition of IN_ALLMOD near the flag it's a mask for.
 Now we can see that it doesn't match all the flags, but changing that will
 require more careful thought.
 Correct confusion between i_flag and i_flags
 These will have to be renamed.
 Spotted by Riastradh, thanks!
 Add an XXX about the missing flags so it's not buried in a commit
 message.
 now the XXX count for LFS is 260
 Rename i_flag to i_state.
 The similarity to i_flags has previously caused errors.
 Use continue to denote the no-op loop to match netbsd style
 newline for extra clarity.
 It isn't safe to drain dirops with seglock held, it'll deadlock if there
 are any dirops. drain before grabbing seglock.
 lfs_dirops == 0 is always true (as we already drained dirops), so omit
 that part of the comparison.
 Fixes a lot of LFS deadlocks. PR kern/52301
 Many thanks to dholland for help analyzing coredumps
 Ifdef out KDASSERT which fires on my machine.
 Deduplicate sanity check that seglock is held on segunlock
 Revert r1.272 fix to PR kern/52301, the performance hit is making things
 unusable.
 change lfs_nextsegsleep and lfs_allclean_wakeup to use condvar
 XXX had to use lfs_lock in lfs_segwait, removed kernel_lock, is this
 appropriate?
 fix buffer overflow/KASSERT when cookies are supplied
 lfs no longer uses the ffs-style struct direct, use the correct minimum
 size
 from dholland
 XXX more wrong
 Consistently use {,UN}MARK_VNODE macros rather than function calls.
 Not much point doing anything after a panic call
 Ask some question about the code in a XXX comment
 XXX question our double-flushing of dirops
 Fix typo in comment


 To generate a diff of this commit:
 cvs rdiff -u -r1.68 -r1.68.8.1 src/sbin/fsck_lfs/inode.c
 cvs rdiff -u -r1.72 -r1.72.6.1 src/sbin/fsck_lfs/lfs.c
 cvs rdiff -u -r1.49 -r1.49.8.1 src/sbin/fsck_lfs/pass6.c
 cvs rdiff -u -r1.45 -r1.45.8.1 src/sbin/fsck_lfs/segwrite.c
 cvs rdiff -u -r1.201 -r1.201.6.1 src/sys/ufs/lfs/lfs.h
 cvs rdiff -u -r1.47 -r1.47.8.1 src/sys/ufs/lfs/lfs_accessors.h
 cvs rdiff -u -r1.135 -r1.135.6.1 src/sys/ufs/lfs/lfs_alloc.c
 cvs rdiff -u -r1.92 -r1.92.6.1 src/sys/ufs/lfs/lfs_balloc.c \
     src/sys/ufs/lfs/lfs_subr.c
 cvs rdiff -u -r1.139 -r1.139.4.1 src/sys/ufs/lfs/lfs_bio.c
 cvs rdiff -u -r1.111 -r1.111.10.1 src/sys/ufs/lfs/lfs_extern.h
 cvs rdiff -u -r1.155 -r1.155.6.1 src/sys/ufs/lfs/lfs_inode.c
 cvs rdiff -u -r1.19 -r1.19.6.1 src/sys/ufs/lfs/lfs_inode.h
 cvs rdiff -u -r1.19 -r1.19.10.1 src/sys/ufs/lfs/lfs_itimes.c
 cvs rdiff -u -r1.11.6.1 -r1.11.6.2 src/sys/ufs/lfs/lfs_pages.c
 cvs rdiff -u -r1.21 -r1.21.10.1 src/sys/ufs/lfs/lfs_rename.c
 cvs rdiff -u -r1.269 -r1.269.6.1 src/sys/ufs/lfs/lfs_segment.c
 cvs rdiff -u -r1.174 -r1.174.4.1 src/sys/ufs/lfs/lfs_syscalls.c
 cvs rdiff -u -r1.359 -r1.359.4.1 src/sys/ufs/lfs/lfs_vfsops.c
 cvs rdiff -u -r1.315 -r1.315.2.1 src/sys/ufs/lfs/lfs_vnops.c
 cvs rdiff -u -r1.19 -r1.19.2.1 src/sys/ufs/lfs/ulfs_inode.c
 cvs rdiff -u -r1.22 -r1.22.10.1 src/sys/ufs/lfs/ulfs_inode.h
 cvs rdiff -u -r1.40 -r1.40.6.1 src/sys/ufs/lfs/ulfs_lookup.c
 cvs rdiff -u -r1.30 -r1.30.6.1 src/sys/ufs/lfs/ulfs_quota2.c
 cvs rdiff -u -r1.23 -r1.23.6.1 src/sys/ufs/lfs/ulfs_readwrite.c
 cvs rdiff -u -r1.48 -r1.48.4.1 src/sys/ufs/lfs/ulfs_vnops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Responsible-Changed-From-To: maya->kern-bug-people
Responsible-Changed-By: maya@NetBSD.org
Responsible-Changed-When: Mon, 13 Nov 2017 19:24:51 +0000
Responsible-Changed-Why:


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.