NetBSD Problem Report #53624

From www@NetBSD.org  Sat Sep 22 01:45:28 2018
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 4262C7A17C
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 22 Sep 2018 01:45:28 +0000 (UTC)
Message-Id: <20180922014526.BA49A7A26A@mollari.NetBSD.org>
Date: Sat, 22 Sep 2018 01:45:26 +0000 (UTC)
From: manu@netbsd.org
Reply-To: manu@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: dom0 freeze on domU exit
X-Send-Pr-Version: www-1.0

>Number:         53624
>Notify-List:    gson@gson.org
>Category:       kern
>Synopsis:       dom0 freeze on domU exit
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    hannken
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Sep 22 01:50:00 +0000 2018
>Closed-Date:    Thu Dec 13 10:24:06 +0000 2018
>Last-Modified:  Tue Oct 15 18:15:00 +0000 2019
>Originator:     Emmanuel Dreyfus
>Release:        NetBSD 8.0
>Organization:
NetBSD
>Environment:
NetBSD xmai 8.0_STABLE NetBSD 8.0_STABLE (XEN3_DOM0_NOAGP) #63: Fri Sep 21 16:37:10 CEST 2018  root@lego:/pkg_comp/NetBSD-8stable-amd64/src/sys/arch/amd64/compile/XEN3_DOM0_NOAGP amd64

>Description:
When shutting down a Xen domU that has two block devices backed by plain files, there is a race condition that can freeze the dom0.

Here are the relevant processes at freeze time:
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
1880     1 3   0        84   ffffa000010a0a20           vnconfig fstcnt
1711     1 3   0         c   ffffa000028d48c0           vnconfig biowait
0      103 3   0       200   ffffa00001c4a300               vnd0 fstchg

fstrans_dump output:

Fstrans locks by lwp:
1711.1   (/) shared 1 cow 0
Fstrans state by mount:
/                state suspended

vnconfig 1711.1 waits for an I/O to complete:
sleepq_block/cv_wait/biowait/convertdisklabel/validate_label/readdisklabel/vndopen/spec_open/VOP_OPEN/vn_open/do_open/do_sys_openat/sys_open/syscall

This I/O should be done by kernel thread vnd0 0.103, which waits for
the filesystem to resume, on cv_wait(&fstrans_state_cv, &fstrans_lock):
sleepq_block/cv_wait/fstrans_start/genfs_do_putpages/VOP_PUTPAGES/vndthread

The process that suspended the filesystem, through vrevoke, is vnconfig 1880.1.
It is itself waiting for vnconfig 1711.1 to finish its transaction, on
 cv_wait_sig(&fstrans_count_cv, &fstrans_lock):
sleepq_block/cv_wait_sig/fstrans_setstate/genfs_suspendctl/vfs_suspend/vrevoke_suspend_next.part.1/vrevoke/genfs_revoke/VOP_REVOKE/vdevgone/vnddoclear/vndioctl/VOP_IOCTL/vn_ioctl/sys_ioctl/syscall

The processes wait on each other: we have a deadlock.
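A toy model may make the cycle easier to see. The C sketch below is not kernel code: the enum names, the waits_for table and deadlocked() are hypothetical, and each edge is transcribed from the analysis above (fstcnt, biowait, fstchg).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy wait-for graph of the three LWPs in this report (an illustration,
 * not kernel code).  graph[i] is the node that i is blocked on, or -1
 * if i is runnable.
 */
enum { VNCONFIG_1880, VNCONFIG_1711, VND0_THREAD, NNODES };

static const int waits_for[NNODES] = {
	[VNCONFIG_1880] = VNCONFIG_1711, /* fstcnt: 1711.1 must leave its transaction */
	[VNCONFIG_1711] = VND0_THREAD,   /* biowait: vnd0 must finish the I/O */
	[VND0_THREAD]   = VNCONFIG_1880, /* fstchg: only 1880.1 can resume "/" */
};

/* Return true if following wait-for edges from start leads back to start. */
static bool
deadlocked(const int *graph, int start)
{
	assert(start >= 0 && start < NNODES);
	int cur = graph[start];

	/* A cycle through start has at most NNODES edges. */
	for (int steps = 0; cur != -1 && steps < NNODES; steps++) {
		if (cur == start)
			return true;
		cur = graph[cur];
	}
	return false;
}
```

deadlocked(waits_for, VNCONFIG_1880) returns true. In a copy of the table where the vnd0 entry is -1, i.e. vnd0's I/O may proceed while "/" is being suspended, the cycle is broken for every node.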
>How-To-Repeat:
Set up a domU with two block devices backed by files on the root filesystem, then repeatedly shut it down and create it again until dom0 freezes:
while true; do xl shutdown -w test ; xl create test ; sleep 60; done

>Fix:
The root of the problem seems to be the unbounded wait on fstrans_count_cv in
fstrans_setstate(). As condvar(9) notes, "Non-interruptable waits have 
the potential to deadlock the system". This wait is interruptible, 
but most processes in the system end up waiting in fstrans_start() because they attempt a filesystem access. Once sshd and getty are hit, it becomes impossible to log in and kill a process.

Here is a proposal to fix the problem: use cv_timedwait_sig() instead
of cv_wait_sig(). Since the original code could already fail when catching a signal, the calling functions correctly handle a failure caused by a timeout as well.
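The proposed bounded wait can be illustrated in userland with POSIX threads. This is a sketch under stated assumptions: struct fs_model, suspend_with_timeout() and the two-state enum are hypothetical stand-ins for fstrans_lock, fstrans_count_cv and the per-mount state, and pthread_cond_timedwait() plays the role of cv_timedwait_sig(). Unlike the patch, which restarts the hz / 4 timeout on every wakeup, the sketch uses a single overall deadline; on timeout it reports ETIMEDOUT, where the kernel's cv_timedwait_sig() returns EWOULDBLOCK.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical userland stand-ins for fstrans_lock, fstrans_count_cv and mount state. */
enum fs_state { FS_NORMAL, FS_SUSPENDED };

struct fs_model {
	pthread_mutex_t lock;
	pthread_cond_t count_cv;
	int active;              /* transactions still running on the mount */
	enum fs_state state;
};

/*
 * Wait for active transactions to drain, but give up after timeout_ms
 * and back out to FS_NORMAL, as the proposed patch does on error.
 * Returns 0 on success or the pthread_cond_timedwait() error (ETIMEDOUT).
 */
static int
suspend_with_timeout(struct fs_model *fs, long timeout_ms)
{
	struct timespec deadline;
	int error = 0;

	assert(timeout_ms >= 0);
	/* The default condvar clock is CLOCK_REALTIME. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&fs->lock);
	fs->state = FS_SUSPENDED;
	while (fs->active > 0) {
		/* Spurious wakeups return 0 and simply loop again. */
		error = pthread_cond_timedwait(&fs->count_cv, &fs->lock, &deadline);
		if (error) {
			fs->state = FS_NORMAL;   /* back out of the suspension */
			break;
		}
	}
	pthread_mutex_unlock(&fs->lock);
	return error;
}
```

With one transaction that never finishes, suspend_with_timeout(&fs, 50) returns ETIMEDOUT after roughly 50 ms and leaves fs.state at FS_NORMAL, which is the back-out path the proposal relies on.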

--- sys/kern/vfs_trans.c.orig
+++ sys/kern/vfs_trans.c
@@ -41,8 +41,9 @@
 #endif

 #include <sys/param.h>
 #include <sys/systm.h>
+#include <sys/kernel.h>
 #include <sys/atomic.h>
 #include <sys/buf.h>
 #include <sys/kmem.h>
 #include <sys/mount.h>
@@ -531,12 +532,16 @@

        /*
         * All threads see the new state now.
         * Wait for transactions invalid at this state to leave.
+        * We cannot wait forever because many processes would
+        * get stuck waiting for fstcnt in fstrans_start(). This
+        * is acute when suspending the root filesystem.
         */
        error = 0;
        while (! state_change_done(mp)) {
-               error = cv_wait_sig(&fstrans_count_cv, &fstrans_lock);
+               error = cv_timedwait_sig(&fstrans_count_cv,
+                                        &fstrans_lock, hz / 4);
                if (error) {
                        new_state = fmi->fmi_state = FSTRANS_NORMAL;
                        break;
                }



-- 
Emmanuel Dreyfus
manu%netbsd.org@localhost


>Release-Note:

>Audit-Trail:
From: "Maya Rashish" <maya@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53624 CVS commit: pkgsrc/wm/cwm
Date: Sat, 22 Sep 2018 06:32:11 +0000

 Module Name:	pkgsrc
 Committed By:	maya
 Date:		Sat Sep 22 06:32:11 UTC 2018

 Modified Files:
 	pkgsrc/wm/cwm: DESCR Makefile PLIST distinfo

 Log Message:
 cwm: update to 6.3

 From Sunil Nimmagadda in pkgsrc-wip, PR pkg/53624.


 To generate a diff of this commit:
 cvs rdiff -u -r1.1.1.1 -r1.2 pkgsrc/wm/cwm/DESCR pkgsrc/wm/cwm/PLIST
 cvs rdiff -u -r1.19 -r1.20 pkgsrc/wm/cwm/Makefile
 cvs rdiff -u -r1.2 -r1.3 pkgsrc/wm/cwm/distinfo

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: manu@netbsd.org (Emmanuel Dreyfus)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: PR/53624 CVS commit: pkgsrc/wm/cwm
Date: Sat, 22 Sep 2018 08:58:21 +0200

 Maya Rashish <maya@netbsd.org> wrote:

 >  From Sunil Nimmagadda in pkgsrc-wip, PR pkg/53624.

 Wrong PR, #53624 is about "dom0 freeze on domU exit"

 -- 
 Emmanuel Dreyfus
 http://hcpnet.free.fr/pubz
 manu@netbsd.org

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53624: dom0 freeze on domU exit
Date: Thu, 4 Oct 2018 11:45:10 +0200

 To break this deadlock fstrans(9) needs a bracket operation that will
 not block while the file system is suspending.

 I'm working on a fix.

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

Responsible-Changed-From-To: kern-bug-people->hannken
Responsible-Changed-By: hannken@NetBSD.org
Responsible-Changed-When: Thu, 04 Oct 2018 10:09:47 +0000
Responsible-Changed-Why:
Take.


State-Changed-From-To: open->analyzed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Thu, 04 Oct 2018 10:09:47 +0000
State-Changed-Why:
Problem understood, working on a fix.


From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53624 CVS commit: src
Date: Fri, 5 Oct 2018 09:51:56 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Fri Oct  5 09:51:56 UTC 2018

 Modified Files:
 	src/distrib/sets/lists/comp: mi
 	src/share/man/man9: Makefile fstrans.9
 	src/sys/dev: vnd.c
 	src/sys/kern: vfs_trans.c
 	src/sys/miscfs/genfs: genfs_vfsops.c
 	src/sys/rump/librump/rumpkern: emul.c
 	src/sys/sys: fstrans.h

 Log Message:
 Bring back three state file system suspension:

   NORMAL -> SUSPENDING -> SUSPENDED

 and add operation fstrans_start_lazy() that only blocks while SUSPENDED.

 Change vndthread() support operation handle_with_rdwr() to bracket
 its file system operations by fstrans_start_lazy() and fstrans_done().

 PR kern/53624 (dom0 freeze on domU exit)


 To generate a diff of this commit:
 cvs rdiff -u -r1.2232 -r1.2233 src/distrib/sets/lists/comp/mi
 cvs rdiff -u -r1.430 -r1.431 src/share/man/man9/Makefile
 cvs rdiff -u -r1.26 -r1.27 src/share/man/man9/fstrans.9
 cvs rdiff -u -r1.265 -r1.266 src/sys/dev/vnd.c
 cvs rdiff -u -r1.50 -r1.51 src/sys/kern/vfs_trans.c
 cvs rdiff -u -r1.7 -r1.8 src/sys/miscfs/genfs/genfs_vfsops.c
 cvs rdiff -u -r1.186 -r1.187 src/sys/rump/librump/rumpkern/emul.c
 cvs rdiff -u -r1.11 -r1.12 src/sys/sys/fstrans.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
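The three-state rule in this log message can be written as a small decision function. This is a hedged model, not the code committed to vfs_trans.c: the enum and start_blocks() are hypothetical names, and the function only covers a thread starting its first transaction on a mount.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the NORMAL -> SUSPENDING -> SUSPENDED states. */
enum susp_state { S_NORMAL, S_SUSPENDING, S_SUSPENDED };

/*
 * Would a thread's first transaction on the mount have to sleep?
 *  - a plain fstrans_start() sleeps in SUSPENDING and SUSPENDED;
 *  - fstrans_start_lazy() sleeps only in SUSPENDED, so I/O that the
 *    suspender indirectly depends on can still complete in SUSPENDING.
 */
static bool
start_blocks(enum susp_state st, bool lazy)
{
	switch (st) {
	case S_NORMAL:
		return false;
	case S_SUSPENDING:
		return !lazy;
	case S_SUSPENDED:
		return true;
	}
	assert(!"unknown suspension state");
	return true;
}
```

In SUSPENDING, a vnd worker using the lazy variant keeps running while ordinary new transactions wait, so an I/O that a vnconfig process is waiting for in biowait can still complete.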

State-Changed-From-To: analyzed->feedback
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Fri, 05 Oct 2018 09:57:35 +0000
State-Changed-Why:
Committed a fix to -current, please confirm.


From: manu@netbsd.org (Emmanuel Dreyfus)
To: gnats-bugs@NetBSD.org, hannken@NetBSD.org, netbsd-bugs@netbsd.org,
 gnats-admin@netbsd.org
Cc: 
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Sat, 6 Oct 2018 03:47:28 +0200

 <hannken@NetBSD.org> wrote:

 > Committed a fix to -current, please confirm.

 I applied the change to netbsd-8 and ran my previous test: it does not
 freeze.

 -- 
 Emmanuel Dreyfus
 http://hcpnet.free.fr/pubz
 manu@netbsd.org

State-Changed-From-To: feedback->pending-pullups
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Sat, 06 Oct 2018 10:10:02 +0000
State-Changed-Why:
Pullup to -8 requested with ticket #1052


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53624 CVS commit: [netbsd-8] src
Date: Tue, 9 Oct 2018 09:58:10 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Tue Oct  9 09:58:09 UTC 2018

 Modified Files:
 	src/distrib/sets/lists/comp [netbsd-8]: mi
 	src/share/man/man9 [netbsd-8]: Makefile fstrans.9
 	src/sys/dev [netbsd-8]: vnd.c
 	src/sys/kern [netbsd-8]: vfs_trans.c
 	src/sys/miscfs/genfs [netbsd-8]: genfs_vfsops.c
 	src/sys/rump/librump/rumpkern [netbsd-8]: emul.c
 	src/sys/sys [netbsd-8]: fstrans.h

 Log Message:
 Pull up following revision(s) (requested by hannken in ticket #1052):

 	sys/kern/vfs_trans.c: revision 1.51
 	distrib/sets/lists/comp/mi: revision 1.2233
 	share/man/man9/fstrans.9: revision 1.27
 	share/man/man9/Makefile: revision 1.431
 	sys/sys/fstrans.h: revision 1.12
 	sys/rump/librump/rumpkern/emul.c: revision 1.187
 	sys/dev/vnd.c: revision 1.266
 	sys/miscfs/genfs/genfs_vfsops.c: revision 1.8

 Bring back three state file system suspension:

  NORMAL -> SUSPENDING -> SUSPENDED

 and add operation fstrans_start_lazy() that only blocks while SUSPENDED.

 Change vndthread() support operation handle_with_rdwr() to bracket
 its file system operations by fstrans_start_lazy() and fstrans_done().

 PR kern/53624 (dom0 freeze on domU exit)


 To generate a diff of this commit:
 cvs rdiff -u -r1.2138.2.6 -r1.2138.2.7 src/distrib/sets/lists/comp/mi
 cvs rdiff -u -r1.414.2.1 -r1.414.2.2 src/share/man/man9/Makefile
 cvs rdiff -u -r1.24.2.1 -r1.24.2.2 src/share/man/man9/fstrans.9
 cvs rdiff -u -r1.259.6.1 -r1.259.6.2 src/sys/dev/vnd.c
 cvs rdiff -u -r1.45.2.2 -r1.45.2.3 src/sys/kern/vfs_trans.c
 cvs rdiff -u -r1.7 -r1.7.2.1 src/sys/miscfs/genfs/genfs_vfsops.c
 cvs rdiff -u -r1.181.6.2 -r1.181.6.3 src/sys/rump/librump/rumpkern/emul.c
 cvs rdiff -u -r1.10.60.1 -r1.10.60.2 src/sys/sys/fstrans.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Wed, 10 Oct 2018 09:09:31 +0000
State-Changed-Why:
Pullup complete -- thanks for the report.


From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Mon, 26 Nov 2018 15:51:03 +0200

 Hi,

 Even though PR 53624 has been closed, I'm still seeing dom0 hangs in
 my Xen test runs, and they seem to be happening around the time of
 domUs exiting, with domUs having two block devices backed by plain
 files just as in the original PR.

 For logs, see the items marked "Log (with timeout)" here:

   http://www.gson.org/netbsd/bugs/xen/results/2018-11-15/index.html
   http://www.gson.org/netbsd/bugs/xen/results/2018-11-25/index.html

 Anything I can do to help debug this?  Is there a way to break into
 DDB from a serial console shared with Xen?
 -- 
 Andreas Gustafsson, gson@gson.org

From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: gson@gson.org
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Mon, 26 Nov 2018 13:58:46 +0000

 On Mon, Nov 26, 2018 at 01:55:00PM +0000, Andreas Gustafsson wrote:
 >  Anything I can do to help debug this?  Is there a way to break into
 >  DDB from a serial console shared with Xen?

 Have you tried +++++ ? 

 -- 
 Emmanuel Dreyfus
 manu@netbsd.org

From: Andreas Gustafsson <gson@gson.org>
To: Emmanuel Dreyfus <manu@netbsd.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Mon, 26 Nov 2018 16:43:16 +0200

 Emmanuel Dreyfus wrote:
 > Have you tried +++++ ?

 That worked - thanks.  Now I just need to get it to hang again...
 -- 
 Andreas Gustafsson, gson@gson.org

From: Andreas Gustafsson <gson@gson.org>
To: Emmanuel Dreyfus <manu@netbsd.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Mon, 26 Nov 2018 20:46:41 +0200

 Trying to reproduce the dom0 freeze, I got a hypervisor panic instead,
 with the error message "Assertion 'oc > 0' failed at mm.c:766".  This
 was also reported by Manuel Bouyer in

   https://lists.xenproject.org/archives/html/xen-users/2018-01/msg00116.html

 This was running a current xenkernel48.
 -- 
 Andreas Gustafsson, gson@gson.org

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: hannken@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        manu@netbsd.org
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Mon, 26 Nov 2018 22:49:12 +0100

 On Mon, Nov 26, 2018 at 06:50:01PM +0000, Andreas Gustafsson wrote:
 > The following reply was made to PR kern/53624; it has been noted by GNATS.
 > 
 > From: Andreas Gustafsson <gson@gson.org>
 > To: Emmanuel Dreyfus <manu@netbsd.org>
 > Cc: gnats-bugs@NetBSD.org
 > Subject: Re: kern/53624 (dom0 freeze on domU exit)
 > Date: Mon, 26 Nov 2018 20:46:41 +0200
 > 
 >  Trying to reproduce the dom0 freeze, I got a hypervisor panic instead,
 >  with the error message "Assertion 'oc > 0' failed at mm.c:766".  This
 >  was also reported by Manuel Bouyer in
 >  
 >    https://lists.xenproject.org/archives/html/xen-users/2018-01/msg00116.html
 >  
 >  This was running a current xenkernel48.

 There is a workaround for it in the xenkernel411 package

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

State-Changed-From-To: closed->open
State-Changed-By: gson@NetBSD.org
State-Changed-When: Wed, 05 Dec 2018 08:36:41 +0000
State-Changed-Why:
Freezes are still happening.


From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: Emmanuel Dreyfus <manu@netbsd.org>
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Wed, 5 Dec 2018 10:35:18 +0200

 I reproduced the freeze using xenkernel411.   Here's some ddb output:

   db> ps
   PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
   11125    1 3   0        80   ffffa30001f3f740           vnconfig fstcnt
   10492    1 3   0         0   ffffa300020c34e0           vnconfig biowait
   [...]
   0      126 3   0       200   ffffa300020834a0               vnd4 vndbp
   0      125 3   0       200   ffffa3000204e040               vnd3 fstchg
   0      124 3   0       200   ffffa30000ed1660               vnd2 vndbp
   0      123 3   0       200   ffffa30002031840               vnd1 fstchg
   0       66 3   0       200   ffffa3000088a8e0               vnd0 vndbp
   [...]
   db> call fstrans_dump
   Fstrans locks by lwp:
   [ 87719.6801535] 10492.1  (/) shared 1 cow 0
   [ 87719.6801535] Fstrans state by mount:
   [ 87719.6801535] /                state suspending
   0
   db>

 If there are other ddb commands I can run to help debug this, please
 let me know.  I will try to keep the system at the ddb prompt for the
 next 24 hours at least.
 -- 
 Andreas Gustafsson, gson@gson.org

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: hannken@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        manu@netbsd.org, gson@gson.org
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Wed, 5 Dec 2018 09:50:13 +0100

 On Wed, Dec 05, 2018 at 08:40:00AM +0000, Andreas Gustafsson wrote:
 > The following reply was made to PR kern/53624; it has been noted by GNATS.
 > 
 > From: Andreas Gustafsson <gson@gson.org>
 > To: gnats-bugs@NetBSD.org
 > Cc: Emmanuel Dreyfus <manu@netbsd.org>
 > Subject: Re: kern/53624 (dom0 freeze on domU exit)
 > Date: Wed, 5 Dec 2018 10:35:18 +0200
 > 
 >  I reproduced the freeze using xenkernel411.   Here's some ddb output:
 >  
 >    db> ps
 >    PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
 >    11125    1 3   0        80   ffffa30001f3f740           vnconfig fstcnt
 >    10492    1 3   0         0   ffffa300020c34e0           vnconfig biowait
 >    [...]
 >    0      126 3   0       200   ffffa300020834a0               vnd4 vndbp
 >    0      125 3   0       200   ffffa3000204e040               vnd3 fstchg
 >    0      124 3   0       200   ffffa30000ed1660               vnd2 vndbp
 >    0      123 3   0       200   ffffa30002031840               vnd1 fstchg
 >    0       66 3   0       200   ffffa3000088a8e0               vnd0 vndbp
 >    [...]
 >    db> call fstrans_dump
 >    Fstrans locks by lwp:
 >    [ 87719.6801535] 10492.1  (/) shared 1 cow 0
 >    [ 87719.6801535] Fstrans state by mount:
 >    [ 87719.6801535] /                state suspending
 >    0
 >    db>
 >  
 >  If there are other ddb commands I can run to help debug this, please
 >  let me know.  I will try to keep the system at the ddb prompt for the
 >  next 24 hours at least.

 Can you get a stack trace (tr/a) for the processes listed above ?

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Emmanuel Dreyfus <manu@netbsd.org>
To: Andreas Gustafsson <gson@gson.org>
Cc: gnats-bugs@NetBSD.org, Emmanuel Dreyfus <manu@netbsd.org>
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Wed, 5 Dec 2018 08:50:50 +0000

 On Wed, Dec 05, 2018 at 10:35:18AM +0200, Andreas Gustafsson wrote:
 > I reproduced the freeze using xenkernel411.   Here's some ddb output:
 > 
 >   db> ps
 >   PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
 >   11125    1 3   0        80   ffffa30001f3f740           vnconfig fstcnt
 >   10492    1 3   0         0   ffffa300020c34e0           vnconfig biowait
 >   [...]
 >   0      126 3   0       200   ffffa300020834a0               vnd4 vndbp
 >   0      125 3   0       200   ffffa3000204e040               vnd3 fstchg
 >   0      124 3   0       200   ffffa30000ed1660               vnd2 vndbp
 >   0      123 3   0       200   ffffa30002031840               vnd1 fstchg
 >   0       66 3   0       200   ffffa3000088a8e0               vnd0 vndbp
 (...)
 > If there are other ddb commands I can run to help debug this, please
 > let me know.  I will try to keep the system at the ddb prompt for the
 > next 24 hours at least.

 Please get a kernel backtrace for the processes of interest here:
 bt/a ffffa30001f3f740
 bt/a ffffa300020c34e0
 ...

 At some point, running a kernel with options LOCKDEBUG may help.


 -- 
 Emmanuel Dreyfus
 manu@netbsd.org

From: Andreas Gustafsson <gson@gson.org>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Wed, 5 Dec 2018 10:58:35 +0200

 Manuel Bouyer wrote:
 > Can you get a stack trace (tr/a) for the processes listed above ?

 db> tr/a ffffa30001f3f740
 trace: pid 11125 lid 1 at 0xffffa3003b0c5850
 sleepq_block() at netbsd:sleepq_block+0x99
 cv_wait_sig() at netbsd:cv_wait_sig+0xf3
 fstrans_setstate() at netbsd:fstrans_setstate+0xa7
 genfs_suspendctl() at netbsd:genfs_suspendctl+0x57
 vfs_suspend() at netbsd:vfs_suspend+0x74
 vrevoke_suspend_next() at netbsd:vrevoke_suspend_next+0x2a
 vrevoke() at netbsd:vrevoke+0x2b
 genfs_revoke() at netbsd:genfs_revoke+0x13
 VOP_REVOKE() at netbsd:VOP_REVOKE+0x2e
 vdevgone() at netbsd:vdevgone+0x5b
 vnddoclear() at netbsd:vnddoclear+0xbc
 vndioctl() at netbsd:vndioctl+0x384
 VOP_IOCTL() at netbsd:VOP_IOCTL+0x37
 vn_ioctl() at netbsd:vn_ioctl+0xa1
 sys_ioctl() at netbsd:sys_ioctl+0x103
 syscall() at netbsd:syscall+0x9c
 --- syscall (number 54) ---
 7d2cfc11a78a:
 db> tr /a ffffa300020c34e0
 trace: pid 10492 lid 1 at 0xffffa3003b36d8c0
 sleepq_block() at netbsd:sleepq_block+0x99
 cv_wait() at netbsd:cv_wait+0xf1
 biowait() at netbsd:biowait+0x4f
 scan_mbr() at netbsd:scan_mbr+0x3e
 readdisklabel() at netbsd:readdisklabel+0x15a
 vndopen() at netbsd:vndopen+0x2e7
 spec_open() at netbsd:spec_open+0x386
 VOP_OPEN() at netbsd:VOP_OPEN+0x2f
 vn_open() at netbsd:vn_open+0x203
 do_open() at netbsd:do_open+0x10d
 do_sys_openat() at netbsd:do_sys_openat+0x68
 sys_open() at netbsd:sys_open+0x24
 syscall() at netbsd:syscall+0x9c
 --- syscall (number 5) ---
 72a42d23ea6a:
 db> tr/a ffffa300020834a0 
 trace: pid 0 lid 126 at 0xffffa3003b0dee30
 sleepq_block() at netbsd:sleepq_block+0x99
 vndthread() at netbsd:vndthread+0x53a
 db> tr/a ffffa3000204e040
 trace: pid 0 lid 125 at 0xffffa3003b6d4d30
 sleepq_block() at netbsd:sleepq_block+0x99
 cv_wait() at netbsd:cv_wait+0xf1
 fstrans_start() at netbsd:fstrans_start+0x73c
 VOP_LOCK() at netbsd:VOP_LOCK+0x57
 vn_lock() at netbsd:vn_lock+0x90
 vndthread() at netbsd:vndthread+0x2bd
 db> tr/a ffffa30000ed1660
 trace: pid 0 lid 124 at 0xffffa3003b009e30
 sleepq_block() at netbsd:sleepq_block+0x99
 vndthread() at netbsd:vndthread+0x53a
 db> tr/a ffffa30002031840
 trace: pid 0 lid 123 at 0xffffa3003b0cad30
 sleepq_block() at netbsd:sleepq_block+0x99
 cv_wait() at netbsd:cv_wait+0xf1
 fstrans_start() at netbsd:fstrans_start+0x73c
 VOP_LOCK() at netbsd:VOP_LOCK+0x57
 vn_lock() at netbsd:vn_lock+0x90
 vndthread() at netbsd:vndthread+0x2bd
 db> tr/a ffffa3000088a8e0
 trace: pid 0 lid 66 at 0xffffa3003b024e30
 sleepq_block() at netbsd:sleepq_block+0x99
 vndthread() at netbsd:vndthread+0x53a
 db> 

 -- 
 Andreas Gustafsson, gson@gson.org

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Wed, 5 Dec 2018 10:28:28 +0100



 > On 5. Dec 2018, at 10:00, Andreas Gustafsson <gson@gson.org> wrote:
 <snip>
 > trace: pid 0 lid 125 at 0xffffa3003b6d4d30
 > sleepq_block() at netbsd:sleepq_block+0x99
 > cv_wait() at netbsd:cv_wait+0xf1
 > fstrans_start() at netbsd:fstrans_start+0x73c
 > VOP_LOCK() at netbsd:VOP_LOCK+0x57
 > vn_lock() at netbsd:vn_lock+0x90
 > vndthread() at netbsd:vndthread+0x2bd

 Oops, my bad -- we have to protect handle_with_strategy() too.

 Please try the attached diff.

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

 [Attachment: vnd.c.diff]

 diff -r 400ec4f24994 sys/dev/vnd.c
 --- sys/dev/vnd.c
 +++ sys/dev/vnd.c
 @@ -733,12 +733,17 @@ vndthread(void *arg)
  		bp->b_bcount = obp->b_bcount;
  		BIO_COPYPRIO(bp, obp);

 +		/* Make sure the request succeeds while suspending this fs. */
 +		fstrans_start_lazy(vnd->sc_vp->v_mount);
 +
  		/* Handle the request using the appropriate operations. */
  		if ((vnd->sc_flags & VNF_USE_VN_RDWR) == 0)
  			handle_with_strategy(vnd, obp, bp);
  		else
  			handle_with_rdwr(vnd, obp, bp);

 +		fstrans_done(vnd->sc_vp->v_mount);
 +
  		s = splbio();
  		continue;

 @@ -804,9 +809,6 @@ handle_with_rdwr(struct vnd_softc *vnd, 
  		    bp->b_bcount);
  #endif

 -	/* Make sure the request succeeds while suspending this fs. */
 -	fstrans_start_lazy(vp->v_mount);
 -
  	/* Issue the read or write operation. */
  	bp->b_error =
  	    vn_rdwr(doread ? UIO_READ : UIO_WRITE,
 @@ -828,8 +830,6 @@ handle_with_rdwr(struct vnd_softc *vnd, 
  	else
  		mutex_exit(vp->v_interlock);

 -	fstrans_done(vp->v_mount);
 -
  	/* We need to increase the number of outputs on the vnode if
  	 * there was any write to it. */
  	if (!doread) {


From: Andreas Gustafsson <gson@gson.org>
To: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Fri, 7 Dec 2018 19:54:06 +0200

 J. Hannken-Illjes wrote:
 > Oops, my bad -- we have to protect handle_with_strategy() too.
 >  
 > Please try the attached diff.

 Looking good - with the patch applied, my test system has now created
 and destroyed 22 domUs and is still running.  Without the patch, it
 usually froze after creating and destroying just a few domUs.
 -- 
 Andreas Gustafsson, gson@gson.org

From: Andreas Gustafsson <gson@gson.org>
To: hannken@NetBSD.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Mon, 10 Dec 2018 16:39:41 +0200

 Hi hannken,

 I have now shut down my test system after it completed more than 100
 successful domU create/destroy cycles without freezing with your
 patch.  Feel free to commit it :)
 -- 
 Andreas Gustafsson, gson@gson.org

From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53624 CVS commit: src/sys/dev
Date: Mon, 10 Dec 2018 15:22:35 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Mon Dec 10 15:22:35 UTC 2018

 Modified Files:
 	src/sys/dev: vnd.c

 Log Message:
 Operation handle_with_strategy() also needs the
 fstrans_start_lazy() / fstrans_done() bracket.

 PR kern/53624 (dom0 freeze on domU exit)


 To generate a diff of this commit:
 cvs rdiff -u -r1.269 -r1.270 src/sys/dev/vnd.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
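The follow-up fix moves the bracket from handle_with_rdwr() up into vndthread(), so it also covers handle_with_strategy(). A plausible reading of why that is enough, which this sketch assumes rather than takes from the thread, is that fstrans transactions nest: a thread already inside a transaction on a mount does not sleep when a callee such as VOP_LOCK() starts another one on the same mount. struct lwp_model, model_start() and model_done() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model: one mount, one thread; all names are hypothetical. */
enum susp_state { S_NORMAL, S_SUSPENDING, S_SUSPENDED };

struct lwp_model {
	int trans_cnt;   /* nesting depth of this thread's transactions on the mount */
};

/*
 * Returns true if the call would sleep (shown as "fstchg" in ps).
 * ASSUMPTION: a thread already inside a transaction on this mount only
 * increments its nesting count and never sleeps.
 */
static bool
model_start(struct lwp_model *l, enum susp_state st, bool lazy)
{
	if (l->trans_cnt > 0) {
		l->trans_cnt++;
		return false;
	}
	if (st == S_SUSPENDED || (st == S_SUSPENDING && !lazy))
		return true;             /* would sleep; nothing acquired */
	l->trans_cnt = 1;
	return false;
}

static void
model_done(struct lwp_model *l)
{
	assert(l->trans_cnt > 0);
	l->trans_cnt--;
}
```

Without the bracket, the VOP_LOCK() path (a plain start while "/" is SUSPENDING) sleeps, as in the vnd1 and vnd3 traces; inside a lazy bracket the same call only nests.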

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53624 CVS commit: [netbsd-8] src/sys/dev
Date: Mon, 10 Dec 2018 17:16:11 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Mon Dec 10 17:16:11 UTC 2018

 Modified Files:
 	src/sys/dev [netbsd-8]: vnd.c

 Log Message:
 Pull up following revision(s) (requested by hannken in ticket #1133):

 	sys/dev/vnd.c: revision 1.270

 Operation handle_with_strategy() also needs the
 fstrans_start_lazy() / fstrans_done() bracket.

 PR kern/53624 (dom0 freeze on domU exit)


 To generate a diff of this commit:
 cvs rdiff -u -r1.259.6.4 -r1.259.6.5 src/sys/dev/vnd.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Thu, 13 Dec 2018 10:24:06 +0000
State-Changed-Why:
Fix committed to -current, pullup to -8 complete.


From: Manuel.Bouyer@lip6.fr
To: gnats-bugs@NetBSD.org
Cc: 
Subject: kern/53624 (dom0 freeze on domU exit) is still there
Date: Wed, 18 Sep 2019 16:54:56 +0200 (MEST)

 >Submitter-Id:	net
 >Originator:	Manuel Bouyer
 >Organization:
 >Confidential:	no 
 >Synopsis:	kern/53624 (dom0 freeze on domU exit) is still there
 >Severity:	serious
 >Priority:	high
 >Category:	kern
 >Class:		sw-bug
 >Release:	NetBSD 8.1_STABLE
 >Environment:
 System: NetBSD xen1.soc.lip6.fr 8.1_STABLE NetBSD 8.1_STABLE (ADMIN_DOM0) #0: Tue Sep 17 15:47:43 MEST 2019 bouyer@armandeche.soc.lip6.fr:/local/armandeche1/tmp/build/amd64/obj/local/armandeche1/netbsd-8/src/sys/arch/amd64/compile/ADMIN_DOM0 x86_64
 Architecture: x86_64
 Machine: amd64
 >Description:
 	On my testbed, which starts/destroys several domUs per day (sometimes
 	in parallel), I see occasional filesystem hangs with processes
 	waiting on fstchg.
 	Interesting processes are:
 0      105 3   0       200   ffffa0000213e5a0               vnd1 fstchg
 0      104 3   0       200   ffffa00002088160               vnd0 vndbp
 0       97 3   0       200   ffffa0000206a980               vnd3 vndbp
 0       96 3   0       200   ffffa0000105a280               vnd2 fstchg
 0       67 3   0       200   ffffa00000d73640            ioflush fstchg
 6533     1 3   0         0   ffffa00001f77080           vnconfig biowait
 25777    1 3   0        80   ffffa00001e5f480           vnconfig fstcnt

 db> tr/a ffffa0000213e5a0
 trace: pid 0 lid 105 at 0xffffa0002cffd4f0
 sleepq_block() at netbsd:sleepq_block+0x99
 cv_wait() at netbsd:cv_wait+0xf0
 fstrans_start() at netbsd:fstrans_start+0x78e
 VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x42
 genfs_getpages() at netbsd:genfs_getpages+0x1344
 VOP_GETPAGES() at netbsd:VOP_GETPAGES+0x4b
 ubc_fault() at netbsd:ubc_fault+0x188
 uvm_fault_internal() at netbsd:uvm_fault_internal+0x6d4
 trap() at netbsd:trap+0x3c1
 --- trap (number 6) ---
 kcopy() at netbsd:kcopy+0x15
 uiomove() at netbsd:uiomove+0xb9  
 ubc_uiomove() at netbsd:ubc_uiomove+0xf7
 ffs_read() at netbsd:ffs_read+0xf7
 VOP_READ() at netbsd:VOP_READ+0x33
 vn_rdwr() at netbsd:vn_rdwr+0x10c
 vndthread() at netbsd:vndthread+0x5b1

 db>  tr/a ffffa0000105a280       
 trace: pid 0 lid 96 at 0xffffa0002cf4d9c0
 sleepq_block() at netbsd:sleepq_block+0x99
 cv_wait() at netbsd:cv_wait+0xf0
 fstrans_start() at netbsd:fstrans_start+0x78e
 VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x42
 genfs_do_io() at netbsd:genfs_do_io+0x1b4
 genfs_gop_write() at netbsd:genfs_gop_write+0x52
 genfs_do_putpages() at netbsd:genfs_do_putpages+0xb9c
 VOP_PUTPAGES() at netbsd:VOP_PUTPAGES+0x36
 vndthread() at netbsd:vndthread+0x683

 db> tr/a ffffa00000d73640
 trace: pid 0 lid 67 at 0xffffa0002cd48ca0
 sleepq_block() at netbsd:sleepq_block+0x99
 cv_wait() at netbsd:cv_wait+0xf0
 fstrans_start() at netbsd:fstrans_start+0x78e
 VOP_BWRITE() at netbsd:VOP_BWRITE+0x42
 ffs_sbupdate() at netbsd:ffs_sbupdate+0xc3
 ffs_cgupdate() at netbsd:ffs_cgupdate+0x20
 ffs_sync() at netbsd:ffs_sync+0x1e9
 sched_sync() at netbsd:sched_sync+0x93

 db> tr/a ffffa00001f77080
 trace: pid 6533 lid 1 at 0xffffa0002cff8910
 sleepq_block() at netbsd:sleepq_block+0x99
 cv_wait() at netbsd:cv_wait+0xf0
 biowait() at netbsd:biowait+0x4f
 scan_iso_vrs_session() at netbsd:scan_iso_vrs_session+0x60
 readdisklabel() at netbsd:readdisklabel+0x304
 vndopen() at netbsd:vndopen+0x305
 spec_open() at netbsd:spec_open+0x385
 VOP_OPEN() at netbsd:VOP_OPEN+0x2f
 vn_open() at netbsd:vn_open+0x1e9
 do_open() at netbsd:do_open+0x112
 do_sys_openat() at netbsd:do_sys_openat+0x68
 sys_open() at netbsd:sys_open+0x24
 syscall() at netbsd:syscall+0x9c
 db> tr/a ffffa00001e5f480
 trace: pid 25777 lid 1 at 0xffffa0002b358860
 sleepq_block() at netbsd:sleepq_block+0x99
 cv_wait_sig() at netbsd:cv_wait_sig+0xf4
 fstrans_setstate() at netbsd:fstrans_setstate+0xaa
 genfs_suspendctl() at netbsd:genfs_suspendctl+0x57
 vfs_suspend() at netbsd:vfs_suspend+0x5b
 vrevoke_suspend_next() at netbsd:vrevoke_suspend_next+0x2a
 vrevoke() at netbsd:vrevoke+0x2b
 genfs_revoke() at netbsd:genfs_revoke+0x13
 VOP_REVOKE() at netbsd:VOP_REVOKE+0x2e
 vdevgone() at netbsd:vdevgone+0x5a
 vnddoclear() at netbsd:vnddoclear+0xc6
 vndioctl() at netbsd:vndioctl+0x3bb
 VOP_IOCTL() at netbsd:VOP_IOCTL+0x37
 vn_ioctl() at netbsd:vn_ioctl+0xa6
 sys_ioctl() at netbsd:sys_ioctl+0x101
 syscall() at netbsd:syscall+0x9c

 db> call fstrans_dump
 Fstrans locks by lwp:
 6533.1   (/) shared 1 cow 0
 0.105    (/domains) lazy 3 cow 0
 0.96     (/domains) lazy 2 cow 0
 0.67     (/domains) shared 1 cow 0
 Fstrans state by mount:
 /                state suspending

 So it looks like we have a 3-way deadlock between ioflush and the two vnconfig
 threads (while kern/53624 was only between 2 vnconfig threads) but I can't
 see the exact scenario yet. Also, the files backing the vnd are in
 /domains, not in /

 WAPBL is configured in the kernel but not in use.


 >How-To-Repeat:
 	xl create/shutdown several domUs in parallel
 >Fix:
 	please ...

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53624 (dom0 freeze on domU exit) is still there
Date: Thu, 3 Oct 2019 11:04:27 +0200

 To me this makes no sense:

 - 25777.1 suspends "/", state "suspending" and waits for 6533.1

 - 6533.1 may wait on a vnd thread

 - the traces of the two vnd threads in "fstchg" don't contain
   obvious accesses to "/"

 Are you able to get a core or do you have the corresponding
 "netbsd.gdb"?

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: hannken@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        manu@netbsd.org, gson@gson.org
Subject: Re: kern/53624 (dom0 freeze on domU exit) is still there
Date: Thu, 3 Oct 2019 12:07:01 +0200

 On Thu, Oct 03, 2019 at 09:05:01AM +0000, J. Hannken-Illjes wrote:
 > The following reply was made to PR kern/53624; it has been noted by GNATS.
 > 
 > From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: kern/53624 (dom0 freeze on domU exit) is still there
 > Date: Thu, 3 Oct 2019 11:04:27 +0200
 > 
 >  To me this makes no sense:
 >  
 >  - 25777.1 suspends "/", state "suspending" and waits for 6533.1
 >  
 >  - 6533.1 may wait on a vnd thread
 >  
 >  - the traces of the two vnd threads in "fstchg" don't contain
 >    obvious accesses to "/"
 >  
 >  Are you able to get a core

 No, kernel core dumps don't work on Xen.

 > or do you have the corresponding
 >  "netbsd.gdb"?

 Yes

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 years of experience will always make the difference
 --

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53624 (dom0 freeze on domU exit) is still there
Date: Thu, 3 Oct 2019 18:09:59 +0200

 > On 3. Oct 2019, at 11:05, J. Hannken-Illjes <hannken@eis.cs.tu-bs.de> wrote:
 <snip>
 > - the traces of the two vnd threads in "fstchg" don't contain
 >   obvious accesses to "/"

 Problem understood: VOP_GETPAGES()/VOP_PUTPAGES() do:

 VOP_BMAP(vp, lbn, &devvp, &blkno, &run);
 VOP_STRATEGY(devvp, bp);

 While vp resides on "/domains", devvp (the device it was
 mounted from) resides on "/".
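A minimal C sketch of why the transaction lands on "/" rather than "/domains". It assumes heavily simplified, hypothetical structures (`toy_mount`, `toy_vnode`, `toy_bmap()`, `fstrans_mount_for()` are illustrative names, not kernel APIs). The one fact taken from the source is that vop_pre() starts its fstrans on `vp->v_mount` (see the diff below), so once VOP_BMAP() hands back the device vnode, VOP_STRATEGY() is accounted to the file system holding that device node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel's
 * struct mount and struct vnode; only the fields needed here. */
struct toy_mount {
    const char *path;               /* mount point, e.g. "/" or "/domains" */
};

struct toy_vnode {
    struct toy_mount *v_mount;      /* file system this vnode resides on */
    struct toy_vnode *devvp;        /* file vnodes: device the fs was mounted from */
};

/* Stand-in for VOP_BMAP(): the blocks of a regular file live on the
 * device vnode of the file system the file belongs to. */
static struct toy_vnode *toy_bmap(struct toy_vnode *vp)
{
    assert(vp->devvp != NULL);
    return vp->devvp;
}

/* Mount a VOP on vp starts its fstrans on: vop_pre() uses vp->v_mount,
 * so for VOP_STRATEGY(devvp, bp) it is the device vnode's mount. */
static struct toy_mount *fstrans_mount_for(const struct toy_vnode *vp)
{
    return vp->v_mount;
}
```

With a backing file on /domains whose device vnode lives on /, `fstrans_mount_for(file)` yields the /domains mount while `fstrans_mount_for(toy_bmap(file))` yields the / mount, so I/O for a file on /domains still depends on the suspension state of /.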

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/53624 (dom0 freeze on domU exit) is still there
Date: Fri, 4 Oct 2019 11:40:54 +0200

 Looks like we have to use fstrans_start_lazy() for VOP_STRATEGY() too
 as it usually calls itself on the file system holding "/dev".
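The grant rules that make the lazy variant sufficient can be modelled roughly as follows. This is a simplified sketch, not the code in sys/kern/vfs_trans.c: the `toy_*` names are hypothetical, and exclusive transactions and owner checks are left out. It shows that while "/" is SUSPENDING a shared transaction is refused but a lazy one is still granted.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of fstrans mount states and transaction types;
 * an illustration only, not the kernel implementation. */
enum toy_state { TOY_NORMAL, TOY_SUSPENDING, TOY_SUSPENDED };
enum toy_lock  { TOY_SHARED, TOY_LAZY };

/* May a transaction of type `type` run while the mount is in `state`?
 * In this model a suspender waits until every active transaction on
 * the mount would be granted in the new state. */
static bool toy_grant(enum toy_state state, enum toy_lock type)
{
    switch (state) {
    case TOY_NORMAL:
        return true;                 /* everything runs */
    case TOY_SUSPENDING:
        return type == TOY_LAZY;     /* only lazy transactions still pass */
    case TOY_SUSPENDED:
        return false;                /* nothing passes until resume */
    }
    assert(!"unreachable");
    return false;
}
```

Here `toy_grant(TOY_SUSPENDING, TOY_SHARED)` is false and `toy_grant(TOY_SUSPENDING, TOY_LAZY)` is true. Lazy transactions are still refused in SUSPENDED, so in this model suspension still waits for in-flight lazy transactions to drain before it completes.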

 The attached diff could help, please give it a try.

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

 diff -r b9b26f2b5eeb -r 45acdd7da973 sys/kern/vnode_if.c
 --- sys/kern/vnode_if.c
 +++ sys/kern/vnode_if.c
 @@ -49,7 +49,7 @@
  #include <sys/lock.h>
  #include <sys/fstrans.h>

 -enum fst_op { FST_NO, FST_YES, FST_TRY };
 +enum fst_op { FST_NO, FST_YES, FST_LAZY, FST_TRY };

  static inline int
  vop_pre(vnode_t *vp, struct mount **mp, bool *mpsafe, enum fst_op op)
 @@ -62,7 +62,7 @@ vop_pre(vnode_t *vp, struct mount **mp, 
  		KERNEL_LOCK(1, curlwp);
  	}

 -	if (op == FST_YES || op == FST_TRY) {
 +	if (op == FST_YES || op == FST_LAZY || op == FST_TRY) {
  		for (;;) {
  			*mp = vp->v_mount;
  			if (op == FST_TRY) {
 @@ -73,6 +73,8 @@ vop_pre(vnode_t *vp, struct mount **mp, 
  					}
  					return error;
  				}
 +			} else if (op == FST_LAZY) {
 +				fstrans_start_lazy(*mp);
  			} else {
  				fstrans_start(*mp);
  			}
 @@ -91,7 +93,7 @@ static inline void
  vop_post(vnode_t *vp, struct mount *mp, bool mpsafe, enum fst_op op)
  {

 -	if (op == FST_YES) {
 +	if (op == FST_YES || op == FST_LAZY) {
  		fstrans_done(mp);
  	}

 @@ -1378,11 +1380,11 @@ VOP_STRATEGY(struct vnode *vp,
  	a.a_desc = VDESC(vop_strategy);
  	a.a_vp = vp;
  	a.a_bp = bp;
 -	error = vop_pre(vp, &mp, &mpsafe, FST_YES);
 +	error = vop_pre(vp, &mp, &mpsafe, FST_LAZY);
  	if (error)
  		return error;
  	error = (VCALL(vp, VOFFSET(vop_strategy), &a));
 -	vop_post(vp, mp, mpsafe, FST_YES);
 +	vop_post(vp, mp, mpsafe, FST_LAZY);
  	return error;
  }
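Why this change breaks the cycle can be checked with a small wait-for graph. This is an illustrative C sketch under simplifying assumptions: the three roles come from the ddb traces above (pid 25777 suspending "/", pid 6533 sleeping in biowait() while holding a shared fstrans on "/", and a vnd worker issuing VOP_STRATEGY() on the device vnode), a single VND node stands in for both vnd threads, and `build_graph()`/`deadlocked()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Roles from the ddb traces (simplified):
 *   SUSPENDER: vdevgone() -> vfs_suspend("/"), waits for shared
 *              fstrans holders on "/" to drain;
 *   OPENER:    holds a shared fstrans on "/", sleeps in biowait();
 *   VND:       vnd worker doing VOP_STRATEGY() on a device vnode on "/". */
enum { SUSPENDER, OPENER, VND, NTHREADS };

/* waits[a][b] means "a waits for b"; strategy_lazy selects FST_LAZY. */
static void build_graph(bool waits[NTHREADS][NTHREADS], bool strategy_lazy)
{
    for (int a = 0; a < NTHREADS; a++)
        for (int b = 0; b < NTHREADS; b++)
            waits[a][b] = false;
    waits[SUSPENDER][OPENER] = true;   /* shared holder must finish first */
    waits[OPENER][VND] = true;         /* biowait() on vnd I/O */
    /* "/" is SUSPENDING: a shared fstrans blocks, a lazy one is granted. */
    if (!strategy_lazy)
        waits[VND][SUSPENDER] = true;
}

/* Depth-first search; reaching a node still on the current path is a cycle. */
static bool dfs(bool waits[NTHREADS][NTHREADS], int n, int color[NTHREADS])
{
    assert(n >= 0 && n < NTHREADS);
    color[n] = 1;                      /* on the current DFS path */
    for (int m = 0; m < NTHREADS; m++) {
        if (!waits[n][m])
            continue;
        if (color[m] == 1)
            return true;
        if (color[m] == 0 && dfs(waits, m, color))
            return true;
    }
    color[n] = 2;                      /* fully explored */
    return false;
}

static bool deadlocked(bool strategy_lazy)
{
    bool waits[NTHREADS][NTHREADS];
    int color[NTHREADS] = { 0 };

    build_graph(waits, strategy_lazy);
    for (int n = 0; n < NTHREADS; n++)
        if (color[n] == 0 && dfs(waits, n, color))
            return true;
    return false;
}
```

`deadlocked(false)` (FST_YES, before the change) finds the SUSPENDER -> OPENER -> VND -> SUSPENDER cycle; `deadlocked(true)` (FST_LAZY) does not, because the vnd worker's VOP_STRATEGY() is granted while "/" is only SUSPENDING.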

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: hannken@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        manu@netbsd.org, gson@gson.org
Subject: Re: kern/53624 (dom0 freeze on domU exit) is still there
Date: Wed, 9 Oct 2019 15:31:42 +0200

 On Fri, Oct 04, 2019 at 09:45:01AM +0000, J. Hannken-Illjes wrote:
 > The following reply was made to PR kern/53624; it has been noted by GNATS.
 > 
 > From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: kern/53624 (dom0 freeze on domU exit) is still there
 > Date: Fri, 4 Oct 2019 11:40:54 +0200
 > 
 >  Looks like we have to use fstrans_start_lazy() for VOP_STRATEGY() too
 >  as it usually calls itself on the file system holding "/dev".
 >  
 >  The attached diff could help, please give it a try.

 Looks good, I have completed 2 rounds of tests without problems.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 years of experience will always make the difference
 --

From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53624 CVS commit: src/sys/kern
Date: Fri, 11 Oct 2019 08:04:52 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Fri Oct 11 08:04:52 UTC 2019

 Modified Files:
 	src/sys/kern: vnode_if.sh vnode_if.src

 Log Message:
 As VOP_STRATEGY() usually calls itself on the file system holding "/dev"
 it may deadlock on suspension of this file system.

 Add fstrans type LAZY and use it for VOP_STRATEGY().

 Address PR kern/53624 (dom0 freeze on domU exit) is still there


 To generate a diff of this commit:
 cvs rdiff -u -r1.66 -r1.67 src/sys/kern/vnode_if.sh
 cvs rdiff -u -r1.77 -r1.78 src/sys/kern/vnode_if.src

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53624 CVS commit: [netbsd-8] src/sys/kern
Date: Mon, 14 Oct 2019 17:43:58 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Mon Oct 14 17:43:58 UTC 2019

 Modified Files:
 	src/sys/kern [netbsd-8]: vnode_if.sh vnode_if.src

 Log Message:
 Pull up following revision(s) (requested by hannken in ticket #1405):

 	sys/kern/vnode_if.sh: revision 1.67
 	sys/kern/vnode_if.src: revision 1.78

 As VOP_STRATEGY() usually calls itself on the file system holding "/dev"
 it may deadlock on suspension of this file system.

 Add fstrans type LAZY and use it for VOP_STRATEGY().

 Address PR kern/53624 (dom0 freeze on domU exit) is still there


 To generate a diff of this commit:
 cvs rdiff -u -r1.64.4.1 -r1.64.4.2 src/sys/kern/vnode_if.sh
 cvs rdiff -u -r1.75.2.2 -r1.75.2.3 src/sys/kern/vnode_if.src

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53624 CVS commit: [netbsd-9] src/sys/kern
Date: Tue, 15 Oct 2019 18:12:26 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Tue Oct 15 18:12:25 UTC 2019

 Modified Files:
 	src/sys/kern [netbsd-9]: vnode_if.sh vnode_if.src

 Log Message:
 Pull up following revision(s) (requested by hannken in ticket #307):

 	sys/kern/vnode_if.sh: revision 1.67
 	sys/kern/vnode_if.src: revision 1.78

 As VOP_STRATEGY() usually calls itself on the file system holding "/dev"
 it may deadlock on suspension of this file system.

 Add fstrans type LAZY and use it for VOP_STRATEGY().

 Address PR kern/53624 (dom0 freeze on domU exit) is still there


 To generate a diff of this commit:
 cvs rdiff -u -r1.66 -r1.66.10.1 src/sys/kern/vnode_if.sh
 cvs rdiff -u -r1.77 -r1.77.10.1 src/sys/kern/vnode_if.src

>Unformatted:

Copyright © 1994-2017 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.