NetBSD Problem Report #53624
From www@NetBSD.org Sat Sep 22 01:45:28 2018
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 4262C7A17C
for <gnats-bugs@gnats.NetBSD.org>; Sat, 22 Sep 2018 01:45:28 +0000 (UTC)
Message-Id: <20180922014526.BA49A7A26A@mollari.NetBSD.org>
Date: Sat, 22 Sep 2018 01:45:26 +0000 (UTC)
From: manu@netbsd.org
Reply-To: manu@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: dom0 freeze on domU exit
X-Send-Pr-Version: www-1.0
>Number: 53624
>Notify-List: gson@gson.org
>Category: kern
>Synopsis: dom0 freeze on domU exit
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: hannken
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Sep 22 01:50:00 +0000 2018
>Closed-Date: Thu Dec 13 10:24:06 +0000 2018
>Last-Modified: Tue Oct 15 18:15:00 +0000 2019
>Originator: Emmanuel Dreyfus
>Release: NetBSD 8.0
>Organization:
NetBSD
>Environment:
NetBSD xmai 8.0_STABLE NetBSD 8.0_STABLE (XEN3_DOM0_NOAGP) #63: Fri Sep 21 16:37:10 CEST 2018 root@lego:/pkg_comp/NetBSD-8stable-amd64/src/sys/arch/amd64/compile/XEN3_DOM0_NOAGP amd64
>Description:
When shutting down a Xen domU that has two block devices backed by plain files, there is a race condition that can freeze the dom0.
Here are the relevant processes at freeze time:
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
1880 1 3 0 84 ffffa000010a0a20 vnconfig fstcnt
1711 1 3 0 c ffffa000028d48c0 vnconfig biowait
0 103 3 0 200 ffffa00001c4a300 vnd0 fstchg
fstrans_dump output:
Fstrans locks by lwp:
1711.1 (/) shared 1 cow 0
Fstrans state by mount:
/ state suspended
vnconfig 1711.1 waits for an I/O to complete:
sleepq_block/cv_wait/biowait/convertdisklabel/validate_label/readdisklabel/vndopen/spec_open/VOP_OPEN/vn_open/do_open/do_sys_openat/sys_open/syscall
This I/O should be done by kernel thread vnd0 0.103, which waits for
filesystem resume on cv_wait(&fstrans_state_cv, &fstrans_lock)
sleepq_block/cv_wait/fstrans_start/genfs_do_putpages/VOP_PUTPAGES/vndthread
The process that suspended the filesystem is vnconfig 1880.1, through vrevoke.
It is itself waiting for vnconfig 1711.1 to finish its transaction, on
cv_wait_sig(&fstrans_count_cv, &fstrans_lock):
sleepq_block/cv_wait_sig/fstrans_setstate/genfs_suspendctl/vfs_suspend/vrevoke_suspend_next.part.1/vrevoke/genfs_revoke/VOP_REVOKE/vdevgone/vnddoclear/vndioctl/VOP_IOCTL/vn_ioctl/sys_ioctl/syscall
The processes wait on each other: we have a deadlock.
>How-To-Repeat:
Set up a domU with two block devices backed by files on the root filesystem, and create/shutdown it until dom0 freezes:
while true; do xl shutdown -w test ; xl create test ; sleep 60; done
>Fix:
The root of the problem seems to be the unbounded wait on fstrans_count_cv in
fstrans_setstate(). As condvar(9) notes, "Non-interruptable waits have
the potential to deadlock the system". This wait is interruptible,
but most processes in the system end up waiting in fstrans_start() because they attempt to do a filesystem access. Once sshd and getty are hit, it becomes impossible to log in and kill a process.
Here is a proposal to fix the problem: use cv_timedwait_sig() instead
of cv_wait_sig(). Since the original code allowed failure when catching a signal, failing because of a timeout is already handled correctly by the calling functions.
--- sys/kern/vfs_trans.c.orig
+++ sys/kern/vfs_trans.c
@@ -41,8 +41,9 @@
#endif
#include <sys/param.h>
#include <sys/systm.h>
+#include <sys/kernel.h>
#include <sys/atomic.h>
#include <sys/buf.h>
#include <sys/kmem.h>
#include <sys/mount.h>
@@ -531,12 +532,16 @@
/*
* All threads see the new state now.
* Wait for transactions invalid at this state to leave.
+ * We cannot wait forever because many processes would
+ * get stuck waiting for fstcnt in fstrans_start(). This
+ * is acute when suspending the root filesystem.
*/
error = 0;
while (! state_change_done(mp)) {
- error = cv_wait_sig(&fstrans_count_cv, &fstrans_lock);
+ error = cv_timedwait_sig(&fstrans_count_cv,
+ &fstrans_lock, hz / 4);
if (error) {
new_state = fmi->fmi_state = FSTRANS_NORMAL;
break;
}
--
Emmanuel Dreyfus
manu%netbsd.org@localhost
>Release-Note:
>Audit-Trail:
From: "Maya Rashish" <maya@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53624 CVS commit: pkgsrc/wm/cwm
Date: Sat, 22 Sep 2018 06:32:11 +0000
Module Name: pkgsrc
Committed By: maya
Date: Sat Sep 22 06:32:11 UTC 2018
Modified Files:
pkgsrc/wm/cwm: DESCR Makefile PLIST distinfo
Log Message:
cwm: update to 6.3
From Sunil Nimmagadda in pkgsrc-wip, PR pkg/53624.
To generate a diff of this commit:
cvs rdiff -u -r1.1.1.1 -r1.2 pkgsrc/wm/cwm/DESCR pkgsrc/wm/cwm/PLIST
cvs rdiff -u -r1.19 -r1.20 pkgsrc/wm/cwm/Makefile
cvs rdiff -u -r1.2 -r1.3 pkgsrc/wm/cwm/distinfo
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: manu@netbsd.org (Emmanuel Dreyfus)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: PR/53624 CVS commit: pkgsrc/wm/cwm
Date: Sat, 22 Sep 2018 08:58:21 +0200
Maya Rashish <maya@netbsd.org> wrote:
> From Sunil Nimmagadda in pkgsrc-wip, PR pkg/53624.
Wrong PR, #53624 is about "dom0 freeze on domU exit"
--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@netbsd.org
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53624: dom0 freeze on domU exit
Date: Thu, 4 Oct 2018 11:45:10 +0200
To break this deadlock fstrans(9) needs a bracket operation that will
not block while the file system is suspending.
I'm working on a fix.
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Responsible-Changed-From-To: kern-bug-people->hannken
Responsible-Changed-By: hannken@NetBSD.org
Responsible-Changed-When: Thu, 04 Oct 2018 10:09:47 +0000
Responsible-Changed-Why:
Take.
State-Changed-From-To: open->analyzed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Thu, 04 Oct 2018 10:09:47 +0000
State-Changed-Why:
Problem understood, working on a fix.
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53624 CVS commit: src
Date: Fri, 5 Oct 2018 09:51:56 +0000
Module Name: src
Committed By: hannken
Date: Fri Oct 5 09:51:56 UTC 2018
Modified Files:
src/distrib/sets/lists/comp: mi
src/share/man/man9: Makefile fstrans.9
src/sys/dev: vnd.c
src/sys/kern: vfs_trans.c
src/sys/miscfs/genfs: genfs_vfsops.c
src/sys/rump/librump/rumpkern: emul.c
src/sys/sys: fstrans.h
Log Message:
Bring back three state file system suspension:
NORMAL -> SUSPENDING -> SUSPENDED
and add operation fstrans_start_lazy() that only blocks while SUSPENDED.
Change vndthread() support operation handle_with_rdwr() to bracket
its file system operations by fstrans_start_lazy() and fstrans_done().
PR kern/53624 (dom0 freeze on domU exit)
To generate a diff of this commit:
cvs rdiff -u -r1.2232 -r1.2233 src/distrib/sets/lists/comp/mi
cvs rdiff -u -r1.430 -r1.431 src/share/man/man9/Makefile
cvs rdiff -u -r1.26 -r1.27 src/share/man/man9/fstrans.9
cvs rdiff -u -r1.265 -r1.266 src/sys/dev/vnd.c
cvs rdiff -u -r1.50 -r1.51 src/sys/kern/vfs_trans.c
cvs rdiff -u -r1.7 -r1.8 src/sys/miscfs/genfs/genfs_vfsops.c
cvs rdiff -u -r1.186 -r1.187 src/sys/rump/librump/rumpkern/emul.c
cvs rdiff -u -r1.11 -r1.12 src/sys/sys/fstrans.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: analyzed->feedback
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Fri, 05 Oct 2018 09:57:35 +0000
State-Changed-Why:
Committed a fix to -current, please confirm.
From: manu@netbsd.org (Emmanuel Dreyfus)
To: gnats-bugs@NetBSD.org, hannken@NetBSD.org, netbsd-bugs@netbsd.org,
gnats-admin@netbsd.org
Cc:
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Sat, 6 Oct 2018 03:47:28 +0200
<hannken@NetBSD.org> wrote:
> Committed a fix to -current, please confirm.
I applied the change to netbsd-8 and ran my previous test: it does not
freeze.
--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@netbsd.org
State-Changed-From-To: feedback->pending-pullups
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Sat, 06 Oct 2018 10:10:02 +0000
State-Changed-Why:
Pullup to -8 requested with ticket #1052
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53624 CVS commit: [netbsd-8] src
Date: Tue, 9 Oct 2018 09:58:10 +0000
Module Name: src
Committed By: martin
Date: Tue Oct 9 09:58:09 UTC 2018
Modified Files:
src/distrib/sets/lists/comp [netbsd-8]: mi
src/share/man/man9 [netbsd-8]: Makefile fstrans.9
src/sys/dev [netbsd-8]: vnd.c
src/sys/kern [netbsd-8]: vfs_trans.c
src/sys/miscfs/genfs [netbsd-8]: genfs_vfsops.c
src/sys/rump/librump/rumpkern [netbsd-8]: emul.c
src/sys/sys [netbsd-8]: fstrans.h
Log Message:
Pull up following revision(s) (requested by hannken in ticket #1052):
sys/kern/vfs_trans.c: revision 1.51
distrib/sets/lists/comp/mi: revision 1.2233
share/man/man9/fstrans.9: revision 1.27
share/man/man9/Makefile: revision 1.431
sys/sys/fstrans.h: revision 1.12
sys/rump/librump/rumpkern/emul.c: revision 1.187
sys/dev/vnd.c: revision 1.266
sys/miscfs/genfs/genfs_vfsops.c: revision 1.8
Bring back three state file system suspension:
NORMAL -> SUSPENDING -> SUSPENDED
and add operation fstrans_start_lazy() that only blocks while SUSPENDED.
Change vndthread() support operation handle_with_rdwr() to bracket
its file system operations by fstrans_start_lazy() and fstrans_done().
PR kern/53624 (dom0 freeze on domU exit)
To generate a diff of this commit:
cvs rdiff -u -r1.2138.2.6 -r1.2138.2.7 src/distrib/sets/lists/comp/mi
cvs rdiff -u -r1.414.2.1 -r1.414.2.2 src/share/man/man9/Makefile
cvs rdiff -u -r1.24.2.1 -r1.24.2.2 src/share/man/man9/fstrans.9
cvs rdiff -u -r1.259.6.1 -r1.259.6.2 src/sys/dev/vnd.c
cvs rdiff -u -r1.45.2.2 -r1.45.2.3 src/sys/kern/vfs_trans.c
cvs rdiff -u -r1.7 -r1.7.2.1 src/sys/miscfs/genfs/genfs_vfsops.c
cvs rdiff -u -r1.181.6.2 -r1.181.6.3 src/sys/rump/librump/rumpkern/emul.c
cvs rdiff -u -r1.10.60.1 -r1.10.60.2 src/sys/sys/fstrans.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Wed, 10 Oct 2018 09:09:31 +0000
State-Changed-Why:
Pullup complete -- thanks for the report.
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Mon, 26 Nov 2018 15:51:03 +0200
Hi,
Even though PR 53624 has been closed, I'm still seeing dom0 hangs in
my Xen test runs, and they seem to be happening around the time of
domUs exiting, with domUs having two block devices backed by plain
files just as in the original PR.
For logs, see the items marked "Log (with timeout)" here:
http://www.gson.org/netbsd/bugs/xen/results/2018-11-15/index.html
http://www.gson.org/netbsd/bugs/xen/results/2018-11-25/index.html
Anything I can do to help debug this? Is there a way to break into
DDB from a serial console shared with Xen?
--
Andreas Gustafsson, gson@gson.org
From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: gson@gson.org
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Mon, 26 Nov 2018 13:58:46 +0000
On Mon, Nov 26, 2018 at 01:55:00PM +0000, Andreas Gustafsson wrote:
> Anything I can do to help debug this? Is there a way to break into
> DDB from a serial console shared with Xen?
Have you tried +++++ ?
--
Emmanuel Dreyfus
manu@netbsd.org
From: Andreas Gustafsson <gson@gson.org>
To: Emmanuel Dreyfus <manu@netbsd.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Mon, 26 Nov 2018 16:43:16 +0200
Emmanuel Dreyfus wrote:
> Have you tried +++++ ?
That worked - thanks. Now I just need to get it to hang again...
--
Andreas Gustafsson, gson@gson.org
From: Andreas Gustafsson <gson@gson.org>
To: Emmanuel Dreyfus <manu@netbsd.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Mon, 26 Nov 2018 20:46:41 +0200
Trying to reproduce the dom0 freeze, I got a hypervisor panic instead,
with the error message "Assertion 'oc > 0' failed at mm.c:766". This
was also reported by Manuel Bouyer in
https://lists.xenproject.org/archives/html/xen-users/2018-01/msg00116.html
This was running a current xenkernel48.
--
Andreas Gustafsson, gson@gson.org
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: hannken@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
manu@netbsd.org
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Mon, 26 Nov 2018 22:49:12 +0100
On Mon, Nov 26, 2018 at 06:50:01PM +0000, Andreas Gustafsson wrote:
> The following reply was made to PR kern/53624; it has been noted by GNATS.
>
> From: Andreas Gustafsson <gson@gson.org>
> To: Emmanuel Dreyfus <manu@netbsd.org>
> Cc: gnats-bugs@NetBSD.org
> Subject: Re: kern/53624 (dom0 freeze on domU exit)
> Date: Mon, 26 Nov 2018 20:46:41 +0200
>
> Trying to reproduce the dom0 freeze, I got a hypervisor panic instead,
> with the error message "Assertion 'oc > 0' failed at mm.c:766". This
> was also reported by Manuel Bouyer in
>
> https://lists.xenproject.org/archives/html/xen-users/2018-01/msg00116.html
>
> This was running a current xenkernel48.
There is a workaround for it in the xenkernel411 package
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
State-Changed-From-To: closed->open
State-Changed-By: gson@NetBSD.org
State-Changed-When: Wed, 05 Dec 2018 08:36:41 +0000
State-Changed-Why:
Freezes are still happening.
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: Emmanuel Dreyfus <manu@netbsd.org>
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Wed, 5 Dec 2018 10:35:18 +0200
I reproduced the freeze using xenkernel411. Here's some ddb output:
db> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
11125 1 3 0 80 ffffa30001f3f740 vnconfig fstcnt
10492 1 3 0 0 ffffa300020c34e0 vnconfig biowait
[...]
0 126 3 0 200 ffffa300020834a0 vnd4 vndbp
0 125 3 0 200 ffffa3000204e040 vnd3 fstchg
0 124 3 0 200 ffffa30000ed1660 vnd2 vndbp
0 123 3 0 200 ffffa30002031840 vnd1 fstchg
0 66 3 0 200 ffffa3000088a8e0 vnd0 vndbp
[...]
db> call fstrans_dump
Fstrans locks by lwp:
[ 87719.6801535] 10492.1 (/) shared 1 cow 0
[ 87719.6801535] Fstrans state by mount:
[ 87719.6801535] / state suspending
0
db>
If there are other ddb commands I can run to help debug this, please
let me know. I will try to keep the system at the ddb prompt for the
next 24 hours at least.
--
Andreas Gustafsson, gson@gson.org
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: hannken@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
manu@netbsd.org, gson@gson.org
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Wed, 5 Dec 2018 09:50:13 +0100
On Wed, Dec 05, 2018 at 08:40:00AM +0000, Andreas Gustafsson wrote:
> The following reply was made to PR kern/53624; it has been noted by GNATS.
>
> From: Andreas Gustafsson <gson@gson.org>
> To: gnats-bugs@NetBSD.org
> Cc: Emmanuel Dreyfus <manu@netbsd.org>
> Subject: Re: kern/53624 (dom0 freeze on domU exit)
> Date: Wed, 5 Dec 2018 10:35:18 +0200
>
> I reproduced the freeze using xenkernel411. Here's some ddb output:
>
> db> ps
> PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
> 11125 1 3 0 80 ffffa30001f3f740 vnconfig fstcnt
> 10492 1 3 0 0 ffffa300020c34e0 vnconfig biowait
> [...]
> 0 126 3 0 200 ffffa300020834a0 vnd4 vndbp
> 0 125 3 0 200 ffffa3000204e040 vnd3 fstchg
> 0 124 3 0 200 ffffa30000ed1660 vnd2 vndbp
> 0 123 3 0 200 ffffa30002031840 vnd1 fstchg
> 0 66 3 0 200 ffffa3000088a8e0 vnd0 vndbp
> [...]
> db> call fstrans_dump
> Fstrans locks by lwp:
> [ 87719.6801535] 10492.1 (/) shared 1 cow 0
> [ 87719.6801535] Fstrans state by mount:
> [ 87719.6801535] / state suspending
> 0
> db>
>
> If there are other ddb commands I can run to help debug this, please
> let me know. I will try to keep the system at the ddb prompt for the
> next 24 hours at least.
Can you get a stack trace (tr/a) for the processes listed above ?
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
From: Emmanuel Dreyfus <manu@netbsd.org>
To: Andreas Gustafsson <gson@gson.org>
Cc: gnats-bugs@NetBSD.org, Emmanuel Dreyfus <manu@netbsd.org>
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Wed, 5 Dec 2018 08:50:50 +0000
On Wed, Dec 05, 2018 at 10:35:18AM +0200, Andreas Gustafsson wrote:
> I reproduced the freeze using xenkernel411. Here's some ddb output:
>
> db> ps
> PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
> 11125 1 3 0 80 ffffa30001f3f740 vnconfig fstcnt
> 10492 1 3 0 0 ffffa300020c34e0 vnconfig biowait
> [...]
> 0 126 3 0 200 ffffa300020834a0 vnd4 vndbp
> 0 125 3 0 200 ffffa3000204e040 vnd3 fstchg
> 0 124 3 0 200 ffffa30000ed1660 vnd2 vndbp
> 0 123 3 0 200 ffffa30002031840 vnd1 fstchg
> 0 66 3 0 200 ffffa3000088a8e0 vnd0 vndbp
(...)
> If there are other ddb commands I can run to help debug this, please
> let me know. I will try to keep the system at the ddb prompt for the
> next 24 hours at least.
Please get a kernel backtrace for the processes of interest here:
bt/a ffffa30001f3f740
bt/a ffffa300020c34e0
...
At some point, running a kernel with options LOCKDEBUG may help.
--
Emmanuel Dreyfus
manu@netbsd.org
From: Andreas Gustafsson <gson@gson.org>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Wed, 5 Dec 2018 10:58:35 +0200
Manuel Bouyer wrote:
> Can you get a stack trace (tr/a) for the processes listed above ?
db> tr/a ffffa30001f3f740
trace: pid 11125 lid 1 at 0xffffa3003b0c5850
sleepq_block() at netbsd:sleepq_block+0x99
cv_wait_sig() at netbsd:cv_wait_sig+0xf3
fstrans_setstate() at netbsd:fstrans_setstate+0xa7
genfs_suspendctl() at netbsd:genfs_suspendctl+0x57
vfs_suspend() at netbsd:vfs_suspend+0x74
vrevoke_suspend_next() at netbsd:vrevoke_suspend_next+0x2a
vrevoke() at netbsd:vrevoke+0x2b
genfs_revoke() at netbsd:genfs_revoke+0x13
VOP_REVOKE() at netbsd:VOP_REVOKE+0x2e
vdevgone() at netbsd:vdevgone+0x5b
vnddoclear() at netbsd:vnddoclear+0xbc
vndioctl() at netbsd:vndioctl+0x384
VOP_IOCTL() at netbsd:VOP_IOCTL+0x37
vn_ioctl() at netbsd:vn_ioctl+0xa1
sys_ioctl() at netbsd:sys_ioctl+0x103
syscall() at netbsd:syscall+0x9c
--- syscall (number 54) ---
7d2cfc11a78a:
db> tr /a ffffa300020c34e0
trace: pid 10492 lid 1 at 0xffffa3003b36d8c0
sleepq_block() at netbsd:sleepq_block+0x99
cv_wait() at netbsd:cv_wait+0xf1
biowait() at netbsd:biowait+0x4f
scan_mbr() at netbsd:scan_mbr+0x3e
readdisklabel() at netbsd:readdisklabel+0x15a
vndopen() at netbsd:vndopen+0x2e7
spec_open() at netbsd:spec_open+0x386
VOP_OPEN() at netbsd:VOP_OPEN+0x2f
vn_open() at netbsd:vn_open+0x203
do_open() at netbsd:do_open+0x10d
do_sys_openat() at netbsd:do_sys_openat+0x68
sys_open() at netbsd:sys_open+0x24
syscall() at netbsd:syscall+0x9c
--- syscall (number 5) ---
72a42d23ea6a:
db> tr/a ffffa300020834a0
trace: pid 0 lid 126 at 0xffffa3003b0dee30
sleepq_block() at netbsd:sleepq_block+0x99
vndthread() at netbsd:vndthread+0x53a
db> tr/a ffffa3000204e040
trace: pid 0 lid 125 at 0xffffa3003b6d4d30
sleepq_block() at netbsd:sleepq_block+0x99
cv_wait() at netbsd:cv_wait+0xf1
fstrans_start() at netbsd:fstrans_start+0x73c
VOP_LOCK() at netbsd:VOP_LOCK+0x57
vn_lock() at netbsd:vn_lock+0x90
vndthread() at netbsd:vndthread+0x2bd
db> tr/a ffffa30000ed1660
trace: pid 0 lid 124 at 0xffffa3003b009e30
sleepq_block() at netbsd:sleepq_block+0x99
vndthread() at netbsd:vndthread+0x53a
db> tr/a ffffa30002031840
trace: pid 0 lid 123 at 0xffffa3003b0cad30
sleepq_block() at netbsd:sleepq_block+0x99
cv_wait() at netbsd:cv_wait+0xf1
fstrans_start() at netbsd:fstrans_start+0x73c
VOP_LOCK() at netbsd:VOP_LOCK+0x57
vn_lock() at netbsd:vn_lock+0x90
vndthread() at netbsd:vndthread+0x2bd
db> tr/a ffffa3000088a8e0
trace: pid 0 lid 66 at 0xffffa3003b024e30
sleepq_block() at netbsd:sleepq_block+0x99
vndthread() at netbsd:vndthread+0x53a
db>
--
Andreas Gustafsson, gson@gson.org
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Wed, 5 Dec 2018 10:28:28 +0100
> On 5. Dec 2018, at 10:00, Andreas Gustafsson <gson@gson.org> wrote:
<snip>
> trace: pid 0 lid 125 at 0xffffa3003b6d4d30
> sleepq_block() at netbsd:sleepq_block+0x99
> cv_wait() at netbsd:cv_wait+0xf1
> fstrans_start() at netbsd:fstrans_start+0x73c
> VOP_LOCK() at netbsd:VOP_LOCK+0x57
> vn_lock() at netbsd:vn_lock+0x90
> vndthread() at netbsd:vndthread+0x2bd
Oops, my bad -- we have to protect handle_with_strategy() too.
Please try the attached diff.
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
diff -r 400ec4f24994 sys/dev/vnd.c
--- sys/dev/vnd.c
+++ sys/dev/vnd.c
@@ -733,12 +733,17 @@ vndthread(void *arg)
bp->b_bcount = obp->b_bcount;
BIO_COPYPRIO(bp, obp);
+ /* Make sure the request succeeds while suspending this fs. */
+ fstrans_start_lazy(vnd->sc_vp->v_mount);
+
/* Handle the request using the appropriate operations. */
if ((vnd->sc_flags & VNF_USE_VN_RDWR) == 0)
handle_with_strategy(vnd, obp, bp);
else
handle_with_rdwr(vnd, obp, bp);
+ fstrans_done(vnd->sc_vp->v_mount);
+
s = splbio();
continue;
@@ -804,9 +809,6 @@ handle_with_rdwr(struct vnd_softc *vnd,
bp->b_bcount);
#endif
- /* Make sure the request succeeds while suspending this fs. */
- fstrans_start_lazy(vp->v_mount);
-
/* Issue the read or write operation. */
bp->b_error =
vn_rdwr(doread ? UIO_READ : UIO_WRITE,
@@ -828,8 +830,6 @@ handle_with_rdwr(struct vnd_softc *vnd,
else
mutex_exit(vp->v_interlock);
- fstrans_done(vp->v_mount);
-
/* We need to increase the number of outputs on the vnode if
* there was any write to it. */
if (!doread) {
From: Andreas Gustafsson <gson@gson.org>
To: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Fri, 7 Dec 2018 19:54:06 +0200
J. Hannken-Illjes wrote:
> Oops, my bad -- we have to protect handle_with_strategy() too.
>
> Please try the attached diff.
Looking good - with the patch applied, my test system has now created
and destroyed 22 domUs and is still running. Without the patch, it
usually froze after creating and destroying just a few domUs.
--
Andreas Gustafsson, gson@gson.org
From: Andreas Gustafsson <gson@gson.org>
To: hannken@NetBSD.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53624 (dom0 freeze on domU exit)
Date: Mon, 10 Dec 2018 16:39:41 +0200
Hi hannken,
I have now shut down my test system after it completed more than 100
successful domU create/destroy cycles without freezing with your
patch. Feel free to commit it :)
--
Andreas Gustafsson, gson@gson.org
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53624 CVS commit: src/sys/dev
Date: Mon, 10 Dec 2018 15:22:35 +0000
Module Name: src
Committed By: hannken
Date: Mon Dec 10 15:22:35 UTC 2018
Modified Files:
src/sys/dev: vnd.c
Log Message:
Operation handle_with_strategy() also needs the
fstrans_start_lazy() / fstrans_done() bracket.
PR kern/53624 (dom0 freeze on domU exit)
To generate a diff of this commit:
cvs rdiff -u -r1.269 -r1.270 src/sys/dev/vnd.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53624 CVS commit: [netbsd-8] src/sys/dev
Date: Mon, 10 Dec 2018 17:16:11 +0000
Module Name: src
Committed By: martin
Date: Mon Dec 10 17:16:11 UTC 2018
Modified Files:
src/sys/dev [netbsd-8]: vnd.c
Log Message:
Pull up following revision(s) (requested by hannken in ticket #1133):
sys/dev/vnd.c: revision 1.270
Operation handle_with_strategy() also needs the
fstrans_start_lazy() / fstrans_done() bracket.
PR kern/53624 (dom0 freeze on domU exit)
To generate a diff of this commit:
cvs rdiff -u -r1.259.6.4 -r1.259.6.5 src/sys/dev/vnd.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Thu, 13 Dec 2018 10:24:06 +0000
State-Changed-Why:
Fix committed to -current, pullup to -8 complete.
From: Manuel.Bouyer@lip6.fr
To: gnats-bugs@NetBSD.org
Cc:
Subject: kern/53624 (dom0 freeze on domU exit) is still there
Date: Wed, 18 Sep 2019 16:54:56 +0200 (MEST)
>Submitter-Id: net
>Originator: Manuel Bouyer
>Organization:
>Confidential: no
>Synopsis: kern/53624 (dom0 freeze on domU exit) is still there
>Severity: serious
>Priority: high
>Category: kern
>Class: sw-bug
>Release: NetBSD 8.1_STABLE
>Environment:
System: NetBSD xen1.soc.lip6.fr 8.1_STABLE NetBSD 8.1_STABLE (ADMIN_DOM0) #0: Tue Sep 17 15:47:43 MEST 2019 bouyer@armandeche.soc.lip6.fr:/local/armandeche1/tmp/build/amd64/obj/local/armandeche1/netbsd-8/src/sys/arch/amd64/compile/ADMIN_DOM0 x86_64
Architecture: x86_64
Machine: amd64
>Description:
On my testbed, which starts/destroys several domUs per day (possibly
in parallel), I see occasional filesystem hangs with processes
waiting on fstchg.
Interesting processes are:
0 105 3 0 200 ffffa0000213e5a0 vnd1 fstchg
0 104 3 0 200 ffffa00002088160 vnd0 vndbp
0 97 3 0 200 ffffa0000206a980 vnd3 vndbp
0 96 3 0 200 ffffa0000105a280 vnd2 fstchg
0 67 3 0 200 ffffa00000d73640 ioflush fstchg
6533 1 3 0 0 ffffa00001f77080 vnconfig biowait
25777 1 3 0 80 ffffa00001e5f480 vnconfig fstcnt
db> tr/a ffffa0000213e5a0
trace: pid 0 lid 105 at 0xffffa0002cffd4f0
sleepq_block() at netbsd:sleepq_block+0x99
cv_wait() at netbsd:cv_wait+0xf0
fstrans_start() at netbsd:fstrans_start+0x78e
VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x42
genfs_getpages() at netbsd:genfs_getpages+0x1344
VOP_GETPAGES() at netbsd:VOP_GETPAGES+0x4b
ubc_fault() at netbsd:ubc_fault+0x188
uvm_fault_internal() at netbsd:uvm_fault_internal+0x6d4
trap() at netbsd:trap+0x3c1
--- trap (number 6) ---
kcopy() at netbsd:kcopy+0x15
uiomove() at netbsd:uiomove+0xb9
ubc_uiomove() at netbsd:ubc_uiomove+0xf7
ffs_read() at netbsd:ffs_read+0xf7
VOP_READ() at netbsd:VOP_READ+0x33
vn_rdwr() at netbsd:vn_rdwr+0x10c
vndthread() at netbsd:vndthread+0x5b1
db> tr/a ffffa0000105a280
trace: pid 0 lid 96 at 0xffffa0002cf4d9c0
sleepq_block() at netbsd:sleepq_block+0x99
cv_wait() at netbsd:cv_wait+0xf0
fstrans_start() at netbsd:fstrans_start+0x78e
VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x42
genfs_do_io() at netbsd:genfs_do_io+0x1b4
genfs_gop_write() at netbsd:genfs_gop_write+0x52
genfs_do_putpages() at netbsd:genfs_do_putpages+0xb9c
VOP_PUTPAGES() at netbsd:VOP_PUTPAGES+0x36
vndthread() at netbsd:vndthread+0x683
db> tr/a ffffa00000d73640
trace: pid 0 lid 67 at 0xffffa0002cd48ca0
sleepq_block() at netbsd:sleepq_block+0x99
cv_wait() at netbsd:cv_wait+0xf0
fstrans_start() at netbsd:fstrans_start+0x78e
VOP_BWRITE() at netbsd:VOP_BWRITE+0x42
ffs_sbupdate() at netbsd:ffs_sbupdate+0xc3
ffs_cgupdate() at netbsd:ffs_cgupdate+0x20
ffs_sync() at netbsd:ffs_sync+0x1e9
sched_sync() at netbsd:sched_sync+0x93
db> tr/a ffffa00001f77080
trace: pid 6533 lid 1 at 0xffffa0002cff8910
sleepq_block() at netbsd:sleepq_block+0x99
cv_wait() at netbsd:cv_wait+0xf0
biowait() at netbsd:biowait+0x4f
scan_iso_vrs_session() at netbsd:scan_iso_vrs_session+0x60
readdisklabel() at netbsd:readdisklabel+0x304
vndopen() at netbsd:vndopen+0x305
spec_open() at netbsd:spec_open+0x385
VOP_OPEN() at netbsd:VOP_OPEN+0x2f
vn_open() at netbsd:vn_open+0x1e9
do_open() at netbsd:do_open+0x112
do_sys_openat() at netbsd:do_sys_openat+0x68
sys_open() at netbsd:sys_open+0x24
syscall() at netbsd:syscall+0x9c
db> tr/a ffffa00001e5f480
trace: pid 25777 lid 1 at 0xffffa0002b358860
sleepq_block() at netbsd:sleepq_block+0x99
cv_wait_sig() at netbsd:cv_wait_sig+0xf4
fstrans_setstate() at netbsd:fstrans_setstate+0xaa
genfs_suspendctl() at netbsd:genfs_suspendctl+0x57
vfs_suspend() at netbsd:vfs_suspend+0x5b
vrevoke_suspend_next() at netbsd:vrevoke_suspend_next+0x2a
vrevoke() at netbsd:vrevoke+0x2b
genfs_revoke() at netbsd:genfs_revoke+0x13
VOP_REVOKE() at netbsd:VOP_REVOKE+0x2e
vdevgone() at netbsd:vdevgone+0x5a
vnddoclear() at netbsd:vnddoclear+0xc6
vndioctl() at netbsd:vndioctl+0x3bb
VOP_IOCTL() at netbsd:VOP_IOCTL+0x37
vn_ioctl() at netbsd:vn_ioctl+0xa6
sys_ioctl() at netbsd:sys_ioctl+0x101
syscall() at netbsd:syscall+0x9c
db> call fstrans_dump
Fstrans locks by lwp:
6533.1 (/) shared 1 cow 0
0.105 (/domains) lazy 3 cow 0
0.96 (/domains) lazy 2 cow 0
0.67 (/domains) shared 1 cow 0
Fstrans state by mount:
/ state suspending
So it looks like we have a 3-way deadlock between ioflush and the two vnconfig
threads (while kern/53624 was only between 2 vnconfig threads) but I can't
see the exact scenario yet. Also, the files backing the vnd are in
/domains, not in /
WAPBL is configured in the kernel but not in use.
>How-To-Repeat:
xl create/shutdown several domUs in parallel
>Fix:
please ...
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53624 (dom0 freeze on domU exit) is still there
Date: Thu, 3 Oct 2019 11:04:27 +0200
To me this makes no sense:
- 25777.1 suspends "/", state "suspending" and waits for 6533.1
- 6533.1 may wait on a vnd thread
- the traces of the two vnd threads in "fstchg" don't contain
obvious accesses to "/"
Are you able to get a core or do you have the corresponding
"netbsd.gdb"?
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: hannken@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
manu@netbsd.org, gson@gson.org
Subject: Re: kern/53624 (dom0 freeze on domU exit) is still there
Date: Thu, 3 Oct 2019 12:07:01 +0200
On Thu, Oct 03, 2019 at 09:05:01AM +0000, J. Hannken-Illjes wrote:
> The following reply was made to PR kern/53624; it has been noted by GNATS.
>
> From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: kern/53624 (dom0 freeze on domU exit) is still there
> Date: Thu, 3 Oct 2019 11:04:27 +0200
>
> To me this makes no sense:
>
> - 25777.1 suspends "/", state "suspending" and waits for 6533.1
>
> - 6533.1 may wait on a vnd thread
>
> - the traces of the two vnd threads in "fstchg" don't contain
> obvious accesses to "/"
>
> Are you able to get a core
No, kernel core dumps don't work on Xen
> or do you have the corresponding
> "netbsd.gdb"?
Yes
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53624 (dom0 freeze on domU exit) is still there
Date: Thu, 3 Oct 2019 18:09:59 +0200
--Apple-Mail=_9D0BB28E-3676-4CD0-ADE7-D400968E0442
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
charset=us-ascii
> On 3. Oct 2019, at 11:05, J. Hannken-Illjes <hannken@eis.cs.tu-bs.de> wrote:
<snip>
> - the traces of the two vnd threads in "fstchg" don't contain
> obvious accesses to "/"
Problem understood, VOP_GETPAGES()/VOP_PUTPAGES() do:
VOP_BMAP(vp, lbn, &devvp, &blkno, &run);
VOP_STRATEGY(devvp, bp);
While vp resides on "/domains", devvp (the device it was
mounted from) resides on "/".
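The cross-mount hop described above can be sketched in a small userspace model. This is only an illustration, not kernel code: the model_* types and functions are hypothetical stand-ins for struct mount, struct vnode and VOP_BMAP(), and the mount points are the ones named in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for struct mount / struct vnode. */
struct model_mount {
	const char *mnt_path;		/* mount point, e.g. "/" or "/domains" */
};

struct model_vnode {
	struct model_mount *v_mount;	/* file system the vnode lives on */
	struct model_vnode *v_devvp;	/* backing device vnode, NULL for a device */
};

/* Models VOP_BMAP(): hand back the device vnode backing a file vnode. */
static struct model_vnode *
model_bmap(struct model_vnode *vp)
{
	assert(vp != NULL);
	return vp->v_devvp != NULL ? vp->v_devvp : vp;
}

/*
 * Models which file system a VOP_STRATEGY() issued by the paging code
 * enters a transaction on: the one holding the device node, which is
 * not necessarily the one holding the file.
 */
static const char *
model_strategy_mount(struct model_vnode *vp)
{
	struct model_vnode *devvp = model_bmap(vp);

	return devvp->v_mount->mnt_path;
}
```

With a file vnode on "/domains" whose device vnode lives on "/", model_strategy_mount() returns "/", which is why a suspension of "/" can stall I/O for a file that is not on "/" at all.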
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig
--Apple-Mail=_9D0BB28E-3676-4CD0-ADE7-D400968E0442--
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53624 (dom0 freeze on domU exit) is still there
Date: Fri, 4 Oct 2019 11:40:54 +0200
--Apple-Mail=_788E5721-9DB9-4F79-AE00-7266B8F7D8D0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
charset=us-ascii
Looks like we have to use fstrans_start_lazy() for VOP_STRATEGY() too
as it usually calls itself on the file system holding "/dev".
The attached diff could help, please give it a try.
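To make the difference between the two transaction types concrete, here is a minimal userspace sketch. It assumes, as I understand fstrans(9), that a normal transaction waits whenever the file system is not in the normal state, while a lazy transaction waits only once the file system is fully suspended. The model_* names are hypothetical and nothing here is the real kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the file system suspension states. */
enum model_state { MODEL_NORMAL, MODEL_SUSPENDING, MODEL_SUSPENDED };

/* Hypothetical model of the transaction kinds used by vop_pre(). */
enum model_op { MODEL_FST_YES, MODEL_FST_LAZY };

/*
 * Returns true if starting a transaction of kind "op" would have to
 * wait while the file system is in "state" (assumed semantics, see
 * the lead-in above).
 */
static bool
model_would_block(enum model_state state, enum model_op op)
{
	assert(op == MODEL_FST_YES || op == MODEL_FST_LAZY);

	if (op == MODEL_FST_LAZY)
		return state == MODEL_SUSPENDED;	/* lazy: only full suspension blocks */
	return state != MODEL_NORMAL;			/* normal: any suspension phase blocks */
}
```

Under this model, a VOP_STRATEGY() arriving while "/" is in the suspending phase blocks as a normal transaction but proceeds as a lazy one, so the I/O that the suspending thread is waiting for can still complete.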
--
J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
--Apple-Mail=_788E5721-9DB9-4F79-AE00-7266B8F7D8D0
Content-Disposition: attachment;
filename=vnode_if.c.diff
Content-Type: application/octet-stream;
x-unix-mode=0644;
name="vnode_if.c.diff"
Content-Transfer-Encoding: 7bit
diff -r b9b26f2b5eeb -r 45acdd7da973 sys/kern/vnode_if.c
--- sys/kern/vnode_if.c
+++ sys/kern/vnode_if.c
@@ -49,7 +49,7 @@
 #include <sys/lock.h>
 #include <sys/fstrans.h>
 
-enum fst_op { FST_NO, FST_YES, FST_TRY };
+enum fst_op { FST_NO, FST_YES, FST_LAZY, FST_TRY };
 
 static inline int
 vop_pre(vnode_t *vp, struct mount **mp, bool *mpsafe, enum fst_op op)
@@ -62,7 +62,7 @@ vop_pre(vnode_t *vp, struct mount **mp,
 		KERNEL_LOCK(1, curlwp);
 	}
 
-	if (op == FST_YES || op == FST_TRY) {
+	if (op == FST_YES || op == FST_LAZY || op == FST_TRY) {
 		for (;;) {
 			*mp = vp->v_mount;
 			if (op == FST_TRY) {
@@ -73,6 +73,8 @@ vop_pre(vnode_t *vp, struct mount **mp,
 					}
 					return error;
 				}
+			} else if (op == FST_LAZY) {
+				fstrans_start_lazy(*mp);
 			} else {
 				fstrans_start(*mp);
 			}
@@ -91,7 +93,7 @@ static inline void
 vop_post(vnode_t *vp, struct mount *mp, bool mpsafe, enum fst_op op)
 {
 
-	if (op == FST_YES) {
+	if (op == FST_YES || op == FST_LAZY) {
 		fstrans_done(mp);
 	}
 
@@ -1378,11 +1380,11 @@ VOP_STRATEGY(struct vnode *vp,
 	a.a_desc = VDESC(vop_strategy);
 	a.a_vp = vp;
 	a.a_bp = bp;
-	error = vop_pre(vp, &mp, &mpsafe, FST_YES);
+	error = vop_pre(vp, &mp, &mpsafe, FST_LAZY);
 	if (error)
 		return error;
 	error = (VCALL(vp, VOFFSET(vop_strategy), &a));
-	vop_post(vp, mp, mpsafe, FST_YES);
+	vop_post(vp, mp, mpsafe, FST_LAZY);
 	return error;
 }
 
--Apple-Mail=_788E5721-9DB9-4F79-AE00-7266B8F7D8D0--
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: hannken@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
manu@netbsd.org, gson@gson.org
Subject: Re: kern/53624 (dom0 freeze on domU exit) is still there
Date: Wed, 9 Oct 2019 15:31:42 +0200
On Fri, Oct 04, 2019 at 09:45:01AM +0000, J. Hannken-Illjes wrote:
> The following reply was made to PR kern/53624; it has been noted by GNATS.
>
> From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: kern/53624 (dom0 freeze on domU exit) is still there
> Date: Fri, 4 Oct 2019 11:40:54 +0200
>
> --Apple-Mail=_788E5721-9DB9-4F79-AE00-7266B8F7D8D0
> Content-Transfer-Encoding: 7bit
> Content-Type: text/plain;
> charset=us-ascii
>
> Looks like we have to use fstrans_start_lazy() for VOP_STRATEGY() too
> as it usually calls itself on the file system holding "/dev".
>
> The attached diff could help, please give it a try.
Looks good, I have completed 2 rounds of tests without problems.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53624 CVS commit: src/sys/kern
Date: Fri, 11 Oct 2019 08:04:52 +0000
Module Name: src
Committed By: hannken
Date: Fri Oct 11 08:04:52 UTC 2019
Modified Files:
src/sys/kern: vnode_if.sh vnode_if.src
Log Message:
As VOP_STRATEGY() usually calls itself on the file system holding "/dev"
it may deadlock on suspension of this file system.
Add fstrans type LAZY and use it for VOP_STRATEGY().
Address PR kern/53624 (dom0 freeze on domU exit) is still there
To generate a diff of this commit:
cvs rdiff -u -r1.66 -r1.67 src/sys/kern/vnode_if.sh
cvs rdiff -u -r1.77 -r1.78 src/sys/kern/vnode_if.src
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53624 CVS commit: [netbsd-8] src/sys/kern
Date: Mon, 14 Oct 2019 17:43:58 +0000
Module Name: src
Committed By: martin
Date: Mon Oct 14 17:43:58 UTC 2019
Modified Files:
src/sys/kern [netbsd-8]: vnode_if.sh vnode_if.src
Log Message:
Pull up following revision(s) (requested by hannken in ticket #1405):
sys/kern/vnode_if.sh: revision 1.67
sys/kern/vnode_if.src: revision 1.78
As VOP_STRATEGY() usually calls itself on the file system holding "/dev"
it may deadlock on suspension of this file system.
Add fstrans type LAZY and use it for VOP_STRATEGY().
Address PR kern/53624 (dom0 freeze on domU exit) is still there
To generate a diff of this commit:
cvs rdiff -u -r1.64.4.1 -r1.64.4.2 src/sys/kern/vnode_if.sh
cvs rdiff -u -r1.75.2.2 -r1.75.2.3 src/sys/kern/vnode_if.src
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53624 CVS commit: [netbsd-9] src/sys/kern
Date: Tue, 15 Oct 2019 18:12:26 +0000
Module Name: src
Committed By: martin
Date: Tue Oct 15 18:12:25 UTC 2019
Modified Files:
src/sys/kern [netbsd-9]: vnode_if.sh vnode_if.src
Log Message:
Pull up following revision(s) (requested by hannken in ticket #307):
sys/kern/vnode_if.sh: revision 1.67
sys/kern/vnode_if.src: revision 1.78
As VOP_STRATEGY() usually calls itself on the file system holding "/dev"
it may deadlock on suspension of this file system.
Add fstrans type LAZY and use it for VOP_STRATEGY().
Address PR kern/53624 (dom0 freeze on domU exit) is still there
To generate a diff of this commit:
cvs rdiff -u -r1.66 -r1.66.10.1 src/sys/kern/vnode_if.sh
cvs rdiff -u -r1.77 -r1.77.10.1 src/sys/kern/vnode_if.src
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
$NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.