NetBSD Problem Report #53506
From www@NetBSD.org Tue Aug 7 16:37:43 2018
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 4E69D7A1FE
for <gnats-bugs@gnats.NetBSD.org>; Tue, 7 Aug 2018 16:37:43 +0000 (UTC)
Message-Id: <20180807163742.087D57A210@mollari.NetBSD.org>
Date: Tue, 7 Aug 2018 16:37:42 +0000 (UTC)
From: manu@netbsd.org
Reply-To: manu@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: Easy trigger of "panic: biodone2 already" on NetBSD 8.0 Xen domU
X-Send-Pr-Version: www-1.0
>Number: 53506
>Category: port-xen
>Synopsis: Easy trigger of "panic: biodone2 already" on NetBSD 8.0 Xen domU
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: jdolecek
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Aug 07 16:40:00 +0000 2018
>Closed-Date: Sat Oct 06 20:23:35 +0000 2018
>Last-Modified: Sat Oct 06 20:23:35 +0000 2018
>Originator: Emmanuel Dreyfus
>Release: NetBSD 8.0
>Organization:
NetBSD
>Environment:
NetBSD bacasable 8.0 NetBSD 8.0 (XEN3PAE_DOMU) #0: Tue Jul 17 14:59:51 UTC 2018 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/xen/compile/XEN3PAE_DOMU i386
>Description:
On NetBSD 8.0 domU (either i386 and amd64), running dump and a concurent filesystem access crashes the kernel with "panic: biodone2 already". Stack trace is below.
panic: biodone2 already
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip 0xc0105334 cs 0x9 eflags 0x246 cr2 0xbb56d000 ilevel 0 esp 0xda67ff2c
curlwp 0xc2d4e7e0 pid 0 lid 4 lowest kstack 0xda67e2c0
Stopped in pid 0.4 (system) at netbsd:breakpoint+0x4: popl %ebp
breakpoint(c04da2c8,c06d84e0,c04f9e2c,da67ff48,c42a1718,c057fe00,da6b82cc,da67ff3c,c03c4e48,c04f9e2c) at netbsd:breakpoint+0x4
vpanic(c04f9e2c,da67ff48,da67ff54,c03fdd2f,c04f9e2c,c0114934,c42a1718,c057fe00,da67ff68,c03fdd5d) at netbsd:vpanic+0x12d
panic(c04f9e2c,c0114934,c42a1718,c057fe00,da67ff68,c03fdd5d,0,0,da6b8004,da67ff8c) at netbsd:panic+0x18
biodone2(0,0,da6b8004,da67ff8c,c039e66d,0,da67ff88,c2d4e7e0,0,da6b8004) at netbsd:biodone2+0xff
biointr(0,da67ff88,c2d4e7e0,0,da6b8004,c039e610,c2d4e7e0,0,c0102031,da6b8004) at netbsd:biointr+0x2d
softint_thread(da6b8004,96e000,c0595200,0,c0100075,0,0,0,0,0) at netbsd:softint_thread+0x5d
Relevant processes:
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
446 1 2 0 0 c3eef2a0 grep
562 1 3 0 80 c33cfaa0 dump lockf
169 1 3 0 80 c345d800 dump lockf
1487 1 3 0 0 c3f9cd40 dump physio
145 1 3 0 80 c3eef7e0 dump netio
1351 1 3 0 80 c33f27e0 dump wait
>How-To-Repeat:
Run as root in two different terminals:
/sbin/dump -a0f /dev/null /
grep -r something /var
>Fix:
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Tue, 14 Aug 2018 17:26:53 +0000
Responsible-Changed-Why:
I'd like to look at this to get this fixed for 8.1
From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53506: Easy trigger of "panic: biodone2 already" on NetBSD
8.0 Xen domU
Date: Mon, 20 Aug 2018 01:38:59 +0000
--/04w6evG8XlLl3ft
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
The attached patch clears the struct xbd_req content between usage,
while there adds a few sanity checks. At mine it is enough to
prevent any panic on biodone2 already.
If it looks good enough, I think we should pull it up to netbsd-8 so
that backups are possible again. The code can be more heavily
reworked on -current afterwards.
--
Emmanuel Dreyfus
manu@netbsd.org
--/04w6evG8XlLl3ft
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=patch-biodone2already
--- ./sys/arch/xen/xen/xbd_xenbus.c.orig
+++ ./sys/arch/xen/xen/xbd_xenbus.c
@@ -222,8 +222,21 @@
.d_close = xbdclose,
.d_diskstart = xbd_diskstart,
};
+static inline void
+xbd_xbdreq_clear(struct xbd_xenbus_softc *sc, struct xbd_req *xbdreq)
+{
+ int i = xbdreq - sc->sc_reqs;
+
+ KASSERT(xbdreq->req_id == i);
+ memset(xbdreq, 0, sizeof(*xbdreq));
+ xbdreq->req_id = i;
+
+ return;
+}
+
+
static int
xbd_xenbus_match(device_t parent, cfdata_t match, void *aux)
{
struct xenbusdev_attach_args *xa = aux;
@@ -652,8 +665,9 @@
for (i = sc->sc_ring.rsp_cons; i != resp_prod; i++) {
blkif_response_t *rep = RING_GET_RESPONSE(&sc->sc_ring, i);
struct xbd_req *xbdreq = &sc->sc_reqs[rep->id];
bp = xbdreq->req_bp;
+ KASSERT(bp != NULL);
DPRINTF(("xbd_handler(%p): b_bcount = %ld\n",
xbdreq->req_bp, (long)bp->b_bcount));
if (rep->operation == BLKIF_OP_FLUSH_DISKCACHE) {
xbdreq->req_sync.s_error = rep->status;
@@ -693,8 +707,9 @@
xbd_unmap_align(xbdreq);
dk_done(&sc->sc_dksc, bp);
+ xbdreq->req_bp = NULL; /* Do not reuse */
SLIST_INSERT_HEAD(&sc->sc_xbdreq_head, xbdreq, req_next);
}
done:
xen_rmb();
@@ -854,8 +869,9 @@
if (__predict_false(xbdreq == NULL)) {
DPRINTF(("xbdioctl: no req\n"));
error = ENOMEM;
} else {
+ xbd_xbdreq_clear(sc, xbdreq);
SLIST_REMOVE_HEAD(&sc->sc_xbdreq_head, req_next);
req = RING_GET_REQUEST(&sc->sc_ring,
sc->sc_ring.req_prod_pvt);
req->id = xbdreq->req_id;
@@ -952,8 +968,9 @@
error = EAGAIN;
goto out;
}
+ xbd_xbdreq_clear(sc, xbdreq);
xbdreq->req_bp = bp;
xbdreq->req_data = bp->b_data;
if ((vaddr_t)bp->b_data & (XEN_BSIZE - 1)) {
if (__predict_false(xbd_map_align(xbdreq) != 0)) {
--/04w6evG8XlLl3ft--
From: =?UTF-8?B?SmFyb23DrXIgRG9sZcSNZWs=?= <jaromir.dolecek@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: Emmanuel Dreyfus <manu@netbsd.org>
Subject: Re: kern/53506: Easy trigger of "panic: biodone2 already" on NetBSD
8.0 Xen domU
Date: Mon, 20 Aug 2018 21:51:11 +0200
2018-08-20 3:40 GMT+02:00 Emmanuel Dreyfus <manu@netbsd.org>:
> The attached patch clears the struct xbd_req content between usage,
> while there adds a few sanity checks. At mine it is enough to
> prevent any panic on biodone2 already.
This is interesting, since according to how I read the code, it should
not actually change the behaviour. The fields are not checked
anywhere, so it should just process the thing same way as before and
hence panic. Also very likely, if you add xbd_xbdreq_clear() also for
the DIOCCACHESYNC path, it would trigger panic too :)
I've noticed there is possibly missing splbio()/splx() in
xbd_diskstart(). Can you try if following patch (without your changes)
changes the behaviour?
http://www.netbsd.org/~jdolecek/xbd_diskstart_splbio.diff
Jaromir
From: manu@netbsd.org (Emmanuel Dreyfus)
To: gnats-bugs@NetBSD.org, jdolecek@NetBSD.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Cc:
Subject: Re: kern/53506: Easy trigger of "panic: biodone2 already" on NetBSD 8.0 Xen domU
Date: Tue, 21 Aug 2018 02:54:26 +0200
Jarom=EDr Dole?ek <jaromir.dolecek@gmail.com> wrote:
> I've noticed there is possibly missing splbio()/splx() in
> xbd_diskstart(). Can you try if following patch (without your changes)
> changes the behaviour?
> =20
> http://www.netbsd.org/~jdolecek/xbd_diskstart_splbio.diff
It also works around the panic biodone2 already. It seems my patch was
enough to win a race but yours is the right fix. Will you commit this
change and pullup to netbsd-8 (I tested on netbsd-8-0-RELEASE)?
--=20
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@netbsd.org
State-Changed-From-To: open->pending-pullups
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Tue, 21 Aug 2018 18:14:52 +0000
State-Changed-Why:
Requested pullup to netbsd-8 as #984
From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53506 CVS commit: src/sys/arch/xen/xen
Date: Tue, 21 Aug 2018 18:11:10 +0000
Module Name: src
Committed By: jdolecek
Date: Tue Aug 21 18:11:10 UTC 2018
Modified Files:
src/sys/arch/xen/xen: xbd_xenbus.c
Log Message:
avoid race condition between I/O submission in xbd_diskstart() and
interrupt handling in xbd_handler() - need to protect it with splbio()
fixes PR port-xen/53506 by Emmanuel Dreyfus, and likely also port-xen/53074
by Brad Spencer
To generate a diff of this commit:
cvs rdiff -u -r1.82 -r1.83 src/sys/arch/xen/xen/xbd_xenbus.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53506 CVS commit: src/sys/arch/xen/xen
Date: Tue, 21 Aug 2018 18:55:09 +0000
Module Name: src
Committed By: jdolecek
Date: Tue Aug 21 18:55:09 UTC 2018
Modified Files:
src/sys/arch/xen/xen: xbd_xenbus.c
Log Message:
(only commit message for rev 1.85, no actual change)
shuffle code in xbd_handler() response loop so that req_bp is never
touched for non I/O responses; unknown commands are now just skipped, also
without touching neither xbdreq, nor req_bp
add some KASSERTS() for xbdreq->req_nr_segments and xbdreq->req_bp,
and reset req_bp after it's processed to ensure the buf can never
be processed twice
intended to simplify debugging in cases like PR port-xen/53506
To generate a diff of this commit:
cvs rdiff -u -r1.85 -r1.86 src/sys/arch/xen/xen/xbd_xenbus.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53506 CVS commit: [netbsd-8] src/sys/arch/xen/xen
Date: Sat, 25 Aug 2018 14:28:01 +0000
Module Name: src
Committed By: martin
Date: Sat Aug 25 14:28:00 UTC 2018
Modified Files:
src/sys/arch/xen/xen [netbsd-8]: xbd_xenbus.c
Log Message:
Pull up following revision(s) (requested by jdolecek in ticket #984):
sys/arch/xen/xen/xbd_xenbus.c: revision 1.83
avoid race condition between I/O submission in xbd_diskstart() and
interrupt handling in xbd_handler() - need to protect it with splbio()
fixes PR port-xen/53506 by Emmanuel Dreyfus, and likely also port-xen/53074
by Brad Spencer
To generate a diff of this commit:
cvs rdiff -u -r1.76 -r1.76.6.1 src/sys/arch/xen/xen/xbd_xenbus.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sat, 06 Oct 2018 20:23:35 +0000
State-Changed-Why:
Pullups done. Thanks for report.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.