NetBSD Problem Report #53506

From www@NetBSD.org  Tue Aug  7 16:37:43 2018
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 4E69D7A1FE
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  7 Aug 2018 16:37:43 +0000 (UTC)
Message-Id: <20180807163742.087D57A210@mollari.NetBSD.org>
Date: Tue,  7 Aug 2018 16:37:42 +0000 (UTC)
From: manu@netbsd.org
Reply-To: manu@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: Easy trigger of "panic: biodone2 already" on NetBSD 8.0 Xen domU
X-Send-Pr-Version: www-1.0

>Number:         53506
>Category:       port-xen
>Synopsis:       Easy trigger of "panic: biodone2 already" on NetBSD 8.0 Xen domU
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    jdolecek
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Aug 07 16:40:00 +0000 2018
>Closed-Date:    Sat Oct 06 20:23:35 +0000 2018
>Last-Modified:  Sat Oct 06 20:23:35 +0000 2018
>Originator:     Emmanuel Dreyfus
>Release:        NetBSD 8.0
>Organization:
NetBSD
>Environment:
NetBSD bacasable 8.0 NetBSD 8.0 (XEN3PAE_DOMU) #0: Tue Jul 17 14:59:51 UTC 2018  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/xen/compile/XEN3PAE_DOMU i386
>Description:
On NetBSD 8.0 domU (either i386 and amd64), running dump and a concurent filesystem access crashes the kernel with "panic: biodone2 already". Stack trace is below.


panic: biodone2 already
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip 0xc0105334 cs 0x9 eflags 0x246 cr2 0xbb56d000 ilevel 0 esp 0xda67ff2c
curlwp 0xc2d4e7e0 pid 0 lid 4 lowest kstack 0xda67e2c0
Stopped in pid 0.4 (system) at  netbsd:breakpoint+0x4:  popl    %ebp
breakpoint(c04da2c8,c06d84e0,c04f9e2c,da67ff48,c42a1718,c057fe00,da6b82cc,da67ff3c,c03c4e48,c04f9e2c) at netbsd:breakpoint+0x4
vpanic(c04f9e2c,da67ff48,da67ff54,c03fdd2f,c04f9e2c,c0114934,c42a1718,c057fe00,da67ff68,c03fdd5d) at netbsd:vpanic+0x12d
panic(c04f9e2c,c0114934,c42a1718,c057fe00,da67ff68,c03fdd5d,0,0,da6b8004,da67ff8c) at netbsd:panic+0x18
biodone2(0,0,da6b8004,da67ff8c,c039e66d,0,da67ff88,c2d4e7e0,0,da6b8004) at netbsd:biodone2+0xff
biointr(0,da67ff88,c2d4e7e0,0,da6b8004,c039e610,c2d4e7e0,0,c0102031,da6b8004) at netbsd:biointr+0x2d
softint_thread(da6b8004,96e000,c0595200,0,c0100075,0,0,0,0,0) at netbsd:softint_thread+0x5d


Relevant processes:
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
446      1 2   0         0           c3eef2a0               grep
562      1 3   0        80           c33cfaa0               dump lockf
169      1 3   0        80           c345d800               dump lockf
1487     1 3   0         0           c3f9cd40               dump physio
145      1 3   0        80           c3eef7e0               dump netio
1351     1 3   0        80           c33f27e0               dump wait


>How-To-Repeat:
Run as root in two different terminals:

/sbin/dump -a0f /dev/null /
grep -r something /var
>Fix:

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Tue, 14 Aug 2018 17:26:53 +0000
Responsible-Changed-Why:
I'd like to look at this to get this fixed for 8.1


From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53506: Easy trigger of "panic: biodone2 already" on NetBSD
 8.0 Xen domU
Date: Mon, 20 Aug 2018 01:38:59 +0000

 --/04w6evG8XlLl3ft
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 The attached patch clears the struct xbd_req content between usage, 
 while there adds a few sanity checks. At mine it is enough to 
 prevent any panic on biodone2 already.

 If it looks good enough, I think we should pull it up to netbsd-8 so
 that backups are possible again. The code can be more heavily 
 reworked on -current afterwards.

 --  
 Emmanuel Dreyfus
 manu@netbsd.org

 --/04w6evG8XlLl3ft
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename=patch-biodone2already

 --- ./sys/arch/xen/xen/xbd_xenbus.c.orig
 +++ ./sys/arch/xen/xen/xbd_xenbus.c
 @@ -222,8 +222,21 @@
  	.d_close = xbdclose,
  	.d_diskstart = xbd_diskstart,
  };

 +static inline void
 +xbd_xbdreq_clear(struct xbd_xenbus_softc *sc, struct xbd_req *xbdreq)
 +{       
 +	int i = xbdreq - sc->sc_reqs;
 +
 +	KASSERT(xbdreq->req_id == i);
 +	memset(xbdreq, 0, sizeof(*xbdreq));
 +	xbdreq->req_id = i;
 +
 +	return;
 +}
 +
 +
  static int
  xbd_xenbus_match(device_t parent, cfdata_t match, void *aux)
  {
  	struct xenbusdev_attach_args *xa = aux;
 @@ -652,8 +665,9 @@
  	for (i = sc->sc_ring.rsp_cons; i != resp_prod; i++) {
  		blkif_response_t *rep = RING_GET_RESPONSE(&sc->sc_ring, i);
  		struct xbd_req *xbdreq = &sc->sc_reqs[rep->id];
  		bp = xbdreq->req_bp;
 +		KASSERT(bp != NULL);
  		DPRINTF(("xbd_handler(%p): b_bcount = %ld\n",
  		    xbdreq->req_bp, (long)bp->b_bcount));
  		if (rep->operation == BLKIF_OP_FLUSH_DISKCACHE) {
  			xbdreq->req_sync.s_error = rep->status;
 @@ -693,8 +707,9 @@
  			xbd_unmap_align(xbdreq);

  		dk_done(&sc->sc_dksc, bp);

 +		xbdreq->req_bp = NULL; /* Do not reuse */
  		SLIST_INSERT_HEAD(&sc->sc_xbdreq_head, xbdreq, req_next);
  	}
  done:
  	xen_rmb();
 @@ -854,8 +869,9 @@
  		if (__predict_false(xbdreq == NULL)) {
  			DPRINTF(("xbdioctl: no req\n"));
  			error = ENOMEM;
  		} else {
 +			xbd_xbdreq_clear(sc, xbdreq);
  			SLIST_REMOVE_HEAD(&sc->sc_xbdreq_head, req_next);
  			req = RING_GET_REQUEST(&sc->sc_ring,
  			    sc->sc_ring.req_prod_pvt);
  			req->id = xbdreq->req_id;
 @@ -952,8 +968,9 @@
  		error = EAGAIN;
  		goto out;
  	}

 +	xbd_xbdreq_clear(sc, xbdreq);
  	xbdreq->req_bp = bp;
  	xbdreq->req_data = bp->b_data;
  	if ((vaddr_t)bp->b_data & (XEN_BSIZE - 1)) {
  		if (__predict_false(xbd_map_align(xbdreq) != 0)) {

 --/04w6evG8XlLl3ft--

From: =?UTF-8?B?SmFyb23DrXIgRG9sZcSNZWs=?= <jaromir.dolecek@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: Emmanuel Dreyfus <manu@netbsd.org>
Subject: Re: kern/53506: Easy trigger of "panic: biodone2 already" on NetBSD
 8.0 Xen domU
Date: Mon, 20 Aug 2018 21:51:11 +0200

 2018-08-20 3:40 GMT+02:00 Emmanuel Dreyfus <manu@netbsd.org>:
 >  The attached patch clears the struct xbd_req content between usage,
 >  while there adds a few sanity checks. At mine it is enough to
 >  prevent any panic on biodone2 already.

 This is interesting, since according to how I read the code, it should
 not actually change the behaviour. The fields are not checked
 anywhere, so it should just process the thing same way as before and
 hence panic. Also very likely, if you add xbd_xbdreq_clear() also for
 the DIOCCACHESYNC path, it would trigger panic too :)

 I've noticed there is possibly missing splbio()/splx() in
 xbd_diskstart(). Can you try if following patch (without your changes)
 changes the behaviour?

 http://www.netbsd.org/~jdolecek/xbd_diskstart_splbio.diff

 Jaromir

From: manu@netbsd.org (Emmanuel Dreyfus)
To: gnats-bugs@NetBSD.org, jdolecek@NetBSD.org, gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org
Cc: 
Subject: Re: kern/53506: Easy trigger of "panic: biodone2 already" on NetBSD 8.0 Xen domU
Date: Tue, 21 Aug 2018 02:54:26 +0200

 Jarom=EDr Dole?ek <jaromir.dolecek@gmail.com> wrote:

 >  I've noticed there is possibly missing splbio()/splx() in
 >  xbd_diskstart(). Can you try if following patch (without your changes)
 >  changes the behaviour?
 > =20
 >  http://www.netbsd.org/~jdolecek/xbd_diskstart_splbio.diff

 It also works around the panic biodone2 already. It seems my patch was
 enough to win a race but yours is the right fix. Will you commit this
 change and pullup to netbsd-8 (I tested on netbsd-8-0-RELEASE)?

 --=20
 Emmanuel Dreyfus
 http://hcpnet.free.fr/pubz
 manu@netbsd.org

State-Changed-From-To: open->pending-pullups
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Tue, 21 Aug 2018 18:14:52 +0000
State-Changed-Why:
Requested pullup to netbsd-8 as #984


From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53506 CVS commit: src/sys/arch/xen/xen
Date: Tue, 21 Aug 2018 18:11:10 +0000

 Module Name:	src
 Committed By:	jdolecek
 Date:		Tue Aug 21 18:11:10 UTC 2018

 Modified Files:
 	src/sys/arch/xen/xen: xbd_xenbus.c

 Log Message:
 avoid race condition between I/O submission in xbd_diskstart() and
 interrupt handling in xbd_handler() - need to protect it with splbio()

 fixes PR port-xen/53506 by Emmanuel Dreyfus, and likely also port-xen/53074
 by Brad Spencer


 To generate a diff of this commit:
 cvs rdiff -u -r1.82 -r1.83 src/sys/arch/xen/xen/xbd_xenbus.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53506 CVS commit: src/sys/arch/xen/xen
Date: Tue, 21 Aug 2018 18:55:09 +0000

 Module Name:	src
 Committed By:	jdolecek
 Date:		Tue Aug 21 18:55:09 UTC 2018

 Modified Files:
 	src/sys/arch/xen/xen: xbd_xenbus.c

 Log Message:
 (only commit message for rev 1.85, no actual change)

 shuffle code in xbd_handler() response loop so that req_bp is never
 touched for non I/O responses; unknown commands are now just skipped, also
 without touching neither xbdreq, nor req_bp

 add some KASSERTS() for xbdreq->req_nr_segments and xbdreq->req_bp,
 and reset req_bp after it's processed to ensure the buf can never
 be processed twice

 intended to simplify debugging in cases like PR port-xen/53506


 To generate a diff of this commit:
 cvs rdiff -u -r1.85 -r1.86 src/sys/arch/xen/xen/xbd_xenbus.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53506 CVS commit: [netbsd-8] src/sys/arch/xen/xen
Date: Sat, 25 Aug 2018 14:28:01 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sat Aug 25 14:28:00 UTC 2018

 Modified Files:
 	src/sys/arch/xen/xen [netbsd-8]: xbd_xenbus.c

 Log Message:
 Pull up following revision(s) (requested by jdolecek in ticket #984):

 	sys/arch/xen/xen/xbd_xenbus.c: revision 1.83

 avoid race condition between I/O submission in xbd_diskstart() and
 interrupt handling in xbd_handler() - need to protect it with splbio()

 fixes PR port-xen/53506 by Emmanuel Dreyfus, and likely also port-xen/53074
 by Brad Spencer


 To generate a diff of this commit:
 cvs rdiff -u -r1.76 -r1.76.6.1 src/sys/arch/xen/xen/xbd_xenbus.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sat, 06 Oct 2018 20:23:35 +0000
State-Changed-Why:
Pullups done. Thanks for report.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.