NetBSD Problem Report #56328
From oster@gonzo.fween.ca Sat Jul 24 22:54:33 2021
Return-Path: <oster@gonzo.fween.ca>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 203B01A921F
for <gnats-bugs@gnats.NetBSD.org>; Sat, 24 Jul 2021 22:54:33 +0000 (UTC)
Message-Id: <20210724225430.F14A41047FB@gonzo.fween.ca>
Date: Sat, 24 Jul 2021 16:54:30 -0600 (CST)
From: oster@netbsd.org
Reply-To: oster@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: XEN3_DOM0 bcount <= MAXPHYS panic...
X-Send-Pr-Version: 3.95
>Number: 56328
>Category: port-xen
>Synopsis: XEN3 DOM0 panic with bcount <= MAXPHYS in xbdback_xenbus.c
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: jdolecek
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jul 24 22:55:00 +0000 2021
>Closed-Date: Thu Jul 29 06:42:20 +0000 2021
>Last-Modified: Thu Jul 29 06:42:20 +0000 2021
>Originator: Greg Oster
>Release: NetBSD 9.99.87
>Organization:
>Environment:
System: NetBSD merlin 9.99.87 NetBSD 9.99.87 (XEN3_DOM0) #0: Fri Jul 23 22:39:05 CST 2021 oster@gonzo:/u1/builds/build350/src/obj/amd64/u1/builds/build350/src/sys/arch/amd64/compile/XEN3_DOM0 amd64
Architecture: x86_64
Machine: amd64
>Description:
When an Ubuntu 20.04.2 DOMU (with a 4.6.7 Linux kernel) is started on a
NetBSD 9.99.87 DOM0, the following panic is encountered:
[50.8700852] panic: kernel diagnostic assertion "bcount <= MAXPHYS" failed:
file "/u1/builds/build358/src/sys/arch/xen/xen/xbdback_xenbus.c", line 1320
[50.8700852] cpu0: Begin traceback...
[50.8700852] vpanic() at netbsd:vpanic+0x14a
[50.8700852] kern_assert() at netbsd:kern_assert+0x4b
[50.8700852] xbdback_co_io_gotio() at netbsd:xbdback_co_io_gotio+0x202
[50.8700852] xbdback_thread() at netbsd:xbdback_thread+0x9b
[50.8700852] cpu0: End traceback...
>How-To-Repeat:
Upgrade a DOM0 kernel from NetBSD 9.2 to 9.99.87. Watch the machine panic
after starting DOMUs. Hunt for a fix, and discover that bcount is
MAXPHYS+PAGE_SIZE (i.e. 69632), and that there are 16 segments in the IO
that is failing. Wonder how a DOMU could have issued an IO larger than
MAXPHYS, and then realize the Ubuntu DOMU might be doing that. Test booting
without the Ubuntu DOMU, and it works fine. Add Ubuntu DOMU back, and it fails.
Ubuntu DOMU is:
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 4.6.7-040607-generic x86_64)
>Fix:
Please. :)
(I'm happy to help test any proposed fix, as it's easy to replicate the issue
on my Xen machine.)
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: port-xen-maintainer->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Sun, 25 Jul 2021 06:18:53 +0000
Responsible-Changed-Why:
I'll check this. It's either Linux kernel not honoring max number of segments,
or an off-by-one on our side setting it.
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/56328 (XEN3 DOM0 panic with bcount <= MAXPHYS in xbdback_xenbus.c)
Date: Sun, 25 Jul 2021 07:07:09 -0000 (UTC)
jdolecek@NetBSD.org writes:
>I'll check this. It's either Linux kernel not honoring max number of segments,
>or an off-by-one on our side setting it.
I'm not sure whether there is a size limit for a segment; if there isn't,
one segment could transfer up to 256 sectors (128 kB, or even more with
sectors larger than 512 bytes). Linux uses a segment per 4k page, though.
But there can also be more than BLKIF_MAX_SEGMENTS_PER_REQUEST=11 segments in
an indirect page. With 4k page size, it fits 256 descriptors (probably 255
since nr_segments is uint8_t).
We still accumulate the whole request into a single buffer.
The driver needs to learn to split requests into multiple I/O operations
and it probably shouldn't panic when it gets a request it doesn't support
but return an error to the frontend instead.
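A minimal C sketch of the two behaviours suggested above: rejecting an oversized request with an error instead of asserting, and splitting a large request into several I/O operations of at most MAXPHYS bytes each. The helper names, the use of EINVAL and the 64 KiB MAXPHYS value are assumptions for illustration; this is not code from xbdback_xenbus.c, and a real backend would presumably report the failure to the frontend in its ring response rather than as an errno value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAXPHYS (64 * 1024) /* assumed: 64 KiB, as on amd64 */

/*
 * Hypothetical validation step: report an oversized (or empty) request
 * as an error so the caller can fail it back to the frontend, instead
 * of tripping a KASSERT() later on.
 */
static int
check_request(size_t bcount, size_t max_bytes)
{
	if (bcount == 0 || bcount > max_bytes)
		return EINVAL;
	return 0;
}

/*
 * Hypothetical splitting step: cut a request of `total` bytes into
 * chunks of at most `max_chunk` bytes, storing each chunk length in
 * `chunks`.  Returns the number of chunks, or 0 if `total` is 0 or
 * more than `cap` chunks would be needed.
 */
static size_t
split_request(size_t total, size_t max_chunk, size_t *chunks, size_t cap)
{
	size_t n = 0;

	assert(max_chunk > 0);
	while (total > 0) {
		size_t len = total < max_chunk ? total : max_chunk;

		if (n == cap)
			return 0;
		chunks[n++] = len;
		total -= len;
	}
	return n;
}
```

For example, with the assumed 64 KiB limit, the 69632-byte request from the report would be rejected by check_request(), or split by split_request() into one 65536-byte chunk and one 4096-byte chunk.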
From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56328 CVS commit: src/sys/arch/xen/xen
Date: Wed, 28 Jul 2021 21:38:50 +0000
Module Name: src
Committed By: jdolecek
Date: Wed Jul 28 21:38:50 UTC 2021
Modified Files:
src/sys/arch/xen/xen: xbdback_xenbus.c
Log Message:
fix intentional, but eventually faulty off-by-one for the maximum number
of segments for I/O - this was supposed to allow MAXPHYS-size I/O even
with page offset, but actually ended up letting through I/O up to
MAXPHYS+PAGE_SIZE
the KASSERT(bcount < MAXPHYS) is kept as-is, since at that place the number
of segments should already be validated, so it's kernel bug if the size
is still too big there
fixes PR port-xen/56328 by Greg Oster
To generate a diff of this commit:
cvs rdiff -u -r1.97 -r1.98 src/sys/arch/xen/xen/xbdback_xenbus.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
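The off-by-one described in this commit message can be checked with plain arithmetic. The sketch below assumes a 64 KiB MAXPHYS and 4 KiB pages (consistent with the 69632-byte bcount reported above) and that each segment maps at most one page; both helper functions are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

#define MAXPHYS   (64 * 1024) /* assumed: 64 KiB, as on amd64 */
#define PAGE_SIZE 4096        /* assumed: 4 KiB pages */

static_assert(MAXPHYS % PAGE_SIZE == 0,
    "MAXPHYS is assumed to be a whole number of pages");

/*
 * Hypothetical: the largest transfer a request can describe when
 * every segment maps one full page.
 */
static size_t
max_request_bytes(size_t max_segments)
{
	return max_segments * PAGE_SIZE;
}

/*
 * Hypothetical: how many pages (and therefore segments) an I/O of
 * `len` bytes touches when it starts `offset` bytes into a page.
 */
static size_t
pages_spanned(size_t offset, size_t len)
{
	return (offset % PAGE_SIZE + len + PAGE_SIZE - 1) / PAGE_SIZE;
}
```

Under these assumptions, a limit of MAXPHYS / PAGE_SIZE + 1 = 17 segments is exactly what an unaligned 64 KiB transfer needs (pages_spanned(512, MAXPHYS) is 17). However, the same limit also admits 17 full pages: max_request_bytes(17) is 69632, i.e. MAXPHYS + PAGE_SIZE. A segment count alone cannot tell those two cases apart, which is presumably why the commit drops the extra segment.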
From: Greg Oster <oster@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: PR/56328 CVS commit: src/sys/arch/xen/xen
Date: Wed, 28 Jul 2021 15:57:33 -0600
It gets further now... but now dies at:
[ 52.3001210] panic: kernel diagnostic assertion "bcount +
xbd_io->xio_start_offset < VBD_VBA_SIZE" failed: file
"/u1/builds/build350/src/sys/arch/xen/xen/xbdback_xenbus.c", line 1314
[ 52.3001210] cpu0: Begin traceback
[ 52.3001210] vpanic() at netbsd:vpanic+0x14a
[ 52.3001210] kern_assert() at netbsd:kern_assert+0x4b
[ 52.3001210] xbdback_co_io_gotio() at netbsd:xbdback_co_io_gotio+0x3cb
[ 52.3001210] xbdback_thread() at netbsd:xbdback_thread+0x9b
[ 52.3001210] cpu0: End traceback...
Thanks!
On 2021-07-28 3:40 p.m., Jaromir Dolecek wrote:
> The following reply was made to PR port-xen/56328; it has been noted by GNATS.
>
> From: "Jaromir Dolecek" <jdolecek@netbsd.org>
> To: gnats-bugs@gnats.NetBSD.org
> Cc:
> Subject: PR/56328 CVS commit: src/sys/arch/xen/xen
> Date: Wed, 28 Jul 2021 21:38:50 +0000
>
> Module Name: src
> Committed By: jdolecek
> Date: Wed Jul 28 21:38:50 UTC 2021
>
> Modified Files:
> src/sys/arch/xen/xen: xbdback_xenbus.c
>
> Log Message:
> fix intentional, but eventually faulty off-by-one for the maximum number
> of segments for I/O - this was supposed to allow MAXPHYS-size I/O even
> with page offset, but actually ended up letting through I/O up to
> MAXPHYS+PAGE_SIZE
>
> the KASSERT(bcount < MAXPHYS) is kept as-is, since at that place the number
> of segments should already be validated, so it's kernel bug if the size
> is still too big there
>
> fixes PR port-xen/56328 by Greg Oster
>
>
> To generate a diff of this commit:
> cvs rdiff -u -r1.97 -r1.98 src/sys/arch/xen/xen/xbdback_xenbus.c
>
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
>
>
State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Wed, 28 Jul 2021 22:14:25 +0000
State-Changed-Why:
Can you check if the committed change fixes the issue, please?
State-Changed-From-To: feedback->analyzed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Wed, 28 Jul 2021 22:15:17 +0000
State-Changed-Why:
OK actually got the feedback. More work to fix this completely.
State-Changed-From-To: analyzed->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Wed, 28 Jul 2021 22:18:24 +0000
State-Changed-Why:
OK, KASSERT() was wrong. Fixed now, can you check?
From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56328 CVS commit: src/sys/arch/xen/xen
Date: Wed, 28 Jul 2021 22:17:49 +0000
Module Name: src
Committed By: jdolecek
Date: Wed Jul 28 22:17:49 UTC 2021
Modified Files:
src/sys/arch/xen/xen: xbdback_xenbus.c
Log Message:
fix off-by-one check in another KASSERT() for bcount
still related to PR port-xen/56328
To generate a diff of this commit:
cvs rdiff -u -r1.98 -r1.99 src/sys/arch/xen/xen/xbdback_xenbus.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
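The second off-by-one is a boundary condition: an I/O that ends exactly at the last byte of its buffer still fits. Below is a minimal sketch. It assumes the corrected assertion compares with `<=` where the old one used `<`, since the commit message says only "off-by-one". The helper names, the userland KASSERT() stand-in and the buffer size used in the example are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland stand-in for the kernel's KASSERT(). */
#define KASSERT(e) assert(e)

/* Strict form, shaped like the assertion that fired: it rejects an
 * I/O that ends exactly at the end of the buffer. */
static bool
fits_strict(size_t start_offset, size_t bcount, size_t buf_size)
{
	return start_offset + bcount < buf_size;
}

/* Inclusive form: the last byte lands at index buf_size - 1, which is
 * still inside the buffer, so an exact fit is accepted. */
static bool
fits_inclusive(size_t start_offset, size_t bcount, size_t buf_size)
{
	return start_offset + bcount <= buf_size;
}

/* Hypothetical sanity check placed where the I/O is issued. */
static void
check_io_bounds(size_t start_offset, size_t bcount, size_t buf_size)
{
	KASSERT(fits_inclusive(start_offset, bcount, buf_size));
}
```

For example, with an assumed 64 KiB buffer, a page-aligned 16-page transfer (start_offset 0, bcount 65536) fails fits_strict() but passes fits_inclusive().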
From: Greg Oster <oster@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-xen/56328 (XEN3 DOM0 panic with bcount <= MAXPHYS in
xbdback_xenbus.c)
Date: Wed, 28 Jul 2021 16:36:18 -0600
Yes, all working great now! Thanks for the quick fix!
State-Changed-From-To: feedback->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Thu, 29 Jul 2021 06:42:20 +0000
State-Changed-Why:
Confirmed fixed. Thanks for report.
Nothing to pullup, related changes are in -current only.
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.