NetBSD Problem Report #58395
From www@netbsd.org Wed Jul 3 13:38:22 2024
Date: Wed, 3 Jul 2024 13:38:21 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: panic: HYPERVISOR_mmu_update failed, ret: -22
X-Send-Pr-Version: www-1.0
>Number: 58395
>Category: port-xen
>Synopsis: panic: HYPERVISOR_mmu_update failed, ret: -22
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-xen-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Jul 03 13:40:00 +0000 2024
>Last-Modified: Thu Jul 04 16:20:01 +0000 2024
>Originator: Taylor R Campbell
>Release: 10
>Organization:
The NetBSD FoundEINVAL
>Environment:
NetBSD 10.0 (XEN3_DOMU) #0: Thu Mar 28 08:33:33 UTC 2024
amd64
>Description:
When booting NetBSD 10 on tornadovps, which sets `gnttab=max-ver:1':
[ 1.0000000] xpq_flush_queue: 1 entries (0 successful) on cpu0 (0)
[ 1.0000000] panic: HYPERVISOR_mmu_update failed, ret: -22
panic() at netbsd:panic+0x3c
xpq_flush_queue() at netbsd:xpq_flush_queue+0x100
pmap_kenter_ma() at netbsd:pmap_kenter_ma+0xcb
xengnt_finish_init() at netbsd:xengnt_finish_init+0xe9
xengnt_init() at netbsd:xengnt_init+0x139
hypervisor_attach() at netbsd:hypervisor_attach+0x279
>How-To-Repeat:
boot NetBSD 10 on tornadovps
>Fix:
Yes, please!
>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Cc:
Subject: Re: port-xen/58395: panic: HYPERVISOR_mmu_update failed, ret: -22
Date: Thu, 4 Jul 2024 11:53:29 +0000
Some more details:
1. hypervisor0 at mainbus0: Xen version 4.14.0.88.g1d1d1f53
2. At the time of xengnt_init, GNTTABOP_query_size returns:
- nr_frames=32
- max_nr_frames=64
3. The pfns returned by GNTTABOP_setup_table look reasonable at first,
e.g.:
xengnt_more_entries: pages[0]@0xffffd380025b9c00=54d87
xengnt_more_entries: pages[1]@0xffffd380025b9c08=54d86
xengnt_more_entries: pages[2]@0xffffd380025b9c10=54d85
xengnt_more_entries: pages[3]@0xffffd380025b9c18=54d84
xengnt_more_entries: pages[4]@0xffffd380025b9c20=54df3
...
xengnt_more_entries: pages[27]@0xffffd380025c04d8=54d94
xengnt_more_entries: pages[28]@0xffffd380025c04e0=54d93
xengnt_more_entries: pages[29]@0xffffd380025c04e8=54d92
xengnt_more_entries: pages[30]@0xffffd380025c04f0=54d91
xengnt_more_entries: pages[31]@0xffffd380025c04f8=54d90
4. Both (2) and (3) remain true until we call GNTTABOP_setup_table
with nr_frames=33, at which point:
- GNTTABOP_setup_table returns a pfn of -1 (i.e., ffffffffffffffff,
all bits set), but only for frame 32, and it still returns zero
and sets op.status = GNTST_okay indicating success
- GNTTABOP_query_size returns nr_frames=33 as expected
xengnt_more_entries: GNTTABOP_query_size before: rc=0 nr_frames=32 max_nr_frames=64 status=0
xengnt_more_entries: pages=0xffffd3800297c5c0 n=33
xengnt_more_entries: pages[28]@0xffffd3800297c6a0=54d93
xengnt_more_entries: pages[29]@0xffffd3800297c6a8=54d92
xengnt_more_entries: pages[30]@0xffffd3800297c6b0=54d91
xengnt_more_entries: pages[31]@0xffffd3800297c6b8=54d90
xengnt_more_entries: pages[32]@0xffffd3800297c6c0=ffffffffffffffff
xengnt_more_entries: GNTTABOP_query_size after: rc=0 nr_frames=33 max_nr_frames=64 status=0
xengnt_more_entries then passes pfn=-1 into pmap_kenter_ma which
passes it through to HYPERVISOR_mmu_update which fails with EINVAL
(22) presumably because pfn=-1 is invalid.
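To illustrate the failure mode described above, here is a minimal userland sketch (not kernel code) that scans a frame list for the all-ones value before anything is mapped. The names `INVALID_FRAME` and `first_invalid_frame` are hypothetical, and plain `uint64_t` stands in for whatever frame type the kernel actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the all-ones value seen for frame 32 marks an unusable frame. */
#define INVALID_FRAME UINT64_MAX

/*
 * Hypothetical helper: return the index of the first frame equal to
 * INVALID_FRAME, or -1 if every one of the n frames looks usable.
 */
static long
first_invalid_frame(const uint64_t *pages, size_t n)
{
	size_t i;

	assert(pages != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (pages[i] == INVALID_FRAME)
			return (long)i;
	}
	return -1;
}
```

With a list shaped like the one logged above (32 plausible frames followed by one all-ones entry), a check of this kind would flag index 32 even though the hypercall itself reported success.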
If I patch xengnt_init to do
- gnt_max_grant_frames = query.max_nr_frames;
+ gnt_max_grant_frames = MIN(32, query.max_nr_frames);
then the kernel boots just fine in this environment.
I'm guessing that setting max_grant_frames=32 in the domU's xl.conf
would also work but I don't have control over that.
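As a small illustration of why the clamp avoids the panic, the sketch below reproduces the arithmetic of the patch in userland. `clamp_grant_frames` is a hypothetical helper, not a function in the NetBSD tree; it only mirrors the `MIN(32, query.max_nr_frames)` expression.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring MIN(cap, max_nr_frames): never plan for
 * more grant frames than the cap, whatever the hypervisor advertises.
 */
static uint32_t
clamp_grant_frames(uint32_t max_nr_frames, uint32_t cap)
{
	assert(cap > 0);	/* a zero cap would leave no grant table at all */
	return (max_nr_frames < cap) ? max_nr_frames : cap;
}
```

With the values from this report (max_nr_frames=64, cap 32) the result is 32, so the request with nr_frames=33 that produced the all-ones pfn would never be issued.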
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: port-xen/58395: panic: HYPERVISOR_mmu_update failed, ret: -22
Date: Thu, 4 Jul 2024 18:10:16 +0200
This looks like a bug in this hypervisor version. Looking at the Linux code, it
seems to assume that if GNTTABOP_setup_table returned OK then all the entries
are valid. But I don't think Linux tries to allocate max_nr_frames
at startup as we do (and I think this was changed between -9 and -10),
so maybe it doesn't run into this in common usage.
A workaround could be to fail the request when the last entry of pages[] is
-1, but we'd still need to pmap_kenter_ma() the previous ones.
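A rough userland sketch of the workaround idea, under stated assumptions: the growth request fails as soon as an all-ones entry is found, but the valid entries before it are still mapped. Everything here is hypothetical (`map_new_frames`, the `map_frame_fn` callback standing in for pmap_kenter_ma(), the choice of ENOMEM as the error) and it ignores how the real code is structured.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: an all-ones frame number marks an entry that must not be mapped. */
#define INVALID_FRAME UINT64_MAX

/* Hypothetical stand-in for pmap_kenter_ma(): maps one frame at index idx. */
typedef void (*map_frame_fn)(size_t idx, uint64_t frame, void *cookie);

/* Example callback that only counts how many frames it was asked to map. */
static void
count_mapping(size_t idx, uint64_t frame, void *cookie)
{
	(void)idx;
	(void)frame;
	(*(size_t *)cookie)++;
}

/*
 * Map frames old_n..new_n-1.  Stop at the first invalid entry and fail the
 * request (ENOMEM is an arbitrary choice), keeping the mappings made so far.
 * *mapped receives the number of frames usable afterwards.
 */
static int
map_new_frames(const uint64_t *pages, size_t old_n, size_t new_n,
    map_frame_fn map, void *cookie, size_t *mapped)
{
	size_t i;

	assert(old_n <= new_n);
	*mapped = old_n;
	for (i = old_n; i < new_n; i++) {
		if (pages[i] == INVALID_FRAME)
			return ENOMEM;
		map(i, pages[i], cookie);
		*mapped = i + 1;
	}
	return 0;
}
```

A caller would treat a nonzero return as "the table could not grow to new_n" and continue with `*mapped` frames.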
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: port-xen/58395: panic: HYPERVISOR_mmu_update failed, ret: -22
Date: Thu, 4 Jul 2024 18:16:51 +0200
On Thu, Jul 04, 2024 at 06:10:16PM +0200, Manuel Bouyer wrote:
> A workaround could be to fail the request when the last entry of pages[] is
> -1, but we'd still need to pmap_kenter_ma() the previous ones.
But it's not that simple, given how the code is actually structured.
We can do this on boot but not on resume.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--