NetBSD Problem Report #58395

From www@netbsd.org  Wed Jul  3 13:38:22 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
	 client-signature RSA-PSS (2048 bits) client-digest SHA256)
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 73CDD1A923A
	for <gnats-bugs@gnats.NetBSD.org>; Wed,  3 Jul 2024 13:38:22 +0000 (UTC)
Message-Id: <20240703133821.1E7351A923C@mollari.NetBSD.org>
Date: Wed,  3 Jul 2024 13:38:21 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: panic: HYPERVISOR_mmu_update failed, ret: -22
X-Send-Pr-Version: www-1.0

>Number:         58395
>Category:       port-xen
>Synopsis:       panic: HYPERVISOR_mmu_update failed, ret: -22
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-xen-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 03 13:40:00 +0000 2024
>Last-Modified:  Thu Jul 04 16:20:01 +0000 2024
>Originator:     Taylor R Campbell
>Release:        10
>Organization:
The NetBSD FoundEINVAL
>Environment:
NetBSD 10.0 (XEN3_DOMU) #0: Thu Mar 28 08:33:33 UTC 2024
amd64
>Description:
When booting NetBSD 10 on tornadovps, which sets `gnttab=max-ver:1':

[   1.0000000] xpq_flush_queue: 1 entries (0 successful) on cpu0 (0)
[   1.0000000] panic: HYPERVISOR_mmu_update failed, ret: -22

panic() at netbsd:panic+0x3c
xpq_flush_queue() at netbsd:xpq_flush_queue+0x100
pmap_kenter_ma() at netbsd:pmap_kenter_ma+0xcb
xengnt_finish_init() at netbsd:xengnt_finish_init+0xe9
xengnt_init() at netbsd:xengnt_init+0x139
hypervisor_attach() at netbsd:hypervisor_attach+0x279

>How-To-Repeat:
boot NetBSD 10 on tornadovps
>Fix:
Yes, please!

>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Cc: 
Subject: Re: port-xen/58395: panic: HYPERVISOR_mmu_update failed, ret: -22
Date: Thu, 4 Jul 2024 11:53:29 +0000

 Some more details:

 1. hypervisor0 at mainbus0: Xen version 4.14.0.88.g1d1d1f53

 2. At the time of xengnt_init, GNTTABOP_query_size returns:
    - nr_frames=32
    - max_nr_frames=64

 3. The pfns returned by GNTTABOP_setup_table look reasonable at first,
    e.g.:

    xengnt_more_entries: pages[0]@0xffffd380025b9c00=54d87
    xengnt_more_entries: pages[1]@0xffffd380025b9c08=54d86
    xengnt_more_entries: pages[2]@0xffffd380025b9c10=54d85
    xengnt_more_entries: pages[3]@0xffffd380025b9c18=54d84
    xengnt_more_entries: pages[4]@0xffffd380025b9c20=54df3
    ...
    xengnt_more_entries: pages[27]@0xffffd380025c04d8=54d94
    xengnt_more_entries: pages[28]@0xffffd380025c04e0=54d93
    xengnt_more_entries: pages[29]@0xffffd380025c04e8=54d92
    xengnt_more_entries: pages[30]@0xffffd380025c04f0=54d91
    xengnt_more_entries: pages[31]@0xffffd380025c04f8=54d90

 4. Both (2) and (3) remain true until we call GNTTABOP_setup_table
    with nr_frames=33, at which point:
    - GNTTABOP_setup_table returns a pfn of -1 (i.e., ffffffffffffffff,
      all bits set), but only for frame 32, and it still returns zero
      and sets op.status = GNTST_okay indicating success
    - GNTTABOP_query_size returns nr_frames=33 as expected

    xengnt_more_entries: GNTTABOP_query_size before: rc=0 nr_frames=32 max_nr_frames=64 status=0
    xengnt_more_entries: pages=0xffffd3800297c5c0 n=33
    xengnt_more_entries: pages[28]@0xffffd3800297c6a0=54d93
    xengnt_more_entries: pages[29]@0xffffd3800297c6a8=54d92
    xengnt_more_entries: pages[30]@0xffffd3800297c6b0=54d91
    xengnt_more_entries: pages[31]@0xffffd3800297c6b8=54d90
    xengnt_more_entries: pages[32]@0xffffd3800297c6c0=ffffffffffffffff
    xengnt_more_entries: GNTTABOP_query_size after: rc=0 nr_frames=33 max_nr_frames=64 status=0

 xengnt_more_entries then passes pfn=-1 into pmap_kenter_ma which
 passes it through to HYPERVISOR_mmu_update which fails with EINVAL
 (22) presumably because pfn=-1 is invalid.

 If I patch xengnt_init to do

 -		gnt_max_grant_frames = query.max_nr_frames;
 +		gnt_max_grant_frames = MIN(32, query.max_nr_frames);

 then the kernel boots just fine in this environment.

 I'm guessing that setting max_grant_frames=32 in the domU's xl.conf
 would also work but I don't have control over that.
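
The failure described above comes down to one entry in the returned frame
list holding the all-bits-set value even though the hypercall reported
success. As a minimal user-space sketch of how such an entry could be
detected before it reaches pmap_kenter_ma, assuming 64-bit frame numbers:
the names INVALID_FRAME and count_valid_frames are hypothetical and are
not taken from the NetBSD sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: the all-bits-set value seen in pages[32]. */
#define INVALID_FRAME UINT64_MAX

/*
 * Return the number of leading entries of pages[] that look usable:
 * the index of the first entry equal to INVALID_FRAME, or n if no
 * entry is invalid.
 */
static size_t
count_valid_frames(const uint64_t *pages, size_t n)
{
	size_t i;

	assert(pages != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (pages[i] == INVALID_FRAME)
			break;
	}
	return i;
}
```

With the frame list from the log above, a check like this would report 32
usable frames out of the 33 requested.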

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: port-xen/58395: panic: HYPERVISOR_mmu_update failed, ret: -22
Date: Thu, 4 Jul 2024 18:10:16 +0200

 On Thu, Jul 04, 2024 at 11:53:29AM +0000, Taylor R Campbell wrote:
 > Some more details:
 > 
 > 1. hypervisor0 at mainbus0: Xen version 4.14.0.88.g1d1d1f53
 > 
 > 2. At the time of xengnt_init, GNTTABOP_query_size returns:
 >    - nr_frames=32
 >    - max_nr_frames=64
 > 
 > 3. The pfns returned by GNTTABOP_setup_table look reasonable at first,
 >    e.g.:
 > 
 >    xengnt_more_entries: pages[0]@0xffffd380025b9c00=54d87
 >    xengnt_more_entries: pages[1]@0xffffd380025b9c08=54d86
 >    xengnt_more_entries: pages[2]@0xffffd380025b9c10=54d85
 >    xengnt_more_entries: pages[3]@0xffffd380025b9c18=54d84
 >    xengnt_more_entries: pages[4]@0xffffd380025b9c20=54df3
 >    ...
 >    xengnt_more_entries: pages[27]@0xffffd380025c04d8=54d94
 >    xengnt_more_entries: pages[28]@0xffffd380025c04e0=54d93
 >    xengnt_more_entries: pages[29]@0xffffd380025c04e8=54d92
 >    xengnt_more_entries: pages[30]@0xffffd380025c04f0=54d91
 >    xengnt_more_entries: pages[31]@0xffffd380025c04f8=54d90
 > 
 > 4. Both (2) and (3) remain true until we call GNTTABOP_setup_table
 >    with nr_frames=33, at which point:
 >    - GNTTABOP_setup_table returns a pfn of -1 (i.e., ffffffffffffffff,
 >      all bits set), but only for frame 32, and it still returns zero
 >      and sets op.status = GNTST_okay indicating success
 >    - GNTTABOP_query_size returns nr_frames=33 as expected
 > 
 >    xengnt_more_entries: GNTTABOP_query_size before: rc=0 nr_frames=32 max_nr_frames=64 status=0
 >    xengnt_more_entries: pages=0xffffd3800297c5c0 n=33
 >    xengnt_more_entries: pages[28]@0xffffd3800297c6a0=54d93
 >    xengnt_more_entries: pages[29]@0xffffd3800297c6a8=54d92
 >    xengnt_more_entries: pages[30]@0xffffd3800297c6b0=54d91
 >    xengnt_more_entries: pages[31]@0xffffd3800297c6b8=54d90
 >    xengnt_more_entries: pages[32]@0xffffd3800297c6c0=ffffffffffffffff
 >    xengnt_more_entries: GNTTABOP_query_size after: rc=0 nr_frames=33 max_nr_frames=64 status=0
 > 
 > xengnt_more_entries then passes pfn=-1 into pmap_kenter_ma which
 > passes it through to HYPERVISOR_mmu_update which fails with EINVAL
 > (22) presumably because pfn=-1 is invalid.
 > 
 > If I patch xengnt_init to do
 > 
 > -		gnt_max_grant_frames = query.max_nr_frames;
 > +		gnt_max_grant_frames = MIN(32, query.max_nr_frames);
 > 
 > then the kernel boots just fine in this environment.
 > 
 > I'm guessing that setting max_grant_frames=32 in the domU's xl.conf
 > would also work but I don't have control over that.

 This looks like a bug in this hypervisor version. Looking at the Linux code it
 seems to assume that if GNTTABOP_setup_table returned OK then all the entries
 are valid. But I don't think Linux tries to allocate max_nr_frames
 at startup as we do (and I think this was changed between -9 and -10)
 so maybe it doesn't run into this in common usage.

 A workaround could be to fail the request when the last entry of pages[] is
 -1, but we'd still need to pmap_kenter_ma() the previous ones.
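
A rough user-space sketch of that workaround, under stated assumptions:
map_frame_fn is a hypothetical stand-in for pmap_kenter_ma (whose real
signature differs), ENOMEM is an arbitrary choice of error code, and the
loop stops at the first invalid entry rather than checking only the last
one. It illustrates the idea only and does not reflect how the actual
xengnt code is structured.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel for a frame the hypervisor did not fill in. */
#define INVALID_FRAME UINT64_MAX

/* Hypothetical stand-in for the per-frame mapping step. */
typedef void (*map_frame_fn)(size_t idx, uint64_t frame, void *cookie);

/*
 * Map every frame before the first invalid one, store the count in
 * *nmapped, and return 0 only if all n frames were valid.
 */
static int
map_valid_frames(const uint64_t *pages, size_t n, map_frame_fn map,
    void *cookie, size_t *nmapped)
{
	size_t i;

	assert(map != NULL && nmapped != NULL);
	for (i = 0; i < n && pages[i] != INVALID_FRAME; i++)
		map(i, pages[i], cookie);
	*nmapped = i;
	return (i == n) ? 0 : ENOMEM;
}

/* Example callback: count how many frames would have been mapped. */
static void
count_mapping(size_t idx, uint64_t frame, void *cookie)
{
	(void)idx;
	(void)frame;
	(*(size_t *)cookie)++;
}
```

A caller would treat a nonzero return as "the table did not grow as far as
requested" while still keeping the mappings for the first *nmapped frames.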

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: port-xen/58395: panic: HYPERVISOR_mmu_update failed, ret: -22
Date: Thu, 4 Jul 2024 18:16:51 +0200

 On Thu, Jul 04, 2024 at 06:10:16PM +0200, Manuel Bouyer wrote:
 > This looks like a bug in this hypervisor version. Looking at the Linux code it
 > seems to assume that if GNTTABOP_setup_table returned OK then all the entries
 > are valid. But I don't think Linux tries to allocate max_nr_frames
 > at startup as we do (and I think this was changed between -9 and -10)
 > so maybe it doesn't run into this in common usage.
 > 
 > A workaround could be to fail the request when the last entry of pages[] is
 > -1, but we'd still need to pmap_kenter_ma() the previous ones.

 But it's not that simple, given how the code is actually structured.
 We can do this on boot but not on resume.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.