NetBSD Problem Report #57657
From www@netbsd.org Sat Oct 14 18:55:02 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id B16D81A923A
for <gnats-bugs@gnats.NetBSD.org>; Sat, 14 Oct 2023 18:55:02 +0000 (UTC)
Message-Id: <20231014185501.168A51A923C@mollari.NetBSD.org>
Date: Sat, 14 Oct 2023 18:55:01 +0000 (UTC)
From: logix@foobar.franken.de
Reply-To: logix@foobar.franken.de
To: gnats-bugs@NetBSD.org
Subject: NetBSD crashes if the number of CPUs is not of the form N*[1..8]
X-Send-Pr-Version: www-1.0
>Number: 57657
>Category: kern
>Synopsis: NetBSD crashes if the number of CPUs is not of the form N*[1..8]
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Oct 14 19:00:01 +0000 2023
>Last-Modified: Sun Oct 15 10:35:01 +0000 2023
>Originator: Harold Gutch
>Release: NetBSD current
>Organization:
>Environment:
NetBSD 10.99.10 NetBSD 10.99.10 (GENERIC) #0: Thu Oct 12 23:51:05 UTC 2023 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
Booting NetBSD in KVM (on a RHEL 9.2 host) with 1..40 virtual CPUs succeeds if and only if the number of CPUs is of the form N*[1..8], i.e., only for 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 28, 32, 40 CPUs. For any other number the following panic happens:
Stopped in pid 0.0 (system) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
vpanic at netbsd:vpanic+0x173
kern_assert() at netbsd:kern_assert+0x4b
uvm_pagealloc_pgb() at netbsd:uvm_pagealloc_pgb+0x2e
uvm_pagealloc_pgfl() at netbsd:uvm_pagealloc_pgfl+0x63
uvm_pagealloc_strat() at netbsd:uvm_pagealloc_strat+0x130
uvm_km_alloc() at netbsd:uvm_km_alloc+0x17a
cpu_uarea_alloc() at netbsd:cpu_uarea_alloc+0x26
uarea_system_poolpage_alloc() at netbsd:uarea_system_poolpage_alloc+0x16
pool_grow() at netbsd:pool_grow+0x34c
pool_get() at netbsd:pool_get+0xa8
pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x139
pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x263
kthread_create() at netbsd:kthread_create+0x4d
config_create_interruptthreads() at netbsd:config_create_interruptthreads+0x33
main() at netbsd:main+0x3be
oster@ could reproduce this and mentioned that for him Fedora 37 does *not* crash with 9 CPUs, so it does not seem to be a bug in KVM.
The systematic test of all CPU counts from 1 to 40 was done with an Ivy Bridge host CPU. For every other emulated CPU model I tested, NetBSD booted with 8 CPUs and crashed in exactly the same way with 9 CPUs.
# objdump --disassemble=uvm_pagealloc_pgb /netbsd
ffffffff80d8591b <uvm_pagealloc_pgb>:
[...]
ffffffff80d85949: 49 83 3a 00 cmpq $0x0,(%r10)
ffffffff80d8594d: 4c 89 55 b8 mov %r10,-0x48(%rbp)
ffffffff80d85951: 0f 84 80 01 00 00 je ffffffff80d85ad7 <uvm_pagealloc_pgb+0x1bc>
This seems to be line 1017 in uvm_page.c 1.254.
>How-To-Repeat:
Boot NetBSD on a VM where the number of CPUs is not of the form N*[1..8].
>Fix:
>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57657: NetBSD crashes if the number of CPUs is not of the form N*[1..8]
Date: Sun, 15 Oct 2023 10:05:32 -0000 (UTC)
logix@foobar.franken.de writes:
>Booting NetBSD in KVM (on a RHEL 9.2 host) with 1..40 virtual CPUs succeeds if and only if the number of CPUs is of the form N*[1..8], i.e., only for 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 28, 32, 40 CPUs. For any other number the following panic happens:
>Stopped in pid 0.0 (system) at netbsd:breakpoint+0x5: leave
>breakpoint() at netbsd:breakpoint+0x5
>vpanic at netbsd:vpanic+0x173
>kern_assert() at netbsd:kern_assert+0x4b
>uvm_pagealloc_pgb() at netbsd:uvm_pagealloc_pgb+0x2e
>uvm_pagealloc_pgfl() at netbsd:uvm_pagealloc_pgfl+0x63
That's a bug in uvm_page_rebucket(). This patch helps (the
line numbers are a bit off):
Index: uvm_page.c
===================================================================
RCS file: /cvsroot/src/sys/uvm/uvm_page.c,v
retrieving revision 1.254
diff -p -u -r1.254 uvm_page.c
--- uvm_page.c 23 Sep 2023 18:20:20 -0000 1.254
+++ uvm_page.c 14 Oct 2023 21:37:39 -0000
@@ -868,7 +883,7 @@ uvm_page_recolor(int newncolors)
void
uvm_page_rebucket(void)
{
- u_int min_numa, max_numa, npackage, shift;
+ u_int min_numa, max_numa, npackage, div;
struct cpu_info *ci, *ci2, *ci3;
CPU_INFO_ITERATOR cii;
@@ -906,12 +921,11 @@ uvm_page_rebucket(void)
/*
* Figure out how to arrange the packages & buckets, and the total
- * number of buckets we need. XXX 2 may not be the best factor.
+ * number of buckets we need.
*/
- for (shift = 0; npackage > PGFL_MAX_BUCKETS; shift++) {
- npackage >>= 1;
- }
- uvm_page_redim(uvmexp.ncolors, npackage);
+
+ div = howmany(npackage, PGFL_MAX_BUCKETS);
+ uvm_page_redim(uvmexp.ncolors, howmany(npackage, div));
/*
* Now tell each CPU which bucket to use. In the outer loop, scroll
@@ -927,7 +941,7 @@ uvm_page_rebucket(void)
*/
ci3 = ci2;
do {
- ci3->ci_data.cpu_uvm->pgflbucket = npackage >> shift;
+ ci3->ci_data.cpu_uvm->pgflbucket = npackage / div;
ci3 = ci3->ci_sibling[CPUREL_PACKAGE];
} while (ci3 != ci2);
npackage++;
@@ -935,7 +949,7 @@ uvm_page_rebucket(void)
} while (ci2 != ci->ci_sibling[CPUREL_PACKAGE1ST]);
aprint_debug("UVM: using package allocation scheme, "
- "%d package(s) per bucket\n", 1 << shift);
+ "%d package(s) per bucket\n", div);
}
/*
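The effect of the change can be checked with a small standalone model of the
bucket arithmetic. This is only a sketch, not kernel code: it assumes that
every virtual CPU shows up as its own package, that PGFL_MAX_BUCKETS is 8
(inferred from the [1..8] pattern in the report), and that package number i
is assigned bucket "i >> shift" (old) or "i / div" (new) as in the hunks
above. HOWMANY and the two *_scheme_ok() helpers are names made up for this
illustration.

```c
#include <assert.h>

/* Assumption: value inferred from the report, not copied from the tree. */
#define PGFL_MAX_BUCKETS 8

/* Round-up division, same arithmetic as howmany() in <sys/param.h>. */
#define HOWMANY(x, y) (((x) + ((y) - 1)) / (y))

/*
 * Old scheme: halve the package count until it fits, then map package i
 * to bucket i >> shift.  Returns 1 if every package gets a bucket that
 * exists, 0 if some package is pointed at a bucket that was never set up.
 */
static int
old_scheme_ok(unsigned npackage)
{
	unsigned nbuckets = npackage, shift, i;

	assert(npackage > 0);
	for (shift = 0; nbuckets > PGFL_MAX_BUCKETS; shift++)
		nbuckets >>= 1;	/* rounds down, the remainder is lost */
	for (i = 0; i < npackage; i++)
		if ((i >> shift) >= nbuckets)
			return 0;
	return 1;
}

/*
 * New scheme: pick a divisor by rounding up, then round up again when
 * sizing the bucket array, so the last partial group still has a bucket.
 */
static int
new_scheme_ok(unsigned npackage)
{
	unsigned div, nbuckets, i;

	assert(npackage > 0);
	div = HOWMANY(npackage, PGFL_MAX_BUCKETS);
	nbuckets = HOWMANY(npackage, div);
	if (nbuckets > PGFL_MAX_BUCKETS)
		return 0;
	for (i = 0; i < npackage; i++)
		if ((i / div) >= nbuckets)
			return 0;
	return 1;
}
```

Under these assumptions, 9 packages give 4 buckets with the old code, but
package 8 is assigned bucket 8 >> 1 = 4, which is out of range. With the
patch, div is 2, 5 buckets are set up, and package 8 lands in bucket 4.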
From: Harold Gutch <logix@foobar.franken.de>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/57657: NetBSD crashes if the number of CPUs is not of the form N*[1..8]
Date: Sun, 15 Oct 2023 12:29:48 +0200
Hi,
thanks, from what I can see your patch does effectively the same thing,
just with slightly different rounding and bucket distribution when repeated
division by 2 would have to round before reaching a number in [1, 8]
(i.e., the correct condition is not N*[1..8], but 2^N*[1..8]).
For a "bad" number of CPUs it now behaves better for me. I didn't
test all numbers, but for the few that I tried I successfully booted.
So: looks good, thanks!
Harold
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.