NetBSD Problem Report #58366
From www@netbsd.org Tue Jun 25 07:02:17 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits)
client-signature RSA-PSS (2048 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 0A0D61A923A
for <gnats-bugs@gnats.NetBSD.org>; Tue, 25 Jun 2024 07:02:17 +0000 (UTC)
Message-Id: <20240625070215.EED421A923C@mollari.NetBSD.org>
Date: Tue, 25 Jun 2024 07:02:15 +0000 (UTC)
From: logix@foobar.franken.de
Reply-To: logix@foobar.franken.de
To: gnats-bugs@NetBSD.org
Subject: KASLR broken
X-Send-Pr-Version: www-1.0
>Number: 58366
>Category: port-amd64
>Synopsis: KASLR broken
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: port-amd64-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Jun 25 07:05:00 +0000 2024
>Last-Modified: Mon Jul 01 01:45:01 +0000 2024
>Originator: Harold Gutch
>Release: HEAD, netbsd-10
>Organization:
>Environment:
NetBSD 9.99.92 NetBSD 9.99.92 (GENERIC) #0: Tue Jun 25 08:25:24 CEST 2024 h@ubuntu:/home/h/netbsd/bugs/kaslr_test/out/sys/arch/amd64/compile/GENERIC amd64
>Description:
Enabling KASLR as on https://wiki.netbsd.org/security/kaslr/ (or by installing the KASLR kernel from sysinst during installation) yields the following message in prekern:
****** FAULT OCCURRED ******
page fault
****************************
This is the case since this commit:
https://mail-index.netbsd.org/source-changes/2021/10/28/msg133374.html
>How-To-Repeat:
Take a 10.0 or -current installed system and enable KASLR as on https://wiki.netbsd.org/security/kaslr/ . Alternatively, take a 10.0 or -current installer and select the KASLR kernel from sysinst during the installation.
>Fix:
Don't use KASLR
>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: logix@foobar.franken.de
Cc: gnats-bugs@NetBSD.org, port-amd64-maintainer@NetBSD.org,
gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: port-amd64/58366: KASLR broken
Date: Tue, 25 Jun 2024 13:36:07 +0000
This is a multi-part message in MIME format.
--=_fvz4bw6Sgb7EvIjKFB6Nr1oMeWUc8Daj
Can you please try the attached patch?
--=_fvz4bw6Sgb7EvIjKFB6Nr1oMeWUc8Daj
Content-Type: text/plain; charset="ISO-8859-1"; name="pr58366-rndseedkaslr"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="pr58366-rndseedkaslr.patch"
From ed2bcd44cce72438419b29c38717a205e311a60e Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh@NetBSD.org>
Date: Tue, 25 Jun 2024 12:18:59 +0000
Subject: [PATCH] x86: Defer x86_rndseed until after pmap_bootstrap.
Loading the random seed, which is what x86_rndseed does, requires
direct map access on KASLR kernels, which requires pmap_bootstrap to
have run.
This had been broken in
amd64/machdep.c 1.359
i386/machdep.c 1.832
because we apparently don't have any automatic test setup for KASLR
kernels, which we should address.
This change shouldn't cause any security regression on kernels that
previously owrked, because none of the logic that now happens before
x86_rndseed uses the entropy pool anyway (uvm_md_init,
init_x86_clusters, xen_parse_cmdline, .
PR port-amd64/58366
---
sys/arch/amd64/amd64/machdep.c | 25 ++++++++++++++++---------
sys/arch/i386/i386/machdep.c | 25 ++++++++++++++++---------
2 files changed, 32 insertions(+), 18 deletions(-)
diff --git a/sys/arch/amd64/amd64/machdep.c b/sys/arch/amd64/amd64/machdep.c
index bc91a3595ae5..b77bcf98c4c9 100644
--- a/sys/arch/amd64/amd64/machdep.c
+++ b/sys/arch/amd64/amd64/machdep.c
@@ -1754,15 +1754,6 @@ init_x86_64(paddr_t first_avail)
=20
consinit(); /* XXX SHOULD NOT BE DONE HERE */
=20
- /*
- * Initialize RNG to get entropy ASAP either from CPU
- * RDRAND/RDSEED or from seed on disk. Must happen after
- * cpu_init_msrs. Prefer to happen after consinit so we have
- * the opportunity to print useful feedback.
- */
- cpu_rng_init();
- x86_rndseed();
-
/*
* Initialize PAGE_SIZE-dependent variables.
*/
@@ -1803,6 +1794,22 @@ init_x86_64(paddr_t first_avail)
*/
pmap_bootstrap(VM_MIN_KERNEL_ADDRESS);
=20
+ /*
+ * Initialize RNG to get entropy ASAP either from CPU
+ * RDRAND/RDSEED or from seed on disk. Constraints:
+ *
+ * - Must happen after cpu_init_msrs so that curcpu() and
+ * curlwp work.
+ *
+ * - Must happen after consinit so we have the opportunity to
+ * print useful feedback.
+ *
+ * - On KASLR kernels, must happen after pmap_bootstrap because
+ * x86_rndseed requires access to the direct map.
+ */
+ cpu_rng_init();
+ x86_rndseed();
+
#ifndef XENPV
/* Internalize the physical pages into the VM system. */
init_x86_vm(avail_start);
diff --git a/sys/arch/i386/i386/machdep.c b/sys/arch/i386/i386/machdep.c
index f176330f9d64..dd46efb6afe4 100644
--- a/sys/arch/i386/i386/machdep.c
+++ b/sys/arch/i386/i386/machdep.c
@@ -1280,15 +1280,6 @@ init386(paddr_t first_avail)
=20
consinit(); /* XXX SHOULD NOT BE DONE HERE */
=20
- /*
- * Initialize RNG to get entropy ASAP either from CPU
- * RDRAND/RDSEED or from seed on disk. Must happen after
- * cpu_init_msrs. Prefer to happen after consinit so we have
- * the opportunity to print useful feedback.
- */
- cpu_rng_init();
- x86_rndseed();
-
#ifdef DEBUG_MEMLOAD
printf("mem_cluster_count: %d\n", mem_cluster_cnt);
#endif
@@ -1299,6 +1290,22 @@ init386(paddr_t first_avail)
*/
pmap_bootstrap((vaddr_t)atdevbase + IOM_SIZE);
=20
+ /*
+ * Initialize RNG to get entropy ASAP either from CPU
+ * RDRAND/RDSEED or from seed on disk. Constraints:
+ *
+ * - Must happen after cpu_init_msrs so that curcpu() and
+ * curlwp work.
+ *
+ * - Must happen after consinit so we have the opportunity to
+ * print useful feedback.
+ *
+ * - On KASLR kernels, must happen after pmap_bootstrap because
+ * x86_rndseed requires access to the direct map.
+ */
+ cpu_rng_init();
+ x86_rndseed();
+
#ifndef XENPV
/* Initialize the memory clusters. */
init_x86_clusters();
--=_fvz4bw6Sgb7EvIjKFB6Nr1oMeWUc8Daj--
From: Harold Gutch <logix@foobar.franken.de>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, port-amd64-maintainer@NetBSD.org,
gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: port-amd64/58366: KASLR broken
Date: Tue, 25 Jun 2024 18:07:44 +0200
On Tue, Jun 25, 2024 at 01:36:07PM +0000, Taylor R Campbell wrote:
> Can you please try the attached patch?
Thanks, that gets past prekern but then panics:
[ 1.4884345] trap type 4 code 0 rip 0xffffffffacefd336 cs 0x8 rflags 0x246 cr2 0 ilevel 0x6 rsp 0xffffffffe9e85a80
[ 1.5005255] curlwp 0xffffffffa0be8480 pid 0.0 lowest kstack 0xffffffffe9e812c0
kernel: protection fault trap, code=0
Stopped in pid 0.0 (system) at netbsd:aes_sse2_selftest+0xb9: ???
aes_sse2_selftest() at netbsd:aes_sse2_selftest+0xb9
aes_sse2_probe() at netbsd:aes_sse2_probe+0x14
aes_selftest() at netbsd:aes_selftest+0x26
aes_modcmd() at netbsd:aes_modcmd+0xf7
module_do_builtin() at netbsd:module_do_builtin+0x17d
module_do_builtin() at netbsd:module_do_builtin+0x132
module_init_class() at netbsd:module_init_class+0x1cf
main() at netbsd:main+0x4fc
start_prekern() at netbsd:start_prekern+0xf5
?() at 100641
ds 0
es 1
fs 8
gs c20f
rdi 0
rsi 2
rbp ffffffffe9e85ac0
rbx ffffffffb62f6c14
rdx 0
rcx 0
rax 0
r8 0
r9 ffffffffe9e85af0
r10 0
r11 0
r12 ffffffff00000000
r13 0
r14 3c
r15 0
rip ffffffffacefd336 aes_sse2_selftest+0xb9
cs 8
rflags 246
rsp ffffffffe9e85a80
ss 10
netbsd:aes_sse2_selftest+0xb9: ???
db{0}>
This is with a ~10 day old current tree, installed with sysinst where
I picked the GENERIC_KASLR kernel. A "standard" install with GENERIC
succeeds.
Harold
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Harold Gutch <logix@foobar.franken.de>
Cc: gnats-bugs@NetBSD.org, port-amd64-maintainer@NetBSD.org,
gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: port-amd64/58366: KASLR broken
Date: Tue, 25 Jun 2024 18:03:39 +0000
> Date: Tue, 25 Jun 2024 18:07:44 +0200
> From: Harold Gutch <logix@foobar.franken.de>
>=20
> On Tue, Jun 25, 2024 at 01:36:07PM +0000, Taylor R Campbell wrote:
> > Can you please try the attached patch?
>=20
> Thanks, that gets past prekern but then panics:
>=20
> [ 1.4884345] trap type 4 code 0 rip 0xffffffffacefd336 cs 0x8 rflags 0x=
246 cr2 0 ilevel 0x6 rsp 0xffffffffe9e85a80
> [ 1.5005255] curlwp 0xffffffffa0be8480 pid 0.0 lowest kstack 0xffffffff=
e9e812c0
> kernel: protection fault trap, code=3D0
> Stopped in pid 0.0 (system) at netbsd:aes_sse2_selftest+0xb9: ???
> aes_sse2_selftest() at netbsd:aes_sse2_selftest+0xb9
Can you try the patch on top of the first revision you found with
broken prekern?
If that works, time for another round of bisection, I guess!
From: Harold Gutch <logix@foobar.franken.de>
To: gnats-bugs@netbsd.org
Cc: port-amd64-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: port-amd64/58366: KASLR broken
Date: Thu, 27 Jun 2024 20:36:34 +0200
Hi,
On Tue, Jun 25, 2024 at 06:05:01PM +0000, Taylor R Campbell wrote:
> The following reply was made to PR port-amd64/58366; it has been noted by GNATS.
>
> From: Taylor R Campbell <riastradh@NetBSD.org>
> To: Harold Gutch <logix@foobar.franken.de>
> Cc: gnats-bugs@NetBSD.org, port-amd64-maintainer@NetBSD.org,
> gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
> Subject: Re: port-amd64/58366: KASLR broken
> Date: Tue, 25 Jun 2024 18:03:39 +0000
>
> > Date: Tue, 25 Jun 2024 18:07:44 +0200
> > From: Harold Gutch <logix@foobar.franken.de>
> >=20
> > On Tue, Jun 25, 2024 at 01:36:07PM +0000, Taylor R Campbell wrote:
> > > Can you please try the attached patch?
> >=20
> > Thanks, that gets past prekern but then panics:
> >=20
> > [ 1.4884345] trap type 4 code 0 rip 0xffffffffacefd336 cs 0x8 rflags 0x=
> 246 cr2 0 ilevel 0x6 rsp 0xffffffffe9e85a80
> > [ 1.5005255] curlwp 0xffffffffa0be8480 pid 0.0 lowest kstack 0xffffffff=
> e9e812c0
> > kernel: protection fault trap, code=3D0
> > Stopped in pid 0.0 (system) at netbsd:aes_sse2_selftest+0xb9: ???
> > aes_sse2_selftest() at netbsd:aes_sse2_selftest+0xb9
>
> Can you try the patch on top of the first revision you found with
> broken prekern?
>
> If that works, time for another round of bisection, I guess!
I am not 100% sure, but it might be
https://mail-index.netbsd.org/source-changes/2024/03/25/msg150542.html
, however I don't see where aes_sse2_selftest() or
aes_sse2_xts_update_selftest() might be calling snprintb().
There might also be some undefined behavior involved somewhere as not
every boot panics - it's hard to say how often it happens, but I'd put
it at around p=50%. With a source tree from just before that change I
have so far not encountered this panic a single time.
So, I'd say your patch has improved things but the snprintb() issue
also needs to be addressed.
thanks,
Harold
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Harold Gutch <logix@foobar.franken.de>
Cc: gnats-bugs@NetBSD.org, port-amd64-maintainer@NetBSD.org,
gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: port-amd64/58366: KASLR broken
Date: Thu, 27 Jun 2024 23:15:40 +0000
> Date: Thu, 27 Jun 2024 20:36:34 +0200
> From: Harold Gutch <logix@foobar.franken.de>
>
> On Tue, Jun 25, 2024 at 06:05:01PM +0000, Taylor R Campbell wrote:
> > Can you try the patch on top of the first revision you found with
> > broken prekern?
> >
> > If that works, time for another round of bisection, I guess!
>
> I am not 100% sure, but it might be
> https://mail-index.netbsd.org/source-changes/2024/03/25/msg150542.html
> , however I don't see where aes_sse2_selftest() or
> aes_sse2_xts_update_selftest() might be calling snprintb().
>
> There might also be some undefined behavior involved somewhere as not
> every boot panics - it's hard to say how often it happens, but I'd put
> it at around p=50%. With a source tree from just before that change I
> have so far not encountered this panic a single time.
>
> So, I'd say your patch has improved things but the snprintb() issue
> also needs to be addressed.
Bizarre!
Can you:
1. update to the snprintb change,
2. apply the pmap directmap patch I attached earlier,
3. put db_stacktrace() (#include <ddb/ddb.h>) at the top of snprintb_m,
and
4. share dmesg when it panics?
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/58366 CVS commit: src/sys/arch
Date: Thu, 27 Jun 2024 23:58:47 +0000
Module Name: src
Committed By: riastradh
Date: Thu Jun 27 23:58:47 UTC 2024
Modified Files:
src/sys/arch/amd64/amd64: machdep.c
src/sys/arch/i386/i386: machdep.c
Log Message:
x86: Defer x86_rndseed until after pmap_bootstrap.
Loading the random seed, which is what x86_rndseed does, requires
direct map access on KASLR kernels, which requires pmap_bootstrap to
have run.
This had been broken in
amd64/machdep.c 1.359
i386/machdep.c 1.832
because we apparently don't have any automatic test setup for KASLR
kernels, which we should address.
This change shouldn't cause any security regression on kernels that
previously owrked, because none of the logic that now happens before
x86_rndseed uses the entropy pool anyway (uvm_md_init,
init_x86_clusters, xen_parse_cmdline).
PR port-amd64/58366
To generate a diff of this commit:
cvs rdiff -u -r1.368 -r1.369 src/sys/arch/amd64/amd64/machdep.c
cvs rdiff -u -r1.841 -r1.842 src/sys/arch/i386/i386/machdep.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Harold Gutch <logix@foobar.franken.de>
To: gnats-bugs@netbsd.org
Cc: port-amd64-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: port-amd64/58366: KASLR broken
Date: Sat, 29 Jun 2024 13:47:05 +0200
On Thu, Jun 27, 2024 at 11:20:02PM +0000, Taylor R Campbell wrote:
> > Date: Thu, 27 Jun 2024 20:36:34 +0200
> > From: Harold Gutch <logix@foobar.franken.de>
> >
> > On Tue, Jun 25, 2024 at 06:05:01PM +0000, Taylor R Campbell wrote:
> > > Can you try the patch on top of the first revision you found with
> > > broken prekern?
> > >
> > > If that works, time for another round of bisection, I guess!
> >
> > I am not 100% sure, but it might be
> > https://mail-index.netbsd.org/source-changes/2024/03/25/msg150542.html
> > , however I don't see where aes_sse2_selftest() or
> > aes_sse2_xts_update_selftest() might be calling snprintb().
> >
> > There might also be some undefined behavior involved somewhere as not
> > every boot panics - it's hard to say how often it happens, but I'd put
> > it at around p=50%. With a source tree from just before that change I
> > have so far not encountered this panic a single time.
> >
> > So, I'd say your patch has improved things but the snprintb() issue
> > also needs to be addressed.
>
> Bizarre!
>
> Can you:
>
> 1. update to the snprintb change,
> 2. apply the pmap directmap patch I attached earlier,
> 3. put db_stacktrace() (#include <ddb/ddb.h>) at the top of snprintb_m,
> and
> 4. share dmesg when it panics?
I get one stacktrace:
[ 1.0216565] wm0 at pci0 dev 3 function 0: Intel i82540EM 1000BASE-T Ethernet (rev. 0x03)
[ 1.0216565] wm0: interrupting at ioapic0 pin 11
[ 1.0216565] wm0: Ethernet address 52:54:00:12:34:56
[ 1.0216565] wm_attach() at netbsd:wm_attach+0x35d7
[ 1.0216565] config_attach_internal() at netbsd:config_attach_internal+0x1a7
[ 1.0216565] config_found_acquire() at netbsd:config_found_acquire+0xd9
[ 1.0216565] config_found() at netbsd:config_found+0x32
[ 1.0216565] pci_probe_device() at netbsd:pci_probe_device+0x661
[ 1.0216565] pci_enumerate_bus() at netbsd:pci_enumerate_bus+0x1a4
[ 1.0216565] pcirescan() at netbsd:pcirescan+0x4e
[ 1.0216565] pciattach() at netbsd:pciattach+0x186
[ 1.0216565] config_attach_internal() at netbsd:config_attach_internal+0x1a7
[ 1.0216565] config_found_acquire() at netbsd:config_found_acquire+0xd9
[ 1.0216565] config_found() at netbsd:config_found+0x32
[ 1.0216565] mp_pci_scan() at netbsd:mp_pci_scan+0xd6
[ 1.0216565] amd64_mainbus_attach() at netbsd:amd64_mainbus_attach+0x361
[ 1.0216565] config_attach_internal() at netbsd:config_attach_internal+0x1a7
[ 1.0216565] config_attach() at netbsd:config_attach+0x53
[ 1.0216565] config_rootfound() at netbsd:config_rootfound+0x45
[ 1.0216565] cpu_configure() at netbsd:cpu_configure+0x38
[ 1.0216565] main() at netbsd:main+0x326
[ 1.0216565] start_prekern() at netbsd:start_prekern+0xf5
[ 1.0216565] ?() at 100641
[ 1.0216565] makphy0 at wm0 phy 1: Marvell 88E1011 Gigabit PHY, rev. 0
[ 1.0216565] makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
After a bit of extra printf() debugging I no longer think it is this
commit that is responsible. It seems that when bisecting I ended up
getting "unlucky" and just so happened to not trigger this. I am
sitting right now in a ddb coming from *before* the snprintb() change
but with your pmap changes.
Back to the bisecting board...
thanks,
Harold
From: Harold Gutch <logix@foobar.franken.de>
To: gnats-bugs@netbsd.org
Cc: port-amd64-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: port-amd64/58366: KASLR broken
Date: Sat, 29 Jun 2024 17:42:49 +0200
On Tue, Jun 25, 2024 at 06:05:01PM +0000, Taylor R Campbell wrote:
> > Date: Tue, 25 Jun 2024 18:07:44 +0200
> > From: Harold Gutch <logix@foobar.franken.de>
> >=20
> > On Tue, Jun 25, 2024 at 01:36:07PM +0000, Taylor R Campbell wrote:
> > > Can you please try the attached patch?
> >=20
> > Thanks, that gets past prekern but then panics:
> >=20
> > [ 1.4884345] trap type 4 code 0 rip 0xffffffffacefd336 cs 0x8 rflags 0x=
> 246 cr2 0 ilevel 0x6 rsp 0xffffffffe9e85a80
> > [ 1.5005255] curlwp 0xffffffffa0be8480 pid 0.0 lowest kstack 0xffffffff=
> e9e812c0
> > kernel: protection fault trap, code=3D0
> > Stopped in pid 0.0 (system) at netbsd:aes_sse2_selftest+0xb9: ???
> > aes_sse2_selftest() at netbsd:aes_sse2_selftest+0xb9
>
> Can you try the patch on top of the first revision you found with
> broken prekern?
>
> If that works, time for another round of bisection, I guess!
Redoing round two now points to
https://mail-index.netbsd.org/source-changes/2024/03/09/msg150314.html
With a KASLR kernel from just before that commit I ran a 10 reboot
loop in qemu and I also went through a few additional cold boots. I
didn't run into that panic a single time.
... but once again, I don't see how that might be related, so this
might be a red herring. I'll run another few boot cycles with a
kernel from just before this commit and see what happens.
In the meanwhile:
1) Output from a *successful* pkboot /netbsd.KASLR -vx:
[ 1.2098422] acpicpu0: ACPI CPUs started
[ 1.2500817] IPsec: Initialized Security Association Processing.
[ 1.2998684] aes: Intel SSE2 bitsliced
[ 1.3103742] chacha: x86 SSE2 ChaCha
[ 1.3103742] adiantum: self-test passed
[ 1.3198766] aes_ccm: self-test passed
[ 1.3198766] blake2s: self-test passed
[ 2.3400158] waiting for devices: atabus0 atabus1
[ 3.3400611] waiting for devices: atabus0 atabus1
[ 4.3100719] wd0 at atabus0 drive 0
[ 4.3100719] wd0: <QEMU HARDDISK>
[ 4.3100719] wd0: drive supports 16-sector PIO transfers, LBA48 addressing
[ 4.3100719] wd0: 5120 MB, 10402 cyl, 16 head, 63 sec, 512 bytes/sect x 10485760 sectors
[ 4.3400349] waiting for devices: atabus0 atabus1 wd0
[ 4.3700128] wd0: GPT GUID: 2e1ad449-2ee1-451a-8189-e947f29f9634
[ 4.3800016] dk0 at wd0: "3f2e9cef-87be-4a09-939c-51e87ce238e4", 8388480 blocks at 64, type: ffs
[ 4.3800016] dk1 at wd0: "337fa9b6-c1c5-4fc0-a64e-711efac5b191", 2097152 blocks at 8388544, type: swap
[ 4.3900440] wd0: 32-bit data port
[ 4.3900440] wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
[ 4.3900440] wd0(piixide0:0:0): using PIO mode 4, DMA mode 2 (using DMA)
[ 4.3999894] atapibus0 at atabus1: 2 targets
[ 4.4099915] cd0 at atapibus0 drive 0: <QEMU DVD-ROM, QM00003, 2.5+> cdrom removable
[ 4.4099915] cd0: 32-bit data port
[ 4.4099915] cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
[ 4.4099915] cd0(piixide0:1:0): using PIO mode 4, DMA mode 2 (using DMA)
[ 4.4200051] crypto: assign driver 0, flags 2
[ 4.4200051] crypto: driver 0 registers alg 1 flags 0 maxoplen 0
[...]
[ 4.4399868] crypto: driver 0 registers alg 22 flags 0 maxoplen 0
[ 4.4399868] cgd: self-test aes-xts-256
[ 4.4399868] cgd: self-test aes-xts-512
[ 4.4399868] cgd: self-test aes-cbc-128
[ 4.4399868] cgd: self-test aes-cbc-256
[ 4.4399868] cgd: self-test 3des-cbc-192
[ 4.4499762] cgd: self-test blowfish-cbc-448
[ 4.4499762] cgd: self-test aes-cbc-128 (encblkno8)
[ 4.4499762] cgd: self-tests passed
[ 4.4499762] swwdog0: software watchdog initialized
2) Output from a failed pkgboot /netbsd.KASLR -vx:
[ 1.2131918] acpicpu0: ACPI CPUs started
[ 1.2430493] IPsec: Initialized Security Association Processing.
[ 1.2730054] fatal protection fault in supervisor mode
[ 1.2730054] trap type 4 code 0 rip 0xffffffffe78c60a6 cs 0x8 rflags 0x246 cr2 0 ilevel 0x6 rsp 0xffffffff91ed6a80
[ 1.2730054] curlwp 0xffffffffc1f00f00 pid 0.0 lowest kstack 0xffffffff91ed22c0
kernel: protection fault trap, code=0
Stopped in pid 0.0 (system) at netbsd:aes_sse2_selftest+0xb9: ???
aes_sse2_selftest() at netbsd:aes_sse2_selftest+0xb9
aes_sse2_probe() at netbsd:aes_sse2_probe+0x14
aes_selftest() at netbsd:aes_selftest+0x26
aes_modcmd() at netbsd:aes_modcmd+0xf7
module_do_builtin() at netbsd:module_do_builtin+0x17d
module_do_builtin() at netbsd:module_do_builtin+0x132
module_init_class() at netbsd:module_init_class+0x1cf
main() at netbsd:main+0x4fc
start_prekern() at netbsd:start_prekern+0xf5
?() at 100641
ds 0
es 1
fs 8
gs f1ef
rdi 0
rsi 2
rbp ffffffff91ed6ac0
rbx fffffffff6ab4714
rdx 0
rcx 0
rax 0
db{0}> show page
PAGE 0xffffffffe78c60a6:
flags=0x6601f173<CLEAN,DIRTY,PAGEOUT,RELEASED,FAKE,ZERO,FILE,READAHEAD,FREE,MARKER,PAGER1>
pqflags=0xf66d04d<INTENT_0,INTENT_SET,INTENT_QUEUED,PRIVATE3,WANTED>
uobject=0x6a0f66c86f0f66c8, uanon=0x7e0f416655c8700f, offset=0xf66c97e0f4166c8
[ 1.2851297] panic: kernel diagnostic assertion "upm != UVM_PHYSSEG_TYPE_INVALID" failed: file "/home/h/netbsd/git/src/sys/uvm/uvm_page.c", line 2015
[ 1.2851297] cpu0: Begin traceback...
[ 1.2851297] vpanic() at netbsd:vpanic+0x173
[ 1.2851297] kern_assert() at netbsd:kern_assert+0x4b
[ 1.2851297] uvm_page_lookup_freelist() at netbsd:uvm_page_lookup_freelist+0x59
[ 1.2851297] uvm_page_printit() at netbsd:uvm_page_printit+0xc1
[ 1.2851297] db_command() at netbsd:db_command+0x123
[ 1.2851297] db_command_loop() at netbsd:db_command_loop+0xa4
[ 1.2851297] db_trap() at netbsd:db_trap+0xcc
[ 1.2851297] kdb_trap() at netbsd:kdb_trap+0x106
[ 1.2851297] trap() at netbsd:trap+0x28f
[ 1.2851297] --- trap (number 4) ---
[ 1.2851297] aes_sse2_selftest() at netbsd:aes_sse2_selftest+0xb9
[ 1.2851297] aes_sse2_probe() at netbsd:aes_sse2_probe+0x14
[ 1.2851297] aes_selftest() at netbsd:aes_selftest+0x26
[ 1.2851297] aes_modcmd() at netbsd:aes_modcmd+0xf7
[ 1.2851297] module_do_builtin() at netbsd:module_do_builtin+0x17d
[ 1.2851297] module_do_builtin() at netbsd:module_do_builtin+0x132
[ 1.2851297] module_init_class() at netbsd:module_init_class+0x1cf
[ 1.2851297] main() at netbsd:main+0x4fc
[ 1.2851297] start_prekern() at netbsd:start_prekern+0xf5
[ 1.2851297] ?() at 100641
[ 1.2851297] cpu0: End traceback...
[ 1.2851297] fatal breakpoint trap in supervisor mode
[ 1.2851297] trap type 1 code 0 rip 0xffffffffae63c405 cs 0x8 rflags 0x202 cr2 0 ilevel 0x8 rsp 0xffffffff91ed6480
[ 1.2851297] curlwp 0xffffffffc1f00f00 pid 0.0 lowest kstack 0xffffffff91ed22c0
Stopped in pid 0.0 (system) at netbsd:breakpoint+0x5: leave
db{0}>
Harold
From: Taylor R Campbell <campbell@mumble.net>
To: Harold Gutch <logix@foobar.franken.de>
Cc: gnats-bugs@NetBSD.org, port-amd64-maintainer@NetBSD.org,
gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: port-amd64/58366: KASLR broken
Date: Sun, 30 Jun 2024 14:35:34 +0000
This is a multi-part message in MIME format.
--=_La7nkjIJilZyzQ04m7iYdJZjSuvrF8DG
Based on the attached sampling of ten boots, four failed and four
successful, tested by logix, it looks like the issue is alignment of
the PADDQ memory operand address.
The trapping instruction, at aes_sse2_selftest + 0xb9, is:
60 0f d4 05 .. .. .. .. paddq ........(%rip),%xmm0
where the ellipsis encodes the sign-extended displacement from the
starting address of the next instruction, which lies at
aes_sse2_selftest + 0xb9 + 8, to the address of a constant operand in
memory. In the sampling we find:
aes_sse2_selftest+0xb9 displacement operand address
ffffffffa0c9b146 3a8448fa ffffffffdb4dfa48 crash
ffffffffb92bb886 01fe0dd2 ffffffffbb29c660 boot
ffffffff924d4b86 455eb222 ffffffffd7abfdb0 boot
ffffffffd98cbbc6 fd1c9c6a ffffffffd6a95838 crash
ffffffff96ee5866 ee73e9a2 ffffffff85624210 boot
fffffffff0663406 9cc81eda ffffffff8d2e52e8 crash
ffffffffb584ed46 24a92c22 ffffffffda2e1970 boot
ffffffff884698a6 578434fa fffffffedfcacda8 crash
fffffffffc4b7d86 d83e72f2 ffffffffd489f080 boot
fffffffffa0af5a6 f9fe4002 fffffffff40935b0 boot
Normally, x86 isn't picky about alignment. But this looks like a
strong correlation between misalignment and crashes. The Intel manual
says:
Some instructions that operate on double quadwords require
memory operands to be aligned on a natural boundary. These
instructions generate a general-protection exception (#GP
[trap type T_PROTFLT=4 in NetBSD]) if an unaligned operand is
specified. (4.1.1 Alignment of Words, Doublewords, Quadwords,
and Double Quadwords, p. 4-2)
The address of a 128-bit packed memory operand must be aligned
on a 16-byte boundary, except in the following cases:
- a MOVUPD instruction which supports unaligned accesses
- scalar instructions that use an 8-byte memory operand that
is not subject to alignment requirements.
(11.3 SSE2 Data Types, p. 11-4)
--Intel 64 and IA-32 Architectures Software Developers Manual,
Volume 1: Basic Architecture, Order Number: 253665-077US,
April 2022
The AMD manual says:
Generally, legacy SSE instructions that attempt to access a
vector operand in memory that is not naturally aligned trigger
a general-protection fault (#GP). (4.3.2 Data Alignment,
p. 120)
--AMD64 Architecture Programmer's Manual, Volume 1:
Application Programming, Publication No. 24592,
Revision 3.23, October 2020
So that's a plausible reason for this trap to happen. The attached
program confirms that PADDQ with unaligned address gets SIGSEGV with
si_trap=4, i.e., T_PROTFLT. (Annoyingly, I don't see how to get at
the _memory operand_ address from siginfo -- si_addr is the
_instruction_ address in this case.)
Now why is the address misaligned? The aes_sse2_subr.S generated by
gcc contains:
.text
.globl aes_sse2_selftest
.type aes_sse2_selftest, @function
aes_sse2_selftest:
...
paddq .LC11(%rip), %xmm0
...
.section .rodata.cst16,"aM",@progbits,16
...
.align 16
.LC11:
.quad -1
.quad -1
So .LC11 _should_ be aligned on a 16-byte boundary inside the
.rodata.cst16 section. And `readelf -Ss aes_sse2_subr.o' confirms
(a) that the .rodata.cst16 section requests 16-byte alignment, and
(b) that the .LC11 symbol's address in the section has 16-byte
alignment in the .rodata.cst16 section:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[ 9] .rodata.cst16 PROGBITS 0000000000000000 000031a0
0000000000000020 0000000000000010 AM 0 0 16
...
Symbol table '.symtab' contains 68 entries:
Num: Value Size Type Bind Vis Ndx Name
...
6: 0000000000000010 0 NOTYPE LOCAL DEFAULT 9 .LC11
But when the kernel is linked with `--split-by-file=0x100000', the
combined .rodata section is split into multiple subsections sometimes
on _non-aligned_ boundaries with _less_ alignment:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[33] .rodata PROGBITS 00000000000022c0 00112700
000000000005cfe0 0000000000000000 A 0 0 64
...
[133] .rodata.0 PROGBITS 000000000005f2a0 0103aea0
00000000000e2c80 0000000000000000 A 0 0 32
...
[135] .rodata.1 PROGBITS 0000000000141f20 0111db20
00000000001000e0 0000000000000000 A 0 0 32
...
[137] .rodata.2 PROGBITS 0000000000242000 0121dc00
00000000000ffdc0 0000000000000000 A 0 0 64
...
[139] .rodata.3 PROGBITS 0000000000341dc0 0131d9c0
00000000001004f8 0000000000000000 A 0 0 64
...
[141] .rodata.4 PROGBITS 00000000004422b8 0141deb8
0000000000100bb0 0000000000000000 A 0 0 8
[142] .rodata.5 PROGBITS 0000000000542e68 0151ea68
00000000000231c8 0000000000000000 A 0 0 8
With -X omitted from the link flags so it doesn't delete local
symbols, we see that .LC11 winds up in .rodata.4 (not sure which .LC11
it is but all three are in .rodata.4):
Symbol table '.symtab' contains 56230 entries:
Num: Value Size Type Bind Vis Ndx Name
...
11530: 0000000000016548 0 NOTYPE LOCAL DEFAULT 141 .LC11
...
11566: 0000000000016778 0 NOTYPE LOCAL DEFAULT 141 .LC11
...
11574: 0000000000016868 0 NOTYPE LOCAL DEFAULT 141 .LC11
And for some reason, .rodata.4 only requests 8-byte alignment.
It looks like when ld splits sections, it sometimes chooses
non-aligned splitting points and then reduces the alignment of the
next section accordingly:
section address size align
.rodata 0x22c0 0x5cfe0 64
.rodata.0 0x5f2a0 0xe2c80 32
The starting address of .rodata is 64-byte-aligned, but its size is
only 32-byte-aligned. The starting address of .rodata.0, which starts
contiguously after .rodata in the virtual address space of the ELF
file, is only 32-byte-aligned. And when we get to .rodata.4, it's
gone down to only 8-byte alignment.
So when the KASLR bootloader (`prekern') randomizes the address space,
if it respects the requested alignment but roughly uniformly
randomizes everything else, there's a roughly 1/2 probability that the
.rodata.4 section will come out misaligned for PADDQ and the kernel
will crash at boot.
We can try removing `--split-by-file', but that will reduce the
efficacy of KASLR as a security measure, since it will only be able to
randomize .rodata (and .text and .data and ...) as a whole and not the
separate parts of each section independently.
But the right fix is probably to convince ld to insert appropriate
padding in the split sections so that the alignment can be maintained
(or convince ELF to support section alignment constraints of the form
`congruent to k modulo 2^n' and not just `congruent to 0 modulo 2^n',
but that might be a taller order).
--=_La7nkjIJilZyzQ04m7iYdJZjSuvrF8DG
Content-Type: text/plain; charset="ISO-8859-1"; name="sample"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="sample.txt"
---------------------------------------------------------------------------=
-----
[ 1.2520095] trap type 4 code 0 rip 0xffffffffa0c9b146 cs 0x8 rflags 0x24=
6 cr2 0 ilevel 0x6 rsp 0xffffffffa30ffa80
db{0}> print aes_sse2_selftest+0xb9
ffffffffa0c9b146
db{0}> x/xb aes_sse2_selftest+0xb9,8
netbsd:aes_sse2_selftest+0xb9: 5d40f66 3a8448fa c0700f66 6f0f664e
netbsd:aes_sse2_selftest+0xc9: f66d04d 6601f173 d305df0f 663a8448
netbsd:aes_sse2_selftest+0xd9:
---------------------------------------------------------------------------=
-----
db{0}> print aes_sse2_selftest+0xb9
ffffffffb92bb886
db{0}> x/xb aes_sse2_selftest+0xb9,8
netbsd:aes_sse2_selftest+0xb9: 5d40f66 1fe0dd2 c0700f66 6f0f664e
netbsd:aes_sse2_selftest+0xc9: f66d04d 6601f173 ab05df0f 6601fe0d
netbsd:aes_sse2_selftest+0xd9:
---------------------------------------------------------------------------=
-----
db{0}> print aes_sse2_selftest+0xb9
ffffffff924d4b86
db{0}> x/xb aes_sse2_selftest+0xb9,8
netbsd:aes_sse2_selftest+0xb9: 5d40f66 455eb222 c0700f66 6f0f664e
netbsd:aes_sse2_selftest+0xc9: f66d04d 6601f173 fb05df0f 66455eb1
netbsd:aes_sse2_selftest+0xd9:
---------------------------------------------------------------------------=
-----
[ 1.2417008] trap type 4 code 0 rip 0xffffffffd98cbbc6 cs 0x8 rflags 0x24=
6 cr2 0 ilevel 0x6 rsp 0xffffffffa2608a80
db{0}> print aes_sse2_selftest+0xb9
ffffffffd98cbbc6
db{0}> x/xb aes_sse2_selftest+0xb9,8
netbsd:aes_sse2_selftest+0xb9: 5d40f66 fd1c9c6a c0700f66 6f0f664e
netbsd:aes_sse2_selftest+0xc9: f66d04d 6601f173 4305df0f 66fd1c9c
netbsd:aes_sse2_selftest+0xd9:
---------------------------------------------------------------------------=
-----
db{0}> print aes_sse2_selftest+0xb9
ffffffff96ee5866
db{0}> x/xb aes_sse2_selftest+0xb9,8
netbsd:aes_sse2_selftest+0xb9: 5d40f66 ee73e9a2 c0700f66 6f0f664e
netbsd:aes_sse2_selftest+0xc9: f66d04d 6601f173 7b05df0f 66ee73e9
netbsd:aes_sse2_selftest+0xd9:
---------------------------------------------------------------------------=
-----
[ 1.2660439] trap type 4 code 0 rip 0xfffffffff0663406 cs 0x8 rflags 0x24=
6 cr2 0 ilevel 0x6 rsp 0xffffffffa2967a80
db{0}> print aes_sse2_selftest+0xb9
fffffffff0663406
db{0}> x/xb aes_sse2_selftest+0xb9,8
netbsd:aes_sse2_selftest+0xb9: 5d40f66 9cc81eda c0700f66 6f0f664e
netbsd:aes_sse2_selftest+0xc9: f66d04d 6601f173 b305df0f 669cc81e
netbsd:aes_sse2_selftest+0xd9:
---------------------------------------------------------------------------=
-----
db{0}> print aes_sse2_selftest+0xb9
ffffffffb584ed46
db{0}> x/xb aes_sse2_selftest+0xb9,8
netbsd:aes_sse2_selftest+0xb9: 5d40f66 24a92c22 c0700f66 6f0f664e
netbsd:aes_sse2_selftest+0xc9: f66d04d 6601f173 fb05df0f 6624a92b
netbsd:aes_sse2_selftest+0xd9:
---------------------------------------------------------------------------=
-----
[ 1.2370654] trap type 4 code 0 rip 0xffffffff884698a6 cs 0x8 rflags 0x24=
6 cr2 0 ilevel 0x6 rsp 0xffffffffaf0daa80
db{0}> print aes_sse2_selftest+0xb9
ffffffff884698a6
db{0}> x/xb aes_sse2_selftest+0xb9,8
netbsd:aes_sse2_selftest+0xb9: 5d40f66 578434fa c0700f66 6f0f664e
netbsd:aes_sse2_selftest+0xc9: f66d04d 6601f173 d305df0f 66578434
netbsd:aes_sse2_selftest+0xd9:
---------------------------------------------------------------------------=
-----
db{0}> print aes_sse2_selftest+0xb9
fffffffffc4b7d86
db{0}> x/xb aes_sse2_selftest+0xb9,8
netbsd:aes_sse2_selftest+0xb9: 5d40f66 d83e72f2 c0700f66 6f0f664e
netbsd:aes_sse2_selftest+0xc9: f66d04d 6601f173 cb05df0f 66d83e72
netbsd:aes_sse2_selftest+0xd9:
---------------------------------------------------------------------------=
-----
db{0}> print aes_sse2_selftest+0xb9
fffffffffa0af5a6
db{0}> x/xb aes_sse2_selftest+0xb9,8
netbsd:aes_sse2_selftest+0xb9: 5d40f66 f9fe4002 c0700f66 6f0f664e
netbsd:aes_sse2_selftest+0xc9: f66d04d 6601f173 db05df0f 66f9fe3f
netbsd:aes_sse2_selftest+0xd9:
---------------------------------------------------------------------------=
-----
--=_La7nkjIJilZyzQ04m7iYdJZjSuvrF8DG
Content-Type: text/plain; charset="ISO-8859-1"; name="paddq_unaligned"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="paddq_unaligned.c"
#include <emmintrin.h>
#include <err.h>
#include <immintrin.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
__attribute__((noinline))
__m128i
paddq(const __m128i *p, __m128i x)
{
return _mm_add_epi64(*p, x);
}
static void
on_sigsegv(int signo, siginfo_t *si, void *ctx)
{
char buf[1024];
snprintf(buf, sizeof(buf), "SIGSEGV:"
" si_signo=3D%d si_errno=3D%d si_code=3D%d"
" si_addr=3D%p si_trap=3D%d\n",
si->si_signo, si->si_errno, si->si_code,
si->si_addr, si->si_trap);
(void)write(STDERR_FILENO, buf, strlen(buf));
_exit(0);
}
int
main(void)
{
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_sigaction =3D &on_sigsegv;
if (sigfillset(&sa.sa_mask) =3D=3D -1)
err(1, "sigfillset");
sa.sa_flags =3D SA_SIGINFO;
if (sigaction(SIGSEGV, &sa, NULL) =3D=3D -1)
err(1, "sigaction");
char buf[17] __attribute__((aligned(16)));
volatile __m128i x =3D _mm_loadu_si128((const __m128i_u *)buf);
volatile __m128i y =3D paddq((const __m128i *)(buf + 1), x);
(void)y;
return 1;
}
--=_La7nkjIJilZyzQ04m7iYdJZjSuvrF8DG--
From: Harold Gutch <logix@foobar.franken.de>
To: Taylor R Campbell <campbell@mumble.net>
Cc: gnats-bugs@NetBSD.org, port-amd64-maintainer@NetBSD.org,
gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: port-amd64/58366: KASLR broken
Date: Mon, 1 Jul 2024 03:42:05 +0200
On Sun, Jun 30, 2024 at 02:35:34PM +0000, Taylor R Campbell wrote:
> But when the kernel is linked with `--split-by-file=0x100000', the
> combined .rodata section is split into multiple subsections sometimes
> on _non-aligned_ boundaries with _less_ alignment:
Changing this to --split-by-file=0x800000 seems to improve things,
with that I survived a couple of reboot loops without any issues. But
I might have just gotten (un)lucky of course. I don't know if values
that are not powers of two make sense here but 0x400000 is not enough,
with that I still see the panics.
> We can try removing `--split-by-file', but that will reduce the
> efficacy of KASLR as a security measure, since it will only be able to
> randomize .rodata (and .text and .data and ...) as a whole and not the
> separate parts of each section independently.
Yes, without --split-by-file I also don't see the panics anymore.
Harold
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.