NetBSD Problem Report #59236

From gson@gson.org  Sun Mar 30 09:13:12 2025
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits)
	 client-signature RSA-PSS (2048 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 8ED051A9239
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 30 Mar 2025 09:13:12 +0000 (UTC)
Message-Id: <20250330091310.650C9253F02@guava.gson.org>
Date: Sun, 30 Mar 2025 12:13:10 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: Multiple segfaults in erlite3 boot
X-Send-Pr-Version: 3.95

>Number:         59236
>Notify-List:    riastradh@NetBSD.org
>Category:       port-evbmips
>Synopsis:       Multiple segfaults in erlite3 boot
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-evbmips-maintainer
>State:          needs-pullups
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Mar 30 09:15:00 +0000 2025
>Closed-Date:    
>Last-Modified:  Wed May 14 23:18:12 +0000 2025
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date 2025.03.29.19.40.42
>Organization:

>Environment:
System: NetBSD
Architecture: evbmips
Machine: mips64eb
>Description:

I tried to install NetBSD-current on a Ubiquiti EdgeRouter Lite aka
ERLite-3 by building an evbmips-mips64eb release (without debug
symbols because of PR 59233), uncompressing octeon.img.gz, dd:ing it
onto a USB flash drive, plugging the drive into the internal
USB connector, and using the u-boot configuration from
https://www.cambus.net/netbsd-on-the-edgerouter-lite/.

During the initial boot, ps dumped core:

  [   1.7995905] WARNING: CHECK AND RESET THE DATE!
  Sat Mar 29 19:40:42 UTC 2025
  [1]   Segmentation fault      /bin/sh -c "ps -p \$\$ -o ppid="
  Starting root file system check:

The system nonetheless managed to resize the root partition and
reboot, but during the second boot, there were further segfaults
from ps, mount, and fsck:

  [   6.6503459] WARNING: CHECK AND RESET THE DATE!
  Sat Mar 29 19:40:44 UTC 2025
  [1]   Segmentation fault      /bin/sh -c "ps -p \$\$ -o ppid="
  Starting root file system check:
  /dev/rdk1: file system is clean; not checking
  Not resizing / (NAME=octeon-root): already correct size
  mount: /: Segmentation fault
  Setting sysctl variables:
  ddb.onpanic: 1 -> 0
  Starting file system checks:
  [1]   Segmentation fault      fsck -x / ${fsck_flags}
  Unknown error 139; help!
  ERROR: ABORTING BOOT (sending SIGTERM to parent)!
  init: `/bin/sh' Enter pathname of shell or RETURN for /bin/sh: 

A full console log is at

  https://www.gson.org/netbsd/bugs/erlite3/erlite3-2025.03.29.19.40.42-boot.txt

A system built from CVS source date 2025.01.01.00.00.05 does not have
this problem.  Perhaps it is related to the mipsn64 problem reported in

  https://mail-index.netbsd.org/port-mips/2025/02/03/msg001435.html

>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Fri, 18 Apr 2025 19:04:59 +0000
State-Changed-Why:
This is probably the same CN50xx bug that we have been puzzling
over in PR port-mips/59064: jemalloc switch to 5.3 broke userland
<https://gnats.NetBSD.org/59064>.

Can you try the patch at the bottom of this message?

https://mail-index.NetBSD.org/netbsd-bugs/2025/04/14/msg088307.html

If you open one of the core dumps in gdb (if you are able to do that
from another machine where everything isn't segfaulting all the time,
e.g. if the core dump is written to nfs) and do `x/i $pc' and `bt', I
bet you will find it in malloc_default (via some stack trace through
jemalloc) at this instruction:

00008a58 <malloc_default>:
malloc_default():
/home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2727
    8a58:       27bdff70        addiu   sp,sp,-144
    8a5c:       ffbc0078        sd      gp,120(sp)
    8a60:       3c1c0000        lui     gp,0x0
                        8a60: R_MIPS_GPREL16    malloc_default
                        8a60: R_MIPS_SUB        *ABS*
                        8a60: R_MIPS_HI16       *ABS*
    8a64:       0399e021        addu    gp,gp,t9
    8a68:       279c0000        addiu   gp,gp,0
                        8a68: R_MIPS_GPREL16    malloc_default
                        8a68: R_MIPS_SUB        *ABS*
                        8a68: R_MIPS_LO16       *ABS*
tsd_fetch_impl():
/home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:270
    8a6c:       8f820000        lw      v0,0(gp)
                        8a6c: R_MIPS_TLS_GOTTPREL       je_tsd_tls
    8a70:       7c03e83b        0x7c03e83b
malloc_default():
/home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2727
    8a74:       ffb10040        sd      s1,64(sp)
    8a78:       ffb00038        sd      s0,56(sp)
tsd_fetch_impl():
/home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:270
    8a7c:       00433021        addu    a2,v0,v1
malloc_default():
/home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../dist/src/jemalloc.c:2727
    8a80:       ffbf0088        sd      ra,136(sp)
    8a84:       ffbe0080        sd      s8,128(sp)
    8a88:       ffb70070        sd      s7,112(sp)
    8a8c:       ffb60068        sd      s6,104(sp)
    8a90:       ffb50060        sd      s5,96(sp)
    8a94:       ffb40058        sd      s4,88(sp)
    8a98:       ffb30050        sd      s3,80(sp)
    8a9c:       ffb20048        sd      s2,72(sp)
tsd_fetch_impl():
/home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:422
 => 8aa0:       90c30258        lbu     v1,600(a2)

And I bet you will find that $v0 holds the address malloc_default+0x18,
i.e., the pc of this instruction:

tsd_fetch_impl():
/home/riastradh/netbsd/current/src/external/bsd/jemalloc/lib/../include/jemalloc/internal/tsd.h:270
    8a6c:       8f820000        lw      v0,0(gp)
                        8a6c: R_MIPS_TLS_GOTTPREL       je_tsd_tls
 => 8a70:       7c03e83b        0x7c03e83b

The instruction 0x7c03e83b is sometimes also written

	rdhwr	$3,$29

or

	rdhwr	v1,ulr

but it is architecturally undefined so it traps to the kernel to
emulate, and the kernel is supposed to return the thread's tcb pointer
in v1.
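
As a rough illustration of the encoding (a sketch only, not the kernel's
decoder; the struct and function names are made up for this example),
the word 0x7c03e83b can be split into its MIPS instruction fields in C
to confirm that it is rdhwr with destination register 3 (v1) and
hardware register 29:

```c
#include <stdint.h>

/* Fields of a 32-bit MIPS instruction word (R-type/SPECIAL3 layout). */
struct mips_fields {
	unsigned opcode;	/* bits 31..26 */
	unsigned rs;		/* bits 25..21 */
	unsigned rt;		/* bits 20..16: destination GPR for rdhwr */
	unsigned rd;		/* bits 15..11: hardware register number */
	unsigned funct;		/* bits 5..0 */
};

/* Hypothetical helper: split an instruction word into its fields. */
static struct mips_fields
decode(uint32_t insn)
{
	struct mips_fields f;

	f.opcode = (insn >> 26) & 0x3f;
	f.rs = (insn >> 21) & 0x1f;
	f.rt = (insn >> 16) & 0x1f;
	f.rd = (insn >> 11) & 0x1f;
	f.funct = insn & 0x3f;
	return f;
}

/* rdhwr is SPECIAL3 (opcode 0x1f) with function code 0x3b. */
static int
is_rdhwr(uint32_t insn)
{
	struct mips_fields f = decode(insn);

	return f.opcode == 0x1f && f.funct == 0x3b && f.rs == 0;
}
```

Here decode(0x7c03e83b) gives rt = 3 and rd = 29, matching
`rdhwr $3,$29', while the preceding lw (0x8f820000) is not an rdhwr.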

But as a side effect, the emulation clobbers the register v0 with the
address of the excepting instruction, rather than leaving it as the
value it found at -1234(gp) (or whatever; written as 0(gp) above, but
the linker will replace it by some probably-nonzero number; you can use
`objdump --disassemble=malloc_default libc.so' to find it), which is
decidedly not the instruction address malloc_default+0x18 but rather
some tls offset that is reasonable to add to the tcb pointer.
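
To make the arithmetic concrete, here is a small C sketch of what the
faulting load computes.  The offsets 0x8a58 and 0x18 come from the
disassembly above; the libc load address, the tcb value, and the helper
names are hypothetical, and addu is modelled as a 32-bit add with sign
extension, which is an assumption about how it behaves on a 64-bit CPU
running this code:

```c
#include <stdint.h>

#define MALLOC_DEFAULT_OFF	0x8a58	/* malloc_default in the disassembly */
#define RDHWR_OFF		0x18	/* offset of the rdhwr within it */

/* pc of the rdhwr for a given (hypothetical) libc load address. */
static uint64_t
rdhwr_pc(uint64_t libc_base)
{
	return libc_base + MALLOC_DEFAULT_OFF + RDHWR_OFF;
}

/* Model of addu: add the low 32 bits, sign-extend the result. */
static uint64_t
mips_addu(uint64_t a, uint64_t b)
{
	uint32_t sum = (uint32_t)a + (uint32_t)b;

	return (uint64_t)(int64_t)(int32_t)sum;
}

/* Address read by `lbu v1,600(a2)' after `addu a2,v0,v1'. */
static uint64_t
lbu_address(uint64_t v0, uint64_t v1)
{
	return mips_addu(v0, v1) + 600;
}
```

With the intended v0 (a tls offset) the load lands at tcb + offset +
600; with v0 replaced by rdhwr_pc(base), it lands at roughly tcb + pc +
600 instead, which has no reason to be a valid tsd address.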

Now, the emulation routine
https://nxr.netbsd.org/xref/src/sys/arch/mips/mips/mipsX_subr.S?r=1.115#1297
is not _supposed_ to clobber v0 -- it goes out of its way to save v0 on
the kernel stack and restore it before returning from the exception:

   1312 	/* Need two working registers */
   1313 	REG_S	AT, CALLFRAME_SIZ+TF_REG_AST(k0)
   1314 	REG_S	v0, CALLFRAME_SIZ+TF_REG_V0(k0)
...
   1349 	REG_L	AT, CALLFRAME_SIZ+TF_REG_AST(k0)# restore reg
   1350 	REG_L	v0, CALLFRAME_SIZ+TF_REG_V0(k0) # restore reg
   1351 	eret

But, in all my trials, it has been consistently corrupted in the same
way.  The best theory we have for why it is corrupted is that cn50xx
CPUs -- found in erlite3 (but not er4) -- have some kind of
register-writeback bug (which passes through some register renaming
unchanged) provoked by the particular combination of reading
MIPS_COP_0_EXC_PC and eret, so that after the eret, the exception pc
gets written back to v0 even though we just restored v0 from the kernel
stack.

So, all that said, here is a summary of the science we did on my
erlite3, together with a patch that seems to address the issue.  Under
the theory that the corrupted register is the one we move
MIPS_COP_0_EXC_PC into, the patch will only corrupt the temporary
register k0, which is not accessible to userland and is treated as
garbage on any kernel entry point, so it's safe:

https://mail-index.NetBSD.org/netbsd-bugs/2025/04/14/msg088307.html


From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, port-evbmips-maintainer@netbsd.org,
 netbsd-bugs@netbsd.org, gnats-admin@netbsd.org, riastradh@NetBSD.org,
 Andreas Gustafsson <gson@gson.org>, Martin Husemann <martin@duskware.de>
Cc: 
Subject: Re: port-evbmips/59236 (Multiple segfaults in erlite3 boot)
Date: Sat, 19 Apr 2025 15:06:09 +0900

 On 2025/04/19 4:04, riastradh@NetBSD.org wrote:
 > Synopsis: Multiple segfaults in erlite3 boot
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: riastradh@NetBSD.org
 > State-Changed-When: Fri, 18 Apr 2025 19:04:59 +0000
 > State-Changed-Why:
 > This is probably the same CN50xx bug that we have been puzzling
 > over in PR port-mips/59064: jemalloc switch to 5.3 broke userland
 > <https://gnats.NetBSD.org/59064>.
 > 
 > Can you try the patch at the bottom of this message?
 > 
 > https://mail-index.NetBSD.org/netbsd-bugs/2025/04/14/msg088307.html

 Thank you very much for working on this problem!

 Unfortunately, even with your patch, erlite3 cannot boot into
 multiuser mode with either an n64 or an n32 userland:
 https://gist.github.com/rokuyama/7bbe1619e55e8e3aba5bf3b112a23725

 On the other hand, MIPSSIM64 kernel on QEMU successfully boots into
 multiuser mode.

 In the above-mentioned log, debug printf is enabled for trap():
 ```
 diff --git a/sys/arch/mips/mips/trap.c b/sys/arch/mips/mips/trap.c
 index 58caf19e2d2..a079dec91dd 100644
 --- a/sys/arch/mips/mips/trap.c
 +++ b/sys/arch/mips/mips/trap.c
 @@ -448,8 +448,8 @@ trap(uint32_t status, uint32_t cause, vaddr_t vaddr, vaddr_t pc,
   		rv = uvm_fault(map, va, ftype);
   		pcb->pcb_onfault = onfault;

 -#if defined(VMFAULT_TRACE)
 -		if (!KERNLAND_P(va))
 +#if defined(VMFAULT_TRACE) || 1
 +		if (!KERNLAND_P(va) && rv != 0)
   			printf(
   			    "uvm_fault(%p (pmap %p), %#"PRIxVADDR
   			    " (%"PRIxVADDR"), %d) -> %d at pc %#"PRIxVADDR"\n",
 ```

 You can see SEGVs are caused by read access to NULL:
 ```
 [  13.3599689] uvm_fault(0x980000041f9c0c00 (pmap 0x980000041fce44d0), 0 (0), 1) -> 14 at pc 0xfff83b1db4
 [1]   Segmentation fault (core dumped) /sbin/ifconfig lo0 inet6 >/dev/null 2>&1
 ...
 [  19.5399661] uvm_fault(0x980000041f20c800 (pmap 0x980000041fce44d0), 0 (0), 1) -> 14 at pc 0xfff8391db4
 [1]   Segmentation fault (core dumped) awk "/^sendmail[ \t]/{print\$2}" /etc/mailer.conf
 ```

 As you pointed out earlier, SEGVs can be avoided by replacing
 `user_reserved_insn` with `user_gen_exception`, i.e.:
 https://gist.github.com/rokuyama/c7a50b8e7a62dc25f3f536f1434eea9b

 By grepping through the Linux source, I found that it checks the TLB
 entry for the PC before fetching the instruction:
 https://github.com/torvalds/linux/commit/5b10496b6e65#diff-bbe4c1a54ce7bd13e6109d887383993c3b5276a1362f84092e9ef31dc84064d9R390

 and our `user_gen_exception` path uses copyin(9), of course.

 I know almost nothing about mips, though, and much more destructive
 results may happen in this "double-fault scenario"...

 Thanks,
 rin

From: Andreas Gustafsson <gson@gson.org>
To: riastradh@NetBSD.org, gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-evbmips/59236 (Multiple segfaults in erlite3 boot)
Date: Sat, 19 Apr 2025 21:46:24 +0300

 riastradh@NetBSD.org wrote:
 > Can you try the patch at the bottom of this message?

 With the patch, I'm still getting coredumps.  This time, the root file
 system resizing step did not happen, but I did get a login prompt:

   [   1.8020371] WARNING: no TOD clock present
   [   1.8020371] WARNING: using filesystem time
   [   1.8101320] WARNING: CHECK AND RESET THE DATE!
   Sat Apr 19 07:20:47 UTC 2025
   [1]   Segmentation fault      /bin/sh -c "ps -p \$\$ -o ppid="
   [2]   Segmentation fault      rcorder -s nostart ${rc_rcorder_flags} ${scripts}
   rcorder terminated with signal 11
   The following components reported failures:
       rcorder
   See /var/run/rc.log for more information.
   Sat Apr 19 07:20:47 UTC 2025
   init: can't add init: can't add init: kernel secinit: can't add 
   NetBSD/evbmips (Amnesiac) (constty)

   login:

 I was able to log in, and the login shell works, but if I try to run a
 subshell from it, the subshell dumps core.  Many other commands, such
 as ls or ps, also dump core.  The ls and ps in /rescue do work.

 > If you open one of the core dumps in gdb (if you are able to do that
 > from another machine where everything isn't segfaulting all the time,
 > e.g. if the core dump is written to nfs)

 I'm not using nfs, but I managed to get some core dumps (after
 remounting the root file system r/w) and saved them by removing the
 embedded USB flash drive from the erlite3 and reading it on another
 system.  You can find them in

   https://www.gson.org/netbsd/bugs/erlite3/cores.tar.gz

 I have not looked at them myself yet.
 -- 
 Andreas Gustafsson, gson@gson.org

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/59236 CVS commit: src
Date: Sun, 20 Apr 2025 22:31:01 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Apr 20 22:31:01 UTC 2025

 Modified Files:
 	src/distrib/sets/lists/debug: mi
 	src/distrib/sets/lists/tests: mi
 	src/tests/kernel: Makefile t_signal_and_sp.c
 	src/tests/kernel/arch/aarch64: stack_pointer.h
 Added Files:
 	src/tests/kernel: h_execsp.c h_execsp.h
 	src/tests/kernel/arch/aarch64: execsp.S signalsphandler.S
 	src/tests/kernel/arch/x86_64: execsp.S signalsphandler.S
 	    stack_pointer.h

 Log Message:
 Test stack pointer alignment in various scenarios.

 1. elf entry point
 2. main function
 3. signal handler

 Extend the test to amd64 while here -- fortunately both aarch64 and
 amd64 pass, but others, such as mips, will fail:

 PR kern/59327: user stack pointer is not aligned properly

 This extends the test that was previously written for:

 PR kern/58149: aarch64: Cannot return from a signal handler if SP was
 misaligned when the signal arrived

 With any luck, this will help us to systematically eradicate misaligned
 stack pointers as hypothesized to be the reason for:

 PR port-mips/59236: Multiple segfaults in erlite3 boot


 To generate a diff of this commit:
 cvs rdiff -u -r1.476 -r1.477 src/distrib/sets/lists/debug/mi
 cvs rdiff -u -r1.1369 -r1.1370 src/distrib/sets/lists/tests/mi
 cvs rdiff -u -r1.87 -r1.88 src/tests/kernel/Makefile
 cvs rdiff -u -r0 -r1.1 src/tests/kernel/h_execsp.c \
     src/tests/kernel/h_execsp.h
 cvs rdiff -u -r1.1 -r1.2 src/tests/kernel/t_signal_and_sp.c
 cvs rdiff -u -r0 -r1.1 src/tests/kernel/arch/aarch64/execsp.S \
     src/tests/kernel/arch/aarch64/signalsphandler.S
 cvs rdiff -u -r1.1 -r1.2 src/tests/kernel/arch/aarch64/stack_pointer.h
 cvs rdiff -u -r0 -r1.1 src/tests/kernel/arch/x86_64/execsp.S \
     src/tests/kernel/arch/x86_64/signalsphandler.S \
     src/tests/kernel/arch/x86_64/stack_pointer.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/59236 CVS commit: src
Date: Sun, 20 Apr 2025 22:32:26 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Apr 20 22:32:25 UTC 2025

 Modified Files:
 	src/sys/arch/mips/include: mips_param.h
 	src/tests/kernel: Makefile t_signal_and_sp.c
 Added Files:
 	src/tests/kernel/arch/mips: execsp.S signalsphandler.S stack_pointer.h

 Log Message:
 t_signal_and_sp: Add mips support.

 PR kern/59327: user stack pointer is not aligned properly

 PR kern/58149: Cannot return from a signal handler if SP was
 misaligned when the signal arrived

 Stack pointer misalignment in some cases hypothesized to be a possible
 cause of:

 PR port-evbmips/59236: Multiple segfaults in erlite3 boot


 To generate a diff of this commit:
 cvs rdiff -u -r1.52 -r1.53 src/sys/arch/mips/include/mips_param.h
 cvs rdiff -u -r1.88 -r1.89 src/tests/kernel/Makefile
 cvs rdiff -u -r1.4 -r1.5 src/tests/kernel/t_signal_and_sp.c
 cvs rdiff -u -r0 -r1.1 src/tests/kernel/arch/mips/execsp.S \
     src/tests/kernel/arch/mips/signalsphandler.S \
     src/tests/kernel/arch/mips/stack_pointer.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/59236 CVS commit: src/sys/arch/mips/mips
Date: Thu, 24 Apr 2025 01:40:27 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Thu Apr 24 01:40:27 UTC 2025

 Modified Files:
 	src/sys/arch/mips/mips: mipsX_subr.S

 Log Message:
 mips: Disable rdhwr emulation fast path on Octeon CPUs.

 They are haunted.

 PR kern/59064: jemalloc switch to 5.3 broke userland
 PR kern/59236: Multiple segfaults in erlite3 boot


 To generate a diff of this commit:
 cvs rdiff -u -r1.115 -r1.116 src/sys/arch/mips/mips/mipsX_subr.S

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Andreas Gustafsson <gson@gson.org>
To: riastradh@NetBSD.org, gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-evbmips/59236 (Multiple segfaults in erlite3 boot)
Date: Thu, 24 Apr 2025 21:53:42 +0300

 Last week, riastradh@NetBSD.org wrote:
 > Can you try the patch at the bottom of this message?
 > 
 > https://mail-index.NetBSD.org/netbsd-bugs/2025/04/14/msg088307.html

 I built a new octeon.img from -current sources from 2025.04.24.12.54.43
 with the patch (and a second patch to add comp and test to the
 "sets=" line in src/distrib/utils/embedded/conf/octeon.conf), booted
 it from a USB stick, and it successfully resized the root FS and
 rebooted without any core dumps.  I'm not sure which commit fixed it,
 but since it probably was one of yours, thank you!

 I have not tried this version without the patch.

 The system did later lock up while running the ATF tests, but that's
 a separate issue that should get its own PR.
 -- 
 Andreas Gustafsson, gson@gson.org

State-Changed-From-To: feedback->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 14 May 2025 23:18:12 +0000
State-Changed-Why:
fixed in HEAD, needs pullup-9 and pullup-10


>Unformatted:
