NetBSD Problem Report #55990

From tsutsui@ceres.dti.ne.jp  Fri Feb 12 12:02:06 2021
Return-Path: <tsutsui@ceres.dti.ne.jp>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id E170D1A9217
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 12 Feb 2021 12:02:06 +0000 (UTC)
Message-Id: <202102121201.11CC1xKQ005201@ceres.dti.ne.jp>
Date: Fri, 12 Feb 2021 21:01:59 +0900 (JST)
From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Reply-To: tsutsui@ceres.dti.ne.jp
To: gnats-bugs@NetBSD.org
Cc: tsutsui@ceres.dti.ne.jp
Subject: kernel stack leak in m68k cpu_setmcontext() and reenter_syscall()
X-Send-Pr-Version: 3.95

>Number:         55990
>Category:       port-m68k
>Synopsis:       kernel stack leak in m68k cpu_setmcontext() and reenter_syscall()
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    tsutsui
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Feb 12 12:05:00 +0000 2021
>Closed-Date:    Fri Feb 26 15:56:39 +0000 2021
>Last-Modified:  Fri Feb 26 15:56:39 +0000 2021
>Originator:     Izumi Tsutsui
>Release:        NetBSD 9.1
>Organization:
>Environment:
System: NetBSD ferrari 9.1 NetBSD 9.1 (GENERIC) #13: Fri Feb 12 18:53:22 JST 2021  tsutsui@mirage:/s/netbsd-9/src/sys/arch/sun3/compile/GENERIC sun3
Architecture: m68k
Machine: confirmed on x68k and sun3, but maybe affects all m68k ports
>Description:
While investigating panics on NetBSD/sun3 (on a 3/60) and NetBSD/x68k,
I found kernel stack leaks in cpu_setmcontext() in
m68k/sig_machdep.c. They seem to cause random Address error or
MMU fault panics, especially when running X servers etc.

With the following debug printfs, the output shows that the stack address
moves lower on each reenter_syscall() call:

---

Index: sig_machdep.c
===================================================================
RCS file: /cvsroot/src/sys/arch/m68k/m68k/sig_machdep.c,v
retrieving revision 1.50
diff -u -p -d -r1.50 sig_machdep.c
--- sig_machdep.c	27 Nov 2018 14:09:54 -0000	1.50
+++ sig_machdep.c	12 Feb 2021 11:46:01 -0000
@@ -287,6 +287,8 @@ cpu_setmcontext(struct lwp *l, const mco
 	unsigned int format = mcp->__mc_pad.__mc_frame.__mcf_format;
 	int sz, error;

+printf("%s: stack (&sz) = %p", __func__, &sz);
+
 	/* Validate the supplied context */
 	if ((flags & _UC_CPU) != 0) {
 		error = cpu_mcontext_validate(l, mcp);
@@ -301,8 +303,10 @@ cpu_setmcontext(struct lwp *l, const mco
 		sz = exframesize[format];
 		if (sz < 0)
 			return (EINVAL);
+printf(" restore frame (format=%d, sz=%d)", format, sz);

 		if (frame->f_stackadj == 0) {
+printf(" ->reenter_syscall\n");
 			reenter_syscall(frame, sz);
 			/* NOTREACHED */
 		}
@@ -411,5 +415,6 @@ cpu_setmcontext(struct lwp *l, const mco
 		l->l_sigstk.ss_flags &= ~SS_ONSTACK;
 	mutex_exit(l->l_proc->p_lock);

+printf(" ->return\n");
 	return 0;
 }

---

On NetBSD/x68k 9.1 (12MB X68030 on XM6i emulator)
---
4) ->return
cpu_setmcontext: stack (&sz) = 0x39c2bec restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c2b98 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c2b98 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c2b44 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c2b44 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c2af0 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c2af0 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c2a9c restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c2a9c restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c2a48 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c2a48 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c29f4 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c29f4 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c29a0 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c29a0 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c294c restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c294c restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c28f8 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c28f8 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c28a4 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c28a4 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c2850 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c2850 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c27fc restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c27fc restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c27a8 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c27a8 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c2754 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c2754 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c2700 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c2700 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c26ac restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c26ac restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c2658 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c2658 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c2604 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c2604 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c25b0 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c25b0 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c255c restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c255c restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c2508 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c2508 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c24b4 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c24b4 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c2460 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c2460 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c240c restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c240c restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c23b8 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c23b8 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c2364 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c2364 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c2310 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c2310 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c22bc restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c22bc restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c2268 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c2268 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c2214 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c2214 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c21c0 restore frame (format=11, sz=84) ->return
cpu_setmcontext: stack (&sz) = 0x39c21c0 restore frame (format=11, sz=84) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0x39c216c restore frame (format=11, sz=84)uvm_fault(0x2d20c0, 0x7fff0000, 0x1) -> 0xe
  type 8, code [mmu,,ssw]: 4015166
trap type 8, code = 0x4015166, v = 0x7fff0000
kernel program counter = 0x7fff0000
kernel: MMU fault trap
pid = 1112, lid = 1, pc = 7FFF0000, ps = 2008, sfc = 1, dfc = 1
Registers:
             0        1        2        3        4        5        6        7
dreg: FFFFFFFF 0000FFFF 000A000D 0022BAC8 00000054 00000000 00000000 0016CB00
areg: FFFFFFFF 039C2178 039C21E0 039C2000 008BB660 00182C6C 039C2170 FFEFF190

Kernel stack (039C1F3C):
9C1F3C: 000035C4 039C2094 00000080 000A000D 0022BAC8 00000054 00000000 00000000
9C1F5C: 0016CB00 039C21E0 039C2000 008BB660 00182C6C 0087E458 039C1F9C 00000001
9C1F7C: 002D20C0 7FFF0000 00000D00 0082F3C0 00000000 002323D8 00000000 00000D00
9C1F9C: 039C1FD0 0000B310 0082F3C0 00000009 00000000 00000D00 00000029 0000000A
9C1FBC: FFFFFFFC 00181A24 00000000 00000D00 002D6B29 039C1FF4 00007668 00000000
9C1FDC: 00000D00 00000029 00000029 00000005 00000000 00000D00 039C2078 00183084
9C1FFC: 00000029 
panic: MMU fault
cpu0: Begin traceback...
?(?)
db_panic(2000,8,10,182c6c,39c1f3c) at 0
cpu0: End traceback...

dumping to dev 4,1 offset 239549
dump 

---

On NetBSD/sun3 9.1 (24M real sun3/60)
---
0220] cpu_setmcontext: stack (&sz) = 0xf7b84e8 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b84e8 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b84d0 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b84d0 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b84b8 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b84b8 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b84a0 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b84a0 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8488 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8488 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8470 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8470 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8458 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8458 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8440 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8440 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8428 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8428 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8410 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8410 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b83f8 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b83f8 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b83e0 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b83e0 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b83c8 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b83c8 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b83b0 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b83b0 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8398 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8398 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8380 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8380 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8368 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8368 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8350 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8350 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8338 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8338 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8320 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8320 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8308 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8308 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b82f0 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b82f0 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b82d8 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b82d8 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b82c0 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b82c0 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b82a8 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b82a8 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8290 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8290 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8278 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8278 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8260 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8260 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8248 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8248 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8230 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8230 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8218 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8218 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b8200 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b8200 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b81e8 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b81e8 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b81d0 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b81d0 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b81b8 restore frame (format=10, sz=24) ->return
cpu_setmcontext: stack (&sz) = 0xf7b81b8 restore frame (format=10, sz=24) ->reenter_syscall
cpu_setmcontext: stack (&sz) = 0xf7b81a0 restore frame (format=10, sz=24)trap type=0x1, code=0xa045, v=0xffffffff
kernel: Address error trap
pid = 549, lid = 1, pc = 0E152C0E, ps = 2008, sfc = 1, dfc = 1
Registers:
             0        1        2        3        4        5        6        7
dreg: FFFFFFFF 0000FFFF 000A000D 0E152B94 00000004 00000000 00000027 00004000
areg: FFFFFFFF 0F7B8178 0F7B8210 0F7B8000 0E301920 0E0E277C 0F7B81A4 0DFFF364

Kernel stack (0F7B7FE4):
7B7FE4: 0E008D3E 0F7B80D4 00000080 000A000D 0E152B94 00000004 00000000 
panic: Address error
cpu0: Begin traceback...
?(?)
db_panic(1,2000,f7b8000,e0e29b6,f7b7fe4) at 0
cpu0: End traceback...

dumping to dev 7,1 offset 213232

---
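
The size of the leak per reenter_syscall() call can be read off the two
logs above: the drop in &sz between consecutive lines matches the printed
frame size sz. A minimal check, using only addresses copied from the logs
(the helper name leak_per_call is made up for illustration):

```c
#include <assert.h>

/*
 * Hypothetical helper, not kernel code: number of bytes by which &sz
 * dropped between two consecutive log lines.
 */
static unsigned long
leak_per_call(unsigned long before, unsigned long after)
{
	/* The kernel stack grows downward, so the address must not rise. */
	assert(before >= after);
	return before - after;
}
```

For the first x68k pair, leak_per_call(0x39c2bec, 0x39c2b98) gives 84
(format=11, sz=84); for the first sun3 pair,
leak_per_call(0xf7b84e8, 0xf7b84d0) gives 24 (format=10, sz=24).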


>How-To-Repeat:
See above.

It seems to be triggered by instruction address errors caused by page faults,
i.e. memory shortage, so it may only happen on machines with less (<32MB) memory?

>Fix:
It looks like m68k/reenter_syscall.s adjusts the stack pointer to prepare
a "stack frame moved by stkadj bytes" but doesn't restore %sp
after syscall() returns?
(I'm not sure how reenter_syscall() was designed, though.)

---
Izumi Tsutsui

>Release-Note:

>Audit-Trail:
From: "Izumi Tsutsui" <tsutsui@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55990 CVS commit: src/sys/arch/m68k/m68k
Date: Sat, 20 Feb 2021 18:04:20 +0000

 Module Name:	src
 Committed By:	tsutsui
 Date:		Sat Feb 20 18:04:20 UTC 2021

 Modified Files:
 	src/sys/arch/m68k/m68k: reenter_syscall.s

 Log Message:
 Replace magic numbers with proper macros prepared in assym.h.

 No binary changes.
 Note this is a preparation for a possible fix of PR port-m68k/55990.


 To generate a diff of this commit:
 cvs rdiff -u -r1.4 -r1.5 src/sys/arch/m68k/m68k/reenter_syscall.s

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: gnats-bugs@netbsd.org
Cc: tsutsui@ceres.dti.ne.jp
Subject: Re: port-m68k/55990: kernel stack leak in m68k cpu_setmcontext() and
	 reenter_syscall()
Date: Sun, 21 Feb 2021 21:39:12 +0900

 > cpu_setmcontext: stack (&sz) = 0x39c2bec restore frame (format=11, sz=84) ->reenter_syscall
 > cpu_setmcontext: stack (&sz) = 0x39c2b98 restore frame (format=11, sz=84) ->return
 > cpu_setmcontext: stack (&sz) = 0x39c2b98 restore frame (format=11, sz=84) ->reenter_syscall
  :
 > >Fix:
 > It looks m68k/reenter_syscall.s adjusts stack pointer to prepare
 > "moved stack frame by stkadj bytes" but doesn't restore %sp
 > after syscall() is returned?
 > (I'm not sure how reenter_syscall() was designed though)

 After misc observations, these stack leaks seem to be caused by:
 1) heavy setcontext(2) calls from pthread applications (e.g. Xorg server)
 2) heavy address errors (i.e. page faults) on lower RAM (<24MB) environments

 Per various ddb(4) outputs from a patched NetBSD/x68k 9.1 GENERIC kernel on XM6i,
 the following sequence is observed in setcontext(2):
 - setcontext(2) system call is invoked via trap0()
 - trap0() -> syscall() -> syscall_plain() -> sys_setcontext()
   -> setucontext() -> cpu_setmcontext() is called
 - in cpu_setmcontext() a frame format type to be restored to the frame
   (mcp->__mc_pad.__mc_frame.__mcf_format) is FMTB (triggered by the
   prior address error?), so it jumps to reenter_syscall() to allocate
   extra stack space for FMTB (84 bytes)
    https://nxr.netbsd.org/xref/src/sys/arch/m68k/m68k/sig_machdep.c?r=1.50#297
 - reenter_syscall() adjusts the whole stack frame to store FMTB and calls
   syscall() again with the updated stack frame
    https://nxr.netbsd.org/xref/src/sys/arch/m68k/m68k/reenter_syscall.s?r=1.6#34
 - cpu_setmcontext() is called again, and it restores FMTB frame to
   the updated stack frame and resets frame->f_stackadj to zero
    https://nxr.netbsd.org/xref/src/sys/arch/m68k/m68k/sig_machdep.c?r=1.50#316
 - cpu_setmcontext() just returns with frame->f_stackadj=0
 - after cpu_setmcontext() (a syscall entry function) returns,
   machine_userret() (in MD trap.c) is called
    https://nxr.netbsd.org/xref/src/sys/arch/x68k/x68k/trap.c?r=1.108#241
    https://nxr.netbsd.org/xref/src/sys/arch/x68k/x68k/trap.c?r=1.108#174
 - machine_userret() -> lwp_userret() -> postsig() -> sendsig_siginfo()
    -> cpu_getmcontext() is called
    https://nxr.netbsd.org/xref/src/sys/sys/userret.h?r=1.28#95
    https://nxr.netbsd.org/xref/src/sys/arch/m68k/m68k/sig_machdep.c?r=1.50#175
 - m68k cpu_getmcontext() adjusts lwp's frame->f_stackadj and frame->f_format 
    https://nxr.netbsd.org/xref/src/sys/arch/m68k/m68k/sig_machdep.c?r=1.50#232
   -> the kernel stack needs to be adjusted before returning userland?
 - syscall() finally returns to reenter_syscall() with the updated
   frame (FMT0), but reenter_syscall() doesn't check frame->f_stackadj,
   so the kernel stack is left as if it still stored the FMTB frame and
   the extra 84 bytes allocated for FMTB are leaked?
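
The sequence above can be sketched as a toy model that tracks only the
kernel stack pointer. This is not kernel code: the toy_* names and the
check_stackadj switch are invented for illustration, and the model assumes
that cpu_getmcontext() records the FMTB size in f_stackadj (the analysis
above only says that it "adjusts" that field).

```c
#include <assert.h>

/* Toy model only: names mirror the report, not real kernel structures. */
enum { TOY_FMT0 = 0, TOY_FMTB = 11 };
enum { TOY_FMTB_SIZE = 84 };	/* sz printed for format=11 in the logs */

struct toy_frame {
	int f_format;	/* frame format currently on the stack */
	int f_stackadj;	/* bytes the return path is expected to pop */
};

/*
 * One setcontext(2) call that goes through reenter_syscall().
 * Returns the kernel SP that the next system call would start from.
 * check_stackadj = 0 models the old return path, 1 a return path
 * that honours f_stackadj.
 */
static unsigned long
toy_setcontext(unsigned long sp, int check_stackadj)
{
	struct toy_frame frame = { TOY_FMT0, 0 };

	/* reenter_syscall(): open a hole for the larger FMTB frame. */
	sp -= TOY_FMTB_SIZE;

	/* Second cpu_setmcontext() pass: restore FMTB, clear f_stackadj. */
	frame.f_format = TOY_FMTB;
	frame.f_stackadj = 0;

	/* Signal delivery: cpu_getmcontext() turns the frame back into FMT0. */
	frame.f_format = TOY_FMT0;
	frame.f_stackadj += TOY_FMTB_SIZE;

	/* Return from syscall() to reenter_syscall(). */
	if (check_stackadj && frame.f_stackadj != 0)
		sp += (unsigned long)frame.f_stackadj;

	assert(frame.f_format == TOY_FMT0);
	return sp;
}
```

With check_stackadj = 0 the model reproduces the 84-byte step seen in the
x68k log (0x39c2bec to 0x39c2b98); with check_stackadj = 1 the stack pointer
comes back to where it started.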

 I have not tracked which function actually updates SR and PC in the FMT0
 frame on returning syscall(), but the following patch that adjusts stack
 per frame->f_stackadj (as existing faultstkadj() in m68k/trap_subr.s does)
 seems to fix this leak.

 With this patch, the Xorg based servers both on sun3 and x68k survive
 over 24 hours without kernel crashes.
 (Note the similar x68k crashes were observed at least back in 2012.)

 I wonder if we should also check frame->f_stackadj in trap0()
 but I would like to commit this fix (workaround?) for now.

 Any comments (especially from m68k and siginfo gurus; kleink@? thorpej@?)
 are appreciated.

 ---

 Index: reenter_syscall.s
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/m68k/m68k/reenter_syscall.s,v
 retrieving revision 1.6
 diff -u -p -d -r1.6 reenter_syscall.s
 --- reenter_syscall.s	21 Feb 2021 07:23:41 -0000	1.6
 +++ reenter_syscall.s	21 Feb 2021 11:41:56 -0000
 @@ -51,6 +51,19 @@ ENTRY_NOPROFILE(reenter_syscall)
  #endif
  	moveal	FR_SP(%sp),%a0		| grab and restore
  	movel	%a0,%usp		|   user SP
 +	movw	FR_ADJ(%sp),%d0		| need to adjust stack?
 +	jne	.Ladjstk		| yes, go to it
  	moveml	(%sp)+,#0x7FFF		| restore user registers
  	addql	#8,%sp			| pop SP and stack adjust
  	jra	_ASM_LABEL(rei)		| rte
 +.Ladjstk:
 +	lea	FR_HW(%sp),%a1		| pointer to HW frame
 +	addql	#8,%a1			| source pointer
 +	movl	%a1,%a0			| source
 +	addw	%d0,%a0			|  + hole size = dest pointer
 +	movl	-(%a1),-(%a0)		| copy
 +	movl	-(%a1),-(%a0)		|  8 bytes
 +	movl	%a0,FR_SP(%sp)		| new SSP
 +	moveml	(%sp)+,#0x7FFF		| restore user register
 +	movl	(%sp),%sp		| and do real RTE
 +	jra	_ASM_LABEL(rei)		| rte
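
As a reading aid for the .Ladjstk path in the patch above, here is a rough
C model of the copy it performs. It is only a sketch: the kernel stack is
replaced by a byte array, offsets stand in for addresses, and
toy_close_hole and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TOY_HW_FRAME_SIZE = 8 };	/* the two longwords the patch copies */

/*
 * stack:     byte array standing in for kernel stack memory
 * stack_len: size of that array
 * hw:        offset of the 8-byte hardware frame (FR_HW in the patch)
 * adj:       hole size as read from FR_ADJ
 * Returns the offset of the relocated frame, i.e. the new SSP.
 */
static size_t
toy_close_hole(uint8_t *stack, size_t stack_len, size_t hw, uint16_t adj)
{
	size_t src = hw + TOY_HW_FRAME_SIZE;	/* lea FR_HW(%sp),%a1; addql #8,%a1 */
	size_t dst = src + adj;			/* movl %a1,%a0; addw %d0,%a0 */
	int i;

	assert(dst <= stack_len);

	/* Two predecrement longword copies: movl -(%a1),-(%a0) */
	for (i = 0; i < 2; i++) {
		src -= 4;
		dst -= 4;
		memmove(stack + dst, stack + src, 4);
	}
	return dst;				/* movl %a0,FR_SP(%sp) */
}
```

In this model the frame ends up adj bytes higher, so the final rte runs from
a stack pointer with the hole already popped.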

 ---
 Izumi Tsutsui

Responsible-Changed-From-To: port-m68k-maintainer->tsutsui
Responsible-Changed-By: tsutsui@NetBSD.org
Responsible-Changed-When: Sun, 21 Feb 2021 12:46:54 +0000
Responsible-Changed-Why:
I have a possible fix.


State-Changed-From-To: open->analyzed
State-Changed-By: tsutsui@NetBSD.org
State-Changed-When: Sun, 21 Feb 2021 12:46:54 +0000
State-Changed-Why:
The leak sequence is observed.


From: John Klos <john@ziaspace.com>
To: gnats-bugs@netbsd.org
Cc: port-m68k-maintainer@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Subject: Re: port-m68k/55990: kernel stack leak in m68k cpu_setmcontext()
 and  reenter_syscall()
Date: Mon, 22 Feb 2021 02:22:05 +0000 (UTC)

 > > It looks m68k/reenter_syscall.s adjusts stack pointer to prepare
 > > "moved stack frame by stkadj bytes" but doesn't restore %sp
 > > after syscall() is returned?
 > > (I'm not sure how reenter_syscall() was designed though)
 >
 > After misc observations, these stack leaks seem caused by:
 > 1) heavy setcontext(2) calls from pthread applications (i.e. Xorg server)
 > 2) heavy address errors (i.e. page faults) on lower RAM (<24MB) environment

 With this patch, I've been able to run a mac68k system with 10 megabytes 
 of memory for many hours fully multiuser, whereas in the past it would 
 freeze after just minutes or tens of minutes.

 John Klos

From: Jason Thorpe <thorpej@me.com>
To: gnats-bugs@netbsd.org
Cc: port-m68k-maintainer@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 tsutsui@ceres.dti.ne.jp
Subject: Re: port-m68k/55990: kernel stack leak in m68k cpu_setmcontext() and
 reenter_syscall()
Date: Mon, 22 Feb 2021 07:58:15 -0800

 > On Feb 21, 2021, at 4:40 AM, Izumi Tsutsui <tsutsui@ceres.dti.ne.jp> wrote:
 >
 > With this patch, the Xorg based servers both on sun3 and x68k survive
 > over 24 hours without kernel crashes.
 > (Note the similar x68k crashes were observed at least back in 2012.)
 >
 > I wonder if we should also check frame->f_stackadj in trap0()
 > but I would like to commit this fix (workaround?) for now.
 >
 > Any comments (especially from m68k and siginfo gurus; kleink@? thorpej@?)
 > are appreciated.

 I'm really digging into the archival section of my brain for this one, but
 this change seems perfectly reasonable.

 -- thorpej

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: john@ziaspace.com, thorpej@me.com
Cc: gnats-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Subject: Re: port-m68k/55990: kernel stack leak in m68k cpu_setmcontext()and
	  reenter_syscall()
Date: Tue, 23 Feb 2021 23:55:45 +0900

 jklos@ wrote:

 > With this patch, I've been able to run a mac68k system with 10 megabytes 
 > of memory for many hours fully multiuser, whereas in the past it would 
 > freeze after just minutes or tens of minutes.

 thorpej@ wrote:

 > I'm really digging into the archival section of my brain for this one, but this change seems perfectly reasonable.

 Thanks for your comments. I'll commit the change and send pullup requests
 soon.

 ---
 Izumi Tsutsui

From: "Izumi Tsutsui" <tsutsui@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55990 CVS commit: src/sys/arch/m68k/m68k
Date: Tue, 23 Feb 2021 16:54:17 +0000

 Module Name:	src
 Committed By:	tsutsui
 Date:		Tue Feb 23 16:54:17 UTC 2021

 Modified Files:
 	src/sys/arch/m68k/m68k: reenter_syscall.s

 Log Message:
 Plug kernel stack leaks in reenter_syscall() for setcontext(2).

 This fixes long standing kernel crashes (MMU fault, address error,
 and silent freeze by a double bus fault etc. seen for ~10 years)
 caused by kernel stack overflow, especially on x68k and sun3 running
 Xorg based servers.  See PR/55990 for more details.

 "This change seems perfectly reasonable" from thorpej@, and
 jklos@ also reported that this solved the freeze of his mac68k system
 with 10 megabytes of memory.

 Should be pulled up to netbsd-9 and netbsd-8.


 To generate a diff of this commit:
 cvs rdiff -u -r1.6 -r1.7 src/sys/arch/m68k/m68k/reenter_syscall.s

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: analyzed->needs-pullups
State-Changed-By: tsutsui@NetBSD.org
State-Changed-When: Tue, 23 Feb 2021 18:25:50 +0000
State-Changed-Why:
I'll send pullup requests soon.


State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: tsutsui@NetBSD.org
State-Changed-When: Wed, 24 Feb 2021 16:54:42 +0000
State-Changed-Why:
[pullup-9 #1214] [pullup-8 #1659]


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55990 CVS commit: [netbsd-9] src/sys/arch/m68k/m68k
Date: Thu, 25 Feb 2021 09:36:27 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Thu Feb 25 09:36:27 UTC 2021

 Modified Files:
 	src/sys/arch/m68k/m68k [netbsd-9]: reenter_syscall.s

 Log Message:
 Pull up following revision(s) (requested by tsutsui in ticket #1214):

 	sys/arch/m68k/m68k/reenter_syscall.s: revision 1.5
 	sys/arch/m68k/m68k/reenter_syscall.s: revision 1.6
 	sys/arch/m68k/m68k/reenter_syscall.s: revision 1.7

 Replace magic numbers with proper macros prepared in assym.h.

 No binary changes.

 Note this is a preparation for a possible fix of PR port-m68k/55990.

 Consistently use motorola style.  No binary changes.
 Seems missed in rev 1.3:
  https://mail-index.netbsd.org/source-changes/2013/08/01/msg046378.html

 Plug kernel stack leaks in reenter_syscall() for setcontext(2).
 This fixes long standing kernel crashes (MMU fault, address error,
 and silent freeze by a double bus fault etc. seen for ~10 years)
 caused by kernel stack overflow, especially on x68k and sun3 running
 Xorg based servers.  See PR/55990 for more details.

 "This change seems perfectly reasonable" from thorpej@, and
 jklos@ also reported that this solved the freeze of his mac68k system
 with 10 megabytes of memory.

 Should be pulled up to netbsd-9 and netbsd-8.


 To generate a diff of this commit:
 cvs rdiff -u -r1.4 -r1.4.34.1 src/sys/arch/m68k/m68k/reenter_syscall.s

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55990 CVS commit: [netbsd-8] src/sys/arch/m68k/m68k
Date: Thu, 25 Feb 2021 09:38:48 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Thu Feb 25 09:38:48 UTC 2021

 Modified Files:
 	src/sys/arch/m68k/m68k [netbsd-8]: reenter_syscall.s

 Log Message:
 Pull up following revision(s) (requested by tsutsui in ticket #1659):

 	sys/arch/m68k/m68k/reenter_syscall.s: revision 1.5
 	sys/arch/m68k/m68k/reenter_syscall.s: revision 1.6
 	sys/arch/m68k/m68k/reenter_syscall.s: revision 1.7

 Replace magic numbers with proper macros prepared in assym.h.

 No binary changes.

 Note this is a preparation for a possible fix of PR port-m68k/55990.

 Consistently use motorola style.  No binary changes.
 Seems missed in rev 1.3:
  https://mail-index.netbsd.org/source-changes/2013/08/01/msg046378.html

 Plug kernel stack leaks in reenter_syscall() for setcontext(2).
 This fixes long standing kernel crashes (MMU fault, address error,
 and silent freeze by a double bus fault etc. seen for ~10 years)
 caused by kernel stack overflow, especially on x68k and sun3 running
 Xorg based servers.  See PR/55990 for more details.

 "This change seems perfectly reasonable" from thorpej@, and
 jklos@ also reported that this solved the freeze of his mac68k system
 with 10 megabytes of memory.

 Should be pulled up to netbsd-9 and netbsd-8.


 To generate a diff of this commit:
 cvs rdiff -u -r1.4 -r1.4.22.1 src/sys/arch/m68k/m68k/reenter_syscall.s

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: tsutsui@NetBSD.org
State-Changed-When: Fri, 26 Feb 2021 15:56:39 +0000
State-Changed-Why:
Pullup complete.


>Unformatted:
