NetBSD Problem Report #55556

From www@netbsd.org  Sun Aug  9 00:56:24 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id C7E5F1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Sun,  9 Aug 2020 00:56:24 +0000 (UTC)
Message-Id: <20200809005623.750D21A923A@mollari.NetBSD.org>
Date: Sun,  9 Aug 2020 00:56:23 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: [FIXED] amiga kernel freeze when compiled by GCC8
X-Send-Pr-Version: www-1.0

>Number:         55556
>Category:       port-m68k
>Synopsis:       [FIXED] amiga kernel freeze when compiled by GCC8
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-m68k-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Aug 09 01:00:00 +0000 2020
>Closed-Date:    Sun Oct 04 10:05:06 +0000 2020
>Last-Modified:  Sun Oct 04 10:05:06 +0000 2020
>Originator:     Rin Okuyama
>Release:        9.99.69
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD a1200 9.99.69 NetBSD 9.99.69 (A1200.8hack) #13: Thu Aug  6 22:44:27 JST 2020  rin@latipes:/sys/arch/amiga/compile/A1200.8hack amiga
>Description:
An amiga kernel compiled by GCC8 randomly freezes, as reported for an A1200
with a 68060:

[1] http://mail-index.netbsd.org/port-amiga/2020/06/04/msg008130.html

as well as FS-UAE (WinUAE-derived emulator):

[2] http://mail-index.netbsd.org/port-amiga/2020/07/23/msg008159.html

When this kind of freeze occurs, I can neither enter DDB nor obtain a crash
dump.

As suggested in [1], if kern_tc.o is replaced by one compiled by GCC7,
the freezes are remarkably mitigated.

I've found that this is due to a wrong setting in
external/gpl3/gcc/dist/gcc/config/m68k/netbsd-elf.h:

| /* Boundary (in *bits*) on which stack pointer should be aligned.
|    The m68k/SVR4 convention is to keep the stack pointer longword aligned.  */
|
| #undef STACK_BOUNDARY
| #define STACK_BOUNDARY 32
| #undef PREFERRED_STACK_BOUNDARY
| #define PREFERRED_STACK_BOUNDARY 32

The meanings of these macros are described in
external/gpl3/gcc/dist/gcc/doc/gcc-int.info:

| -- Macro: STACK_BOUNDARY
|     Define this macro to the minimum alignment enforced by hardware for
|     the stack pointer on this machine.  The definition is a C
|     expression for the desired alignment (measured in bits).  This
|     value is used as a default if 'PREFERRED_STACK_BOUNDARY' is not
|     defined.  On most machines, this should be the same as
|     'PARM_BOUNDARY'.
|
| -- Macro: PREFERRED_STACK_BOUNDARY
|     Define this macro if you wish to preserve a certain alignment for
|     the stack pointer, greater than what the hardware enforces.  The
|     definition is a C expression for the desired alignment (measured in
|     bits).  This macro must evaluate to a value equal to or larger than
|     'STACK_BOUNDARY'.

For m68k, the architecture only requires the stack to be aligned to a
2-byte boundary [3], whereas the System V ABI [4] demands that it be
aligned to a 4-byte boundary.

[3] Motorola M68000 Family Programmer’s Reference Manual, etc.
[4] AT&T System V Application Binary Interface Motorola 68000 Processor
    Family Supplement

Therefore, the correct settings should be

| /* Boundary (in *bits*) on which stack pointer should be aligned.
|    The m68k/SVR4 convention is to keep the stack pointer longword aligned.  */
|
| #if 0 /* default to 16 */
| #undef STACK_BOUNDARY
| #define STACK_BOUNDARY 32
| #endif
| #undef PREFERRED_STACK_BOUNDARY
| #define PREFERRED_STACK_BOUNDARY 32

This matches the definitions in
external/gpl3/gcc/dist/gcc/config/m68k/linux.h.

With this setting, the amiga kernel works just fine as far as I can see.
Furthermore, for amiga, mac68k, and sun3,

(1)   Kernel compiled by patched GCC8 works with
(1-a) userland built by GCC7 and non-modified GCC8, and
(1-b) userland built by patched GCC8.

(2)   Userland binaries compiled by GCC7 and non-modified GCC8 work fine
      with kernel and base libraries built by patched GCC8.

(3)   There's no regression observed for tests/kernel, tests/lib/libc/sys,
      and tests/lib/libc/gen.

Now, the question is why non-modified GCC8 fails for the amiga kernel. This
is because GCC8 allocates variables on the stack more cleverly than GCC7
does, by using STACK_BOUNDARY.

For example, the code below is taken from umoddi3.o compiled by non-modified
GCC8; it allocates an 8-byte object on the stack:

|  10:	240e           	movel %fp,%d2
|  12:	5b82           	subql #5,%d2	| char *p = fp - 5
|  14:	76f8           	moveq #-8,%d3
|  16:	c483           	andl %d3,%d2	| p &= ~7

On the other hand, the following is the same code segment compiled by GCC7
or patched GCC8:

|  10:	74f7           	moveq #-9,%d2
|  12:	d48e           	addl %fp,%d2	| char *p = fp - 9
|  14:	76f8           	moveq #-8,%d3
|  16:	c483           	andl %d3,%d2	| p &= ~7

The former code works only when the frame pointer, i.e., the stack, is
aligned to a 4-byte boundary. Otherwise, that code corrupts the stack frame.
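
The arithmetic of the two code sequences above can be modelled in plain C.
This is only a sketch: the function names are hypothetical, and it assumes
that the 8-byte object is safe only if it lies entirely below %fp, because
the saved frame pointer and the return address start at %fp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size of the object being placed, as in the umoddi3.o excerpts. */
#define OBJ_SIZE 8u

/* Address computed by non-modified GCC8: p = (fp - 5) & ~7. */
uint32_t slot_gcc8(uint32_t fp)
{
	return (fp - 5u) & ~UINT32_C(7);
}

/* Address computed by GCC7 or patched GCC8: p = (fp - 9) & ~7. */
uint32_t slot_gcc7(uint32_t fp)
{
	return (fp - 9u) & ~UINT32_C(7);
}

/*
 * Assumption of this sketch: the object [p, p + OBJ_SIZE) must not
 * reach fp, where the saved frame pointer is stored.
 */
bool fits_below_fp(uint32_t fp, uint32_t p)
{
	return p + OBJ_SIZE <= fp;
}
```

For a 4-byte aligned value such as fp = 0x1000 or 0x1004, both variants
place the object below fp. For fp = 0x1006, which is only 2-byte aligned,
slot_gcc8() yields 0x1000, so the object would extend to 0x1008, two bytes
past fp, while slot_gcc7() yields 0x0ff8 and stays clear of it.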

Usually, code conforming to the System V ABI aligns the stack to a 4-byte
boundary. However, since the minimum alignment required by hardware is only
2 bytes, the stack is in general not 4-byte aligned at every instant; the
kernel contains plenty of idioms like the following:

| movew	%d0,%sp@-	| push 2-byte word to stack
| clrw	%sp@-		| align stack to 4-byte boundary

If an interrupt occurs between movew and clrw in this example, the stack
is not aligned to a 4-byte boundary in the interrupt handler.
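
The window between the two pushes can be illustrated with a small C model
of the stack pointer. This is a sketch with hypothetical helper names; it
only tracks the value of %sp and does not model the exception frame that
the CPU pushes on an interrupt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of "movew %d0,%sp@-" or "clrw %sp@-": a word push lowers sp by 2. */
uint32_t push_word(uint32_t sp)
{
	return sp - 2u;
}

/* True when sp is on a 4-byte (longword) boundary. */
bool is_long_aligned(uint32_t sp)
{
	return (sp & 3u) == 0;
}
```

Starting from a 4-byte aligned value such as 0x2000, one push_word() gives
0x1ffe, which is not 4-byte aligned; only the second push_word() restores
the alignment at 0x1ffc.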

If the stack is not aligned to a 4-byte boundary, the stack frame is
corrupted when an 8-byte object is allocated on it, as mentioned above. This
is the cause of the freeze; tc_windup() in kern_tc.c, called by hardclock()
via tc_ticktock(), uses 8-byte objects here and there for 64-bit time_t.

Indeed, when the following assertion is inserted

|#ifdef amiga
|        uint32_t sp;
|
|        __asm volatile("movl %%sp,%0" : "=d"(sp));
|        if ((sp & 3) != 0)
|                panic("hardclock");
|#endif

into hardclock(), it fires as expected.

Since the timer interrupt has the highest priority (except for NMI) on
amiga, it can occur at almost any moment in the kernel, which results in
the freeze.

I guess that similar failures occur in userland with signals. Therefore,
STACK_BOUNDARY should be set to 16, not 32, for m68k.

Note that changing STACK_BOUNDARY from 32 to 16 also fixes the sun2 kernel.
With non-modified GCC8, the sun2 kernel crashes in strange ways during the
early boot stages. However, with this change it boots to single-user mode.
Also, with -fno-omit-frame-pointer added, it boots to multi-user mode (I
haven't figured out why).
>How-To-Repeat:
Described above.
>Fix:
Described above.

>Release-Note:

>Audit-Trail:
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@netbsd.org
Cc: port-m68k-maintainer@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: port-m68k/55556: [FIXED] amiga kernel freeze when compiled by GCC8
Date: Sun, 09 Aug 2020 12:41:29 +1000

 > | #if 0 /* default to 16 */
 > | #undef STACK_BOUNDARY
 > | #define STACK_BOUNDARY 32
 > | #endif
 > | #undef PREFERRED_STACK_BOUNDARY
 > | #define PREFERRED_STACK_BOUNDARY 32

 please commit this!  does it complete m68k works? :)


 .mrg.

From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55556 CVS commit: src/external/gpl3/gcc/dist/gcc/config/m68k
Date: Mon, 10 Aug 2020 06:24:39 +0000

 Module Name:	src
 Committed By:	rin
 Date:		Mon Aug 10 06:24:39 UTC 2020

 Modified Files:
 	src/external/gpl3/gcc/dist/gcc/config/m68k: netbsd-elf.h

 Log Message:
 PR port-m68k/55556

 Reset STACK_BOUNDARY to default, 16, to fix strange freeze for amiga,
 when kernel is compiled by GCC8.

 For m68k, the stack pointer is required to be aligned to 16-bit boundary
 by architecture. Whereas System V ABI demands it to be aligned to 32-bit
 boundary.

 According to the document, STACK_BOUNDARY is ``the minimum alignment
 enforced by hardware for the stack pointer on this machine.'' Whereas,
 PREFERRED_STACK_BOUNDARY should be used ``if you wish to preserve a
 certain alignment for the stack pointer, greater than what the hardware
 enforces.''

 Therefore, STACK_BOUNDARY and PREFERRED_STACK_BOUNDARY should be 16 and
 32, respectively, for m68k. This is how Linux/m68k does.

 GCC 8 generates codes that wisely allocate 64-bit objects on stack by
 using STACK_BOUNDARY. This corrupts the stack frame if it is not properly
 aligned.

 Since the architecture only guarantees the stack pointer to be aligned to
 16-bit boundary, it is not aligned to 32-bit boundary in an instance in
 general. If the interrupt occurs at this moment, the interrupt handler
 spoils the stack frame as explained above, which results in the mysterious
 kernel freezes.

 I guess that similar failures can occur even for userland with signal.

 With this setting, amiga kernel works just fine as far as I can see.
 Furthermore, I've confirmed for amiga, mac68k, and sun3,

 (1)   Kernel compiled by patched GCC8 works with
 (1-a) userland built by GCC7 and non-modified GCC8, and
 (1-b) userland built by patched GCC8.

 (2)   Userland binaries compiled by GCC7 and non-modified GCC8 work fine
       with kernel and base libraries built by patched GCC8.

 (3)   There's no regression observed for tests/kernel, tests/lib/libc/sys,
       and tests/lib/libc/gen.

 This also fixes sun2 kernel to a considerable extent. With non-modified
 GCC8, sun2 kernel crashes in strange ways during the early boot stages.
 With this change, it boots singleuser.

 OK mrg


 To generate a diff of this commit:
 cvs rdiff -u -r1.15 -r1.16 \
     src/external/gpl3/gcc/dist/gcc/config/m68k/netbsd-elf.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Rin Okuyama <rokuyama.rk@gmail.com>
To: matthew green <mrg@eterna.com.au>, gnats-bugs@netbsd.org
Cc: port-m68k-maintainer@netbsd.org, gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org
Subject: Re: port-m68k/55556: [FIXED] amiga kernel freeze when compiled by
 GCC8
Date: Mon, 10 Aug 2020 16:03:11 +0900

 On 2020/08/09 11:41, matthew green wrote:
 > please commit this!  does it complete m68k works? :)

 Done. Thank you for confirmation!

 Combined with two hacks I added to doc/HACKS:

 http://mail-index.netbsd.org/source-changes/2020/08/10/msg120426.html

 m68k ports work fine as far as I can see (at least, amiga, mac68k, and
 sun3 work fine). I think we can switch m68k ports to GCC8 :).

 Thanks,
 rin

State-Changed-From-To: open->closed
State-Changed-By: rin@NetBSD.org
State-Changed-When: Sun, 04 Oct 2020 10:05:06 +0000
State-Changed-Why:
Fixed. No release branches affected.


>Unformatted:
