NetBSD Problem Report #58411

From www@netbsd.org  Wed Jul 10 06:46:28 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits)
	 client-signature RSA-PSS (2048 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id A71C41A9238
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 10 Jul 2024 06:46:28 +0000 (UTC)
Message-Id: <20240710064627.608C81A9239@mollari.NetBSD.org>
Date: Wed, 10 Jul 2024 06:46:27 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
X-Send-Pr-Version: www-1.0

>Number:         58411
>Notify-List:    uwe@NetBSD.org
>Category:       toolchain
>Synopsis:       GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    toolchain-manager
>State:          feedback
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 10 06:50:00 +0000 2024
>Closed-Date:    
>Last-Modified:  Mon Sep 23 10:30:02 +0000 2024
>Originator:     Rin Okuyama
>Release:        10.99.11 and netbsd-10
>Organization:
Internet Initiative Japan Inc.
>Environment:
NetBSD sakaizumii.local 10.99.11 NetBSD 10.99.11 (AMD64_NET_MPSAFE) #4: Wed Jul  3 14:55:32 JST 2024  rin@sakaizumii.local:/home/rin/src/sys/arch/amd64/compile/AMD64_NET_MPSAFE
>Description:
I've been troubled by wrong binaries generated by GCC/sh3 12.
This turned out to be, surprisingly enough, due to GCC/x86_64!!

If GCC/sh3 12.4 is built by GCC/x86_64 11.4 shipped with Ubuntu 22.04
(as well as our in-tree GCC/powerpc 12.4), it generates a working GENERIC
kernel for landisk. On the other hand, if it is compiled by our in-tree
GCC/x86_64, it drops some `tst` insns by mistake:

https://mail-index.netbsd.org/tech-toolchain/2024/07/02/msg004458.html

and the generated kernel does not work, of course.

For `cc --version --verbose`, `-mtune=nocona` appears for us and
`-mtune=generic` for Ubuntu, respectively.
(This comes from bsd.own.mk.)

If `env HOST_CFLAGS='-O -mtune=nocona' HOST_CXXFLAGS='-O -mtune=nocona'`
is used to invoke build.sh, GCC/sh3 built on Ubuntu miscompiles landisk
GENERIC in a similar manner to ours.

Also, if
`env HOST_CFLAGS='-O0 -mtune=generic' HOST_CXXFLAGS='-O -mtune=generic'`
is used, our GCC/x86_64 10.5 seems to generate a working GCC/sh3 12.4.

However, unfortunately, `-mtune=generic` is not sufficient for
GCC/x86_64 12.4. Even if this option is specified, the generated GCC/sh3
binary is broken.

Our GCC/x86_64 miscompiles insn-preds.cc (generated from sh.md).
If `#pragma GCC optimize ("O0")` is added to this file, the generated
GCC/sh3 seems to work even with GCC/x86_64 12.4.

GCC/x86_64 uses `sete %al` (or equivalent) to prepare return values
from bool functions, *without* zero extension. As a result, return
values change quasi-nondeterministically, depending on the untouched
upper (24+32) bits of the %rax register.
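
The effect can be modelled in plain C without relying on real register
contents. This is only a minimal sketch: the function names and the stale
register value are hypothetical and chosen for illustration, and the 64-bit
integer merely stands in for %rax.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `sete %al`: only the low 8 bits of the register are written. */
uint64_t model_sete_al(uint64_t rax, int cond)
{
	return (rax & ~(uint64_t)0xff) | (uint64_t)(cond != 0);
}

/* Consumer that looks at %al only, as a bool return value requires. */
int read_as_bool(uint64_t rax)
{
	return (uint8_t)rax != 0;
}

/* Consumer that tests all of %eax, as if the callee returned int. */
int read_as_int(uint64_t rax)
{
	return (uint32_t)rax != 0;
}

/* Hypothetical check: the callee returns false, but bits 8..31 are stale. */
void check_stale_upper_bits(void)
{
	uint64_t rax = model_sete_al(0x0123456789abcd00ULL, 0);

	assert(read_as_bool(rax) == 0);	/* correct: false */
	assert(read_as_int(rax) == 1);	/* wrong: stale bits read as true */
}
```

Calling check_stale_upper_bits() from a test program exercises both
consumers on the same register image; only the 8-bit reader sees the value
the callee intended.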

What I've still not understood is why this bug does not affect us
more terribly. As far as I can tell, GCC/x86_64 mostly works fine;
it actually generates successfully working GCC for platforms other
than sh3.

Either something is wrong in GCC/sh3, or we've just overlooked other
serious problems :(
>How-To-Repeat:
Described above.
>Fix:
N/A

>Release-Note:

>Audit-Trail:
From: matthew green <mrg@eterna23.net>
To: gnats-bugs@netbsd.org
Cc: toolchain-manager@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Wed, 10 Jul 2024 18:16:36 +1000

 looking at config/i386/x86-tune-costs.h nocona, i see that between
 gcc10 and gcc12 this gained some additional things:

 struct processor_costs nocona_cost = {
 [ ... ]
   20, 12,                               /* mask->integer and integer->mask moves */
   {4, 4, 4},                            /* cost of loading mask register
                                            in QImode, HImode, SImode.  */
   {4, 4, 4},                            /* cost if storing mask register
                                            in QImode, HImode, SImode.  */
   2,                                    /* cost of moving mask register.  */

 which might be related (the mask/int parts?)

 for generic_cost, the change is:

   6, 6,                                /* mask->integer and integer->mask moves */
   {6, 6, 6},                           /* cost of loading mask register
                                           in QImode, HImode, SImode.  */
   {6, 6, 6},                           /* cost if storing mask register
                                           in QImode, HImode, SImode.  */
   2,                                   /* cost of moving mask register.  */


 .mrg.

From: Harold Gutch <logix@foobar.franken.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Wed, 10 Jul 2024 12:06:21 +0200

 Hi,

 just as an additional data point, I did the following on Ubuntu 24.04,
 which ships with "gcc version 13.2.0 (Ubuntu 13.2.0-23ubuntu4)" with a
 current tree from 2024-07-06 (so four days ago):

 1a)
   $ ./build.sh -u -U -O ../../bugs/58411 -m dreamcast -j 16 tools

 1b)

   $ tooldir.Linux-6.8.0-36-generic-x86_64/bin/shle--netbsdelf-gcc-10.5.0 -O1 -c test.c
   $ tooldir.Linux-6.8.0-36-generic-x86_64/bin/shle--netbsdelf-objdump -d test.o
   [...]
   00000000 <func>:
      0:   48 24           tst     r4,r4
      2:   06 89           bt      12 <func+0x12>
      4:   43 60           mov     r4,r0
      6:   01 88           cmp/eq  #1,r0
      8:   04 8f           bf.s    14 <func+0x14>
      a:   01 e0           mov     #1,r0
      c:   04 90           mov.w   18 <func+0x18>,r0       ! beaf
      e:   0b 00           rts
     10:   09 00           nop
     12:   02 90           mov.w   1a <func+0x1a>,r0       ! dead
     14:   0b 00           rts
     16:   09 00           nop
     18:   af be           bsr     fffffd7a <func+0xfffffd7a>
     1a:   ad de           mov.l   2d0 <func+0x2d0>,r14


 2a)
   $ ./build.sh -u -U -O ../../bugs/58411 -m dreamcast -j 16 -VHAVE_GCC=12 tools

 2b)

   $ tooldir.Linux-6.8.0-36-generic-x86_64/bin/shle--netbsdelf-gcc-12.4.0 -O1 -c test.c
   $ tooldir.Linux-6.8.0-36-generic-x86_64/bin/shle--netbsdelf-objdump -d test.o
   [...]
   00000000 <func>:
      0:   04 8b           bf      c <func+0xc>
      2:   43 60           mov     r4,r0
      4:   01 88           cmp/eq  #1,r0
      6:   04 89           bt      12 <func+0x12>
      8:   0b 00           rts
      a:   01 e0           mov     #1,r0
      c:   04 90           mov.w   18 <func+0x18>,r0       ! dead
      e:   0b 00           rts
     10:   09 00           nop
     12:   02 90           mov.w   1a <func+0x1a>,r0       ! beaf
     14:   0b 00           rts
     16:   09 00           nop
     18:   ad de           mov.l   2d0 <func+0x2d0>,r14
     1a:   af be           bsr     fffffd7c <func+0xfffffd7c>


 So it appears that this behavior is not specific to our host gcc
 but also appears on other platforms.


   Harold

From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Wed, 10 Jul 2024 13:44:51 +0300

 That reminded me about a bug I once stumbled into:

   https://github.com/virtio-win/kvm-guest-drivers-windows/issues/87

 Given that gcc is now written, afaik, in a mix of C and C++, it makes
 me wonder...

 -uwe

From: matthew green <mrg@eterna23.net>
To: gnats-bugs@netbsd.org
Cc: uwe@NetBSD.org, toolchain-manager@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org, rokuyama.rk@gmail.com
Subject: re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Thu, 11 Jul 2024 11:14:17 +1000

 >  That reminded me about a bug I once stumbled into:
 >  
 >    https://github.com/virtio-win/kvm-guest-drivers-windows/issues/87
 >  
 >  Given that gcc is now written, afaik, in a mix of C and C++, it makes
 >  me wonder...

 all of GCC proper compiles with a c++ compiler, except for the
 things like libiberty and libgmp etc.  between gcc 10 and 12
 they finally renamed all the .c files to .cc, meaning a minor
 hack for us could be removed :)


 .mrg.

From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, toolchain-manager@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Thu, 11 Jul 2024 22:36:20 +0900

 This is not a GCC/amd64 bug! I should have read x86_64 ELF ABI.
 It says:

  > When a value of type _Bool is returned or passed in a register or
  > on the stack, bit 0 contains the truth value and bits 1 to 7
  > shall be zero ^14.
  > ...
  > ^14 Other bits are left unspecified, hence the consumer side of
  > those values can rely on it being 0 or 1 when truncated to 8 bit.

 So, returning bool value in %al is completely legal, and it is
 callers' responsibility to zero-extend to %rax (or %eax or so).
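
The contract quoted above can be written down as two small helpers. This is
a sketch under assumptions: the helper names and the register image are
hypothetical, and a 64-bit integer again stands in for the return register.

```c
#include <assert.h>
#include <stdint.h>

/* Producer side: bit 0 holds the truth value and bits 1..7 are zero. */
int abi_bool_is_valid(uint64_t reg)
{
	return (reg & 0xfe) == 0;
}

/* Consumer side: truncate to 8 bits, then widen (like `movzbl %al, %eax`). */
int abi_bool_consume(uint64_t reg)
{
	return (int)(uint8_t)reg;
}

/* Hypothetical register image: a valid "true" plus unspecified upper bits. */
void check_abi_bool(void)
{
	uint64_t reg = 0xdeadbeefcafe0001ULL;

	assert(abi_bool_is_valid(reg));
	assert(abi_bool_consume(reg) == 1);
}
```

A consumer that skips the truncation step and tests a wider part of the
register is the one that breaks, which is why the caller side is the place
to look.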

 This strongly suggests that there may be bugs in caller side.
 And I've finally found it:

 https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e214cab68cb34e77622b91113f7698cf137bbdd6

 Alas! Upstream already has fixed it, but just forgot to pull it
 up to gcc-12 and -11!

 With cherry-picking the above commit, landisk kernel and userland
 successfully build. Now, I'm carrying out full ATF run and compiling
 some pkgsrc's on my machines.

 If there's no regression, I'll commit it to gcc, as well as
 gcc.old (and send pull up request to netbsd-10).

 Thanks, and sorry for frightening you,
 rin

State-Changed-From-To: open->analyzed
State-Changed-By: rin@NetBSD.org
State-Changed-When: Thu, 11 Jul 2024 13:50:00 +0000
State-Changed-Why:
Not a GCC/x86_64 bug. Now, testing upstream fix from development branch.


From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Thu, 11 Jul 2024 19:00:41 +0300

 On Thu, Jul 11, 2024 at 13:40:02 +0000, Rin Okuyama wrote:

 >  https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e214cab68cb34e77622b91113f7698cf137bbdd6

 Ah, this is exactly what I suspected it to be (cf. the virtio driver
 bug above, where the C caller saw a prototype with int, but the C++
 callee was declared as returning bool).

 Thanks for tracking this down!

 -uwe

From: matthew green <mrg@eterna23.net>
To: Rin Okuyama <rokuyama.rk@gmail.com>
Cc: gnats-bugs@netbsd.org, toolchain-manager@netbsd.org,
    gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Fri, 12 Jul 2024 04:18:18 +1000

 > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e214cab68cb34e77622b91113f7698cf137bbdd6
 >
 > Alas! Upstream already has fixed it, but just forgot to pull it
 > up to gcc-12 and -11!
 >
 > With cherry-picking the above commit, landisk kernel and userland
 > successfully build. Now, I'm carrying out full ATF run and compiling
 > some pkgsrc's on my machines.
 >
 > If there's no regression, I'll commit it to gcc, as well as
 > gcc.old (and send pull up request to netbsd-10).
 >
 > Thanks, and sorry for frightening you,

 excellent news!  thank you for frightening yourself :-)


 .mrg.

State-Changed-From-To: analyzed->feedback
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Sun, 11 Aug 2024 08:08:22 +0000
State-Changed-Why:
fix was merged into gcc 12.


From: matthew green <mrg@eterna23.net>
To: gnats-bugs@netbsd.org, toolchain-manager@netbsd.org,
    gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
    Rin Okuyama <rokuyama.rk@gmail.com>
Cc: 
Subject: re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Sun, 11 Aug 2024 18:07:49 +1000

 i've committed the main GCC 12.x sh fix to -current.

 while looking at the recent GCC changes in gcc/config/sh/ i noticed these
 three that look interesting:

    https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f49267e1636872128249431e9e5d20c0908b7e8e

 which seems to fix some optimisation pass issues, and could be merged into
 our GCC 12 i think.

    https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=58b78cf068b3b24c11d7812a5f4de865e9cdb8b4

 which looks like the future has some code-size reduction coming, but
 maybe not until GCC 14 or 15...


 .mrg.

From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, toolchain-manager@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
 matthew green <mrg@NetBSD.org>
Cc: 
Subject: Re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Fri, 16 Aug 2024 20:35:44 +0900

 On 2024/08/11 17:07, matthew green wrote:
 > i've committed the main GCC 12.x sh fix to -current.

 Thanks!!

 > while looking at the recent GCC changes in gcc/config/sh/ i noticed these
 > three that look interesting:
 > 
 >     https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f49267e1636872128249431e9e5d20c0908b7e8e
 > 
 > which seems to fix some optimisation pass issues, and could be merged into
 > our GCC 12 i think.
 > 
 >     https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=58b78cf068b3b24c11d7812a5f4de865e9cdb8b4
 > 
 > which looks like the future has some code-size reduction coming, but
 > maybe not until GCC 14 or 15...

 I've confirmed that there's no new regression for ATF on landisk
 built with GCC 12.4, both with and without these changes.

 Also, some pkgsrc's can be built natively on system built with
 these upstream commits.

 Would it be better to cherry-pick these commits now?

 I believe we're ready to switch sh3 to gcc 12.4, anyway :)

 Thanks,
 rin

From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, toolchain-manager@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
 matthew green <mrg@NetBSD.org>
Cc: 
Subject: Re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Fri, 16 Aug 2024 22:10:17 +0900

 On 2024/08/16 20:35, Rin Okuyama wrote:
 > On 2024/08/11 17:07, matthew green wrote:
 >> i've committed the main GCC 12.x sh fix to -current.
 > 
 > Thanks!!

 Also, I've tested netbsd-10 branch with this diff back-ported to
 GCC 10.5; although no serious problem has been observed for GCC 10,
 it can be, precisely depending on host-compiler behavior.

 There was no regression for full ATF. I will commit it for gcc.old,
 and send pullup request to netbsd-10 with s/gcc.old/gcc/, if there's
 no objection.

 Thanks,
 rin

From: matthew green <mrg@eterna23.net>
To: rin@netbsd.org
Cc: gnats-bugs@netbsd.org, toolchain-manager@netbsd.org,
    gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Sat, 17 Aug 2024 15:21:31 +1000

 Rin Okuyama writes:
 > On 2024/08/16 20:35, Rin Okuyama wrote:
 > > On 2024/08/11 17:07, matthew green wrote:
 > >> i've committed the main GCC 12.x sh fix to -current.
 > > 
 > > Thanks!!
 >
 > Also, I've tested netbsd-10 branch with this diff back-ported to
 > GCC 10.5; although no serious problem has been observed for GCC 10,
 > it can be, precisely depending on host-compiler behavior.
 >
 > There was no regression for full ATF. I will commit it for gcc.old,
 > and send pullup request to netbsd-10 with s/gcc.old/gcc/, if there's
 > no objection.

 for GCC 10 there shouldn't be a problem.

 the relevant function was changed from "int" to "bool" return
 between 10 and 12, but the sh backend had an awful ugly "extern"
 in the .cc file instead of including the right header. [*]

 so while the problem exists with GCC 10, the "awful ugly" part
 is at least identical and does not cause a problem.

 ie, i don't object to a pullup/etc but i don't think it matters.


 .mrg.

 [*] eg, on x86 this would cause the caller to check 32 bits of
     the register, when the callee now only set 8 bits.
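
A minimal sketch of why including the right header avoids this class of
bug, assuming a hypothetical function name (it is not the one from the sh
backend): once the real prototype is visible in the translation unit, a
stale local "extern" with a different return type no longer compiles.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the prototype a shared header would provide. */
bool operand_is_zero(int value);

/*
 * Adding the stale declaration
 *     extern int operand_is_zero(int value);
 * here is rejected by the compiler as a conflicting declaration, because
 * the real prototype is visible in the same translation unit.
 */

bool operand_is_zero(int value)
{
	return value == 0;
}

/* Caller that relies on the checked prototype above. */
int classify(int value)
{
	return operand_is_zero(value) ? 0 : 1;
}

/* Hypothetical self-check of the caller. */
void check_classify(void)
{
	assert(classify(0) == 0);
	assert(classify(42) == 1);
}
```

When the mismatching declaration lives only in a different source file from
the definition, neither compilation sees both, so the disagreement about
the return width goes undiagnosed.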

From: Rin Okuyama <rokuyama.rk@gmail.com>
To: matthew green <mrg@NetBSD.org>
Cc: gnats-bugs@netbsd.org, toolchain-manager@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Sat, 17 Aug 2024 20:03:22 +0900

 On 2024/08/17 14:21, matthew green wrote:
 > for GCC 10 there shouldn't be a problem.
 > 
 > the relevant function was changed from "int" to "bool" return
 > between 10 and 12, but the sh backend had an awful ugly "extern"
 > in the .cc file instead of including the right header. [*]
 > 
 > so while the problem exists with GCC 10, the "awful ugly" part
 > is at least identical and does not cause a problem.
 > 
 > ie, i don't object to a pullup/etc but i don't think it matters.

 Ah, you are right. I'd wrongly assumed that these functions are
 already bool for GCC 10.x. I will leave gcc.old as is.

 How about these?:

 On 2024/08/16 20:35, Rin Okuyama wrote:
  > On 2024/08/11 17:07, matthew green wrote:
  >> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f49267e1636872128249431e9e5d20c0908b7e8e
  >>
  >> which seems to fix some optimisation pass issues, and could be merged into
  >> our GCC 12 i think.
  >>
  >>
  >> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=58b78cf068b3b24c11d7812a5f4de865e9cdb8b4
  >>
  >> which looks like the future has some code-size reduction coming, but
  >> maybe not until GCC 14 or 15...
  >
  > I've confirmed that there's no new regression for ATF on landisk
  > built with GCC 12.4, both with and without these changes.
  >
  > Also, some pkgsrc's can be built natively on system built with
  > these upstream commits.

 Would it be better to cherry-pick before switching sh3 to 12.4?

 Thanks,
 rin

From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/58411 CVS commit: src/share/mk
Date: Mon, 23 Sep 2024 10:21:14 +0000

 Module Name:	src
 Committed By:	rin
 Date:		Mon Sep 23 10:21:14 UTC 2024

 Modified Files:
 	src/share/mk: bsd.own.mk

 Log Message:
 bsd.own.mk: Switch sh3 to GCC12

 No new regression observed for full ATF run on DIAGNOSTIC
 kernel for landisk.

 PR toolchain/58411


 To generate a diff of this commit:
 cvs rdiff -u -r1.1403 -r1.1404 src/share/mk/bsd.own.mk

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/58411 CVS commit: src/external/gpl3/gcc
Date: Mon, 23 Sep 2024 10:25:04 +0000

 Module Name:	src
 Committed By:	rin
 Date:		Mon Sep 23 10:25:04 UTC 2024

 Modified Files:
 	src/external/gpl3/gcc: README.gcc12

 Log Message:
 README.gcc12: Document sh3 switch

 Everything works just fine (at least for landisk) after
 PR toolchain/58411 fix.


 To generate a diff of this commit:
 cvs rdiff -u -r1.27 -r1.28 src/external/gpl3/gcc/README.gcc12

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:
