NetBSD Problem Report #58411
From www@netbsd.org Wed Jul 10 06:46:28 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits)
client-signature RSA-PSS (2048 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id A71C41A9238
for <gnats-bugs@gnats.NetBSD.org>; Wed, 10 Jul 2024 06:46:28 +0000 (UTC)
Message-Id: <20240710064627.608C81A9239@mollari.NetBSD.org>
Date: Wed, 10 Jul 2024 06:46:27 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
X-Send-Pr-Version: www-1.0
>Number: 58411
>Notify-List: uwe@NetBSD.org
>Category: toolchain
>Synopsis: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: toolchain-manager
>State: feedback
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Jul 10 06:50:00 +0000 2024
>Closed-Date:
>Last-Modified: Mon Sep 23 10:30:02 +0000 2024
>Originator: Rin Okuyama
>Release: 10.99.11 and netbsd-10
>Organization:
Internet Initiative Japan Inc.
>Environment:
NetBSD sakaizumii.local 10.99.11 NetBSD 10.99.11 (AMD64_NET_MPSAFE) #4: Wed Jul 3 14:55:32 JST 2024 rin@sakaizumii.local:/home/rin/src/sys/arch/amd64/compile/AMD64_NET_MPSAFE
>Description:
I've been troubled by wrong binaries generated by GCC/sh3 12.
This turned out to be, surprisingly enough, due to GCC/x86_64!!
If GCC/sh3 12.4 is built by the GCC/x86_64 11.4 shipped with Ubuntu 22.04
(or by our in-tree GCC/powerpc 12.4), it generates a working GENERIC
kernel for landisk. On the other hand, if it is compiled by our in-tree
GCC/x86_64, it drops some `tst` insns by mistake:
https://mail-index.netbsd.org/tech-toolchain/2024/07/02/msg004458.html
and the generated kernel does not work, of course.
In the output of `cc --version --verbose`, `-mtune=nocona` appears for
our compiler and `-mtune=generic` for Ubuntu's.
(The nocona setting comes from bsd.own.mk.)
If `env HOST_CFLAGS='-O -mtune=nocona' HOST_CXXFLAGS='-O -mtune=nocona'`
is used to invoke build.sh, GCC/sh3 built on Ubuntu miscompiles the landisk
GENERIC kernel in a similar manner to ours.
Also, if
`env HOST_CFLAGS='-O0 -mtune=generic' HOST_CXXFLAGS='-O -mtune=generic'`
is used, our GCC/x86_64 10.5 seems to generate a working GCC/sh3 12.4.
However, unfortunately, `-mtune=generic` is not sufficient for
GCC/x86_64 12.4. Even if this option is specified, the generated GCC/sh3
binary is broken.
Our GCC/x86_64 miscompiles insn-preds.cc (generated from sh.md).
If `#pragma GCC optimize ("O0")` is added to this file, the generated
GCC/sh3 seems to work even with GCC/x86_64 12.4.
GCC/x86_64 uses `sete %al` (or equivalent) to prepare return values
from bool functions, *without* zero extension. As a result, return
values change quasi-nondeterministically, depending on the untouched
upper (24+32) bits of the %rax register.
What I still don't understand is why this bug does not affect us
more severely. As far as I can tell, GCC/x86_64 mostly works fine;
it successfully generates working GCC for platforms other
than sh3.
Either something is wrong specifically with GCC/sh3, or we've just
overlooked other serious problems :(
>How-To-Repeat:
Described above.
>Fix:
N/A
>Release-Note:
>Audit-Trail:
From: matthew green <mrg@eterna23.net>
To: gnats-bugs@netbsd.org
Cc: toolchain-manager@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Wed, 10 Jul 2024 18:16:36 +1000
looking at config/i386/x86-tune-costs.h nocona, i see that between
gcc10 and gcc12 this gained some additional things:
struct processor_costs nocona_cost = {
[ ... ]
20, 12, /* mask->integer and integer->mask moves */
{4, 4, 4}, /* cost of loading mask register
in QImode, HImode, SImode. */
{4, 4, 4}, /* cost if storing mask register
in QImode, HImode, SImode. */
2, /* cost of moving mask register. */
which might be related (the mask/int parts?)
for generic_cost, the change is:
6, 6, /* mask->integer and integer->mask moves */
{6, 6, 6}, /* cost of loading mask register
in QImode, HImode, SImode. */
{6, 6, 6}, /* cost if storing mask register
in QImode, HImode, SImode. */
2, /* cost of moving mask register. */
.mrg.
From: Harold Gutch <logix@foobar.franken.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Wed, 10 Jul 2024 12:06:21 +0200
Hi,
just as an additional data point, I did the following on Ubuntu 24.04,
which ships with "gcc version 13.2.0 (Ubuntu 13.2.0-23ubuntu4)" with a
current tree from 2024-07-06 (so four days ago):
1a)
$ ./build.sh -u -U -O ../../bugs/58411 -m dreamcast -j 16 tools
1b)
$ tooldir.Linux-6.8.0-36-generic-x86_64/bin/shle--netbsdelf-gcc-10.5.0 -O1 -c test.c
$ tooldir.Linux-6.8.0-36-generic-x86_64/bin/shle--netbsdelf-objdump -d test.o
[...]
00000000 <func>:
0: 48 24 tst r4,r4
2: 06 89 bt 12 <func+0x12>
4: 43 60 mov r4,r0
6: 01 88 cmp/eq #1,r0
8: 04 8f bf.s 14 <func+0x14>
a: 01 e0 mov #1,r0
c: 04 90 mov.w 18 <func+0x18>,r0 ! beaf
e: 0b 00 rts
10: 09 00 nop
12: 02 90 mov.w 1a <func+0x1a>,r0 ! dead
14: 0b 00 rts
16: 09 00 nop
18: af be bsr fffffd7a <func+0xfffffd7a>
1a: ad de mov.l 2d0 <func+0x2d0>,r14
2a)
$ ./build.sh -u -U -O ../../bugs/58411 -m dreamcast -j 16 -VHAVE_GCC=12 tools
2b)
$ tooldir.Linux-6.8.0-36-generic-x86_64/bin/shle--netbsdelf-gcc-12.4.0 -O1 -c test.c
$ tooldir.Linux-6.8.0-36-generic-x86_64/bin/shle--netbsdelf-objdump -d test.o
[...]
00000000 <func>:
0: 04 8b bf c <func+0xc>
2: 43 60 mov r4,r0
4: 01 88 cmp/eq #1,r0
6: 04 89 bt 12 <func+0x12>
8: 0b 00 rts
a: 01 e0 mov #1,r0
c: 04 90 mov.w 18 <func+0x18>,r0 ! dead
e: 0b 00 rts
10: 09 00 nop
12: 02 90 mov.w 1a <func+0x1a>,r0 ! beaf
14: 0b 00 rts
16: 09 00 nop
18: ad de mov.l 2d0 <func+0x2d0>,r14
1a: af be bsr fffffd7c <func+0xfffffd7c>
So it appears that this behavior is not specific to our host gcc
but also appears on other platforms.
Harold
From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Wed, 10 Jul 2024 13:44:51 +0300
That reminded me about a bug I once stumbled into:
https://github.com/virtio-win/kvm-guest-drivers-windows/issues/87
Given that gcc is now written, afaik, in a mix of C and C++, it makes
me wonder...
-uwe
From: matthew green <mrg@eterna23.net>
To: gnats-bugs@netbsd.org
Cc: uwe@NetBSD.org, toolchain-manager@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, rokuyama.rk@gmail.com
Subject: re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Thu, 11 Jul 2024 11:14:17 +1000
> That reminded me about a bug I once stumbled into:
>
> https://github.com/virtio-win/kvm-guest-drivers-windows/issues/87
>
> Given that gcc is now written, afaik, in a mix of C and C++, it makes
> me wonder...
all of GCC proper compiles with a c++ compiler, except for the
things like libiberty and libgmp etc. between gcc 10 and 12
they finally renamed all the .c files to .cc, meaning a minor
hack for us could be removed :)
.mrg.
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, toolchain-manager@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Thu, 11 Jul 2024 22:36:20 +0900
This is not a GCC/amd64 bug! I should have read the x86_64 ELF ABI.
It says:
> When a value of type _Bool is returned or passed in a register or
> on the stack, bit 0 contains the truth value and bits 1 to 7
> shall be zero ^14.
> ...
> ^14 Other bits are left unspecified, hence the consumer side of
> those values can rely on it being 0 or 1 when truncated to 8 bit.
So, returning a bool value in %al is completely legal, and it is
the caller's responsibility to zero-extend it to %rax (or %eax or so).
This strongly suggests that there may be bugs on the caller side.
And I've finally found it:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e214cab68cb34e77622b91113f7698cf137bbdd6
Alas! Upstream has already fixed it, but just forgot to pull it
up to gcc-12 and -11!
With the above commit cherry-picked, the landisk kernel and userland
build successfully. Now I'm carrying out a full ATF run and building
some pkgsrc packages on my machines.
If there's no regression, I'll commit it to gcc, as well as
gcc.old (and send a pullup request to netbsd-10).
Thanks, and sorry for frightening you,
rin
State-Changed-From-To: open->analyzed
State-Changed-By: rin@NetBSD.org
State-Changed-When: Thu, 11 Jul 2024 13:50:00 +0000
State-Changed-Why:
Not a GCC/x86_64 bug. Now testing the upstream fix from the development branch.
From: Valery Ushakov <uwe@stderr.spb.ru>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Thu, 11 Jul 2024 19:00:41 +0300
On Thu, Jul 11, 2024 at 13:40:02 +0000, Rin Okuyama wrote:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e214cab68cb34e77622b91113f7698cf137bbdd6
Ah, this is exactly what I suspected it to be (cf. the virtio driver
bug above, where the C caller saw a prototype with int, but the C++
callee was declared as returning bool).
Thanks for tracking this down!
-uwe
From: matthew green <mrg@eterna23.net>
To: Rin Okuyama <rokuyama.rk@gmail.com>
Cc: gnats-bugs@netbsd.org, toolchain-manager@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Fri, 12 Jul 2024 04:18:18 +1000
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e214cab68cb34e77622b91113f7698cf137bbdd6
>
> Alas! Upstream has already fixed it, but just forgot to pull it
> up to gcc-12 and -11!
>
> With the above commit cherry-picked, the landisk kernel and userland
> build successfully. Now I'm carrying out a full ATF run and building
> some pkgsrc packages on my machines.
>
> If there's no regression, I'll commit it to gcc, as well as
> gcc.old (and send a pullup request to netbsd-10).
>
> Thanks, and sorry for frightening you,
excellent news! thank you for frightening yourself :-)
.mrg.
State-Changed-From-To: analyzed->feedback
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Sun, 11 Aug 2024 08:08:22 +0000
State-Changed-Why:
fix was merged into gcc 12.
From: matthew green <mrg@eterna23.net>
To: gnats-bugs@netbsd.org, toolchain-manager@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
Rin Okuyama <rokuyama.rk@gmail.com>
Cc:
Subject: re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Sun, 11 Aug 2024 18:07:49 +1000
i've committed the main GCC 12.x sh fix to -current.
while looking at the recent GCC changes in gcc/config/sh/ i noticed these
three that look interesting:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f49267e1636872128249431e9e5d20c0908b7e8e
which seems to fix some optimisation pass issues, and could be merged into
our GCC 12 i think.
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=58b78cf068b3b24c11d7812a5f4de865e9cdb8b4
which looks like the future has some code-size reduction coming, but
maybe not until GCC 14 or 15...
.mrg.
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, toolchain-manager@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
matthew green <mrg@NetBSD.org>
Cc:
Subject: Re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Fri, 16 Aug 2024 20:35:44 +0900
On 2024/08/11 17:07, matthew green wrote:
> i've committed the main GCC 12.x sh fix to -current.
Thanks!!
> while looking at the recent GCC changes in gcc/config/sh/ i noticed these
> three that look interesting:
>
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f49267e1636872128249431e9e5d20c0908b7e8e
>
> which seems to fix some optimisation pass issues, and could be merged into
> our GCC 12 i think.
>
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=58b78cf068b3b24c11d7812a5f4de865e9cdb8b4
>
> which looks like the future has some code-size reduction coming, but
> maybe not until GCC 14 or 15...
I've confirmed that there's no new regression for ATF on landisk
built with GCC 12.4, both with and without these changes.
Also, some pkgsrc packages can be built natively on a system built with
these upstream commits.
Would it be better to cherry-pick these commits now?
I believe we're ready to switch sh3 to gcc 12.4, anyway :)
Thanks,
rin
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, toolchain-manager@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
matthew green <mrg@NetBSD.org>
Cc:
Subject: Re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Fri, 16 Aug 2024 22:10:17 +0900
On 2024/08/16 20:35, Rin Okuyama wrote:
> On 2024/08/11 17:07, matthew green wrote:
>> i've committed the main GCC 12.x sh fix to -current.
>
> Thanks!!
Also, I've tested the netbsd-10 branch with this diff back-ported to
GCC 10.5; although no serious problem has been observed with GCC 10,
one could occur, depending precisely on the host compiler's behavior.
There was no regression for full ATF. I will commit it for gcc.old,
and send a pullup request to netbsd-10 with s/gcc.old/gcc/, if there's
no objection.
Thanks,
rin
From: matthew green <mrg@eterna23.net>
To: rin@netbsd.org
Cc: gnats-bugs@netbsd.org, toolchain-manager@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Sat, 17 Aug 2024 15:21:31 +1000
Rin Okuyama writes:
> On 2024/08/16 20:35, Rin Okuyama wrote:
> > On 2024/08/11 17:07, matthew green wrote:
> >> i've committed the main GCC 12.x sh fix to -current.
> >
> > Thanks!!
>
> Also, I've tested the netbsd-10 branch with this diff back-ported to
> GCC 10.5; although no serious problem has been observed with GCC 10,
> one could occur, depending precisely on the host compiler's behavior.
>
> There was no regression for full ATF. I will commit it for gcc.old,
> and send a pullup request to netbsd-10 with s/gcc.old/gcc/, if there's
> no objection.
for GCC 10 there shouldn't be a problem.
the relevant function was changed from "int" to "bool" return
between 10 and 12, but the sh backend had an awful ugly "extern"
in the .cc file instead of including the right header. [*]
so while the problem exists with GCC 10, the "awful ugly" part
is at least identical and does not cause a problem.
ie, i don't object to a pullup/etc but i don't think it matters.
.mrg.
[*] eg, on x86 this would cause the caller to check 32 bits of
the register, when the callee now only set 8 bits.
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: matthew green <mrg@NetBSD.org>
Cc: gnats-bugs@netbsd.org, toolchain-manager@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: toolchain/58411: GCC/x86_64 10.5 and 12.4 miscompile GCC/sh3 12.4
Date: Sat, 17 Aug 2024 20:03:22 +0900
On 2024/08/17 14:21, matthew green wrote:
> for GCC 10 there shouldn't be a problem.
>
> the relevant function was changed from "int" to "bool" return
> between 10 and 12, but the sh backend had an awful ugly "extern"
> in the .cc file instead of including the right header. [*]
>
> so while the problem exists with GCC 10, the "awful ugly" part
> is at least identical and does not cause a problem.
>
> ie, i don't object to a pullup/etc but i don't think it matters.
Ah, you are right. I'd wrongly assumed that these functions are
already bool for GCC 10.x. I will leave gcc.old as is.
How about these?:
On 2024/08/16 20:35, Rin Okuyama wrote:
> On 2024/08/11 17:07, matthew green wrote:
>> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f49267e1636872128249431e9e5d20c0908b7e8e
>>
>> which seems to fix some optimisation pass issues, and could be merged into
>> our GCC 12 i think.
>>
>> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=58b78cf068b3b24c11d7812a5f4de865e9cdb8b4
>>
>> which looks like the future has some code-size reduction coming, but
>> maybe not until GCC 14 or 15...
>
> I've confirmed that there's no new regression for ATF on landisk
> built with GCC 12.4, both with and without these changes.
>
> Also, some pkgsrc packages can be built natively on a system built with
> these upstream commits.
Would it be better to cherry-pick them before switching sh3 to 12.4?
Thanks,
rin
From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/58411 CVS commit: src/share/mk
Date: Mon, 23 Sep 2024 10:21:14 +0000
Module Name: src
Committed By: rin
Date: Mon Sep 23 10:21:14 UTC 2024
Modified Files:
src/share/mk: bsd.own.mk
Log Message:
bsd.own.mk: Switch sh3 to GCC12
No new regression observed for full ATF run on DIAGNOSTIC
kernel for landisk.
PR toolchain/58411
To generate a diff of this commit:
cvs rdiff -u -r1.1403 -r1.1404 src/share/mk/bsd.own.mk
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/58411 CVS commit: src/external/gpl3/gcc
Date: Mon, 23 Sep 2024 10:25:04 +0000
Module Name: src
Committed By: rin
Date: Mon Sep 23 10:25:04 UTC 2024
Modified Files:
src/external/gpl3/gcc: README.gcc12
Log Message:
README.gcc12: Document sh3 switch
Everything works just fine (at least for landisk) after
PR toolchain/58411 fix.
To generate a diff of this commit:
cvs rdiff -u -r1.27 -r1.28 src/external/gpl3/gcc/README.gcc12
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.