NetBSD Problem Report #57466
From www@netbsd.org Mon Jun 12 22:29:47 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id D657E1A9239
for <gnats-bugs@gnats.NetBSD.org>; Mon, 12 Jun 2023 22:29:47 +0000 (UTC)
Message-Id: <20230612222946.E48B51A923D@mollari.NetBSD.org>
Date: Mon, 12 Jun 2023 22:29:46 +0000 (UTC)
From: jbglaw@lug-owl.de
Reply-To: jbglaw@lug-owl.de
To: gnats-bugs@NetBSD.org
Subject: Reproducible builds probably not as reproducible as we thought
X-Send-Pr-Version: www-1.0
>Number: 57466
>Category: toolchain
>Synopsis: Reproducible builds probably not as reproducible as we thought
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: toolchain-manager
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Jun 12 22:30:00 +0000 2023
>Last-Modified: Thu Jun 15 22:20:01 +0000 2023
>Originator: Jan-Benedict Glaw
>Release: current
>Organization:
>Environment:
Linux lili 5.16.0-4-amd64 #1 SMP PREEMPT Debian 5.16.12-1 (2022-03-08) x86_64 GNU/Linux
>Description:
I'm running CI builds and am currently trying to make VAX builds reproducible. I usually cross-compile from Linux.
While doing so, I noticed that two builds from different source directories do not produce the same result. Having worked through the individual issues (in close contact with Christos), my impression is that builds are currently reproducible only when the sources are in the same directory. The cause is that -fdebug-prefix-map is used instead of -ffile-prefix-map: the former remaps only the paths in the DWARF debug info, not the names that __FILE__ expands to.
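The difference between the two flags can be illustrated with a small experiment. This is only a sketch, assuming a host compiler that knows -ffile-prefix-map (GCC 8 or later, Clang 10 or later); the function name show_file_macro and the file hello.c are made up for illustration and are not part of the NetBSD tree:

```shell
# show_file_macro FLAG
# Hypothetical helper: compiles a tiny program that prints __FILE__,
# remapping its (temporary) source directory to /usr/src with FLAG,
# which is expected to be -fdebug-prefix-map or -ffile-prefix-map.
show_file_macro() {
    flag=$1
    work=$(mktemp -d "${TMPDIR:-/tmp}/prefixmap.XXXXXX") || return 1
    mkdir "$work/src"
    cat > "$work/src/hello.c" <<'EOF'
#include <stdio.h>
int main(void) { puts(__FILE__); return 0; }
EOF
    # The source file is passed with its absolute path, so __FILE__
    # starts with the build directory unless the flag remaps it.
    "${CC:-cc}" -g "$flag=$work/src=/usr/src" \
        -o "$work/hello" "$work/src/hello.c" && "$work/hello"
    status=$?
    rm -rf "$work"
    return $status
}
```

Calling `show_file_macro -fdebug-prefix-map` should print the temporary build path, while `show_file_macro -ffile-prefix-map` should print a path under /usr/src, because only the latter also covers __FILE__.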
>How-To-Repeat:
Cross-build from different directories.
Unfortunately, the script used to check reproducible builds (https://salsa.debian.org/qa/jenkins.debian.net/-/blob/master/bin/reproducible_netbsd.sh) seems to start all builds from the same directory, so it won't catch this issue.
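To catch this class of problem, the two builds have to start from different source directories, and the resulting trees then need to be compared file by file. A minimal sketch of the comparison step, assuming two output trees that have already been built; the function name list_tree_differences is hypothetical and this is not what the Debian script does:

```shell
# list_tree_differences DIR_A DIR_B
# Prints the relative path of every regular file under DIR_A whose
# counterpart under DIR_B is missing or has different contents.
# Files that exist only under DIR_B are not reported.
list_tree_differences() {
    dir_a=$1
    dir_b=$2
    (cd "$dir_a" && find . -type f | sort) | while IFS= read -r rel; do
        # cmp -s is silent and exits non-zero on any difference,
        # including a missing file on either side.
        if ! cmp -s "$dir_a/$rel" "$dir_b/$rel"; then
            printf '%s\n' "${rel#./}"
        fi
    done
}
```

With the results of two builds unpacked into, say, /tmp/build-a and /tmp/build-b (example paths only), `list_tree_differences /tmp/build-a /tmp/build-b` lists the files that still differ.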
>Fix:
This patch (WIP --- do not apply yet!) solves most of the issue:
diff --git a/share/mk/bsd.sys.mk b/share/mk/bsd.sys.mk
index bc20ff87b096..0803f83ca0a9 100644
--- a/share/mk/bsd.sys.mk
+++ b/share/mk/bsd.sys.mk
@@ -20,22 +20,22 @@ error2:
.if !empty(DESTDIR)
CPPFLAGS+= -Wp,-iremap,${DESTDIR}:
-REPROFLAGS+= -fdebug-prefix-map=\$$DESTDIR=
+REPROFLAGS+= -ffile-prefix-map=\$$DESTDIR=
.endif
CPPFLAGS+= -Wp,-fno-canonical-system-headers
CPPFLAGS+= -Wp,-iremap,${NETBSDSRCDIR}:/usr/src
CPPFLAGS+= -Wp,-iremap,${X11SRCDIR}:/usr/xsrc
-REPROFLAGS+= -fdebug-prefix-map=\$$NETBSDSRCDIR=/usr/src
-REPROFLAGS+= -fdebug-prefix-map=\$$X11SRCDIR=/usr/xsrc
+REPROFLAGS+= -ffile-prefix-map=\$$NETBSDSRCDIR=/usr/src
+REPROFLAGS+= -ffile-prefix-map=\$$X11SRCDIR=/usr/xsrc
.if defined(MAKEOBJDIRPREFIX)
NETBSDOBJDIR= ${MAKEOBJDIRPREFIX}${NETBSDSRCDIR}
.endif
.if defined(NETBSDOBJDIR)
.export NETBSDOBJDIR
-REPROFLAGS+= -fdebug-prefix-map=\$$NETBSDOBJDIR=/usr/obj
+REPROFLAGS+= -ffile-prefix-map=\$$NETBSDOBJDIR=/usr/obj
.endif
LINTFLAGS+= -R${NETBSDSRCDIR}=/usr/src -R${X11SRCDIR}=/usr/xsrc
In addition, (at least for VAX) I see unmapped DW_AT_comp_dir data when `-g` is in CFLAGS. I still have to check why it is not mapped (judging from the code in gcc.old, it should be), but blocklistctl/blocklistd and netpgpverify are probably affected. (I'm currently building locally to test a workaround that simply drops `-g` for these programs.)
>Audit-Trail:
From: Jan-Benedict Glaw <jbglaw@lug-owl.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: toolchain/57466
Date: Tue, 13 Jun 2023 12:30:39 +0200
Hi!
As I got to this issue while trying to get reproducible builds for
VAX, here are some more notes:
* With the previously mentioned patch (map all files instead of only
  debug info), almost all VAX stuff is reproducible.
* Notable deviations:
  * Everything built with `-g` in its CFLAGS will have the current
    build directory (for DW_AT_comp_dir) in the object files. This
    affects blocklistctl/blocklistd and netpgpverify, but also (all
    of?) the kernel files.
  * The two (C + C++) ubsan tests. It seems ubsan inserts the source
    filename into the object files on its own, without mapping it.
    (At least for gcc.old, which is used for VAX. I haven't checked
    the amd64 builds yet.)
So we have three primary issues:
1. Globally wrong CFLAGS that do not hide the source location as
   expected. This will affect all ports.
2. An issue with CWD mapping for the DW_AT_comp_dir attribute when
   `-g` is in place. I haven't looked at this yet; it might be a
   missing CFLAG or a compiler issue. If it's a compiler issue, it
   affects at least gcc.old; gcc is untested as of now.
3. ubsan (at least with gcc.old) inserting filenames as well. This
   still needs to be checked for gcc.
At least the first issue cripples all ports. The second and third may
be niche issues specific to gcc.old. But all of these can be solved.
Kind regards, JBG
From: Jan-Benedict Glaw <jbglaw@lug-owl.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: toolchain/57466
Date: Tue, 13 Jun 2023 14:26:04 +0200
And a note wrt. DW_AT_comp_dir: It seems there's code in bsd.sys.mk to
cover that, but only when MAKEOBJDIRPREFIX is set. So for this to
work, it is *not* enough to set an objdir with ./build.sh -O ...
Should I invoke it as ./build.sh -M ... instead?
(In any case, with ./build.sh -P, I'd expect that variable to be
set to a proper value when -O is given...)
Kind regards, JBG
From: Jan-Benedict Glaw <jbglaw@lug-owl.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: toolchain/57466
Date: Wed, 14 Jun 2023 09:35:06 +0200
Hi!
Having done more builds, my current conclusion is that at least
supplying -O <somedir> to build.sh will break reproducible builds.
Will do another round with trying -M <somedir>.
So, if you want to get something that's reproducible, do _not_ add
-O <somedir> to the build process. That should probably be fixed
nonetheless, I guess?
Thanks,
Jan-Benedict
From: Jan-Benedict Glaw <jbglaw@lug-owl.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: toolchain/57466
Date: Fri, 16 Jun 2023 00:16:46 +0200
Hi!
Just to keep my observations on record:
* Supplying -O <dir> to build.sh leads to differences all over,
  most of which can be dealt with by using the patch I suggested.
* Supplying -M <dir> instead works. It also works with the suggested
  patch; then the differences for the "comp" set mentioned below
  vanish.
* Without -O (and possibly using -M) things mostly work out quite
  well. I'm using vax and amd64 builds for my testing; both have
  similar remaining differences where the source path shows up
  in the binaries (which isn't caught by the repro build script, as
  it seems to start off with the same source dir for its two builds).
* The "tests" set has differences in the two ubsan tests. GCC's
  ubsan code seems to add the source's filename, and that is caught
  neither by the remapping regexps, nor by the -fdebug maps, nor
  even by the -ffile maps I suggested. This needs a proper look at
  GCC; I suspect it's outputting an unmapped path.
* The "comp" set has differences, where (full) source filenames
  show up in the binaries (lto-dump, cc1, cc1obj, cc1objplus,
  cc1plus, lto1).
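One way to locate such leftover paths is to search the built files for the source directory string. A rough sketch, assuming the source directory of the build is known and that the path is stored uncompressed; find_embedded_path is a hypothetical helper name, not an existing NetBSD tool:

```shell
# find_embedded_path DIR PREFIX
# Lists every regular file under DIR that contains the literal
# string PREFIX (for example the source directory used for the build).
find_embedded_path() {
    # LC_ALL=C makes grep treat the data as plain bytes; -F matches a
    # fixed string, and -l prints only file names, which also works
    # for binary files.
    LC_ALL=C find "$1" -type f -exec grep -l -F -e "$2" {} + | sort
}
```

For example, `find_embedded_path /path/to/destdir /home/ci/src-a` (both paths are placeholders) would name the binaries that still carry the build's source path.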
So to keep object files out of the source tree, -M or -O may be used.
With -P active, -O must not be used, while -M is okay. This needs at
least one of the following: a docs update, forbidding -O with -P in
build.sh, or applying the suggested patch to make it work. With the
patch and when using -M, the only remaining differences (wrt. embedded
paths) are in the "tests" set for the ubsan testcases.
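The "forbid -O with -P" option could look roughly like the following pre-flight check. This is a hypothetical sketch only: it is not build.sh's actual option parsing, it knows just the three options discussed here, and the function name check_repro_options is made up:

```shell
# check_repro_options [build.sh-style options...]
# Returns 0 if the options are acceptable, 1 if -P is combined with
# -O, and 2 for an option this sketch does not know about.
check_repro_options() {
    repro=no
    objdir_set=no
    OPTIND=1    # restart option parsing on every call
    while getopts 'M:O:P' opt; do
        case $opt in
        M) ;;                   # MAKEOBJDIRPREFIX-style objdir: fine
        O) objdir_set=yes ;;
        P) repro=yes ;;
        *) return 2 ;;
        esac
    done
    if [ "$repro" = yes ] && [ "$objdir_set" = yes ]; then
        echo "error: -O cannot be combined with -P; use -M instead" >&2
        return 1
    fi
    return 0
}
```

A wrapper script could call `check_repro_options "$@"` before handing the same arguments to build.sh, as long as it passes only these three options.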
Kind regards, JBG
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.