NetBSD Problem Report #57466

From www@netbsd.org  Mon Jun 12 22:29:47 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id D657E1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 12 Jun 2023 22:29:47 +0000 (UTC)
Message-Id: <20230612222946.E48B51A923D@mollari.NetBSD.org>
Date: Mon, 12 Jun 2023 22:29:46 +0000 (UTC)
From: jbglaw@lug-owl.de
Reply-To: jbglaw@lug-owl.de
To: gnats-bugs@NetBSD.org
Subject: Reproducible builds probably not as reproducible as we thought
X-Send-Pr-Version: www-1.0

>Number:         57466
>Category:       toolchain
>Synopsis:       Reproducible builds probably not as reproducible as we thought
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    toolchain-manager
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jun 12 22:30:00 +0000 2023
>Last-Modified:  Thu Jun 15 22:20:01 +0000 2023
>Originator:     Jan-Benedict Glaw
>Release:        current
>Organization:
>Environment:
Linux lili 5.16.0-4-amd64 #1 SMP PREEMPT Debian 5.16.12-1 (2022-03-08) x86_64 GNU/Linux
>Description:
I'm running CI builds and am currently trying to make the VAX builds reproducible. I usually cross-compile from Linux.

While doing so, I noticed that two builds from different source directories do not produce identical results. Working through the individual issues (in close contact with Christos), my impression is that reproducible builds currently only work when both builds use the same source directory. This is because -fdebug-prefix-map is used instead of -ffile-prefix-map: -fdebug-prefix-map only remaps paths in the DWARF debug info, not the file names that __FILE__ expands to.
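The difference between the two options can be illustrated with a toy Python model (not GCC code). It assumes GCC's documented split: -fdebug-prefix-map rewrites paths in debug info only, -fmacro-prefix-map rewrites what __FILE__ expands to, and -ffile-prefix-map implies both. The matching rule (plain string prefix, most recently given option wins) is our reading of GCC's behaviour and should be treated as an assumption; the directories and the `build` helper are hypothetical, and sources are assumed to be passed by absolute path.

```python
from dataclasses import dataclass, field

KNOWN = {"-fdebug-prefix-map", "-fmacro-prefix-map", "-ffile-prefix-map"}


@dataclass
class PrefixMaps:
    """Toy model of GCC's -f*-prefix-map options (not GCC's actual code)."""
    debug: list[tuple[str, str]] = field(default_factory=list)  # DWARF paths
    macro: list[tuple[str, str]] = field(default_factory=list)  # __FILE__ etc.

    def add(self, option: str) -> None:
        # "-fxxx-prefix-map=OLD=NEW": OLD ends at the first '=' after the name.
        name, _, mapping = option.partition("=")
        old, sep, new = mapping.partition("=")
        if name not in KNOWN or not sep:
            raise ValueError(f"unsupported or malformed option: {option}")
        if name != "-fmacro-prefix-map":  # debug or file map: affects DWARF
            self.debug.append((old, new))
        if name != "-fdebug-prefix-map":  # macro or file map: affects __FILE__
            self.macro.append((old, new))


def remap(path: str, maps: list[tuple[str, str]]) -> str:
    # Assumed rule: plain string-prefix match, most recently added map wins.
    for old, new in reversed(maps):
        if path.startswith(old):
            return new + path[len(old):]
    return path


def embedded_paths(source: str, maps: PrefixMaps) -> dict[str, str]:
    """Paths a hypothetical object file compiled from `source` would carry."""
    return {
        "__FILE__": remap(source, maps.macro),    # e.g. in assert() messages
        "DW_AT_name": remap(source, maps.debug),  # DWARF, only with -g
    }


def build(srcdir: str, option: str) -> dict[str, str]:
    """Hypothetical helper: one compile of bin/cat/cat.c from `srcdir`."""
    maps = PrefixMaps()
    maps.add(f"{option}={srcdir}=/usr/src")
    return embedded_paths(f"{srcdir}/bin/cat/cat.c", maps)


if __name__ == "__main__":
    for option in ("-fdebug-prefix-map", "-ffile-prefix-map"):
        a = build("/home/a/netbsd", option)
        b = build("/tmp/b/netbsd", option)
        print(option, "identical" if a == b else f"differs: {a} vs {b}")
```

In this model, with only -fdebug-prefix-map both builds record /usr/src/bin/cat/cat.c in the DWARF name, but the __FILE__ string still carries /home/a/netbsd or /tmp/b/netbsd, so the outputs differ; with -ffile-prefix-map both builds end up with identical paths.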
>How-To-Repeat:
Cross-build the same sources from two different source directories and compare the results.

Unfortunately, the script that checks reproducible builds (https://salsa.debian.org/qa/jenkins.debian.net/-/blob/master/bin/reproducible_netbsd.sh) seems to start all of its builds from the same directory, so it won't catch these issues.
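A check that catches this needs two builds from different source directories, followed by a file-by-file comparison of their outputs. Below is a minimal Python sketch of the comparison step only; it assumes both build outputs (for example, each build's release directory) are already on disk, and the function names are hypothetical. A tool such as diffoscope can then explain why a reported file differs.

```python
import hashlib
import os
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    digests: dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink() or not path.is_file():
                continue  # this sketch only compares regular files
            h = hashlib.sha256()
            with path.open("rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare_trees(a: Path, b: Path) -> dict[str, list[str]]:
    """List files whose contents differ, or that exist in only one tree."""
    da, db = tree_digests(a), tree_digests(b)
    return {
        "differ": sorted(p for p in da.keys() & db.keys() if da[p] != db[p]),
        "only_a": sorted(da.keys() - db.keys()),
        "only_b": sorted(db.keys() - da.keys()),
    }
```

Called as `compare_trees(Path("release-a"), Path("release-b"))` (placeholder directory names), it returns sorted lists of differing files and of files present in only one tree. Set tarballs are compared as opaque blobs, so a difference inside one shows up as a single differing archive.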
>Fix:
This patch (work in progress, do not apply yet!) solves most of the issue:

diff --git a/share/mk/bsd.sys.mk b/share/mk/bsd.sys.mk
index bc20ff87b096..0803f83ca0a9 100644
--- a/share/mk/bsd.sys.mk
+++ b/share/mk/bsd.sys.mk
@@ -20,22 +20,22 @@ error2:

 .if !empty(DESTDIR)
 CPPFLAGS+=     -Wp,-iremap,${DESTDIR}:
-REPROFLAGS+=   -fdebug-prefix-map=\$$DESTDIR=
+REPROFLAGS+=   -ffile-prefix-map=\$$DESTDIR=
 .endif

 CPPFLAGS+=     -Wp,-fno-canonical-system-headers
 CPPFLAGS+=     -Wp,-iremap,${NETBSDSRCDIR}:/usr/src
 CPPFLAGS+=     -Wp,-iremap,${X11SRCDIR}:/usr/xsrc

-REPROFLAGS+=   -fdebug-prefix-map=\$$NETBSDSRCDIR=/usr/src
-REPROFLAGS+=   -fdebug-prefix-map=\$$X11SRCDIR=/usr/xsrc
+REPROFLAGS+=   -ffile-prefix-map=\$$NETBSDSRCDIR=/usr/src
+REPROFLAGS+=   -ffile-prefix-map=\$$X11SRCDIR=/usr/xsrc
 .if defined(MAKEOBJDIRPREFIX)
 NETBSDOBJDIR=  ${MAKEOBJDIRPREFIX}${NETBSDSRCDIR}
 .endif

 .if defined(NETBSDOBJDIR)
 .export NETBSDOBJDIR
-REPROFLAGS+=   -fdebug-prefix-map=\$$NETBSDOBJDIR=/usr/obj
+REPROFLAGS+=   -ffile-prefix-map=\$$NETBSDOBJDIR=/usr/obj
 .endif

 LINTFLAGS+=    -R${NETBSDSRCDIR}=/usr/src -R${X11SRCDIR}=/usr/xsrc
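
In the fragment above, the /usr/obj mapping is only added when NETBSDOBJDIR is defined, and the fragment derives NETBSDOBJDIR solely from MAKEOBJDIRPREFIX. The Python model below mirrors just that conditional to show whether the compiler's working directory (the value recorded as DW_AT_comp_dir) ends up remapped. It assumes that `build.sh -M dir` sets MAKEOBJDIRPREFIX while `build.sh -O dir` sets MAKEOBJDIR instead, ignores the DESTDIR/X11SRCDIR maps and the `\$$VAR` escaping, and uses a hypothetical `comp_dir` helper with made-up paths.

```python
from typing import Optional


def remap(path: str, maps: list[tuple[str, str]]) -> str:
    # Assumed GCC rule: plain string-prefix match, most recently added map wins.
    for old, new in reversed(maps):
        if path.startswith(old):
            return new + path[len(old):]
    return path


def prefix_maps(env: dict[str, str]) -> list[tuple[str, str]]:
    """Simplified model of the NETBSDSRCDIR/NETBSDOBJDIR maps in bsd.sys.mk."""
    maps = [(env["NETBSDSRCDIR"], "/usr/src")]
    objdir = env.get("NETBSDOBJDIR")
    if "MAKEOBJDIRPREFIX" in env:
        # NETBSDOBJDIR=  ${MAKEOBJDIRPREFIX}${NETBSDSRCDIR}
        objdir = env["MAKEOBJDIRPREFIX"] + env["NETBSDSRCDIR"]
    if objdir is not None:
        # .if defined(NETBSDOBJDIR): map it to /usr/obj
        maps.append((objdir, "/usr/obj"))
    return maps


def comp_dir(srcdir: str, subdir: str, mode: str, objarg: Optional[str] = None) -> str:
    """Remapped DW_AT_comp_dir when compiling `subdir` (hypothetical helper).

    mode is "in-tree" (no objdir), "-M" (assumed to set MAKEOBJDIRPREFIX)
    or "-O" (assumed to set MAKEOBJDIR, which the fragment never consults).
    """
    env = {"NETBSDSRCDIR": srcdir}
    if mode == "in-tree":
        cwd = f"{srcdir}/{subdir}"
    elif mode in ("-M", "-O") and objarg is not None:
        if mode == "-M":
            env["MAKEOBJDIRPREFIX"] = objarg
            cwd = f"{objarg}{srcdir}/{subdir}"  # ${MAKEOBJDIRPREFIX}${.CURDIR}
        else:
            cwd = f"{objarg}/{subdir}"  # source prefix replaced by objdir (assumed)
    else:
        raise ValueError(f"unsupported mode/objarg: {mode!r}, {objarg!r}")
    return remap(cwd, prefix_maps(env))


if __name__ == "__main__":
    for mode, a_obj, b_obj in (("in-tree", None, None),
                               ("-M", "/home/a/m", "/tmp/b/m"),
                               ("-O", "/home/a/obj", "/tmp/b/obj")):
        a = comp_dir("/home/a/netbsd", "bin/cat", mode, a_obj)
        b = comp_dir("/tmp/b/netbsd", "bin/cat", mode, b_obj)
        print(f"{mode:8} {a:24} {b:24} {'same' if a == b else 'DIFFERENT'}")
```

In this model, in-tree builds record /usr/src/bin/cat and -M builds record /usr/obj/bin/cat regardless of the source directory, while with -O each build keeps its own objdir path in DW_AT_comp_dir.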



In addition, at least for VAX, I see unmapped DW_AT_comp_dir data when `-g` is in CFLAGS. I still have to check why it isn't mapped (judging by the code in gcc.old, it should be), but blocklistctl/blocklistd and netpgpverify are probably affected. (I'm currently building locally to test a workaround that simply drops `-g` from these programs.)
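To find out which installed files are affected, it may help to scan them for the build's absolute source or object paths, roughly what `strings | grep` would do. Below is a minimal Python sketch, assuming you know the source and object directories of the build under test; the function name and paths are hypothetical. Because it matches raw bytes, it flags paths in DWARF sections, __FILE__ strings and ubsan data alike, but it does not say where in the file they sit (diffoscope or `readelf --debug-dump=info` are better suited for that), and it cannot see into compressed set tarballs or compressed debug sections.

```python
import os
from pathlib import Path


def find_embedded_paths(root: Path, needles: list[str]) -> dict[str, list[str]]:
    """Return {relative file path: [needles found]} for regular files under root."""
    encoded = [(n, n.encode()) for n in needles]
    hits: dict[str, list[str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink() or not path.is_file():
                continue  # skip symlinks, devices, etc.
            data = path.read_bytes()  # whole file in memory: fine for a sketch
            found = [n for n, raw in encoded if raw in data]
            if found:
                hits[path.relative_to(root).as_posix()] = found
    return dict(sorted(hits.items()))
```

For example, `find_embedded_paths(Path("destdir"), ["/home/a/netbsd", "/home/a/obj"])` would list every file that still mentions either (placeholder) build directory.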

>Audit-Trail:
From: Jan-Benedict Glaw <jbglaw@lug-owl.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: toolchain/57466
Date: Tue, 13 Jun 2023 12:30:39 +0200

 Hi!

 As I ran into this issue while trying to get reproducible builds for
 VAX, here are some more notes:

   * With the previously mentioned patch (remap all file paths instead
     of only debug info), almost all of the VAX build is reproducible.
   * Notable deviations:
     * Everything built with `-g` in its CFLAGS has the current build
       directory (as DW_AT_comp_dir) in its object files. This affects
       blocklistctl/blocklistd and netpgpverify, but also (all of?) the
       kernel files.
     * The two (C and C++) ubsan tests. ubsan seems to insert the
       source filename into the object files on its own, without
       remapping it. (At least with gcc.old, which is used for VAX; I
       haven't checked the amd64 builds yet.)

 So we have three primary issues:

   1. Globally wrong CFLAGS that do not hide the source location as
      expected. This affects all ports.

   2. An issue with CWD mapping for the DW_AT_comp_dir attribute when
      `-g` is in use. I haven't looked at this yet; it might be a
      missing CFLAG or a compiler issue. If it is a compiler issue, it
      affects at least gcc.old; gcc is untested so far.

   3. ubsan (at least with gcc.old) inserting filenames as well. This
      still needs to be checked for gcc.

 At least the first issue cripples all ports. The second and third may
 be niche issues specific to gcc.old. But all of them can be solved.

 MfG, JBG

From: Jan-Benedict Glaw <jbglaw@lug-owl.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: toolchain/57466
Date: Tue, 13 Jun 2023 14:26:04 +0200

 And a note regarding DW_AT_comp_dir: there seems to be code in
 bsd.sys.mk to cover it, but it only takes effect when MAKEOBJDIRPREFIX
 is set. So for this to work, it is *not* enough to set an objdir with
 ./build.sh -O ...

 Should I run ./build.sh -M ... instead?

 (In any case, with ./build.sh -P, I would expect that variable to be
 set to a proper value when -O is given...)

 MfG, JBG

From: Jan-Benedict Glaw <jbglaw@lug-owl.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: toolchain/57466
Date: Wed, 14 Jun 2023 09:35:06 +0200

 Hi!

 Having done more builds, my current conclusion is that, at the very
 least, supplying -O <somedir> to build.sh breaks reproducible builds.
 I'll do another round trying -M <somedir>.

 So, if you want a reproducible result, do _not_ add -O <somedir> to
 the build process. That should probably be fixed nonetheless, I guess?

 Thanks,
   Jan-Benedict

From: Jan-Benedict Glaw <jbglaw@lug-owl.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: toolchain/57466
Date: Fri, 16 Jun 2023 00:16:46 +0200

 Hi!

 Just to keep my observations on record:

   * Supplying -O <dir> to build.sh leads to differences all over,
     most of which can be dealt with by the patch I suggested.

   * Supplying -M <dir> instead works, also in combination with the
     suggested patch. With both, the differences in the "comp" set
     mentioned below vanish.

   * Without -O (and possibly with -M), things mostly work out quite
     well. I'm testing with vax and amd64 builds; both have similar
     remaining differences where the source path shows up in the
     binaries. (The repro build script doesn't catch these, as it seems
     to start both of its builds from the same source directory.)

       * The "tests" set has differences in the two ubsan tests. GCC's
         ubsan code seems to add the source filename, but that is caught
         neither by the remapping regexps nor by the -fdebug maps, and
         not even by the -ffile maps I suggested. This needs a proper
         look at GCC; I suspect it outputs an unmapped path.

       * The "comp" set has differences where (full) source filenames
         show up in the binaries (lto-dump, cc1, cc1obj, cc1objplus,
         cc1plus, lto1).


 So to keep object files out of the source tree, -M or -O may be used.
 With -P active, -O must not be used, while -M is okay. This needs at
 least a documentation update, a check in build.sh that forbids -O
 together with -P, or the suggested patch to make it work. With the
 patch and -M, the only remaining differences (regarding embedded
 paths) are in the "tests" set, for the ubsan test cases.

 MfG, JBG

Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.