NetBSD Problem Report #51654
From dholland@netbsd.org Sat Nov 26 03:58:50 2016
Return-Path: <dholland@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 3AE7B7A2A4
for <gnats-bugs@gnats.NetBSD.org>; Sat, 26 Nov 2016 03:58:50 +0000 (UTC)
Message-Id: <20161126035849.BAC35855F6@mail.netbsd.org>
Date: Sat, 26 Nov 2016 03:58:49 +0000 (UTC)
From: dholland@NetBSD.org
Reply-To: dholland@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: wrong ps_strings breaks emacs20
X-Send-Pr-Version: 3.95
>Number: 51654
>Category: lib
>Synopsis: wrong ps_strings breaks emacs20
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: lib-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Nov 26 04:00:00 +0000 2016
>Closed-Date: Wed Feb 28 16:54:10 +0000 2018
>Last-Modified: Wed Feb 28 16:54:10 +0000 2018
>Originator: David A. Holland
>Release: NetBSD 7.99.42 (20161125)
>Organization:
>Environment:
System: NetBSD valkyrie 7.99.42 NetBSD 7.99.42 (VALKYRIE) #20: Fri Nov 25 18:33:19 EST 2016 dholland@valkyrie:/usr/src/sys/arch/amd64/compile/VALKYRIE amd64
Architecture: x86_64
Machine: amd64
>Description:
After updating emacs dumps core. After spending a long time barking up
the wrong PaX tree, it seems that the problem is that an invalid
pointer is being provided in __ps_strings and this causes _libc_init
to segv.
I stuck some debugging code into a copy of _libc_init and inserted it
with LD_PRELOAD, and found that __ps_strings is 0x7f7fffffffe0 while
the highest valid range in the address space in /proc/pid maps is
00007f7fffff0000-00007f7ffffff000 rw-p 0000000000000000 00:00 0
..................... 0x7f7fffffffe0 is off the end.
It looks to me like this is because the kernel is providing the wrong
ps_strings address, not because ld.elf_so is mucking about and
garbaging the value, but it's hard to tell because even at its most
verbose ktrace doesn't record this information. (That is itself a bug
and should get fixed.)
I haven't the slightest idea why this happens only with emacs but I
imagine it's a consequence of the emacs dump/undump mechanism somehow.
>How-To-Repeat:
You can build a new emacs binary that exhibits the same behavior as my
old one by building editors/emacs20-20.7nb21 (which I just committed,
with fixes for some other problems) on current amd64. Or at least, I
can. YMMV, but hopefully it's not just me.
>Fix:
Not being able to run my editor is making it rather difficult to do
stuff, so I'd appreciate help...
>Release-Note:
>Audit-Trail:
From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/51654: wrong ps_strings breaks emacs20
Date: Sat, 26 Nov 2016 06:49:24 +0100
On Sat, Nov 26, 2016 at 04:00:00AM +0000, dholland@NetBSD.org wrote:
> After updating emacs dumps core. After spending a long time barking up
> the wrong PaX tree, it seems that the problem is that an invalid
> pointer is being provided in __ps_strings and this causes _libc_init
> to segv.
The address space layout changed. ps_strings is placed at the very top,
without PAX it is at a fixed address.
> I haven't the slightest idea why this happens only with emacs but I
> imagine it's a consequence of the emacs dump/undump mechanism somehow.
Combine this with the changed MAXUSER address and now certain cached
locations no longer match.
Joerg
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51654: wrong ps_strings breaks emacs20
Date: Sat, 26 Nov 2016 06:14:39 +0000
On Sat, Nov 26, 2016 at 05:50:01AM +0000, Joerg Sonnenberger wrote:
> On Sat, Nov 26, 2016 at 04:00:00AM +0000, dholland@NetBSD.org wrote:
> > After updating emacs dumps core. After spending a long time barking up
> > the wrong PaX tree, it seems that the problem is that an invalid
> > pointer is being provided in __ps_strings and this causes _libc_init
> > to segv.
>
> The address space layout changed. ps_strings is placed at the very top,
> without PAX it is at a fixed address.
Turning off aslr with paxctl has no effect.
> > I haven't the slightest idea why this happens only with emacs but I
> > imagine it's a consequence of the emacs dump/undump mechanism somehow.
>
> Combine this with the changed MAXUSER address and now certain cached
> locations no longer match.
Cached in what? The value for __ps_strings is provided by the kernel
to crt0 and propagates to _libc_init (and causes it to croak) before
anything in emacs gets control.
--
David A. Holland
dholland@netbsd.org
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/51654: wrong ps_strings breaks emacs20
Date: Sat, 26 Nov 2016 06:20:43 +0000
On Sat, Nov 26, 2016 at 06:15:00AM +0000, David Holland wrote:
> > The address space layout changed. ps_strings is placed at the very top,
> > without PAX it is at a fixed address.
>
> Turning off aslr with paxctl has no effect.
I take it back: making sure that emacs is dumped from a temacs with
aslr disabled fixes the problem.
But I don't follow why. Also, this doesn't explain why my preexisting
emacs binary (which was built long before we started mucking with aslr
this year) doesn't work and fails in the same way.
Or did the default non-randomized layout also change, and by the same
unclear mechanism thereby cause things to stop working?
> > > I haven't the slightest idea why this happens only with emacs but I
> > > imagine it's a consequence of the emacs dump/undump mechanism somehow.
> >
> > Combine this with the changed MAXUSER address and now certain cached
> > locations no longer match.
>
> Cached in what? The value for __ps_strings is provided by the kernel
> to crt0 and propagates to _libc_init (and causes it to croak) before
> anything in emacs gets control.
Even if the prior value is saved when undumped, that should be
overwritten by a new value from the current execution, no?
--
David A. Holland
dholland@netbsd.org
From: "David A. Holland" <dholland@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/51654 CVS commit: pkgsrc/editors/emacs20
Date: Sat, 26 Nov 2016 06:38:06 +0000
Module Name: pkgsrc
Committed By: dholland
Date: Sat Nov 26 06:38:06 UTC 2016
Modified Files:
pkgsrc/editors/emacs20: Makefile distinfo
pkgsrc/editors/emacs20/patches: patch-af
Log Message:
Use paxctl +a like in emacs21 to make the build work on -current. It seems
that if you dump with a non-ASLR temacs you get a working emacs binary, and
if you don't you don't, although I don't really see why -- perhaps it's
something broken in crtstuff. Closes PR 51654.
Note that pre-ASLR emacs20 binaries not dumped by an ASLR temacs also
blow up in the same way, which doesn't make much sense either, but
undoubtedly it's all connected.
It's not particularly good that we apparently don't have backwards
compatibility for old Emacs binaries because of this, but for the time
being I'm more worried about it working at all.
PKGREVISION++ again, to 22.
To generate a diff of this commit:
cvs rdiff -u -r1.55 -r1.56 pkgsrc/editors/emacs20/Makefile
cvs rdiff -u -r1.33 -r1.34 pkgsrc/editors/emacs20/distinfo
cvs rdiff -u -r1.2 -r1.3 pkgsrc/editors/emacs20/patches/patch-af
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 26 Nov 2016 06:47:31 +0000
State-Changed-Why:
fixed
State-Changed-From-To: closed->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 26 Nov 2016 06:49:46 +0000
State-Changed-Why:
on reflection, let's keep this open until we understand the mechanism.
Also, it seems that the reason old binaries broke is that a few days ago
maxv unilaterally changed the stack top address:
http://mail-index.netbsd.org/source-changes-d/2016/11/24/msg008895.html
presumably via the same mechanism that somehow causes dumping to bake the
ps_strings value into crt0 or libc when it should be getting passed through
from the kernel.
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51654 (wrong ps_strings breaks emacs20)
Date: Sat, 26 Nov 2016 10:10:53 +0100
The only way I see we could provide backward compat is to mark binaries
like emacs with a special "fixed VA layout" note including a version
number (similar to the compiler memory model notes used on sparc64)
and then exec them with special compat VA layout.
Sane binaries, of course, don't care about this details.
Lots of compat code and rarely tested complexity for stupid binaries,
so probably better just ignore it.
Martin
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51654 (wrong ps_strings breaks emacs20)
Date: Sun, 27 Nov 2016 00:26:17 +0000
On Sat, Nov 26, 2016 at 09:15:01AM +0000, Martin Husemann wrote:
> The only way I see we could provide backward compat is to mark binaries
> like emacs with a special "fixed VA layout" note including a version
> number (similar to the compiler memory model notes used on sparc64)
> and then exec them with special compat VA layout.
That or back out the stack change that broke them all.
Anyway, we figured out what's going on:
(1) _libc_init is called twice, but the comment on it (since updated)
was wrong. It is called from crt0 explicitly and from global
constructor handling, but the order of these is inverted for
static binaries vs. dynamically linked binaries. In a dynamically
linked binary the global constructors for libc are called from
the dynamic linker before crt0 gets control at all. (This is why
a broken emacs cores before __start.)
(2) Therefore, in a dynamic executable, _libc_init is reached before
__ps_strings is initialized by crt0. Ordinarily, then it will be
null and the logic that sets __libc_dlauxinfo will be skipped.
This is fine because __libc_dlauxinfo is only used in static
executables; it's there to help substitute for not having the
dynamic linker image.
(3) However, when emacs dumps it saves the static data of crt0, which
is part of the linked program, but not the static data of libc,
which isn't.
(4) Therefore when starting the dumped emacs, libc is not initialized
yet, but the value from the previous startup is still sitting in
__ps_strings. Then _libc_init sees that it's not null and tries
to set __libc_dlauxinfo from it. If the stack location is not the
same as it was for the previous execution, the old __ps_strings
can point off the end of the stack and this results in SIGSEGV.
Thanks to Joerg for providing necessary explanations.
Disabling ASLR for both the original temacs binary and the dumped
emacs binary works around the problem (this fix has been put in place
for emacs20/21 and I think later ones do it on their own) but it isn't
particularly desirable. (While emacs will never run as a pie, it
should still be possible to randomize the stack and library mappings.
temacs itself runs fine with ASLR enabled.)
There are several possible real fixes, but they all have problems:
(a) Have emacs explicitly zero __ps_strings before dumping. This
will work, but exposes the dirty laundry and makes it that much
harder to tidy the internals up later. However, it works
transparently for all or nearly all NetBSD versions going a long
way back.
(b) Give emacs some otherwise-private function to call before
dumping, something like __crt_reset(). On the plus side, this
doesn't expose the details, but on the minus side it would
require configure tests and other fussing to use properly.
(c) Add another variable to crt0 for libc to use in a way that allows
it to tell that the __ps_strings value is bogus. E.g. add "int
__initcount" to crt0 and have _libc_init increment it; then if
on entry to _libc_init it's > 0 we know we're running after
dumping and shouldn't trust __ps_strings. This has the advantage
of not being exposed, but it's messy and adds more crud to crt0
which we don't want.
(d) Misuse #ifdef __PIC__ or similar to detect at compile time in
_libc_init whether the current execution is statically linked.
(This is abusive, and also, a binary statically linked against
libc_pic.a wouldn't work properly.)
(e) Come up with some other way to distinguish being dynamically or
statically linked in _libc_init, either at compile time or
runtime. The problem is: what/how? I don't have any bright ideas,
but if someone else does, speak up: the advantage of this is that
(unlike the other alternatives) it can fix the crash for already-
existing Emacs binaries.
(f) Kick __ps_strings (and also __environ and __progname) out of
crt0.o entirely, and instead provide private entry points in libc
for crt0 to assign them. This has a number of tidiness advantages
going forward, but Joerg is worried that it might generate compat
issues.
--
David A. Holland
dholland@netbsd.org
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51654 (wrong ps_strings breaks emacs20)
Date: Sun, 27 Nov 2016 08:40:49 +0700
Date: Sun, 27 Nov 2016 00:30:00 +0000 (UTC)
From: David Holland <dholland-bugs@netbsd.org>
Message-ID: <20161127003000.E514B7A33D@mollari.NetBSD.org>
| There are several possible real fixes, but they all have problems:
(g) Get rid of the undump procedure completely, and just use temacs
(if you really have to use emacs of any form) - undump is an
abomination and always has been.
kre
From: "Joerg Sonnenberger" <joerg@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/51654 CVS commit: src/lib/libc/misc
Date: Sat, 17 Jun 2017 15:26:44 +0000
Module Name: src
Committed By: joerg
Date: Sat Jun 17 15:26:44 UTC 2017
Modified Files:
src/lib/libc/misc: initfini.c
Log Message:
PR 51654: Don't initialize _dlauxinfo for static binaries.
Emacs likes save all memory of the main binary and the first run of
_libc_init via .init will get the wrong (old) value of __ps_strings.
By avoiding the initialization of _dlauxinfo for shared applications,
it will be touched only by the _libc_init call from crt0.o itself,
at which point __ps_strings is correct.
To generate a diff of this commit:
cvs rdiff -u -r1.13 -r1.14 src/lib/libc/misc/initfini.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Responsible-Changed-From-To: kern-bug-people->lib-bug-people
Responsible-Changed-By: dholland@NetBSD.org
Responsible-Changed-When: Sun, 18 Jun 2017 02:45:54 +0000
Responsible-Changed-Why:
misfiled; was originally thought to be the kernel providing wrong
__ps_strings but actually it was a crtstuff issue.
State-Changed-From-To: open->needs-pullups
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 18 Jun 2017 02:45:54 +0000
State-Changed-Why:
fixed for real, but the fix needs to get into -8
From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/51654 CVS commit: [netbsd-8] src/lib/libc/misc
Date: Wed, 21 Jun 2017 17:46:03 +0000
Module Name: src
Committed By: snj
Date: Wed Jun 21 17:46:02 UTC 2017
Modified Files:
src/lib/libc/misc [netbsd-8]: initfini.c
Log Message:
Pull up following revision(s) (requested by joerg in ticket #44):
lib/libc/misc/initfini.c: revision 1.14
PR 51654: Don't initialize _dlauxinfo for static binaries.
Emacs likes save all memory of the main binary and the first run of
_libc_init via .init will get the wrong (old) value of __ps_strings.
By avoiding the initialization of _dlauxinfo for shared applications,
it will be touched only by the _libc_init call from crt0.o itself,
at which point __ps_strings is correct.
To generate a diff of this commit:
cvs rdiff -u -r1.13 -r1.13.6.1 src/lib/libc/misc/initfini.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: needs-pullups->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Wed, 28 Feb 2018 16:54:10 +0000
State-Changed-Why:
pulled up back in june
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.43 2018/01/16 07:36:43 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.