NetBSD Problem Report #52662
From www@NetBSD.org Sat Oct 28 11:44:07 2017
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 2C71E7A1CE
for <gnats-bugs@gnats.NetBSD.org>; Sat, 28 Oct 2017 11:44:07 +0000 (UTC)
Message-Id: <20171028114405.C82677A210@mollari.NetBSD.org>
Date: Sat, 28 Oct 2017 11:44:05 +0000 (UTC)
From: coypu@sdf.org
Reply-To: coypu@sdf.org
To: gnats-bugs@NetBSD.org
Subject: Almost everything crashes on -current kernel
X-Send-Pr-Version: www-1.0
>Number: 52662
>Category: port-xen
>Synopsis: Almost everything crashes on -current kernel
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: port-xen-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Oct 28 11:45:00 +0000 2017
>Closed-Date: Sun Oct 29 06:56:20 +0000 2017
>Last-Modified: Sun Oct 29 06:56:20 +0000 2017
>Originator: coypu
>Release: NetBSD 8.99.5
>Organization:
>Environment:
NetBSD 8.99.5 NetBSD 8.99.5 (XEN3_DOMU) #3: Sat Oct 28 11:17:52 UTC 2017 lio@lio:/home/lio/obj/sys/arch/amd64/compile/XEN3_DOMU amd64
>Description:
After updating only the kernel, the system makes it to userland, but:
# service sshd onestart
Starting sshd.
[1] Segmentation fault (core dumped) RC_PID= _rc_pid=...
...
$ gunzip netbsd-XEN3_DOMU.gz
Memory fault (core dumped)
Reverting to a kernel from 10 days ago works; using the binaries in /rescue also works.
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: port-xen-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: port-xen/52662: Almost everything crashes on -current kernel
Date: Sat, 28 Oct 2017 13:51:33 +0200
On Sat, Oct 28, 2017 at 11:45:00AM +0000, coypu@sdf.org wrote:
> Updating the kernel only
> Can make it to userland but:
> # service sshd onestart
> Starting sshd.
> [1] Segmentation fault (core dumped) RC_PID= _rc_pid=...
>
> ...
>
> $ gunzip netbsd-XEN3_DOMU.gz
> Memory fault (core dumped)
>
> reverting to a kernel from 10 days ago works, also using /rescue
I noticed this too:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/
in fact no result is available for amd64 because atf crashed on the
first test.
I strongly suspect maxv's commits from Oct 19, but I haven't yet found
what's wrong.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
From: Robert Elz <kre@munnari.OZ.AU>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@NetBSD.org, port-xen-maintainer@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: port-xen/52662: Almost everything crashes on -current kernel
Date: Sat, 28 Oct 2017 19:46:55 +0700
Date: Sat, 28 Oct 2017 13:51:33 +0200
From: Manuel Bouyer <bouyer@antioche.eu.org>
Message-ID: <20171028115133.GA3069@antioche.eu.org>
| in fact no result is available for amd64 because atf crashed on the
| first test.
I am seeing the same. What is somewhat surprising, though, is just how
few programs crash: I'm seeing sshd, gdb, gunzip, makemandb, and whatever
it is in atf that fails (there could be more, I haven't tested everything).
But for the most part (until I moved back to an earlier kernel) I could
use the system fine: gcc, sh, ... all work, as do the ATF tests I needed
to run, as long as I run them standalone rather than under ATF.
kre
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Robert Elz <kre@munnari.OZ.AU>
Cc: gnats-bugs@NetBSD.org, port-xen-maintainer@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: port-xen/52662: Almost everything crashes on -current kernel
Date: Sat, 28 Oct 2017 14:54:03 +0200
On Sat, Oct 28, 2017 at 07:46:55PM +0700, Robert Elz wrote:
> Date: Sat, 28 Oct 2017 13:51:33 +0200
> From: Manuel Bouyer <bouyer@antioche.eu.org>
> Message-ID: <20171028115133.GA3069@antioche.eu.org>
>
> | in fact no result is available for amd64 because atf crashed on the
> | first test.
>
> I am seeing the same, what is kind of surprising though is just how
> few programs crash - I'm seeing sshd gdb gunzip makemandb and whatever
> it is in atf that fails (there could be more, I haven't tested everything).
It may be thread-related. The programs in the list above are linked with
libpthread (except atf, but it may be a program called by atf that crashes).
I'm not sure whether we use %fs or %gs for TLS on x86; if we do, that
would be consistent with maxv's changes.
--
Manuel Bouyer <bouyer@antioche.eu.org>
--
From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-xen/52662: Almost everything crashes on -current kernel
Date: Sat, 28 Oct 2017 17:39:29 +0200
On 28.10.2017 15:30, Manuel Bouyer wrote:
> I'm not sure if we use %fs or %gs on x86 for TLS ... if we do that
> would be consistent with maxv's changes.
We use %fs+%gs for TLS.
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: netbsd-bugs@netbsd.org
Subject: Re: port-xen/52662: Almost everything crashes on -current kernel
Date: Sat, 28 Oct 2017 19:58:51 +0200
Here's what I found so far:
makemandb, gzip and gdb all die at the same point in libpthread:
Core was generated by `gdb'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000793be0208fad in ?? () from /usr/lib/libpthread.so.1
(gdb) x/i 0x0000793be0208fad
=> 0x793be0208fad: nopl %cs:0x0(%rax,%rax,1)
(gdb) info registers
rax 0x0 0
rbx 0x793be2907800 133298111150080
rcx 0x53 83
rdx 0x793be020ac83 133298070269059
rsi 0x0 0
rdi 0x793be096d4e0 133298078012640
rbp 0x793be0411840 0x793be0411840 <pthread.allqueue>
rsp 0x7f7fff8c8d58 0x7f7fff8c8d58
r8 0x101010101010101 72340172838076673
r9 0x8080808080808080 -9187201950435737472
r10 0x793be063eb0a 133298074675978
r11 0x202 514
r12 0x0 0
r13 0x0 0
r14 0x793be020ac83 133298070269059
r15 0x793be0638820 133298074650656
rip 0x793be0208fad 0x793be0208fad
eflags 0x10246 [ PF ZF IF RF ]
cs 0xe033 57395
ss 0xe02b 57387
ds 0x23 35
es 0x23 35
fs 0x0 0
gs 0x0 0
atf-run and atf-report are different:
Core was generated by `atf-run'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f7f92e0a4f0 in _rtld_process_hints () from /usr/libexec/ld.elf_so
(gdb) x/i 0x00007f7f92e0a4f0
=> 0x7f7f92e0a4f0 <_rtld_process_hints+1717>: callq 0x7f7f92e07012 <xmalloc>
(gdb) info registers
rax 0x1 1
rbx 0x7a5b14b19240 134531607794240
rcx 0x4 4
rdx 0x4e22b364 1310896996
rsi 0x4e445d30 1313103152
rdi 0x7a5b14f13160 134531611963744
rbp 0x4e22b364 0x4e22b364 <tools::system_error::~system_error()>
rsp 0x7f7fffcfdba8 0x7f7fffcfdba8
r8 0x7a5b14b2c11c 134531607871772
r9 0x7a5b14b2c14c 134531607871820
r10 0x7263742f30486e4e 8242559489040477774
r11 0xfffffffffffffffc -4
r12 0x4e445d30 1313103152
r13 0x7a5b14b19240 134531607794240
r14 0x7f7fffcfdc00 140187729386496
r15 0x7f7fffcfdef0 140187729387248
rip 0x7f7f92e0a4f0 0x7f7f92e0a4f0 <_rtld_process_hints+1717>
eflags 0x10202 [ IF RF ]
cs 0xe033 57395
ss 0xe02b 57387
ds 0x23 35
es 0x92e00023 -1830813661
fs 0x0 0
gs 0x0 0
On a working netbsd-8 domU I get:
cs 0xe033 57395
ss 0xe02b 57387
ds 0x3f 63
es 0xffff003f -65473
fs 0x0 0
gs 0x0 0
while on bare-metal:
cs 0x47 71
ss 0x3f 63
ds 0x3f 63
es 0x3f 63
fs 0x0 0
gs 0x0 0
So I suspect Xen is remapping GDT entries, and we can't blindly reset them
to our defaults.
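The selector values in the dumps above can be decoded by hand: an x86 segment selector holds a descriptor-table index in bits 3-15, a table indicator in bit 2 (0 = GDT, 1 = LDT) and the requested privilege level (RPL) in bits 0-1. The helpers below are a small illustrative sketch; their names are hypothetical and not part of the NetBSD sources. They simply apply that layout to the values reported in this PR, which helps when comparing the domU and bare-metal dumps.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helpers (not NetBSD code) decoding a 16-bit x86
 * segment selector:
 *   bits 15..3  descriptor index
 *   bit  2      table indicator (0 = GDT, 1 = LDT)
 *   bits 1..0   requested privilege level (RPL)
 */
unsigned
sel_index(uint16_t sel)
{
	return sel >> 3;
}

int
sel_is_ldt(uint16_t sel)
{
	return (sel >> 2) & 1;
}

unsigned
sel_rpl(uint16_t sel)
{
	return sel & 3;
}

/* Print one selector in a human-readable form. */
void
sel_print(const char *name, uint16_t sel)
{
	printf("%-10s 0x%04x  index %5u  %s  RPL %u\n", name,
	    (unsigned)sel, sel_index(sel),
	    sel_is_ldt(sel) ? "LDT" : "GDT", sel_rpl(sel));
}

/* Decode the selectors quoted in this PR. */
void
sel_print_pr52662(void)
{
	sel_print("domU cs", 0xe033);	/* -current XEN3_DOMU */
	sel_print("domU ss", 0xe02b);
	sel_print("domU ds", 0x0023);
	sel_print("bare cs", 0x0047);	/* netbsd-8, bare metal */
	sel_print("bare ss", 0x003f);
}
```

Worked by hand: 0xe033 is GDT index 7174 with RPL 3, 0xe02b is GDT index 7173, 0x23 is GDT index 4, and the bare-metal 0x3f and 0x47 are LDT indices 7 and 8 (all RPL 3). The Xen-supplied %cs/%ss therefore sit in very high GDT slots, far from the kernel's own small indices, which is consistent with (though not proof of) the suspicion that Xen substitutes its own descriptors.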
--
Manuel Bouyer <bouyer@antioche.eu.org>
--
From: "Maxime Villard" <maxv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52662 CVS commit: src/sys/arch/amd64/amd64
Date: Sat, 28 Oct 2017 20:06:31 +0000
Module Name: src
Committed By: maxv
Date: Sat Oct 28 20:06:31 UTC 2017
Modified Files:
src/sys/arch/amd64/amd64: locore.S
Log Message:
It appears that Xen remaps the userland %cs to 0xE033. So add it to the
checklist. Otherwise we're going through Luexit32: %fs gets reloaded,
which sets the FS.base to NULL, which will cause the thread to page-fault
next time it accesses its TLS (as seen in PR/52662).
This fix is not very clean, and it would be nice to understand why Xen
remaps %cs. But I'm committing it now anyway, so that people can test.
To generate a diff of this commit:
cvs rdiff -u -r1.138 -r1.139 src/sys/arch/amd64/amd64/locore.S
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
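The commit log above describes a selector check on the return-to-userland path. The C model below is only a hedged sketch of that control flow, not a transcription of locore.S (which is assembly); every name in it is hypothetical. 0xE033 is the Xen value from the commit log, while 0x47 merely stands in for a native 64-bit user %cs, borrowed from the bare-metal netbsd-8 dump earlier in this PR; the real -current checklist may use different values. If %cs is not on the list, the model takes the 32-bit exit path, and reloading %fs clears FS.base, so a later TLS access computed relative to that base faults, as the commit log reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants; see the lead-in for where each comes from. */
#define XEN_UCODE64_SEL    0xE033u /* Xen-remapped %cs, from the commit log */
#define NATIVE_UCODE64_SEL 0x0047u /* assumption: bare-metal value from this PR */

struct user_state {
	uint16_t  cs;      /* user code selector at return time */
	uintptr_t fs_base; /* FS.base, i.e. the TLS thread pointer */
};

/* Is this %cs on the checklist of 64-bit user code selectors? */
static bool
is_64bit_user_cs(uint16_t cs, bool xen_fix)
{
	if (cs == NATIVE_UCODE64_SEL)
		return true;
	/* The fix (locore.S r1.139) adds the Xen value to the list. */
	return xen_fix && cs == XEN_UCODE64_SEL;
}

/*
 * Model of returning to userland.  On the 32-bit path ("Luexit32")
 * %fs is reloaded, which per the commit log sets FS.base to NULL.
 */
static void
return_to_user(struct user_state *st, bool xen_fix)
{
	if (!is_64bit_user_cs(st->cs, xen_fix))
		st->fs_base = 0;
}

/* TLS accesses are relative to FS.base; a NULL base means a fault. */
static bool
tls_access_faults(const struct user_state *st)
{
	return st->fs_base == 0;
}
```

In this model, with xen_fix false a thread whose %cs is 0xE033 loses its FS.base when it returns to userland, and with xen_fix true it keeps it. That mirrors the before/after behaviour described in this thread; it is an illustration, not a measured result.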
State-Changed-From-To: open->feedback
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Sat, 28 Oct 2017 20:58:31 +0000
State-Changed-Why:
Should be fixed in HEAD, please test.
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-xen/52662 (Almost everything crashes on -current kernel)
Date: Sun, 29 Oct 2017 09:57:31 +0700
Looks good to me. I have both maxv's patch and bouyer's tidy-up of it
in my tree, and no more core dumps from commands that were dumping core.
From: coypu@sdf.org
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-xen/52662 (Almost everything crashes on -current kernel)
Date: Sun, 29 Oct 2017 05:55:49 +0000
It works! Thanks.
State-Changed-From-To: feedback->closed
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Sun, 29 Oct 2017 06:56:20 +0000
State-Changed-Why:
Confirmed fixed.
>Unformatted:
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.