NetBSD Problem Report #52662

From www@NetBSD.org  Sat Oct 28 11:44:07 2017
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 2C71E7A1CE
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 28 Oct 2017 11:44:07 +0000 (UTC)
Message-Id: <20171028114405.C82677A210@mollari.NetBSD.org>
Date: Sat, 28 Oct 2017 11:44:05 +0000 (UTC)
From: coypu@sdf.org
Reply-To: coypu@sdf.org
To: gnats-bugs@NetBSD.org
Subject: Almost everything crashes on -current kernel
X-Send-Pr-Version: www-1.0

>Number:         52662
>Category:       port-xen
>Synopsis:       Almost everything crashes on -current kernel
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-xen-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Oct 28 11:45:00 +0000 2017
>Closed-Date:    Sun Oct 29 06:56:20 +0000 2017
>Last-Modified:  Sun Oct 29 06:56:20 +0000 2017
>Originator:     coypu
>Release:        NetBSD 8.99.5
>Organization:
>Environment:
NetBSD  8.99.5 NetBSD 8.99.5 (XEN3_DOMU) #3: Sat Oct 28 11:17:52 UTC 2017  lio@lio:/home/lio/obj/sys/arch/amd64/compile/XEN3_DOMU amd64
>Description:
After updating only the kernel, the system makes it to userland, but:
# service sshd onestart
Starting sshd.
[1]   Segmentation fault (core dumped) RC_PID= _rc_pid=...

...

$ gunzip netbsd-XEN3_DOMU.gz
Memory fault (core dumped)

Reverting to a kernel from 10 days ago works, as does using the binaries in /rescue.
>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: port-xen-maintainer@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Subject: Re: port-xen/52662: Almost everything crashes on -current kernel
Date: Sat, 28 Oct 2017 13:51:33 +0200

 On Sat, Oct 28, 2017 at 11:45:00AM +0000, coypu@sdf.org wrote:
 > Updating the kernel only
 > Can make it to userland but:
 > # service sshd onestart
 > Starting sshd.
 > [1]   Segmentation fault (core dumped) RC_PID= _rc_pid=...
 > 
 > ...
 > 
 > $ gunzip netbsd-XEN3_DOMU.gz                                                   
 > Memory fault (core dumped) 
 > 
 > reverting to a kernel from 10 days ago works, also using /rescue

 I noticed this too:
 http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/

 In fact, no result is available for amd64 because atf crashed on the
 first test.
 I strongly suspect maxv's commits from Oct 19, but I haven't found what's
 wrong yet.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Robert Elz <kre@munnari.OZ.AU>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@NetBSD.org, port-xen-maintainer@netbsd.org,
        netbsd-bugs@netbsd.org
Subject: Re: port-xen/52662: Almost everything crashes on -current kernel
Date: Sat, 28 Oct 2017 19:46:55 +0700

     Date:        Sat, 28 Oct 2017 13:51:33 +0200
     From:        Manuel Bouyer <bouyer@antioche.eu.org>
     Message-ID:  <20171028115133.GA3069@antioche.eu.org>

   | in fact no result is available for amd64 because atf crashed on the
   | first test.

 I am seeing the same. What is kind of surprising, though, is just how
 few programs crash: I'm seeing sshd, gdb, gunzip, makemandb and whatever
 it is in atf that fails (there could be more, I haven't tested everything).

 But mostly (until I moved back to an earlier kernel) I could use the
 system OK (gcc, sh, ... all work fine, as do the ATF tests I needed to
 run, just running them standalone without ATF).

 kre

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Robert Elz <kre@munnari.OZ.AU>
Cc: gnats-bugs@NetBSD.org, port-xen-maintainer@netbsd.org,
        netbsd-bugs@netbsd.org
Subject: Re: port-xen/52662: Almost everything crashes on -current kernel
Date: Sat, 28 Oct 2017 14:54:03 +0200

 On Sat, Oct 28, 2017 at 07:46:55PM +0700, Robert Elz wrote:
 >     Date:        Sat, 28 Oct 2017 13:51:33 +0200
 >     From:        Manuel Bouyer <bouyer@antioche.eu.org>
 >     Message-ID:  <20171028115133.GA3069@antioche.eu.org>
 > 
 >   | in fact no result is available for amd64 because atf crashed on the
 >   | first test.
 > 
 > I am seeing the same, what is kind of surprising though is just how
 > few programs crash - I'm seeing sshd gdb gunzip makemandb and whatever
 > it is in atf that fails (there could be more, I haven't tested everything).

 It may be thread-related. The programs in the list above are linked with
 libpthread (except atf, but it may be a program called by atf which crashes).
 I'm not sure if we use %fs or %gs on x86 for TLS ... if we do, that
 would be consistent with maxv's changes.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-xen/52662: Almost everything crashes on -current kernel
Date: Sat, 28 Oct 2017 17:39:29 +0200

 On 28.10.2017 15:30, Manuel Bouyer wrote:
 >  I'm not sure if we use %fs or %gs on x86 for TLS ... if we do that
 >  would be consistent with maxv's changes.

 We use %fs+%gs for TLS.

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: netbsd-bugs@netbsd.org
Subject: Re: port-xen/52662: Almost everything crashes on -current kernel
Date: Sat, 28 Oct 2017 19:58:51 +0200

 Here's what I found so far:

 makemandb, gzip and gdb all die at the same point in libpthread:
 Core was generated by `gdb'.
 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0x0000793be0208fad in ?? () from /usr/lib/libpthread.so.1
 (gdb) x/i 0x0000793be0208fad
 => 0x793be0208fad:      nopl   %cs:0x0(%rax,%rax,1)
 (gdb) info registers
 rax            0x0      0
 rbx            0x793be2907800   133298111150080
 rcx            0x53     83
 rdx            0x793be020ac83   133298070269059
 rsi            0x0      0
 rdi            0x793be096d4e0   133298078012640
 rbp            0x793be0411840   0x793be0411840 <pthread.allqueue>
 rsp            0x7f7fff8c8d58   0x7f7fff8c8d58
 r8             0x101010101010101        72340172838076673
 r9             0x8080808080808080       -9187201950435737472
 r10            0x793be063eb0a   133298074675978
 r11            0x202    514
 r12            0x0      0
 r13            0x0      0
 r14            0x793be020ac83   133298070269059
 r15            0x793be0638820   133298074650656
 rip            0x793be0208fad   0x793be0208fad
 eflags         0x10246  [ PF ZF IF RF ]
 cs             0xe033   57395
 ss             0xe02b   57387
 ds             0x23     35
 es             0x23     35
 fs             0x0      0
 gs             0x0      0

 atf-run and atf-report are different:
 Core was generated by `atf-run'.
 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0x00007f7f92e0a4f0 in _rtld_process_hints () from /usr/libexec/ld.elf_so
 (gdb) x/i 0x00007f7f92e0a4f0
 => 0x7f7f92e0a4f0 <_rtld_process_hints+1717>:   callq  0x7f7f92e07012 <xmalloc>
 (gdb) info registers
 rax            0x1      1
 rbx            0x7a5b14b19240   134531607794240
 rcx            0x4      4
 rdx            0x4e22b364       1310896996
 rsi            0x4e445d30       1313103152
 rdi            0x7a5b14f13160   134531611963744
 rbp            0x4e22b364       0x4e22b364 <tools::system_error::~system_error()>
 rsp            0x7f7fffcfdba8   0x7f7fffcfdba8
 r8             0x7a5b14b2c11c   134531607871772
 r9             0x7a5b14b2c14c   134531607871820
 r10            0x7263742f30486e4e       8242559489040477774
 r11            0xfffffffffffffffc       -4
 r12            0x4e445d30       1313103152
 r13            0x7a5b14b19240   134531607794240
 r14            0x7f7fffcfdc00   140187729386496
 r15            0x7f7fffcfdef0   140187729387248
 rip            0x7f7f92e0a4f0   0x7f7f92e0a4f0 <_rtld_process_hints+1717>
 eflags         0x10202  [ IF RF ]
 cs             0xe033   57395
 ss             0xe02b   57387
 ds             0x23     35
 es             0x92e00023       -1830813661
 fs             0x0      0
 gs             0x0      0

 On a working netbsd-8 domU I get:
 cs             0xe033   57395
 ss             0xe02b   57387
 ds             0x3f     63
 es             0xffff003f       -65473
 fs             0x0      0
 gs             0x0      0
 while on bare-metal:
 cs             0x47     71
 ss             0x3f     63
 ds             0x3f     63
 es             0x3f     63
 fs             0x0      0
 gs             0x0      0

 So I suspect Xen is remapping GDT entries, and we can't blindly reset them
 to our defaults.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --
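
The segment selectors in the dumps above can be split into their architectural fields: descriptor index, table indicator (GDT or LDT) and requested privilege level. The sketch below does only that arithmetic, as a reading aid. The function name `decode_selector` is made up for illustration and is not part of any NetBSD or Xen tool; the selector values are copied from the register dumps in this PR.

```python
def decode_selector(sel):
    """Split a 16-bit x86 segment selector into its architectural fields."""
    return {
        "index": sel >> 3,                       # descriptor table index (bits 3-15)
        "table": "LDT" if sel & 0x4 else "GDT",  # table indicator (bit 2)
        "rpl": sel & 0x3,                        # requested privilege level (bits 0-1)
    }


# Values copied from the register dumps in this PR; labels are descriptive only.
SAMPLES = [
    ("Xen domU cs", 0xE033),
    ("Xen domU ss", 0xE02B),
    ("-current domU ds", 0x23),
    ("netbsd-8 domU ds", 0x3F),
    ("bare-metal cs", 0x47),
]

for name, sel in SAMPLES:
    f = decode_selector(sel)
    print(f"{name:18s} {sel:#06x}: index={f['index']:#x} "
          f"table={f['table']} rpl={f['rpl']}")
```

Read this way, 0xe033 is a ring-3 selector into the GDT while 0x47 is a ring-3 selector into the LDT, so the two name different descriptors rather than one value with a flag bit changed.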

From: "Maxime Villard" <maxv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52662 CVS commit: src/sys/arch/amd64/amd64
Date: Sat, 28 Oct 2017 20:06:31 +0000

 Module Name:	src
 Committed By:	maxv
 Date:		Sat Oct 28 20:06:31 UTC 2017

 Modified Files:
 	src/sys/arch/amd64/amd64: locore.S

 Log Message:
 It appears that Xen remaps the userland %cs to 0xE033. So add it to the
 checklist. Otherwise we're going through Luexit32: %fs gets reloaded,
 which sets the FS.base to NULL, which will cause the thread to page-fault
 next time it accesses its TLS (as seen in PR/52662).

 This fix is not very clean, and it would be nice to understand why Xen
 remaps %cs. But I'm committing it now anyway, so that people can test.


 To generate a diff of this commit:
 cvs rdiff -u -r1.138 -r1.139 src/sys/arch/amd64/amd64/locore.S

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
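
The failure mode described in the log message can be modelled in a few lines. This is only a sketch of the reasoning, not the code in locore.S, which is assembly: the function names, the `accepted` set and the sample FS.base value are hypothetical, and the pre-fix list is assumed here to contain just the bare-metal %cs seen in the dumps in this PR (the real checklist may hold more entries).

```python
NATIVE_USER_CS64 = 0x47   # user %cs from the bare-metal dump in this PR
XEN_USER_CS64 = 0xE033    # user %cs from the Xen domU dumps in this PR


def takes_32bit_exit(cs, accepted):
    """True if %cs is not on the list of known 64-bit user selectors."""
    return cs not in accepted


def fs_base_after_exit(cs, fs_base, accepted):
    """Model of the return path: the 32-bit exit reloads %fs, zeroing FS.base."""
    if takes_32bit_exit(cs, accepted):
        return 0          # TLS pointer lost; next TLS access page-faults
    return fs_base        # 64-bit exit leaves FS.base alone


before_fix = {NATIVE_USER_CS64}                 # assumed pre-fix checklist
after_fix = {NATIVE_USER_CS64, XEN_USER_CS64}   # checklist with 0xE033 added

tls_base = 0x1000  # arbitrary illustrative FS.base, not taken from the PR
print(hex(fs_base_after_exit(XEN_USER_CS64, tls_base, before_fix)))
print(hex(fs_base_after_exit(XEN_USER_CS64, tls_base, after_fix)))
```

Under these assumptions the first call loses the TLS base and the second keeps it, which matches the log message's explanation of why threaded programs faulted on their next TLS access.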

State-Changed-From-To: open->feedback
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Sat, 28 Oct 2017 20:58:31 +0000
State-Changed-Why:
Should be fixed in HEAD, please test.


From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-xen/52662 (Almost everything crashes on -current kernel)
Date: Sun, 29 Oct 2017 09:57:31 +0700

 Looks good to me.   I have both maxv's patch and bouyer's tidyup of it
 in my tree, and no more core dumps from commands that were dumping core.

From: coypu@sdf.org
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-xen/52662 (Almost everything crashes on -current kernel)
Date: Sun, 29 Oct 2017 05:55:49 +0000

 It works! thanks

State-Changed-From-To: feedback->closed
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Sun, 29 Oct 2017 06:56:20 +0000
State-Changed-Why:
Confirmed fixed.


>Unformatted:

Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.