NetBSD Problem Report #53267
From kre@munnari.OZ.AU Mon May 7 12:29:53 2018
Return-Path: <kre@munnari.OZ.AU>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 8DDB27A1A1
for <gnats-bugs@gnats.NetBSD.org>; Mon, 7 May 2018 12:29:53 +0000 (UTC)
Message-Id: <201805071229.w47CT3we020222@jinx.noi.kre.to>
Date: Mon, 7 May 2018 19:29:03 +0700 (+07)
From: kre@munnari.OZ.AU
To: gnats-bugs@NetBSD.org
Subject: XEN amd64 DomU (Dom0??) NetBSD current USERMODE() fails
X-Send-Pr-Version: 3.95
>Number: 53267
>Category: port-xen
>Synopsis: XEN amd64 DomU (Dom0??) NetBSD current USERMODE() fails
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: cherry
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon May 07 12:30:00 +0000 2018
>Closed-Date: Sun May 13 17:00:10 +0000 2018
>Last-Modified: Sun May 13 20:50:00 +0000 2018
>Originator: Robert Elz
>Release: NetBSD 8.99.14 (and later, probably earlier too).
>Organization:
>Environment:
System: NetBSD netbsd.noi.kre.to 8.99.16 NetBSD 8.99.16 (MUNNARI-DomU) #312: Mon May 7 18:38:30 ICT 2018 kre@onyx.coe.psu.ac.th:/usr/obj/testing/kernels/amd64/MUNNARI-DomU amd64
Architecture: x86_64
Machine: amd64
>Description:
Bear with me if I get the technical details slightly incorrect...
The USERMODE() macro is used to determine whether a trap
frame (the saved state from before the trap) represents
user or system mode.
Among other things, it is used (via CLKF_USERMODE()) to
decide whether to attribute ticks to user time or system
time.
On Xen DomU (on NetBSD current, 8.0_RC1 is OK) USERMODE()
always returns false, so all ticks are counted as system
time.
Aside from simply being wrong, one thing this affects is
cgdconfig -g, which increases the amount of work required
for key generation until it takes (what it considers to be)
a reasonable time - it does this by measuring the CPU time
used (in user mode) for each test, and while that is
insufficient, it increases the amount of work and tries again.
Since (after the first tick) the user mode time is always
zero on Xen DomU, this loops forever (doing more and more
work on each iteration.) I guess it would eventually
run out of RAM or something, but that would take a long
time to happen, so "forever" is close enough...
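A minimal sketch of the calibration loop described above, to show why it
cannot terminate when user time stays at zero. It is not the cgdconfig
source: calibrate(), measure_fn, the two measure_* stubs and the
max_iterations cap are all hypothetical names introduced for illustration
(the behaviour described in this report has no such cap).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: run `iterations` units of work and return the
 * user-mode CPU time it consumed, in microseconds. */
typedef long (*measure_fn)(unsigned long iterations);

/*
 * Double the workload until one run consumes at least target_us of user
 * CPU time. Returns the iteration count that reached the target, or 0 if
 * the cap was hit first. The cap exists only so the sketch terminates.
 */
static unsigned long
calibrate(measure_fn measure, long target_us, unsigned long max_iterations)
{
	unsigned long n;

	assert(measure != NULL && target_us > 0 && max_iterations >= 1);
	for (n = 1; ; n *= 2) {
		if (measure(n) >= target_us)
			return n;
		if (n > max_iterations / 2)
			return 0;	/* never saw enough user time */
	}
}

/* Healthy accounting: user time grows with the workload (4 us per unit,
 * an arbitrary figure chosen for the sketch). */
static long
measure_ok(unsigned long iterations)
{
	return (long)iterations * 4;
}

/* Broken accounting: every tick is charged to system time, so the
 * measured user time never moves. */
static long
measure_broken(unsigned long iterations)
{
	(void)iterations;
	return 0;
}
```

With measure_ok the loop stops once the doubled workload crosses the
target; with measure_broken only the artificial cap ends it.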
>How-To-Repeat:
On a NetBSD current (amd64 at least, probably i386 as well)
DomU try
cgdconfig -g -o /tmp/whatever aes-cbc 192
(which is, aside from the output file name, the
canonical first step when configuring a new CGD).
Don't expect to be patient enough to wait for it to
finish (it is safe, no problem killing it with SIGINT
or other ways, it is just doing more and more CPU calcs,
and checking usage.)
>Fix:
???
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback
State-Changed-By: maxv@NetBSD.org
State-Changed-When: Mon, 07 May 2018 17:32:35 +0000
State-Changed-Why:
Well, you should probably try to revert my amd64/include/frameasm.h::1.34.
Normally it shouldn't change anything since andb is there for Xen, but who
knows.
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-xen/53267 (XEN amd64 DomU (Dom0??) NetBSD current USERMODE() fails)
Date: Tue, 08 May 2018 03:15:18 +0700
Date: Mon, 7 May 2018 17:32:36 +0000 (UTC)
From: maxv@NetBSD.org
Message-ID: <20180507173236.193297A20C@mollari.NetBSD.org>
Thanks...
| Well, you should probably try to revert my amd64/include/frameasm.h::1.34.
Tried that.
| Normally it shouldn't change anything
It didn't.
| since andb is there for Xen, but who knows.
That appears to (unconditionally) reset the bottom two bits of
%cs (the saved value) - which would convert it from 3 (user mode,
or actual kernel mode in xen) into 0 (apparent kernel mode in everything).
That is consistent with XEN always believing it was in kernel mode.
That is, if I am reading it correctly. What I don't see is how that is
avoided if the processor had been in user mode (and so the andb
should not be done) - nor how in xen one tells whether it was running
in user or kernel mode before the trap.
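The effect described above can be sketched in C. This only illustrates
"clear the bottom two bits of the saved %cs"; it is not the frameasm.h
code, the mask constant and helper names are assumptions, and 0x23 is
just an illustrative selector value with privilege bits 3.

```c
#include <assert.h>
#include <stdint.h>

#define SEL_RPL_MASK	0x3u	/* low two bits of a selector: privilege level */

/* Privilege level encoded in a saved %cs value. */
static unsigned
sel_priv(uint16_t cs)
{
	return cs & SEL_RPL_MASK;
}

/* What an unconditional mask of the low two bits does to a saved %cs. */
static uint16_t
clear_priv_bits(uint16_t cs)
{
	uint16_t out = (uint16_t)(cs & ~SEL_RPL_MASK);

	assert(sel_priv(out) == 0);	/* always looks like level 0 afterwards */
	return out;
}
```

A saved selector of 0x23 reports level 3 before the mask and level 0 after
it, so a check made after the mask can no longer tell the two cases apart.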
There must be some other difference though, as that is the same in 8
and current (aside from the method of getting %cs which seems to make
no difference).
kre
Responsible-Changed-From-To: port-xen-maintainer->cherry
Responsible-Changed-By: kre@NetBSD.org
Responsible-Changed-When: Tue, 08 May 2018 23:10:37 +0000
Responsible-Changed-Why:
Assigning responsibility for this to cherry:
changes listed caused the bug
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: cherry@netbsd.org
Subject: Re: port-xen/53267 (XEN amd64 DomU (Dom0??) NetBSD current USERMODE() fails)
Date: Wed, 09 May 2018 06:02:44 +0700
I have done a kernel bisect on -current from when netbsd-8 was branched,
until (more or less) now, and found that before the following mods,
the XEN USERMODE() macro was performing correctly, and after them
it is not (it always says "kernel mode" - so does the KERNELMODE()
macro, not very surprisingly).
That is, I tried kernels from immediately before, and immediately after,
this set of mods, and it worked before, failed after.
cherry - can you have a look and see if you can see why your changes broke
this, and perhaps find a suitable fix?
commit 2017.11.06.15.21.23 cherry src/sys/arch/xen/conf/files.xen 1.163
commit 2017.11.06.15.27.09 cherry src/sys/arch/xen/x86/xen_ipi.c 1.23
commit 2017.11.06.15.27.09 cherry src/sys/arch/xen/xen/clock.c 1.65
commit 2017.11.06.15.27.09 cherry src/sys/arch/xen/xen/if_xennet_xenbus.c 1.72
commit 2017.11.06.15.27.09 cherry src/sys/arch/xen/xen/xbd_xenbus.c 1.77
commit 2017.11.06.15.27.09 cherry src/sys/arch/xen/xen/xbdback_xenbus.c 1.64
commit 2017.11.06.15.27.09 cherry src/sys/arch/xen/xen/xencons.c 1.42
commit 2017.11.06.15.27.09 cherry src/sys/arch/xen/xen/xennetback_xenbus.c 1.60
commit 2017.11.06.15.27.09 cherry src/sys/arch/xen/xenbus/xenbus_comms.c 1.16
If you haven't been following the PR, an easy test is
cgdconfig -g -o /tmp/P aes-cbc 192
(no actual cgd is needed or involved, this simply writes a params file)
If it loops (seemingly) forever, USERMODE() is broken. If it completes
within 5 seconds or so all is OK. (I was actually running with a
version of cgdconfig with debug printfs to display the user/sys times being
returned, but that's not needed to use cgdconfig to detect the bug.)
kre
From: Cherry G.Mathew <cherry@zyx.in>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, kre@munnari.OZ.AU
Subject: Re: port-xen/53267 (XEN amd64 DomU (Dom0??) NetBSD current USERMODE() fails)
Date: Fri, 11 May 2018 14:37:12 +0530
Robert Elz <kre@munnari.OZ.AU> writes:
> The following reply was made to PR port-xen/53267; it has been noted by GNATS.
>
> From: Robert Elz <kre@munnari.OZ.AU>
> To: gnats-bugs@NetBSD.org
> Cc: cherry@netbsd.org
> Subject: Re: port-xen/53267 (XEN amd64 DomU (Dom0??) NetBSD current USERMODE() fails)
> Date: Wed, 09 May 2018 06:02:44 +0700
>
> I have done a kernel bisect on -current from when netbsd-8 was branched,
> until (more or less) now, and found that before the following mods,
> the XEN USERMODE() macro was performing correctly, and after them
> it is not (it always says "kernel mode" - so does the KERNELMODE()
> macro, not very surprisingly).
>
> That is, I tried kernels from immediately before, and immediately after,
> this set of mods, and it worked before, failed after.
>
> cherry - can you have a look and see if you can see why your changes broke
> this, and perhaps find a suitable fix?
>
> commit 2017.11.06.15.21.23 cherry src/sys/arch/xen/conf/files.xen 1.163
> commit 2017.11.06.15.27.09 cherry src/sys/arch/xen/x86/xen_ipi.c 1.23
> commit 2017.11.06.15.27.09 cherry src/sys/arch/xen/xen/clock.c 1.65
> commit 2017.11.06.15.27.09 cherry src/sys/arch/xen/xen/if_xennet_xenbus.c 1.72
> commit 2017.11.06.15.27.09 cherry src/sys/arch/xen/xen/xbd_xenbus.c 1.77
> commit 2017.11.06.15.27.09 cherry src/sys/arch/xen/xen/xbdback_xenbus.c 1.64
> commit 2017.11.06.15.27.09 cherry src/sys/arch/xen/xen/xencons.c 1.42
> commit 2017.11.06.15.27.09 cherry src/sys/arch/xen/xen/xennetback_xenbus.c 1.60
> commit 2017.11.06.15.27.09 cherry src/sys/arch/xen/xenbus/xenbus_comms.c 1.16
>
This was a pretty intrusive change ( we switched to the x86 common
interrupt code ), so I'll need a bit of time to figure it out.
I'll send an update as soon as I get one, unless someone beats me to
it.
--
~cherry
From: Robert Elz <kre@munnari.OZ.AU>
To: "Cherry G.Mathew" <cherry@zyx.in>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org
Subject: Re: port-xen/53267 (XEN amd64 DomU (Dom0??) NetBSD current USERMODE() fails)
Date: Fri, 11 May 2018 17:38:09 +0700
Date: Fri, 11 May 2018 14:37:12 +0530
From: "Cherry G.Mathew" <cherry@zyx.in>
Message-ID: <87o9hm7e7z.fsf@zyx.in>
| This was a pretty intrusive change ( we switched to the x86 common
| interrupt code ), so I'll need a bit of time to figure it out.
Yes, I saw ... and that also explains the problem, as regular x86 runs the
kernel and users at different priv levels, but xen does not. With the old
way, some magic must have been done to make it appear as if
there were different levels, which is no longer happening.
I have no idea how all of this works (I don't know x86 architecture at
all) so I cannot really help with a solution (I admit to making one wild
guess, and trying it out - that kernel did not make it through autoconf
before hanging ... fortunately DomU's are easy to destroy, and bogus
code is easy to throw away!)
One suggestion though - there is no need for the XEN USERMODE()
macro to work anything like the x86 one does - it does not need to
inspect bits in %cs (whatever that is). Other ports do not. Any method
that tells what the mode was before the current interrupt/trap will work.
kre
From: Cherry G.Mathew <cherry@zyx.in>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, kre@munnari.OZ.AU
Subject: Re: port-xen/53267 (XEN amd64 DomU (Dom0??) NetBSD current USERMODE() fails)
Date: Fri, 11 May 2018 17:06:08 +0530
Robert Elz <kre@munnari.OZ.AU> writes:
> The following reply was made to PR port-xen/53267; it has been noted by GNATS.
>
> From: Robert Elz <kre@munnari.OZ.AU>
> To: "Cherry G.Mathew" <cherry@zyx.in>
> Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org
> Subject: Re: port-xen/53267 (XEN amd64 DomU (Dom0??) NetBSD current USERMODE() fails)
> Date: Fri, 11 May 2018 17:38:09 +0700
>
> Date: Fri, 11 May 2018 14:37:12 +0530
> From: "Cherry G.Mathew" <cherry@zyx.in>
> Message-ID: <87o9hm7e7z.fsf@zyx.in>
>
> | This was a pretty intrusive change ( we switched to the x86 common
> | interrupt code ), so I'll need a bit of time to figure it out.
>
> Yes, I saw ... and also explains the problem, as regular x86 run the
> kernel and users at different priv levels, but xen does not. The old
> way, some magic must have been being done to make it appear as if
> there were different levels, which is no longer happening.
Actually it's a bit more complicated, depending on if you're talking
about amd64 or i386.
In any case, the Xen hypervisor trap code "emulates" CPL by frobbing the
appropriate CS register bitfield in the saved stackframe.
On amd64,
From amd64/include/segments.h:
#ifdef XEN
#define SEL_KPL 3 /* kernel privilege level */
#define SEL_XPL 0 /* Xen Hypervisor privilege level */
#else
#define SEL_KPL 0 /* kernel privilege level */
#endif
[...]
#ifdef XEN
/*
* As KPL == UPL, Xen emulate interrupt in kernel context by pushing
* a fake CS with XPL privilege
*/
#define KERNELMODE(c) (ISPL(c) == SEL_XPL)
#else
#define KERNELMODE(c) (ISPL(c) == SEL_KPL)
#endif
Thus although "KPL" is considered to be at level 3 on amd64 (mostly for
bit flipping to do with re-using segment entry setup code for the kernel
from native but with the appropriately lowered privilege level), the
actual value of CS stored on the stack frame reflects what native code
would expect for the kernel - i.e. a CPL of 0.
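A small sketch of how the saved CS would be classified under the XEN
definitions quoted above. SEL_XPL and the KERNELMODE() comparison are from
the excerpt; SEL_RPL, SEL_UPL, ISPL() and the user-mode test are assumed
here to have their usual values and are not shown in the excerpt.
frame_is_user() and the selector values used with it are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SEL_RPL		3	/* assumed: mask for the privilege bits */
#define SEL_UPL		3	/* assumed: user privilege level */
#define SEL_XPL		0	/* from the excerpt: fake level pushed by Xen */
#define ISPL(s)		((s) & SEL_RPL)

/* XEN variant from the excerpt: "kernel" means Xen pushed the fake CS. */
#define KERNELMODE_XEN(c)	(ISPL(c) == SEL_XPL)
/* Assumed shape of the user-mode test. */
#define USERMODE_SKETCH(c)	(ISPL(c) == SEL_UPL)

/* Classify a saved %cs: returns 1 for user mode, 0 for kernel mode. */
static int
frame_is_user(uint16_t saved_cs)
{
	unsigned pl = ISPL(saved_cs);

	/* Only the fake level 0 and level 3 are expected in this scheme. */
	assert(pl == SEL_XPL || pl == SEL_UPL);
	return USERMODE_SKETCH(saved_cs);
}
```

The point of the scheme is that the check never compares against SEL_KPL:
with KPL == UPL == 3 that comparison could not separate kernel from user,
so everything depends on the frame really holding the value Xen saved.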
> I have no idea how all of this works (I don't know x86 architecture at
> all) so I cannot really help with a solution (I admit to making one wild
> guess, and trying it out - that kernel did not make it through autoconf
> before hanging ... fortunately DomU's are easy to destroy, and bogus
> code is easy to throw away!)
>
Thanks for binary searching this - I really appreciate the patience it
took.
> One suggestion though - there is no need for the XEN USERMODE()
> macro to work anything like the x86 one does - it does not need to
> inspect bits in %cs (whatever that is). Other ports do not. Any method
> that tells what the mode was before the current interrupt/trap will work.
>
It's a bit more complicated because of the various configurations
(xen/native/amd64/i386) that x86 supports. I probably don't want to
touch it myself.
Please find a patch that could bandaid the current problem. The issue is
that the interrupt despatch code actually passes the real stack frame as
a second parameter to all callbacks, however the clock handler is the
only one which actually uses it (and passes it on to hardclock(), where
I believe the USERMODE() related accounting takes place).
This fact got overlooked during the API change, so the regs parameter
that the clock handler got was basically nonsense.
There's a couple of other patches I have that will sort out this API
mess in a cleaner way at which point this kludge should go away.
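The mix-up described above can be sketched as follows. This is not the
kernel code: frame_sketch and cpu_sketch are hypothetical stand-ins for
struct intrframe and struct cpu_info, and the layout and zeroed contents
are invented for the illustration. The buggy handler copies bytes with
memcpy() so that the sketch itself stays well defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct frame_sketch {		/* stand-in for struct intrframe */
	uint64_t regs[4];
	uint64_t cs;		/* saved code segment selector */
};

struct cpu_sketch {		/* stand-in for struct cpu_info */
	uint64_t fields[8];	/* unrelated per-CPU data */
};

typedef bool (*handler_fn)(void *arg, struct frame_sketch *frame);

/* Dispatcher: passes the registered cookie and the real frame. */
static bool
dispatch(handler_fn h, void *arg, struct frame_sketch *frame)
{
	return h(arg, frame);
}

/* Old behaviour: interprets the cookie (really the per-CPU data) as the
 * frame, so "cs" is whatever bytes happen to sit at that offset. */
static bool
handler_buggy(void *arg, struct frame_sketch *frame)
{
	uint64_t cs;

	(void)frame;
	memcpy(&cs, (const char *)arg + offsetof(struct frame_sketch, cs),
	    sizeof(cs));
	return (cs & 3) == 3;	/* "was the CPU in user mode?" */
}

/* Patched behaviour: uses the frame passed as the second parameter. */
static bool
handler_fixed(void *arg, struct frame_sketch *frame)
{
	(void)arg;
	assert(frame != NULL);
	return (frame->cs & 3) == 3;
}
```

With a zeroed cpu_sketch as the cookie and a frame whose cs has privilege
bits 3, handler_buggy reports kernel mode while handler_fixed reports user
mode, which matches the symptom in this PR.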
Please let me know how this goes.
--
~cherry
Index: clock.c
===================================================================
RCS file: /cvsroot/src/sys/arch/xen/xen/clock.c,v
retrieving revision 1.65
diff -u -r1.65 clock.c
--- clock.c 6 Nov 2017 15:27:09 -0000 1.65
+++ clock.c 11 May 2018 11:21:12 -0000
@@ -49,7 +49,8 @@
#include <dev/clock_subr.h>
#include <x86/rtc.h>
-static int xen_timer_handler(void *);
+static int xen_timer_handler(void *, struct intrframe *);
+static int (*xen_timer_handler_stub)(void *) = (void *) xen_timer_handler;
static struct intrhand *ih;
/* A timecounter: Xen system_time extrapolated with a TSC. */
@@ -524,7 +525,7 @@
KASSERT(evtch != -1);
ih = intr_establish_xname(0, &xen_pic, evtch, IST_LEVEL, IPL_CLOCK,
- xen_timer_handler, ci, true, "clock");
+ xen_timer_handler_stub, ci, true, "clock");
KASSERT(ih != NULL);
@@ -535,11 +536,10 @@
/* ARGSUSED */
static int
-xen_timer_handler(void *arg)
+xen_timer_handler(void *arg, struct intrframe *regs)
{
int64_t delta;
struct cpu_info *ci = curcpu();
- struct intrframe *regs = arg;
int err;
again:
From: Robert Elz <kre@munnari.OZ.AU>
To: "Cherry G.Mathew" <cherry@zyx.in>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org
Subject: Re: port-xen/53267 (XEN amd64 DomU (Dom0??) NetBSD current USERMODE() fails)
Date: Fri, 11 May 2018 19:32:55 +0700
Date: Fri, 11 May 2018 17:06:08 +0530
From: "Cherry G.Mathew" <cherry@zyx.in>
Message-ID: <87y3gq4e6v.fsf@zyx.in>
| Actually it's a bit more complicated, depending on if you're talking
| about amd64 or i386.
But of course, it would have to be ...
| In any case, the Xen hypervisor trap code "emulates" CPL by frobbing the
| appropriate CS register bitfield in the saved stackframe.
Yes, I had seen that was intended, but not how or where it was done.
| Thanks for binary searching this - I really appreciate the patience it
| took.
Not so much really, it did not take long once I started.
| Please find a patch that could bandaid the current problem.
[...]
| Please let me know how this goes.
It works great, thanks. I will do a bit more testing to verify,
but from my diagnostics in cgdconfig I now see ...
netbsd# cgdconfig -g -o /tmp/P aes-cbc 192
pkcs5_pbkdf2_time(24, 1): 10 us (0.851 -> 0.861) (sys 0.851->0.861 us)
pkcs5_pbkdf2_time(24, 2): 5 us (0.864 -> 0.869) (sys 0.864->0.869 us)
pkcs5_pbkdf2_time(24, 4): 9 us (0.873 -> 0.882) (sys 0.873->0.882 us)
pkcs5_pbkdf2_time(24, 8): 18 us (0.885 -> 0.903) (sys 0.885->0.903 us)
pkcs5_pbkdf2_time(24, 16): 33 us (0.906 -> 0.939) (sys 0.906->0.939 us)
pkcs5_pbkdf2_time(24, 32): 65 us (0.942 -> 0.1007) (sys 0.942->0.1007 us)
pkcs5_pbkdf2_time(24, 64): 129 us (0.1010 -> 0.1139) (sys 0.1010->0.1139 us)
pkcs5_pbkdf2_time(24, 128): 255 us (0.1142 -> 0.1397) (sys 0.1142->0.1397 us)
pkcs5_pbkdf2_time(24, 256): 514 us (0.1400 -> 0.1914) (sys 0.1400->0.1914 us)
pkcs5_pbkdf2_time(24, 512): 1015 us (0.1918 -> 0.2933) (sys 0.1918->0.2933 us)
pkcs5_pbkdf2_time(24, 1024): 4062 us (0.2937 -> 0.6999) (sys 0.2937->0.2937 us)
pkcs5_pbkdf2_time(24, 2048): 8119 us (0.7005 -> 0.15124) (sys 0.2937->0.2937 us)
pkcs5_pbkdf2_time(24, 4096): 16222 us (0.15130 -> 0.31352) (sys 0.2937->0.2937 us)
pkcs5_pbkdf2_time(24, 8192): 32557 us (0.31359 -> 0.63916) (sys 0.2937->0.2937 us)
pkcs5_pbkdf2_time(24, 503240): 1994147 us (0.63922 -> 2.58069) (sys 0.2937->0.2937 us)
(the formatting there is horrid, the N.M times are N seconds and M
microseconds, not a decimal number at all, and the "us" right at the
end was simply a cut/pasto (or something) - it does not belong ... but
the actual numbers are correct, which is what matters.)
The message indicates the workload (params to pkcs5_pbkdf2_time()) with how
long that work took in microseconds (it all stops when that number gets big
enough). After that are the values from getrusage, before & after the work,
for user time (first () section) and system time (second).
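For anyone checking the numbers: reading each N.M value as a (seconds,
microseconds) pair, the microsecond figure on a line is the difference of
the user-time pair. A minimal sketch of that arithmetic, with hypothetical
helper names (this is not the debug code that produced the output):

```c
#include <assert.h>
#include <sys/time.h>

/* Convert a (seconds, microseconds) pair to a single microsecond count. */
static long long
tv_to_us(const struct timeval *tv)
{
	return (long long)tv->tv_sec * 1000000LL + (long long)tv->tv_usec;
}

/* Microseconds elapsed between two getrusage()-style samples. */
static long long
tv_elapsed_us(const struct timeval *before, const struct timeval *after)
{
	long long d = tv_to_us(after) - tv_to_us(before);

	assert(d >= 0);		/* CPU time should not run backwards */
	return d;
}
```

For the last line of the output, (0, 63922) -> (2, 58069) gives
2000000 + 58069 - 63922 = 1994147 us, as printed.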
You can see there that up to the "512" line, user and sys times are the same.
That means that a clock tick had not happened yet, and calcru() was using its
fallback for when it has no info to guide it: simply dividing the total consumed
time in half, and allocating user and sys times equally.
The first tick clearly happened here during the work for the "1024" line, and
here allocated that tick to user mode, as the user mode time increases and
system time does not.
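A rough sketch of the split described above, assuming only what is said
here: with no ticks recorded the total is halved, otherwise it is shared
out in proportion to the ticks. It is not the calcru() source (which also
has other details, such as interrupt ticks); split_runtime() and struct
cpu_split are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct cpu_split {
	uint64_t user_us;
	uint64_t sys_us;
};

/* Share total_us between user and system time according to tick counts. */
static struct cpu_split
split_runtime(uint64_t total_us, uint64_t user_ticks, uint64_t sys_ticks)
{
	struct cpu_split s;
	uint64_t ticks = user_ticks + sys_ticks;

	if (ticks == 0) {
		/* No tick seen yet: nothing to go on, so split 50-50. */
		s.user_us = total_us / 2;
		s.sys_us = total_us / 2;
	} else {
		/* The sketch does not handle multiplication overflow. */
		assert(user_ticks == 0 || total_us <= UINT64_MAX / user_ticks);
		s.user_us = total_us * user_ticks / ticks;
		s.sys_us = total_us - s.user_us;
	}
	return s;
}
```

Once a single tick exists, all of the time follows it: a tick counted as
user gives the patched behaviour, while a tick counted as system pins user
time at zero, which is what the unpatched kernel showed.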
What it looked like before was more like ...
netbsd# cgdconfig -g -o /tmp/P aes-cbc 192
pkcs5_pbkdf2_time(24, 1): 10 us (0.860 -> 0.870) (sys 0.860->0.870 us)
pkcs5_pbkdf2_time(24, 2): 5 us (0.874 -> 0.879) (sys 0.874->0.879 us)
pkcs5_pbkdf2_time(24, 4): 10 us (0.882 -> 0.892) (sys 0.882->0.892 us)
pkcs5_pbkdf2_time(24, 8): 17 us (0.896 -> 0.913) (sys 0.896->0.913 us)
pkcs5_pbkdf2_time(24, 16): 35 us (0.916 -> 0.951) (sys 0.916->0.951 us)
pkcs5_pbkdf2_time(24, 32): 66 us (0.954 -> 0.1020) (sys 0.954->0.1020 us)
pkcs5_pbkdf2_time(24, 64): 130 us (0.1023 -> 0.1153) (sys 0.1023->0.1153 us)
pkcs5_pbkdf2_time(24, 128): 259 us (0.1157 -> 0.1416) (sys 0.1157->0.1416 us)
pkcs5_pbkdf2_time(24, 256): 517 us (0.1419 -> 0.1936) (sys 0.1419->0.1936 us)
pkcs5_pbkdf2_time(24, 512): 1031 us (0.1939 -> 0.2970) (sys 0.1939->0.2970 us)
pkcs5_pbkdf2_time(24, 1024): 0 us (0.2973 -> 0.2973) (sys 0.2973->0.7101 us)
pkcs5_pbkdf2_time(24, 2048): 0 us (0.2973 -> 0.2973) (sys 0.7107->0.15357 us)
pkcs5_pbkdf2_time(24, 4096): 0 us (0.2973 -> 0.2973) (sys 0.15365->0.31850 us)
pkcs5_pbkdf2_time(24, 8192): 0 us (0.2973 -> 0.2973) (sys 0.31856->0.64807 us)
pkcs5_pbkdf2_time(24, 16384): 0 us (0.2973 -> 0.2973) (sys 0.64814->0.130721 us)
pkcs5_pbkdf2_time(24, 32768): 0 us (0.2973 -> 0.2973) (sys 0.130729->0.262540 us)
pkcs5_pbkdf2_time(24, 65536): 0 us (0.2973 -> 0.2973) (sys 0.262550->0.526155 us)
pkcs5_pbkdf2_time(24, 131072): 0 us (0.2973 -> 0.2973) (sys 0.526164->1.53428 us)
In that one (by a co-incidence) the tick also happened during the "1024" work,
but here, the tick was attributed to system time (as were all that followed).
(The tick could happen any time of course - I have also seen it before the work even
starts, and all the user times were simply 0...)
The only change between those two tests was your patch, otherwise they were
identical kernels.
Thanks,
Please commit it! (Or something equiv if this really is just a bandaid.)
kre
From: Robert Elz <kre@munnari.OZ.AU>
To: "Cherry G.Mathew" <cherry@zyx.in>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org
Subject: Re: port-xen/53267 (XEN amd64 DomU (Dom0??) NetBSD current USERMODE() fails)
Date: Fri, 11 May 2018 19:45:00 +0700
Additional testing also looks good, things that should use user time, do.
Things that should use system time, do, and things that should use both,
do as well, eg:
netbsd# time find / -type f -exec grep -Ea 'ABC.*X.*[0-9]{1,3} ??Q' {} \; >/dev/null
34.00 real 16.29 user 17.62 sys
Looks perfect. (Note this is a very small test install, grepping every file
is not such a big thing...)
Thanks again.
kre
From: "Cherry G. Mathew" <cherry@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53267 CVS commit: src/sys/arch/xen/xen
Date: Fri, 11 May 2018 13:24:46 +0000
Module Name: src
Committed By: cherry
Date: Fri May 11 13:24:46 UTC 2018
Modified Files:
src/sys/arch/xen/xen: clock.c
Log Message:
Fixes port-xen/53267
re-educate xen_clock_handler() how to use the interrupt stackframe.
The current regs value passed in is *ci, and thus invalid.
Reported and tested by kre@. See PR 53267 for more details.
To generate a diff of this commit:
cvs rdiff -u -r1.65 -r1.66 src/sys/arch/xen/xen/clock.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: John Nemeth <jnemeth@cue.bc.ca>
To: Robert Elz <kre@munnari.OZ.AU>, "Cherry G.Mathew" <cherry@zyx.in>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org
Subject: Re: port-xen/53267 (XEN amd64 DomU (Dom0??) NetBSD current USERMODE() fails)
Date: Sat, 12 May 2018 00:36:25 -0700
On May 11, 5:38pm, Robert Elz wrote:
}
} [snip]
} macro to work anything like the x86 one does - it does not need to
} inspect bits in %cs (whatever that is). Other ports do not. Any method
} that tells what the mode was before the current interrupt/trap will work.
%cs is the code segment register. It comes from the days of
the 8086/8088 (both processors have 16-bit registers, but the 8088
has an 8-bit bus interface requiring two bus cycles to load a
register). At that time, it would be loaded with the physical
address of where the code segment started in memory. It was shifted
left four bits and %pc (program counter) was added to get the memory
location of the next instruction. These were 16-bit registers thus
creating an address space of 1 MB. Modern processors still start up
in what is called "real mode" which is basically emulating an
8086/8088 (called "real mode" as the MMU was disabled). OS startup
code is responsible for switching the processor to 32-bit/64-bit
mode as appropriate. UEFI changes this and switches the processor
to 32-bit/64-bit mode early on. A UEFI boot loader is actually
a UEFI application and doesn't run on bare metal (UEFI even has
a processor independent byte code as an option).
In the 80286 world, %cs (and its cousins) became segment selector
registers. In that world, %cs pointed at entries in a table
containing segment descriptors. The table contained such things
as base, length, and permissions for segments. The 80286 was a
strictly segmented architecture with no paging capability, but did
have memory protection (aka a supervisor mode).
80286 introduced rings or privilege levels. Originally there
were four. The bottom two bits of a segment register indicated
the desired privilege level. 0 was maximum privilege and 3 was
least privilege. When you loaded a segment register these bits
would be compared with the DPL (Descriptor Privilege Level) in the
segment descriptor. The processor wouldn't let you access a segment
with a higher DPL without going through some kind of access method
of which there were multiple (trap, software interrupt, call gate,
etc.). Most OSes run the kernel in ring 0 and userland in ring 3.
The same basic scheme continued with the 80386. However,
registers were extended to 32-bits and paging hardware was added.
It still used segments. Mapping from an address in a program goes
through the segmentation hardware then the paging hardware which
maps from virtual addresses to physical addresses. Most modern
OSes set the segment base to 0 and the length to all of memory thus
effectively taking segmentation out of the picture.
x86-64 (which goes under a variety of names) further extended
the registers to 64-bits. 64-bit mode, formally known as "long
mode", drops a bunch of legacy stuff including segmentation.
However, I don't know as much about x86-64 as I do about the earlier
modes.
AMD noted that rings 1 and 2 weren't used much and dumped them
as unused legacy stuff. This is what caused problems for Xen. On
i386, Xen runs the hypervisor in ring 0, the OS kernel in ring 1,
and the OS userland in ring 3 (this gave the hypervisor protection
from the guest OS while also giving the guest OS kernel protection
from its userland). On x86-64 this is no longer possible. So,
the hypervisor runs in ring 0 and the OS runs in ring 3. The OS
must trap to the hypervisor to switch between "supervisor" and
"user" mode. Without this mechanism the guest OS kernel wouldn't
have any protection from its userland. I don't know the details
of this mechanism but it is apparent that it is involved in the
problem you observed.
Having to trap to the hypervisor does slow down 64-bit OSes.
It is part of the impetus behind PVH mode. In that mode, the OS
basically thinks it's running on bare metal and has access to ring
0 and ring 3. However, it knows to trap to the hypervisor for I/O.
HVM mode runs unmodified OSes and emulates hardware which requires
a trap to the hypervisor for every I/O instruction (most I/O
operations require numerous I/O instructions).
This is probably much more than you wanted to know about the
low level details of x86 processors. :-) Also, it has been many
many years since I've written significant amounts of x86 assembly
language code, so some of the details might be wrong, but the gist
should be correct.
}-- End of excerpt from Robert Elz
State-Changed-From-To: feedback->closed
State-Changed-By: kre@NetBSD.org
State-Changed-When: Sun, 13 May 2018 17:00:10 +0000
State-Changed-Why:
Fixed by cherry@ Thanks.
From: Robert Elz <kre@munnari.OZ.AU>
To: John Nemeth <jnemeth@cue.bc.ca>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org
Subject: Re: port-xen/53267 (XEN amd64 DomU (Dom0??) NetBSD current USERMODE() fails)
Date: Sun, 13 May 2018 23:57:53 +0700
Date: Sat, 12 May 2018 00:36:25 -0700
From: John Nemeth <jnemeth@cue.bc.ca>
Message-ID: <201805120736.w4C7aP9t028295@server.cornerstoneservice.ca>
Thanks for taking the trouble to explain all of that.
In particular:
| AMD noted that rings 1 and 2 weren't used much and dumped them
| as unused legacy stuff.
which explains a lot - I had wondered why they were not used.
kre
ps: I still think the 8080 was the worst of the competing 8 bit processors
of the time, and while I applaud Intel's capacity to extend it while more
or less keeping complete compat with ancient code, I really wish that IBM
had selected one of the other choices...
From: John Nemeth <jnemeth@cue.bc.ca>
To: Robert Elz <kre@munnari.OZ.AU>, John Nemeth <jnemeth@cue.bc.ca>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org
Subject: Re: port-xen/53267 (XEN amd64 DomU (Dom0??) NetBSD current USERMODE() fails)
Date: Sun, 13 May 2018 13:22:35 -0700
On May 13, 11:57pm, Robert Elz wrote:
}
} Thanks for taking the trouble to explain all of that.
}
} In particular:
}
} | AMD noted that rings 1 and 2 weren't used much and dumped them
} | as unused legacy stuff.
}
} which explains a lot - I had wondered why they were not used.
}
} ps: I still think the 8080 was the worst of the competing 8 bit processors
} of the time, and while I applaud Intel's capacity to extend it while more
} or less keeping complete compat with ancient code, I really wish that IBM
} had selected one of the other choices...
Both the 8088 and 68000 came out in 1979. The IBM PC came
out in 1981. It certainly would have been nicer if IBM had chosen
the 68000. Being a 32-bit processor with a large register set from
the get-go (albeit with a 16-bit bus), it was definitely far superior
technologically. One could argue that in many ways, it is still
superior. However, having been abandoned for the most part, it
doesn't have a 64-bit upgrade path, nor would it have the same
performance of modern x86 processors. Unfortunately, technological
superiority is not the only consideration in making design choices.
}-- End of excerpt from Robert Elz
>Unformatted:
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.