NetBSD Problem Report #50021
From www@NetBSD.org Thu Jul 2 06:55:47 2015
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id A6B63A5674
for <gnats-bugs@gnats.NetBSD.org>; Thu, 2 Jul 2015 06:55:47 +0000 (UTC)
Message-Id: <20150702065546.192BAA6552@mollari.NetBSD.org>
Date: Thu, 2 Jul 2015 06:55:46 +0000 (UTC)
From: okuyama@flex.phys.tohoku.ac.jp
Reply-To: okuyama@flex.phys.tohoku.ac.jp
To: gnats-bugs@NetBSD.org
Subject: Linux affinity syscalls are not fully implemented
X-Send-Pr-Version: www-1.0
>Number: 50021
>Category: kern
>Synopsis: Linux affinity syscalls are not fully implemented
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Jul 02 07:00:01 +0000 2015
>Closed-Date: Mon Jan 09 00:05:30 +0000 2017
>Last-Modified: Mon Jan 09 00:05:30 +0000 2017
>Originator: Rin Okuyama
>Release: 7.99.19
>Organization:
Department of Physics, Tohoku University
>Environment:
NetBSD okuyama 7.99.19 NetBSD 7.99.19 (GENERIC) #0: Wed Jul 1 15:48:43 JST 2015 root@okuyama:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
NetBSD does not fully support the sched_setaffinity and sched_getaffinity
syscalls in its Linux emulation. Linux binaries therefore cannot set CPU
affinity to maximize their performance in multiprocessor environments.
Moreover, the Intel Math Kernel Library (MKL) determines the number of
available CPUs by using these calls. Because this attempt fails, MKL
launches only one thread even on multiprocessor machines. To resolve
this, we have fully implemented the linux_sched_(set|get)affinity
syscalls on NetBSD-current.
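To illustrate why a working sched_getaffinity matters for thread counts,
here is a minimal sketch of how a runtime might derive the number of
usable CPUs from the returned mask. MKL's actual logic is not shown in
this report; the function name count_cpus_in_mask is hypothetical, and
the mask is assumed to be an array of unsigned long with one bit per CPU.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the PR): count the CPUs marked in an
 * affinity mask made of `nwords` unsigned long words.
 */
unsigned int
count_cpus_in_mask(const unsigned long *mask, size_t nwords)
{
	unsigned int count = 0;
	size_t i;

	assert(mask != NULL || nwords == 0);

	for (i = 0; i < nwords; i++) {
		unsigned long w = mask[i];

		/* Clear the lowest set bit until the word is empty. */
		while (w != 0) {
			w &= w - 1;
			count++;
		}
	}
	return count;
}
```

A caller that cannot obtain a mask at all has nothing to count, which is
consistent with a fallback to a single thread.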
>How-To-Repeat:
You need a machine that supports COMPAT_LINUX and has at least two CPUs.
Some basic libraries for Linux binaries are also required (they are
provided by the suse131_base package). First, set
security.models.extensions.user_set_cpu_affinity=1
with sysctl. Otherwise, you need root privileges to set CPU affinity.
We provide a test program which repeats bogus floating-point
calculations on a specific CPU (CPU1):
http://flex.phys.tohoku.ac.jp/~okuyama/test_linux_affinity.tgz
MD5 (test_linux_affinity.tgz) = 366e4c17f5bd7f5821d729f1a79343a6
This tarball contains binaries for amd64 and i386. For other platforms
supporting COMPAT_LINUX, you can compile it from the source code,
provided you have the Linux versions of gcc, binutils, and so on.
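For reference, the raw Linux syscalls take the affinity mask as an array
of unsigned long in which bit n stands for CPU n. The sketch below builds
a mask that selects a single CPU, such as CPU1. It is not the code in the
tarball; the name single_cpu_mask and the BITS_PER_WORD macro are
hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper macro: number of bits in one mask word. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Build a mask selecting only `cpu`. Returns 0 on success, or -1 if
 * `nwords` words are too few to hold that CPU's bit.
 */
int
single_cpu_mask(unsigned long *mask, size_t nwords, unsigned int cpu)
{
	assert(mask != NULL);

	if (cpu / BITS_PER_WORD >= nwords)
		return -1;

	memset(mask, 0, nwords * sizeof(*mask));
	mask[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
	return 0;
}
```

For CPU1 the first word of the mask is 0x2 and all other words are zero.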
On NetBSD 7.99.19, the test program fails to set CPU affinity. This can
be confirmed with the "top -1t" command:
% tar zxf test_linux_affinity.tgz
% cd test_linux_affinity
% ./test.amd64 (or test.i386)
setting affinity mask for CPU1
failed to set affinity
running anyway
% top -1t
...
CPU0 states: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
CPU1 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU2 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
...
Note that the program may still happen to run on CPU1 by chance.
If you have the Intel compiler suite, you can confirm that MKL launches
only one thread. The following results were obtained with Intel Fortran
Composer XE 2011:
% KMP_AFFINITY=verbose ./test.mkl
OMP: Warning #79: KMP_AFFINITY: cannot determine proper affinity mask size.
OMP: Warning #71: KMP_AFFINITY: affinity not supported, using "disabled".
OMP: Warning #121: Error initializing affinity - not using affinity.
% top -1t
...
CPU0 states: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
CPU1 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU2 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
...
Sorry, we cannot provide a test program with MKL, because we are not
licensed to redistribute runtime libraries.
>Fix:
Apply the patch below. For linux_sched_setaffinity(2), we just provide a
wrapper for sys__sched_setaffinity(9). For linux_sched_getaffinity(2),
on the other hand, we cannot use sys__sched_getaffinity(9). This is
because, for a thread whose affinity mask has not been set, the Linux
syscall is expected to report all CPUs as available, whereas the native
one reports all CPUs as unavailable. Thus, we have implemented the Linux
syscall using code derived from the native one.
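The difference in semantics can be sketched in userland C as follows.
This is only an illustration of the rule described above, not the kernel
code: the function name linux_style_mask is hypothetical, plain
unsigned long arrays stand in for kcpuset_t, and a NULL pointer stands
in for an affinity mask that has not been set.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper macro: number of bits in one mask word. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Fill `out` (nwords words) with the mask a Linux caller expects.
 * `affinity` is NULL when no affinity has been set.
 */
void
linux_style_mask(const unsigned long *affinity, unsigned int ncpu,
    unsigned long *out, size_t nwords)
{
	unsigned int i;

	assert(out != NULL);
	assert(ncpu <= nwords * BITS_PER_WORD);

	if (affinity != NULL) {
		/* An explicitly set mask is reported unchanged. */
		memcpy(out, affinity, nwords * sizeof(*out));
		return;
	}

	/* No mask set: report every CPU from 0 to ncpu - 1. */
	memset(out, 0, nwords * sizeof(*out));
	for (i = 0; i < ncpu; i++)
		out[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}
```

With three CPUs and no affinity set, the first word of the result is
0x7; an all-zero result in that case would tell the caller that no CPU
is usable.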
On the patched version of NetBSD, the test program successfully sets CPU
affinity:
% ./test.amd64
setting affinity mask for CPU1
succeeded to set affinity
running on CPU1
% top -1t
...
CPU0 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU1 states: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
CPU2 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
...
We have also confirmed that MKL launches more than one thread and that
no warnings are issued. We can specify the number of threads, affinity
policy, etc., in the same manner as in native Linux environments.
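The patch below sizes the CPU mask dynamically from ncpu and rejects a
caller-supplied length that is smaller. As a worked example of that
arithmetic, here is a userland sketch; cpu_mask_size, mask_len_ok, and
BITS_PER_LONG are hypothetical names, and ncpu is passed as a parameter
rather than read from the kernel.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Userland stand-in for the kernel's LONG_BIT. */
#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* Same arithmetic as LINUX_CPU_MASK_SIZE: round ncpu up to whole longs. */
size_t
cpu_mask_size(unsigned int ncpu)
{
	assert(ncpu > 0);	/* a running system has at least one CPU */
	return sizeof(long) * ((ncpu + BITS_PER_LONG - 1) / BITS_PER_LONG);
}

/*
 * The length check applied by both patched syscalls: returns 1 if the
 * caller's buffer length is accepted, 0 where the kernel returns EINVAL.
 */
int
mask_len_ok(size_t len, unsigned int ncpu)
{
	return len >= cpu_mask_size(ncpu);
}
```

On a platform where long is 64 bits, 3 CPUs need 8 bytes and 65 CPUs
need 16 bytes, so a caller passing a 128-byte buffer is accepted in
both cases.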
--- sys/compat/linux/common/linux_sched.c.orig 2015-07-02 15:39:47.000000000 +0900
+++ sys/compat/linux/common/linux_sched.c 2015-07-01 15:46:00.000000000 +0900
@@ -65,6 +65,9 @@
static int linux_clone_nptl(struct lwp *, const struct linux_sys_clone_args *,
register_t *);
+/* Unlike Linux, dynamically calculate CPU mask size */
+#define LINUX_CPU_MASK_SIZE (sizeof(long) * ((ncpu + LONG_BIT - 1) / LONG_BIT))
+
#if DEBUG_LINUX
#define DPRINTF(x) uprintf x
#else
@@ -635,39 +638,45 @@
syscallarg(unsigned int) len;
syscallarg(unsigned long *) mask;
} */
- proc_t *p;
- unsigned long *lp, *data;
- int error, size, nb = ncpu;
+ struct lwp *t;
+ kcpuset_t *kcset;
+ size_t size;
+ cpuid_t i;
+ int error;
- /* Unlike Linux, dynamically calculate cpu mask size */
- size = sizeof(long) * ((ncpu + LONG_BIT - 1) / LONG_BIT);
+ size = LINUX_CPU_MASK_SIZE;
if (SCARG(uap, len) < size)
return EINVAL;
- /* XXX: Pointless check. TODO: Actually implement this. */
- mutex_enter(proc_lock);
- p = proc_find(SCARG(uap, pid));
- mutex_exit(proc_lock);
- if (p == NULL) {
+ /* Lock the LWP */
+ t = lwp_find2(SCARG(uap, pid), l->l_lid);
+ if (t == NULL)
return ESRCH;
- }
-
- /*
- * return the actual number of CPU, tag all of them as available
- * The result is a mask, the first CPU being in the least significant
- * bit.
- */
- data = kmem_zalloc(size, KM_SLEEP);
- lp = data;
- while (nb > LONG_BIT) {
- *lp++ = ~0UL;
- nb -= LONG_BIT;
- }
- if (nb)
- *lp = (1 << ncpu) - 1;
- error = copyout(data, SCARG(uap, mask), size);
- kmem_free(data, size);
+ /* Check the permission */
+ if (kauth_authorize_process(l->l_cred,
+ KAUTH_PROCESS_SCHEDULER_GETAFFINITY, t->l_proc, NULL, NULL, NULL)) {
+ mutex_exit(t->l_proc->p_lock);
+ return EPERM;
+ }
+
+ kcpuset_create(&kcset, true);
+ lwp_lock(t);
+ if (t->l_affinity != NULL)
+ kcpuset_copy(kcset, t->l_affinity);
+ else {
+ /*
+ * All available CPUs should be masked when affinity has not
+ * been set.
+ */
+ kcpuset_zero(kcset);
+ for (i = 0; i < ncpu; i++)
+ kcpuset_set(kcset, i);
+ }
+ lwp_unlock(t);
+ mutex_exit(t->l_proc->p_lock);
+ error = kcpuset_copyout(kcset, (cpuset_t *)SCARG(uap, mask), size);
+ kcpuset_unuse(kcset, NULL);
*retval = size;
return error;
}
@@ -680,17 +689,17 @@
syscallarg(unsigned int) len;
syscallarg(unsigned long *) mask;
} */
- proc_t *p;
+ struct sys__sched_setaffinity_args ssa;
+ size_t size;
- /* XXX: Pointless check. TODO: Actually implement this. */
- mutex_enter(proc_lock);
- p = proc_find(SCARG(uap, pid));
- mutex_exit(proc_lock);
- if (p == NULL) {
- return ESRCH;
- }
+ size = LINUX_CPU_MASK_SIZE;
+ if (SCARG(uap, len) < size)
+ return EINVAL;
- /* Let's ignore it */
- DPRINTF(("%s\n", __func__));
- return 0;
+ SCARG(&ssa, pid) = SCARG(uap, pid);
+ SCARG(&ssa, lid) = l->l_lid;
+ SCARG(&ssa, size) = size;
+ SCARG(&ssa, cpuset) = (cpuset_t *)SCARG(uap, mask);
+
+ return sys__sched_setaffinity(l, &ssa, retval);
}
>Release-Note:
>Audit-Trail:
From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/50021 CVS commit: src/sys/compat/linux/common
Date: Thu, 2 Jul 2015 22:24:29 -0400
Module Name: src
Committed By: christos
Date: Fri Jul 3 02:24:28 UTC 2015
Modified Files:
src/sys/compat/linux/common: linux_sched.c
Log Message:
PR/50021: Rin Okuyama: Fix linux affinity syscalls
XXX: pullup-7
To generate a diff of this commit:
cvs rdiff -u -r1.67 -r1.68 src/sys/compat/linux/common/linux_sched.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Rin Okuyama <okuyama@flex.phys.tohoku.ac.jp>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/50021 (PR/50021 CVS commit: src/sys/compat/linux/common)
Date: Fri, 03 Jul 2015 15:38:24 +0900
Thank you very much for your quick response. It works fine for me.
Pulling-up to netbsd-7 would be helpful to replace our Linux boxes.
Please close the PR if there's no objection.
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/50021 (PR/50021 CVS commit: src/sys/compat/linux/common)
Date: Sun, 6 Sep 2015 00:43:55 +0000
On Fri, Jul 03, 2015 at 06:40:00AM +0000, Rin Okuyama wrote:
> The following reply was made to PR kern/50021; it has been noted by GNATS.
>
> From: Rin Okuyama <okuyama@flex.phys.tohoku.ac.jp>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: kern/50021 (PR/50021 CVS commit: src/sys/compat/linux/common)
> Date: Fri, 03 Jul 2015 15:38:24 +0900
>
> Thank you very much for your quick response. It works fine for me.
> Pulling-up to netbsd-7 would be helpful to replace our Linux boxes.
> Please close the PR if there's no objection.
Will do that when the netbsd-7 pullup gets taken care of.
--
David A. Holland
dholland@netbsd.org
State-Changed-From-To: open->pending-pullups
State-Changed-By: rin@NetBSD.org
State-Changed-When: Wed, 28 Dec 2016 21:23:52 +0000
State-Changed-Why:
Pull-up requested to netbsd-7.
From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/50021 CVS commit: [netbsd-7] src/sys/compat/linux/common
Date: Sat, 31 Dec 2016 07:38:31 +0000
Module Name: src
Committed By: snj
Date: Sat Dec 31 07:38:31 UTC 2016
Modified Files:
src/sys/compat/linux/common [netbsd-7]: linux_sched.c
Log Message:
Pull up following revision(s) (requested by rin in ticket #1343):
sys/compat/linux/common/linux_sched.c: revision 1.68
PR/50021: Rin Okuyama: Fix linux affinity syscalls
To generate a diff of this commit:
cvs rdiff -u -r1.66.4.1 -r1.66.4.2 src/sys/compat/linux/common/linux_sched.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: rin@NetBSD.org
State-Changed-When: Mon, 09 Jan 2017 00:05:30 +0000
State-Changed-Why:
The fix to be in NetBSD 7.1.
>Unformatted:
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.