NetBSD Problem Report #50021

From www@NetBSD.org  Thu Jul  2 06:55:47 2015
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id A6B63A5674
	for <gnats-bugs@gnats.NetBSD.org>; Thu,  2 Jul 2015 06:55:47 +0000 (UTC)
Message-Id: <20150702065546.192BAA6552@mollari.NetBSD.org>
Date: Thu,  2 Jul 2015 06:55:46 +0000 (UTC)
From: okuyama@flex.phys.tohoku.ac.jp
Reply-To: okuyama@flex.phys.tohoku.ac.jp
To: gnats-bugs@NetBSD.org
Subject: Linux affinity syscalls are not fully implemented
X-Send-Pr-Version: www-1.0

>Number:         50021
>Category:       kern
>Synopsis:       Linux affinity syscalls are not fully implemented
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jul 02 07:00:01 +0000 2015
>Closed-Date:    Mon Jan 09 00:05:30 +0000 2017
>Last-Modified:  Mon Jan 09 00:05:30 +0000 2017
>Originator:     Rin Okuyama
>Release:        7.99.19
>Organization:
Department of Physics, Tohoku University
>Environment:
NetBSD okuyama 7.99.19 NetBSD 7.99.19 (GENERIC) #0: Wed Jul  1 15:48:43 JST 2015  root@okuyama:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
NetBSD does not fully support the sched_setaffinity and
sched_getaffinity syscalls in its Linux emulation. As a result, Linux
binaries cannot set CPU affinity to maximize their performance in
multiprocessor environments. Moreover, the Intel Math Kernel Library
(MKL) determines the number of available CPUs by using these calls.
Because this attempt fails, MKL launches only one thread even on
multiprocessor machines. To resolve this, we have fully implemented the
linux_sched_(set|get)affinity syscalls on NetBSD-current.
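
For illustration, a Linux program can count its usable CPUs from the
affinity mask roughly as follows. This is a minimal sketch using the
glibc wrappers, not MKL's actual code; the function name
count_usable_cpus is hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Hypothetical helper: return the number of CPUs the calling thread
 * may run on, or -1 if sched_getaffinity(2) fails.
 */
int
count_usable_cpus(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	/* pid 0 means "the calling thread". */
	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	return CPU_COUNT(&set);
}
```

A program that sizes its thread pool this way depends on the call
succeeding and returning a sensible mask.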
>How-To-Repeat:
You need a machine supporting COMPAT_LINUX with at least two CPUs.
Some basic libraries for Linux binaries are also required (they are
provided by the suse131_base package). First of all, set

    security.models.extensions.user_set_cpu_affinity=1

with sysctl. Otherwise, root privilege is required to set CPU affinity.

We provide a test program which repeats bogus floating-point
calculations on a specific CPU (CPU1):

    http://flex.phys.tohoku.ac.jp/~okuyama/test_linux_affinity.tgz
    MD5 (test_linux_affinity.tgz) = 366e4c17f5bd7f5821d729f1a79343a6

This tarball contains binaries for amd64 and i386. For other platforms
supporting COMPAT_LINUX, you can compile it from source code, provided
you have the Linux versions of gcc, binutils, and so on.

On NetBSD 7.99.19, the test program fails to set CPU affinity. This can
be confirmed with the "top -1t" command:

    % tar zxf test_linux_affinity.tgz
    % cd test_linux_affinity
    % ./test.amd64 (or test.i386)
    setting affinity mask for CPU1
    failed to set affinity
    running anyway

    % top -1t
    ...
    CPU0 states:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
    CPU1 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    CPU2 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    ...

Note that the program may still end up running on CPU1 by chance.
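
The core of such a test can be sketched as below. This is not the
source of the test program in the tarball, only an assumed equivalent
built on the glibc wrappers; the names pin_to_cpu and
first_allowed_cpu are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on failure.
 */
int
pin_to_cpu(int cpu)
{
	cpu_set_t set;

	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return -1;
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}

/*
 * Hypothetical helper: return the lowest-numbered CPU in the calling
 * thread's current affinity mask, or -1 if the mask cannot be read.
 */
int
first_allowed_cpu(void)
{
	cpu_set_t set;
	int cpu;

	CPU_ZERO(&set);
	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}
```

Calling pin_to_cpu(1) and then checking that first_allowed_cpu()
returns 1 verifies both syscalls without relying on top(1).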

If you have the Intel compiler suite, you can confirm that MKL launches
only one thread. The following are results with Intel Fortran Composer
XE 2011:

    % KMP_AFFINITY=verbose ./test.mkl
    OMP: Warning #79: KMP_AFFINITY: cannot determine proper affinity mask size.
    OMP: Warning #71: KMP_AFFINITY: affinity not supported, using "disabled".
    OMP: Warning #121: Error initializing affinity - not using affinity.

    % top -1t
    ...
    CPU0 states:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
    CPU1 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    CPU2 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    ...

Sorry, we cannot provide a test program with MKL, because we are not
licensed to redistribute runtime libraries.
>Fix:
Apply the patch below. For linux_sched_setaffinity(2), we just provide a
wrapper for sys__sched_setaffinity(9). For linux_sched_getaffinity(2),
on the other hand, we cannot use sys__sched_getaffinity(9). This is
because, for a thread whose affinity mask has not been set, the Linux
call is expected to report all CPUs as available, whereas the native
call reports all CPUs as unavailable. Thus, we have implemented the
Linux syscall using code derived from the native one.
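
The mask size and the default "all CPUs" mask can be illustrated in
userland as follows. This is a sketch of the arithmetic only, not the
kernel code (the patch uses kcpuset(9) instead); the names
BITS_PER_LONG, cpu_mask_size and fill_all_cpus are hypothetical.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Bits per unsigned long; stands in for the kernel's LONG_BIT. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bytes needed for a mask covering ncpu CPUs, rounded up to whole longs. */
size_t
cpu_mask_size(unsigned int ncpu)
{
	return sizeof(unsigned long) *
	    ((ncpu + BITS_PER_LONG - 1) / BITS_PER_LONG);
}

/*
 * Set bits 0..ncpu-1 and clear the rest, with the first CPU in the
 * least significant bit. The caller must supply a buffer of at least
 * cpu_mask_size(ncpu) bytes.
 */
void
fill_all_cpus(unsigned long *mask, unsigned int ncpu)
{
	unsigned int i;

	memset(mask, 0, cpu_mask_size(ncpu));
	for (i = 0; i < ncpu; i++)
		mask[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
}
```

Setting one bit per CPU, as the patch does with kcpuset_set(), stays
correct when ncpu is an exact multiple of the word size or exceeds it.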

On the patched version of NetBSD, the test program successfully sets CPU
affinity:

    % ./test.amd64
    setting affinity mask for CPU1
    succeeded to set affinity
    running on CPU1

    % top -1t
    ...
    CPU0 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    CPU1 states:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
    CPU2 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    ...

We have also confirmed that MKL launches more than one thread and that
no warnings are issued. We can specify the number of threads, affinity
policy, etc., in the same manner as in native Linux environments.


--- sys/compat/linux/common/linux_sched.c.orig	2015-07-02 15:39:47.000000000 +0900
+++ sys/compat/linux/common/linux_sched.c	2015-07-01 15:46:00.000000000 +0900
@@ -65,6 +65,9 @@
 static int linux_clone_nptl(struct lwp *, const struct linux_sys_clone_args *,
     register_t *);

+/* Unlike Linux, dynamically calculate CPU mask size */
+#define	LINUX_CPU_MASK_SIZE (sizeof(long) * ((ncpu + LONG_BIT - 1) / LONG_BIT))
+
 #if DEBUG_LINUX
 #define DPRINTF(x) uprintf x
 #else
@@ -635,39 +638,45 @@
 		syscallarg(unsigned int) len;
 		syscallarg(unsigned long *) mask;
 	} */
-	proc_t *p;
-	unsigned long *lp, *data;
-	int error, size, nb = ncpu;
+	struct lwp *t;
+	kcpuset_t *kcset;
+	size_t size;
+	cpuid_t i;
+	int error;

-	/* Unlike Linux, dynamically calculate cpu mask size */
-	size = sizeof(long) * ((ncpu + LONG_BIT - 1) / LONG_BIT);
+	size = LINUX_CPU_MASK_SIZE;
 	if (SCARG(uap, len) < size)
 		return EINVAL;

-	/* XXX: Pointless check.  TODO: Actually implement this. */
-	mutex_enter(proc_lock);
-	p = proc_find(SCARG(uap, pid));
-	mutex_exit(proc_lock);
-	if (p == NULL) {
+	/* Lock the LWP */
+	t = lwp_find2(SCARG(uap, pid), l->l_lid);
+	if (t == NULL)
 		return ESRCH;
-	}
-
-	/* 
-	 * return the actual number of CPU, tag all of them as available 
-	 * The result is a mask, the first CPU being in the least significant
-	 * bit.
-	 */
-	data = kmem_zalloc(size, KM_SLEEP);
-	lp = data;
-	while (nb > LONG_BIT) {
-		*lp++ = ~0UL;
-		nb -= LONG_BIT;
-	}
-	if (nb)
-		*lp = (1 << ncpu) - 1;

-	error = copyout(data, SCARG(uap, mask), size);
-	kmem_free(data, size);
+	/* Check the permission */
+	if (kauth_authorize_process(l->l_cred,
+	    KAUTH_PROCESS_SCHEDULER_GETAFFINITY, t->l_proc, NULL, NULL, NULL)) {
+		mutex_exit(t->l_proc->p_lock);
+		return EPERM;
+	}
+
+	kcpuset_create(&kcset, true);
+	lwp_lock(t);
+	if (t->l_affinity != NULL)
+		kcpuset_copy(kcset, t->l_affinity);
+	else {
+		/*
+		 * All available CPUs should be masked when affinity has not
+		 * been set.
+		 */
+		kcpuset_zero(kcset);
+		for (i = 0; i < ncpu; i++)
+			kcpuset_set(kcset, i);
+	}
+	lwp_unlock(t);
+	mutex_exit(t->l_proc->p_lock);
+	error = kcpuset_copyout(kcset, (cpuset_t *)SCARG(uap, mask), size);
+	kcpuset_unuse(kcset, NULL);
 	*retval = size;
 	return error;
 }
@@ -680,17 +689,17 @@
 		syscallarg(unsigned int) len;
 		syscallarg(unsigned long *) mask;
 	} */
-	proc_t *p;
+	struct sys__sched_setaffinity_args ssa;
+	size_t size;

-	/* XXX: Pointless check.  TODO: Actually implement this. */
-	mutex_enter(proc_lock);
-	p = proc_find(SCARG(uap, pid));
-	mutex_exit(proc_lock);
-	if (p == NULL) {
-		return ESRCH;
-	}
+	size = LINUX_CPU_MASK_SIZE;
+	if (SCARG(uap, len) < size)
+		return EINVAL;

-	/* Let's ignore it */
-	DPRINTF(("%s\n", __func__));
-	return 0;
+	SCARG(&ssa, pid) = SCARG(uap, pid);
+	SCARG(&ssa, lid) = l->l_lid;
+	SCARG(&ssa, size) = size;
+	SCARG(&ssa, cpuset) = (cpuset_t *)SCARG(uap, mask);
+
+	return sys__sched_setaffinity(l, &ssa, retval);
 }

>Release-Note:

>Audit-Trail:
From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/50021 CVS commit: src/sys/compat/linux/common
Date: Thu, 2 Jul 2015 22:24:29 -0400

 Module Name:	src
 Committed By:	christos
 Date:		Fri Jul  3 02:24:28 UTC 2015

 Modified Files:
 	src/sys/compat/linux/common: linux_sched.c

 Log Message:
 PR/50021: Rin Okuyama: Fix linux affinity syscalls
 XXX: pullup-7


 To generate a diff of this commit:
 cvs rdiff -u -r1.67 -r1.68 src/sys/compat/linux/common/linux_sched.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Rin Okuyama <okuyama@flex.phys.tohoku.ac.jp>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/50021 (PR/50021 CVS commit: src/sys/compat/linux/common)
Date: Fri, 03 Jul 2015 15:38:24 +0900

 Thank you very much for your quick response. It works fine for me.
 Pulling-up to netbsd-7 would be helpful to replace our Linux boxes.
 Please close the PR if there's no objection.

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/50021 (PR/50021 CVS commit: src/sys/compat/linux/common)
Date: Sun, 6 Sep 2015 00:43:55 +0000

 On Fri, Jul 03, 2015 at 06:40:00AM +0000, Rin Okuyama wrote:
  > The following reply was made to PR kern/50021; it has been noted by GNATS.
  > 
  > From: Rin Okuyama <okuyama@flex.phys.tohoku.ac.jp>
  > To: gnats-bugs@NetBSD.org
  > Cc: 
  > Subject: Re: kern/50021 (PR/50021 CVS commit: src/sys/compat/linux/common)
  > Date: Fri, 03 Jul 2015 15:38:24 +0900
  > 
  >  Thank you very much for your quick response. It works fine for me.
  >  Pulling-up to netbsd-7 would be helpful to replace our Linux boxes.
  >  Please close the PR if there's no objection.

 Will do that when the netbsd-7 pullup gets taken care of.

 -- 
 David A. Holland
 dholland@netbsd.org

State-Changed-From-To: open->pending-pullups
State-Changed-By: rin@NetBSD.org
State-Changed-When: Wed, 28 Dec 2016 21:23:52 +0000
State-Changed-Why:
Pull-up requested to netbsd-7.


From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/50021 CVS commit: [netbsd-7] src/sys/compat/linux/common
Date: Sat, 31 Dec 2016 07:38:31 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Sat Dec 31 07:38:31 UTC 2016

 Modified Files:
 	src/sys/compat/linux/common [netbsd-7]: linux_sched.c

 Log Message:
 Pull up following revision(s) (requested by rin in ticket #1343):
 	sys/compat/linux/common/linux_sched.c: revision 1.68
 PR/50021: Rin Okuyama: Fix linux affinity syscalls


 To generate a diff of this commit:
 cvs rdiff -u -r1.66.4.1 -r1.66.4.2 src/sys/compat/linux/common/linux_sched.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: rin@NetBSD.org
State-Changed-When: Mon, 09 Jan 2017 00:05:30 +0000
State-Changed-Why:
The fix will be in NetBSD 7.1.


>Unformatted:
