NetBSD Problem Report #40419
From www@NetBSD.org Fri Jan 16 22:04:38 2009
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id 6C82763BA90
for <gnats-bugs@gnats.netbsd.org>; Fri, 16 Jan 2009 22:04:38 +0000 (UTC)
Message-Id: <20090116220437.85E7663B8BA@narn.NetBSD.org>
Date: Fri, 16 Jan 2009 22:04:37 +0000 (UTC)
From: ad@netbsd.org
Reply-To: ad@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: processor sets broken on 5.99.6
X-Send-Pr-Version: www-1.0
>Number: 40419
>Category: kern
>Synopsis: processor sets broken on 5.99.6
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: rmind
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Jan 16 22:05:00 +0000 2009
>Closed-Date: Wed Jan 21 00:04:29 +0000 2009
>Last-Modified: Thu Jan 22 20:10:03 +0000 2009
>Originator: Andrew Doran
>Release: 5.99.6
>Organization:
The NetBSD Project
>Environment:
NetBSD spanners.hairylemon.org 5.99.6 NetBSD 5.99.6 (GENERIC) #6: Tue Dec 30 13:19:29 GMT 2008 ad@spanners.hairylemon.org:/local/home/ad/src/sys/arch/i386/compile/GENERIC i386
>Description:
# cpuctl list
Num HwId Unbound LWPs Interrupts Last change
---- ---- ------------ -------------- ----------------------------
0 0 online intr Fri Jan 16 17:30:28 2009
1 1 online intr Fri Jan 16 17:30:28 2009
# psrset
system processor set 0: processor(s) 0
user processor set 1: processor(s) 1
# psrset -e 1 top | grep top
561 root 43 0 752K 1116K CPU/0 0:00 0.00% 0.00% top
CPU/0 ???
>How-To-Repeat:
See above.
>Fix:
Not investigated.
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->rmind
Responsible-Changed-By: rmind@NetBSD.org
Responsible-Changed-When: Sun, 18 Jan 2009 01:44:43 +0000
Responsible-Changed-Why:
Take.
From: Mindaugas Rasiukevicius <rmind@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/40419 CVS commit: src/sys/kern
Date: Sun, 18 Jan 2009 05:07:51 +0000 (UTC)
Module Name: src
Committed By: rmind
Date: Sun Jan 18 05:07:51 UTC 2009
Modified Files:
src/sys/kern: kern_runq.c
Log Message:
- Avoid calling sched_catchlwp() if CPUs have different processor-sets.
- sched_takecpu: check for psid earlier (be more strict).
PR/40419.
To generate a diff of this commit:
cvs rdiff -r1.23 -r1.24 src/sys/kern/kern_runq.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Mindaugas Rasiukevicius <rmind@netbsd.org>
To: ad@netbsd.org
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/40419 (processor sets broken on 5.99.6)
Date: Sun, 18 Jan 2009 05:26:10 +0000
> >Number: 40419
> >Category: kern
> >Synopsis: processor sets broken on 5.99.6
>
> ...
>
> # cpuctl list
> Num HwId Unbound LWPs Interrupts Last change
> ---- ---- ------------ -------------- ----------------------------
> 0 0 online intr Fri Jan 16 17:30:28 2009
> 1 1 online intr Fri Jan 16 17:30:28 2009
> # psrset
> system processor set 0: processor(s) 0
> user processor set 1: processor(s) 1
> # psrset -e 1 top | grep top
> 561 root 43 0 752K 1116K CPU/0 0:00 0.00% 0.00% top
What happens later?
I have committed one fix (please review), sched_takecpu() would keep the
thread on local CPU if it idles. Now function is strict. But even with
previous behaviour, after some load process should have been migrated to
the appropriate CPU (and then it should not be able to get out).
However, in your test case, application might still be on local CPU,
because curlwp (LSONRPOC) might not migrate that fast. We can do yield()
in sys__pset_bind(). Do you think it is worth?
Otherwise, it seems to be working for me.
--
Best regards,
Mindaugas
From: Andrew Doran <ad@netbsd.org>
To: Mindaugas Rasiukevicius <rmind@netbsd.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/40419 (processor sets broken on 5.99.6)
Date: Mon, 19 Jan 2009 09:02:17 +0000
> I have committed one fix (please review), sched_takecpu() would keep the
> thread on local CPU if it idles.
Thanks. I will look at it later, don't have the time right now.
> However, in your test case, application might still be on local CPU,
> because curlwp (LSONRPOC) might not migrate that fast.
I think this is what's happending.
> We can do yield() in sys__pset_bind(). Do you think it is worth?
I was thinking of a function that scans all threads, with cpu_lock held, and
checks to see if their l_cpu is allowed by their affinity mask, processor
set or LP_BOUND flag. If not, change l_cpu (or migrate if online), then do a
broadcast xcall to nullop() if there have been migrations.
Hmm, shouldn't psets and affinity masks be mutually exclusive?
Andrew
From: Mindaugas Rasiukevicius <rmind@netbsd.org>
To: Andrew Doran <ad@netbsd.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/40419 (processor sets broken on 5.99.6)
Date: Tue, 20 Jan 2009 02:08:53 +0000
Andrew Doran <ad@netbsd.org> wrote:
> > We can do yield() in sys__pset_bind(). Do you think it is worth?
>
> I was thinking of a function that scans all threads, with cpu_lock held, and
> checks to see if their l_cpu is allowed by their affinity mask, processor
> set or LP_BOUND flag. If not, change l_cpu (or migrate if online), then do a
> broadcast xcall to nullop() if there have been migrations.
>
> Hmm, shouldn't psets and affinity masks be mutually exclusive?
>
Yes, exactly what I thought while fixing this. While originally mixing was
permitted, intervention to the "jailed" processor-set is really wrong!
Fix committed (without xcall bit):
http://mail-index.netbsd.org/source-changes/2009/01/20/msg215921.html
By the way, does it make more sense to return EPERM or EBUSY, or other?
> Andrew
--
Best regards,
Mindaugas
State-Changed-From-To: open->closed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Wed, 21 Jan 2009 00:04:29 +0000
State-Changed-Why:
Fixed, thanks.
From: Mindaugas Rasiukevicius <rmind@netbsd.org>
To: Andrew Doran <ad@netbsd.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/40419 (processor sets broken on 5.99.6)
Date: Wed, 21 Jan 2009 00:01:49 +0000
Andrew Doran <ad@netbsd.org> wrote:
> > We can do yield() in sys__pset_bind(). Do you think it is worth?
>
> I was thinking of a function that scans all threads, with cpu_lock held, and
> checks to see if their l_cpu is allowed by their affinity mask, processor
> set or LP_BOUND flag. If not, change l_cpu (or migrate if online), then do a
> broadcast xcall to nullop() if there have been migrations.
After some thinking, I do not think it is worth. Theoretically, xc_broadcast
might still not ensure that all LWPs have migrated, eg. in a case when there
are many migrating LWPs in the same run-queue.
After last fixes, I think migration should be very fast (~immediate) anyway,
just re-scheduling overhead.
--
Best regards,
Mindaugas
From: Andrew Doran <ad@netbsd.org>
To: Mindaugas Rasiukevicius <rmind@netbsd.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/40419 (processor sets broken on 5.99.6)
Date: Wed, 21 Jan 2009 09:56:38 +0000
On Wed, Jan 21, 2009 at 12:01:49AM +0000, Mindaugas Rasiukevicius wrote:
> Andrew Doran <ad@netbsd.org> wrote:
> > > We can do yield() in sys__pset_bind(). Do you think it is worth?
> >
> > I was thinking of a function that scans all threads, with cpu_lock held, and
> > checks to see if their l_cpu is allowed by their affinity mask, processor
> > set or LP_BOUND flag. If not, change l_cpu (or migrate if online), then do a
> > broadcast xcall to nullop() if there have been migrations.
>
> After some thinking, I do not think it is worth. Theoretically, xc_broadcast
> might still not ensure that all LWPs have migrated, eg. in a case when there
> are many migrating LWPs in the same run-queue.
Hmm. I can't look at the code right now. If we can have LWPs in the wrong
runqueue after a pset/affinity change, we should move them to prevent them
running on that CPU after there is a context switch. Maybe it would be
useful to add a syncobj_t::sobj_changecpu()?
Thanks,
Andrew
From: Mindaugas Rasiukevicius <rmind@netbsd.org>
To: Andrew Doran <ad@netbsd.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/40419 (processor sets broken on 5.99.6)
Date: Wed, 21 Jan 2009 10:17:14 +0000
Andrew Doran <ad@netbsd.org> wrote:
> > > I was thinking of a function that scans all threads, with cpu_lock
> > > held, and checks to see if their l_cpu is allowed by their affinity
> > > mask, processor set or LP_BOUND flag. If not, change l_cpu (or migrate
> > > if online), then do a broadcast xcall to nullop() if there have been
> > > migrations.
> >
> > After some thinking, I do not think it is worth. Theoretically,
> > xc_broadcast might still not ensure that all LWPs have migrated, eg. in a
> > case when there are many migrating LWPs in the same run-queue.
>
> Hmm. I can't look at the code right now. If we can have LWPs in the wrong
> runqueue after a pset/affinity change, we should move them to prevent them
> running on that CPU after there is a context switch. Maybe it would be
> useful to add a syncobj_t::sobj_changecpu()?
It would not migrate immediately only if LWP is in LSRUN (and if it's in the
runqueue) or LSONPROC state. For both cases LWP would migrate just after the
context-switch via idle loop (l_target_cpu != NULL).
If we really want to be synchronous about this, then a simple cv_wait() in
lwp_migrate() and cv_broadcast() in sched_idle() would work. Do you see some
good reason to do this?
--
Best regards,
Mindaugas
From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/40419 CVS commit: [netbsd-5] src/sys/kern
Date: Thu, 22 Jan 2009 20:06:47 +0000 (UTC)
Module Name: src
Committed By: snj
Date: Thu Jan 22 20:06:47 UTC 2009
Modified Files:
src/sys/kern [netbsd-5]: kern_runq.c
Log Message:
Pull up following revision(s) (requested by rmind in ticket #284):
sys/kern/kern_runq.c: revision 1.24
- Avoid calling sched_catchlwp() if CPUs have different processor-sets.
- sched_takecpu: check for psid earlier (be more strict).
PR/40419.
To generate a diff of this commit:
cvs rdiff -r1.22 -r1.22.4.1 src/sys/kern/kern_runq.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.