NetBSD Problem Report #43561

From www@NetBSD.org  Sat Jul  3 08:30:29 2010
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id A91F863BA89
	for <gnats-bugs@gnats.NetBSD.org>; Sat,  3 Jul 2010 08:30:29 +0000 (UTC)
Message-Id: <20100703083029.7229163BA69@www.NetBSD.org>
Date: Sat,  3 Jul 2010 08:30:29 +0000 (UTC)
From: witold.wnuk@gmail.com
Reply-To: witold.wnuk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: Thread waiting, CPU idling
X-Send-Pr-Version: www-1.0

>Number:         43561
>Category:       kern
>Synopsis:       Thread waiting, CPU idling
>Confidential:   no
>Severity:       non-critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jul 03 08:35:00 +0000 2010
>Closed-Date:    Tue Apr 10 08:21:43 +0000 2018
>Last-Modified:  Tue Apr 10 08:21:43 +0000 2018
>Originator:     Witold Jan Wnuk
>Release:        NetBSD-current
>Organization:
>Environment:
NetBSD foster 5.99.33 NetBSD 5.99.33 (FOSTER) #23: Sat Jul  3 09:07:28 CEST 2010  w@foster:/home/w/NetBSD/src/sys/arch/i386/compile/FOSTER i386

>Description:
On a multiprocessor system, sched_balance() (sys/kern/kern_runq.c) fails to select a CPU that has only one thread in its run queue. As a result, a runnable thread can keep waiting while another CPU sits idle.
>How-To-Repeat:
Compile the following program and run one copy per CPU, then inspect the CPU idle time.

int
main(void)
{
        /* Spin forever to keep one CPU busy. */
        while (1)
                ;
}
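
To start one copy per CPU from a single binary, a small launcher along these lines could be used. It is only a sketch: spawn_spinners() is a hypothetical helper that is not part of the original report, and each child stops itself after a fixed time so that the test does not have to be killed by hand.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original report): fork 'n' CPU-bound
 * children that each spin for about 'seconds' seconds, then wait for them.
 * 'seconds' must be nonzero, since alarm(0) would cancel the timer.
 * Returns the number of children that were started and reaped.
 */
static int
spawn_spinners(int n, unsigned int seconds)
{
        int started = 0, reaped = 0;

        for (int i = 0; i < n; i++) {
                pid_t pid = fork();

                if (pid == 0) {
                        /* Child: the default SIGALRM action ends the loop. */
                        signal(SIGALRM, SIG_DFL);
                        alarm(seconds);
                        while (1)
                                ;
                }
                if (pid > 0)
                        started++;
        }

        /* Parent: collect every child that was successfully started. */
        while (reaped < started && wait(NULL) > 0)
                reaped++;
        return reaped;
}
```

Calling spawn_spinners(ncpu, 60), with ncpu set to the number of CPUs (for example from sysconf(_SC_NPROCESSORS_ONLN), assuming the platform provides it), keeps the spinners running for about a minute while idle time is observed with a tool such as top(1).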


The problem described by Sad Clouds in http://mail-index.netbsd.org/tech-userlevel/2010/04/16/msg003515.html is likely another manifestation of this bug.

>Fix:
Workaround: add one bit of precision to the r_avgcount calculation:


Index: sys/kern/kern_runq.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_runq.c,v
retrieving revision 1.30
diff -u -r1.30 kern_runq.c
--- sys/kern/kern_runq.c        3 Mar 2010 00:47:30 -0000       1.30
+++ sys/kern/kern_runq.c        3 Jul 2010 07:43:20 -0000
@@ -78,7 +78,7 @@
        uint32_t        r_bitmap[PRI_COUNT >> BITMAP_SHIFT];
        /* Counters */
        u_int           r_count;        /* Count of the threads */
-       u_int           r_avgcount;     /* Average count of threads */
+       u_int           r_avgcount1;    /* Average count of threads x 2 */
        u_int           r_mcount;       /* Count of migratable threads */
        /* Runqueues */
        queue_t         r_rt_queue[PRI_RT_COUNT];
@@ -523,12 +523,12 @@
                ci_rq = ci->ci_schedstate.spc_sched_info;

                /* Average count of the threads */
-               ci_rq->r_avgcount = (ci_rq->r_avgcount + ci_rq->r_mcount) >> 1;
+               ci_rq->r_avgcount1 = (ci_rq->r_avgcount1 + (ci_rq->r_mcount << 1)) >> 1;

                /* Look for CPU with the highest average */
-               if (ci_rq->r_avgcount > highest) {
+               if (ci_rq->r_avgcount1 > highest) {
                        hci = ci;
-                       highest = ci_rq->r_avgcount;
+                       highest = ci_rq->r_avgcount1;
                }
        }

@@ -625,7 +625,7 @@
        }

        /* Reset the counter, and call the balancer */
-       ci_rq->r_avgcount = 0;
+       ci_rq->r_avgcount1 = 0;
        sched_balance(ci);
        tci = worker_ci;
        tspc = &tci->ci_schedstate;
@@ -734,7 +734,7 @@
                        return NULL;

                /* Reset the counter, and call the balancer */
-               ci_rq->r_avgcount = 0;
+               ci_rq->r_avgcount1 = 0;
                sched_balance(ci);
                cci = worker_ci;
                cspc = &cci->ci_schedstate;
@@ -871,14 +871,14 @@
                ci_rq = spc->spc_sched_info;

                (*pr)("Run-queue (CPU = %u):\n", ci->ci_index);
-               (*pr)(" pid.lid = %d.%d, r_count = %u, r_avgcount = %u, "
+               (*pr)(" pid.lid = %d.%d, r_count = %u, r_avgcount1 = %u, "
                    "maxpri = %d, mlwp = %p\n",
 #ifdef MULTIPROCESSOR
                    ci->ci_curlwp->l_proc->p_pid, ci->ci_curlwp->l_lid,
 #else
                    curlwp->l_proc->p_pid, curlwp->l_lid,
 #endif
-                   ci_rq->r_count, ci_rq->r_avgcount, spc->spc_maxpriority,
+                   ci_rq->r_count, ci_rq->r_avgcount1, spc->spc_maxpriority,
                    spc->spc_migrating);
                i = (PRI_COUNT >> BITMAP_SHIFT) - 1;
                do {
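
The truncation behind this can be reproduced outside the kernel. The sketch below copies only the two update expressions from the diff; avg_old(), avg_new() and settle() are hypothetical names used for illustration, and it assumes that sched_balance starts its search with highest equal to 0, which the diff context does not show.

```c
/* Original update: integer average of the old value and r_mcount. */
static unsigned int
avg_old(unsigned int avg, unsigned int mcount)
{
        return (avg + mcount) >> 1;
}

/* Patched update: the stored value is the average times two. */
static unsigned int
avg_new(unsigned int avg1, unsigned int mcount)
{
        return (avg1 + (mcount << 1)) >> 1;
}

/*
 * Hypothetical driver: start from 0 (the value the counter is reset to)
 * and apply 'step' for 'rounds' balancer passes with a constant mcount.
 */
static unsigned int
settle(unsigned int (*step)(unsigned int, unsigned int),
    unsigned int mcount, int rounds)
{
        unsigned int avg = 0;

        while (rounds-- > 0)
                avg = step(avg, mcount);
        return avg;
}
```

With a constant r_mcount of 1, settle(avg_old, 1, n) stays at 0 for any n, because (0 + 1) >> 1 is 0. Under the assumption above, a strict "greater than" comparison against a starting value of 0 can then never pick that CPU. settle(avg_new, 1, n) settles at 1 for any n >= 1, so the CPU becomes eligible.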

>Release-Note:

>Audit-Trail:
From: Hubert Feyrer <hubert@feyrer.de>
To: witold.wnuk@gmail.com, gnats-bugs@NetBSD.org
Cc: Hubert Feyrer <hubertf@netbsd.org>
Subject: Re: kern/43561: Thread waiting, CPU idling
Date: Thu, 22 Dec 2016 20:55:41 +0100 (CET)

 FWIW,
 code was committed to -current today that might fix this issue.
 See PR kern/51615 and the following URL and its thread for more 
 information:

 http://mail-index.netbsd.org/source-changes/2016/12/22/msg080093.html

 If this resolves your issue, please let me know and I can close the PR.


   - Hubert

State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Tue, 06 Jun 2017 03:19:56 +0000
State-Changed-Why:
If you're still there please check the commit cited...


State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Tue, 10 Apr 2018 08:21:43 +0000
State-Changed-Why:
Feedback timeout.


>Unformatted:

Copyright © 1994-2017 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.