NetBSD Problem Report #45145

From ignatios@cs.uni-bonn.de  Thu Jul 14 14:59:37 2011
Return-Path: <ignatios@cs.uni-bonn.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id B442C63BEBD
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 14 Jul 2011 14:59:37 +0000 (UTC)
Message-Id: <1310655386.677802.6743.nullmailer@kolmogorov.cs.uni-bonn.de>
Date: Thu, 14 Jul 2011 16:56:26 +0200
From: ignatios@cs.uni-bonn.de
Reply-To: ignatios@cs.uni-bonn.de, kretschm@cs.uni-bonn.de
To: gnats-bugs@gnats.NetBSD.org
Subject: SMP cpu time accounting bug
X-Send-Pr-Version: 3.95

>Number:         45145
>Category:       kern
>Synopsis:       SMP cpu time accounting bug
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jul 14 15:00:00 +0000 2011
>Last-Modified:  Sat Jan 31 18:05:00 +0000 2015
>Originator:     ignatios@cs.uni-bonn.de, kretschm@cs.uni-bonn.de
>Release:        NetBSD 5.1.0_PATCH
>Organization:
Computer Science Department, University of Bonn, Germany
>Environment:


System: NetBSD kolmogorov 5.1.0_PATCH NetBSD 5.1.0_PATCH (GENERIC) #3: Thu Jan 6 21:40:54 CET 2011 ignatios@random87:/var/itch/obj/5i386/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:

Originally this was detected with a student's Java program (openjdk7), but
reduced to a C test program:

When a multi-threaded program on a dual-core CPU is run in a loop
by a shell, and that shell is time(1)d, the user or system (or
both) CPU time(s) are sometimes much larger than real time. Depending
on the machine used, we've seen factors of some 1000 to some 2000000.

We've reduced this to a C test. It happens whether we run pthread_join()
against the worker threads, or just call exit() from the main thread.

It does not happen all but one of the machine's CPUs are offlined using
cpuctl.

>How-To-Repeat:

This is the test program, the shell wrapper (run-alg2) and a driver script
(perform_test). Run the driver script possibly after tuning the number
of threads and loops for your hardware speed.

# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	time-test/test.c
#	time-test/run_alg2
#	time-test/perform_test
#
echo x - time-test/test.c
sed 's/^X//' >time-test/test.c << 'END-of-time-test/test.c'
X#include <errno.h>
X#include <stdio.h>
X#include <stdlib.h>
X#include <stdint.h>
X#include <pthread.h>
X
X
X#define DEFAULT_THREADS         8
X#define DEFAULT_STUPIDTHINGS    1024
X
X
Xstatic int  stupidthings;
X
X
Xstatic
Xvoid *
Xworker(void *_foo)
X{
X    int  i;
X
X    for (i=0; i<stupidthings; i++)
X        printf(" %d", i);
X}
X
X
Xint
Xmain(int argc, char **argv)
X{
X    pthread_t  *tids;
X    int        threads, i, err;
X
X
X    if (argc >= 2) {
X        threads = atoi(argv[1]);
X        if ((threads < 1) || (SIZE_MAX / sizeof(pthread_t) < (size_t)threads))
X            threads = DEFAULT_THREADS;
X    } else
X        threads = DEFAULT_THREADS;
X
X    if (argc >= 3) {
X        stupidthings = atoi(argv[2]);
X        if (stupidthings < 1)
X            stupidthings = DEFAULT_STUPIDTHINGS;
X    } else
X        stupidthings = DEFAULT_STUPIDTHINGS;
X
X
X    tids = malloc(sizeof(pthread_t) * (size_t)threads);
X    if (tids == NULL) {
X        fprintf(stderr, "not enough memory!\n");
X        exit(EXIT_FAILURE);
X    }
X
X
X    for (i=0; i<threads; i++) {
X        errno = pthread_create(tids + i, NULL, worker, NULL);
X        if (errno != 0) {
X            perror("pthread_create()");
X            exit(EXIT_FAILURE);
X        }
X    }
X
X
X#ifdef DOJOIN
X    for (i=0; i<threads; i++) {
X        errno = pthread_join(tids[i], NULL);
X        if (errno != 0) {
X            perror("pthread_join()");
X            exit(EXIT_FAILURE);
X        }
X    }
X#elif defined(MAINWORKER)
X    worker(NULL);
X#endif
X
X
X    return EXIT_SUCCESS;
X}
END-of-time-test/test.c
echo x - time-test/run_alg2
sed 's/^X//' >time-test/run_alg2 << 'END-of-time-test/run_alg2'
X#!/bin/sh
X
Xtimes=$1
X
Xfor i in `seq 1 $times`
Xdo
X    #java Test foo bar baz $times > /dev/null
X    ./$2 $3 $4 > /dev/null
Xdone
END-of-time-test/run_alg2
echo x - time-test/perform_test
sed 's/^X//' >time-test/perform_test << 'END-of-time-test/perform_test'
X#!/bin/sh
X
Xgcc -lpthread -o test-nothing test.c || exit 1
Xgcc -DDOJOIN -lpthread -o test-join test.c || exit 1
Xgcc -DMAINWORKER -lpthread -o test-mainworker test.c || exit 1
X
X
X
Xruns=20
Xrepeats=50
Xthreads=64
Xloops=128
Xprogs="test-nothing test-join test-mainworker"
X
Xfor prog in $progs
Xdo
X    for run in `seq 1 $runs`
X    do
X        echo -n "$prog($run): "
X        time ./run_alg2 $repeats $prog $threads $loops
X    done
Xdone
END-of-time-test/perform_test
exit

A test program can be found in
http://theory.cs.uni-bonn.de/~ignatios/time-test.htm

>Fix:
	Workaround: run in single-CPU mode.

>Release-Note:

>Audit-Trail:

From: Matthias Kretschmer <kretschm@cs.uni-bonn.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/45145: SMP cpu time accounting bug
Date: Sat, 31 Jan 2015 19:02:18 +0100

 This bug still affects NetBSD 6.1.x (tested with NetBSD 6.1.4), too (or
 there is an equivalent bug in NetBSD 6.1.x).

 When running the test-programs I get something like:

 test-nothing(1):         0.36 real  24528623.19 user  44151523.82 sys

 test-join seems to hang randomly.


 NetBSD-7_BETA seems to have a fix.  There
     user+sys =< number_of_cpus*real
 which is what one would expect.

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.