NetBSD Problem Report #52893

From martin@duskware.de  Wed Jan  3 16:47:32 2018
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 9F5647A19A
	for <gnats-bugs@gnats.NetBSD.org>; Wed,  3 Jan 2018 16:47:32 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: t_timeleft case is racy
X-Send-Pr-Version: 3.95

>Number:         52893
>Category:       bin
>Synopsis:       t_timeleft case is racy
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jan 03 16:50:00 +0000 2018
>Originator:     Martin Husemann
>Release:        NetBSD 8.99.10
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD unpluged.duskware.de 8.99.10 NetBSD 8.99.10 (UNPLUGED) #135: Wed Jan 3 12:18:55 CET 2018 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/evbarm/compile/UNPLUGED evbarm
Architecture: earm
Machine: evbarm
>Description:

When run on a uniprocessor machine, the tests/kernel/t_timeleft test
fails randomly. The reason seems to be that the second lwp has parked itself
but may not be scheduled again to do its SIGINT handling before the first
thread compares the timing results.

This usually works fine on multiprocessor systems.

>How-To-Repeat:
Run the test under ktrace on a uniprocessor machine a few times (with
output redirected or from atf-run):

 16314      1 t_timeleft RET   getcontext 0
 16314      1 t_timeleft CALL  _lwp_create(0xbfffe4a8,0x40,0xbbb0f11c)
 16314      1 t_timeleft RET   _lwp_create 0
 16314      1 t_timeleft CALL  __nanosleep50(0xbfffe6c8,0)
 16314      2 t_timeleft CALL  _lwp_ctl(1,0xbbb0f184)
 16314      2 t_timeleft RET   _lwp_ctl 0
 16314      2 t_timeleft CALL  ___lwp_park60(3,0,0xbfffe6e0,0,0xbfffe6e0,0)
 16314      1 t_timeleft RET   __nanosleep50 0
 16314      1 t_timeleft CALL  _lwp_kill(2,2)
 16314      1 t_timeleft RET   _lwp_kill 0
 16314      1 t_timeleft CALL  __fstat50(1,0xbfffddc8)
 16314      1 t_timeleft RET   __fstat50 0
 16314      1 t_timeleft CALL  open(0xbbb0a058,0x601,0x1a4)
 16314      1 t_timeleft NAMI  "/tmp/atf-run.bG3G9N/tcr"
 16314      1 t_timeleft RET   open 3
 16314      1 t_timeleft CALL  writev(3,0xbfffe598,4)
 16314      1 t_timeleft GIO   fd 3 wrote 83 bytes
       "failed: /work/src/tests/kernel/t_timeleft.c:98: timespeccmp(&i.ts, &ts\
        , <) not met\n"


Note the missing SIGINT signal handler invocation in lwp 2.

>Fix:
Do another sleep in the main thread before comparing the results? Or join
the second thread first?
