NetBSD Problem Report #53931
From gson@gson.org Fri Feb 1 15:28:41 2019
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 1DADA7A1AC
for <gnats-bugs@gnats.NetBSD.org>; Fri, 1 Feb 2019 15:28:41 +0000 (UTC)
Message-Id: <20190201152835.7C8A89892C2@guava.gson.org>
Date: Fri, 1 Feb 2019 17:28:35 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: posix_fadvise_reg test case fails randomly on real hardware
X-Send-Pr-Version: 3.95
>Number: 53931
>Category: kern
>Synopsis: posix_fadvise_reg test case fails randomly on real hardware
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Feb 01 15:30:00 +0000 2019
>Last-Modified: Sun Feb 03 20:35:01 +0000 2019
>Originator: Andreas Gustafsson
>Release: NetBSD-current
>Organization:
>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:
The posix_fadvise_reg test case of the lib/libc/sys/t_posix_fadvise
test program is failing randomly on real amd64 hardware, with six
failures in the last 30 runs on my bare metal testbed. It fails
with the message
t_posix_fadvise.c:135: errno != 999: got: Operation already in progress
Log output from the latest failure:
http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2019/2019.01.31.00.27.52/test.html#lib_libc_sys_t_posix_fadvise_posix_fadvise_reg
The first recorded failure was with source date 2015.10.30.03.08.56:
http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2015/2015.10.30.03.08.56/test.html#lib_libc_sys_t_posix_fadvise_posix_fadvise_reg
It's passing reliably on the qemu-based TNF testbed, with no failures
in 2018 and none so far in 2019.
>How-To-Repeat:
>Fix:
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53931: posix_fadvise_reg test case fails randomly on real hardware
Date: Sat, 2 Feb 2019 18:45:56 +0000
On Fri, Feb 01, 2019 at 03:30:00PM +0000, Andreas Gustafsson wrote:
> The posix_fadvise_reg test case of the lib/libc/sys/t_posix_fadvise
> test program is failing randomly on real amd64 hardware, with six
> failures in the last 30 runs on my bare metal testbed. It fails
> with the message
>
> t_posix_fadvise.c:135: errno != 999: got: Operation already in progress
The system call cannot generate EINPROGRESS, and furthermore, the
system call does not touch errno (it is one of the broken POSIX
innovations that returns an errno value instead) so something in the
rump plumbing must be doing it.
Does rump actually have a means for handling these broken syscalls
correctly, and if so, is posix_fadvise tagged appropriately?
It is bizarre that the behavior would depend on the nature of the
underlying hardware though.
--
David A. Holland
dholland@netbsd.org
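
[Editorial illustration, not part of the original message.] The point
above is that posix_fadvise() reports failure through its return value
and is not expected to modify errno. A minimal sketch of a probe for
that property follows. The helper name fadvise_probe and the macro
ERRNO_SENTINEL are hypothetical; the value 999 is taken from the
failure message, and this is not claimed to be the code in
t_posix_fadvise.c.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/* Sentinel value, assumed from the "errno != 999" failure message. */
#define ERRNO_SENTINEL 999

/*
 * Hypothetical helper: call posix_fadvise() with errno preset to a
 * sentinel.  Returns the error number reported by posix_fadvise()
 * (0 on success) and stores in *errno_after the value errno had
 * immediately after the call.  The caller's errno is restored.
 */
static int
fadvise_probe(int fd, off_t offset, off_t len, int advice, int *errno_after)
{
	int saved, error;

	assert(errno_after != NULL);

	saved = errno;
	errno = ERRNO_SENTINEL;
	error = posix_fadvise(fd, offset, len, advice);
	*errno_after = errno;	/* should still be the sentinel */
	errno = saved;

	return error;
}
```

If *errno_after differs from ERRNO_SENTINEL after a call, something
between the caller and the kernel modified errno, which is the symptom
reported in this PR.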
From: Andreas Gustafsson <gson@gson.org>
To: David Holland <dholland-bugs@netbsd.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53931: posix_fadvise_reg test case fails randomly on real hardware
Date: Sun, 3 Feb 2019 00:09:47 +0200
David Holland wrote:
> > t_posix_fadvise.c:135: errno != 999: got: Operation already in progress
>
> The system call cannot generate EINPROGRESS, and furthermore, the
> system call does not touch errno (it is one of the broken POSIX
> innovations that returns an errno value instead) so something in the
> rump plumbing must be doing it.
Quite possible. Thanks for the analysis.
> Does rump actually have a means for handling these broken syscalls
> correctly, and if so, is posix_fadvise tagged appropriately?
I don't know.
> It is bizarre that the behavior would depend on the nature of the
> underlying hardware though.
If it's some kind of race condition, it's hardly surprising if it
happens on multiprocessor but not on a (software emulation of a)
uniprocessor.
--
Andreas Gustafsson, gson@gson.org
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53931: posix_fadvise_reg test case fails randomly on real hardware
Date: Sun, 3 Feb 2019 22:30:10 +0200
Earlier, I wrote:
> The first recorded failure was with source date 2015.10.30.03.08.56
I have now reproduced the failure using sources from 2015.10.02.03.08.26,
using an 8-core machine, but only on the 94th run of the test.
The failure is probably even older than that, but may not be showing
up in the existing reports for older versions because the tests
were run on a uniprocessor until around 2015-10-09.
There's probably no point in trying to bisect this, because it's
likely to be old enough that the version where it first appeared no
longer builds on a NetBSD-8 host.
--
Andreas Gustafsson, gson@gson.org
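
[Editorial illustration, not part of the original message.] Because the
failure is intermittent (it took 94 runs to show up in the reproduction
above), a small driver that reruns a command until it first fails can
help. The following is a sketch only; the function name
run_until_failure is hypothetical, and it is not the tooling used to
produce the results in this PR.

```c
#include <sys/types.h>
#include <sys/wait.h>

#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: run the command in argv up to max_runs times.
 * Returns the 1-based number of the first run that did not exit with
 * status 0, 0 if every run succeeded, or -1 if fork() or waitpid()
 * failed.
 */
static int
run_until_failure(char *const argv[], int max_runs)
{
	assert(argv != NULL && argv[0] != NULL);

	for (int run = 1; run <= max_runs; run++) {
		int status;
		pid_t pid = fork();

		if (pid == -1)
			return -1;
		if (pid == 0) {
			/* Child: replace ourselves with the command. */
			execvp(argv[0], argv);
			_exit(127);	/* exec failed */
		}
		if (waitpid(pid, &status, 0) == -1)
			return -1;
		if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
			return run;
	}
	return 0;
}
```

For this PR, argv might name the test program and test case, for
example { "/usr/tests/lib/libc/sys/t_posix_fadvise",
"posix_fadvise_reg", NULL }; that path is an assumption about where the
tests are installed and should be adjusted to the local setup.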
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.