NetBSD Problem Report #53931
From gson@gson.org Fri Feb 1 15:28:41 2019
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 1DADA7A1AC
for <gnats-bugs@gnats.NetBSD.org>; Fri, 1 Feb 2019 15:28:41 +0000 (UTC)
Message-Id: <20190201152835.7C8A89892C2@guava.gson.org>
Date: Fri, 1 Feb 2019 17:28:35 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: posix_fadvise_reg test case fails randomly on real hardware
X-Send-Pr-Version: 3.95
>Number: 53931
>Notify-List: riastradh@NetBSD.org
>Category: kern
>Synopsis: posix_fadvise_reg test case fails randomly on real hardware
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: riastradh
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Feb 01 15:30:00 +0000 2019
>Last-Modified: Sun Apr 13 23:51:28 +0000 2025
>Originator: Andreas Gustafsson
>Release: NetBSD-current
>Organization:
>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:
The posix_fadvise_reg test case of the lib/libc/sys/t_posix_fadvise
test program is failing randomly on real amd64 hardware, with six
failures in the last 30 runs on my bare metal testbed. It fails
with the message
t_posix_fadvise.c:135: errno != 999: got: Operation already in progress
Log output from the latest failure:
http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2019/2019.01.31.00.27.52/test.html#lib_libc_sys_t_posix_fadvise_posix_fadvise_reg
The first recorded failure was with source date 2015.10.30.03.08.56:
http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2015/2015.10.30.03.08.56/test.html#lib_libc_sys_t_posix_fadvise_posix_fadvise_reg
It's passing reliably on the qemu-based TNF testbed, with no failures
in 2018 and none so far in 2019.
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53931: posix_fadvise_reg test case fails randomly on real hardware
Date: Sat, 2 Feb 2019 18:45:56 +0000
On Fri, Feb 01, 2019 at 03:30:00PM +0000, Andreas Gustafsson wrote:
> The posix_fadvise_reg test case of the lib/libc/sys/t_posix_fadvise
> test program is failing randomly on real amd64 hardware, with six
> failures in the last 30 runs on my bare metal testbed. It fails
> with the message
>
> t_posix_fadvise.c:135: errno != 999: got: Operation already in progress
The system call cannot generate EINPROGRESS, and furthermore, the
system call does not touch errno (it is one of the broken POSIX
innovations that returns an errno value instead) so something in the
rump plumbing must be doing it.
Does rump actually have a means for handling these broken syscalls
correctly, and if so, is posix_fadvise tagged appropriately?
It is bizarre that the behavior would depend on the nature of the
underlying hardware though.
--
David A. Holland
dholland@netbsd.org
From: Andreas Gustafsson <gson@gson.org>
To: David Holland <dholland-bugs@netbsd.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53931: posix_fadvise_reg test case fails randomly on real hardware
Date: Sun, 3 Feb 2019 00:09:47 +0200
David Holland wrote:
> > t_posix_fadvise.c:135: errno != 999: got: Operation already in progress
>
> The system call cannot generate EINPROGRESS, and furthermore, the
> system call does not touch errno (it is one of the broken POSIX
> innovations that returns an errno value instead) so something in the
> rump plumbing must be doing it.
Quite possible. Thanks for the analysis.
> Does rump actually have a means for handling these broken syscalls
> correctly, and if so, is posix_fadvise tagged appropriately?
I don't know.
> It is bizarre that the behavior would depend on the nature of the
> underlying hardware though.
If it's some kind of race condition, it's hardly surprising if it
happens on multiprocessor but not on a (software emulation of a)
uniprocessor.
--
Andreas Gustafsson, gson@gson.org
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53931: posix_fadvise_reg test case fails randomly on real hardware
Date: Sun, 3 Feb 2019 22:30:10 +0200
Earlier, I wrote:
> The first recorded failure was with source date 2015.10.30.03.08.56
I have now reproduced the failure using sources from 2015.10.02.03.08.26,
using an 8-core machine, but only on the 94th run of the test.
The failure is probably even older than that, but may not be showing
up in the existing reports for older versions because the tests
were run on a uniprocessor until around 2015-10-09.
There's probably no point in trying to bisect this, because it's
likely to be old enough that the version where it first appeared no
longer builds on a NetBSD-8 host.
--
Andreas Gustafsson, gson@gson.org
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53931 CVS commit: src/sys/sys
Date: Sun, 6 Apr 2025 19:13:06 +0000
Module Name: src
Committed By: riastradh
Date: Sun Apr 6 19:13:06 UTC 2025
Modified Files:
src/sys/sys: ktrace.h
Log Message:
sys/ktrace.h: Need sys/param.h for MAXCOMLEN.
Found while preparing to diagnose:
PR kern/53931: posix_fadvise_reg test case fails randomly on real
hardware
To generate a diff of this commit:
cvs rdiff -u -r1.70 -r1.71 src/sys/sys/ktrace.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53931 CVS commit: src/tests/lib/libc/sys
Date: Sun, 6 Apr 2025 19:18:01 +0000
Module Name: src
Committed By: riastradh
Date: Sun Apr 6 19:18:01 UTC 2025
Modified Files:
src/tests/lib/libc/sys: t_posix_fadvise.c
Log Message:
t_posix_fadvise: Don't check whether errno is preserved.
I can find no guarantee in POSIX about posix_fadvise preserving
errno; until such language is found I'm going to assume there is no
such guarantee.
What is happening is that, sometimes, rump_sys_posix_fadvise waits on
a mutex or condvar, which uses _lwp_park internally, which sometimes
wakes up early with EALREADY because a wakeup was already pending for
the thread by the time it entered _lwp_park. And that EALREADY is
delivered by _lwp_park via errno.
PR kern/53931: posix_fadvise_reg test case fails randomly on real
hardware
To generate a diff of this commit:
cvs rdiff -u -r1.3 -r1.4 src/tests/lib/libc/sys/t_posix_fadvise.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
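The mechanism described in the log message above can be illustrated with a small user-space sketch. It does not use rump or _lwp_park; fake_park and fake_fadvise are hypothetical stand-ins invented for illustration, and the only assumption carried over from the report is that an internal wait may report a benign early wakeup through errno while the outer call still succeeds.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for an internal wait primitive that reports a
 * benign early wakeup through errno (the report attributes this
 * behaviour to _lwp_park returning early with EALREADY).
 */
static int
fake_park(void)
{
	errno = EALREADY;	/* a wakeup was already pending */
	return -1;
}

/*
 * Hypothetical stand-in for a call that, like posix_fadvise, reports
 * errors through its return value but happens to wait internally.
 */
static int
fake_fadvise(void)
{
	(void)fake_park();	/* benign failure, not fatal to this call */
	return 0;		/* success is reported via the return value */
}

/*
 * Mirrors the shape of the old test's check: set errno to a sentinel,
 * make a successful call, and report what errno holds afterwards.
 */
static int
errno_after_success(void)
{
	errno = 999;
	int ret = fake_fadvise();
	assert(ret == 0);
	return errno;
}
```

In this sketch errno_after_success() yields EALREADY rather than the sentinel 999, which is the same shape as the "errno != 999" failure quoted in the original report.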
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: PR/53931 CVS commit: src/tests/lib/libc/sys
Date: Mon, 7 Apr 2025 07:38:57 +0200
On Sun, Apr 06, 2025 at 07:20:01PM +0000, Taylor R Campbell wrote:
> I can find no guarantee in POSIX about posix_fadvise preserving
> errno; until such language is found I'm going to assume there is no
> such guarantee.
The errno definition seems to say it:
[...] and shall otherwise be defined only after a call to a function for
which it is explicitly stated to be set and until it is changed by the
next function call or if the application assigns it a value
where I would read "next function call" not from a compiler's perspective,
but with "function" meaning a POSIX-defined system interface. Maybe this
should be clarified.
https://pubs.opengroup.org/onlinepubs/9799919799/functions/errno.html
Martin
Responsible-Changed-From-To: kern-bug-people->kre
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Sun, 13 Apr 2025 00:31:02 +0000
Responsible-Changed-Why:
Need a ruling from an Austin Group whisperer: is posix_fadvise (and
any other POSIX function that returns an error code instead of setting
errno and returning -1, like pthread_*) _allowed_ to set errno, or
_required_ to leave errno as it was on entry when it returns?
If it's allowed to set errno, we can close this -- test has been fixed.
If it's forbidden to set errno, we need to do a lot of work to make
libpthread and librumpuser save/restore errno on ~every public
function.
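For reference, the save/restore approach mentioned above would look roughly like the following sketch. inner_op and public_op are hypothetical names, not functions from libpthread or librumpuser; the sketch only shows the pattern of preserving the caller's errno around a call that returns its error number directly.

```c
#include <errno.h>

/*
 * Hypothetical internal operation: it may leave a stale value in errno
 * even on success, and reports its real result as an error number.
 */
static int
inner_op(int fail)
{
	errno = EALREADY;		/* side effect of some internal call */
	return fail ? EINVAL : 0;	/* actual result */
}

/*
 * Public wrapper that leaves errno exactly as it was on entry,
 * whether the operation succeeds or fails.
 */
static int
public_op(int fail)
{
	int saved_errno = errno;
	int error = inner_op(fail);

	errno = saved_errno;
	return error;
}
```

Applying this to every public entry point is the "lot of work" referred to above; the pattern itself is short, but it would have to be repeated on nearly every function.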
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53931 (posix_fadvise_reg test case fails randomly on real hardware)
Date: Sun, 13 Apr 2025 08:57:41 +0700
Date: Sun, 13 Apr 2025 00:31:02 +0000 (UTC)
From: riastradh@NetBSD.org
Message-ID: <20250413003103.307A81A923F@mollari.NetBSD.org>
| Need a ruling from an Austin Group whisperer:
Not so much me any more (for now anyway) - I tested running (munnari.oz.au)
without IPv4 connectivity for a while last year .. just IPv6. Most things
I care about didn't mind at all (incl NetBSD lists, etc, and IETF stuff).
Some gnu lists more or less bounced me, but kept sending mail up until
after my little experiment was over indicating why, and how to get reinstated
(well one, that's all I'm on...)
But the Austin group simply removed me from the mailing list, and
so far I haven't found the magic formula to get reinstated. So, I
can't ask.
But:
| is posix_fadvise (and
| any other POSIX function that returns an error code instead of setting
| errno and returning -1, like pthread_*) _allowed_ to set errno, or
| _required_ to leave errno as it was on entry when it returns?
I believe the answer to that can be inferred from XSH 2.3 Error Numbers
Some functions provide the error number in a variable accessed
through the symbol errno, defined by including the <errno.h> header.
The value of errno should only be examined when it is indicated to
be valid by a function's return value.
That is, effectively, unless a function is defined to return a value
in errno, and the function returns a value which indicates that has happened
(typically -1, or NULL) then the state of errno is undefined after any of
the defined functions are called.
There are some functions which expressly indicate that errno must not
be altered by the function, but not a lot.
posix_fadvise() as best I can tell isn't such a function. I haven't
checked all the pthread_*() functions though - there are many!
So:
| If it's allowed to set errno, we can close this -- test has been fixed.
I am fairly sure that's the case; almost anything is allowed to alter
errno - usually by calling some other function which returns an error,
but where that error is not fatal to the call in question.
kre
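Under that reading, a caller of posix_fadvise should look only at the return value and never at errno. A minimal sketch of such a caller follows; advise_sequential is a hypothetical helper name, and the choice of POSIX_FADV_SEQUENTIAL over the whole file is an arbitrary assumption for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Advise the system that fd will be read sequentially.
 * Returns 0 on success or an error number on failure.
 * errno is deliberately never consulted: posix_fadvise reports
 * errors through its return value.
 */
static int
advise_sequential(int fd)
{
	int error = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

	if (error != 0)
		fprintf(stderr, "posix_fadvise: %s\n", strerror(error));
	return error;
}
```

Passing an invalid descriptor such as -1 should make the helper return EBADF, independent of whatever errno contains before or after the call.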
Responsible-Changed-From-To: kre->riastradh
Responsible-Changed-By: kre@NetBSD.org
Responsible-Changed-When: Sun, 13 Apr 2025 23:51:28 +0000
Responsible-Changed-Why:
I have done my bit, nothing more I can do,
returning responsibility for this to riastradh
>Unformatted:
Copyright © 1994-2025
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.