NetBSD Problem Report #58929
From www@netbsd.org Sat Dec 21 21:04:02 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
client-signature RSA-PSS (2048 bits) client-digest SHA256)
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 3BB071A923A
for <gnats-bugs@gnats.NetBSD.org>; Sat, 21 Dec 2024 21:04:02 +0000 (UTC)
Message-Id: <20241221210400.6F0471A923B@mollari.NetBSD.org>
Date: Sat, 21 Dec 2024 21:04:00 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: POSIX.1-2024 compliance: posix_close, POSIX_CLOSE_RESTART
X-Send-Pr-Version: www-1.0
>Number: 58929
>Category: kern
>Synopsis: POSIX.1-2024 compliance: posix_close, POSIX_CLOSE_RESTART
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Dec 21 21:05:00 +0000 2024
>Last-Modified: Sun Dec 22 04:55:01 +0000 2024
>Originator: Taylor R Campbell
>Release:
>Organization:
The CloseBSD Fdation
>Environment:
>Description:
Through POSIX.1-2008, the state of the descriptor after the close() function _fails_ has been specified to be indeterminate:
> If close() is interrupted by a signal that is to be caught, it shall return -1 with errno set to [EINTR] and the state of fildes is unspecified. If an I/O error occurred while reading from or writing to the file system during close(), it may return -1 with errno set to [EIO]; if this error is returned, the state of fildes is unspecified.
>
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html
On every OS I checked (NetBSD, FreeBSD, OpenBSD, Linux, illumos), the semantics is that after close(), the descriptor is closed, unconditionally, whether close() succeeded or failed -- even if this involves doing some I/O that may be interrupted by a signal which causes EINTR to come flying out of close().
Rather than simply endorse the reasonable semantics that probably every reasonable OS on the planet already implements, POSIX.1-2024 changed that language to:
> If close() is interrupted by a signal that is to be caught, then it is unspecified whether it returns -1 with errno set to [EINTR] and fildes remaining open, or returns -1 with errno set to [EINPROGRESS] and fildes being closed, or returns 0 to indicate successful completion; except that if POSIX_CLOSE_RESTART is defined as 0, then the option of returning -1 with errno set to [EINTR] and fildes remaining open shall not occur.
>
> [...]
>
> The posix_close() function shall be equivalent to the close() function, except with the modifications based on the flag argument as described below. If flag is 0, then posix_close() shall not return -1 with errno set to [EINTR], which implies that fildes will always be closed (except for [EBADF], where fildes was invalid). If flag includes POSIX_CLOSE_RESTART and POSIX_CLOSE_RESTART is defined as a non-zero value, and posix_close() is interrupted by a signal that is to be caught, then posix_close() may return -1 with errno set to [EINTR], in which case fildes shall be left open; however, it is unspecified whether fildes can subsequently be passed to any function except close() or posix_close() without error. If flag is invalid, posix_close() may fail with errno set to [EINVAL], but shall otherwise behave as if flag had been 0 and close fildes.
>
> [...]
>
> If POSIX_CLOSE_RESTART is zero, the close() function shall not return an [EINTR] error.
>
> https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/functions/posix_close.html
This probably caters only to such obscure and likely indefensible niche proprietary operating systems that I doubt whether many applications in the real world will ever call posix_close(), especially given the mental gymnastics needed to understand the language describing it. But it's there in POSIX so it would be reasonable for us to implement it, or if we don't, then or at least to document why we don't.
>How-To-Repeat:
read POSIX.1-2024 and take some aspirin for the headache it will cause
>Fix:
Since close() (a) may return EINTR, and (b) is unconditionally final, i.e., cannot be restarted even if it returns EINTR, we cannot simply #define POSIX_CLOSE_RESTART 0.
So, perhaps we can do this:
#define POSIX_CLOSE_RESTART 1
int
posix_close(int fildes, int flag)
{
if (close(fildes) == -1 && errno != EINTR)
return -1;
case (flag) {
case 0:
case POSIX_CLOSE_RESTART:
break;
default:
errno = EINVAL;
return -1;
}
return 0;
}
This meets the following criteria:
> If flag is 0, then posix_close() shall not return -1 with errno set to [EINTR], which implies that fildes will always be closed (except for [EBADF], where fildes was invalid).
This posix_close() definition does not fail with EINTR. fildes is always closed.
> If flag includes POSIX_CLOSE_RESTART and POSIX_CLOSE_RESTART is defined as a non-zero value, and posix_close() is interrupted by a signal that is to be caught, then posix_close() may return -1 with errno set to [EINTR], in which case fildes shall be left open; [...]
POSIX_CLOSE_RESTART is defined as a non-zero value, so this clause applies. And while this clause says posix_close() MAY fail with EINTR, it does not say posix_close() MUST fail with EINTR if interrupted by a signal. This posix_close() definition succeeds (returns 0) if interrupted by a signal, so it meets the requirement by never failing with EINTR and never leaving fildes open.
> If flag is invalid, posix_close() may fail with errno set to [EINVAL], but shall otherwise behave as if flag had been 0 and close fildes.
This checks flag for validity _after_ close(), so fildes is closed even if flag is invalid. (EBADF and other possible close() failures like EIO take precedence over EINVAL.)
This posix_close() definition closes fildes even if the flag is invalid.
>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Cc:
Subject: Re: kern/58929: POSIX.1-2024 compliance: posix_close, POSIX_CLOSE_RESTART
Date: Sat, 21 Dec 2024 22:07:19 +0000
Correction: Our current close() -- and probably all existing close()
implementations outside buggy proprietary Unix variants -- is not
compliant with POSIX.1-2024, because if some underlying I/O is
interrupted by a signal it can fail with EINTR even though the
descriptor is unconditionally closed. That is now forbidden (emphasis
added):
> If close() is interrupted by a signal that is to be caught, then it
> is unspecified whether it returns -1 with errno set to [EINTR] and
> fildes _remaining open_, or returns -1 with errno set to
> [EINPROGRESS] and fildes being closed, or returns 0 to indicate
> successful completion[...]
>
> RATIONALE
>
> [...] Note that the standard requires that close() and posix_close()
> _must leave fildes open after [EINTR]_ (in the cases where [EINTR]
> is permitted)
So, in order to deal with POSIX.1-2024 having addressed the problem of
its own failure to adequately specify semantics by breaking
compatibility with every major OS on the planet, it looks like we'll
have to do some symbol magic under _POSIX_C_SOURCE -- something like
this:
/* include/unistd.h */
#if (_POSIX_C_SOURCE >= 202408L) && !defined(_NETBSD_SOURCE)
int close(int) __RENAME(__posixly_correct_close);
int posix_close(int, int);
#define POSIX_CLOSE_RESTART 0
#else
int close(int);
#endif
/* lib/libc/sys/posixly_correct_close.c */
int
__posixly_correct_close(int d)
{
const int ret = close(d);
if (ret == -1 && errno == EINTR)
errno = EINPROGRESS;
return ret;
}
int
posix_close(int d, int flag)
{
const int ret = __posixly_correct_close(d);
_DIAGASSERT(ret == 0 || errno != EINTR);
if (ret == 0 && flag != 0) {
errno = EINVAL;
ret = -1;
}
return ret;
}
What a colossal waste of effort -- and attack surface for new bugs to
crawl onto -- to force everyone else to bend over backwards to
accommodate bugs in proprietary implementations of close().
From: Robert Elz <kre@munnari.OZ.AU>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58929: POSIX.1-2024 compliance: posix_close, POSIX_CLOSE_RESTART
Date: Sun, 22 Dec 2024 07:49:25 +0700
Date: Sat, 21 Dec 2024 22:07:19 +0000
From: Taylor R Campbell <riastradh@NetBSD.org>
Message-ID: <20241221220723.04ECB84E92@mail.netbsd.org>
| Correction: Our current close() -- and probably all existing close()
| implementations outside buggy proprietary Unix variants -- is not
| compliant with POSIX.1-2024, because if some underlying I/O is
| interrupted by a signal it can fail with EINTR even though the
| descriptor is unconditionally closed. That is now forbidden (emphasis
| added):
Can it really? I haven't looked at our implementation (kernel)
but it used to be the case that even if close() slept for some time,
(like waiting for tty output to drain) and a signal occurred, close()
would still return 0.
But if not, surely the simple change here is to make that happen, not
to work around it in libc - just have sys_close() never return EINTR.
We have to function correctly in this case anyway, as we do close the
fd, so another close(fd) will just return EBADF - so anything draining
still in the kernel output queues needs to be properly handled without
any more userland assistance.
kre
|
| > If close() is interrupted by a signal that is to be caught, then it
| > is unspecified whether it returns -1 with errno set to [EINTR] and
| > fildes _remaining open_, or returns -1 with errno set to
| > [EINPROGRESS] and fildes being closed, or returns 0 to indicate
| > successful completion[...]
| >
| > RATIONALE
| >
| > [...] Note that the standard requires that close() and posix_close()
| > _must leave fildes open after [EINTR]_ (in the cases where [EINTR]
| > is permitted)
|
| So, in order to deal with POSIX.1-2024 having addressed the problem of
| its own failure to adequately specify semantics by breaking
| compatibility with every major OS on the planet, it looks like we'll
| have to do some symbol magic under _POSIX_C_SOURCE -- something like
| this:
|
| /* include/unistd.h */
|
| #if (_POSIX_C_SOURCE >= 202408L) && !defined(_NETBSD_SOURCE)
| int close(int) __RENAME(__posixly_correct_close);
| int posix_close(int, int);
| #define POSIX_CLOSE_RESTART 0
| #else
| int close(int);
| #endif
|
| /* lib/libc/sys/posixly_correct_close.c */
|
| int
| __posixly_correct_close(int d)
| {
| const int ret = close(d);
|
| if (ret == -1 && errno == EINTR)
| errno = EINPROGRESS;
|
| return ret;
| }
|
| int
| posix_close(int d, int flag)
| {
| const int ret = __posixly_correct_close(d);
|
| _DIAGASSERT(ret == 0 || errno != EINTR);
|
| if (ret == 0 && flag != 0) {
| errno = EINVAL;
| ret = -1;
| }
| return ret;
| }
|
| What a colossal waste of effort -- and attack surface for new bugs to
| crawl onto -- to force everyone else to bend over backwards to
| accommodate bugs in proprietary implementations of close().
|
From: Robert Elz <kre@munnari.OZ.AU>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58929: POSIX.1-2024 compliance: posix_close, POSIX_CLOSE_RESTART
Date: Sun, 22 Dec 2024 10:45:54 +0700
Date: Sat, 21 Dec 2024 22:07:19 +0000
From: Taylor R Campbell <riastradh@NetBSD.org>
Message-ID: <20241221220723.04ECB84E92@mail.netbsd.org>
| So, in order to deal with POSIX.1-2024 having addressed the problem of
| its own failure to adequately specify semantics by breaking
| compatibility with every major OS on the planet, it looks like we'll
| have to do some symbol magic under _POSIX_C_SOURCE -- something like
| this:
Surely all we need is...
| /* include/unistd.h */
|
| #if (_POSIX_C_SOURCE >= 202408L) && !defined(_NETBSD_SOURCE)
| int posix_close(int, int);
| #define POSIX_CLOSE_RESTART 0
| #endif
| int close(int);
| int
| posix_close(int d, int flag)
| {
| const int ret = close(d);
|
| _DIAGASSERT(ret == 0 || errno != EINTR);
|
| if (ret == 0 && flag != 0) {
| errno = EINVAL;
| ret = -1;
| }
| return ret;
| }
Along with:
Index: sys_descrip.c
===================================================================
RCS file: /cvsroot/src/sys/kern/sys_descrip.c,v
retrieving revision 1.40
diff -u -r1.40 sys_descrip.c
--- sys_descrip.c 16 Apr 2022 07:59:02 -0000 1.40
+++ sys_descrip.c 22 Dec 2024 03:36:00 -0000
@@ -519,7 +519,7 @@
printf("%s[%d]: close(%d) returned ERESTART\n",
l->l_proc->p_comm, (int)l->l_proc->p_pid, fd);
#endif
- error = EINTR;
+ error = 0;
}
return error;
where that 0 could be EINPROGRESS if anyone really believes that
conveys any information anyone would ever use (what? how?)
kre
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Robert Elz <kre@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58929: POSIX.1-2024 compliance: posix_close, POSIX_CLOSE_RESTART
Date: Sun, 22 Dec 2024 04:21:24 +0000
> Date: Sun, 22 Dec 2024 07:49:25 +0700
> From: Robert Elz <kre@munnari.OZ.AU>
>
> Date: Sat, 21 Dec 2024 22:07:19 +0000
> From: Taylor R Campbell <riastradh@NetBSD.org>
> Message-ID: <20241221220723.04ECB84E92@mail.netbsd.org>
>
> | Correction: Our current close() -- and probably all existing close()
> | implementations outside buggy proprietary Unix variants -- is not
> | compliant with POSIX.1-2024, because if some underlying I/O is
> | interrupted by a signal it can fail with EINTR even though the
> | descriptor is unconditionally closed. That is now forbidden (emphasis
> | added):
>
> Can it really? I haven't looked at our implementation (kernel)
> but it used to be the case that even if close() slept for some time,
> (like waiting for tty output to drain) and a signal occurred, close()
> would still return 0.
Mount an nfs volume with `-i', open a file, unplug the network, write
to the file, close it, and hit ^C. I bet close(2) will fail with
EINTR. And I bet it will do this on most other operating systems too.
> But if not, surely the simple change here is to make that happen, not
> to work around it in libc - just have sys_close() never return EINTR.
Would you like to volunteer to audit every device driver and file
system, and every subroutine call they all make, to verify this?
Gigantic waste of effort for a bad decision in POSIX which I'm
inclined to say we should just flout because it's so bad.
Current semantics -- which Linux, Solaris, and other BSDs already
agree on -- is extremely simple and reliable: after close(d), d is
closed. No ifs or buts about it. POSIX.1-2024 semantics is
hopelessly confusing for users and implementors alike, and is
incompatible with the vast majority of existing practice -- a mistake.
If it was really necessary for some political reason not to simply
endorse the simple rule `close() is final' (much like fclose() is
already specified to be final), then POSIX could have added a simple
feature macro like
#define _POSIX_CLOSE_IS_FINAL,
or
#define _POSIX_CLOSE_IS_UNRELIABLE
to accommodate the buggy proprietary Unix systems that motivated this
whole mess. Why should everyone else have to change their semantics
of closing a file descriptor to accommodate their design bugs?
From: Robert Elz <kre@munnari.OZ.AU>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58929: POSIX.1-2024 compliance: posix_close, POSIX_CLOSE_RESTART
Date: Sun, 22 Dec 2024 11:50:26 +0700
Date: Sun, 22 Dec 2024 04:21:24 +0000
From: Taylor R Campbell <riastradh@NetBSD.org>
Message-ID: <20241222042128.8CA8185574@mail.netbsd.org>
| Mount an nfs volume with `-i', open a file, unplug the network, write
| to the file, close it, and hit ^C. I bet close(2) will fail with
| EINTR.
It quite likely might, but it shouldn't. EINTR is supposed to mean
"do it again (if you want to)" - or more verbosely "While waiting to
perform the operation you requested, a signal was sent to you. Rather
than keep that signal pending for perhaps a long time, we are abandoning
your request, so you can process the signal. After you have done that,
should you still desire to perform this operation, repeat the call".
Note that in cases like read(), write() etc, once a single byte has
been transferred to/from a device (or network connection, etc) EINTR
is no longer allowed, rather the call returns successfully with a
short count - as simply repeating the call would result in lost/duplicated
data. EINTR happens only when the system call hasn't changed the
state of the process, or should.
| Would you like to volunteer to audit every device driver and file
| system, and every subroutine call they all make, to verify this?
No, nor do I need to. I just looked at the close system call, and
it does, as you say, and has more or less been true forever:
| Current semantics -- which Linux, Solaris, and other BSDs already
| agree on -- is extremely simple and reliable: after close(d), d is
| closed.
Exactly. The close has happened. So why are we returning EINTR ?
That makes no sense at all.
| No ifs or buts about it. POSIX.1-2024 semantics is
| hopelessly confusing for users and implementors alike, and is
| incompatible with the vast majority of existing practice -- a mistake.
It is attempting (like almost always) to inform users of what they
may encounter, while still making the system robust.
For any system with the "close is final" semantic, it is really very
simple, just don't ever return EINTR (which would indicate the close
didn't occur) and all is good.
| Why should everyone else have to change their semantics
| of closing a file descriptor to accommodate their design bugs?
No-one has to change (almost) anything, except to avoid returning an
obviously bogus error code.
What do you expect application code that receives EINTR from close()
to do with that information as things are currently?
kre
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.