NetBSD Problem Report #57260

From www@netbsd.org  Sun Mar  5 13:59:10 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 7FDF11A9239
	for <gnats-bugs@gnats.NetBSD.org>; Sun,  5 Mar 2023 13:59:10 +0000 (UTC)
Message-Id: <20230305135909.059411A923B@mollari.NetBSD.org>
Date: Sun,  5 Mar 2023 13:59:09 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: open(2) fails to restart on signal with SA_RESTART
X-Send-Pr-Version: www-1.0

>Number:         57260
>Category:       kern
>Synopsis:       open(2) fails to restart on signal with SA_RESTART
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Mar 05 14:00:00 +0000 2023
>Closed-Date:    Fri Mar 17 09:18:45 +0000 2023
>Last-Modified:  Fri Apr 21 21:55:01 +0000 2023
>Originator:     Taylor R Campbell
>Release:        current
>Organization:
The NetBS<Interrupted system call>
>Environment:
only one environment to use, can't restart on signal of pending climate catastrophe
>Description:
open(2) changes an ERESTART return value produced internally by opening a vnode into EINTR, as if a signal had been delivered without SA_RESTART, so that the system call will return immediately rather than restart:

https://nxr.netbsd.org/xref/src/sys/kern/vfs_syscalls.c?r=1.556#1756

   1752 	error = vn_open(dvp, pb, TRYEMULROOT, flags, cmode,
   1753 	    &vp, &dupfd_move, &dupfd);
   1754 	if (error != 0) {
   1755 		fd_abort(p, fp, indx);
   1756 		if (error == ERESTART)
   1757 			error = EINTR;
   1758 		return error;
   1759 	}

This breaks the POSIX rule about SA_RESTART -- from https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaction.html:

NAME
    sigaction - examine and change a signal action
...
DESCRIPTION
...
    SA_RESTART
        This flag affects the behavior of interruptible functions; that is, those specified to fail with errno set to [EINTR].  If set, and a function specified as interruptible is interrupted by this signal, the function shall restart and shall not fail with [EINTR] unless otherwise specified.  If an interruptible function which uses a timeout is restarted, the duration of the timeout following the restart is set to an unspecified value that does not exceed the original timeout value.  If the flag is not set, interruptible functions interrupted by this signal shall fail with errno set to [EINTR].

From https://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html:

NAME
    open, openat - open file
...
ERRORS
    These functions fail if:
    ...
    [EINTR]
        A signal was caught during open().

Nothing in the description of open specifies otherwise about restartable system calls.
>How-To-Repeat:
The following program should print SIGALRM and then hang indefinitely, but instead it fails with interrupted system call:

#include <sys/stat.h>

#include <err.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static void
on_alarm(int sig)
{
	const char msg[] = "SIGALRM\n";

	(void)write(STDERR_FILENO, msg, strlen(msg));
}

int
main(void)
{
	struct sigaction sa;
	int fd;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	(void)sigfillset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;
	if (sigaction(SIGALRM, &sa, NULL) == -1)
		err(1, "sigaction");

	if (mkfifo("fifo", 0600) == -1)
		err(1, "mkfifo");
	alarm(1);
	fd = open("fifo", O_RDONLY);
	if (fd == -1)
		err(1, "open");
	errx(1, "open returned %d", fd);
}

>Fix:
Delete the lines mapping EINTR to ERESTART in do_open in vfs_syscalls.c.

>Release-Note:

>Audit-Trail:
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57260 CVS commit: src/sys/kern
Date: Sun, 5 Mar 2023 14:40:32 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Mar  5 14:40:32 UTC 2023

 Modified Files:
 	src/sys/kern: vfs_syscalls.c

 Log Message:
 open(2): Don't map ERESTART to EINTR.

 If a file or device's open function returns ERESTART, respect that --
 restart the syscall; don't pretend a signal has been delivered when
 it was not.  If an SA_RESTART signal was delivered, POSIX does not
 allow it to fail with EINTR:

     SA_RESTART
         This flag affects the behavior of interruptible functions;
         that is, those specified to fail with errno set to [EINTR].
         If set, and a function specified as interruptible is
         interrupted by this signal, the function shall restart and
         shall not fail with [EINTR] unless otherwise specified.  If
         an interruptible function which uses a timeout is restarted,
         the duration of the timeout following the restart is set to
         an unspecified value that does not exceed the original
         timeout value.  If the flag is not set, interruptible
         functions interrupted by this signal shall fail with errno
         set to [EINTR].

 https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaction.html

 Nothing in the POSIX definition of open specifies otherwise.

 In 1990, Kirk McKusick added these lines with a mysterious commit
 message:

 Author: Kirk McKusick <mckusick>
 Date:   Tue Apr 10 19:36:33 1990 -0800

     eliminate longjmp from the kernel (for karels)

 diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c
 index 7bc7b39bbf..d572d3a32d 100644
 --- a/sys/kern/vfs_syscalls.c
 +++ b/sys/kern/vfs_syscalls.c
 @@ -14,7 +14,7 @@
   * IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
   * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
   *
 - *	@(#)vfs_syscalls.c	7.42 (Berkeley) 3/26/90
 + *	@(#)vfs_syscalls.c	7.43 (Berkeley) 4/10/90
   */

  #include "param.h"
 @@ -530,8 +530,10 @@ copen(scp, fmode, cmode, ndp, resultfd)
  	if (error = vn_open(ndp, fmode, (cmode & 07777) &~ S_ISVTX)) {
  		crfree(fp->f_cred);
  		fp->f_count--;
 -		if (error == -1)	/* XXX from fdopen */
 -			return (0);	/* XXX from fdopen */
 +		if (error == EJUSTRETURN)	/* XXX from fdopen */
 +			return (0);		/* XXX from fdopen */
 +		if (error == ERESTART)
 +			error = EINTR;
  		scp->sc_ofile[indx] = NULL;
  		return (error);
  	}

 (found via this git import of the CSRG history:
 https://github.com/robohack/ucb-csrg-bsd/commit/cce2869b7ae5d360921eb411005b328a29c4a3fe)

 This change appears to have served two related purposes:

 1. The fdopen function (the erstwhile open routine for /dev/fd/N)
    used to return -1 as a hack to mean it had just duplicated the fd;
    it was recently changed by Mike Karels, in kern_descrip.c 7.9, to
    return EJUSTRETURN, now defined to be -2, presumably to avoid a
    conflict with ERESTART, defined to be -1.  So this change finished
    part of the change by Mike Karels to use a different magic return
    code from fdopen.

    Of course, today we use still another disgusting hack, EDUPFD, for
    the same purpose, so none of this is relevant any more.

 2. Prior to April 1990, the kernel handled signals during tsleep(9)
    by longjmping out to the system call entry point or similar.  In
    April 1990, Mike Karels worked to convert all of that into
    explicit unwind logic by passing through EINTR or ERESTART as
    appropriate, instead of setjmp at each entry point.

 However, it's not clear to me why this setjmp/longjmp and
 fdopen/-1/EJUSTRETURN renovation justifies unconditional logic to map
 ERESTART to EINTR in open(2).  I suspect it was a mistake.

 In 2013, the corresponding logic to map ERESTART to EINTR in open(2)
 was removed from FreeBSD:

    r246472 | kib | 2013-02-07 14:53:33 +0000 (Thu, 07 Feb 2013) | 11 lines

    Stop translating the ERESTART error from the open(2) into EINTR.
    Posix requires that open(2) is restartable for SA_RESTART.

    For non-posix objects, in particular, devfs nodes, still disable
    automatic restart of the opens. The open call to a driver could have
    significant side effects for the hardware.

    Noted and reviewed by:  jilles
    Discussed with: bde
    MFC after:      2 weeks

 Index: vfs_syscalls.c
 ===================================================================
 --- vfs_syscalls.c	(revision 246471)
 +++ vfs_syscalls.c	(revision 246472)
 @@ -1106,8 +1106,6 @@
  				goto success;
  		}

 -		if (error == ERESTART)
 -			error = EINTR;
  		goto bad;
  	}
  	td->td_dupfd = 0;

 https://cgit.freebsd.org/src/commit/sys/kern/vfs_syscalls.c?id=2ca49983425886121b506cb5126b60a705afc38c

 It's not clear to me that there's any reason to treat device nodes
 specially here; in fact, if a driver's .d_open routine sleeps and is
 woken by a concurrent revoke without a signal pending or with an
 SA_RESTART signal pending, it is wrong for it to fail with EINTR.
 But it MUST restart the whole system call rather than continue
 sleeping in a loop or just exit the loop and continue to open,
 because it is mandatory in the security model of revoke for open(2)
 to retry the permissions check at that point.

 PR kern/57260

 XXX pullup-8
 XXX pullup-9
 XXX pullup-10


 To generate a diff of this commit:
 cvs rdiff -u -r1.556 -r1.557 src/sys/kern/vfs_syscalls.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sun, 05 Mar 2023 14:47:52 +0000
State-Changed-Why:
needs pullups


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57260 CVS commit: [netbsd-10] src/sys/kern
Date: Tue, 7 Mar 2023 19:56:45 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Tue Mar  7 19:56:45 UTC 2023

 Modified Files:
 	src/sys/kern [netbsd-10]: vfs_syscalls.c

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #115):

 	sys/kern/vfs_syscalls.c: revision 1.557

 open(2): Don't map ERESTART to EINTR.

 If a file or device's open function returns ERESTART, respect that --
 restart the syscall; don't pretend a signal has been delivered when
 it was not.  If an SA_RESTART signal was delivered, POSIX does not
 allow it to fail with EINTR:

     SA_RESTART
         This flag affects the behavior of interruptible functions;
         that is, those specified to fail with errno set to [EINTR].
         If set, and a function specified as interruptible is
         interrupted by this signal, the function shall restart and
         shall not fail with [EINTR] unless otherwise specified.  If
         an interruptible function which uses a timeout is restarted,
         the duration of the timeout following the restart is set to
         an unspecified value that does not exceed the original
         timeout value.  If the flag is not set, interruptible
         functions interrupted by this signal shall fail with errno
         set to [EINTR].

 https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaction.html

 Nothing in the POSIX definition of open specifies otherwise.

 In 1990, Kirk McKusick added these lines with a mysterious commit
 message:
 Author: Kirk McKusick <mckusick>
 Date:   Tue Apr 10 19:36:33 1990 -0800
     eliminate longjmp from the kernel (for karels)
 diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c
 index 7bc7b39bbf..d572d3a32d 100644
 --- a/sys/kern/vfs_syscalls.c
 +++ b/sys/kern/vfs_syscalls.c
 @@ -14,7 +14,7 @@
   * IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
   * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
   *
 - *     @(#)vfs_syscalls.c      7.42 (Berkeley) 3/26/90
 + *     @(#)vfs_syscalls.c      7.43 (Berkeley) 4/10/90
   */
  #include "param.h"
 @@ -530,8 +530,10 @@ copen(scp, fmode, cmode, ndp, resultfd)
         if (error = vn_open(ndp, fmode, (cmode & 07777) &~ S_ISVTX)) {
                 crfree(fp->f_cred);
                 fp->f_count--;
 -               if (error == -1)        /* XXX from fdopen */
 -                       return (0);     /* XXX from fdopen */
 +               if (error == EJUSTRETURN)       /* XXX from fdopen */
 +                       return (0);             /* XXX from fdopen */
 +               if (error == ERESTART)
 +                       error = EINTR;
                 scp->sc_ofile[indx] = NULL;
                 return (error);
         }

 (found via this git import of the CSRG history:
 https://github.com/robohack/ucb-csrg-bsd/commit/cce2869b7ae5d360921eb411005b328a29c4a3fe

 This change appears to have served two related purposes:
 1. The fdopen function (the erstwhile open routine for /dev/fd/N)
    used to return -1 as a hack to mean it had just duplicated the fd;
    it was recently changed by Mike Karels, in kern_descrip.c 7.9, to
    return EJUSTRETURN, now defined to be -2, presumably to avoid a
    conflict with ERESTART, defined to be -1.  So this change finished
    part of the change by Mike Karels to use a different magic return
    code from fdopen.
    Of course, today we use still another disgusting hack, EDUPFD, for
    the same purpose, so none of this is relevant any more.
 2. Prior to April 1990, the kernel handled signals during tsleep(9)
    by longjmping out to the system call entry point or similar.  In
    April 1990, Mike Karels worked to convert all of that into
    explicit unwind logic by passing through EINTR or ERESTART as
    appropriate, instead of setjmp at each entry point.

 However, it's not clear to me why this setjmp/longjmp and
 fdopen/-1/EJUSTRETURN renovation justifies unconditional logic to map
 ERESTART to EINTR in open(2).  I suspect it was a mistake.

 In 2013, the corresponding logic to map ERESTART to EINTR in open(2)
 was removed from FreeBSD:

    r246472 | kib | 2013-02-07 14:53:33 +0000 (Thu, 07 Feb 2013) | 11 lines
    Stop translating the ERESTART error from the open(2) into EINTR.
    Posix requires that open(2) is restartable for SA_RESTART.
    For non-posix objects, in particular, devfs nodes, still disable
    automatic restart of the opens. The open call to a driver could have
    significant side effects for the hardware.
    Noted and reviewed by:  jilles
    Discussed with: bde
    MFC after:      2 weeks

 Index: vfs_syscalls.c
 ===================================================================
 --- vfs_syscalls.c      (revision 246471)
 +++ vfs_syscalls.c      (revision 246472)
 @@ -1106,8 +1106,6 @@
                                 goto success;
                 }
 -               if (error == ERESTART)
 -                       error = EINTR;
                 goto bad;
         }
         td->td_dupfd = 0;

 https://cgit.freebsd.org/src/commit/sys/kern/vfs_syscalls.c?id=2ca49983425886121b506cb5126b60a705afc38c">https://cgit.freebsd.org/src/commit/sys/kern/vfs_syscalls.c?id=2ca49983425886121b506cb5126b60a705afc38c

 It's not clear to me that there's any reason to treat device nodes
 specially here; in fact, if a driver's .d_open routine sleeps and is
 woken by a concurrent revoke without a signal pending or with an
 SA_RESTART signal pending, it is wrong for it to fail with EINTR.

 But it MUST restart the whole system call rather than continue
 sleeping in a loop or just exit the loop and continue to open,
 because it is mandatory in the security model of revoke for open(2)
 to retry the permissions check at that point.

 PR kern/57260


 To generate a diff of this commit:
 cvs rdiff -u -r1.556 -r1.556.2.1 src/sys/kern/vfs_syscalls.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57260 CVS commit: [netbsd-9] src/sys/kern
Date: Tue, 7 Mar 2023 20:01:07 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Tue Mar  7 20:01:07 UTC 2023

 Modified Files:
 	src/sys/kern [netbsd-9]: vfs_syscalls.c

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #1610):

 	sys/kern/vfs_syscalls.c: revision 1.557

 open(2): Don't map ERESTART to EINTR.

 If a file or device's open function returns ERESTART, respect that --
 restart the syscall; don't pretend a signal has been delivered when
 it was not.  If an SA_RESTART signal was delivered, POSIX does not
 allow it to fail with EINTR:

     SA_RESTART
         This flag affects the behavior of interruptible functions;
         that is, those specified to fail with errno set to [EINTR].
         If set, and a function specified as interruptible is
         interrupted by this signal, the function shall restart and
         shall not fail with [EINTR] unless otherwise specified.  If
         an interruptible function which uses a timeout is restarted,
         the duration of the timeout following the restart is set to
         an unspecified value that does not exceed the original
         timeout value.  If the flag is not set, interruptible
         functions interrupted by this signal shall fail with errno
         set to [EINTR].

 https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaction.html

 Nothing in the POSIX definition of open specifies otherwise.

 In 1990, Kirk McKusick added these lines with a mysterious commit
 message:
 Author: Kirk McKusick <mckusick>
 Date:   Tue Apr 10 19:36:33 1990 -0800
     eliminate longjmp from the kernel (for karels)
 diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c
 index 7bc7b39bbf..d572d3a32d 100644
 --- a/sys/kern/vfs_syscalls.c
 +++ b/sys/kern/vfs_syscalls.c
 @@ -14,7 +14,7 @@
   * IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
   * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
   *
 - *     @(#)vfs_syscalls.c      7.42 (Berkeley) 3/26/90
 + *     @(#)vfs_syscalls.c      7.43 (Berkeley) 4/10/90
   */
  #include "param.h"
 @@ -530,8 +530,10 @@ copen(scp, fmode, cmode, ndp, resultfd)
         if (error = vn_open(ndp, fmode, (cmode & 07777) &~ S_ISVTX)) {
                 crfree(fp->f_cred);
                 fp->f_count--;
 -               if (error == -1)        /* XXX from fdopen */
 -                       return (0);     /* XXX from fdopen */
 +               if (error == EJUSTRETURN)       /* XXX from fdopen */
 +                       return (0);             /* XXX from fdopen */
 +               if (error == ERESTART)
 +                       error = EINTR;
                 scp->sc_ofile[indx] = NULL;
                 return (error);
         }

 (found via this git import of the CSRG history:
 https://github.com/robohack/ucb-csrg-bsd/commit/cce2869b7ae5d360921eb411005b328a29c4a3fe

 This change appears to have served two related purposes:
 1. The fdopen function (the erstwhile open routine for /dev/fd/N)
    used to return -1 as a hack to mean it had just duplicated the fd;
    it was recently changed by Mike Karels, in kern_descrip.c 7.9, to
    return EJUSTRETURN, now defined to be -2, presumably to avoid a
    conflict with ERESTART, defined to be -1.  So this change finished
    part of the change by Mike Karels to use a different magic return
    code from fdopen.
    Of course, today we use still another disgusting hack, EDUPFD, for
    the same purpose, so none of this is relevant any more.
 2. Prior to April 1990, the kernel handled signals during tsleep(9)
    by longjmping out to the system call entry point or similar.  In
    April 1990, Mike Karels worked to convert all of that into
    explicit unwind logic by passing through EINTR or ERESTART as
    appropriate, instead of setjmp at each entry point.

 However, it's not clear to me why this setjmp/longjmp and
 fdopen/-1/EJUSTRETURN renovation justifies unconditional logic to map
 ERESTART to EINTR in open(2).  I suspect it was a mistake.

 In 2013, the corresponding logic to map ERESTART to EINTR in open(2)
 was removed from FreeBSD:

    r246472 | kib | 2013-02-07 14:53:33 +0000 (Thu, 07 Feb 2013) | 11 lines
    Stop translating the ERESTART error from the open(2) into EINTR.
    Posix requires that open(2) is restartable for SA_RESTART.
    For non-posix objects, in particular, devfs nodes, still disable
    automatic restart of the opens. The open call to a driver could have
    significant side effects for the hardware.
    Noted and reviewed by:  jilles
    Discussed with: bde
    MFC after:      2 weeks

 Index: vfs_syscalls.c
 ===================================================================
 --- vfs_syscalls.c      (revision 246471)
 +++ vfs_syscalls.c      (revision 246472)
 @@ -1106,8 +1106,6 @@
                                 goto success;
                 }
 -               if (error == ERESTART)
 -                       error = EINTR;
                 goto bad;
         }
         td->td_dupfd = 0;

 https://cgit.freebsd.org/src/commit/sys/kern/vfs_syscalls.c?id=2ca49983425886121b506cb5126b60a705afc38c">https://cgit.freebsd.org/src/commit/sys/kern/vfs_syscalls.c?id=2ca49983425886121b506cb5126b60a705afc38c

 It's not clear to me that there's any reason to treat device nodes
 specially here; in fact, if a driver's .d_open routine sleeps and is
 woken by a concurrent revoke without a signal pending or with an
 SA_RESTART signal pending, it is wrong for it to fail with EINTR.

 But it MUST restart the whole system call rather than continue
 sleeping in a loop or just exit the loop and continue to open,
 because it is mandatory in the security model of revoke for open(2)
 to retry the permissions check at that point.

 PR kern/57260


 To generate a diff of this commit:
 cvs rdiff -u -r1.533.2.1 -r1.533.2.2 src/sys/kern/vfs_syscalls.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57260 CVS commit: [netbsd-8] src/sys/kern
Date: Tue, 7 Mar 2023 20:02:57 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Tue Mar  7 20:02:57 UTC 2023

 Modified Files:
 	src/sys/kern [netbsd-8]: vfs_syscalls.c

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #1806):

 	sys/kern/vfs_syscalls.c: revision 1.557

 open(2): Don't map ERESTART to EINTR.

 If a file or device's open function returns ERESTART, respect that --
 restart the syscall; don't pretend a signal has been delivered when
 it was not.  If an SA_RESTART signal was delivered, POSIX does not
 allow it to fail with EINTR:

     SA_RESTART
         This flag affects the behavior of interruptible functions;
         that is, those specified to fail with errno set to [EINTR].
         If set, and a function specified as interruptible is
         interrupted by this signal, the function shall restart and
         shall not fail with [EINTR] unless otherwise specified.  If
         an interruptible function which uses a timeout is restarted,
         the duration of the timeout following the restart is set to
         an unspecified value that does not exceed the original
         timeout value.  If the flag is not set, interruptible
         functions interrupted by this signal shall fail with errno
         set to [EINTR].

 https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaction.html

 Nothing in the POSIX definition of open specifies otherwise.

 In 1990, Kirk McKusick added these lines with a mysterious commit
 message:
 Author: Kirk McKusick <mckusick>
 Date:   Tue Apr 10 19:36:33 1990 -0800
     eliminate longjmp from the kernel (for karels)
 diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c
 index 7bc7b39bbf..d572d3a32d 100644
 --- a/sys/kern/vfs_syscalls.c
 +++ b/sys/kern/vfs_syscalls.c
 @@ -14,7 +14,7 @@
   * IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
   * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
   *
 - *     @(#)vfs_syscalls.c      7.42 (Berkeley) 3/26/90
 + *     @(#)vfs_syscalls.c      7.43 (Berkeley) 4/10/90
   */
  #include "param.h"
 @@ -530,8 +530,10 @@ copen(scp, fmode, cmode, ndp, resultfd)
         if (error = vn_open(ndp, fmode, (cmode & 07777) &~ S_ISVTX)) {
                 crfree(fp->f_cred);
                 fp->f_count--;
 -               if (error == -1)        /* XXX from fdopen */
 -                       return (0);     /* XXX from fdopen */
 +               if (error == EJUSTRETURN)       /* XXX from fdopen */
 +                       return (0);             /* XXX from fdopen */
 +               if (error == ERESTART)
 +                       error = EINTR;
                 scp->sc_ofile[indx] = NULL;
                 return (error);
         }

 (found via this git import of the CSRG history:
 https://github.com/robohack/ucb-csrg-bsd/commit/cce2869b7ae5d360921eb411005b328a29c4a3fe

 This change appears to have served two related purposes:
 1. The fdopen function (the erstwhile open routine for /dev/fd/N)
    used to return -1 as a hack to mean it had just duplicated the fd;
    it was recently changed by Mike Karels, in kern_descrip.c 7.9, to
    return EJUSTRETURN, now defined to be -2, presumably to avoid a
    conflict with ERESTART, defined to be -1.  So this change finished
    part of the change by Mike Karels to use a different magic return
    code from fdopen.
    Of course, today we use still another disgusting hack, EDUPFD, for
    the same purpose, so none of this is relevant any more.
 2. Prior to April 1990, the kernel handled signals during tsleep(9)
    by longjmping out to the system call entry point or similar.  In
    April 1990, Mike Karels worked to convert all of that into
    explicit unwind logic by passing through EINTR or ERESTART as
    appropriate, instead of setjmp at each entry point.

 However, it's not clear to me why this setjmp/longjmp and
 fdopen/-1/EJUSTRETURN renovation justifies unconditional logic to map
 ERESTART to EINTR in open(2).  I suspect it was a mistake.

 In 2013, the corresponding logic to map ERESTART to EINTR in open(2)
 was removed from FreeBSD:

    r246472 | kib | 2013-02-07 14:53:33 +0000 (Thu, 07 Feb 2013) | 11 lines
    Stop translating the ERESTART error from the open(2) into EINTR.
    Posix requires that open(2) is restartable for SA_RESTART.
    For non-posix objects, in particular, devfs nodes, still disable
    automatic restart of the opens. The open call to a driver could have
    significant side effects for the hardware.
    Noted and reviewed by:  jilles
    Discussed with: bde
    MFC after:      2 weeks

 Index: vfs_syscalls.c
 ===================================================================
 --- vfs_syscalls.c      (revision 246471)
 +++ vfs_syscalls.c      (revision 246472)
 @@ -1106,8 +1106,6 @@
                                 goto success;
                 }
 -               if (error == ERESTART)
 -                       error = EINTR;
                 goto bad;
         }
         td->td_dupfd = 0;

 https://cgit.freebsd.org/src/commit/sys/kern/vfs_syscalls.c?id=2ca49983425886121b506cb5126b60a705afc38c">https://cgit.freebsd.org/src/commit/sys/kern/vfs_syscalls.c?id=2ca49983425886121b506cb5126b60a705afc38c

 It's not clear to me that there's any reason to treat device nodes
 specially here; in fact, if a driver's .d_open routine sleeps and is
 woken by a concurrent revoke without a signal pending or with an
 SA_RESTART signal pending, it is wrong for it to fail with EINTR.

 But it MUST restart the whole system call rather than continue
 sleeping in a loop or just exit the loop and continue to open,
 because it is mandatory in the security model of revoke for open(2)
 to retry the permissions check at that point.

 PR kern/57260


 To generate a diff of this commit:
 cvs rdiff -u -r1.516.2.1 -r1.516.2.2 src/sys/kern/vfs_syscalls.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: needs-pullups->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Fri, 17 Mar 2023 09:18:45 +0000
State-Changed-Why:
Fixed and pulled up.

(and what a mess, thanks for dealing with it)


From: "David H. Gutteridge" <gutteridge@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57260 CVS commit: src
Date: Fri, 21 Apr 2023 21:50:05 +0000

 Module Name:	src
 Committed By:	gutteridge
 Date:		Fri Apr 21 21:50:05 UTC 2023

 Modified Files:
 	src/distrib/sets/lists/debug: mi
 	src/distrib/sets/lists/tests: mi
 	src/tests/kernel: Makefile
 Added Files:
 	src/tests/kernel: t_open_pr_57260.c

 Log Message:
 Add new test t_open_pr_57260

 New test case that reflects the fix in PR kern/57260. The majority of
 work for this case itself was by riastradh@, who'd supplied the basis
 for it in the ticket, and provided further guidance.


 To generate a diff of this commit:
 cvs rdiff -u -r1.396 -r1.397 src/distrib/sets/lists/debug/mi
 cvs rdiff -u -r1.1256 -r1.1257 src/distrib/sets/lists/tests/mi
 cvs rdiff -u -r1.70 -r1.71 src/tests/kernel/Makefile
 cvs rdiff -u -r0 -r1.1 src/tests/kernel/t_open_pr_57260.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.