NetBSD Problem Report #42724
From www@NetBSD.org Wed Feb 3 00:07:01 2010
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id 392B263C462
for <gnats-bugs@gnats.NetBSD.org>; Wed, 3 Feb 2010 00:07:01 +0000 (UTC)
Message-Id: <20100203000700.EA98563B886@www.NetBSD.org>
Date: Wed, 3 Feb 2010 00:07:00 +0000 (UTC)
From: eravin@panix.com
Reply-To: eravin@panix.com
To: gnats-bugs@NetBSD.org
Subject: select(2) and poll(2) can return non-error status on bad file descriptors
X-Send-Pr-Version: www-1.0
>Number: 42724
>Category: kern
>Synopsis: select(2) and poll(2) can return non-error status on bad file descriptors
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: feedback
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Feb 03 00:10:00 +0000 2010
>Closed-Date:
>Last-Modified: Thu Jul 06 12:02:35 +0000 2023
>Originator: Ed Ravin
>Release: 5.0.1
>Organization:
PANIX Public Access Networks Corp
>Environment:
NetBSD panix3.panix.com 5.0.1 NetBSD 5.0.1 (PANIX-USER) #0: Thu Nov 5 22:13:39
EST 2009 root@juggler.panix.com:/devel/netbsd/5.0.1/src/sys/arch/i386/compile/PANIX-USER i386
>Description:
We repeatedly see programs like emacs, mutt, elm, pine, trn, and nn go into infinite loops polling for input after the end user has lost their telnet or ssh session.
Here's a sample ktrace:
19399 1 emacs-21.3 select(0x1, 0x8211000, 0, 0, 0xbf7fe7e8) = 1
19399 1 emacs-21.3 ioctl(0, FIONREAD, 0xbf7d1744) Err#9 EBADF
19399 1 emacs-21.3 getpid() = 19399, 7766
19399 1 emacs-21.3 kill(0x4bc7, 0x1) = 0
19399 1 emacs-21.3 read(0, 0xbf7d1748, 0xfff) = 0
""
19399 1 emacs-21.3 ioctl(0, FIONREAD, 0xbf7d174c) Err#9 EBADF
19399 1 emacs-21.3 getpid() = 19399, 7766
19399 1 emacs-21.3 kill(0x4bc7, 0x1) = 0
19399 1 emacs-21.3 read(0, 0xbf7d1750, 0xfff) = 0
""
19399 1 emacs-21.3 select(0x1, 0x8211000, 0, 0, 0xbf7fe7e8) = 1
19399 1 emacs-21.3 ioctl(0, FIONREAD, 0xbf7d1744) Err#9 EBADF
19399 1 emacs-21.3 getpid() = 19399, 7766
19399 1 emacs-21.3 kill(0x4bc7, 0x1) = 0
19399 1 emacs-21.3 read(0, 0xbf7d1748, 0xfff) = 0
""
19399 1 emacs-21.3 ioctl(0, FIONREAD, 0xbf7d174c) Err#9 EBADF
19399 1 emacs-21.3 getpid() = 19399, 7766
19399 1 emacs-21.3 kill(0x4bc7, 0x1) = 0
19399 1 emacs-21.3 read(0, 0xbf7d1750, 0xfff) = 0
""
And so on ad infinitum. Note that file descriptor #0 has been closed:
# fstat -p 19399
USER CMD PID FD MOUNT INUM MODE SZ|DV R/W
zzz emacs-21.3 19399 wd /net/u 6552785 drwx------ 8192 r
zzz emacs-21.3 19399 0 - - none -
zzz emacs-21.3 19399 1 - - none -
zzz emacs-21.3 19399 2 - - none -
And here's the FD list:
(gdb) x/32 0x8211000
0x8211000: 0x00000001 0x00000000 0x00000000 0x00000000
0x8211010: 0x00000000 0x00000000 0x00000000 0x00000000
0x8211020: 0x1821cc34 0x00000000 0x00000000 0x00000000
0x8211030: 0x00000000 0x00000000 0x00000000 0x00000000
0x8211040: 0x00000001 0x00000000 0x00000000 0x00000000
0x8211050: 0x00000000 0x00000000 0x00000000 0x00000000
0x8211060: 0x00000000 0x00000000 0x00000000 0x00000000
0x8211070: 0x00000000 0x00000000 0x00000000 0x00000000
The version of lsof we have on this box seems to not fully understand the broken file descriptors:
root@panix2 ~: # lsof-NetBSD-i386-5.0_BETA -p 19399
lsof-NetBSD-i386-5.0_BETA: WARNING: compiled for NetBSD release 5.0_BETA; this is 5.0.1.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
emacs-21. 19399 zzz cwd VDIR 11,3 8192 6552785 /net/u/1/k/zzz/News
emacs-21. 19399 zzz txt VREG 142,0 4561480 638894 /usr/local/bin/emacs-21.3
emacs-21. 19399 zzz txt VREG 142,0 1120316 249256 /lib/libc.so.12.164
emacs-21. 19399 zzz txt VREG 142,0 125014 249277 /lib/libm.so.0.6
emacs-21. 19399 zzz txt VREG 142,0 3790 249279 /lib/libm387.so.0.1
emacs-21. 19399 zzz txt VREG 142,0 12875 249268 /lib/libtermcap.so.0.6
emacs-21. 19399 zzz txt VREG 142,0 11263 636496 /usr/lib/libossaudio.so.0.0
emacs-21. 19399 zzz txt VREG 142,0 65173 635885 /libexec/ld.elf_so
emacs-21. 19399 zzz 0u unknown file system type: 0
emacs-21. 19399 zzz 1u unknown file system type: 0
emacs-21. 19399 zzz 2u unknown file system type: 0
Note that process 19399 has lost its telnetd or sshd and has only a controlling shell which is parented by init:
# pstree -p 19399
-+= 00000 root [system]
\-+= 00001 root init
\-+= 07766 zzz -tcsh (tcsh-6.13.00)
\--= 19399 zzz emacs (emacs-21.3)
Here's what I believe the scenario to be: when a user gets disconnected abnormally from an ssh or telnet session, the process should receive a HUP signal. Perhaps select(2) or poll(2) is sleeping, waiting on input, at the time, and something goes wrong. Either way, the HUP does not get processed properly, and the process continues with its select/read loop, assuming that select will sleep until input arrives.
However, select keeps returning 1 (not an error), saying that one FD is ready to read, even though the FD supplied to select(2) was invalid. The process tries to read, gets zero bytes (that doesn't sound right either; shouldn't read(2) return EBADF here?), and goes back to select(2) to try again. Since the process expected select(2) to sleep until I/O was available, and select(2) is now returning immediately, the process goes into a tight loop and hogs the CPU.
Although it's clear that emacs in this case has a chance to see something's wrong (note the ioctl call that returns EBADF), I don't think the app is really at fault, since as previously stated this happens to multiple applications and they all exhibit the same symptoms.
We have also seen this with the poll(2) syscall.
>How-To-Repeat:
Run a multi-user system with many shell users using interactive programs like emacs, mutt, elm, pine, trn, and nn.
Wait for some of them to get accidentally disconnected.
Eventually, this will happen. We usually see it once every few days.
>Fix:
Have select(2) return EBADF when it is given an invalid or closed FD in its list.
read(2) should also return EBADF when it is given an invalid or closed FD.
>Release-Note:
>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: eravin@panix.com
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/42724: select(2) and poll(2) can return non-error status on bad file descriptors
Date: Thu, 6 Jul 2023 11:56:54 +0000
> Date: Wed, 3 Feb 2010 00:07:00 +0000 (UTC)
> From: Ed Ravin <eravin@panix.com>
>
> we repeatedly see programs like emacs, mutt, elm, pine, trn, and nn
> go into infinite loops polling for input when the end user has lost
> their telnet or ssh session.
>
> Here's a sample ktrace:
> 19399 1 emacs-21.3 select(0x1, 0x8211000, 0, 0, 0xbf7fe7e8) = 1
> 19399 1 emacs-21.3 ioctl(0, FIONREAD, 0xbf7d1744) Err#9 EBADF
> 19399 1 emacs-21.3 getpid() = 19399, 7766
> 19399 1 emacs-21.3 kill(0x4bc7, 0x1) = 0
> 19399 1 emacs-21.3 read(0, 0xbf7d1748, 0xfff) = 0
> ""
I realize a long time has elapsed, but are you still seeing this?
It is somewhat alarming to see this in so many applications, but the
symptoms you describe sound like application bugs -- unless perhaps it
is in a library inside NetBSD, but I'm not sure what library that
would be. The only obvious candidates I can think of -- libcurses,
libedit, and libterminfo or the now-deleted libterm -- don't call
select and never have, as far as I can tell.
First, other than the issue I discovered in
https://gnats.NetBSD.org/57504, select(2) and poll(2) do return EBADF
when the file descriptor is _not open_.
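A minimal sketch of how that can be checked from userland, assuming a POSIX system. The helper names are hypothetical and nothing here is taken from the NetBSD sources. Note that poll reports an unopened descriptor per entry through POLLNVAL in revents rather than by failing the call with EBADF:

```c
#include <sys/select.h>
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: 1 if select(2) fails with EBADF for fd, else 0. */
static int
select_says_ebadf(int fd)
{
	fd_set readfds;
	struct timeval tv = { 0, 0 };	/* zero timeout: never block */

	assert(fd >= 0 && fd < FD_SETSIZE);
	FD_ZERO(&readfds);
	FD_SET(fd, &readfds);
	return select(fd + 1, &readfds, NULL, NULL, &tv) == -1 &&
	    errno == EBADF;
}

/* Hypothetical helper: 1 if poll(2) marks fd with POLLNVAL, else 0. */
static int
poll_says_nval(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLNVAL) != 0;
}

/*
 * Open a pipe, close its read end, and probe the stale descriptor
 * number.  Returns a bitmask (1 = select gave EBADF, 2 = poll gave
 * POLLNVAL), or -1 if the pipe could not be created.
 */
static int
probe_closed_fd(void)
{
	int p[2];
	int result = 0;

	if (pipe(p) == -1)
		return -1;
	(void)close(p[0]);
	if (select_says_ebadf(p[0]))
		result |= 1;
	if (poll_says_nval(p[0]))
		result |= 2;
	(void)close(p[1]);
	return result;
}
```

Calling probe_closed_fd() from a small main and printing the result is enough to compare systems; a descriptor that is still open, such as a hung-up terminal, should make both helpers return 0.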
However, in the case of terminal hangup, the file descriptor is still
open. It doesn't get closed until you close it with close(2).
NetBSD 10, macOS 13.4.1 (Darwin 22.5.0), and Linux 4.15 all appear to
behave the same way in select and read on a file descriptor for a
terminal after hangup:
- select returns readable, as in EOF.
- read returns 0, as in EOF.
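Given that behaviour, an application loop has to treat a zero-length read as end of input and stop selecting on that descriptor, otherwise it spins. The following is only a sketch under that assumption; wait_and_read and drain_until_eof are hypothetical names, not code from any of the affected programs:

```c
#include <sys/select.h>
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: block in select(2) until fd is readable, then
 * read from it.  Returns the byte count, 0 on EOF/hangup, -1 on error.
 */
static ssize_t
wait_and_read(int fd, void *buf, size_t len)
{
	fd_set readfds;
	ssize_t n;

	assert(fd >= 0 && fd < FD_SETSIZE);
	for (;;) {
		FD_ZERO(&readfds);
		FD_SET(fd, &readfds);
		if (select(fd + 1, &readfds, NULL, NULL, NULL) == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		n = read(fd, buf, len);
		if (n == -1 && errno == EINTR)
			continue;
		return n;
	}
}

/*
 * Consume input until EOF.  The important part is that a return of 0
 * ends the loop instead of going back to select, which would report
 * the descriptor readable again immediately.
 * Returns the total number of bytes read, or -1 on error.
 */
static long
drain_until_eof(int fd)
{
	char buf[256];
	long total = 0;
	ssize_t n;

	while ((n = wait_and_read(fd, buf, sizeof(buf))) > 0)
		total += n;
	return n == -1 ? -1 : total;
}
```

The same loop works unchanged for pipes and regular files, which is one reason EOF is a convenient way for the kernel to report hangup.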
I don't think there's any other reasonable choice of behaviour here:
- The select(2) `exceptfds' set, if I understand correctly, is
reserved for obscure things like TCP out-of-band data, like
POLLRDBAND or POLLWRBAND, not for errors/hangup, so it wouldn't be
appropriate.
- I see no reason for read(2) to behave any differently from hitting
the end of a regular file or a pipe; EPIPE would be wrong because
that's to alert a _writer_ that the reader won't be consuming any
more, since (other than by SIGPIPE) the writer would otherwise be
none the wiser about it.
poll behaves differently on each system:
- NetBSD: POLLOUT=POLLWRNORM
- macOS: (POLLOUT=POLLWRNORM) | POLLWRBAND
- Linux: POLLHUP | POLLOUT | POLLWRNORM
It seems to me poll ought to return POLLHUP here, but that wouldn't
explain the select issue anyway.
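Because the revents bits differ between systems, a portable caller probably should not depend on POLLHUP alone. One possible approach, sketched below with hypothetical names, is to treat POLLERR and POLLNVAL as fatal and otherwise let the return value of read decide. The outputs in this message were produced with events = -1, so they do not show whether requesting only POLLIN wakes poll after a hangup on NetBSD; the sketch assumes that it does.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

enum fd_state { FD_DATA, FD_EOF, FD_ERROR };

/*
 * Hypothetical helper: wait for fd with poll(2), then read.
 * On FD_DATA, *nread holds the number of bytes stored in buf.
 */
static enum fd_state
poll_and_read(int fd, void *buf, size_t len, size_t *nread)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	ssize_t n;

	assert(buf != NULL && nread != NULL);
	for (;;) {
		if (poll(&pfd, 1, -1) == -1) {
			if (errno == EINTR)
				continue;
			return FD_ERROR;
		}
		/* Not open, or a device error: nothing more to read. */
		if (pfd.revents & (POLLERR | POLLNVAL))
			return FD_ERROR;
		/*
		 * POLLIN, POLLHUP, or both: do not interpret the bits
		 * further, since read returns 0 once the data is gone.
		 */
		n = read(fd, buf, len);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			return FD_ERROR;
		}
		*nread = (size_t)n;
		return n == 0 ? FD_EOF : FD_DATA;
	}
}
```

A caller would loop while the result is FD_DATA and drop the descriptor from its poll set on FD_EOF or FD_ERROR.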
The one part that I'm puzzled about here is ioctl(FIONREAD) failing
with EBADF. I tried to reproduce that with a test program and failed:
ioctl(FIONREAD) succeeds and yields 0 on a NetBSD 10 kernel. Things
may have changed in the past decade, of course. Linux 4.15 fails, but
with EIO, not with EBADF.
$ ./ptyselect; cat ptyselect.out
pty 4 = /dev/pts/145
child = 15669
status = 0x800 exited status 8
ptyselect: ioctl(FIONREAD): 0
ptyselect: read returned eof
$ ./ptypoll; cat ptypoll.out
pty 4 = /dev/pts/145
child = 1955
status = 0x800 exited status 8
ptypoll: revents = 0x4
ptypoll: POLLOUT
ptypoll: POLLWRNORM
ptypoll: ioctl(FIONREAD): 0
ptypoll: read returned eof
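For an application that calls ioctl(FIONREAD) before reading, as emacs does in the ktrace, a failure with EBADF or EIO is a reasonable point to give up on the descriptor rather than loop. A sketch of that check, with hypothetical helper names and with the choice of errno values taken only from the cases mentioned in this report:

```c
#include <sys/ioctl.h>
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: number of bytes pending on fd, or -1 with
 * errno set if ioctl(FIONREAD) fails.
 */
static int
pending_bytes(int fd)
{
	int n = 0;

	assert(fd >= 0);	/* a negative fd is a caller bug */
	if (ioctl(fd, FIONREAD, &n) == -1)
		return -1;
	return n;
}

/*
 * Hypothetical policy: 1 if the errno from a failed FIONREAD suggests
 * the descriptor is unusable.  EBADF is what the ktrace shows; EIO is
 * what Linux 4.15 returned after hangup.
 */
static int
fionread_failure_is_fatal(int error)
{
	return error == EBADF || error == EIO;
}
```

When pending_bytes succeeds and yields 0, as on NetBSD 10 after hangup, the caller still has to rely on read returning 0 to notice EOF.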
Content-Disposition: attachment; filename="ptyselect.c"
#include <sys/ioctl.h>
#include <sys/select.h>
#include <sys/wait.h>
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <termios.h>
#include <unistd.h>
#ifdef __linux__
# include <pty.h>
#else
# include <util.h>
#endif
static int
child(void)
{
	int fd;
	int flags;
	fd_set readfds;
	int nready;
	int nreadable;
	char b;
	ssize_t nread;

	/*
	 * Ignore terminal hangup signal so we see how select and read
	 * behave after it has happened.
	 */
	if (signal(SIGHUP, SIG_IGN) == SIG_ERR)
		err(1, "signal(SIGHUP, SIG_IGN)");

	/*
	 * Create a file to notify the parent that we are ready for
	 * the terminal to be hung up, and redirect stderr to it.
	 */
	if ((fd = open("ptyselect.out", O_WRONLY|O_CREAT|O_TRUNC, 0644)) == -1)
		err(2, "open");
	if (dup2(fd, STDERR_FILENO) == -1)
		err(3, "dup2(%d, STDERR_FILENO)", fd);

	/*
	 * Test selecting and reading from stdin, which is the
	 * terminal the parent created and will hang up shortly.
	 */
	fd = STDIN_FILENO;
	FD_ZERO(&readfds);
	FD_SET(fd, &readfds);
	nready = select(fd + 1, &readfds, NULL, NULL, /*forever*/NULL);
	if (nready == -1)
		err(4, "select");
	if (nready == 0)
		errx(5, "select returned 0");
	if (nready > 1)
		warnx("select returned %d", nready);
	if (!FD_ISSET(fd, &readfds))
		errx(6, "select returned nonsense");
	if (ioctl(fd, FIONREAD, &nreadable) == -1)
		warn("ioctl(FIONREAD)");
	else
		warnx("ioctl(FIONREAD): %d", nreadable);
	nread = read(fd, &b, 1);
	if (nread == -1)
		err(7, "read");
	if (nread == 0)
		errx(8, "read returned eof");
	warnx("b = 0x%hhx", b);
	return 9;
}
int
main(void)
{
	int pty;
	char ptyname[PATH_MAX];
	struct termios t;
	struct winsize w;
	pid_t pid;
	int fd;
	int status;

	if (unlink("ptyselect.out") == -1 && errno != ENOENT)
		err(1, "unlink(\"ptyselect.out\")");
	pid = forkpty(&pty, ptyname, &t, &w);
	switch (pid) {
	case -1:		/* error */
		err(1, "forkpty");
	case 0:			/* child */
		exit(child());
	default:		/* parent */
		break;
	}
	printf("pty %d = %s\n", pty, ptyname);
	printf("child = %ld\n", (long)pid);
	while ((fd = open("ptyselect.out", O_RDONLY)) == -1)
		continue;
	if (close(fd) == -1)
		warn("close");
	if (close(pty) == -1)
		warn("close");
	if (wait(&status) == -1)
		err(1, "wait");
	printf("status = 0x%x", status);
	if (WIFEXITED(status)) {
		printf(" exited status %d", WEXITSTATUS(status));
	} else if (WIFSIGNALED(status)) {
		printf(" signalled %d%s", WTERMSIG(status),
		    WCOREDUMP(status) ? " (core dumped)" : "");
	} else if (WIFSTOPPED(status)) {
		printf(" stopped %d", WSTOPSIG(status));
	}
	printf("\n");
	fflush(stdout);
	return ferror(stdout);
}
Content-Disposition: attachment; filename="ptypoll.c"
#include <sys/ioctl.h>
#include <sys/wait.h>
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <poll.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <termios.h>
#include <unistd.h>
#ifdef __linux__
# include <pty.h>
#else
# include <util.h>
#endif
static int
child(void)
{
	int fd;
	int flags;
	struct pollfd pfd;
	int nready;
	int nreadable;
	char b;
	ssize_t nread;

	/*
	 * Ignore terminal hangup signal so we see how select and read
	 * behave after it has happened.
	 */
	if (signal(SIGHUP, SIG_IGN) == SIG_ERR)
		err(1, "signal(SIGHUP, SIG_IGN)");

	/*
	 * Create a file to notify the parent that we are ready for
	 * the terminal to be hung up, and redirect stderr to it.
	 */
	if ((fd = open("ptypoll.out", O_WRONLY|O_CREAT|O_TRUNC, 0644)) == -1)
		err(2, "open");
	if (dup2(fd, STDERR_FILENO) == -1)
		err(3, "dup2(%d, STDERR_FILENO)", fd);

	/*
	 * Test polling and reading from stdin, which is the terminal
	 * the parent created and will hang up shortly.
	 */
	fd = STDIN_FILENO;
	pfd = (struct pollfd){ .fd = fd, .events = /*all*/-1 };
	nready = poll(&pfd, 1, /*forever*/-1);
	if (nready == -1)
		err(4, "poll");
	if (nready == 0)
		errx(5, "poll returned 0");
	if (nready > 1)
		warnx("poll returned %d", nready);
	warnx("revents = 0x%x", pfd.revents);
	if (pfd.revents & POLLERR)
		warnx("POLLERR");
	if (pfd.revents & POLLHUP)
		warnx("POLLHUP");
	if (pfd.revents & POLLIN)
		warnx("POLLIN");
	if (pfd.revents & POLLNVAL)
		warnx("POLLNVAL");
	if (pfd.revents & POLLOUT)
		warnx("POLLOUT");
	if (pfd.revents & POLLPRI)
		warnx("POLLPRI");
	if (pfd.revents & POLLRDBAND)
		warnx("POLLRDBAND");
	if (pfd.revents & POLLRDNORM)
		warnx("POLLRDNORM");
	if (pfd.revents & POLLWRBAND)
		warnx("POLLWRBAND");
	if (pfd.revents & POLLWRNORM)
		warnx("POLLWRNORM");
	if (ioctl(fd, FIONREAD, &nreadable) == -1)
		warn("ioctl(FIONREAD)");
	else
		warnx("ioctl(FIONREAD): %d", nreadable);
	nread = read(fd, &b, 1);
	if (nread == -1)
		err(7, "read");
	if (nread == 0)
		errx(8, "read returned eof");
	warnx("b = 0x%hhx", b);
	return 9;
}
int
main(void)
{
	int pty;
	char ptyname[PATH_MAX];
	struct termios t;
	struct winsize w;
	pid_t pid;
	int fd;
	int status;

	/*
	 * Make sure the output file isn't there so we can wait for
	 * child startup by probing for its existence.
	 */
	if (unlink("ptypoll.out") == -1 && errno != ENOENT)
		err(1, "unlink(\"ptypoll.out\")");
	pid = forkpty(&pty, ptyname, &t, &w);
	switch (pid) {
	case -1:		/* error */
		err(1, "forkpty");
	case 0:			/* child */
		exit(child());
	default:		/* parent */
		break;
	}
	printf("pty %d = %s\n", pty, ptyname);
	printf("child = %ld\n", (long)pid);

	/*
	 * Wait for the child to have begun ignoring SIGHUP.
	 */
	while ((fd = open("ptypoll.out", O_RDONLY)) == -1)
		continue;
	if (close(fd) == -1)
		warn("close");

	/*
	 * Hang up the terminal.
	 */
	if (close(pty) == -1)
		warn("close");

	/*
	 * Wait for the child and print its termination status.
	 */
	if (wait(&status) == -1)
		err(1, "wait");
	printf("status = 0x%x", status);
	if (WIFEXITED(status)) {
		printf(" exited status %d", WEXITSTATUS(status));
	} else if (WIFSIGNALED(status)) {
		printf(" signalled %d%s", WTERMSIG(status),
		    WCOREDUMP(status) ? " (core dumped)" : "");
	} else if (WIFSTOPPED(status)) {
		printf(" stopped %d", WSTOPSIG(status));
	}
	printf("\n");
	fflush(stdout);
	return ferror(stdout);
}
State-Changed-From-To: open->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Thu, 06 Jul 2023 12:02:35 +0000
State-Changed-Why:
>Unformatted:
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.