NetBSD Problem Report #58916

From www@netbsd.org  Wed Dec 18 14:52:02 2024
Message-Id: <20241218145201.81F491A923A@mollari.NetBSD.org>
Date: Wed, 18 Dec 2024 14:52:01 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: timerfd(2) claims ready for write
X-Send-Pr-Version: www-1.0

>Number:         58916
>Category:       kern
>Synopsis:       timerfd(2) claims ready for write
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Dec 18 14:55:00 +0000 2024
>Last-Modified:  Thu Dec 19 17:45:01 +0000 2024
>Originator:     Taylor R Campbell
>Release:        current, 10
>Organization:
The TimerFD Writation
>Environment:
>Description:
For a timerfd(2), select(2) returns writable if asked and poll(2) returns POLLOUT|POLLWRNORM if asked.  (Also POLLRDNORM if asked.)

In contrast, in Linux, select(2) never returns writable and poll(2) never returns POLLOUT|POLLWRNORM -- or, for that matter, anything other than POLLIN.

Writing to a timerfd fails with EOPNOTSUPP, so it's not useful to claim writable.  This doesn't appear to be a POSIX interface, so it looks like Linux is the `spec' here if I haven't missed anything obvious.
>How-To-Repeat:
run Python 3.13.1 test suite

 ======================================================================
 FAIL: test_timerfd_ns_select (test.test_os.TimerfdTests.test_timerfd_ns_select)
 ----------------------------------------------------------------------
 Traceback (most recent call last):
   File "/scratch/lang/python313/work/Python-3.13.1/Lib/test/test_os.py", line 4436, in test_timerfd_ns_select
     self.assertEqual((rfd, wfd, xfd), ([], [], []))
     ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 AssertionError: Tuples differ: ([], [3], []) != ([], [], [])

 First differing element 1:
 [3]
 []

 - ([], [3], [])
 ?       -

 + ([], [], [])
>Fix:
Yes, please!

>Audit-Trail:
From: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/58916: timerfd(2) claims ready for write
Date: Wed, 18 Dec 2024 16:49:02 +0100

 On Wed, Dec 18, 2024 at 02:55:00PM +0000, campbell+netbsd@mumble.net wrote:
 > >Number:         58916
 > >Category:       kern
 > >Synopsis:       timerfd(2) claims ready for write

 This seems to fix the two tests for me. Ok to commit?
  Thomas

 Attachment: poll.diff

 Index: sys_timerfd.c
 ===================================================================
 RCS file: /cvsroot/src/sys/kern/sys_timerfd.c,v
 retrieving revision 1.8
 diff -u -r1.8 sys_timerfd.c
 --- sys_timerfd.c	17 Feb 2022 16:28:29 -0000	1.8
 +++ sys_timerfd.c	18 Dec 2024 15:48:37 -0000
 @@ -337,7 +337,7 @@
  timerfd_fop_poll(file_t * const fp, int const events)
  {
  	struct timerfd * const tfd = fp->f_timerfd;
 -	int revents = events & (POLLOUT | POLLWRNORM);
 +	int revents = 0;

  	if (events & (POLLIN | POLLRDNORM)) {
  		itimer_lock();


From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/58916 CVS commit: src
Date: Wed, 18 Dec 2024 16:01:28 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Wed Dec 18 16:01:28 UTC 2024

 Modified Files:
 	src/sys/kern: sys_timerfd.c
 	src/tests/lib/libc/sys: t_timerfd.c

 Log Message:
 timerfd(2): Do not claim writable.

 Writes will fail with EOPNOTSUPP.

 PR kern/58916: timerfd(2) claims ready for write


 To generate a diff of this commit:
 cvs rdiff -u -r1.8 -r1.9 src/sys/kern/sys_timerfd.c
 cvs rdiff -u -r1.5 -r1.6 src/tests/lib/libc/sys/t_timerfd.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Taylor R Campbell <riastradh@NetBSD.org>
To: Thomas Klausner <wiz@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58916: timerfd(2) claims ready for write
Date: Wed, 18 Dec 2024 16:23:22 +0000

 > Date: Wed, 18 Dec 2024 16:49:02 +0100
 > From: Thomas Klausner <wiz@NetBSD.org>
 >  
 > This seems to fix the two tests for me. Ok to commit?

 Oops -- I already committed.  The other part I committed is just as
 important: updating the tests.

 (And for the other issues we need automatic tests added!  I went ahead
 with the itimespecfix patch for PR 58914 without adding tests because,
 well, the tests would crash our testbed, but we do need to add those
 tests.)

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/58916: timerfd(2) claims ready for write
Date: Thu, 19 Dec 2024 08:34:07 +0700

     Date:        Wed, 18 Dec 2024 14:55:00 +0000 (UTC)
     From:        campbell+netbsd@mumble.net
     Message-ID:  <20241218145500.8B58E1A923B@mollari.NetBSD.org>

   | For a timerfd(2), select(2) returns writable if asked and poll(2)
   | returns POLLOUT|POLLWRNORM if asked.  (Also POLLRDNORM if asked.)

 Seems reasonable to me.

   | In contrast, in Linux, select(2) never returns writable

 I don't know what Linux does on writes, is it also:

   | Writing to a timerfd fails with EOPNOTSUPP,

 If so, then I'd suggest Linux's poll/select has a bug.

   | so it's not useful to claim writable.

 Someone fails to understand the purpose of poll/select.   It isn't
 to inform the process that a read/write will succeed, it is to inform
 the process that a read/write will not hang.   There are too many possible
 results for select (which just returns 1 bit) in particular to be able
 to differentiate between different possible results (poll() possibly
 could, but doesn't really) - so there is no attempt to do that.

   | This doesn't appear to be a POSIX interface,

 Timerfd isn't (devices in general aren't) but select() and poll() are.
 Further, they're generic, while the actual implementation depends upon
 the underlying device, the functions themselves do not.

 What POSIX says of select() is:
     to see whether some of their descriptors are ready for
     reading, are ready for writing, or...

 That is, there won't be an EAGAIN (or similar) because the underlying
 device is not ready yet, nor would the call hang if not in non-blocking
 mode.   Whether there might be an error, EOF, or other return (or how
 much data might be successfully transferred) isn't conveyed by these
 interfaces.
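
The distinction drawn here, that "ready" means "will not block" rather than "will transfer data", can be seen with a pipe whose write end has been closed. The following is an illustrative sketch only, with a hypothetical helper name: select(2) is asked about the read end, and read(2) is then called to see what "readable" amounted to.

```c
#include <sys/select.h>

#include <unistd.h>

/*
 * Close the write end of a pipe, then ask select(2) whether the read
 * end is readable.  On success, store select's verdict in *readable
 * and return what read(2) then returns; return -2 on setup error.
 */
ssize_t
read_after_select_at_eof(int *readable)
{
	struct timeval tv = { 0, 0 };	/* zero timeout: do not block */
	fd_set rfds;
	char buf[1];
	ssize_t nread;
	int fds[2];

	if (pipe(fds) == -1)
		return -2;
	close(fds[1]);		/* no writers left: reads hit EOF */

	FD_ZERO(&rfds);
	FD_SET(fds[0], &rfds);
	if (select(fds[0] + 1, &rfds, NULL, NULL, &tv) == -1) {
		close(fds[0]);
		return -2;
	}
	*readable = FD_ISSET(fds[0], &rfds) ? 1 : 0;

	nread = read(fds[0], buf, sizeof(buf));	/* returns at once at EOF */
	close(fds[0]);
	return nread;
}
```

Here select marks the descriptor readable even though the subsequent read returns 0 bytes: the call does not hang, but nothing is transferred.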

 kre

 ps: not that I really care what timerfd devices consider ready or not.


From: Taylor R Campbell <campbell@mumble.net>
To: Robert Elz <kre@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58916: timerfd(2) claims ready for write
Date: Thu, 19 Dec 2024 02:39:17 +0000

 > Date: Thu, 19 Dec 2024 08:34:07 +0700
 > From: Robert Elz <kre@munnari.OZ.AU>
 > 
 >     Date:        Wed, 18 Dec 2024 14:55:00 +0000 (UTC)
 >     From:        campbell+netbsd@mumble.net
 >     Message-ID:  <20241218145500.8B58E1A923B@mollari.NetBSD.org>
 > 
 >   | For a timerfd(2), select(2) returns writable if asked and poll(2)
 >   | returns POLLOUT|POLLWRNORM if asked.  (Also POLLRDNORM if asked.)
 > 
 > Seems reasonable to me.

 select(2) returns nonwritable if asked about the reader side of a
 pipe (and vice versa, nonreadable if asked about the writer side of a
 pipe).  Is that a bug?
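
The pipe case can be checked with a few lines of C. This is a sketch with a hypothetical helper name, not code from the tree: it asks select(2) whether the read end of a pipe is writable, then attempts the write anyway and records the errno.

```c
#include <sys/select.h>

#include <errno.h>
#include <unistd.h>

/*
 * Returns 1 if select(2) reports the read end of a pipe as writable,
 * 0 if not, -1 on error.  Stores in *write_errno the errno from an
 * attempted write(2) on that same descriptor (0 if it succeeded).
 */
int
pipe_reader_select_writable(int *write_errno)
{
	struct timeval tv = { 0, 0 };	/* zero timeout: do not block */
	fd_set wfds;
	int fds[2], n;

	if (pipe(fds) == -1)
		return -1;

	FD_ZERO(&wfds);
	FD_SET(fds[0], &wfds);
	n = select(fds[0] + 1, NULL, &wfds, NULL, &tv);
	if (n != -1)
		n = FD_ISSET(fds[0], &wfds) ? 1 : 0;

	/* Writing to the read end fails at once; record how. */
	*write_errno = (write(fds[0], "x", 1) == -1) ? errno : 0;

	close(fds[0]);
	close(fds[1]);
	return n;
}
```

The write does not hang, yet select does not claim the descriptor is writable, which is the behaviour being asked about.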

 >   | In contrast, in Linux, select(2) never returns writable
 > 
 > I don't know what Linux does on writes, is it also:
 > 
 >   | Writing to a timerfd fails with EOPNOTSUPP,
 > 
 > If so, then I'd suggest Linux's poll/select has a bug.

 Correction:

 - On Linux, writing to a timerfd fails with EINVAL.
 - On NetBSD, writing to a timerfd fails with EBADF (same as the reader
   side of a pipe).

 >   | so it's not useful to claim writable.
 > 
 > Someone fails to understand the purpose of poll/select.   It isn't
 > to inform the process that a read/write will succeed, it is to inform
 > the process that a read/write will not hang.   There are too many possible
 > results for select (which just returns 1 bit) in particular to be able
 > to differentiate between different possible results (poll() possibly
 > could, but doesn't really) - so there is no attempt to do that.

 OK, but:

 (a) Probably need to audit all the struct fileops and struct cdevsw
     *poll functions in tree if you want to enforce this -- and you'll
     have to change a lot of existing cases, I expect.

 (b) Currently Linux is the `spec' for timerfd and we're deviating from
     that behaviour.

 (c) timerfd-on-NetBSD is the only case of {pipe, timerfd, kqueue,
     ...}-on-{NetBSD, Linux} that I've found that behaves like this,
     where writes are nonsensical and immediately rejected but select
     returns writable.  (But I haven't done an exhaustive search.)

 (This came up in Python's automatic tests of select on timerfds,
 details in <https://gnats.netbsd.org/58914>.  Happy to revert the
 change.  Mainly I just want to make sure these timer- and timerfd-
 related paths are adequately exercised with clear automatic tests that
 cover enough to avoid gratuitous application incompatibility and
 kernel crashes.)

From: Robert Elz <kre@munnari.OZ.AU>
To: Taylor R Campbell <campbell@mumble.net>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58916: timerfd(2) claims ready for write
Date: Thu, 19 Dec 2024 19:49:40 +0700

     Date:        Thu, 19 Dec 2024 02:39:17 +0000
     From:        Taylor R Campbell <campbell@mumble.net>
     Message-ID:  <20241219023918.2023460BB5@jupiter.mumble.net>

   | select(2) returns nonwritable if asked about the reader side of a
   | pipe (and vice versa, nonreadable if asked about the writer side of a
   | pipe).  Is that a bug?

 Perhaps - I assume that also applies to any fd open O_WRONLY or O_RDONLY
 with select/poll used for the other direction.   In practice I'm not
 sure it matters, as real code just doesn't do things like this, code
 doesn't ask when it can write to a fd which doesn't support write()
 there's no point.

   | (This came up in Python's automatic tests of select on timerfds,

 Yes, I saw that -- the number of issues made that look more like
 an "is this linux" test, that kind of thing is also likely the
 only place where this difference would be noted, and even then I
 don't see the point, after all we could extend the timerfd interface
 to make writing have some meaning (no, I have no idea what that
 might be) - unless they actually intend writing to it in real
 applications, testing that it must fail (at all, not just via
 select/poll) seems like a pointless waste of time, it isn't as
 if this is a specific test of just that interface even.

   | Happy to revert the change.

 No need, like I said initially, I don't really care what the
 behaviour is here.   I just like to make sure we don't fall
 into the trap of "select() (or poll()) said it was writable,
 so a write must succeed!" mentality.

 kre

From: Jason Thorpe <thorpej@me.com>
To: Taylor Campbell <campbell@mumble.net>
Cc: Robert Elz <kre@NetBSD.org>,
 "gnats-bugs@netbsd.org" <gnats-bugs@NetBSD.org>,
 "netbsd-bugs@netbsd.org" <netbsd-bugs@NetBSD.org>
Subject: Re: kern/58916: timerfd(2) claims ready for write
Date: Thu, 19 Dec 2024 05:15:05 -0800

 > On Dec 18, 2024, at 6:39 PM, Taylor R Campbell <campbell@mumble.net> wrote:
 >
 > (This came up in Python's automatic tests of select on timerfds,
 > details in <https://gnats.netbsd.org/58914>.  Happy to revert the
 > change.  Mainly I just want to make sure these timer- and timerfd-
 > related paths are adequately exercised with clear automatic tests that
 > cover enough to avoid gratuitous application incompatibility and
 > kernel crashes.)

 In this case, I think it's better to file a bug against the Python
 test suite.

 -- thorpej

From: Taylor R Campbell <riastradh@NetBSD.org>
To: Jason Thorpe <thorpej@me.com>
Cc: Robert Elz <kre@NetBSD.org>,
	gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58916: timerfd(2) claims ready for write
Date: Thu, 19 Dec 2024 13:57:15 +0000

 > Date: Thu, 19 Dec 2024 05:15:05 -0800
 > From: Jason Thorpe <thorpej@me.com>
 > 
 > In this case, I think it's better to file a bug against the Python
 > test suite.

 We can do that but I want to make sure we have a clear story for what
 timerfd(2) _should_ be doing so we have a compelling argument that it
 is a Python bug.

 Right now there's another part of the Python test suite that crashes
 the kernel, and on closer inspection -- in the course of writing tests
 to cover that case -- I'm not actually sure what the right thing is
 there (PR 58914).

 So I'm not about to go claiming bugs in other projects' code until I
 get this egg off our face!

From: Taylor R Campbell <riastradh@NetBSD.org>
To: Robert Elz <kre@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58916: timerfd(2) claims ready for write
Date: Thu, 19 Dec 2024 14:15:01 +0000

 > Date: Thu, 19 Dec 2024 19:49:40 +0700
 > From: Robert Elz <kre@munnari.OZ.AU>
 > 
 >     Date:        Thu, 19 Dec 2024 02:39:17 +0000
 >     From:        Taylor R Campbell <campbell@mumble.net>
 >     Message-ID:  <20241219023918.2023460BB5@jupiter.mumble.net>
 > 
 >   | select(2) returns nonwritable if asked about the reader side of a
 >   | pipe (and vice versa, nonreadable if asked about the writer side of a
 >   | pipe).  Is that a bug?
 > 
 > Perhaps - I assume that also applies to any fd open O_WRONLY or O_RDONLY
 > with select/poll used for the other direction.   In practice I'm not
 > sure it matters, as real code just doesn't do things like this, code
 > doesn't ask when it can write to a fd which doesn't support write()
 > there's no point.

 It's not limited to O_WRONLY and O_RDONLY, or the read-only and
 write-only sides of a pipe.

 I tested with a disconnected stream-type socket and got the same
 result: read and write don't hang (they fail immediately with
 ENOTCONN, rather than EBADF), but select doesn't report them readable
 or writable.

 I checked some input-only devices too like wskbd and wsmouse, and they
 too never return POLLOUT|POLLWRNORM even though write would fail
 immediately with ENODEV.

 I think if we want select and poll to guarantee claiming
 readable/writable if a read or write would return immediately, we have
 a lot of work to do.

 Now it may be worthwhile to audit device poll routines for various
 types of bugs (like returning E* instead of POLL*), because they are
 often not very carefully written.

 But I don't think the particular behaviour you're asking for is useful
 when the underlying file object guarantees that read or write will
 _never_ work because the operation is nonsensical.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/58916 CVS commit: src/tests/lib/libc/sys
Date: Thu, 19 Dec 2024 14:42:32 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Thu Dec 19 14:42:32 UTC 2024

 Modified Files:
 	src/tests/lib/libc/sys: t_timerfd.c

 Log Message:
 t_timerfd: Fix select/poll tests and add kevent EVFILT_WRITE test.

 PR kern/58916: timerfd(2) claims ready for write


 To generate a diff of this commit:
 cvs rdiff -u -r1.6 -r1.7 src/tests/lib/libc/sys/t_timerfd.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Robert Elz <kre@munnari.OZ.AU>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: Jason Thorpe <thorpej@me.com>, gnats-bugs@NetBSD.org,
        netbsd-bugs@NetBSD.org
Subject: Re: kern/58916: timerfd(2) claims ready for write
Date: Fri, 20 Dec 2024 00:42:10 +0700

     Date:        Thu, 19 Dec 2024 13:57:15 +0000
     From:        Taylor R Campbell <riastradh@NetBSD.org>
     Message-ID:  <20241219135721.0AF69855E1@mail.netbsd.org>

   | We can do that but I want to make sure we have a clear story for what
   | timerfd(2) _should_ be doing so we have a compelling argument that it
   | is a Python bug.

 In this case that should be easy ... since no-one apparently supports
 writing to a timerfd, there is no reason to test, in any way, that
 doing so fails, as no existing (python or other) code can possibly
 have a valid reason for attempting a write to a timerfd, but it is
 possible that some implementation might add a timerfd extension which
 would involve writing to one of them.

 Note that this doesn't mean that a kernel specific test for how
 timerfds work shouldn't test for attempting a write, and checking that
 the (system dependent) error is returned, if only so that no local
 change inadvertently alters how things work (eg: whether it is even
 possible to open a timerfd for writing, and if it is, what happens
 when a write is attempted) - but there is no reason at all for this
 to be connected to anything intended to be portable across systems.

   | Right now there's another part of the Python test suite that crashes
   | the kernel, and on closer inspection -- in the course of writing tests
   | to cover that case -- I'm not actually sure what the right thing is
   | there (PR 58914).

 Yes, that one looks like it ought to be fixed, and your patch looked like
 it was reasonable to me.   But I have never used a timerfd, so have no
 real way to know for sure what should happen.

 kre
