NetBSD Problem Report #46463
From www@NetBSD.org Thu May 17 23:16:11 2012
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
by www.NetBSD.org (Postfix) with ESMTP id 90F7E63B89C
for <gnats-bugs@gnats.NetBSD.org>; Thu, 17 May 2012 23:16:11 +0000 (UTC)
Message-Id: <20120517231610.9804C63B882@www.NetBSD.org>
Date: Thu, 17 May 2012 23:16:10 +0000 (UTC)
From: rhansen@bbn.com
Reply-To: rhansen@bbn.com
To: gnats-bugs@NetBSD.org
Subject: netbsd-6: panic and filesystem corruption running tmux
X-Send-Pr-Version: www-1.0
>Number: 46463
>Category: kern
>Synopsis: netbsd-6: panic and filesystem corruption running tmux
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: martin
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu May 17 23:20:00 +0000 2012
>Closed-Date: Tue Jun 26 19:49:44 +0000 2012
>Last-Modified: Sat Nov 24 21:45:02 +0000 2012
>Originator: Richard Hansen
>Release: 6.0_BETA
>Organization:
>Environment:
NetBSD 6.0_BETA NetBSD 6.0_BETA (GENERIC) i386
>Description:
On i386 GENERIC netbsd-6 (nightly build from around 2012-04-28), I get a panic and significant filesystem corruption (ffs with log, noatime) if I do the following:
1. ssh to the netbsd-6 machine
2. install the tmux-1.4nb1 pkgsrc package
3. run 'tmux new-session`
4. ssh to the same machine from a different terminal
5. run 'tmux list-sessions 0<&-'
For some reason, closing stdin in step #5 above is required to trigger the panic.
Using a lightly modified and older (2012-03-28) snapshot, I get the following backtrace in the close() syscall():
#0 maybe_dump (howto=260) at /usr/src/sys/arch/i386/i386/machdep.c:880
s = 0
#1 0xc071c8ca in cpu_reboot (howto=260, bootstr=0x0)
at /usr/src/sys/arch/i386/i386/machdep.c:899
syncdone = false
s = 0
#2 0xc09a79c4 in vpanic (fmt=0xc0deb79c "kernel %sassertion \"%s\" failed: file \"%s\", line %d ",
ap=0xddcd8b58 "�\267300\024\270300 \267300v\005")
at /usr/src/sys/kern/subr_prf.c:308
cii = 0
ci = 0x0
oci = 0x0
bootopt = 260
scratchstr = "kernel diagnostic assertion \"mutex_owned(&fdp->fd_lock)\" failed: file \"/usr/src/sys/kern/kern_event.c\", line 1398 ", '\000' <repeats 97 times>
#3 0xc0bbb8c3 in kern_assert (fmt=0xc0deb79c "kernel %sassertion \"%s\" failed: file \"%s\", line %d ")
at /usr/src/sys/lib/libkern/kern_assert.c:50
ap = 0xddcd8b58 "�\267300\024\270300 \267300v\005"
#4 0xc0679e3c in kqueue_doclose (kq=0xc6487a58, list=0xc63b194c, fd=0)
at /usr/src/sys/kern/kern_event.c:1398
kn = 0xc0c0
fdp = 0xc64b1cc0
#5 0xc0679f00 in kqueue_close (fp=0xc5ecc200) at /usr/src/sys/kern/kern_event.c:1430
kq = 0xc6487a58
fdp = 0xc63b1940
ff = 0xc63b1940
i = 0
#6 0xc06737fb in closef (fp=0xc5ecc200) at /usr/src/sys/kern/kern_descrip.c:824
lf = {l_start = -4184506505861690440, l_len = -4160666348713672704, l_pid = 0, l_type = 0, l_whence = 0}
error = 0
#7 0xc0673411 in fd_close (fd=13) at /usr/src/sys/kern/kern_descrip.c:709
lf = {l_start = -4629759700944188308, l_len = 28, l_pid = -573731704, l_type = -29420, l_whence = -8755}
fdp = 0xc63b1940
ff = 0xc5eda580
fp = 0xc5ecc200
p = 0xc5dea370
l = 0xc6425800
refcnt = 0
#8 0xc09c6e5d in sys_close (l=0xc6425800, uap=0xddcd8cec, retval=0xddcd8d14)
at /usr/src/sys/kern/sys_descrip.c:486
No locals.
#9 0xc09d687b in sy_call (sy=0xc0f35608, l=0xc6425800, uap=0xddcd8cec, rval=0xddcd8d14)
at /usr/src/sys/sys/syscallvar.h:61
error = 0
#10 0xc09d6c1e in syscall (frame=0xddcd8d48) at /usr/src/sys/arch/x86/x86/syscall.c:179
callp = 0xc0f35608
p = 0xc5dea370
l = 0xc6425800
error = 0
code = 6
rval = {0, 0}
rip_call = -1146338251
args = {13, 11, -1077945480, -1145308313, 64, -1077945516, -1077945524, -1077945524, -1147969380, -1077945492}
#11 0xc01005d6 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>How-To-Repeat:
1. ssh to the netbsd-6 machine
2. install the tmux-1.4nb1 pkgsrc package
3. run 'tmux new-session`
4. ssh to the same machine from a different terminal
5. run 'tmux list-sessions 0<&-'
>Fix:
>Release-Note:
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: christos@NetBSD.org
Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption running tmux
Date: Fri, 18 May 2012 12:06:27 +0200
On Thu, May 17, 2012 at 11:20:00PM +0000, rhansen@bbn.com wrote:
> #4 0xc0679e3c in kqueue_doclose (kq=0xc6487a58, list=0xc63b194c, fd=0)
> at /usr/src/sys/kern/kern_event.c:1398
> kn = 0xc0c0
> fdp = 0xc64b1cc0
Here fdp is kq->kq_fdp. We expect it to be locked.
> #5 0xc0679f00 in kqueue_close (fp=0xc5ecc200) at /usr/src/sys/kern/kern_event.c:1430
> kq = 0xc6487a58
> fdp = 0xc63b1940
> ff = 0xc63b1940
> i = 0
Here fdp is curlwp->l_fdp, and we lock it.
Sounds clearly wrong to me, why is curlwp involved here (and why does a kqueue
have its own filedesc_t)?
Martin
From: Greg Oster <oster@cs.usask.ca>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption running
tmux
Date: Fri, 18 May 2012 08:50:10 -0600
On Thu, 17 May 2012 23:20:00 +0000 (UTC)
rhansen@bbn.com wrote:
> >How-To-Repeat:
> 1. ssh to the netbsd-6 machine
> 2. install the tmux-1.4nb1 pkgsrc package
> 3. run 'tmux new-session`
> 4. ssh to the same machine from a different terminal
> 5. run 'tmux list-sessions 0<&-'
> >Fix:
Step 2 is not needed -- tmux is in /usr/bin on netbsd-6, and replicates
the issue...
Later...
Greg Oster
From: Greg Oster <oster@cs.usask.ca>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption running
tmux
Date: Fri, 18 May 2012 09:06:35 -0600
On Thu, 17 May 2012 23:20:00 +0000 (UTC)
rhansen@bbn.com wrote:
> >Number: 46463
> >Category: kern
> >Synopsis: netbsd-6: panic and filesystem corruption running
> >tmux Confidential: no
> >Severity: critical
> >Priority: high
> >Responsible: kern-bug-people
> >State: open
> >Class: sw-bug
> >Submitter-Id: net
> >Arrival-Date: Thu May 17 23:20:00 +0000 2012
> >Originator: Richard Hansen
> >Release: 6.0_BETA
> >Organization:
> >Environment:
> NetBSD 6.0_BETA NetBSD 6.0_BETA (GENERIC) i386
> >Description:
> On i386 GENERIC netbsd-6 (nightly build from around 2012-04-28), I
> get a panic and significant filesystem corruption (ffs with log,
> noatime) if I do the following:
>
> 1. ssh to the netbsd-6 machine
> 2. install the tmux-1.4nb1 pkgsrc package
> 3. run 'tmux new-session`
> 4. ssh to the same machine from a different terminal
> 5. run 'tmux list-sessions 0<&-'
>
> For some reason, closing stdin in step #5 above is required to
> trigger the panic.
>
> Using a lightly modified and older (2012-03-28) snapshot, I get the
> following backtrace in the close() syscall():
>
> #0 maybe_dump (howto=260)
> at /usr/src/sys/arch/i386/i386/machdep.c:880 s = 0
> #1 0xc071c8ca in cpu_reboot (howto=260, bootstr=0x0)
> at /usr/src/sys/arch/i386/i386/machdep.c:899
> syncdone = false
> s = 0
> #2 0xc09a79c4 in vpanic (fmt=0xc0deb79c "kernel %sassertion \"%s\"
> failed: file \"%s\", line %d ", ap=0xddcd8b58
> "�\267300\024\270300 \267300v\005")
> at /usr/src/sys/kern/subr_prf.c:308 cii = 0
> ci = 0x0
> oci = 0x0
> bootopt = 260
> scratchstr = "kernel diagnostic assertion
> \"mutex_owned(&fdp->fd_lock)\" failed: file
> \"/usr/src/sys/kern/kern_event.c\", line 1398 ", '\000' <repeats 97
> times> #3 0xc0bbb8c3 in kern_assert (fmt=0xc0deb79c "kernel
> times> %sassertion \"%s\" failed: file \"%s\", line %d ")
> times> at /usr/src/sys/lib/libkern/kern_assert.c:50 ap = 0xddcd8b58
> times> "�\267300\024\270300 \267300v\005"
> #4 0xc0679e3c in kqueue_doclose (kq=0xc6487a58, list=0xc63b194c,
> fd=0) at /usr/src/sys/kern/kern_event.c:1398
> kn = 0xc0c0
> fdp = 0xc64b1cc0
> #5 0xc0679f00 in kqueue_close (fp=0xc5ecc200)
> at /usr/src/sys/kern/kern_event.c:1430 kq = 0xc6487a58
> fdp = 0xc63b1940
> ff = 0xc63b1940
> i = 0
> #6 0xc06737fb in closef (fp=0xc5ecc200)
> at /usr/src/sys/kern/kern_descrip.c:824 lf = {l_start =
> -4184506505861690440, l_len = -4160666348713672704, l_pid = 0, l_type
> = 0, l_whence = 0} error = 0 #7 0xc0673411 in fd_close (fd=13)
> at /usr/src/sys/kern/kern_descrip.c:709 lf = {l_start =
> -4629759700944188308, l_len = 28, l_pid = -573731704, l_type =
> -29420, l_whence = -8755} fdp = 0xc63b1940 ff = 0xc5eda580
> fp = 0xc5ecc200
> p = 0xc5dea370
> l = 0xc6425800
> refcnt = 0
> #8 0xc09c6e5d in sys_close (l=0xc6425800, uap=0xddcd8cec,
> retval=0xddcd8d14) at /usr/src/sys/kern/sys_descrip.c:486
> No locals.
> #9 0xc09d687b in sy_call (sy=0xc0f35608, l=0xc6425800,
> uap=0xddcd8cec, rval=0xddcd8d14) at /usr/src/sys/sys/syscallvar.h:61
> error = 0
> #10 0xc09d6c1e in syscall (frame=0xddcd8d48)
> at /usr/src/sys/arch/x86/x86/syscall.c:179 callp = 0xc0f35608
> p = 0xc5dea370
> l = 0xc6425800
> error = 0
> code = 6
> rval = {0, 0}
> rip_call = -1146338251
> args = {13, 11, -1077945480, -1145308313, 64, -1077945516,
> -1077945524, -1077945524, -1147969380, -1077945492} #11 0xc01005d6
> in ?? () No symbol table info available.
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> >How-To-Repeat:
> 1. ssh to the netbsd-6 machine
> 2. install the tmux-1.4nb1 pkgsrc package
> 3. run 'tmux new-session`
> 4. ssh to the same machine from a different terminal
> 5. run 'tmux list-sessions 0<&-'
> >Fix:
The following patch from Martin Husemann gets rid of the panic for me.
Any one else have comments on this patch? (we think it's right, but
hopefully someone who understands this code better will chime in...)
Later...
Greg Oster
Index: kern_event.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_event.c,v
retrieving revision 1.75
diff -c -r1.75 kern_event.c
*** kern_event.c 25 Jan 2012 00:28:35 -0000 1.75
--- kern_event.c 18 May 2012 15:03:42 -0000
***************
*** 1421,1427 ****
int i;
kq = fp->f_data;
! fdp = curlwp->l_fd;
mutex_enter(&fdp->fd_lock);
for (i = 0; i <= fdp->fd_lastkqfile; i++) {
--- 1421,1427 ----
int i;
kq = fp->f_data;
! fdp = kq->kq_fdp;
mutex_enter(&fdp->fd_lock);
for (i = 0; i <= fdp->fd_lastkqfile; i++) {
From: "Jukka Ruohonen" <jruoho@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/46463 CVS commit: src
Date: Fri, 18 May 2012 15:25:26 +0000
Module Name: src
Committed By: jruoho
Date: Fri May 18 15:25:26 UTC 2012
Modified Files:
src/distrib/sets/lists/tests: mi
src/etc/mtree: NetBSD.dist.tests
src/tests/usr.bin: Makefile
Added Files:
src/tests/usr.bin/tmux: Makefile t_tmux.sh
Log Message:
Add a test case for PR kern/46463. From Richard Hansen.
To generate a diff of this commit:
cvs rdiff -u -r1.467 -r1.468 src/distrib/sets/lists/tests/mi
cvs rdiff -u -r1.75 -r1.76 src/etc/mtree/NetBSD.dist.tests
cvs rdiff -u -r1.13 -r1.14 src/tests/usr.bin/Makefile
cvs rdiff -u -r0 -r1.1 src/tests/usr.bin/tmux/Makefile \
src/tests/usr.bin/tmux/t_tmux.sh
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Jukka Ruohonen" <jruoho@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/46463 CVS commit: src/tests/usr.bin/tmux
Date: Sat, 19 May 2012 07:30:38 +0000
Module Name: src
Committed By: jruoho
Date: Sat May 19 07:30:38 UTC 2012
Modified Files:
src/tests/usr.bin/tmux: t_tmux.sh
Log Message:
Make the test pass. It appears that this however does not reproduce the
PR kern/46463, even though the command is exactly the same.
To generate a diff of this commit:
cvs rdiff -u -r1.1 -r1.2 src/tests/usr.bin/tmux/t_tmux.sh
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Richard Hansen <rhansen@bbn.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption running
tmux
Date: Tue, 22 May 2012 17:13:44 -0400
On 2012-05-18 11:10, Greg Oster wrote:
> The following patch from Martin Husemann gets rid of the panic for me.
>
> Any one else have comments on this patch? (we think it's right, but
> hopefully someone who understands this code better will chime in...)
There are other functions in src/sys/kern/kern_event.c that have the line:
fdp = curlwp->l_fd;
Are those also wrong?
Thanks,
Richard
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption running tmux
Date: Wed, 23 May 2012 00:16:08 +0200
On Tue, May 22, 2012 at 09:15:04PM +0000, Richard Hansen wrote:
> There are other functions in src/sys/kern/kern_event.c that have the line:
>
> fdp = curlwp->l_fd;
>
> Are those also wrong?
I think the one in knote_fdclose() is correct, the other are probably wrong
too.
It only matters if the same kernel object is watched via kqueue from
different processes, which seems to be an uncommon operation. We need to
add test cases for it...
Martin
From: Richard Hansen <rhansen@bbn.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption running
tmux
Date: Tue, 22 May 2012 17:33:50 -0400
I think I've found a way to test the same underlying bug without causing
a panic:
sh -c 'tmux new-session -d && tmux list-sessions && read'
The above should sit and wait for a line of input on stdin, but instead
/bin/sh exits with the following error:
read: arg count
(Not a very helpful error message.) If I run the above with bash 4.2.0
from pkgsrc, I get:
-bash: read: read error: 0: Resource temporarily unavailable
This error is what led me to do 'tmux list-sessions 0<&-' in the first
place. I had dismissed the issue as unrelated until I realized that
Linux doesn't have this problem (but NetBSD 5 does).
I can't reproduce the above error when stdin is something other than a tty.
This could be a completely unrelated issue. I haven't yet tried the
proposed patch to see if it makes this error go away.
-Richard
From: Richard Hansen <rhansen@bbn.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption running
tmux
Date: Tue, 22 May 2012 21:12:07 -0400
On 2012-05-22 17:33, Richard Hansen wrote:
> I think I've found a way to test the same underlying bug without causing
> a panic:
>
> sh -c 'tmux new-session -d && tmux list-sessions && read'
>
> The above should sit and wait for a line of input on stdin, but instead
> /bin/sh exits with the following error:
>
> read: arg count
>
> (Not a very helpful error message.) If I run the above with bash 4.2.0
> from pkgsrc, I get:
>
> -bash: read: read error: 0: Resource temporarily unavailable
With Martin's patch the panic is gone, but I can still reproduce the
above error. Perhaps the above error is unrelated to the panic.
-Richard
Responsible-Changed-From-To: kern-bug-people->martin
Responsible-Changed-By: martin@NetBSD.org
Responsible-Changed-When: Wed, 23 May 2012 06:51:00 +0000
Responsible-Changed-Why:
Not that simple, needs closer investigation.
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption running tmux
Date: Thu, 31 May 2012 12:08:22 +0200
So it turns out (subject to interpretation in detail and to be discussed
on tech-kern) that we see a side effect of two bugs here.
First, at the application level, in tmux, there is something wrong.
The client attaching to a already established session does:
(1) call kqueue() to be able to receive kevents
(2) connect via a local socket to the already running tmux instance
(3) pass stdin/stdout/stderr vis sendmsg with SCM_RIGHTS to the session mux
Now in the case that triggered this PR, tmux was invoked with stdin closed.
Not suprising, kqueue() in (1) returns 0. Later in (3), this file descriptor
is passed to the mux (which expects to receive stdin).
So, adding something like this:
while ((fd = open("/dev/null", O_RDWR)) < 3)
;
close(fd);
early in tmux would make sure this does not happen, replacing the missing
descriptors with /dev/null.
Now for the kernel part. A kqueue was intended to be a local-proc-only thing.
There is no sense passing it from one process to another. That is why they
are not inherited during fork() for example. However, and I think this is a
bug, they can be passed via SCM_RIGHTS messages.
I can reproduce the crash with this test program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <err.h>
#include <sys/event.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <sys/wait.h>
static int
child_proc(int fd)
{
int kq, storage = -1;
struct cmsghdr *msg;
struct iovec iov;
struct msghdr m;
msg = malloc(CMSG_SPACE(sizeof(int)));
iov.iov_base = &storage;
iov.iov_len = sizeof(int);
m.msg_iov = &iov;
m.msg_iovlen = 1;
if (recvmsg(fd, &m, 0) == -1)
err(1, "child: could not recvmsg");
kq = *(int *)CMSG_DATA(msg);
printf("child (pid %d): received kq fd %d\n", getpid(), kq);
return 0;
}
int main(void)
{
pid_t child;
int s[2], storage, status, kq;
struct cmsghdr *msg;
struct iovec iov;
struct msghdr m;
struct kevent ev;
kq = kqueue();
if (kq < 0)
err(1, "could not create kqueue");
if (socketpair(AF_LOCAL, SOCK_STREAM, 0, s) == -1)
err(1, "could not create a socket pair");
child = fork();
if (child == 0) {
close(s[0]);
return child_proc(s[1]);
}
close(s[1]);
msg = malloc(CMSG_SPACE(sizeof(int)));
iov.iov_base = &storage;
iov.iov_len = sizeof(int);
msg->cmsg_level = SOL_SOCKET;
msg->cmsg_type = SCM_RIGHTS;
msg->cmsg_len = CMSG_LEN(sizeof(int));
m.msg_iov = &iov;
m.msg_iovlen = 1;
m.msg_name = NULL;
m.msg_namelen = 0;
m.msg_control = msg;
m.msg_controllen = CMSG_SPACE(sizeof(int));
*(int *)CMSG_DATA(msg) = kq;
EV_SET(&ev, 1, EVFILT_TIMER, EV_ADD|EV_ENABLE, 0, 1, 0);
if (kevent(kq, &ev, 1, NULL, 0, NULL) == -1)
err(1, "could not add timer event");
printf("parent (pid %d): sending kq fd %d\n", getpid(), kq);
if (sendmsg(s[0], &m, 0) == -1)
err(1, "could not sendmsg");
close(kq);
waitpid(child, &status, 0);
return 0;
}
IMHO the correct fix would be to filter kqueue descriptors out of SCM_RIGHTS
at the sending side and document this (in unp_internalize break out and
return EPERM if there is a file_t with f_type == DTYPE_KQUEUE).
Martin
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, martin@NetBSD.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, rhansen@bbn.com
Cc:
Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption running tmux
Date: Thu, 31 May 2012 09:18:53 -0400
On May 31, 10:10am, martin@duskware.de (Martin Husemann) wrote:
-- Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption runnin
| The following reply was made to PR kern/46463; it has been noted by GNATS.
|
| From: Martin Husemann <martin@duskware.de>
| To: gnats-bugs@NetBSD.org
| Cc:
| Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption running tmux
| Date: Thu, 31 May 2012 12:08:22 +0200
|
| So it turns out (subject to interpretation in detail and to be discussed
| on tech-kern) that we see a side effect of two bugs here.
|
| First, at the application level, in tmux, there is something wrong.
| The client attaching to a already established session does:
| (1) call kqueue() to be able to receive kevents
| (2) connect via a local socket to the already running tmux instance
| (3) pass stdin/stdout/stderr vis sendmsg with SCM_RIGHTS to the session mux
|
| Now in the case that triggered this PR, tmux was invoked with stdin closed.
| Not suprising, kqueue() in (1) returns 0. Later in (3), this file descriptor
| is passed to the mux (which expects to receive stdin).
|
| So, adding something like this:
|
| while ((fd = open("/dev/null", O_RDWR)) < 3)
| ;
| close(fd);
|
| early in tmux would make sure this does not happen, replacing the missing
| descriptors with /dev/null.
|
| Now for the kernel part. A kqueue was intended to be a local-proc-only thing.
| There is no sense passing it from one process to another. That is why they
| are not inherited during fork() for example. However, and I think this is a
| bug, they can be passed via SCM_RIGHTS messages.
|
| I can reproduce the crash with this test program:
|
| #include <stdio.h>
| #include <stdlib.h>
| #include <string.h>
| #include <unistd.h>
| #include <err.h>
| #include <sys/event.h>
| #include <sys/time.h>
| #include <sys/socket.h>
| #include <sys/wait.h>
|
| static int
| child_proc(int fd)
| {
| int kq, storage = -1;
| struct cmsghdr *msg;
| struct iovec iov;
| struct msghdr m;
|
| msg = malloc(CMSG_SPACE(sizeof(int)));
| iov.iov_base = &storage;
| iov.iov_len = sizeof(int);
| m.msg_iov = &iov;
| m.msg_iovlen = 1;
|
| if (recvmsg(fd, &m, 0) == -1)
| err(1, "child: could not recvmsg");
|
| kq = *(int *)CMSG_DATA(msg);
| printf("child (pid %d): received kq fd %d\n", getpid(), kq);
|
| return 0;
| }
|
| int main(void)
| {
| pid_t child;
| int s[2], storage, status, kq;
| struct cmsghdr *msg;
| struct iovec iov;
| struct msghdr m;
| struct kevent ev;
|
| kq = kqueue();
| if (kq < 0)
| err(1, "could not create kqueue");
|
| if (socketpair(AF_LOCAL, SOCK_STREAM, 0, s) == -1)
| err(1, "could not create a socket pair");
|
|
| child = fork();
| if (child == 0) {
| close(s[0]);
| return child_proc(s[1]);
| }
| close(s[1]);
|
| msg = malloc(CMSG_SPACE(sizeof(int)));
| iov.iov_base = &storage;
| iov.iov_len = sizeof(int);
|
| msg->cmsg_level = SOL_SOCKET;
| msg->cmsg_type = SCM_RIGHTS;
| msg->cmsg_len = CMSG_LEN(sizeof(int));
|
| m.msg_iov = &iov;
| m.msg_iovlen = 1;
| m.msg_name = NULL;
| m.msg_namelen = 0;
| m.msg_control = msg;
| m.msg_controllen = CMSG_SPACE(sizeof(int));
| *(int *)CMSG_DATA(msg) = kq;
|
| EV_SET(&ev, 1, EVFILT_TIMER, EV_ADD|EV_ENABLE, 0, 1, 0);
| if (kevent(kq, &ev, 1, NULL, 0, NULL) == -1)
| err(1, "could not add timer event");
|
| printf("parent (pid %d): sending kq fd %d\n", getpid(), kq);
| if (sendmsg(s[0], &m, 0) == -1)
| err(1, "could not sendmsg");
|
| close(kq);
|
| waitpid(child, &status, 0);
|
| return 0;
| }
|
|
| IMHO the correct fix would be to filter kqueue descriptors out of SCM_RIGHTS
| at the sending side and document this (in unp_internalize break out and
| return EPERM if there is a file_t with f_type == DTYPE_KQUEUE).
What's the harm in letting them through and making them work properly in
the new process?
christos
From: Martin Husemann <martin@duskware.de>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption running tmux
Date: Thu, 31 May 2012 15:23:18 +0200
On Thu, May 31, 2012 at 09:18:53AM -0400, Christos Zoulas wrote:
> What's the harm in letting them through and making them work properly in
> the new process?
None, probably. Needs a bit of carefull review. However, I don't see
any use case where this would make sense from a userland perspective.
Martin
From: christos@zoulas.com (Christos Zoulas)
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption running tmux
Date: Thu, 31 May 2012 09:30:44 -0400
On May 31, 3:23pm, martin@duskware.de (Martin Husemann) wrote:
-- Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption runnin
| On Thu, May 31, 2012 at 09:18:53AM -0400, Christos Zoulas wrote:
| > What's the harm in letting them through and making them work properly in
| > the new process?
|
| None, probably. Needs a bit of carefull review. However, I don't see
| any use case where this would make sense from a userland perspective.
It would make sense if you wanted an unprivileged process monitor a
file descriptor monitor something what does not have access to, without
making the client setuid. From the orthogonality POV, I don't see why
file descriptor handling should be special-cased. It should just work
as expected, i.e. a file descriptor passed from one process to another
should function properly regardless of type.
christos
From: Martin Husemann <martin@duskware.de>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption running tmux
Date: Thu, 31 May 2012 15:54:34 +0200
On Thu, May 31, 2012 at 09:30:44AM -0400, Christos Zoulas wrote:
> It would make sense if you wanted an unprivileged process monitor a
> file descriptor monitor something what does not have access to, without
> making the client setuid.
OK, so I'll fix it that way.
Martin
From: christos@zoulas.com (Christos Zoulas)
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption running tmux
Date: Thu, 31 May 2012 10:49:31 -0400
On May 31, 3:54pm, martin@duskware.de (Martin Husemann) wrote:
-- Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption runnin
| On Thu, May 31, 2012 at 09:30:44AM -0400, Christos Zoulas wrote:
| > It would make sense if you wanted an unprivileged process monitor a
| > file descriptor monitor something what does not have access to, without
| > making the client setuid.
|
| OK, so I'll fix it that way.
Yes, and we can remove the close-on-fork part too I guess.
christos
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption running tmux
Date: Thu, 31 May 2012 17:09:31 +0200
On Thu, May 31, 2012 at 02:50:06PM +0000, Christos Zoulas wrote:
> Yes, and we can remove the close-on-fork part too I guess.
I'm not sure we should do that by default, but we could add a flag for
kqueue1() to make it inheritable. Otherwise it might be a suprise with
unclear consequences on existing apps (like now suddenly running into
RLIMIT_NOFILE).
However, is this realy worth the effort? All other OSes seem to not inherit
this descriptors as well.
Martin
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, martin@NetBSD.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, rhansen@bbn.com
Cc:
Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption running tmux
Date: Thu, 31 May 2012 11:28:18 -0400
On May 31, 3:10pm, martin@duskware.de (Martin Husemann) wrote:
-- Subject: Re: kern/46463: netbsd-6: panic and filesystem corruption runnin
| On Thu, May 31, 2012 at 02:50:06PM +0000, Christos Zoulas wrote:
| > Yes, and we can remove the close-on-fork part too I guess.
|
| I'm not sure we should do that by default, but we could add a flag for
| kqueue1() to make it inheritable. Otherwise it might be a suprise with
| unclear consequences on existing apps (like now suddenly running into
| RLIMIT_NOFILE).
I don't think that this will happen in reality; there are very few programs
using kqueue, and I think that most people don't treat the kqueue file
descriptors specially. I.e. if you ask jill-random if kqueue descriptors are
inherited on fork, she will probably say yes.
| However, is this realy worth the effort? All other OSes seem to not inherit
| this descriptors as well.
I don't know. I just dislike the special treatment. I think that the power
of unix comes from the fact that file-descriptors have all treated equally
for common operations. Yes, I know we have added all kinds of warts through
the years, but...
christos
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/46463 CVS commit: src/external/bsd/tmux/dist
Date: Thu, 31 May 2012 19:14:57 +0000
Module Name: src
Committed By: martin
Date: Thu May 31 19:14:56 UTC 2012
Modified Files:
src/external/bsd/tmux/dist: tmux.c
Log Message:
Make sure to have file descriptors 0-2 open before doing further setup.
Otherwise strange invocations could trick us into passing the event
descriptor as, for example, stdin to the server.
See PR 46463 for details.
To generate a diff of this commit:
cvs rdiff -u -r1.1.1.2 -r1.2 src/external/bsd/tmux/dist/tmux.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/46463 CVS commit: src/tests/lib/libc/sys
Date: Thu, 31 May 2012 20:31:07 +0000
Module Name: src
Committed By: martin
Date: Thu May 31 20:31:07 UTC 2012
Modified Files:
src/tests/lib/libc/sys: t_kevent.c
Log Message:
Add a (skipped for now) test case for PR 46463
To generate a diff of this commit:
cvs rdiff -u -r1.2 -r1.3 src/tests/lib/libc/sys/t_kevent.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/46463 CVS commit: src/sys/kern
Date: Sat, 2 Jun 2012 16:16:16 +0000
Module Name: src
Committed By: martin
Date: Sat Jun 2 16:16:16 UTC 2012
Modified Files:
src/sys/kern: uipc_usrreq.c
Log Message:
Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHT anxiliary socket messages.
To generate a diff of this commit:
cvs rdiff -u -r1.136 -r1.137 src/sys/kern/uipc_usrreq.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->pending-pullups
State-Changed-By: martin@NetBSD.org
State-Changed-When: Sat, 02 Jun 2012 16:50:56 +0000
State-Changed-Why:
[pullup-6 #304], [pullup-5 #1766]
From: "Julian Coleman" <jdc@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/46463 CVS commit: [netbsd-5] src/sys/kern
Date: Sun, 3 Jun 2012 08:47:28 +0000
Module Name: src
Committed By: jdc
Date: Sun Jun 3 08:47:28 UTC 2012
Modified Files:
src/sys/kern [netbsd-5]: uipc_usrreq.c
Log Message:
Pull up revision 1.137 (requested by martin in ticket #1766).
Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHT anxiliary socket messages.
To generate a diff of this commit:
cvs rdiff -u -r1.119.4.4 -r1.119.4.5 src/sys/kern/uipc_usrreq.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Julian Coleman" <jdc@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/46463 CVS commit: [netbsd-5-0] src/sys/kern
Date: Sun, 3 Jun 2012 08:47:30 +0000
Module Name: src
Committed By: jdc
Date: Sun Jun 3 08:47:30 UTC 2012
Modified Files:
src/sys/kern [netbsd-5-0]: uipc_usrreq.c
Log Message:
Pull up revision 1.137 (requested by martin in ticket #1766).
Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHT anxiliary socket messages.
To generate a diff of this commit:
cvs rdiff -u -r1.119.4.2 -r1.119.4.2.2.1 src/sys/kern/uipc_usrreq.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Julian Coleman" <jdc@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/46463 CVS commit: [netbsd-5-1] src/sys/kern
Date: Sun, 3 Jun 2012 08:47:36 +0000
Module Name: src
Committed By: jdc
Date: Sun Jun 3 08:47:36 UTC 2012
Modified Files:
src/sys/kern [netbsd-5-1]: uipc_usrreq.c
Log Message:
Pull up revision 1.137 (requested by martin in ticket #1766).
Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHT anxiliary socket messages.
To generate a diff of this commit:
cvs rdiff -u -r1.119.4.3 -r1.119.4.3.2.1 src/sys/kern/uipc_usrreq.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Jeff Rizzo" <riz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/46463 CVS commit: [netbsd-6] src/sys/kern
Date: Mon, 11 Jun 2012 23:20:38 +0000
Module Name: src
Committed By: riz
Date: Mon Jun 11 23:20:38 UTC 2012
Modified Files:
src/sys/kern [netbsd-6]: uipc_usrreq.c
Log Message:
Pull up following revision(s) (requested by martin in ticket #304):
sys/kern/uipc_usrreq.c: revision 1.137
Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHT anxiliary socket messages.
To generate a diff of this commit:
cvs rdiff -u -r1.136 -r1.136.8.1 src/sys/kern/uipc_usrreq.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Tue, 26 Jun 2012 19:49:44 +0000
State-Changed-Why:
Crash is fixed and pulled up.
From: "Julian Coleman" <jdc@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/46463 CVS commit: [netbsd-6] src
Date: Sat, 24 Nov 2012 21:40:03 +0000
Module Name: src
Committed By: jdc
Date: Sat Nov 24 21:40:03 UTC 2012
Modified Files:
src/lib/libc/sys [netbsd-6]: kqueue.2
src/sys/kern [netbsd-6]: kern_descrip.c kern_event.c
src/tests/lib/libc/sys [netbsd-6]: t_kevent.c
Log Message:
Pull up revisions:
src/sys/kern/kern_event.c revision 1.79
src/sys/kern/kern_descrip.c revision 1.219
src/lib/libc/sys/kqueue.2 revision 1.33
src/tests/lib/libc/sys/t_kevent.c revision 1.2-1.5
(requested by christos in ticket #716).
- initialize kn_id
- in close, invalidate f_data and f_type early to prevent accidental re-use
- add a DIAGNOSTIC for when we use unsupported fd's and a KASSERT for f_event
being NULL.
Return EOPNOTSUPP for fnullop_kqfilter to prevent registration of unsupported
fds. XXX: We should really fix the fd's to be supported in the future.
Unsupported fd's have a NULL f_event, so registering crashes the kernel with
a NULL function dereference of f_event.
mention that kevent returns now EOPNOTSUPP.
Move the references to PRs from code comments to the test description. Once
ATF has the ability to output the metadata in the HTML reports, it should be
easy to traverse between releng and gnats -reports via links.
Add a (skipped for now) test case for PR 46463
adapt to new reality
Add a test for adding an event to an unsupported fd.
To generate a diff of this commit:
cvs rdiff -u -r1.32 -r1.32.2.1 src/lib/libc/sys/kqueue.2
cvs rdiff -u -r1.218 -r1.218.2.1 src/sys/kern/kern_descrip.c
cvs rdiff -u -r1.75 -r1.75.2.1 src/sys/kern/kern_event.c
cvs rdiff -u -r1.1 -r1.1.2.1 src/tests/lib/libc/sys/t_kevent.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Julian Coleman" <jdc@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/46463 CVS commit: [netbsd-6-0] src
Date: Sat, 24 Nov 2012 21:40:08 +0000
Module Name: src
Committed By: jdc
Date: Sat Nov 24 21:40:08 UTC 2012
Modified Files:
src/lib/libc/sys [netbsd-6-0]: kqueue.2
src/sys/kern [netbsd-6-0]: kern_descrip.c kern_event.c
src/tests/lib/libc/sys [netbsd-6-0]: t_kevent.c
Log Message:
Pull up revisions:
src/sys/kern/kern_event.c revision 1.79
src/sys/kern/kern_descrip.c revision 1.219
src/lib/libc/sys/kqueue.2 revision 1.33
src/tests/lib/libc/sys/t_kevent.c revision 1.2-1.5
(requested by christos in ticket #716).
- initialize kn_id
- in close, invalidate f_data and f_type early to prevent accidental re-use
- add a DIAGNOSTIC for when we use unsupported fd's and a KASSERT for f_event
being NULL.
Return EOPNOTSUPP for fnullop_kqfilter to prevent registration of unsupported
fds. XXX: We should really fix the fd's to be supported in the future.
Unsupported fd's have a NULL f_event, so registering crashes the kernel with
a NULL function dereference of f_event.
mention that kevent returns now EOPNOTSUPP.
Move the references to PRs from code comments to the test description. Once
ATF has the ability to output the metadata in the HTML reports, it should be
easy to traverse between releng and gnats -reports via links.
Add a (skipped for now) test case for PR 46463
adapt to new reality
Add a test for adding an event to an unsupported fd.
To generate a diff of this commit:
cvs rdiff -u -r1.32 -r1.32.8.1 src/lib/libc/sys/kqueue.2
cvs rdiff -u -r1.218 -r1.218.8.1 src/sys/kern/kern_descrip.c
cvs rdiff -u -r1.75 -r1.75.6.1 src/sys/kern/kern_event.c
cvs rdiff -u -r1.1 -r1.1.6.1 src/tests/lib/libc/sys/t_kevent.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.