NetBSD Problem Report #44986
From tron@zhadum.org.uk Sat May 21 11:35:11 2011
Return-Path: <tron@zhadum.org.uk>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id 29E5A63C64D
for <gnats-bugs@gnats.NetBSD.org>; Sat, 21 May 2011 11:35:11 +0000 (UTC)
Message-Id: <20110521113506.F1B73F9010@lyssa.zhadum.org.uk>
Date: Sat, 21 May 2011 12:35:06 +0100 (BST)
From: tron@zhadum.org.uk
Reply-To: tron@zhadum.org.uk
To: gnats-bugs@gnats.NetBSD.org
Subject: pollts(2) system call changes signal mask
X-Send-Pr-Version: 3.95
>Number: 44986
>Category: kern
>Synopsis: pollts(2) system call changes signal mask
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: christos
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat May 21 11:40:01 +0000 2011
>Closed-Date: Sat May 28 16:17:25 +0000 2011
>Last-Modified: Thu Jul 21 03:37:28 +0000 2011
>Originator: tron@zhadum.org.uk
>Release: NetBSD 5.99.51 2011-05-21 sources
>Organization:
Matthias Scheler http://zhadum.org.uk/
>Environment:
System: NetBSD lyssa.zhadum.org.uk 5.99.51 NetBSD 5.99.51 (LYSSA) #0: Sat May 21 11:53:54 BST 2011 tron@lyssa.zhadum.org.uk:/src/sys/compile/LYSSA i386
Architecture: i386
Machine: i386
>Description:
"screen" has been getting stuck during detach under NetBSD/i386 current for a few days.
It works fine with a 2011-05-13 build and doesn't work with a
2011-05-19 build or newer. I've attached "gdb" to the hung "screen"
process and it produced the following stack trace:
(gdb) where
#0 0xbbab2657 in _sys___sigsuspend14 () from /usr/lib/libc.so.12
#1 0xbbadb252 in pause () from /usr/lib/libc.so.12
#2 0x0806bb13 in ?? ()
#3 0x0000000f in ?? ()
#4 0x0806adc0 in ?? ()
#5 0x00000000 in ?? ()
I suspect that the problem is related to this change:
http://mail-index.netbsd.org/source-changes/2011/05/18/msg022025.html
I could however not find a problem when I reviewed the change to
"src/sys/kern/sys_sig.c".
>How-To-Repeat:
screen
<CTRL> "a" + "d"
>Fix:
Not known.
>Release-Note:
>Audit-Trail:
From: Matthias Scheler <tron@zhadum.org.uk>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 21 May 2011 13:31:23 +0100
On Sat, May 21, 2011 at 11:40:01AM +0000, gnats-admin@netbsd.org wrote:
> Thank you very much for your problem report.
> It has the internal identification `kern/44986'.
> The individual assigned to look at your
> report is: kern-bug-people.
>
> >Category: kern
> >Responsible: kern-bug-people
> >Synopsis: "screens" gets stuck during detach
> >Arrival-Date: Sat May 21 11:40:01 +0000 2011
Here is a better stack trace for the hanging "screen" process:
(gdb) where
#0 0xbbab2657 in _sys___sigsuspend14 () from /usr/lib/libc.so.12
#1 0xbbadb252 in pause () from /usr/lib/libc.so.12
#2 0x0806bb13 in Attacher ()
#3 0x0804e7e6 in main ()
The "Attacher" function looks like this (after "cpp"):
void
Attacher()
{
xsignal(1, AttacherFinit);
xsignal(1, AttacherFinit);
xsignal(30, AttacherFinitBye);
xsignal(31, DoLock);
xsignal(2, AttacherSigInt);
xsignal(18, SigStop);
xsignal(28, AttacherWinch);
do {} while (0);
dflag = 0;
xflag = 1;
for (;;)
{
xsignal(14, AttacherSigAlarm);
alarm(15);
pause(); <--- It seems to get stuck here.
alarm(0);
if (kill(MasterPid, 0) < 0 && (*__errno()) != 1)
{
do {} while (0);
AttacherPanic++;
}
[...]
}
My system also failed to shut down cleanly because "cron" wouldn't die
(until I killed it with "kill -9"). There seems to be a general problem
with signal handling.
Kind regards
--
Matthias Scheler http://zhadum.org.uk/
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 21 May 2011 09:36:26 -0400
On May 21, 11:40am, tron@zhadum.org.uk (tron@zhadum.org.uk) wrote:
-- Subject: kern/44986: "screens" gets stuck during detach
| I suspect that the problem is related to this change:
|
| http://mail-index.netbsd.org/source-changes/2011/05/18/msg022025.html
|
| I could however not find a problem when I reviewed the change to
| "src/sys/kern/sys_sig.c".
Does screen use pselect or pollts?
christos
From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 21 May 2011 15:26:30 +0100
On Sat, May 21, 2011 at 01:40:03PM +0000, Christos Zoulas wrote:
> On May 21, 11:40am, tron@zhadum.org.uk (tron@zhadum.org.uk) wrote:
> -- Subject: kern/44986: "screens" gets stuck during detach
>
> | I suspect that the problem is related to this change:
> |
> | http://mail-index.netbsd.org/source-changes/2011/05/18/msg022025.html
> |
> | I could however not find a problem when I reviewed the change to
> | "src/sys/kern/sys_sig.c".
>
> Does screen use pselect or pollts?
No, it doesn't. It uses pause(3).
The problem can also be reproduced with cron(8):
tron@lyssa:/#/etc/rc.d/cron restart
Stopping cron.
Waiting for PIDS: 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268^Z
zsh: suspended /etc/rc.d/cron restart
tron@lyssa:/#pkill -9 cron
tron@lyssa:/#fg
[1] + continued /etc/rc.d/cron restart
.
Starting cron.
Kind regards
--
Matthias Scheler http://zhadum.org.uk/
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tron@zhadum.org.uk
Cc:
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 21 May 2011 10:36:06 -0400
On May 21, 2:30pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
-- Subject: Re: kern/44986: "screens" gets stuck during detach
| No, it doesn't. It uses pause(3).
|
| The problem can also be reproduced with cron(8):
|
| tron@lyssa:/#/etc/rc.d/cron restart
| Stopping cron.
| Waiting for PIDS: 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268^Z
| zsh: suspended /etc/rc.d/cron restart
| tron@lyssa:/#pkill -9 cron
| tron@lyssa:/#fg
| [1] + continued /etc/rc.d/cron restart
| .
But that code did not change? I just moved it into a separate function.
christos
From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 21 May 2011 16:20:59 +0100
On Sat, May 21, 2011 at 10:36:06AM -0400, Christos Zoulas wrote:
> On May 21, 2:30pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
> -- Subject: Re: kern/44986: "screens" gets stuck during detach
>
> | No, it doesn't. It uses pause(3).
> |
> | The problem can also be reproduced with cron(8):
> |
> | tron@lyssa:/#/etc/rc.d/cron restart
> | Stopping cron.
> | Waiting for PIDS: 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268^Z
> | zsh: suspended /etc/rc.d/cron restart
> | tron@lyssa:/#pkill -9 cron
> | tron@lyssa:/#fg
> | [1] + continued /etc/rc.d/cron restart
> | .
>
> But that code did not change? I just moved it into a separate function.
Yes, I saw that. And maybe it wasn't your change. But something broke
signal handling.
Kind regards
--
Matthias Scheler http://zhadum.org.uk/
From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 21 May 2011 11:54:39 -0400
On May 21, 4:20pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
-- Subject: Re: kern/44986: "screens" gets stuck during detach
| Yes, I saw that. And maybe it wasn't your change. But something broke
| signal handling.
We'll need to do some bisection to find out I guess :-) I tested my changes.
christos
From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 21 May 2011 22:30:56 +0100
On Sat, May 21, 2011 at 11:54:39AM -0400, Christos Zoulas wrote:
> On May 21, 4:20pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
> -- Subject: Re: kern/44986: "screens" gets stuck during detach
>
> | Yes, I saw that. And maybe it wasn't your change. But something broke
> | signal handling.
>
> We'll need to do some bisection to find out I guess :-) I tested my changes.
I've reverted these two commits ...
http://mail-index.netbsd.org/source-changes/2011/05/18/msg022025.html
http://mail-index.netbsd.org/source-changes/2011/05/18/msg022032.html
... in my own source tree and it fixed the problem.
Kind regards
--
Matthias Scheler http://zhadum.org.uk/
From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 21 May 2011 17:48:47 -0400
On May 21, 10:30pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
-- Subject: Re: kern/44986: "screens" gets stuck during detach
| I've reverted these two commits ...
|
| http://mail-index.netbsd.org/source-changes/2011/05/18/msg022025.html
| http://mail-index.netbsd.org/source-changes/2011/05/18/msg022032.html
|
| ... in my own source tree and it fixed the problem.
Interesting because my tree seems to work. I will check some more.
christos
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tron@zhadum.org.uk
Cc:
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sun, 22 May 2011 18:03:17 -0400
On May 21, 9:35pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
-- Subject: Re: kern/44986: "screens" gets stuck during detach
| I've reverted these two commits ...
|
| http://mail-index.netbsd.org/source-changes/2011/05/18/msg022025.html
| http://mail-index.netbsd.org/source-changes/2011/05/18/msg022032.html
|
| ... in my own source tree and it fixed the problem.
|
I cannot reproduce either the cron problem or the screen problem.
christos
From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sun, 22 May 2011 23:34:56 +0100
On Sun, May 22, 2011 at 06:03:17PM -0400, Christos Zoulas wrote:
> On May 21, 9:35pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
> -- Subject: Re: kern/44986: "screens" gets stuck during detach
>
> | I've reverted these two commits ...
> |
> | http://mail-index.netbsd.org/source-changes/2011/05/18/msg022025.html
> | http://mail-index.netbsd.org/source-changes/2011/05/18/msg022032.html
> |
> | ... in my own source tree and it fixed the problem.
> |
>
> I cannot reproduce either the cron problem or the screen problem.
Unfortunately I can reproduce both reliably. Which port are you using?
Kind regards
--
Matthias Scheler http://zhadum.org.uk/
From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sun, 22 May 2011 18:42:07 -0400
On May 22, 11:34pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
-- Subject: Re: kern/44986: "screens" gets stuck during detach
| Unfortunately I can reproduce both reliably. Which port are you using?
i386 and amd64. I am building on arm now.
christos
From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Mon, 23 May 2011 22:27:11 +0100
On Sun, May 22, 2011 at 06:42:07PM -0400, Christos Zoulas wrote:
> On May 22, 11:34pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
> -- Subject: Re: kern/44986: "screens" gets stuck during detach
>
> | Unfortunately I can reproduce both reliably. Which port are you using?
>
> i386 and amd64. I am building on arm now.
I can still reproduce this with today's NetBSD/i386 current.
I've also tried a "GENERIC" kernel but it doesn't fix the problem.
Do any of your test machines have four or more cores?
Kind regards
--
Matthias Scheler http://zhadum.org.uk/
From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Mon, 23 May 2011 21:12:52 -0400
On May 23, 10:27pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
-- Subject: Re: kern/44986: "screens" gets stuck during detach
| On Sun, May 22, 2011 at 06:42:07PM -0400, Christos Zoulas wrote:
| > On May 22, 11:34pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
| > -- Subject: Re: kern/44986: "screens" gets stuck during detach
| >
| > | Unfortunately I can reproduce both reliably. Which port are you using?
| >
| > i386 and amd64. I am building on arm now.
|
| I can still reproduce this with today's NetBSD/i386 current.
| I've also tried a "GENERIC" kernel but it doesn't fix the problem.
|
| Do any of your test machines have four or more cores?
None of the machines I have run this on yet. I will build a new one and try tomorrow.
christos
From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Tue, 24 May 2011 07:35:36 +0100
On Mon, May 23, 2011 at 09:12:52PM -0400, Christos Zoulas wrote:
> | I can still reproduce this with today's NetBSD/i386 current.
> | I've also tried a "GENERIC" kernel but it doesn't fix the problem.
> |
> | Do any of your test machines have four or more cores?
>
> None of the machines I have run this on yet. I will build a new one and try tomorrow.
Any ideas how to debug this?
Kind regards
--
Matthias Scheler http://zhadum.org.uk/
From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Tue, 24 May 2011 08:42:11 -0400
On May 24, 7:35am, tron@zhadum.org.uk (Matthias Scheler) wrote:
-- Subject: Re: kern/44986: "screens" gets stuck during detach
| On Mon, May 23, 2011 at 09:12:52PM -0400, Christos Zoulas wrote:
| > | I can still reproduce this with today's NetBSD/i386 current.
| > | I've also tried a "GENERIC" kernel but it doesn't fix the problem.
| > |
| > | Do any of your test machines have four or more cores?
| >
| > None of the machines I have run this on yet. I will build a new one and try tomorrow.
|
| Any ideas how to debug this?
Backout the select changes and leave the sigsuspend changes. If that fixes
the problem, then the problem is with select. I don't see how putting the
code in a separate function in sys_sig.c makes a difference.
christos
From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Tue, 24 May 2011 14:17:09 +0100
On Tue, May 24, 2011 at 08:42:11AM -0400, Christos Zoulas wrote:
> Backout the select changes and leave the sigsuspend changes. If that fixes
> the problem, then the problem is with select. I don't see how putting the
> code in a separate function in sys_sig.c makes a difference.
I'm wondering how the signal mask gets restored in case of select(2) now.
There used to be code which did that. But the old signal mask is simply
copied into a property of the "lwp" structure.
Kind regards
--
Matthias Scheler http://zhadum.org.uk/
From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Tue, 24 May 2011 11:37:47 -0400
On May 24, 2:17pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
-- Subject: Re: kern/44986: "screens" gets stuck during detach
| On Tue, May 24, 2011 at 08:42:11AM -0400, Christos Zoulas wrote:
| > Backout the select changes and leave the sigsuspend changes. If that fixes
| > the problem, then the problem is with select. I don't see how putting the
| > code in a separate function in sys_sig.c makes a difference.
|
| I'm wondering how the signal mask gets restored in case of select(2) now.
| There used to be code which did that. But the old signal mask is simply
| copied into a property of the "lwp" structure.
The same way that it is restored for sigsuspend. Look in kern_sig.c
christos
From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Wed, 25 May 2011 21:44:48 +0100
On Tue, May 24, 2011 at 11:37:47AM -0400, Christos Zoulas wrote:
> On May 24, 2:17pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
> -- Subject: Re: kern/44986: "screens" gets stuck during detach
>
> | On Tue, May 24, 2011 at 08:42:11AM -0400, Christos Zoulas wrote:
> | > Backout the select changes and leave the sigsuspend changes. If that fixes
> | > the problem, then the problem is with select. I don't see how putting the
> | > code in a separate function in sys_sig.c makes a difference.
> |
> | I'm wondering how the signal mask gets restored in case of select(2) now.
> | There used to be code which did that. But the old signal mask is simply
> | copied into a property of the "lwp" structure.
>
> The same way that it is restored for sigsuspend. Look in kern_sig.c
I've reverted "sys/kern/sys_select.c" to revision 1.30 in my source tree
and can no longer reproduce the problem.
Kind regards
--
Matthias Scheler http://zhadum.org.uk/
From: Matthias Scheler <tron@zhadum.org.uk>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: Christos Zoulas <christos@zoulas.com>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Wed, 25 May 2011 22:25:37 +0100
On Wed, May 25, 2011 at 08:45:02PM +0000, Matthias Scheler wrote:
> On Tue, May 24, 2011 at 11:37:47AM -0400, Christos Zoulas wrote:
> > On May 24, 2:17pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
> > -- Subject: Re: kern/44986: "screens" gets stuck during detach
> >
> > | On Tue, May 24, 2011 at 08:42:11AM -0400, Christos Zoulas wrote:
> > | > Backout the select changes and leave the sigsuspend changes. If that fixes
> > | > the problem, then the problem is with select. I don't see how putting the
> > | > code in a separate function in sys_sig.c makes a difference.
> > |
> > | I'm wondering how the signal mask gets restored in case of select(2) now.
> > | There used to be code which did that. But the old signal mask is simply
> > | copied into a property of the "lwp" structure.
> >
> > The same way that it is restored for sigsuspend. Look in kern_sig.c
>
> I've reverted "sys/kern/sys_select.c" to revision 1.30 in my source tree
> and can no longer reproduce the problem.
Reverting to revision 1.31 does *not* fix the problem.
Kind regards
--
Matthias Scheler http://zhadum.org.uk/
From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Wed, 25 May 2011 18:25:56 -0400
On May 25, 9:44pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
-- Subject: Re: kern/44986: "screens" gets stuck during detach
| On Tue, May 24, 2011 at 11:37:47AM -0400, Christos Zoulas wrote:
| > On May 24, 2:17pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
| > -- Subject: Re: kern/44986: "screens" gets stuck during detach
| >
| > | On Tue, May 24, 2011 at 08:42:11AM -0400, Christos Zoulas wrote:
| > | > Backout the select changes and leave the sigsuspend changes. If that fixes
| > | > the problem, then the problem is with select. I don't see how putting the
| > | > code in a separate function in sys_sig.c makes a difference.
| > |
| > | I'm wondering how the signal mask gets restored in case of select(2) now.
| > | There used to be code which did that. But the old signal mask is simply
| > | copied into a property of the "lwp" structure.
| >
| > The same way that it is restored for sigsuspend. Look in kern_sig.c
|
| I've reverted "sys/kern/sys_select.c" to revision 1.30 in my source tree
| and can no longer reproduce the problem.
Ok, that is curious; can you ktrace screen and see if it invokes select with
a signal mask?
christos
From: Matthias Scheler <tron@zhadum.org.uk>
To: christos@zoulas.com (Christos Zoulas)
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Wed, 25 May 2011 23:33:37 +0100
On 25 May 2011, at 23:25, Christos Zoulas wrote:
> On May 25, 9:44pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
> -- Subject: Re: kern/44986: "screens" gets stuck during detach
>
> | On Tue, May 24, 2011 at 11:37:47AM -0400, Christos Zoulas wrote:
> | > On May 24, 2:17pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
> | > -- Subject: Re: kern/44986: "screens" gets stuck during detach
> | >
> | > | On Tue, May 24, 2011 at 08:42:11AM -0400, Christos Zoulas wrote:
> | > | > Backout the select changes and leave the sigsuspend changes. If that fixes
> | > | > the problem, then the problem is with select. I don't see how putting the
> | > | > code in a separate function in sys_sig.c makes a difference.
> | > |
> | > | I'm wondering how the signal mask gets restored in case of select(2) now.
> | > | There used to be code which did that. But the old signal mask is simply
> | > | copied into a property of the "lwp" structure.
> | >
> | > The same way that it is restored for sigsuspend. Look in kern_sig.c
> |
> | I've reverted "sys/kern/sys_select.c" to revision 1.30 in my source tree
> | and can no longer reproduce the problem.
>
> Ok, that is curious; can you ktrace screen and see if it invokes select with
> a signal mask?
You mean whether it uses pselect(2)? No, it doesn't, "pselect" is not
mentioned anywhere in the source code.
Kind regards
--
Matthias Scheler http://zhadum.org.uk/
From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Wed, 25 May 2011 18:37:29 -0400
On May 25, 11:33pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
-- Subject: Re: kern/44986: "screens" gets stuck during detach
|
| On 25 May 2011, at 23:25, Christos Zoulas wrote:
|
| > On May 25, 9:44pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
| > -- Subject: Re: kern/44986: "screens" gets stuck during detach
| >
| > | On Tue, May 24, 2011 at 11:37:47AM -0400, Christos Zoulas wrote:
| > | > On May 24, 2:17pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
| > | > -- Subject: Re: kern/44986: "screens" gets stuck during detach
| > | >
| > | > | On Tue, May 24, 2011 at 08:42:11AM -0400, Christos Zoulas wrote:
| > | > | > Backout the select changes and leave the sigsuspend changes. If that fixes
| > | > | > the problem, then the problem is with select. I don't see how putting the
| > | > | > code in a separate function in sys_sig.c makes a difference.
| > | > |
| > | > | I'm wondering how the signal mask gets restored in case of select(2) now.
| > | > | There used to be code which did that. But the old signal mask is simply
| > | > | copied into a property of the "lwp" structure.
| > | >
| > | > The same way that it is restored for sigsuspend. Look in kern_sig.c
| > |
| > | I've reverted "sys/kern/sys_select.c" to revision 1.30 in my source tree
| > | and can no longer reproduce the problem.
| >
| > Ok, that is curious; can you ktrace screen and see if it invokes select with
| > a signal mask?
|
| You mean whether it uses pselect(2)? No, it doesn't, "pselect" is not
| mentioned anywhere in the source code.
or pollts()... Let me look again at the code...
christos
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, tron@zhadum.org.uk
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Fri, 27 May 2011 02:20:28 +0000 (UTC)
> The following reply was made to PR kern/44986; it has been noted by GNATS.
>
> From: christos@zoulas.com (Christos Zoulas)
> To: Matthias Scheler <tron@zhadum.org.uk>
> Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
> Subject: Re: kern/44986: "screens" gets stuck during detach
> Date: Tue, 24 May 2011 11:37:47 -0400
>
> On May 24, 2:17pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
> -- Subject: Re: kern/44986: "screens" gets stuck during detach
>
> | On Tue, May 24, 2011 at 08:42:11AM -0400, Christos Zoulas wrote:
> | > Backout the select changes and leave the sigsuspend changes. If that fixes
> | > the problem, then the problem is with select. I don't see how putting the
> | > code in a separate function in sys_sig.c makes a difference.
> |
> | I'm wondering how the signal mask gets restored in case of select(2) now.
> | There used to be code which did that. But the old signal mask is simply
> | copied into a property of the "lwp" structure.
>
> The same way that it is restored for sigsuspend. Look in kern_sig.c
how does it work when select returns due to non-signal events?
YAMAMOTO Takashi
>
> christos
From: christos@zoulas.com (Christos Zoulas)
To: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi), gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, tron@zhadum.org.uk
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Fri, 27 May 2011 15:44:40 -0400
On May 27, 2:20am, yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
-- Subject: Re: kern/44986: "screens" gets stuck during detach
| how does it work when select returns due to non-signal events?
|
| YAMAMOTO Takashi
Hmm, yes. I guess it will not restore the mask the same way. I will add a
function to do it. The question is what uses pselect or pollts?
christos
From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 28 May 2011 13:13:02 +0100
On Fri, May 27, 2011 at 07:45:07PM +0000, Christos Zoulas wrote:
> The following reply was made to PR kern/44986; it has been noted by GNATS.
>
> From: christos@zoulas.com (Christos Zoulas)
> To: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi), gnats-bugs@NetBSD.org
> Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
> netbsd-bugs@netbsd.org, tron@zhadum.org.uk
> Subject: Re: kern/44986: "screens" gets stuck during detach
> Date: Fri, 27 May 2011 15:44:40 -0400
>
> On May 27, 2:20am, yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
> -- Subject: Re: kern/44986: "screens" gets stuck during detach
>
> | how does it work when select returns due to non-signal events?
> |
> | YAMAMOTO Takashi
>
> Hmm, yes. I guess it will not restore the mask the same way. I will add a
> function to do it. The question is what uses pselect or pollts?
I've traced the calls with a non-NULL "mask" value. They are all
made by sys___pollts50() via pollcommon().
Kind regards
--
Matthias Scheler http://zhadum.org.uk/
From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 28 May 2011 14:40:05 +0100
On Sat, May 28, 2011 at 12:15:05PM +0000, Matthias Scheler wrote:
> On Fri, May 27, 2011 at 07:45:07PM +0000, Christos Zoulas wrote:
> > The following reply was made to PR kern/44986; it has been noted by GNATS.
> >
> > From: christos@zoulas.com (Christos Zoulas)
> > To: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi), gnats-bugs@NetBSD.org
> > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
> > netbsd-bugs@netbsd.org, tron@zhadum.org.uk
> > Subject: Re: kern/44986: "screens" gets stuck during detach
> > Date: Fri, 27 May 2011 15:44:40 -0400
> >
> > On May 27, 2:20am, yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
> > -- Subject: Re: kern/44986: "screens" gets stuck during detach
> >
> > | how does it work when select returns due to non-signal events?
> > |
> > | YAMAMOTO Takashi
> >
> > Hmm, yes. I guess it will not restore the mask the same way. I will add a
> > function to do it. The question is what uses pselect or pollts?
>
> I've traced the calls with a non-NULL "mask" value. They are all
> made by sys___pollts50() via pollcommon().
And I think I found out why my system makes a lot of pollts(2) calls.
This NetBSD system is a NIS client. And "src/lib/libc/rpc/clnt_dg.c"
uses pollts(2) with a non-NULL argument for the mask.
Kind regards
--
Matthias Scheler http://zhadum.org.uk/
From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 28 May 2011 10:34:20 -0400
On May 28, 2:40pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
-- Subject: Re: kern/44986: "screens" gets stuck during detach
| On Sat, May 28, 2011 at 12:15:05PM +0000, Matthias Scheler wrote:
| > On Fri, May 27, 2011 at 07:45:07PM +0000, Christos Zoulas wrote:
| > > The following reply was made to PR kern/44986; it has been noted by GNATS.
| > >
| > > From: christos@zoulas.com (Christos Zoulas)
| > > To: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi), gnats-bugs@NetBSD.org
| > > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
| > > netbsd-bugs@netbsd.org, tron@zhadum.org.uk
| > > Subject: Re: kern/44986: "screens" gets stuck during detach
| > > Date: Fri, 27 May 2011 15:44:40 -0400
| > >
| > > On May 27, 2:20am, yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
| > > -- Subject: Re: kern/44986: "screens" gets stuck during detach
| > >
| > > | how does it work when select returns due to non-signal events?
| > > |
| > > | YAMAMOTO Takashi
| > >
| > > Hmm, yes. I guess it will not restore the mask the same way. I will add a
| > > function to do it. The question is what uses pselect or pollts?
| >
| > I've traced the calls with a non-NULL "mask" value. They are all
| > made by sys___pollts50() via pollcommon().
|
| And I think I found out why my system makes a lot of pollts(2) calls.
| This NetBSD system is a NIS client. And "src/lib/libc/rpc/clnt_dg.c"
| uses pollts(2) with a non-NULL argument for the mask.
Makes sense. Which is one of the reasons we added it. This is why I don't
see it on mine. Can you please try this:
Index: kern/sys_select.c
===================================================================
RCS file: /cvsroot/src/sys/kern/sys_select.c,v
retrieving revision 1.32
diff -u -u -r1.32 sys_select.c
--- kern/sys_select.c 18 May 2011 14:48:04 -0000 1.32
+++ kern/sys_select.c 28 May 2011 14:33:06 -0000
@@ -304,6 +304,9 @@
}
selclear();
+ if (__predict_false(mask))
+ sigsuspendteardown(l);
+
/* select and poll are not restarted after signals... */
if (error == ERESTART)
return EINTR;
Index: kern/sys_sig.c
===================================================================
RCS file: /cvsroot/src/sys/kern/sys_sig.c,v
retrieving revision 1.33
diff -u -u -r1.33 sys_sig.c
--- kern/sys_sig.c 18 May 2011 03:51:41 -0000 1.33
+++ kern/sys_sig.c 28 May 2011 14:33:07 -0000
@@ -631,6 +631,19 @@
mutex_exit(p->p_lock);
}
+void
+sigsuspendteardown(struct lwp *l)
+{
+ struct proc *p = l->l_proc;
+
+ mutex_enter(p->p_lock);
+ if (l->l_sigrestore) {
+ l->l_sigrestore = 0;
+ l->l_sigmask = l->l_sigoldmask;
+ }
+ mutex_exit(p->p_lock);
+}
+
int
sigsuspend1(struct lwp *l, const sigset_t *ss)
{
Index: sys/signalvar.h
===================================================================
RCS file: /cvsroot/src/sys/sys/signalvar.h,v
retrieving revision 1.80
diff -u -u -r1.80 signalvar.h
--- sys/signalvar.h 18 May 2011 03:51:41 -0000 1.80
+++ sys/signalvar.h 28 May 2011 14:33:08 -0000
@@ -149,6 +149,7 @@
 int sigprocmask1(struct lwp *, int, const sigset_t *, sigset_t *);
 void sigpending1(struct lwp *, sigset_t *);
 void sigsuspendsetup(struct lwp *, const sigset_t *);
+void sigsuspendteardown(struct lwp *);
 int sigsuspend1(struct lwp *, const sigset_t *);
 int sigaltstack1(struct lwp *, const struct sigaltstack *,
 	    struct sigaltstack *);
From: "Matthias Scheler" <tron@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/44986 CVS commit: src
Date: Sat, 28 May 2011 15:24:50 +0000
Module Name: src
Committed By: tron
Date: Sat May 28 15:24:49 UTC 2011
Modified Files:
src/distrib/sets/lists/tests: mi
src/tests/kernel: Makefile
Added Files:
src/tests/kernel: t_pollts.c
Log Message:
Add two test cases for pollts(2):
- The first tests basic functionality e.g. timeouts and correct events.
- The second tests whether pollts(2) correctly restores the signal mask.
This test currently fails because of PR kern/44986.
To generate a diff of this commit:
cvs rdiff -u -r1.338 -r1.339 src/distrib/sets/lists/tests/mi
cvs rdiff -u -r1.10 -r1.11 src/tests/kernel/Makefile
cvs rdiff -u -r0 -r1.1 src/tests/kernel/t_pollts.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Responsible-Changed-From-To: kern-bug-people->christos
Responsible-Changed-By: tron@NetBSD.org
Responsible-Changed-When: Sat, 28 May 2011 15:28:27 +0000
Responsible-Changed-Why:
Christos is working on a fix.
From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 28 May 2011 16:31:48 +0100
On Sat, May 28, 2011 at 10:34:20AM -0400, Christos Zoulas wrote:
> | And I think I found out why my system makes a lot of pollts(2) calls.
> | This NetBSD system is a NIS client. And "src/lib/libc/rpc/clnt_dg.c"
> | uses pollts(2) with a non-NULL argument for the mask.
>
> Makes sense. That is one of the reasons we added it, and it is why I don't
> see the problem on my system. Can you please try this patch:
Yes, that fixes the problem. Please commit the change and remove the
expected failure from "src/tests/kernel/t_pollts.c".
Thanks a lot
--
Matthias Scheler http://zhadum.org.uk/
From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/44986 CVS commit: src/tests/kernel
Date: Sat, 28 May 2011 11:37:11 -0400
Module Name: src
Committed By: christos
Date: Sat May 28 15:37:11 UTC 2011
Modified Files:
src/tests/kernel: t_pollts.c
Log Message:
PR/44986 has been fixed.
BTW: We've created a mess here again with the directory structure of the
tests. What goes in syscalls, what goes in sys, and what goes in kernel?
I think we should follow the userland location for paths where those should
be defined, so everything should go into libc/sys.
To generate a diff of this commit:
cvs rdiff -u -r1.1 -r1.2 src/tests/kernel/t_pollts.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Matthias Scheler" <tron@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/44986 CVS commit: src/sys/rump/librump/rumpkern
Date: Sat, 28 May 2011 16:07:44 +0000
Module Name: src
Committed By: tron
Date: Sat May 28 16:07:44 UTC 2011
Modified Files:
src/sys/rump/librump/rumpkern: signals.c
Log Message:
Fix rump build which got broken by the fix for PR kern/44986.
To generate a diff of this commit:
cvs rdiff -u -r1.9 -r1.10 src/sys/rump/librump/rumpkern/signals.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Matthias Scheler" <tron@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/44986 CVS commit: src
Date: Sat, 28 May 2011 16:12:56 +0000
Module Name: src
Committed By: tron
Date: Sat May 28 16:12:56 UTC 2011
Modified Files:
src/distrib/sets/lists/tests: mi
src/tests/kernel: Makefile
src/tests/syscall: Makefile
Added Files:
src/tests/syscall: t_pollts.c
Removed Files:
src/tests/kernel: t_pollts.c
Log Message:
Move regression test for PR kern/44986 from "kernel" to "syscall" as
the latter directory seems to be a better fit.
To generate a diff of this commit:
cvs rdiff -u -r1.339 -r1.340 src/distrib/sets/lists/tests/mi
cvs rdiff -u -r1.11 -r1.12 src/tests/kernel/Makefile
cvs rdiff -u -r1.2 -r0 src/tests/kernel/t_pollts.c
cvs rdiff -u -r1.27 -r1.28 src/tests/syscall/Makefile
cvs rdiff -u -r0 -r1.1 src/tests/syscall/t_pollts.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: tron@NetBSD.org
State-Changed-When: Sat, 28 May 2011 16:17:25 +0000
State-Changed-Why:
The bug was fixed by this commit:
http://mail-index.netbsd.org/source-changes/2011/05/28/msg022625.html
>Unformatted: