NetBSD Problem Report #44986

From tron@zhadum.org.uk  Sat May 21 11:35:11 2011
Return-Path: <tron@zhadum.org.uk>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 29E5A63C64D
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 21 May 2011 11:35:11 +0000 (UTC)
Message-Id: <20110521113506.F1B73F9010@lyssa.zhadum.org.uk>
Date: Sat, 21 May 2011 12:35:06 +0100 (BST)
From: tron@zhadum.org.uk
Reply-To: tron@zhadum.org.uk
To: gnats-bugs@gnats.NetBSD.org
Subject: pollts(2) system call changes signal mask
X-Send-Pr-Version: 3.95

>Number:         44986
>Category:       kern
>Synopsis:       pollts(2) system call changes signal mask
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    christos
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat May 21 11:40:01 +0000 2011
>Closed-Date:    Sat May 28 16:17:25 +0000 2011
>Last-Modified:  Thu Jul 21 03:37:28 +0000 2011
>Originator:     tron@zhadum.org.uk
>Release:        NetBSD 5.99.51 2011-05-21 sources
>Organization:
Matthias Scheler                                  http://zhadum.org.uk/
>Environment:
System: NetBSD lyssa.zhadum.org.uk 5.99.51 NetBSD 5.99.51 (LYSSA) #0: Sat May 21 11:53:54 BST 2011 tron@lyssa.zhadum.org.uk:/src/sys/compile/LYSSA i386
Architecture: i386
Machine: i386
>Description:
"screen" has been getting stuck during detach under NetBSD/i386 -current
for a few days. It works fine with a 2011-05-13 build but not with a
2011-05-19 or newer build. I attached "gdb" to the hung "screen"
process and it produced the following stack trace:

(gdb) where
#0  0xbbab2657 in _sys___sigsuspend14 () from /usr/lib/libc.so.12
#1  0xbbadb252 in pause () from /usr/lib/libc.so.12
#2  0x0806bb13 in ?? ()
#3  0x0000000f in ?? ()
#4  0x0806adc0 in ?? ()
#5  0x00000000 in ?? ()

I suspect that the problem is related to this change:

http://mail-index.netbsd.org/source-changes/2011/05/18/msg022025.html

However, I could not find a problem when I reviewed the change to
"src/sys/kern/sys_sig.c".

>How-To-Repeat:
screen
<CTRL> "a" + "d"

>Fix:
Not known.

>Release-Note:

>Audit-Trail:
From: Matthias Scheler <tron@zhadum.org.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 21 May 2011 13:31:23 +0100

 On Sat, May 21, 2011 at 11:40:01AM +0000, gnats-admin@netbsd.org wrote:
 > Thank you very much for your problem report.
 > It has the internal identification `kern/44986'.
 > The individual assigned to look at your
 > report is: kern-bug-people. 
 > 
 > >Category:       kern
 > >Responsible:    kern-bug-people
 > >Synopsis:       "screens" gets stuck during detach
 > >Arrival-Date:   Sat May 21 11:40:01 +0000 2011

 Here is a better stack trace for the hanging "screen" process:

 (gdb) where
 #0  0xbbab2657 in _sys___sigsuspend14 () from /usr/lib/libc.so.12
 #1  0xbbadb252 in pause () from /usr/lib/libc.so.12
 #2  0x0806bb13 in Attacher ()
 #3  0x0804e7e6 in main ()

 The "Attacher" function looks like this (after "cpp"):

 void
 Attacher()
 {
   xsignal(1, AttacherFinit);
   xsignal(1, AttacherFinit);
   xsignal(30, AttacherFinitBye);


   xsignal(31, DoLock);
   xsignal(2, AttacherSigInt);
   xsignal(18, SigStop);

   xsignal(28, AttacherWinch);


   do {} while (0);
   dflag = 0;
   xflag = 1;
   for (;;)
     {
       xsignal(14, AttacherSigAlarm);
       alarm(15);
       pause();		/* <--- It seems to get stuck here. */
       alarm(0);
       if (kill(MasterPid, 0) < 0 && (*__errno()) != 1)
         {
           do {} while (0);
           AttacherPanic++;
         }
 [...]
 }

 My system also failed to shut down cleanly because "cron" wouldn't die
 (until I killed it with "kill -9"). There seems to be a general problem
 with signal handling.

 	Kind regards

 -- 
 Matthias Scheler                                  http://zhadum.org.uk/

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 21 May 2011 09:36:26 -0400

 On May 21, 11:40am, tron@zhadum.org.uk (tron@zhadum.org.uk) wrote:
 -- Subject: kern/44986: "screens" gets stuck during detach

 | I suspect that the problem is related to this change:
 | 
 | http://mail-index.netbsd.org/source-changes/2011/05/18/msg022025.html
 | 
 | However, I could not find a problem when I reviewed the change to
 | "src/sys/kern/sys_sig.c".

 Does screen use pselect or pollts?

 christos

From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 21 May 2011 15:26:30 +0100

 On Sat, May 21, 2011 at 01:40:03PM +0000, Christos Zoulas wrote:
 >  On May 21, 11:40am, tron@zhadum.org.uk (tron@zhadum.org.uk) wrote:
 >  -- Subject: kern/44986: "screens" gets stuck during detach
 >  
 >  | I suspect that the problem is related to this change:
 >  | 
 >  | http://mail-index.netbsd.org/source-changes/2011/05/18/msg022025.html
 >  | 
 >  | However, I could not find a problem when I reviewed the change to
 >  | "src/sys/kern/sys_sig.c".
 >  
 >  Does screen use pselect or pollts?

 No, it doesn't. It uses pause(3).

 The problem can also be reproduced with cron(8):

 tron@lyssa:/#/etc/rc.d/cron restart
 Stopping cron.
 Waiting for PIDS: 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268^Z
 zsh: suspended  /etc/rc.d/cron restart
 tron@lyssa:/#pkill -9 cron
 tron@lyssa:/#fg
 [1]  + continued  /etc/rc.d/cron restart
 .
 Starting cron.

 	Kind regards

 -- 
 Matthias Scheler                                  http://zhadum.org.uk/

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tron@zhadum.org.uk
Cc: 
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 21 May 2011 10:36:06 -0400

 On May 21,  2:30pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 -- Subject: Re: kern/44986: "screens" gets stuck during detach

 |  No, it doesn't. It uses pause(3).
 |  
 |  The problem can also be reproduced with cron(8):
 |  
 |  tron@lyssa:/#/etc/rc.d/cron restart
 |  Stopping cron.
 |  Waiting for PIDS: 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268^Z
 |  zsh: suspended  /etc/rc.d/cron restart
 |  tron@lyssa:/#pkill -9 cron
 |  tron@lyssa:/#fg
 |  [1]  + continued  /etc/rc.d/cron restart
 |  .

 But that code did not change? I just moved it into a separate function.

 christos

From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 21 May 2011 16:20:59 +0100

 On Sat, May 21, 2011 at 10:36:06AM -0400, Christos Zoulas wrote:
 > On May 21,  2:30pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 > -- Subject: Re: kern/44986: "screens" gets stuck during detach
 > 
 > |  No, it doesn't. It uses pause(3).
 > |  
 > |  The problem can also be reproduced with cron(8):
 > |  
 > |  tron@lyssa:/#/etc/rc.d/cron restart
 > |  Stopping cron.
 > |  Waiting for PIDS: 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268, 268^Z
 > |  zsh: suspended  /etc/rc.d/cron restart
 > |  tron@lyssa:/#pkill -9 cron
 > |  tron@lyssa:/#fg
 > |  [1]  + continued  /etc/rc.d/cron restart
 > |  .
 > 
 > But that code did not change? I just moved it into a separate function.

 Yes, I saw that. And maybe it wasn't your change. But something broke
 signal handling.

 	Kind regards
 -- 
 Matthias Scheler                                  http://zhadum.org.uk/

From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 21 May 2011 11:54:39 -0400

 On May 21,  4:20pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 -- Subject: Re: kern/44986: "screens" gets stuck during detach

 | Yes, I saw that. And maybe it wasn't your change. But something broke
 | signal handling.

 We'll need to do some bisection to find out, I guess :-) I tested my changes.

 christos

From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 21 May 2011 22:30:56 +0100

 On Sat, May 21, 2011 at 11:54:39AM -0400, Christos Zoulas wrote:
 > On May 21,  4:20pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 > -- Subject: Re: kern/44986: "screens" gets stuck during detach
 > 
 > | Yes, I saw that. And maybe it wasn't your change. But something broke
 > | signal handling.
 > 
 > We'll need to do some bisection to find out, I guess :-) I tested my changes.

 I've reverted these two commits ...

 http://mail-index.netbsd.org/source-changes/2011/05/18/msg022025.html
 http://mail-index.netbsd.org/source-changes/2011/05/18/msg022032.html

 ... in my own source tree and it fixed the problem.

 	Kind regards

 -- 
 Matthias Scheler                                  http://zhadum.org.uk/

From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 21 May 2011 17:48:47 -0400

 On May 21, 10:30pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 -- Subject: Re: kern/44986: "screens" gets stuck during detach

 | I've reverted these two commits ...
 | 
 | http://mail-index.netbsd.org/source-changes/2011/05/18/msg022025.html
 | http://mail-index.netbsd.org/source-changes/2011/05/18/msg022032.html
 | 
 | ... in my own source tree and it fixed the problem.

 Interesting, because my tree seems to work. I will check some more.

 christos

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tron@zhadum.org.uk
Cc: 
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sun, 22 May 2011 18:03:17 -0400

 On May 21,  9:35pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 -- Subject: Re: kern/44986: "screens" gets stuck during detach

 |  I've reverted these two commits ...
 |  
 |  http://mail-index.netbsd.org/source-changes/2011/05/18/msg022025.html
 |  http://mail-index.netbsd.org/source-changes/2011/05/18/msg022032.html
 |  
 |  ... in my own source tree and it fixed the problem.
 |  

 I cannot reproduce either the cron problem or the screen problem.

 christos

From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sun, 22 May 2011 23:34:56 +0100

 On Sun, May 22, 2011 at 06:03:17PM -0400, Christos Zoulas wrote:
 > On May 21,  9:35pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 > -- Subject: Re: kern/44986: "screens" gets stuck during detach
 > 
 > |  I've reverted these two commits ...
 > |  
 > |  http://mail-index.netbsd.org/source-changes/2011/05/18/msg022025.html
 > |  http://mail-index.netbsd.org/source-changes/2011/05/18/msg022032.html
 > |  
 > |  ... in my own source tree and it fixed the problem.
 > |  
 > 
 > I cannot reproduce either the cron problem or the screen problem.

 Unfortunately I can reproduce both reliably. Which port are you using?

 	Kind regards

 -- 
 Matthias Scheler                                  http://zhadum.org.uk/

From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sun, 22 May 2011 18:42:07 -0400

 On May 22, 11:34pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 -- Subject: Re: kern/44986: "screens" gets stuck during detach

 | Unfortunately I can reproduce both reliably. Which port are you using?

 i386 and amd64. I am building on arm now.

 christos

From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Mon, 23 May 2011 22:27:11 +0100

 On Sun, May 22, 2011 at 06:42:07PM -0400, Christos Zoulas wrote:
 > On May 22, 11:34pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 > -- Subject: Re: kern/44986: "screens" gets stuck during detach
 > 
 > | Unfortunately I can reproduce both reliably. Which port are you using?
 > 
 > i386 and amd64. I am building on arm now.

 I can still reproduce this with today's NetBSD/i386 current.
 I've also tried a "GENERIC" kernel but it doesn't fix the problem.

 Do any of your test machines have four or more cores?

 	Kind regards

 -- 
 Matthias Scheler                                  http://zhadum.org.uk/

From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Mon, 23 May 2011 21:12:52 -0400

 On May 23, 10:27pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 -- Subject: Re: kern/44986: "screens" gets stuck during detach

 | On Sun, May 22, 2011 at 06:42:07PM -0400, Christos Zoulas wrote:
 | > On May 22, 11:34pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 | > -- Subject: Re: kern/44986: "screens" gets stuck during detach
 | > 
 | > | Unfortunately I can reproduce both reliably. Which port are you using?
 | > 
 | > i386 and amd64. I am building on arm now.
 | 
 | I can still reproduce this with today's NetBSD/i386 current.
 | I've also tried a "GENERIC" kernel but it doesn't fix the problem.
 | 
 | Do any of your test machines have four or more cores?

 None of the machines I have run this on so far do. I will build a new one and try tomorrow.

 christos

From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Tue, 24 May 2011 07:35:36 +0100

 On Mon, May 23, 2011 at 09:12:52PM -0400, Christos Zoulas wrote:
 > | I can still reproduce this with today's NetBSD/i386 current.
 > | I've also tried a "GENERIC" kernel but it doesn't fix the problem.
 > | 
 > | Do any of your test machines have four or more cores?
 > 
 > None of the machines I have run this on so far do. I will build a new one and try tomorrow.

 Any ideas how to debug this?

 	Kind regards

 -- 
 Matthias Scheler                                  http://zhadum.org.uk/

From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Tue, 24 May 2011 08:42:11 -0400

 On May 24,  7:35am, tron@zhadum.org.uk (Matthias Scheler) wrote:
 -- Subject: Re: kern/44986: "screens" gets stuck during detach

 | On Mon, May 23, 2011 at 09:12:52PM -0400, Christos Zoulas wrote:
 | > | I can still reproduce this with today's NetBSD/i386 current.
 | > | I've also tried a "GENERIC" kernel but it doesn't fix the problem.
 | > | 
 | > | Do any of your test machines have four or more cores?
 | > 
 | > None of the machines I have run this on so far do. I will build a new one and try tomorrow.
 | 
 | Any ideas how to debug this?

 Back out the select changes and leave the sigsuspend changes. If that fixes
 the problem, then the problem is with select. I don't see how putting the
 code in a separate function in sys_sig.c makes a difference.

 christos

From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Tue, 24 May 2011 14:17:09 +0100

 On Tue, May 24, 2011 at 08:42:11AM -0400, Christos Zoulas wrote:
 > Back out the select changes and leave the sigsuspend changes. If that fixes
 > the problem, then the problem is with select. I don't see how putting the
 > code in a separate function in sys_sig.c makes a difference.

 I'm wondering how the signal mask gets restored in case of select(2) now.
 There used to be code which did that. But the old signal mask is simply
 copied into a property of the "lwp" structure.

 	Kind regards

 -- 
 Matthias Scheler                                  http://zhadum.org.uk/

From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Tue, 24 May 2011 11:37:47 -0400

 On May 24,  2:17pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 -- Subject: Re: kern/44986: "screens" gets stuck during detach

 | On Tue, May 24, 2011 at 08:42:11AM -0400, Christos Zoulas wrote:
 | > Back out the select changes and leave the sigsuspend changes. If that fixes
 | > the problem, then the problem is with select. I don't see how putting the
 | > code in a separate function in sys_sig.c makes a difference.
 | 
 | I'm wondering how the signal mask gets restored in case of select(2) now.
 | There used to be code which did that. But the old signal mask is simply
 | copied into a property of the "lwp" structure.

 The same way that it is restored for sigsuspend. Look in kern_sig.c

 christos

From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Wed, 25 May 2011 21:44:48 +0100

 On Tue, May 24, 2011 at 11:37:47AM -0400, Christos Zoulas wrote:
 > On May 24,  2:17pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 > -- Subject: Re: kern/44986: "screens" gets stuck during detach
 > 
 > | On Tue, May 24, 2011 at 08:42:11AM -0400, Christos Zoulas wrote:
 > | > Back out the select changes and leave the sigsuspend changes. If that fixes
 > | > the problem, then the problem is with select. I don't see how putting the
 > | > code in a separate function in sys_sig.c makes a difference.
 > | 
 > | I'm wondering how the signal mask gets restored in case of select(2) now.
 > | There used to be code which did that. But the old signal mask is simply
 > | copied into a property of the "lwp" structure.
 > 
 > The same way that it is restored for sigsuspend. Look in kern_sig.c

 I've reverted "sys/kern/sys_select.c" to revision 1.30 in my source tree
 and can no longer reproduce the problem.

 	Kind regards

 -- 
 Matthias Scheler                                  http://zhadum.org.uk/

From: Matthias Scheler <tron@zhadum.org.uk>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: Christos Zoulas <christos@zoulas.com>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Wed, 25 May 2011 22:25:37 +0100

 On Wed, May 25, 2011 at 08:45:02PM +0000, Matthias Scheler wrote:
 >  On Tue, May 24, 2011 at 11:37:47AM -0400, Christos Zoulas wrote:
 >  > On May 24,  2:17pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 >  > -- Subject: Re: kern/44986: "screens" gets stuck during detach
 >  > 
 >  > | On Tue, May 24, 2011 at 08:42:11AM -0400, Christos Zoulas wrote:
 >  > | > Back out the select changes and leave the sigsuspend changes. If that fixes
 >  > | > the problem, then the problem is with select. I don't see how putting the
 >  > | > code in a separate function in sys_sig.c makes a difference.
 >  > | 
 >  > | I'm wondering how the signal mask gets restored in case of select(2) now.
 >  > | There used to be code which did that. But the old signal mask is simply
 >  > | copied into a property of the "lwp" structure.
 >  > 
 >  > The same way that it is restored for sigsuspend. Look in kern_sig.c
 >  
 >  I've reverted "sys/kern/sys_select.c" to revision 1.30 in my source tree
 >  and can no longer reproduce the problem.

 Reverting to revision 1.31 does *not* fix the problem.

 	Kind regards

 -- 
 Matthias Scheler                                  http://zhadum.org.uk/

From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Wed, 25 May 2011 18:25:56 -0400

 On May 25,  9:44pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 -- Subject: Re: kern/44986: "screens" gets stuck during detach

 | On Tue, May 24, 2011 at 11:37:47AM -0400, Christos Zoulas wrote:
 | > On May 24,  2:17pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 | > -- Subject: Re: kern/44986: "screens" gets stuck during detach
 | > 
 | > | On Tue, May 24, 2011 at 08:42:11AM -0400, Christos Zoulas wrote:
 | > | > Back out the select changes and leave the sigsuspend changes. If that fixes
 | > | > the problem, then the problem is with select. I don't see how putting the
 | > | > code in a separate function in sys_sig.c makes a difference.
 | > | 
 | > | I'm wondering how the signal mask gets restored in case of select(2) now.
 | > | There used to be code which did that. But the old signal mask is simply
 | > | copied into a property of the "lwp" structure.
 | > 
 | > The same way that it is restored for sigsuspend. Look in kern_sig.c
 | 
 | I've reverted "sys/kern/sys_select.c" to revision 1.30 in my source tree
 | and can no longer reproduce the problem.

 Ok, that is curious; can you ktrace screen and see if it invokes select with
 a signal mask?

 christos

From: Matthias Scheler <tron@zhadum.org.uk>
To: christos@zoulas.com (Christos Zoulas)
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Wed, 25 May 2011 23:33:37 +0100

 On 25 May 2011, at 23:25, Christos Zoulas wrote:

 > On May 25,  9:44pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 > -- Subject: Re: kern/44986: "screens" gets stuck during detach
 >
 > | On Tue, May 24, 2011 at 11:37:47AM -0400, Christos Zoulas wrote:
 > | > On May 24,  2:17pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 > | > -- Subject: Re: kern/44986: "screens" gets stuck during detach
 > | >
 > | > | On Tue, May 24, 2011 at 08:42:11AM -0400, Christos Zoulas wrote:
 > | > | > Back out the select changes and leave the sigsuspend changes. If that fixes
 > | > | > the problem, then the problem is with select. I don't see how putting the
 > | > | > code in a separate function in sys_sig.c makes a difference.
 > | > |
 > | > | I'm wondering how the signal mask gets restored in case of select(2) now.
 > | > | There used to be code which did that. But the old signal mask is simply
 > | > | copied into a property of the "lwp" structure.
 > | >
 > | > The same way that it is restored for sigsuspend. Look in kern_sig.c
 > |
 > | I've reverted "sys/kern/sys_select.c" to revision 1.30 in my source tree
 > | and can no longer reproduce the problem.
 >
 > Ok, that is curious; can you ktrace screen and see if it invokes select with
 > a signal mask?

 You mean whether it uses pselect(2)? No, it doesn't; "pselect" is not
 mentioned anywhere in the source code.

 	Kind regards

 -- 
 Matthias Scheler                           http://zhadum.org.uk/



From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Wed, 25 May 2011 18:37:29 -0400

 On May 25, 11:33pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 -- Subject: Re: kern/44986: "screens" gets stuck during detach

 | 
 | On 25 May 2011, at 23:25, Christos Zoulas wrote:
 | 
 | > On May 25,  9:44pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 | > -- Subject: Re: kern/44986: "screens" gets stuck during detach
 | >
 | > | On Tue, May 24, 2011 at 11:37:47AM -0400, Christos Zoulas wrote:
 | > | > On May 24,  2:17pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 | > | > -- Subject: Re: kern/44986: "screens" gets stuck during detach
 | > | >
 | > | > | On Tue, May 24, 2011 at 08:42:11AM -0400, Christos Zoulas wrote:
 | > | > | > Back out the select changes and leave the sigsuspend changes. If that fixes
 | > | > | > the problem, then the problem is with select. I don't see how putting the
 | > | > | > code in a separate function in sys_sig.c makes a difference.
 | > | > |
 | > | > | I'm wondering how the signal mask gets restored in case of select(2) now.
 | > | > | There used to be code which did that. But the old signal mask is simply
 | > | > | copied into a property of the "lwp" structure.
 | > | >
 | > | > The same way that it is restored for sigsuspend. Look in kern_sig.c
 | > |
 | > | I've reverted "sys/kern/sys_select.c" to revision 1.30 in my source tree
 | > | and can no longer reproduce the problem.
 | >
 | > Ok, that is curious; can you ktrace screen and see if it invokes select with
 | > a signal mask?
 | 
 | You mean whether it uses pselect(2)? No, it doesn't; "pselect" is not
 | mentioned anywhere in the source code.

 or pollts()... Let me look again at the code...

 christos

From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, tron@zhadum.org.uk
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Fri, 27 May 2011 02:20:28 +0000 (UTC)

 > The following reply was made to PR kern/44986; it has been noted by GNATS.
 > 
 > From: christos@zoulas.com (Christos Zoulas)
 > To: Matthias Scheler <tron@zhadum.org.uk>
 > Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
 > Subject: Re: kern/44986: "screens" gets stuck during detach
 > Date: Tue, 24 May 2011 11:37:47 -0400
 > 
 >  On May 24,  2:17pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 >  -- Subject: Re: kern/44986: "screens" gets stuck during detach
 >  
 >  | On Tue, May 24, 2011 at 08:42:11AM -0400, Christos Zoulas wrote:
 >  | > Back out the select changes and leave the sigsuspend changes. If that fixes
 >  | > the problem, then the problem is with select. I don't see how putting the
 >  | > code in a separate function in sys_sig.c makes a difference.
 >  | 
 >  | I'm wondering how the signal mask gets restored in case of select(2) now.
 >  | There used to be code which did that. But the old signal mask is simply
 >  | copied into a property of the "lwp" structure.
 >  
 >  The same way that it is restored for sigsuspend. Look in kern_sig.c

 how does it work when select returns due to non-signal events?

 YAMAMOTO Takashi

 >  
 >  christos

From: christos@zoulas.com (Christos Zoulas)
To: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi), gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, 
	netbsd-bugs@netbsd.org, tron@zhadum.org.uk
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Fri, 27 May 2011 15:44:40 -0400

 On May 27,  2:20am, yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
 -- Subject: Re: kern/44986: "screens" gets stuck during detach

 | how does it work when select returns due to non-signal events?
 | 
 | YAMAMOTO Takashi

 Hmm, yes. I guess it will not restore the mask the same way. I will add a
 function to do it. The question is what uses pselect or pollts?

 christos

From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 28 May 2011 13:13:02 +0100

 On Fri, May 27, 2011 at 07:45:07PM +0000, Christos Zoulas wrote:
 > The following reply was made to PR kern/44986; it has been noted by GNATS.
 > 
 > From: christos@zoulas.com (Christos Zoulas)
 > To: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi), gnats-bugs@NetBSD.org
 > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, 
 > 	netbsd-bugs@netbsd.org, tron@zhadum.org.uk
 > Subject: Re: kern/44986: "screens" gets stuck during detach
 > Date: Fri, 27 May 2011 15:44:40 -0400
 > 
 >  On May 27,  2:20am, yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
 >  -- Subject: Re: kern/44986: "screens" gets stuck during detach
 >  
 >  | how does it work when select returns due to non-signal events?
 >  | 
 >  | YAMAMOTO Takashi
 >  
 >  Hmm, yes. I guess it will not restore the mask the same way. I will add a
 >  function to do it. The question is what uses pselect or pollts?

 I've traced the calls with a non-NULL "mask" value. They are all
 made by sys___pollts50() via pollcommon().

 	Kind regards

 -- 
 Matthias Scheler                                  http://zhadum.org.uk/

From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 28 May 2011 14:40:05 +0100

 On Sat, May 28, 2011 at 12:15:05PM +0000, Matthias Scheler wrote:
 >  On Fri, May 27, 2011 at 07:45:07PM +0000, Christos Zoulas wrote:
 >  > The following reply was made to PR kern/44986; it has been noted by GNATS.
 >  > 
 >  > From: christos@zoulas.com (Christos Zoulas)
 >  > To: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi), gnats-bugs@NetBSD.org
 >  > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, 
 >  > 	netbsd-bugs@netbsd.org, tron@zhadum.org.uk
 >  > Subject: Re: kern/44986: "screens" gets stuck during detach
 >  > Date: Fri, 27 May 2011 15:44:40 -0400
 >  > 
 >  >  On May 27,  2:20am, yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
 >  >  -- Subject: Re: kern/44986: "screens" gets stuck during detach
 >  >  
 >  >  | how does it work when select returns due to non-signal events?
 >  >  | 
 >  >  | YAMAMOTO Takashi
 >  >  
 >  >  Hmm, yes. I guess it will not restore the mask the same way. I will add a
 >  >  function to do it. The question is what uses pselect or pollts?
 >  
 >  I've traced the calls with a non-NULL "mask" value. They are all
 >  made by sys___pollts50() via pollcommon().

 And I think I found out why my system makes a lot of pollts(2) calls.
 This NetBSD system is a NIS client. And "src/lib/libc/rpc/clnt_dg.c"
 uses pollts(2) with a non-NULL argument for the mask.

 	Kind regards

 -- 
 Matthias Scheler                                  http://zhadum.org.uk/

From: christos@zoulas.com (Christos Zoulas)
To: Matthias Scheler <tron@zhadum.org.uk>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 28 May 2011 10:34:20 -0400

 On May 28,  2:40pm, tron@zhadum.org.uk (Matthias Scheler) wrote:
 -- Subject: Re: kern/44986: "screens" gets stuck during detach

 | On Sat, May 28, 2011 at 12:15:05PM +0000, Matthias Scheler wrote:
 | >  On Fri, May 27, 2011 at 07:45:07PM +0000, Christos Zoulas wrote:
 | >  > The following reply was made to PR kern/44986; it has been noted by GNATS.
 | >  > 
 | >  > From: christos@zoulas.com (Christos Zoulas)
 | >  > To: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi), gnats-bugs@NetBSD.org
 | >  > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, 
 | >  > 	netbsd-bugs@netbsd.org, tron@zhadum.org.uk
 | >  > Subject: Re: kern/44986: "screens" gets stuck during detach
 | >  > Date: Fri, 27 May 2011 15:44:40 -0400
 | >  > 
 | >  >  On May 27,  2:20am, yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
 | >  >  -- Subject: Re: kern/44986: "screens" gets stuck during detach
 | >  >  
 | >  >  | how does it work when select returns due to non-signal events?
 | >  >  | 
 | >  >  | YAMAMOTO Takashi
 | >  >  
 | >  >  Hmm, yes. I guess it will not restore the mask the same way. I will add a
 | >  >  function to do it. The question is what uses pselect or pollts?
 | >  
 | >  I've traced the calls with a non-NULL "mask" value. They are all
 | >  made by sys___pollts50() via pollcommon().
 | 
 | And I think I found out why my system makes a lot of pollts(2) calls.
 | This NetBSD system is a NIS client. And "src/lib/libc/rpc/clnt_dg.c"
 | uses pollts(2) with a non-NULL argument for the mask.

 Makes sense. That is one of the reasons we added it, and it is why I don't
 see it on mine. Can you please try this:

 Index: kern/sys_select.c
 ===================================================================
 RCS file: /cvsroot/src/sys/kern/sys_select.c,v
 retrieving revision 1.32
 diff -u -u -r1.32 sys_select.c
 --- kern/sys_select.c	18 May 2011 14:48:04 -0000	1.32
 +++ kern/sys_select.c	28 May 2011 14:33:06 -0000
 @@ -304,6 +304,9 @@
  	}
  	selclear();

 +	if (__predict_false(mask))
 +		sigsuspendteardown(l);
 +
  	/* select and poll are not restarted after signals... */
  	if (error == ERESTART)
  		return EINTR;
 Index: kern/sys_sig.c
 ===================================================================
 RCS file: /cvsroot/src/sys/kern/sys_sig.c,v
 retrieving revision 1.33
 diff -u -u -r1.33 sys_sig.c
 --- kern/sys_sig.c	18 May 2011 03:51:41 -0000	1.33
 +++ kern/sys_sig.c	28 May 2011 14:33:07 -0000
 @@ -631,6 +631,19 @@
  	mutex_exit(p->p_lock);
  }

 +void
 +sigsuspendteardown(struct lwp *l)
 +{
 +	struct proc *p = l->l_proc;
 +
 +	mutex_enter(p->p_lock);
 +	if (l->l_sigrestore) {
 +		l->l_sigrestore = 0;
 +		l->l_sigmask = l->l_sigoldmask;
 +	}
 +	mutex_exit(p->p_lock);
 +}
 +
  int
  sigsuspend1(struct lwp *l, const sigset_t *ss)
  {
 Index: sys/signalvar.h
 ===================================================================
 RCS file: /cvsroot/src/sys/sys/signalvar.h,v
 retrieving revision 1.80
 diff -u -u -r1.80 signalvar.h
 --- sys/signalvar.h	18 May 2011 03:51:41 -0000	1.80
 +++ sys/signalvar.h	28 May 2011 14:33:08 -0000
 @@ -149,6 +149,7 @@
  int	sigprocmask1(struct lwp *, int, const sigset_t *, sigset_t *);
  void	sigpending1(struct lwp *, sigset_t *);
  void	sigsuspendsetup(struct lwp *, const sigset_t *);
 +void	sigsuspendteardown(struct lwp *);
  int	sigsuspend1(struct lwp *, const sigset_t *);
  int	sigaltstack1(struct lwp *, const struct sigaltstack *,
  	    struct sigaltstack *);

From: "Matthias Scheler" <tron@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/44986 CVS commit: src
Date: Sat, 28 May 2011 15:24:50 +0000

 Module Name:	src
 Committed By:	tron
 Date:		Sat May 28 15:24:49 UTC 2011

 Modified Files:
 	src/distrib/sets/lists/tests: mi
 	src/tests/kernel: Makefile
 Added Files:
 	src/tests/kernel: t_pollts.c

 Log Message:
 Add two test cases for pollts(2):
 - The first tests basic functionality, e.g. timeouts and correct events.
 - The second tests whether pollts(2) correctly restores the signal mask.
   This test currently fails because of PR kern/44986.
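 The committed t_pollts.c is not reproduced in this thread. As a rough idea of what the second check involves, here is a standalone sketch (not the committed test): block SIGUSR1, let pollts() time out with an empty temporary mask, and check that SIGUSR1 is blocked again afterwards. On systems other than NetBSD it maps pollts() to ppoll(), assuming the two share the same argument list; pollts_restores_mask() is a hypothetical helper name.

```c
#if !defined(__NetBSD__) && !defined(_GNU_SOURCE)
#define _GNU_SOURCE		/* exposes ppoll() on glibc */
#endif
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#ifndef __NetBSD__
#define pollts ppoll		/* assumed equivalent argument list */
#endif

/*
 * Returns true if the caller's signal mask (with SIGUSR1 blocked) is in
 * effect again after pollts() returns by timeout rather than by a signal.
 */
bool
pollts_restores_mask(void)
{
	sigset_t blocked, empty, after, saved;
	struct timespec ts = { 0, 10 * 1000 * 1000 };	/* 10 ms */
	bool ok;

	sigemptyset(&blocked);
	sigaddset(&blocked, SIGUSR1);
	sigemptyset(&empty);

	if (sigprocmask(SIG_BLOCK, &blocked, &saved) == -1)
		return false;

	/* No descriptors, so the call should return 0 on timeout. */
	if (pollts(NULL, 0, &ts, &empty) != 0) {
		(void)sigprocmask(SIG_SETMASK, &saved, NULL);
		return false;
	}

	/* A NULL set only queries the current mask. */
	if (sigprocmask(SIG_BLOCK, NULL, &after) == -1) {
		(void)sigprocmask(SIG_SETMASK, &saved, NULL);
		return false;
	}
	ok = sigismember(&after, SIGUSR1) == 1;

	(void)sigprocmask(SIG_SETMASK, &saved, NULL);
	return ok;
}
```

 On a kernel affected by PR kern/44986 one would expect pollts_restores_mask() to return false, since the wait ends by timeout and the empty temporary mask stays installed.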


 To generate a diff of this commit:
 cvs rdiff -u -r1.338 -r1.339 src/distrib/sets/lists/tests/mi
 cvs rdiff -u -r1.10 -r1.11 src/tests/kernel/Makefile
 cvs rdiff -u -r0 -r1.1 src/tests/kernel/t_pollts.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Responsible-Changed-From-To: kern-bug-people->christos
Responsible-Changed-By: tron@NetBSD.org
Responsible-Changed-When: Sat, 28 May 2011 15:28:27 +0000
Responsible-Changed-Why:
Christos is working on a fix.


From: Matthias Scheler <tron@zhadum.org.uk>
To: Christos Zoulas <christos@zoulas.com>
Cc: NetBSD GNATS <gnats-bugs@NetBSD.org>
Subject: Re: kern/44986: "screens" gets stuck during detach
Date: Sat, 28 May 2011 16:31:48 +0100

 On Sat, May 28, 2011 at 10:34:20AM -0400, Christos Zoulas wrote:
 > | And I think I found out why my system makes a lot of pollts(2) calls.
 > | This NetBSD system is a NIS client. And "src/lib/libc/rpc/clnt_dg.c"
 > | uses pollts(2) with a non-NULL argument for the mask.
 > 
 > Makes sense; that is one of the reasons we added it, and it is also why I
 > don't see it on mine. Can you please try this:

 Yes, that fixes the problem. Please commit the change and remove the
 expected failure from "src/tests/kernel/t_pollts.c".

 	Thanks a lot

 -- 
 Matthias Scheler                                  http://zhadum.org.uk/

From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/44986 CVS commit: src/tests/kernel
Date: Sat, 28 May 2011 11:37:11 -0400

 Module Name:	src
 Committed By:	christos
 Date:		Sat May 28 15:37:11 UTC 2011

 Modified Files:
 	src/tests/kernel: t_pollts.c

 Log Message:
 PR/44986 has been fixed.
 BTW: We've created a mess here again with the directory structure of the
 tests. What goes in syscalls, what goes in sys, and what goes in kernel?
 I think test paths should mirror the userland source locations where
 these calls are defined, so everything should go into libc/sys.


 To generate a diff of this commit:
 cvs rdiff -u -r1.1 -r1.2 src/tests/kernel/t_pollts.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Matthias Scheler" <tron@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/44986 CVS commit: src/sys/rump/librump/rumpkern
Date: Sat, 28 May 2011 16:07:44 +0000

 Module Name:	src
 Committed By:	tron
 Date:		Sat May 28 16:07:44 UTC 2011

 Modified Files:
 	src/sys/rump/librump/rumpkern: signals.c

 Log Message:
 Fix the rump build, which was broken by the fix for PR kern/44986.


 To generate a diff of this commit:
 cvs rdiff -u -r1.9 -r1.10 src/sys/rump/librump/rumpkern/signals.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Matthias Scheler" <tron@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/44986 CVS commit: src
Date: Sat, 28 May 2011 16:12:56 +0000

 Module Name:	src
 Committed By:	tron
 Date:		Sat May 28 16:12:56 UTC 2011

 Modified Files:
 	src/distrib/sets/lists/tests: mi
 	src/tests/kernel: Makefile
 	src/tests/syscall: Makefile
 Added Files:
 	src/tests/syscall: t_pollts.c
 Removed Files:
 	src/tests/kernel: t_pollts.c

 Log Message:
 Move the regression test for PR kern/44986 from "kernel" to "syscall", as
 the latter directory seems to be a better fit.


 To generate a diff of this commit:
 cvs rdiff -u -r1.339 -r1.340 src/distrib/sets/lists/tests/mi
 cvs rdiff -u -r1.11 -r1.12 src/tests/kernel/Makefile
 cvs rdiff -u -r1.2 -r0 src/tests/kernel/t_pollts.c
 cvs rdiff -u -r1.27 -r1.28 src/tests/syscall/Makefile
 cvs rdiff -u -r0 -r1.1 src/tests/syscall/t_pollts.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: tron@NetBSD.org
State-Changed-When: Sat, 28 May 2011 16:17:25 +0000
State-Changed-Why:
The bug was fixed by this commit:

http://mail-index.netbsd.org/source-changes/2011/05/28/msg022625.html


>Unformatted:
