NetBSD Problem Report #41302
From martin@duskware.de Wed Apr 29 08:59:53 2009
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id 4D7C363BC62
for <gnats-bugs@gnats.NetBSD.org>; Wed, 29 Apr 2009 08:59:53 +0000 (UTC)
Message-Id: <20090429085949.A971133AAC@mail.duskware.de>
Date: Wed, 29 Apr 2009 10:59:44 +0200 (CEST)
From: martin@duskware.de
Reply-To: martin@duskware.de
To: gnats-bugs@gnats.NetBSD.org
Subject: cron dies at startup
X-Send-Pr-Version: 3.95
>Number: 41302
>Category: port-sparc64
>Synopsis: cron dies at startup
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: martin
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Apr 29 09:00:00 +0000 2009
>Closed-Date: Thu May 21 13:26:17 +0000 2009
>Last-Modified: Tue May 26 19:20:08 +0000 2009
>Originator: Martin Husemann
>Release: NetBSD 5.0
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD setting-sun.duskware.de 5.0 NetBSD 5.0 (SETTINGSUN) #1: Wed Apr 29 07:57:43 CEST 2009 martin@night-porter.duskware.de:/usr/src-5/sys/arch/sparc64/compile/SETTINGSUN sparc64
Architecture: sparc64
Machine: sparc64
>Description:
After upgrading my system to 5.0, cron sometimes goes missing. This is not
100% reproducible, but often happens at system startup (i.e. init running
/etc/rc) - when I log in as root and run "/etc/rc.d/cron start" manually
it always seems to work and cron keeps running.
Maybe some resource limit problem preventing the initial fork?
>How-To-Repeat:
s/a
>Fix:
n/a
>Release-Note:
>Audit-Trail:
From: "Jeremy C. Reed" <reed@reedmedia.net>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/41302: cron dies at startup
Date: Wed, 29 Apr 2009 09:02:17 -0500 (CDT)
I wonder if this is related to
http://mail-index.netbsd.org/netbsd-users/2009/02/09/msg002977.html
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, martin@duskware.de
Subject: re: bin/41302: cron dies at startup
Date: Thu, 30 Apr 2009 17:06:29 +1000
> From: "Jeremy C. Reed" <reed@reedmedia.net>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: bin/41302: cron dies at startup
> Date: Wed, 29 Apr 2009 09:02:17 -0500 (CDT)
>
> I wonder if this is related to
> http://mail-index.netbsd.org/netbsd-users/2009/02/09/msg002977.html
i think it's the same problem.
that one is very strange. clearly it is dying in daemon() right
after fork returns... but it has something to do with SIGHUP occurring
right at this moment:
429 1 cron CALL fork
429 1 cron RET fork 343/0x157
429 1 cron CALL exit(0)
343 1 cron EMUL "netbsd"
343 1 cron PSIG SIGHUP caught handler=0x102880 mask=(): code=SI_NOINFO
343 1 cron RET fork 0
343 1 cron CALL setcontext(0xffffffffffffb660)
343 1 cron RET setcontext JUSTRETURN
343 1 cron CALL getpid
343 1 cron RET getpid 343/0x157, 1
343 1 cron CALL gettimeofday(0xffffffffffffab40,0)
343 1 cron RET gettimeofday 0
429 is the parent, and 343 is the child. the parent fork()'s and exits
just like in daemon() but the child doesn't really get to run any more.
the first thing it should do is call setsid(), but we don't see that
before we see the failure starting (getpid/gettimeofday both are used
to generate the failure message.)
cron has a SIGHUP handler that looks like:
static void
sighup_handler(int x __unused)
{
	log_close();
}

void
log_close(void) {
	if (LogFD != ERR) {
		close(LogFD);
		LogFD = ERR;
	}
}
in the above log, pid 343 starts in emul netbsd, gets a SIGHUP and
has a handler (does it run here? i'm not sure.) but then we get
the RET into this child right after, and then a setcontext... i'm
not sure what exactly is going on here, but this is clearly where
it all goes wrong. why is a SIGHUP happening, and why is it making
the child fail?
.mrg.
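
For readers following the trace, the sequence described above (fork, the
parent exits, the child calls setsid()) can be sketched roughly as below.
This is a simplified illustration and not the libc daemon() source.
detach_sketch() and check_detach_sketch() are hypothetical names; the second
helper exists only so the first can be exercised from a throwaway process.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: the fork/exit/setsid core of a daemon()-style call.
 * Returns 0 in the detached child, -1 on failure; the parent never returns.
 */
static int
detach_sketch(void)
{
	switch (fork()) {
	case -1:
		return -1;
	case 0:
		break;		/* child continues below */
	default:
		_exit(0);	/* parent: the "CALL exit(0)" in the trace */
	}
	if (setsid() == -1)	/* first thing the child should do */
		return -1;
	return 0;
}

/*
 * Runs detach_sketch() in a throwaway process. Returns 1 if the detached
 * child became a session leader, 0 if it did not, -1 on a setup error.
 */
static int
check_detach_sketch(void)
{
	int p[2];
	char ok = 0;
	pid_t pid;
	ssize_t n;

	if (pipe(p) == -1)
		return -1;
	pid = fork();
	if (pid == -1) {
		close(p[0]);
		close(p[1]);
		return -1;
	}
	if (pid == 0) {
		close(p[0]);
		if (detach_sketch() == 0 && getsid(0) == getpid())
			ok = 1;
		if (write(p[1], &ok, 1) != 1)	/* report back over the pipe */
			_exit(1);
		_exit(0);
	}
	close(p[1]);
	n = read(p[0], &ok, 1);
	close(p[0]);
	(void)waitpid(pid, NULL, 0);
	return n == 1 ? ok : -1;
}
```

On a healthy system check_detach_sketch() should return 1. In the failing
trace above the child never reaches the setsid() step.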
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/41302: cron dies at startup
Date: Sun, 3 May 2009 11:04:27 +0200
This is a funny one:
init (pid 1) runs /bin/sh (pid 2) to execute /etc/rc. Now in my startup, cron
is the last daemon to start. While cron is doing the daemonize() dance,
sh is done and exits, via exit1(), which contains this code:
	if (tp->t_session == sp) {
		/* we can't guarantee the revoke will do this */
		pgrp = tp->t_pgrp;
		tp->t_pgrp = NULL;
		tp->t_session = NULL;
		mutex_spin_exit(&tty_lock);
		if (pgrp != NULL) {
			pgsignal(pgrp, SIGHUP, 1);
		}
pgrp is 2, and the cron parent process is still in this group.
*booom*
I wonder if daemonize() should ignore SIGHUP (and undo that in the child)?
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/41302: cron dies at startup
Date: Sun, 3 May 2009 12:14:28 +0200
This is a patch similar to what FreeBSD did to fix this problem (the only
difference is restoring the signal handler when fork() fails).
OK to commit?
Martin
Index: daemon.c
===================================================================
RCS file: /cvsroot/src/lib/libc/gen/daemon.c,v
retrieving revision 1.9
diff -u -r1.9 daemon.c
--- daemon.c 7 Aug 2003 16:42:46 -0000 1.9
+++ daemon.c 3 May 2009 10:11:17 -0000
@@ -39,9 +39,11 @@
 #endif /* LIBC_SCCS and not lint */
 
 #include "namespace.h"
+#include <errno.h>
 #include <fcntl.h>
 #include <paths.h>
 #include <stdlib.h>
+#include <signal.h>
 #include <unistd.h>
 
 #ifdef __weak_alias
@@ -52,10 +54,25 @@
 daemon(nochdir, noclose)
 	int nochdir, noclose;
 {
+	struct sigaction osa, sa;
 	int fd;
+	pid_t newgrp;
+	int oerrno;
+	int osa_ok;
+
+	/* A SIGHUP may be thrown when the parent exits below. */
+	sigemptyset(&sa.sa_mask);
+	sa.sa_handler = SIG_IGN;
+	sa.sa_flags = 0;
+	osa_ok = sigaction(SIGHUP, &sa, &osa);
 
 	switch (fork()) {
 	case -1:
+		if (osa_ok != -1) {
+			oerrno = errno;
+			sigaction(SIGHUP, &osa, NULL);
+			errno = oerrno;
+		}
 		return (-1);
 	case 0:
 		break;
@@ -63,8 +80,14 @@
 		_exit(0);
 	}
 
-	if (setsid() == -1)
+	newgrp = setsid();
+	oerrno = errno;
+	if (osa_ok != -1)
+		sigaction(SIGHUP, &osa, NULL);
+	if (newgrp == -1) {
+		errno = oerrno;
 		return (-1);
+	}
 
 	if (!nochdir)
 		(void)chdir("/");
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/41302: cron dies at startup
Date: Mon, 18 May 2009 11:25:37 +0200
On Sun, May 03, 2009 at 11:04:27AM +0200, Martin Husemann wrote:
> pgrp is 2, and the cron parent process is still in this group.
I looked a bit further and it seems that the cron signal handler runs,
returns, and then fork returns to the daemonize() call with child pid = -1
and errno = 0.
Looks like a sparc64 specific bug...
Martin
From: Martin Husemann <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/41302 CVS commit: src/sys/arch/sparc64/sparc64
Date: Thu, 21 May 2009 13:24:38 +0000
Module Name: src
Committed By: martin
Date: Thu May 21 13:24:38 UTC 2009
Modified Files:
src/sys/arch/sparc64/sparc64: vm_machdep.c
Log Message:
Deja Vu: when preparing the initial trap frame for a new forked lwp,
explicitly clear condition code. Otherwise we might catch a signal
(handlers are inherited from the parent) before we ever return to
userland. The current trapframe is converted into a ucontext and after
the signal handler returns, the lwp stays in userland and directly
uses the ucontext to return to the fork call.
Fixes PR 41302.
To generate a diff of this commit:
cvs rdiff -u -r1.87 -r1.88 src/sys/arch/sparc64/sparc64/vm_machdep.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
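
One way to picture the failure mode the log message describes is the toy
model below. It is not the vm_machdep.c change. It assumes the SPARC
convention that a system call reports failure by returning with the carry
bit set and the error number in the first output register, so a stale carry
bit in the child's frame would make the libc stub treat the child's return
value 0 as an errno. All names (MODEL_CARRY, struct model_frame,
model_fork_frame(), model_stub_return()) are invented for illustration, and
the single-bit condition code is a simplification.

```c
#include <stdint.h>

#define MODEL_CARRY	0x1u	/* stands in for the carry condition-code bit */

struct model_frame {
	uint64_t cc;	/* condition codes saved in the frame */
	int64_t out0;	/* first return-value register */
};

struct model_result {
	long ret;	/* value the caller of fork() sees */
	int err;	/* value stored to errno, 0 if untouched */
};

/* What a syscall stub does on return: carry set means out0 is an errno. */
static struct model_result
model_stub_return(const struct model_frame *tf)
{
	struct model_result r;

	if (tf->cc & MODEL_CARRY) {
		r.ret = -1;
		r.err = (int)tf->out0;
	} else {
		r.ret = (long)tf->out0;
		r.err = 0;
	}
	return r;
}

/* Builds the child's initial frame; clear_cc models the committed fix. */
static struct model_frame
model_fork_frame(const struct model_frame *parent, int clear_cc)
{
	struct model_frame child = *parent;	/* copied from the parent */

	child.out0 = 0;				/* fork() returns 0 in the child */
	if (clear_cc)
		child.cc &= ~(uint64_t)MODEL_CARRY;
	return child;
}
```

With a parent frame whose carry bit happens to be set, the uncleared child
frame decodes as a return of -1 with errno 0, which matches the symptom
reported earlier in this PR; the cleared frame decodes as the expected 0.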
Responsible-Changed-From-To: bin-bug-people->martin
Responsible-Changed-By: martin@NetBSD.org
Responsible-Changed-When: Thu, 21 May 2009 13:26:17 +0000
Responsible-Changed-Why:
I broke it (again)
State-Changed-From-To: open->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Thu, 21 May 2009 13:26:17 +0000
State-Changed-Why:
I fixed it
From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/41302 CVS commit: [netbsd-5-0] src/sys/arch/sparc64/sparc64
Date: Tue, 26 May 2009 19:18:05 +0000
Module Name: src
Committed By: snj
Date: Tue May 26 19:18:05 UTC 2009
Modified Files:
src/sys/arch/sparc64/sparc64 [netbsd-5-0]: vm_machdep.c
Log Message:
Pull up following revision(s) (requested by martin in ticket #774):
sys/arch/sparc64/sparc64/vm_machdep.c: revision 1.88
Deja Vu: when preparing the initial trap frame for a new forked lwp,
explicitly clear condition code. Otherwise we might catch a signal
(handlers are inherited from the parent) before we ever return to
userland. The current trapframe is converted into a ucontext and after
the signal handler returns, the lwp stays in userland and directly
uses the ucontext to return to the fork call.
Fixes PR 41302.
To generate a diff of this commit:
cvs rdiff -u -r1.84 -r1.84.6.1 src/sys/arch/sparc64/sparc64/vm_machdep.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/41302 CVS commit: [netbsd-5] src/sys/arch/sparc64/sparc64
Date: Tue, 26 May 2009 19:19:53 +0000
Module Name: src
Committed By: snj
Date: Tue May 26 19:19:53 UTC 2009
Modified Files:
src/sys/arch/sparc64/sparc64 [netbsd-5]: vm_machdep.c
Log Message:
Pull up following revision(s) (requested by martin in ticket #774):
sys/arch/sparc64/sparc64/vm_machdep.c: revision 1.88
Deja Vu: when preparing the initial trap frame for a new forked lwp,
explicitly clear condition code. Otherwise we might catch a signal
(handlers are inherited from the parent) before we ever return to
userland. The current trapframe is converted into a ucontext and after
the signal handler returns, the lwp stays in userland and directly
uses the ucontext to return to the fork call.
Fixes PR 41302.
To generate a diff of this commit:
cvs rdiff -u -r1.84 -r1.84.4.1 src/sys/arch/sparc64/sparc64/vm_machdep.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted: