NetBSD Problem Report #43877

From apb@cequrux.com  Tue Sep 14 07:30:38 2010
Return-Path: <apb@cequrux.com>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 5CE8363BC98
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 14 Sep 2010 07:30:38 +0000 (UTC)
Message-Id: <20100914071601.BC38D100AA6F@apb-laptoy.apb.alt.za>
Date: Tue, 14 Sep 2010 06:11:45 +0000 (UTC)
From: apb@cequrux.com
Reply-To: apb@cequrux.com
To: gnats-bugs@gnats.NetBSD.org
Subject: named hangs with 5.99.39 kernel, 5.99.27 userland
X-Send-Pr-Version: 3.95

>Number:         43877
>Category:       kern
>Synopsis:       named hangs with 5.99.39 kernel, 5.99.27 userland
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Sep 14 07:35:00 +0000 2010
>Last-Modified:  Sun Dec 26 15:30:02 +0000 2010
>Originator:     Alan Barrett
>Release:        NetBSD 5.99.39
>Organization:
Not much
>Environment:
Kernel: NetBSD 5.99.39 i386
Userland: NetBSD 5.99.27 i386
>Description:

I booted a 5.99.39 kernel (built from sources checked out with cvs
update -D '2010-09-12 12:00 UTC') on a system with a userland from a few
months ago (version 5.99.27, built from sources checked out with cvs
update -D '2010-04-18 12:00 UTC').

One of the first things that I noticed was that "/etc/rc.d/named
forcerestart" hung.

>How-To-Repeat:

# /etc/rc.d/named forcerestart
Stopping named.
Waiting for PIDS: 177, 177, 177, 177, 177, 177, 177, 177, 177, 177,
177, 177, 177, 177, 177, 177, 177, 177, 177, 177, 177, 177, 177, 177,
177, 177, 177, 177, 177, 177, 177, 177, 177, 177, 177, 177, 177, 177,
177, 177, 177, 177, 177, 177, 177 [... this continues forever]

In another window:

# ps -axlsww | awk 'NR==1 || /named/ {print}'
  UID  PID PPID  CPU PRI NI    VSZ    RSS WCHAN    STAT TTY       TIME COMMAND
   14  177    1  542  85  0  31140  14348 kqueue   Isl  ?      0:00.15 /usr/sbin/named -u named -t /var/chroot/named
   14  177    1  542  43  0  31140  14348 parked   Isl  ?      0:00.15 /usr/sbin/named -u named -t /var/chroot/named
   14  177    1  542  43  0  31140  14348 parked   Isl  ?      0:00.15 /usr/sbin/named -u named -t /var/chroot/named
   14  177    1  542  43  0  31140  14348 parked   Isl  ?      0:00.15 /usr/sbin/named -u named -t /var/chroot/named
   14  177    1  542  85  0  31140  14348 sigwait  Isl  ?      0:00.15 /usr/sbin/named -u named -t /var/chroot/named
    0  387 1171  431  85  0   3144   1256 wait     I+   ttyp2  0:00.01 /bin/sh /etc/rc.d/named forcerestart
    0 1452  387    0  85  0   3144   1252 wait     S+   ttyp2  0:00.02 /bin/sh /etc/rc.d/named forcestop
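
The repeated "177" in the output above appears to come from the rc script
polling for the old named process to exit. While debugging, a bounded wait
avoids an endless loop. The following is a minimal sketch, not the rc.subr
code: the function name wait_for_pid_with_timeout and its poll-count and
interval parameters are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (NOT the rc.subr implementation): wait for a PID to
# exit, but give up after a bounded number of polls instead of looping forever.
wait_for_pid_with_timeout() {
	pid=$1
	max_polls=${2:-30}	# assumed default: 30 polls
	interval=${3:-2}	# assumed default: 2 seconds between polls
	polls=0
	# kill -0 delivers no signal; it only checks that the process exists
	# and that we are allowed to signal it (run as root or as the owner).
	while kill -0 "$pid" 2>/dev/null; do
		if [ "$polls" -ge "$max_polls" ]; then
			echo "pid $pid still running after $polls polls" >&2
			return 1
		fi
		polls=$((polls + 1))
		sleep "$interval"
	done
	return 0
}
```

For example, "wait_for_pid_with_timeout 177 15 2" would poll PID 177 for
roughly 30 seconds and return non-zero if named is still present.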

>Fix:
Unknown.  It may also be relevant that "kqueue" appears in the report of
test failures from anita:

Failed test cases:
    fs/vfs/t_renamerace:lfs_renamerace,
    fs/vfs/t_renamerace:lfs_renamerace_dirs, lib/libevent/t_event:kqueue,
    lib/libevent/t_event:poll, lib/libevent/t_event:select,
    util/sort/t_sort:any_char

>Audit-Trail:
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: kern/43877: named hangs with 5.99.39 kernel, 5.99.27 userland
Date: Tue, 14 Sep 2010 18:07:33 +1000

 i saw this on sparc64, too.  i thought it was the old sparc vs. pthreads
 vs. bind issue, so i disabled pthreads in bind, and that made my bind work
 again.  i have a small patch to the bind build that makes the
 NAMED_USE_PTHREADS=no setting work again; i plan to commit it once i've
 confirmed it doesn't break anything else.  (see below)

 in my case, i didn't get to starting named because dhcpcd tried to run
 "dig" which never exited for me, and my boot hung there.


 .mrg.


 Index: Makefile.inc
 ===================================================================
 RCS file: /cvsroot/src/external/bsd/bind/Makefile.inc,v
 retrieving revision 1.5
 diff -p -r1.5 Makefile.inc
 *** Makefile.inc	6 Aug 2010 10:58:03 -0000	1.5
 --- Makefile.inc	14 Sep 2010 08:06:28 -0000
 *************** CPPFLAGS+= -DLIBINTERFACE=${LIBINTERFACE
 *** 74,79 ****
 --- 74,80 ----
   .if ${NAMED_USE_PTHREADS} == "yes"
   # XXX: Not ready yet
   # CPPFLAGS+=	-DISC_PLATFORM_USE_NATIVE_RWLOCKS
 + CPPFLAGS+=	-DISC_PLATFORM_USETHREADS
   .if !defined (LIB) || empty(LIB)
   LDADD+= -lpthread
   DPADD+= ${LIBPTHREAD}
 Index: include/isc/platform.h
 ===================================================================
 RCS file: /cvsroot/src/external/bsd/bind/include/isc/platform.h,v
 retrieving revision 1.5
 diff -p -r1.5 platform.h
 *** include/isc/platform.h	6 Aug 2010 10:58:13 -0000	1.5
 --- include/isc/platform.h	14 Sep 2010 08:06:32 -0000
 ***************
 *** 207,213 ****
   /*
    * Defined if we are using threads.
    */
 ! #define ISC_PLATFORM_USETHREADS 1

   /*
    * Defined if unistd.h does not cause fd_set to be delared.
 --- 207,214 ----
   /*
    * Defined if we are using threads.
    */
 ! /* Put in the Makefile */
 ! /* #define ISC_PLATFORM_USETHREADS 1 */

   /*
    * Defined if unistd.h does not cause fd_set to be delared.

From: Alan Barrett <apb@cequrux.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/43877: named hangs with 5.99.39 kernel, 5.99.27 userland
Date: Sun, 26 Dec 2010 17:06:20 +0200

 On Tue, 14 Sep 2010, apb@cequrux.com wrote:
 > >Synopsis:       named hangs with 5.99.39 kernel, 5.99.27 userland
 > I booted a 5.99.39 kernel (built from sources checked out with cvs
 > update -D '2010-09-12 12:00 UTC') on a system with a userland from a few
 > months ago (version 5.99.27, built from sources checked out with cvs
 > update -D '2010-04-18 12:00 UTC').

 This is still an issue with a 5.99.40 kernel and 5.99.27 userland.  What
 can I do to help debug this failure of backward compatibility?

 I'd really like to upgrade from 5.99.27, but as long as an old userland
 doesn't work with a new kernel, I can't upgrade safely.

 --apb (Alan Barrett)

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, apb@cequrux.com
Cc: 
Subject: Re: kern/43877: named hangs with 5.99.39 kernel, 5.99.27 userland
Date: Sun, 26 Dec 2010 10:28:06 -0500

 On Dec 26,  3:10pm, apb@cequrux.com (Alan Barrett) wrote:
 -- Subject: Re: kern/43877: named hangs with 5.99.39 kernel, 5.99.27 userland

 | The following reply was made to PR kern/43877; it has been noted by GNATS.
 | 
 | From: Alan Barrett <apb@cequrux.com>
 | To: gnats-bugs@NetBSD.org
 | Cc: 
 | Subject: Re: kern/43877: named hangs with 5.99.39 kernel, 5.99.27 userland
 | Date: Sun, 26 Dec 2010 17:06:20 +0200
 | 
 |  On Tue, 14 Sep 2010, apb@cequrux.com wrote:
 |  > >Synopsis:       named hangs with 5.99.39 kernel, 5.99.27 userland
 |  > I booted a 5.99.39 kernel (built from sources checked out with cvs
 |  > update -D '2010-09-12 12:00 UTC') on a system with a userland from a few
 |  > months ago (version 5.99.27, built from sources checked out with cvs
 |  > update -D '2010-04-18 12:00 UTC').
 |  
 |  This is still an issue with a 5.99.40 kernel and 5.99.27 userland.  What
 |  can I do to help debug this failure of backward compatibility?
 |  
 |  I'd really like to upgrade from 5.99.27, but as long as an old userland
 |  doesn't work with a new kernel, I can't upgrade safely.

 What does a ktrace show? Where is it getting stuck?

 christos
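
One way to collect the trace asked about above is to attach ktrace(1) to the
running named process, wait while the hang is reproduced, and then dump the
result with kdump(1). The following is only a sketch under assumptions: the
helper names run and trace_pid, the DRY_RUN switch, the trace file name, and
the wait time are all illustrative, and PID 177 is taken from the ps output
in the original report.

```shell
#!/bin/sh
# Sketch only: helper names, DRY_RUN, and defaults are illustrative choices.

# Run a command, or just print it when DRY_RUN=1 (useful for review first).
run() {
	if [ "${DRY_RUN:-0}" = 1 ]; then
		echo "+ $*"
	else
		"$@"
	fi
}

# Attach ktrace to an existing process, wait, detach, and dump the trace.
trace_pid() {
	pid=$1
	out=${2:-ktrace.out}	# trace file (ktrace's usual default name)
	seconds=${3:-10}	# assumed time needed to reproduce the hang
	run ktrace -f "$out" -p "$pid" || return 1	# start tracing the process
	run sleep "$seconds"	# run "/etc/rc.d/named forcestop" meanwhile
	run ktrace -c -p "$pid"	# clear the trace points on the process
	run kdump -f "$out"	# decode the trace file
}
```

Run as root, "trace_pid 177 /tmp/named.ktrace 30" would trace named for 30
seconds; the last events in the kdump output should indicate what each
thread was doing when it stopped making progress.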
