NetBSD Problem Report #50990

From martin@aprisoft.de  Mon Mar 21 08:33:55 2016
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 5AA607ABDE
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 21 Mar 2016 08:33:55 +0000 (UTC)
Message-Id: <20160321083338.F1FE6ED0E53@emmas.aprisoft.de>
Date: Mon, 21 Mar 2016 09:33:38 +0100 (CET)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: apache-2.2.31nb2 is not working well on netbsd-7
X-Send-Pr-Version: 3.95

>Number:         50990
>Category:       pkg
>Synopsis:       apache-2.2.31nb2 is not working well on netbsd-7
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    pkg-manager
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Mar 21 08:35:00 +0000 2016
>Last-Modified:  Wed Mar 30 09:45:00 +0000 2016
>Originator:     Martin Husemann
>Release:        NetBSD 7.0_STABLE
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD emmas.aprisoft.de 7.0_STABLE NetBSD 7.0_STABLE (EMMAS) #0: Tue Mar 15 18:40:15 CET 2016 martin@emmas.aprisoft.de:/usr/src-7/sys/arch/i386/compile/EMMAS i386
Architecture: i386
Machine: i386
>Description:

Not sure if this is a pkg or a kern problem - recording here first, need
to investigate more.

I updated this machine from netbsd-6 to netbsd-7 (latest checkout about a week
ago) and then recompiled all pkgs for it.

Since then, apache22 stops serving requests overnight.
When I check in the morning, only a single httpd process is running and
it doesn't answer new connections.

0xbb864bb7 in accept () from /usr/lib/libc.so.12
(gdb) bt
#0  0xbb864bb7 in accept () from /usr/lib/libc.so.12
#1  0xbb9b34a5 in accept () from /usr/lib/libpthread.so.1
#2  0xbb347df4 in cgid_start () from /usr/pkg/lib/httpd/mod_cgid.so
#3  0xbb348a0a in cgid_init () from /usr/pkg/lib/httpd/mod_cgid.so
#4  0x08077381 in ap_run_post_config ()
#5  0x080929f1 in main ()

Another strange thing:
# /etc/rc.d/apache status
apache is not running.

and sometimes starting apache does nothing at all (it just exits with status 0).

>How-To-Repeat:
no idea

>Fix:
n/a

>Audit-Trail:
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: pkg-manager@netbsd.org, gnats-admin@netbsd.org, pkgsrc-bugs@netbsd.org
Subject: Re: pkg/50990: apache-2.2.31nb2 is not working well on netbsd-7
Date: Mon, 21 Mar 2016 09:47:31 +0100

 On Mon, Mar 21, 2016 at 08:35:00AM +0000, martin@NetBSD.org wrote:
 > Not sure if this is a pkg or a kern problem - recording here first, need
 > to investigate more.
 > 
 > I updated this machine from netbsd-6 to netbsd-7 (latest checkout about a week
 > ago) and then recompiled all pkgs for it.
 > 
 > Since then, apache22 stops serving requests overnight.
 > When I check in the morning, only a single httpd process is running and
 > it doesn't answer new connections.

 Do you see zombie httpd processes? If so, it could be
 http://mail-index.netbsd.org/current-users/2015/02/13/msg026686.html

 A workaround is to put
 Listen [::]:80
 Listen 0.0.0.0:80

 instead of
 Listen 0.0.0.0:80
 Listen [::]:80

 (i.e. put the wildcard v4 address last)

 If you don't use wildcard addresses, then your problem is something else.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: pkg/50990: apache-2.2.31nb2 is not working well on netbsd-7
Date: Mon, 21 Mar 2016 10:06:54 +0100

 On Mon, Mar 21, 2016 at 08:50:00AM +0000, Manuel Bouyer wrote:
 >  If you don't use wildcard addresses, then your problem is something else.

 This is an internal-only (and v4-only) server; the only Listen directive
 is:

 Listen 80

 But I found this error:

 [Mon Mar 21 09:39:20 2016] [error] (28)No space left on device: mod_python: Failed to create global mutex 1 of 8 (/tmp/mpmtx147861).

 but I am not sure if that is a path name for a temporary file or something
 else (shm_open?)

 I guess something there is leaking and will check all tmpfs'es next time
 it happens.

 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: pkg/50990: apache-2.2.31nb2 is not working well on netbsd-7
Date: Tue, 22 Mar 2016 09:17:37 +0100

 It is not about space on /tmp, but shm_open() failing - I'll fix it.

 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: pkg/50990: apache-2.2.31nb2 is not working well on netbsd-7
Date: Wed, 23 Mar 2016 22:24:45 +0100

 Ok, shm_open fixed, but that did not do it.

 ktrace shows:

  14631      1 httpd    CALL  write(2,0xbfbfa97c,0x4c)
  14631      1 httpd    GIO   fd 2 wrote 76 bytes
        "[Wed Mar 23 21:27:31 2016] [notice] mod_python: using mutex_directory \
         /tmp \n"
  14631      1 httpd    RET   write 76/0x4c
  14631      1 httpd    CALL  semget(0,1,0x380)
  14631      1 httpd    RET   semget 196616/0x30008
  14631      1 httpd    CALL  ____semctl50(0x30008,0,8,0xbfbfe944)
  14631      1 httpd    RET   ____semctl50 0
  14631      1 httpd    CALL  geteuid
  14631      1 httpd    RET   geteuid 0
  14631      1 httpd    CALL  __posix_chown(0xbfbfeb01,0x7fff,0xffffffff)
  14631      1 httpd    NAMI  "/tmp/mpmtx146310"
  14631      1 httpd    RET   __posix_chown -1 errno 2 No such file or directory

 and that is the first time "mpmtx146310" is ever mentioned in the kdump.
 I fail to map this to source code (up to the semctl and geteuid it is clear,
 but I can't find the __posix_chown call).

 However, the failure happens a few lines later with:

  14631      1 httpd    CALL  semget(0,1,0x380)
  14631      1 httpd    RET   semget -1 errno 28 No space left on device

 but that would mean the system is out of semaphores - but ipcs -T says:

 seminfo:
         semmap:     30  (# of entries in semaphore map)
         semmni:     10  (# of semaphore identifiers)
         semmns:     60  (# of semaphores in system)
         semmnu:     30  (# of undo structures in system)
         semmsl:     60  (max # of semaphores per id)
         semopm:    100  (max # of operations per semop call)
         semume:     10  (max # of undo entries per process)
         semusz:    100  (size in bytes of undo structure)
         semvmx:  32767  (semaphore maximum value)
         semaem:  16384  (adjust on exit max value)

 and this kernel is mostly i386 GENERIC (with a few unrelated additions).

 After httpd is gone, ipcs output is:

 IPC status from <running system> as of Wed Mar 23 22:23:47 2016

 Message Queues:
 T        ID     KEY        MODE       OWNER    GROUP

 Shared Memory:
 T        ID     KEY        MODE       OWNER    GROUP

 Semaphores:
 T        ID     KEY        MODE       OWNER    GROUP
 s    131072          0 --rw-------   nobody   nobody
 s    131073          0 --rw-------   nobody   nobody
 s    131074          0 --rw-------   nobody   nobody
 s    131075          0 --rw-------   nobody   nobody
 s    131076          0 --rw-------   nobody   nobody
 s    131077          0 --rw-------   nobody   nobody
 s    131078          0 --rw-------   nobody   nobody
 s    131079          0 --rw-------   nobody   nobody


 Any ideas where to look?

 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: pkg/50990: apache-2.2.31nb2 is not working well on netbsd-7
Date: Thu, 24 Mar 2016 21:57:11 +0100

 After noticing that a regular "chown()" is renamed to posix_chown,
 I found the failing source:

 In mod_python.c::init_mutexes we find:

 #if !defined(OS2) && !defined(WIN32) && !defined(BEOS) && !defined(NETWARE)
         char fname[255];
         /* XXX What happens if len(mutex_dir) > 255 - len(mpmtx%d%d)? */
         snprintf(fname, 255, "%s/mpmtx%d%d", mutex_dir, glb->parent_pid, n);
 #else
         char *fname = NULL;
 #endif
         rc = apr_global_mutex_create(&mutex[n], fname, APR_LOCK_DEFAULT,   
                                      p);


 So here the magic filename is created, and then apr_global_mutex_create
 goes to a semget() instead of using something in the file system.

 However, if the mutex creation works, we run into this:

 #if !defined(OS2) && !defined(WIN32) && !defined(BEOS) && !defined(NETWARE)
 #if AP_MODULE_MAGIC_AT_LEAST(20081201,0)
             ap_unixd_set_global_mutex_perms(mutex[n]);
 #else
             if (!geteuid()) {
                 chown(fname, unixd_config.user_id, -1);
                 unixd_set_global_mutex_perms(mutex[n]);
             }
 #endif
 #endif

 and execute the #else (20081201 is not in apache 2.2). The failure to chown
 is not fatal here.

 So it all boils down to 

  14631      1 httpd    CALL  semget(0,1,0x380)
  14631      1 httpd    RET   semget -1 errno 28 No space left on device

 and the man page says:

      [ENOSPC]           A new set of semaphores could not be created because
                         the system limit for the number of semaphores or the
                         number of semaphore sets has been reached.

 We are leaking semaphores somewhere but cleaning them up on shutdown
 of apache?

 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: pkg/50990: apache-2.2.31nb2 is not working well on netbsd-7
Date: Fri, 25 Mar 2016 10:58:44 +0100

 I bumped the default limits in /etc/sysctl.conf and it is happier now -
 we'll see if it leaks over time.

 # Apache/Apr/mod_python need this
 kern.ipc.semmni=100
 kern.ipc.semmns=600
 kern.ipc.semmnu=300


 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: pkg/50990: apache-2.2.31nb2 is not working well on netbsd-7
Date: Mon, 28 Mar 2016 11:43:04 +0200

 On Fri, Mar 25, 2016 at 10:58:44AM +0100, Martin Husemann wrote:
 > I bumped the default limits in /etc/sysctl.conf and it is happier now -
 > we'll see if it leaks over time.
 > 
 > # Apache/Apr/mod_python need this
 > kern.ipc.semmni=100
 > kern.ipc.semmns=600
 > kern.ipc.semmnu=300

 Unfortunately it does not help.

 Is nobody else using mod_python and apache22 ?

 Martin

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: pkg-manager@netbsd.org, gnats-admin@netbsd.org, pkgsrc-bugs@netbsd.org,
        martin@NetBSD.org
Subject: Re: pkg/50990: apache-2.2.31nb2 is not working well on netbsd-7
Date: Wed, 30 Mar 2016 11:36:22 +0200

 On Mon, Mar 28, 2016 at 09:45:01AM +0000, Martin Husemann wrote:
 > The following reply was made to PR pkg/50990; it has been noted by GNATS.
 > 
 > From: Martin Husemann <martin@duskware.de>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: pkg/50990: apache-2.2.31nb2 is not working well on netbsd-7
 > Date: Mon, 28 Mar 2016 11:43:04 +0200
 > 
 >  On Fri, Mar 25, 2016 at 10:58:44AM +0100, Martin Husemann wrote:
 >  > I bumped the default limits in /etc/sysctl.conf and it is happier now -
 >  > we'll see if it leaks over time.
 >  > 
 >  > # Apache/Apr/mod_python need this
 >  > kern.ipc.semmni=100
 >  > kern.ipc.semmns=600
 >  > kern.ipc.semmnu=300
 >  
 >  Unfortunately it does not help.
 >  
 >  Is nobody else using mod_python and apache22 ?

 I do, but not from HEAD (this is on netbsd-7/i386):
 ap22-py27-python-3.5.0nb1 Apache module that embeds the Python interpreter
 apache-2.2.31nb1    Apache HTTP (Web) server, version 2.2

 IPC status from <running system> as of Wed Mar 30 11:34:34 2016

 Message Queues:
 T        ID     KEY        MODE       OWNER    GROUP

 Shared Memory:
 T        ID     KEY        MODE       OWNER    GROUP
 m     65536    5432001 --rw-------    pgsql    pgsql

 Semaphores:
 T        ID     KEY        MODE       OWNER    GROUP
 s     65536    5432001 --rw-------    pgsql    pgsql
 s     65537    5432002 --rw-------    pgsql    pgsql
 s     65538    5432003 --rw-------    pgsql    pgsql
 s    589827          0 --rw-------      www      www
 s    589828          0 --rw-------      www      www
 s    589829          0 --rw-------      www      www
 s    589830          0 --rw-------      www      www
 s    589831          0 --rw-------      www      www


 I've not noticed problems like yours.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Martin Husemann <martin@duskware.de>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: pkg/50990: apache-2.2.31nb2 is not working well on netbsd-7
Date: Wed, 30 Mar 2016 11:43:17 +0200

 On Wed, Mar 30, 2016 at 11:36:22AM +0200, Manuel Bouyer wrote:
 > I do, but not from HEAD (this is on netbsd-7/i386):

 I am using -7 too, but pkgsrc HEAD.

 The semaphore issue seems to be fixed (now with the higher limits), but
 still restarting the server does not always work. I had a cron job
 running that shut it down for a short period at night, then did log
 rotation and various cleanup before firing it up again. This restart
 did not work. (This is an internal server mostly used to serve svn
 repos to restricted internal machines)

 I disabled that job now and changed log rotation to "|rotatelogs" and
 it keeps running.

 Still, something is wrong.

 Martin

Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.