NetBSD Problem Report #53005

From SRS0=+HUE=FE=mail.csel.org=clare@csel.org  Sat Feb 10 18:45:26 2018
Return-Path: <SRS0=+HUE=FE=mail.csel.org=clare@csel.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id B857E7A18A
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 10 Feb 2018 18:45:26 +0000 (UTC)
Message-Id: <20180210184522.E0E30FE42@mail.csel.org>
Date: Sun, 11 Feb 2018 03:45:22 +0900 (JST)
From: Shinichi Doyashiki <clare@csel.org>
Reply-To: clare@csel.org
To: gnats-bugs@NetBSD.org
Subject: apache httpd can hang the system
X-Send-Pr-Version: 3.95

>Number:         53005
>Category:       kern
>Synopsis:       apache httpd can hang the system
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Feb 10 18:50:00 +0000 2018
>Last-Modified:  Sun Feb 18 09:50:01 +0000 2018
>Originator:     Shinichi Doyashiki
>Release:        NetBSD 8.99.12
>Organization:
	at home
>Environment:
System: NetBSD kotori.csel.org 8.99.12 NetBSD 8.99.12 (CONOHAVPS) #3: Sun Jan 21 16:34:31 JST 2018 clare@mizuki.csel.org:/export/stage/hack/sys/arch/amd64/compile/CONOHAVPS amd64
Architecture: x86_64
Machine: amd64
>Description:
	The system has recently started behaving strangely.
	Running "/usr/local/apache/bin/apachectl stop" is OK.
	Running "/usr/local/apache/bin/apachectl start" is also OK.
	Running "/usr/local/apache/bin/apachectl restart" is bad:
	the system hangs or behaves strangely after the
	operation, and a reboot is required to recover.
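
The sequence described above can be sketched as a small script. This is only an illustration, not a confirmed reproducer: the APACHECTL and PAUSE variables and the run_sequence function name are assumptions added here, and on an affected kernel the last step may hang the machine.

```sh
#!/bin/sh
# Hypothetical helper: run the stop/start/restart sequence from the report.
# APACHECTL defaults to the path used in the report; override it to point
# at another installation.
APACHECTL=${APACHECTL:-/usr/local/apache/bin/apachectl}
PAUSE=${PAUSE:-1}	# assumed settle time between steps, in seconds

run_sequence() {
	for action in stop start restart; do
		echo "apachectl $action"
		# Report a failing step on stderr but keep going.
		"$APACHECTL" "$action" || echo "apachectl $action failed" >&2
		sleep "$PAUSE"
	done
}

# Usage (commented out so that sourcing this file has no side effects):
# run_sequence
```

Keeping a console session open while this runs makes it easier to notice at which step the system stops responding.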

Server version: Apache/2.4.29 (Unix)
Server built:   Nov 19 2017 18:59:53
Server's Module Magic Number: 20120211:68
Server loaded:  APR 1.6.3, APR-UTIL 1.6.1
Compiled using: APR 1.6.3, APR-UTIL 1.6.1
Architecture:   64-bit
Server MPM:     event
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)
Server compiled with....
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses disabled)
 -D APR_USE_SYSVSEM_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE
 -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D DYNAMIC_MODULE_LIMIT=256
 -D HTTPD_ROOT="/usr/local/apache"
 -D SUEXEC_BIN="/usr/local/apache/bin/suexec"
 -D DEFAULT_PIDLOG="logs/httpd.pid"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D AP_TYPES_CONFIG_FILE="conf/mime.types"
 -D SERVER_CONFIG_FILE="conf/httpd.conf"

#!/bin/sh
name=httpd
version=2.4.29
srcdir=$name-$version
objdir=$name-obj

rm -rf $objdir
mkdir -p $objdir
cd $objdir || exit

../$srcdir/configure \
	--prefix=/usr/local/apache \
	--with-nghttp2=/usr/pkg \
	--with-apr=/usr/pkg \
	--enable-so \
	--enable-http2

>How-To-Repeat:
	Not yet known.
>Fix:
	Not yet known.

>Audit-Trail:
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53005: apache httpd can hang the system
Date: Sun, 11 Feb 2018 19:09:03 +0100

 On Sat, Feb 10, 2018 at 06:50:00PM +0000, Shinichi Doyashiki wrote:
 > >Description:
 > 	The system has recently started behaving strangely.
 > 	Running "/usr/local/apache/bin/apachectl stop" is OK.
 > 	Running "/usr/local/apache/bin/apachectl start" is also OK.
 > 	Running "/usr/local/apache/bin/apachectl restart" is bad:
 > 	the system hangs or behaves strangely after the
 > 	operation, and a reboot is required to recover.

 When it hangs, can you enter ddb and get a backtrace of the apache process?

 When you say "behaves strange", what happens exactly?

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: clare@csel.org
To: gnats-bugs@NetBSD.org
Cc: Manuel Bouyer <bouyer@antioche.eu.org>, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53005: apache httpd can hang the system
Date: Mon, 12 Feb 2018 04:43:05 +0900

 Thank you for the quick response.

 On Sun, 11 Feb 2018 18:10:00 +0000 (UTC)
 Manuel Bouyer <bouyer@antioche.eu.org> wrote:

 >  When it hangs, can you enter ddb and get a backtrace of the apache process ?

 The VPS provider does not allow sending ALT+CTRL+ESC to the console
 remotely, so I need to change the ddb magic sequence to enter ddb.
 I want to change "hw.cnmagic" to ALT+CTRL+DEL or +++; how do I do that?
 Changing it to \x2b\x2b\x2b does not seem to take effect.


 >  When you say "behaves strange", what happens exactly?

 apachectl can hang the system: the connected sshd session hangs,
 and the machine no longer responds to new ssh connection requests.
 Console login attempts hang for a long time.
 ping still gets responses.

 The server is hosted at a VPS provider, so HDD/SSD activity
 is unknown.  The processor is single core.


 -- 
 Shinichi Doyashiki <clare@csel.org>

From: clare@csel.org
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org
Subject: Re: kern/53005: apache httpd can hang the system
Date: Mon, 12 Feb 2018 15:40:38 +0900

 >  When it hangs, can you enter ddb and get a backtrace of the apache process ?

 db{0}> bt
 breakpoint() at netbsd:breakpoint+0x5
 wskbd_translate() at netbsd:wskbd_translate+0xbb4
 wskbd_input() at netbsd:wskbd_input+0x5b
 pckbd_input() at netbsd:pckbd_input+0x6b
 pckbcintr() at netbsd:pckbcintr+0x8d
 intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
 Xintr_ioapic_edge4() at netbsd:Xintr_ioapic_edge4+0xf1
 --- interrupt ---
 exit_lwps() at netbsd:exit_lwps+0x58
 exit1() at netbsd:exit1+0x68
 sys_exit() at netbsd:sys_exit+0x3d
 syscall() at netbsd:syscall+0x1d8
 --- syscall (number 1) ---

 db{0}> trace/t 50
 trace: pid 80 lid 40 at 0xffff80002a503db0
 ?() at ffffe4001fa3b5c0
 lwp_exit_switchaway() at netbsd:lwp_exit_switchaway+0x1ac
 Bad frame pointer: 0xffffe4001a1c8300


 -- 
 Shinichi Doyashiki <clare@csel.org>

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: clare@csel.org
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Subject: Re: kern/53005: apache httpd can hang the system
Date: Mon, 12 Feb 2018 11:34:08 +0100

 On Mon, Feb 12, 2018 at 04:43:05AM +0900, clare@csel.org wrote:
 > Thank you for the quick response.
 > 
 > On Sun, 11 Feb 2018 18:10:00 +0000 (UTC)
 > Manuel Bouyer <bouyer@antioche.eu.org> wrote:
 > 
 > >  When it hangs, can you enter ddb and get a backtrace of the apache process ?
 > 
 > The VPS provider does not allow sending ALT+CTRL+ESC to the console
 > remotely, so I need to change the ddb magic sequence to enter ddb.
 > I want to change "hw.cnmagic" to ALT+CTRL+DEL or +++; how do I do that?
 > Changing it to \x2b\x2b\x2b does not seem to take effect.

 just:
 sysctl -w h.cnmagic=+++

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Paul Goyette <paul@whooppee.com>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: clare@csel.org, gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
    gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53005: apache httpd can hang the system
Date: Mon, 12 Feb 2018 18:45:13 +0800 (+08)

 On Mon, 12 Feb 2018, Manuel Bouyer wrote:

 > On Mon, Feb 12, 2018 at 04:43:05AM +0900, clare@csel.org wrote:
 >> Thank you for the quick response.
 >>
 >> On Sun, 11 Feb 2018 18:10:00 +0000 (UTC)
 >> Manuel Bouyer <bouyer@antioche.eu.org> wrote:
 >>
 >>>  When it hangs, can you enter ddb and get a backtrace of the apache process ?
 >>
 >> The VPS provider does not allow sending ALT+CTRL+ESC to the console
 >> remotely, so I need to change the ddb magic sequence to enter ddb.
 >> I want to change "hw.cnmagic" to ALT+CTRL+DEL or +++; how do I do that?
 >> Changing it to \x2b\x2b\x2b does not seem to take effect.
 >
 > just:
 > sysctl -w h.cnmagic=+++

 Note that this should be

  	sysctl -w hw.cnmagic=+++

 (the first part of the variable name is "hw" not "h"!)
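
For reference, a sketch of setting the sequence and reading it back to confirm that the change took effect. The function name set_cnmagic is made up for this example; "sysctl -w" writes a value and "sysctl -n" prints a value without its name. It has to run as root, and it does not check whether a particular console driver actually honours the sequence.

```sh
#!/bin/sh
# Hypothetical helper: set hw.cnmagic, then print the value the kernel reports.
set_cnmagic() {
	sysctl -w "hw.cnmagic=$1" || return 1
	sysctl -n hw.cnmagic
}

# Usage (as root):
# set_cnmagic '+++'
```

To keep the setting across reboots, the same name=value pair (hw.cnmagic=+++) can be placed in /etc/sysctl.conf.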


 +------------------+--------------------------+----------------------------+
 | Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
 | (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
 | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
 +------------------+--------------------------+----------------------------+

From: clare@csel.org
To: gnats-bugs@NetBSD.org
Cc: Paul Goyette <paul@whooppee.com>, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53005: apache httpd can hang the system
Date: Mon, 12 Feb 2018 20:36:10 +0900

 Thank you for the information.

 In the end I applied the following local patch.  I wanted to
 change the hotkey from ALT+CTRL+ESC to ALT+CTRL+DEL,
 but the result was ALT+CTRL+BACKSPACE.

 I also tried a serial console, but it is not usable in some cloud
 VPS environments.

 Index: wskbdmap_mfii.c
 ===================================================================
 RCS file: /export/cvsroot/netbsd/src/sys/dev/pckbport/wskbdmap_mfii.c,v
 retrieving revision 1.25
 diff -u -r1.25 wskbdmap_mfii.c
 --- wskbdmap_mfii.c     14 Jul 2014 10:05:24 -0000      1.25
 +++ wskbdmap_mfii.c     12 Feb 2018 06:14:40 -0000
 @@ -42,7 +42,7 @@

  static const keysym_t pckbd_keydesc_us[] = {
  /*  pos      command           normal          shifted */
 -    KC(1),   KS_Cmd_Debugger,  KS_Escape,
 +    KC(1),                     KS_Escape,
      KC(2),                     KS_1,           KS_exclam,
      KC(3),                     KS_2,           KS_at,
      KC(4),                     KS_3,           KS_numbersign,
 @@ -55,7 +55,7 @@
      KC(11),                    KS_0,           KS_parenright,
      KC(12),                    KS_minus,       KS_underscore,
      KC(13),                    KS_equal,       KS_plus,
 -    KC(14),  KS_Cmd_ResetEmul, KS_Delete,
 +    KC(14),  KS_Cmd_Debugger,  KS_Delete,
      KC(15),                    KS_Tab,
      KC(16),                    KS_q,
      KC(17),                    KS_w,


 -- 
 Shinichi Doyashiki <clare@csel.org>

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, clare@csel.org
Cc: 
Subject: Re: kern/53005: apache httpd can hang the system
Date: Mon, 12 Feb 2018 07:29:41 -0500

 On Feb 12,  6:45am, clare@csel.org (clare@csel.org) wrote:
 -- Subject: Re: kern/53005: apache httpd can hang the system

 Is your kernel DEBUG/DIAGNOSTIC/LOCKDEBUG?

 christos

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: clare@csel.org
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Subject: Re: kern/53005: apache httpd can hang the system
Date: Mon, 12 Feb 2018 20:18:06 +0100

 On Mon, Feb 12, 2018 at 03:40:38PM +0900, clare@csel.org wrote:
 > >  When it hangs, can you enter ddb and get a backtrace of the apache process ?
 > 
 > db{0}> bt
 > breakpoint() at netbsd:breakpoint+0x5
 > wskbd_translate() at netbsd:wskbd_translate+0xbb4
 > wskbd_input() at netbsd:wskbd_input+0x5b
 > pckbd_input() at netbsd:pckbd_input+0x6b
 > pckbcintr() at netbsd:pckbcintr+0x8d
 > intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
 > Xintr_ioapic_edge4() at netbsd:Xintr_ioapic_edge4+0xf1
 > --- interrupt ---
 > exit_lwps() at netbsd:exit_lwps+0x58
 > exit1() at netbsd:exit1+0x68
 > sys_exit() at netbsd:sys_exit+0x3d
 > syscall() at netbsd:syscall+0x1d8
 > --- syscall (number 1) ---
 > 
 > db{0}> trace/t 50
 > trace: pid 80 lid 40 at 0xffff80002a503db0
 > ?() at ffffe4001fa3b5c0
 > lwp_exit_switchaway() at netbsd:lwp_exit_switchaway+0x1ac
 > Bad frame pointer: 0xffffe4001a1c8300

 Do you want to backtrace pid 50 (decimal) or 80 (decimal)?
 If you want to trace pid 50, you have to use:
 trace/t 0t50
 (numbers are interpreted as hex by default).
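
As a quick check of the radix rule above: a bare 50 is read as hex 0x50, which is decimal 80, matching the "pid 80" in the trace output. The helper names below are invented for this sketch; it only does the arithmetic in the shell and does not talk to ddb.

```sh
#!/bin/sh
# hex_to_dec: decimal pid that ddb will use when given a bare (hex) number.
hex_to_dec() {
	printf '%d\n' "0x$1"
}

# ddb_decimal_arg: build the 0t-prefixed form ddb reads as decimal.
# Expects a plain decimal number without leading zeros.
ddb_decimal_arg() {
	printf '0t%d\n' "$1"
}

# Usage:
# hex_to_dec 50        prints 80
# ddb_decimal_arg 50   prints 0t50
```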

 Also, it would be interesting to see which pid was interrupted by entering ddb.
 I wonder if it's stuck looping in exit_lwps().

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: clare@csel.org
To: gnats-bugs@NetBSD.org
Cc: Manuel Bouyer <bouyer@antioche.eu.org>, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53005: apache httpd can hang the system
Date: Wed, 14 Feb 2018 00:41:30 +0900

 On Mon, 12 Feb 2018 19:20:01 +0000 (UTC)
 Manuel Bouyer <bouyer@antioche.eu.org> wrote:

 >  Do you want to backtrace pid 50 (decimal) or 80 (decimal) ?
 >  if you want to trace pid 50, you have to use:
 >  trace/t 0t50
 >  (numbers are interpreted as hex by default).

 The PID was 80 according to the ps command in ddb.
 I thought the 80 was shown in decimal.

 screenshot is here:
 https://www.csel.org/netbsd/pr/53005/ddb-2018-2-12.png


 >  Also it would be interesting to see which pid was interrupted by entering ddb.
 >  I wonder if it's stuck looping in exit_lwps()

 I currently cannot reproduce the problem on a bare-metal multiprocessor
 local machine.

 With both LOCKDEBUG and DEBUG enabled, the problem went away.

 With DEBUG enabled and LOCKDEBUG disabled, the problem went away.

 With LOCKDEBUG enabled and DEBUG disabled, the problem appeared;
 screenshots are in https://www.csel.org/netbsd/pr/53005/lockdebug/


 -- 
 Shinichi Doyashiki <clare@csel.org>

From: clare@csel.org
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org
Subject: Re: kern/53005: apache httpd can hang the system
Date: Sun, 18 Feb 2018 18:46:59 +0900

 With a kernel built today, the situation has changed:
 running "apachectl restart" is OK,
 running "apachectl stop" hangs.

 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 rip 0xffff... cs 0x8 rflags 0x202 cr2 0x...
  ilevel 0x6 rsp 0xffff...
 curlwp 0xffffe400... pid 368.1 lowest kstack 0xffff8...
 Stopped in pid 368.1 (httpd) at netbsd:...
 breakpoint() at netbsd:...
 wskbd_translate() at netbsd:...
 wskbd_input() at netbsd:...
 pckbd_input() at netbsd:...
 pckbcintr() at netbsd:...
 intr_biglock_wrapper() at netbsd:...
 handle_ioapic_edge4() at netbsd:...
 exit1() at netbsd:exit1+0x68
 sys_exit() at netbsd:sys_exit+0x3d
 syscall() at netbsd:syscall+0x1d8
 --- syscall (number 1) ---

 db{0}> trace/t 0x368
 trace: pid 368 lid 37 at 0xffff8000...
 sleepq_block() at netbsd:sleepq_block+0x97
 lwp_park() at netbsd:lwp_park+0x143
 sys____lwp_park60() at netbsd:sys____lwp_park60+0x54
 syscall() at netbsd:syscall+0x1d8
 --- syscall (number 478) ---


 -- 
 Shinichi Doyashiki <clare@csel.org>

Copyright © 1994-2017 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.