NetBSD Problem Report #50313

From he@smistad.uninett.no  Thu Oct  8 10:02:04 2015
Return-Path: <he@smistad.uninett.no>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id AA1B0A654F
	for <gnats-bugs@gnats.NetBSD.org>; Thu,  8 Oct 2015 10:02:04 +0000 (UTC)
Message-Id: <20151008100200.DCE373D0B8@smistad.uninett.no>
Date: Thu,  8 Oct 2015 12:02:00 +0200 (CEST)
From: he@uninett.no
Reply-To: he@uninett.no
To: gnats-bugs@gnats.NetBSD.org
Subject: processes get stuck doing exec
X-Send-Pr-Version: 3.95

>Number:         50313
>Category:       kern
>Synopsis:       processes get stuck doing exec
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Oct 08 10:05:00 +0000 2015
>Last-Modified:  Sat Oct 10 23:25:00 +0000 2015
>Originator:     Havard Eidnes
>Release:        NetBSD 6.1_STABLE
>Organization:
	UNINETT AS
>Environment:
System: NetBSD smistad.uninett.no 6.1_STABLE NetBSD 6.1_STABLE (MAANEN) #1: Wed Oct 29 11:27:25 CET 2014 he@smistad.uninett.no:/usr/obj/sys/arch/i386/compile/MAANEN i386
Architecture: i386
Machine: i386
>Description:
	All too often processes get stuck on this machine.  According
	to ps the processes are stuck in "R" state.  Running crash on
	the live system shows this call graph

syscall -> sys_execve -> execve1 -> execve_loadvm -> pool_get ->
pool_grow -> exec_pool_alloc -> uvm_km_alloc -> uvm_map ->
uvm_map_prepare -> cv_timedwait -> sleepq_block

	as shown in this session with "crash":

smistad# crash
Crash version 6.1_STABLE, image version 6.1_STABLE.
Output from a running system is unreliable.
crash>
crash> t/t 0t2052
trace: pid 2052 lid 1 at 0xddb69938
sleepq_block(64,0,c0c54ee0,c0cda590,0,c4677c00,c4676680,c0d1dc00,c0d1dc10,c0d1dc14) at sleepq_block+0xa3
cv_timedwait(c0d1dc14,c0d1dc10,64,ddb699bc,c0d1da20,ffffffff,ffffffff,0,801727,0) at cv_timedwait+0x126
uvm_map_prepare(c0d1dc00,c0000000,40000,c0d1da20,ffffffff,ffffffff,0,801727,ddb699e8,0) at uvm_map_prepare+0x160
uvm_map(c0d1dc00,ddb69a58,40000,c0d1da20,ffffffff,ffffffff,0,801727,800002,c0d0aa94) at uvm_map+0x7f
uvm_km_alloc(c0d1dc00,40000,0,800002,ddb69ac0,c07daf4a,c0d0aa20,1,0,0) at uvm_km_alloc+0xe6
exec_pool_alloc(c0d0aa20,1,0,0,0,c055a02e,0,0,0,c0cda6e0) at exec_pool_alloc+0x2b
pool_grow(c0d0aa94,0,c054,c76e9180,ce7b5c09,4,c0d0aa98,c0d0aa94,1,0) at pool_grow+0x2a
pool_get(c0d0aa20,1,c8d99948,cea7cf80,c08d0731,0,ddb69b80,ddb69bec,ddb69be8,c7731c14) at pool_get+0x79
execve_loadvm(bb9123cc,c054f830,ddb69b68,bb912404,c4eedc00,c5327c00,cf1d7008,404,404,9) at execve_loadvm+0x1b0
execve1(cf0dfd40,bb912404,bb9123bc,bb9123cc,c054f830,ddb69d3c,c080312a,cf0dfd40,ddb69d00,ddb69d28) at execve1+0x32
sys_execve(cf0dfd40,ddb69d00,ddb69d28,c0803389,c7834cc0,3b,bb90e000,bb90e020,ddb69d00,c7731c14) at sys_execve+0x30
syscall(ddb69d48,bbb900b3,ab,bfbf001f,bbb9001f,bb912404,bb912404,bfbfeb20,bb9123bc,0) at syscall+0xaa
crash> t/t 0t498
trace: pid 498 lid 1 at 0xdd1e5938
sleepq_block(64,0,c0c54ee0,c0cda590,0,c4677d40,c4676680,c0d1dc00,c0d1dc10,c0d1dc14) at sleepq_block+0xa3
cv_timedwait(c0d1dc14,c0d1dc10,64,dd1e59bc,c0d1da20,ffffffff,ffffffff,0,801727,0) at cv_timedwait+0x126
uvm_map_prepare(c0d1dc00,c0000000,40000,c0d1da20,ffffffff,ffffffff,0,801727,dd1e59e8,0) at uvm_map_prepare+0x160
uvm_map(c0d1dc00,dd1e5a58,40000,c0d1da20,ffffffff,ffffffff,0,801727,800002,c0d0aa94) at uvm_map+0x7f
uvm_km_alloc(c0d1dc00,40000,0,800002,dd1e5ac0,c07daf4a,c0d0aa20,1,0,0) at uvm_km_alloc+0xe6
exec_pool_alloc(c0d0aa20,1,0,0,0,c055a02e,0,0,0,c0cda6e0) at exec_pool_alloc+0x2b
pool_grow(c0d0aa94,0,c054,c76e9b40,ce7b5009,5,c0d0aa98,c0d0aa94,1,0) at pool_grow+0x2a
pool_get(c0d0aa20,1,ce9a3d28,cac41240,c08d0ae3,0,dd1e5b80,dd1e5bec,dd1e5be8,c4c721d4) at pool_get+0x79
execve_loadvm(bb907c04,c054f830,dd1e5b68,bb907ae4,c4ef7c00,c4ef7000,cf87a008,404,404,9) at execve_loadvm+0x1b0
execve1(cf8d9aa0,bb907ae4,bb907b84,bb907c04,c054f830,dd1e5d3c,c080312a,cf8d9aa0,dd1e5d00,dd1e5d28) at execve1+0x32
sys_execve(cf8d9aa0,dd1e5d00,dd1e5d28,c0803389,c52ed364,3b,bba9b000,bba9b320,dd1e5d00,c4c721d4) at sys_execve+0x30
syscall(dd1e5d48,bbb900b3,ab,bfbf001f,bbb9001f,bb907ae4,bb907ae4,bfbfd6d8,bb907b84,7d7b7cff) at syscall+0xaa
crash> t/t 0t22404
trace: pid 22404 lid 1 at 0xdd0ee938
sleepq_block(64,0,c0c54ee0,c0cda590,0,c4677d40,c4676680,c0d1dc00,c0d1dc10,c0d1dc14) at sleepq_block+0xa3
cv_timedwait(c0d1dc14,c0d1dc10,64,dd0ee9bc,c0d1da20,ffffffff,ffffffff,0,801727,0) at cv_timedwait+0x126
uvm_map_prepare(c0d1dc00,c0000000,40000,c0d1da20,ffffffff,ffffffff,0,801727,dd0ee9e8,0) at uvm_map_prepare+0x160
uvm_map(c0d1dc00,dd0eea58,40000,c0d1da20,ffffffff,ffffffff,0,801727,800002,c0d0aa94) at uvm_map+0x7f
uvm_km_alloc(c0d1dc00,40000,0,800002,dd0eeac0,c07daf4a,c0d0aa20,1,0,0) at uvm_km_alloc+0xe6
exec_pool_alloc(c0d0aa20,1,0,0,0,c055a02e,0,0,0,c0cda6e0) at exec_pool_alloc+0x2b
pool_grow(c0d0aa94,0,c054,c76e9b40,c8511c05,3,c0d0aa98,c0d0aa94,1,0) at pool_grow+0x2a
pool_get(c0d0aa20,1,ce9a3c08,c8091e80,c08d0ae3,0,dd0eeb80,dd0eebec,dd0eebe8,cf7fac30) at pool_get+0x79
execve_loadvm(bb907b04,c054f830,dd0eeb68,bb907ae4,c4ebec00,c5327400,cf8c7808,404,404,9) at execve_loadvm+0x1b0
execve1(cf8d9800,bb907ae4,bb907afc,bb907b04,c054f830,dd0eed3c,c080312a,cf8d9800,dd0eed00,dd0eed28) at execve1+0x32
sys_execve(cf8d9800,dd0eed00,dd0eed28,c0803389,cf88ba44,3b,bba9b000,bba9b320,dd0eed00,cf7fac30) at sys_execve+0x30
syscall(dd0eed48,bbb900b3,ab,bfbf001f,bbb9001f,bb907ae4,bb907ae4,bfbfd6d8,bb907afc,6d6d6bff) at syscall+0xaa
crash> 

	execve_loadvm() contains only one pool_get, this one:

        /* allocate an argument buffer */
	        data->ed_argp = pool_get(&exec_pool, PR_WAITOK);

	and "vmstat -m" says:

Memory resource pool statistics
Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
...
execargs    262144  40498    0    40498     9     7     2     4     0    16    2
...

	I had at that time started a pkgsrc compile, and one of
	the stuck processes was a child inside that tree, but
	aborting the build changed very little in the above
	stats:

execargs    262144  40559    0    40559     9     7     2     4     0    16    2

	At this point experience tells me that I need to reboot
	to regain full use of the system.  I wish I didn't have
	to.

	exec_pool_alloc in its turn runs uvm_km_alloc(kernel_map, ...)
	but I don't know if I have a way to inspect the state of
	kernel_map, i.e. if there's any free space left (probably not?)

>How-To-Repeat:
	Sorry, I don't have an exact recipe for reproducing the problem.

>Fix:
	I wish I knew.

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/50313: processes get stuck doing exec
Date: Sat, 10 Oct 2015 22:59:27 +0000

 On Thu, Oct 08, 2015 at 10:05:00AM +0000, he@uninett.no wrote:
  > 	execve_loadvm() contains only one pool_get, this one:
  > 
  >         /* allocate an argument buffer */
  > 	        data->ed_argp = pool_get(&exec_pool, PR_WAITOK);
  > 
  > 	and "vmstat -m" says:
  > 
  > Memory resource pool statistics
  > Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
  > ...
  > execargs    262144  40498    0    40498     9     7     2     4     0    16    2
  > ...

 Out of curiosity: how much RAM is in the system? Because exec args are
 big, unless there's a throttle (I don't think we have one) it's fairly
 easy for transient demand to take up a lot of space, which can fill up
 the kernel map or plain run out of memory.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Havard Eidnes <he@uninett.no>
To: gnats-bugs@NetBSD.org, dholland-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org
Subject: Re: kern/50313: processes get stuck doing exec
Date: Sun, 11 Oct 2015 01:20:39 +0200 (CEST)

 >   > Memory resource pool statistics
 >   > Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat M=
 inpg Maxpg Idle
 >   > ...
 >   > execargs    262144  40498    0    40498     9     7     2     4  =
    0    16    2
 >   > ...
 >
 >  Out of curiosity: how much RAM is in the system?

 Physically 4G, but since this is i386, my kernel sees

 total memory =3D 3316 MB
 avail memory =3D 3248 MB

 >  Because exec args are big, unless there's a throttle (I don't
 >  think we have one) it's fairly easy for transient demand to
 >  take up a lot of space, which can fill up the kernel map or
 >  plain run out of memory.

 Do we have a meter for the kernel map, inspectable from user-land
 in some way?

 When this problem strikes, it's usually going downhill.
 Presently my host has a load average of 10, mostly caused by:

   PID TTY    STAT      TIME COMMAND
 ...
  1311 ?      D      0:00.00 cron: running job =

  2828 ?      Rs     0:00.00 (local)
  3101 ?      RVs    0:00.00 (cron)
  5471 ?      D      0:00.00 cron: running job =

  7074 ?      D      0:00.00 cron: running job =

 23096 ?      RVs    0:00.00 (cron)
 23277 ?      RVs    0:00.00 (cron)
 23347 ?      RVs    0:00.00 (cron)
 25722 ?      RVs    0:00.00 (cron)
 26808 ?      D      0:00.00 cron: running job =

 29057 ?      R      0:00.00 (sh)

 and some mail delivery appears to get stuck:

 Oct 11 01:17:03 smistad postfix/qmgr[14168]: D81A13D0B8: skipped, still=
  being delivered

 Some of these are stuck doing uvm_km_alloc(), some of them stuck
 in fork:

 crash> t/t 0t5471
 trace: pid 5471 lid 1 at 0xdd07ebe4
 sleepq_block(0,0,c0be3a51,c0cda590,8,c4677c00,c4675280,c9002034,0,cf7fa=
 dfc) at sleepq_block+0xa3
 cv_wait(cf7fadfc,c4614f00,ddbb3000,80000000,0,0,c0803314,c8c2ad40,dd07e=
 c88,0) at cv_wait+0x126
 fork1(c4c90560,3,14,0,0,c0803314,0,dd07ed28,0,dd07ed3c) at fork1+0x6a0
 sys___vfork14(c4c90560,dd07ed00,dd07ed28,c0803389,c77357ac,11a,bbab9000=
 ,bbab9d30,c77357ac,cf7fade4) at sys___vfork14+0x50
 syscall(dd07ed48,bbbb00b3,ab,bfbf001f,bbba001f,1c20,0,bfbfed98,0,bb91e1=
 53) at syscall+0xaa
 crash> =


 Regards,

 - H=E5vard

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.