NetBSD Problem Report #40750

From www@NetBSD.org  Tue Feb 24 23:11:50 2009
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id AED9663C1C2
	for <gnats-bugs@gnats.netbsd.org>; Tue, 24 Feb 2009 23:11:50 +0000 (UTC)
Message-Id: <20090224231150.800D963C1C1@www.NetBSD.org>
Date: Tue, 24 Feb 2009 23:11:50 +0000 (UTC)
From: ad@netbsd.org
Reply-To: ad@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: hackbench screws up 5.0 kernel
X-Send-Pr-Version: www-1.0

>Number:         40750
>Category:       kern
>Synopsis:       hackbench screws up 5.0 kernel
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    ad
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Feb 24 23:15:00 +0000 2009
>Closed-Date:    Sat Aug 29 17:29:54 +0000 2009
>Last-Modified:  Thu Jan 07 07:10:03 +0000 2010
>Originator:     Andrew Doran
>Release:        5.0_RC2
>Organization:
The NetBSD Project
>Environment:
i386 smp
very high maxprocs, maxfiles
4GB RAM, 4GB swap
>Description:
As per summary.
>How-To-Repeat:
This did not happen 6 months ago.
Run the new threaded hackbench three times:

- threaded mode, with pipes
- process mode, with pipes
- process mode, with sockets.

Boom, the hackbench processes hang waiting for KVA space to become available.


http://www.netbsd.org/~ad/vm_map/hackbench.c
http://www.netbsd.org/~ad/vm_map/backtrace.txt
http://www.netbsd.org/~ad/vm_map/kernel_map.txt
http://www.netbsd.org/~ad/vm_map/kmem_map.txt
http://www.netbsd.org/~ad/vm_map/ps-axlsww.txt
http://www.netbsd.org/~ad/vm_map/show-map.txt
http://www.netbsd.org/~ad/vm_map/vmstat-C.txt
http://www.netbsd.org/~ad/vm_map/vmstat-e.txt
http://www.netbsd.org/~ad/vm_map/vmstat-m.txt
http://www.netbsd.org/~ad/vm_map/vmstat-s.txt

>Fix:
Not yet debugged.

>Release-Note:

>Audit-Trail:
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/40750: hackbench screws up 5.0 kernel
Date: Wed, 25 Feb 2009 19:58:03 +0000

 The wait for VA is on kmem_map, which is ~130MB in size. The kernel seems to
 have leaked 2.6 million ksiginfo_t structures, 120MB worth.

 Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
 ksiginfo      48  2548220    0        0 30336     0 30336 30336     0   inf    0

 Pool cache statistics.
 Name          Spin GrpSz Full Emty PoolLayer CacheLayer  Hit%    CpuLayer  Hit%
 ksiginfo         0    14    0    0   2548220    2548228   0.0     2577954   1.2
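
The pool figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original report: the constants are copied from the ksiginfo line of the pool statistics, the 4096-byte page size is an assumption for i386, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the ksiginfo line of the pool statistics. */
#define KSI_ITEM_SIZE      48u       /* "Size" column: bytes per item */
#define KSI_REQUESTS       2548220u  /* "Requests" column */
#define KSI_RELEASES       0u        /* "Releases" column */
#define KSI_NPAGE          30336u    /* "Npage" column */
#define ASSUMED_PAGE_SIZE  4096u     /* assumption: i386 page size */

/* Items handed out by the pool and never returned. */
uint64_t
ksi_outstanding(void)
{
	return (uint64_t)KSI_REQUESTS - KSI_RELEASES;
}

/* Bytes occupied by the outstanding items themselves. */
uint64_t
ksi_item_bytes(void)
{
	return ksi_outstanding() * KSI_ITEM_SIZE;
}

/* Bytes of kernel VA held by the pool's pages. */
uint64_t
ksi_page_bytes(void)
{
	return (uint64_t)KSI_NPAGE * ASSUMED_PAGE_SIZE;
}

/* Sanity check: the pages must be able to hold the outstanding items. */
void
ksi_check(void)
{
	assert(ksi_page_bytes() >= ksi_item_bytes());
}
```

Under these assumptions ksi_item_bytes() is 122,314,560 bytes and ksi_page_bytes() is 124,256,256 bytes (118.5 MiB), which is consistent with the roughly 120MB estimate and accounts for most of a ~130MB kmem_map.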

Responsible-Changed-From-To: kern-bug-people->ad
Responsible-Changed-By: ad@NetBSD.org
Responsible-Changed-When: Mon, 02 Mar 2009 19:45:24 +0000
Responsible-Changed-Why:
Take.
There is one obvious leak, in sigtimedwait().
hackbench does not use it, though.


State-Changed-From-To: open->feedback
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Sun, 29 Mar 2009 05:08:21 +0000
State-Changed-Why:
Might not be a problem in 5.0 RC3.  Need more information.


From: Mindaugas Rasiukevicius <rmind@netbsd.org>
To: ad@netbsd.org
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/40750 (hackbench screws up 5.0 kernel)
Date: Sun, 29 Mar 2009 06:19:34 +0100

 rmind@NetBSD.org wrote:
 > Synopsis: hackbench screws up 5.0 kernel
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: rmind@NetBSD.org
 > State-Changed-When: Sun, 29 Mar 2009 05:08:21 +0000
 > State-Changed-Why:
 > Might not be a problem in 5.0 RC3.  Need more information.
 > 

 FYI: I could not reproduce the problem on -current and NetBSD 5.0_RC3:

 http://www.netbsd.org/~rmind/hbtest.sh
 http://www.netbsd.org/~rmind/hbtest.txt

 One of the major changes between RC2 and RC3 was the KVA cache for pipe direct
 write (which was also disabled), though I do not see why it would send any
 signals if pipe_pgid == 0. With direct write enabled in -current, however, the
 problem does not occur. I have not tried reverting the KVA cache changes yet.

 -- 
 Best regards,
 Mindaugas

From: Mindaugas Rasiukevicius <rmind@netbsd.org>
To: ad@netbsd.org
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/40750 (hackbench screws up 5.0 kernel)
Date: Sun, 29 Mar 2009 16:28:23 +0100

 Mindaugas Rasiukevicius <rmind@netbsd.org> wrote:
 > FYI: I could not reproduce the problem on -current and NetBSD 5.0_RC3:
 > 
 > http://www.netbsd.org/~rmind/hbtest.sh
 > http://www.netbsd.org/~rmind/hbtest.txt
 > 
 > One of the major changes between RC2 and RC3 was the KVA cache for pipe direct
 > write (which was also disabled), though I do not see why it would send any
 > signals if pipe_pgid == 0. With direct write enabled in -current, however, the
 > problem does not occur. I have not tried reverting the KVA cache changes yet.

 Latest 5.0/i386 with sys_pipe.c reverted to revision 1.105:

 http://www.netbsd.org/~rmind/hbtest.2.txt

 Seems to be fine? I am missing something.

 -- 
 Best regards,
 Mindaugas

From: Mindaugas Rasiukevicius <rmind@netbsd.org>
To: ad@NetBSD.org
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/40750 (hackbench screws up 5.0 kernel)
Date: Sun, 28 Jun 2009 20:57:28 +0100

 rmind@NetBSD.org wrote:
 > Synopsis: hackbench screws up 5.0 kernel
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: rmind@NetBSD.org
 > State-Changed-When: Sun, 29 Mar 2009 05:08:21 +0000
 > State-Changed-Why:
 > Might not be a problem in 5.0 RC3.  Need more information.
 > 

 Have you ever seen it again?  Shall we close this PR?

 -- 
 Mindaugas

State-Changed-From-To: feedback->closed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Sat, 29 Aug 2009 17:29:54 +0000
State-Changed-Why:
Could not reproduce, close PR.


From: Mindaugas Rasiukevicius <rmind@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/40750 CVS commit: src/sys/kern
Date: Sat, 19 Dec 2009 18:25:55 +0000

 Module Name:	src
 Committed By:	rmind
 Date:		Sat Dec 19 18:25:55 UTC 2009

 Modified Files:
 	src/sys/kern: sys_sig.c

 Log Message:
 sigtimedwait: fix a memory leak (which happens since newlock2 times).
 Allocate ksiginfo on stack since it is safe and sigget() assumes that it is
 not allocated from pool (pending signals via sigput()/sigget() "mill" should
 be dynamically allocated, however).  Might be useful to revisit later.

 Likely the cause of PR/40750 and indirect cause of PR/39283.


 To generate a diff of this commit:
 cvs rdiff -u -r1.23 -r1.24 src/sys/kern/sys_sig.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
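
The log message above describes replacing a pool allocation with a stack allocation. The following is a minimal userland sketch of that general pattern, not the actual change to sys_sig.c: mock_ksiginfo, mock_pool_get(), mock_pool_put(), wait_leaky() and wait_fixed() are all hypothetical names, and the mock pool is only a counter wrapped around calloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's ksiginfo_t. */
struct mock_ksiginfo {
	int signo;
	int code;
};

/* Items taken from the mock pool and not yet returned. */
long mock_pool_outstanding;

struct mock_ksiginfo *
mock_pool_get(void)
{
	struct mock_ksiginfo *ksi = calloc(1, sizeof(*ksi));

	if (ksi != NULL)
		mock_pool_outstanding++;
	return ksi;
}

void
mock_pool_put(struct mock_ksiginfo *ksi)
{
	assert(ksi != NULL && mock_pool_outstanding > 0);
	free(ksi);
	mock_pool_outstanding--;
}

/* Leaky shape: scratch record comes from the pool and is never returned. */
int
wait_leaky(int pending_signo)
{
	struct mock_ksiginfo *ksi = mock_pool_get();

	if (ksi == NULL)
		return -1;
	ksi->signo = pending_signo;
	return ksi->signo;	/* missing mock_pool_put(ksi): one item lost per call */
}

/* Fixed shape: scratch record lives on the stack, so nothing can be lost. */
int
wait_fixed(int pending_signo)
{
	struct mock_ksiginfo ksi;

	memset(&ksi, 0, sizeof(ksi));
	ksi.signo = pending_signo;
	return ksi.signo;
}
```

In this sketch every call to wait_leaky() raises mock_pool_outstanding by one, while wait_fixed() leaves it unchanged. That mirrors a pool whose Requests count grows while Releases stays at zero, as in the ksiginfo statistics earlier in this report.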

From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/40750 CVS commit: [netbsd-5] src/sys/kern
Date: Thu, 7 Jan 2010 07:04:51 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Thu Jan  7 07:04:51 UTC 2010

 Modified Files:
 	src/sys/kern [netbsd-5]: sys_sig.c

 Log Message:
 Pull up following revision(s) (requested by rmind in ticket #1199):
 	sys/kern/sys_sig.c: revision 1.24
 sigtimedwait: fix a memory leak (which happens since newlock2 times).
 Allocate ksiginfo on stack since it is safe and sigget() assumes that it is
 not allocated from pool (pending signals via sigput()/sigget() "mill" should
 be dynamically allocated, however).  Might be useful to revisit later.
 Likely the cause of PR/40750 and indirect cause of PR/39283.


 To generate a diff of this commit:
 cvs rdiff -u -r1.17.4.2 -r1.17.4.3 src/sys/kern/sys_sig.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Soren Jacobsen <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/40750 CVS commit: [netbsd-5-0] src/sys/kern
Date: Thu, 7 Jan 2010 07:08:34 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Thu Jan  7 07:08:34 UTC 2010

 Modified Files:
 	src/sys/kern [netbsd-5-0]: sys_sig.c

 Log Message:
 Pull up following revision(s) (requested by rmind in ticket #1199):
 	sys/kern/sys_sig.c: revision 1.24
 sigtimedwait: fix a memory leak (which happens since newlock2 times).
 Allocate ksiginfo on stack since it is safe and sigget() assumes that it is
 not allocated from pool (pending signals via sigput()/sigget() "mill" should
 be dynamically allocated, however).  Might be useful to revisit later.
 Likely the cause of PR/40750 and indirect cause of PR/39283.


 To generate a diff of this commit:
 cvs rdiff -u -r1.17.4.2 -r1.17.4.2.2.1 src/sys/kern/sys_sig.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:
