NetBSD Problem Report #32429

From woods@building.weird.com  Mon Jan  2 03:58:10 2006
Return-Path: <woods@building.weird.com>
Received: from building.weird.com (building.weird.com [204.92.254.24])
	by narn.netbsd.org (Postfix) with ESMTP id A929363B942
	for <gnats-bugs@gnats.netbsd.org>; Mon,  2 Jan 2006 03:58:10 +0000 (UTC)
Message-Id: <m1EtGpl-002IeQC@building.weird.com>
Date: Sun, 1 Jan 2006 22:58:09 -0500 (EST)
From: "Greg A. Woods" <woods@planix.com>
Sender: "Greg A. Woods" <woods@building.weird.com>
Reply-To: "Greg A. Woods" <woods@planix.com>
To: gnats-bugs@netbsd.org
Subject: setting MAXDSIZ > 1GB on 1.6.x alpha causes a "panic: trap"
X-Send-Pr-Version: 3.95

>Number:         32429
>Category:       port-alpha
>Synopsis:       setting MAXDSIZ over 1GB on 1.6.x alpha causes a "panic: trap"
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jan 02 07:15:51 +0000 2006
>Last-Modified:  Sun Feb 26 16:27:45 +0000 2012
>Originator:     Greg A. Woods
>Release:        NetBSD 1.6.2_STABLE (cvs update on 20051127)
>Organization:
Planix, Inc.; Toronto, Ontario; Canada
>Environment:
System: NetBSD building 1.6.2_STABLE
Architecture: alpha
Machine: alpha
>Description:

	NetBSD/alpha defaults MAXDSIZ to 1GB, which caps the hard
	RLIMIT_DATA of every process at 1GB.

	When MAXDSIZ is raised above 1GB so that a process can have an
	RLIMIT_DATA of more than 1GB, the kernel panics quickly once it
	is put under any significant load.

	Note that everything works fine in single user mode with just
	one process running:

	[console]<@> # ulimit -d $((8*1024*1024*1024))
	[console]<@> # ulimit -a
	time(cpu-seconds)    unlimited
	file(blocks)         unlimited
	coredump(blocks)     unlimited
	data(kbytes)         8388608
	stack(kbytes)        2048
	lockedmem(kbytes)    4860504
	memory(kbytes)       14581512
	nofiles(descriptors) 64
	processes            160
	[console]<@> # time zonec -v -f dnsbl.sorbs.net.nsd sorbs.zonec &
	[1] zonec -v -f dnsbl.sorbs.net.nsd sorbs.zonec 
[[ ... wait for some time ... ]]
	[console]<@> # ps -u 
	USER PID %CPU    %MEM     VSZ     RSS TT STAT STARTED    TIME COMMAND
	root  76 99.0 -38534.9 1191528 1149864 C0 R    10:24PM 3:17.11 zonec -v -f dnsbl
	root  72  0.0   -19.3     608     560 C0 S    10:23PM 0:00.49 ksh 
	root  15  0.0   -21.7     728     632 C0 Is   10:19PM 0:01.49 -sh 
	root 107  0.0   -13.4     384     384 C0 R+   10:28PM 0:00.00 ps -u 


>How-To-Repeat:

	options 	MAXDSIZ="(8UL*1024*1024*1024)"


	Build a kernel with that MAXDSIZ setting, boot to multi-user,
	and observe a panic shortly afterwards:

	CPU 3: fatal kernel trap:

	CPU 3    trap entry = 0x2 (memory management fault)
	CPU 3    a0         = 0x2a0
	CPU 3    a1         = 0x1
	CPU 3    a2         = 0x0
	CPU 3    pc         = 0xfffffc0000300a50
	CPU 3    ra         = 0xfffffc0000300a44
	CPU 3    pv         = 0xfffffc0000300994
	CPU 3    curproc    = 0xfffffc00b3be8ba8
	CPU 3        pid = 328, comm = imapd

	panic: trap
	Stopped in pid 328 (imapd) at   cpu_Debugger+0x4:       ret     zero,(ra)
	db{3}> trace
	cpu_Debugger() at cpu_Debugger+0x4
	panic() at panic+0x160
	trap() at trap+0x6ec
	XentMM() at XentMM+0x20
	--- memory management fault (from ipl 0) ---
	copyinstr() at copyinstr+0x54
	namei() at namei+0xb8
	sys___stat13() at sys___stat13+0x5c
	syscall_plain() at syscall_plain+0x158
	XentSys() at XentSys+0x5c
	--- syscall (278) ---
	--- user mode ---
	db{3}>


>Fix:

	unknown

>Release-Note:

>Audit-Trail:
From: Elad Efrat <elad@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/32429: setting MAXDSIZ > 1GB on 1.6.x alpha causes a "panic: trap"
Date: Mon, 02 Jan 2006 18:05:22 +0200

 please try to reproduce on a -current kernel.

 -e.

 -- 
 Elad Efrat

From: "Greg A. Woods" <woods@planix.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/32429: setting MAXDSIZ > 1GB on 1.6.x alpha causes a "panic: trap"
Date: Mon, 02 Jan 2006 12:17:16 -0500


 At Mon,  2 Jan 2006 16:10:03 +0000 (UTC),
 Elad Efrat wrote:
 >
 >  please try to reproduce on a -current kernel.

 Unfortunately that's just not going to be easy.

 The system it's happening on is running in production with 15,000 or so
 users.

 I do have a test system that I'll try to get a newer release kernel onto
 (maybe even 3.0), but I'm not really enthused about trying -current at
 all.  Even if it worked it would be totally useless to me as I can't run
 it in the production environment.

 -- 
 						Greg A. Woods

 H:+1 416 218-0098  W:+1 416 489-5852 x122  VE3TCP  RoboHack <woods@robohack.ca>
 Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>

From: "Greg A. Woods" <woods@planix.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/32429: setting MAXDSIZ > 1GB on 1.6.x alpha causes a "panic: trap"
Date: Mon, 02 Jan 2006 19:22:43 -0500


 At Mon,  2 Jan 2006 17:25:09 +0000 (UTC),
 I wrote:
 >
 >  I do have a test system that I'll try to get a newer release kernel onto
 >  (maybe even 3.0), but I'm not really enthused about trying -current at
 >  all.  Even if it worked it would be totally useless to me as I can't run
 >  it in the production environment.

 Unfortunately the test system won't crash even though I've been running
 "build.sh" and building packages with pkg_chk on it all afternoon.

 I'm guessing this means the problem is somehow more closely related to
 the networking code.

 (though the test system is mounting /usr/src, and /usr/pkgsrc, and
 distfiles, all over NFS)

 The production server where the problem was observed runs a rather
 heavily used Cyrus IMAPd, among other network services such as HTTP,
 FTP, DNS, SMTP, etc.  The crash happened in imapd both times, almost
 immediately after the system reached multi-user mode (I was able to
 log in via SSH once, but only just before it crashed).

 I'll try installing Cyrus on the test box tomorrow and running as many
 simultaneous connections against it as I can.  Maybe the POP benchmark
 from benchmarks/postal will help.

 -- 
 						Greg A. Woods

 H:+1 416 218-0098  W:+1 416 489-5852 x122  VE3TCP  RoboHack <woods@robohack.ca>
 Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>

>Unformatted:
