NetBSD Problem Report #47702

From reinoud@diablo.13thmonkey.org  Thu Mar 28 15:14:00 2013
Return-Path: <reinoud@diablo.13thmonkey.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id BA6C063F26F
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 28 Mar 2013 15:14:00 +0000 (UTC)
Message-Id: <20130328151356.E91FDC131BD@diablo.13thmonkey.org>
Date: Thu, 28 Mar 2013 16:13:56 +0100 (CET)
From: reinoud@NetBSD.org
Reply-To: reinoud@NetBSD.org
To: gnats-bugs@gnats.NetBSD.org
Subject: coredumping big programs freeze NFS during dump
X-Send-Pr-Version: 3.95

>Number:         47702
>Category:       kern
>Synopsis:       coredumping big programs freeze NFS during dump
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Mar 28 15:15:00 +0000 2013
>Closed-Date:    Sat Jul 10 06:03:50 +0000 2021
>Last-Modified:  Sat Jul 10 06:03:50 +0000 2021
>Originator:     Reinoud Zandijk
>Release:        NetBSD 6.1_RC2
>Organization:
NetBSD
>Environment:
Any NetBSD machine with home directories on NFS; not port-specific AFAICT. Seen
on Architecture: i386, Machine: i386

Coredumping firefox 17, compiled natively.

>Description:
My i386 work machine has its home directories on a NAS over NFS. Using Firefox
from multiple machines is no problem since all Firefox data is stored on the
local hard disk.

When Firefox dumps core (strictly speaking, it is xulrunner that does), it can
create huge core files of 2.5 GB, more than the amount of physical memory the
machine has. The core file is written to the home directory on NFS.

While xulrunner is dumping core, X keeps working fine, but every process that
attempts to reach the home directory blocks until the core dump is finished.
The NFS server is still reachable by other machines, and sshing into the
machine showed that it was busy writing the core dump out to disk, but not
heavily enough to hinder the others.

My hypothesis is that the memory written out during the core dump is not
released immediately after it has been written, which starves the machine of
memory because it is dumping a process whose core file is bigger than physical
memory. It might very well be that NFS is thus blocking on memory allocation.
Regrettably I haven't managed to get evidence for this, since the machine was
basically wedged until it had finished dumping core.

Shouldn't it be possible to have the process write out the data beginning with
the pages that are in memory, freeing them ASAP and then, using demand
swapping, write out the other pages/info?


>How-To-Repeat:
Have Firefox dump a 2.5 GB core file on a 2 GB machine (1918 MB total
memory, 1873 MB available).

>Fix:
As a workaround, make your home directory not accept core files.



>Release-Note:

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/47702: coredumping big programs freeze NFS during dump
Date: Thu, 28 Mar 2013 16:48:30 +0100

 On Thu, Mar 28, 2013 at 03:15:00PM +0000, reinoud@NetBSD.org wrote:
 > Shouldn't it be possible to have the process write out the data beginning with
 > the pages that are in memory, freeing them ASAP and then, using demand
 > swapping, write out the other pages/info?

 The core is written sequentially, by calling back from the exec pack to
 coredump_write() with exec format specific chunks of data (for elf: sections).

 It all ends up in a vn_rdwr(UIO_WRITE,...) call.

 There is no reason any of these pages should be wired into kernel space,
 so even if the order of pages is not optimized to write the resident ones
 first, already written ones should be eligible for recycling.

 Are you sure that the NFS server is not doing something wrong/slow here
 and that the problem is not the client?

 Btw: there are several ways to move the coredumps out of your homedir;
 the two simplest are to not start firefox from ~ or to set
 kern.defcorename = /tmp/%n.core.

 Martin
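
 For reference, a minimal sketch of the second suggestion, assuming a NetBSD
 host and root privileges. The /tmp target is simply the path from the message
 above; any directory on a local filesystem would serve the same purpose.

```shell
# Takes effect immediately (run as root). %n expands to the process name,
# so a crashing xulrunner would write /tmp/xulrunner.core.
sysctl -w kern.defcorename=/tmp/%n.core

# To keep the setting across reboots, add this line to /etc/sysctl.conf:
# kern.defcorename=/tmp/%n.core
```

 This only moves the dump off NFS; the core file is still written in full.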

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: kern/47702: coredumping big programs freeze NFS during dump
Date: Thu, 28 Mar 2013 15:26:09 -0400

 On Mar 28,  3:15pm, reinoud@NetBSD.org (reinoud@NetBSD.org) wrote:
 -- Subject: kern/47702: coredumping big programs freeze NFS during dump

 limit coredumpsize 0
 ulimit -c 0

 christos
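
 The two commands above are the csh (`limit`) and sh (`ulimit`) spellings of
 the same setting, and both change the limit for the shell they are typed in.
 A sketch of applying it to a single program only, for a POSIX sh; the helper
 name run_without_core is hypothetical and not part of any NetBSD tool:

```shell
# Hypothetical helper: run one command with core dumps disabled,
# leaving the calling shell's own limit untouched.
run_without_core() {
    (
        # Inside the subshell only: a core size limit of 0 means
        # no core file is written when the command crashes.
        ulimit -c 0
        exec "$@"
    )
}
```

 For example, `run_without_core firefox` would start the browser with the
 limit in place while other programs started from the same shell can still
 dump core.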

State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 07 Oct 2013 07:25:30 +0000
State-Changed-Why:
Is dumping core to a local disk better? (As in, other things will run
while it's going on -- nothing will make dumping 2.5G particularly fast)
Also, how is/was your nfs mount set up? Are you using a hard or soft
mount? intr or nointr? tcp or udp? And how good is your pipe to the
filer?
And also, how does the speed and concurrency of dumping 2.5G of core to
nfs compare with dumping 2.5G of /dev/zero with dd?

It does not sound to me like the problem is the core dump per se.
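
The dd comparison asked about above could be run along these lines. This is a
sketch only: the helper name write_zeros and its arguments are hypothetical,
and the size is given in MiB so that 2560 approximates the 2.5 GB core file.

```shell
# Hypothetical helper: write N MiB of zeros to a target file with dd.
# $1 = target file, $2 = size in MiB
write_zeros() {
    dd if=/dev/zero of="$1" bs=1024k count="$2" 2>/dev/null
}
```

Running `time write_zeros "$HOME/zero.test" 2560` while accessing the home
directory from another terminal would show whether plain sequential writes to
the NFS mount block other processes in the same way as the core dump does.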


From: Reinoud Zandijk <reinoud@NetBSD.org>
To: gnats-bugs@gnats.netbsd.org
Cc: 
Subject: Re: kern/47702
Date: Thu, 5 Sep 2019 16:19:14 +0200

 The problem indeed doesn't occur when the coredumpsize is limited to zero. My
 setup has changed since then and I think we can close this PR.

State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 10 Jul 2021 06:03:50 +0000
State-Changed-Why:
withdrawn by submitter (some time back)


>Unformatted:
 Not really that specific, AFAIK -current is showing it too
