NetBSD Problem Report #38970

From simonb@thistledown.com.au  Tue Jun 17 06:31:39 2008
Return-Path: <simonb@thistledown.com.au>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 4B13863B842
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 17 Jun 2008 06:31:39 +0000 (UTC)
Message-Id: <20080617063138.233C3AFD04@thoreau.thistledown.com.au>
Date: Tue, 17 Jun 2008 16:31:38 +1000 (EST)
From: Simon Burge <simonb@NetBSD.org>
Reply-To: Simon Burge <simonb@NetBSD.org>
To: gnats-bugs@gnats.NetBSD.org
Subject: slow crashdump on amd64 machine
X-Send-Pr-Version: 3.95

>Number:         38970
>Notify-List:    gson@gson.org
>Category:       kern
>Synopsis:       slow crashdump on amd64 machine
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jun 17 06:35:00 +0000 2008
>Closed-Date:    Mon Jul 10 20:08:35 +0000 2017
>Last-Modified:  Mon Jul 10 21:10:00 +0000 2017
>Originator:     Simon Burge
>Release:        NetBSD -current - problem has been around for a while
>Organization:
>Environment:
	Architecture: amd64
	Machine: amd64
>Description:
	I've got two amd64 machines here.

	One is a single-CPU Opteron with 1GB of RAM whose system disk is
	a Fujitsu MHT2080BH on a svwsata.  That machine can write a crash
	dump at ~12MB/sec.

	Another machine is an 8 CPU Intel machine with 16GB of RAM and
	the system disk is a Seagate ST3500630NS on an ahcisata.  This
	machine can only write a crash dump out at about 200kB/sec.
	Each new number ticking down takes around 5 seconds to appear.
	Relevant parts of dmesg for the disk are:

	  ahcisata0 at pci0 dev 31 function 2: vendor 0x8086 product 0x2681
	  ahcisata0: interrupting at ioapic0 pin 19
	  ahcisata0: AHCI revision 1.1, 6 ports, 32 command slots, features 0x86226000
	  atabus1 at ahcisata0 channel 0
	  ahcisata0 port 0: device present, speed: 1.5Gb/s
	  wd0 at atabus1 drive 0: <ST3500630NS>
	  wd0: quirks 2<FORCE_LBA48>
	  wd0: 465 GB, 969021 cyl, 16 head, 63 sec, 512 bytes/sect x 976773168 sectors

>How-To-Repeat:
	Boot to single user and "reboot -d".

>Fix:
	None given.

>Release-Note:

>Audit-Trail:
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/38970: slow crashdump on amd64 machine
Date: Fri, 19 Jul 2013 17:23:09 +0300

 As I write this, I am waiting for a NetBSD 6.x/amd64 machine to finish
 a crash dump.  I have plenty of time to write, because it's dumping at
 a rate of about 1 MB/s, which means dumping 16 GB of RAM is going to
 take more than four hours.

 As in the case of the original bug submitter, the dump target is a
 Seagate SATA disk.

 I will bump the priority of the PR, because this bug is now affecting
 multiple users and is a serious impediment to the diagnosis and fixing
 of kernel crashes.
 -- 
 Andreas Gustafsson, gson@gson.org
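
For reference, the dump-time estimates quoted in this PR can be checked with
a few lines of arithmetic.  The sketch below is illustrative only and is not
part of any NetBSD tool; the helper names are hypothetical, and it assumes
binary units (1 GB = 2^30 bytes, 1 MB = 2^20 bytes, 1 kB = 2^10 bytes) and a
constant write rate for the whole dump.

```python
# Rough dump-time arithmetic for the figures quoted in this PR.
# Assumptions (not from the PR): binary units and a constant write rate.
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024


def dump_seconds(ram_bytes, rate_bytes_per_s):
    """Seconds needed to write ram_bytes at a constant rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return ram_bytes / rate_bytes_per_s


def as_hours(seconds):
    """Convert seconds to hours."""
    return seconds / 3600.0


if __name__ == "__main__":
    # 16 GB of RAM at the two rates reported in this PR.
    for label, rate in (("1 MB/s", 1 * MIB), ("200 kB/s", 200 * KIB)):
        secs = dump_seconds(16 * GIB, rate)
        print(f"16 GB at {label}: {secs:.0f} s = {as_hours(secs):.1f} h")
```

Under these assumptions, 16 GB at about 1 MB/s works out to roughly four and
a half hours, consistent with "more than four hours" above, and the original
report's 200kB/sec would need most of a day for the same amount of memory.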

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	Simon Burge <simonb@NetBSD.org>
Cc: 
Subject: Re: kern/38970: slow crashdump on amd64 machine
Date: Fri, 19 Jul 2013 11:06:56 -0400

 On Jul 19,  2:25pm, gson@gson.org (Andreas Gustafsson) wrote:
 -- Subject: Re: kern/38970: slow crashdump on amd64 machine

 | The following reply was made to PR kern/38970; it has been noted by GNATS.
 | 
 | From: Andreas Gustafsson <gson@gson.org>
 | To: gnats-bugs@NetBSD.org
 | Cc: 
 | Subject: Re: kern/38970: slow crashdump on amd64 machine
 | Date: Fri, 19 Jul 2013 17:23:09 +0300
 | 
 |  As I write this, I am waiting for a NetBSD 6.x/amd64 machine to finish
 |  a crash dump.  I have plenty of time to write, because it's dumping at
 |  a rate of about 1 MB/s, which means dumping 16 GB of RAM is going to
 |  take more than four hours.
 |  
 |  As in the case of the original bug submitter, the dump target is a
 |  Seagate SATA disk.
 |  
 |  I will bump the priority of the PR, because this bug is now affecting
 |  multiple users and is a serious impediment to the diagnosis and fixing
 |  of kernel crashes.
 |  -- 
 |  Andreas Gustafsson, gson@gson.org

 Are sparse dumps enabled and working?

 christos

From: Andreas Gustafsson <gson@gson.org>
To: christos@zoulas.com (Christos Zoulas)
Cc: gnats-bugs@NetBSD.org, Simon Burge <simonb@NetBSD.org>
Subject: Re: kern/38970: slow crashdump on amd64 machine
Date: Sat, 20 Jul 2013 13:23:35 +0300

 Christos Zoulas wrote:
 > Are sparse dumps enabled and working?

 In my case, sysctl machdep.sparse_dump prints 0.  I'm not sure why you
 ask, though.  If it is just to eliminate sparse dumps as a cause, then
 fine, but if you are suggesting I should enable sparse dumps, then I
 don't see how that would help get the present bug fixed.
 -- 
 Andreas Gustafsson, gson@gson.org

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/38970: slow crashdump on amd64 machine
Date: Sat, 20 Jul 2013 14:11:11 +0200

 Please provide dmesg excerpts for disk and controller, as we already know
 this is hardware specific, but we do not know a common denominator yet.

 Martin

From: Andreas Gustafsson <gson@gson.org>
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@NetBSD.org, christos@zoulas.com (Christos Zoulas), Simon Burge <simonb@NetBSD.org>
Subject: Re: kern/38970: slow crashdump on amd64 machine
Date: Sat, 20 Jul 2013 15:34:05 +0300

 Martin Husemann wrote:
 > Please provide dmesg excerpts for disk and controller, as we already know
 > this is hardware specific, but we do not know a common denominator yet.

 Here:

   ahcisata0 at pci0 dev 31 function 2: vendor 0x8086 product 0x1c02 (rev. 0x05)
   ahcisata0: interrupting at ioapic0 pin 19
   ahcisata0: 64-bit DMA
   ahcisata0: AHCI revision 1.30, 6 ports, 32 slots, CAP 0xe730ff65<SXS,EMS,PSC,SSC,PMD,ISS=0x3=Gen3,SCLO,SAL,SALP,SSNTF,SNCQ,S64A>
   atabus0 at ahcisata0 channel 0
   atabus1 at ahcisata0 channel 1
   atabus2 at ahcisata0 channel 2
   atabus3 at ahcisata0 channel 3
   atabus4 at ahcisata0 channel 4
   atabus5 at ahcisata0 channel 5

   ahcisata0 port 0: device present, speed: 3.0Gb/s
   ahcisata0 port 2: device present, speed: 1.5Gb/s
   wd0 at atabus0 drive 0
   wd0: <WDC WD2500KS-00MJB0>
   wd0: drive supports 16-sector PIO transfers, LBA48 addressing
   wd0: 232 GB, 484521 cyl, 16 head, 63 sec, 512 bytes/sect x 488397168 sectors
   wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
   wd0(ahcisata0:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
   atapibus0 at atabus2: 1 targets
   cd0 at atapibus0 drive 0: <TSSTcorp CDDVDW SH-S203D, , SB00> cdrom removable
   cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
   cd0(ahcisata0:2:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100) (using DMA)

 This is an Intel DH67CLB3 motherboard with BIOS version BLH6710H.86A.0131.2011.0926.1945.

 Here's a list of other PRs that apply to the same machine, just in
 case the cause happens to be related:

   46596
   46696
   47153

 -- 
 Andreas Gustafsson, gson@gson.org

From: christos@zoulas.com (Christos Zoulas)
To: Andreas Gustafsson <gson@gson.org>
Cc: gnats-bugs@NetBSD.org, Simon Burge <simonb@NetBSD.org>
Subject: Re: kern/38970: slow crashdump on amd64 machine
Date: Sat, 20 Jul 2013 10:15:40 -0400

 On Jul 20,  1:23pm, gson@gson.org (Andreas Gustafsson) wrote:
 -- Subject: Re: kern/38970: slow crashdump on amd64 machine

 | Christos Zoulas wrote:
 | > Are sparse dumps enabled and working?
 | 
 | In my case, sysctl machdep.sparse_dump prints 0.  I'm not sure why you
 | ask, though.  If it is just to eliminate sparse dumps as a cause, then
 | fine, but if you are suggesting I should enable sparse dumps, then I
 | don't see how that would help get the present bug fixed.

 No, but it might help you get a crash dump faster to determine the cause
 of the crash.

 christos

From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: Martin Husemann <martin@duskware.de>
Subject: Re: kern/38970: slow crashdump on amd64 machine
Date: Sun, 21 Jul 2013 18:08:23 +0300

 Earlier, I wrote:
 > As in the case of the original bug submitter, the dump target is a
 > Seagate SATA disk.

 Sorry, I was wrong about the brand of the disk - I remembered
 incorrectly and I was unable to check it as it was being dumped on at
 the time.  It's actually a Western Digital disk as shown in the dmesg
 output I later submitted.

 Since the disk is in fact from a different manufacturer than in the
 case of the original bug submitter, it looks like the common
 denominator is the Intel ahcisata.
 -- 
 Andreas Gustafsson, gson@gson.org

From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/38970: slow crashdump on amd64 machine
Date: Sat, 14 Sep 2013 19:12:44 +0300

 I noticed that the commit message for src/sys/dev/ic/ahcisata_core.c
 1.47 says (among other things) "reduce delay while polling in ahci, to
 speed up the dump".  It would be good if someone could check to see if
 that fixed the present bug.
 -- 
 Andreas Gustafsson, gson@gson.org

State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Fri, 16 Jun 2017 20:57:49 +0000
State-Changed-Why:
There was a fix for ahci(4) a while ago to speed this up, and it seems to be
okay on my machine dumping to ahcisata(4). Is this still relevant?


From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: jdolecek@NetBSD.org,
    Simon Burge <simonb@NetBSD.org>
Subject: Re: kern/38970 (slow crashdump on amd64 machine)
Date: Mon, 10 Jul 2017 22:36:30 +0300

 jdolecek@NetBSD.org wrote:
 > There was a fix for ahci(4) a while ago to speed this up, and it seems to be
 > okay on my machine dumping to ahcisata(4). Is this still relevant?

 I finally got around to testing this, by breaking into ddb and
 entering "reboot 0x100".  The dump now took 40 seconds, which is
 less than 1/100 of the time it took before, so the PR can be closed.
 For the record, this was using a Hitachi disk.
 -- 
 Andreas Gustafsson, gson@gson.org
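
The "less than 1/100" figure can be sanity-checked from numbers already in
this PR.  The sketch below is illustrative only; the names are hypothetical,
and it takes the earlier "more than four hours" as a lower bound on the old
dump time, so the computed ratio is an upper bound on the true one.

```python
# Compare the new 40-second dump against the earlier "more than four
# hours".  Using four hours as a lower bound on the old time makes the
# result an upper bound on the new/old ratio.
BEFORE_LOWER_BOUND_S = 4 * 3600  # "more than four hours", in seconds
AFTER_S = 40                     # dump time reported after the fix


def time_ratio(after_s, before_s):
    """Fraction of the old time that the new dump takes."""
    if before_s <= 0:
        raise ValueError("before_s must be positive")
    return after_s / before_s


ratio = time_ratio(AFTER_S, BEFORE_LOWER_BOUND_S)

if __name__ == "__main__":
    print(f"new/old time ratio is at most {ratio:.4f} (1/{1 / ratio:.0f})")
```

Even with this conservative bound, the new dump takes at most about 1/360 of
the old time, which is well under 1/100.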

State-Changed-From-To: feedback->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Mon, 10 Jul 2017 20:08:35 +0000
State-Changed-Why:
Confirmed fixed. Thanks for the report.


From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: Simon Burge <simonb@NetBSD.org>,
    jdolecek@NetBSD.org
Subject: Re: kern/38970 (slow crashdump on amd64 machine)
Date: Mon, 10 Jul 2017 22:46:24 +0300

 A short while ago, I wrote:
 > the PR can be closed.

 I should add "... as far as I'm concerned", since I'm not the original
 submitter of the PR.
 -- 
 Andreas Gustafsson, gson@gson.org

>Unformatted:
