NetBSD Problem Report #38970
From simonb@thistledown.com.au Tue Jun 17 06:31:39 2008
Return-Path: <simonb@thistledown.com.au>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id 4B13863B842
for <gnats-bugs@gnats.NetBSD.org>; Tue, 17 Jun 2008 06:31:39 +0000 (UTC)
Message-Id: <20080617063138.233C3AFD04@thoreau.thistledown.com.au>
Date: Tue, 17 Jun 2008 16:31:38 +1000 (EST)
From: Simon Burge <simonb@NetBSD.org>
Reply-To: Simon Burge <simonb@NetBSD.org>
To: gnats-bugs@gnats.NetBSD.org
Subject: slow crashdump on amd64 machine
X-Send-Pr-Version: 3.95
>Number: 38970
>Notify-List: gson@gson.org
>Category: kern
>Synopsis: slow crashdump on amd64 machine
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Jun 17 06:35:00 +0000 2008
>Closed-Date: Mon Jul 10 20:08:35 +0000 2017
>Last-Modified: Mon Jul 10 21:10:00 +0000 2017
>Originator: Simon Burge
>Release: NetBSD-current (the problem has been around for a while)
>Organization:
>Environment:
Architecture: amd64
Machine: amd64
>Description:
I've got two amd64 machines here.
One is a single-CPU Opteron with 1GB of RAM; its system disk is a
Fujitsu MHT2080BH on an svwsata controller. That machine can write
a crash dump at ~12MB/sec.
The other is an 8-CPU Intel machine with 16GB of RAM; its system
disk is a Seagate ST3500630NS on an ahcisata controller. This
machine can only write a crash dump at about 200kB/sec.
Each new number in the countdown takes around 5 seconds to appear.
Relevant parts of dmesg for the disk are:
ahcisata0 at pci0 dev 31 function 2: vendor 0x8086 product 0x2681
ahcisata0: interrupting at ioapic0 pin 19
ahcisata0: AHCI revision 1.1, 6 ports, 32 command slots, features 0x86226000
atabus1 at ahcisata0 channel 0
ahcisata0 port 0: device present, speed: 1.5Gb/s
wd0 at atabus1 drive 0: <ST3500630NS>
wd0: quirks 2<FORCE_LBA48>
wd0: 465 GB, 969021 cyl, 16 head, 63 sec, 512 bytes/sect x 976773168 sectors
>How-To-Repeat:
Boot to single user and "reboot -d".
>Fix:
None given.
>Release-Note:
>Audit-Trail:
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/38970: slow crashdump on amd64 machine
Date: Fri, 19 Jul 2013 17:23:09 +0300
As I write this, I am waiting for a NetBSD 6.x/amd64 machine to finish
a crash dump. I have plenty of time to write, because it's dumping at
a rate of about 1 MB/s, which means dumping 16 GB of RAM is going to
take more than four hours.
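The "more than four hours" figure is simple arithmetic. A minimal sketch, assuming the whole of RAM is written (no sparse dump) at a constant rate, with binary units (1 GB = 1024 MB = 1048576 kB); `dump_seconds` is a hypothetical helper for illustration, not anything in NetBSD:

```python
# Hypothetical helper: estimate the duration of a full (non-sparse) crash
# dump, assuming all of RAM is written at a constant rate. Binary units
# (1 GB = 1024 MB, 1 MB = 1024 kB); per-dump overheads are ignored.

def dump_seconds(ram_gb: float, rate_kb_per_s: float) -> float:
    """Seconds needed to write ram_gb gigabytes at rate_kb_per_s kB/s."""
    ram_kb = ram_gb * 1024 * 1024
    return ram_kb / rate_kb_per_s


if __name__ == "__main__":
    # 16 GB at about 1 MB/s, as in the NetBSD 6.x/amd64 case
    print(f"{dump_seconds(16, 1024) / 3600:.1f} hours")  # prints "4.6 hours"
    # 16 GB at about 200 kB/s, as on the original 8-CPU machine
    print(f"{dump_seconds(16, 200) / 3600:.1f} hours")   # prints "23.3 hours"
```

Under the same assumptions, the ~200kB/sec rate from the original report would need roughly a full day to dump 16GB.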
As in the case of the original bug submitter, the dump target is a
Seagate SATA disk.
I will bump the priority of the PR, because this bug is now affecting
multiple users and is a serious impediment to the diagnosis and fixing
of kernel crashes.
--
Andreas Gustafsson, gson@gson.org
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
Simon Burge <simonb@NetBSD.org>
Cc:
Subject: Re: kern/38970: slow crashdump on amd64 machine
Date: Fri, 19 Jul 2013 11:06:56 -0400
On Jul 19, 2:25pm, gson@gson.org (Andreas Gustafsson) wrote:
-- Subject: Re: kern/38970: slow crashdump on amd64 machine
| As I write this, I am waiting for a NetBSD 6.x/amd64 machine to finish
| a crash dump. I have plenty of time to write, because it's dumping at
| a rate of about 1 MB/s, which means dumping 16 GB of RAM is going to
| take more than four hours.
Are sparse dumps enabled and working?
christos
From: Andreas Gustafsson <gson@gson.org>
To: christos@zoulas.com (Christos Zoulas)
Cc: gnats-bugs@NetBSD.org, Simon Burge <simonb@NetBSD.org>
Subject: Re: kern/38970: slow crashdump on amd64 machine
Date: Sat, 20 Jul 2013 13:23:35 +0300
Christos Zoulas wrote:
> Are sparse dumps enabled and working?
In my case, sysctl machdep.sparse_dump prints 0. I'm not sure why you
ask, though. If it is just to eliminate sparse dumps as a cause, then
fine, but if you are suggesting I should enable sparse dumps, then I
don't see how that would help get the present bug fixed.
--
Andreas Gustafsson, gson@gson.org
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/38970: slow crashdump on amd64 machine
Date: Sat, 20 Jul 2013 14:11:11 +0200
Please provide dmesg excerpts for disk and controller, as we already know
this is hardware specific, but we do not know a common denominator yet.
Martin
From: Andreas Gustafsson <gson@gson.org>
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@NetBSD.org, christos@zoulas.com (Christos Zoulas), Simon Burge <simonb@NetBSD.org>
Subject: Re: kern/38970: slow crashdump on amd64 machine
Date: Sat, 20 Jul 2013 15:34:05 +0300
Martin Husemann wrote:
> Please provide dmesg excerpts for disk and controller, as we already know
> this is hardware specific, but we do not know a common denominator yet.
Here:
ahcisata0 at pci0 dev 31 function 2: vendor 0x8086 product 0x1c02 (rev. 0x05)
ahcisata0: interrupting at ioapic0 pin 19
ahcisata0: 64-bit DMA
ahcisata0: AHCI revision 1.30, 6 ports, 32 slots, CAP 0xe730ff65<SXS,EMS,PSC,SSC,PMD,ISS=0x3=Gen3,SCLO,SAL,SALP,SSNTF,SNCQ,S64A>
atabus0 at ahcisata0 channel 0
atabus1 at ahcisata0 channel 1
atabus2 at ahcisata0 channel 2
atabus3 at ahcisata0 channel 3
atabus4 at ahcisata0 channel 4
atabus5 at ahcisata0 channel 5
ahcisata0 port 0: device present, speed: 3.0Gb/s
ahcisata0 port 2: device present, speed: 1.5Gb/s
wd0 at atabus0 drive 0
wd0: <WDC WD2500KS-00MJB0>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 232 GB, 484521 cyl, 16 head, 63 sec, 512 bytes/sect x 488397168 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(ahcisata0:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
atapibus0 at atabus2: 1 targets
cd0 at atapibus0 drive 0: <TSSTcorp CDDVDW SH-S203D, , SB00> cdrom removable
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
cd0(ahcisata0:2:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100) (using DMA)
This is an Intel DH67CLB3 motherboard with BIOS version BLH6710H.86A.0131.2011.0926.1945.
Here's a list of other PRs that apply to the same machine, just in
case the cause happens to be related:
46596
46696
47153
--
Andreas Gustafsson, gson@gson.org
From: christos@zoulas.com (Christos Zoulas)
To: Andreas Gustafsson <gson@gson.org>
Cc: gnats-bugs@NetBSD.org, Simon Burge <simonb@NetBSD.org>
Subject: Re: kern/38970: slow crashdump on amd64 machine
Date: Sat, 20 Jul 2013 10:15:40 -0400
On Jul 20, 1:23pm, gson@gson.org (Andreas Gustafsson) wrote:
-- Subject: Re: kern/38970: slow crashdump on amd64 machine
| Christos Zoulas wrote:
| > Are sparse dumps enabled and working?
|
| In my case, sysctl machdep.sparse_dump prints 0. I'm not sure why you
| ask, though. If it is just to eliminate sparse dumps as a cause, then
| fine, but if you are suggesting I should enable sparse dumps, then I
| don't see how that would help get the present bug fixed.
No, but it might help you get a crash dump faster to determine the cause
of the crash.
christos
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: Martin Husemann <martin@duskware.de>
Subject: Re: kern/38970: slow crashdump on amd64 machine
Date: Sun, 21 Jul 2013 18:08:23 +0300
Earlier, I wrote:
> As in the case of the original bug submitter, the dump target is a
> Seagate SATA disk.
Sorry, I was wrong about the brand of the disk; I remembered
incorrectly, and I was unable to check because the dump was being
written to it at the time. It's actually a Western Digital disk, as
shown in the dmesg output I later submitted.
Since the disk is in fact from a different manufacturer than in the
case of the original bug submitter, it looks like the common
denominator is the Intel ahcisata.
--
Andreas Gustafsson, gson@gson.org
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/38970: slow crashdump on amd64 machine
Date: Sat, 14 Sep 2013 19:12:44 +0300
I noticed that the commit message for src/sys/dev/ic/ahcisata_core.c
1.47 says (among other things) "reduce delay while polling in ahci, to
speed up the dump". It would be good if someone could check to see if
that fixed the present bug.
--
Andreas Gustafsson, gson@gson.org
State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Fri, 16 Jun 2017 20:57:49 +0000
State-Changed-Why:
There was a fix for ahci(4) a while ago to speed this up, and it seems to be okay
on my machine dumping to ahcisata(4). Is this still relevant?
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: jdolecek@NetBSD.org,
Simon Burge <simonb@NetBSD.org>
Subject: Re: kern/38970 (slow crashdump on amd64 machine)
Date: Mon, 10 Jul 2017 22:36:30 +0300
jdolecek@NetBSD.org wrote:
> There was a fix for ahci(4) a while ago to speed this up, and it seems to be okay
> on my machine dumping to ahcisata(4). Is this still relevant?
I finally got around to testing this by breaking into ddb and
entering "reboot 0x100". The dump now took 40 seconds, which is
less than 1/100 of the time it took before, so the PR can be closed.
For the record, this was using a Hitachi disk.
--
Andreas Gustafsson, gson@gson.org
State-Changed-From-To: feedback->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Mon, 10 Jul 2017 20:08:35 +0000
State-Changed-Why:
Confirmed fixed. Thanks for the report.
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: Simon Burge <simonb@NetBSD.org>,
jdolecek@NetBSD.org
Subject: Re: kern/38970 (slow crashdump on amd64 machine)
Date: Mon, 10 Jul 2017 22:46:24 +0300
A short while ago, I wrote:
> the PR can be closed.
I should add "... as far as I'm concerned", since I'm not the original
submitter of the PR.
--
Andreas Gustafsson, gson@gson.org
>Unformatted:
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.