NetBSD Problem Report #40449

From apb@cequrux.com  Wed Jan 21 17:55:36 2009
Return-Path: <apb@cequrux.com>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 337D663B8BA
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 21 Jan 2009 17:55:36 +0000 (UTC)
Message-Id: <20090121162934.38EDFE93298@apb-laptoy.apb.alt.za>
Date: Wed, 21 Jan 2009 16:29:34 +0000 (UTC)
From: apb@cequrux.com
To: gnats-bugs@gnats.NetBSD.org
Subject: disk errors after ACPI suspend/resume
X-Send-Pr-Version: 3.95

>Number:         40449
>Category:       port-i386
>Synopsis:       disk errors after ACPI suspend/resume
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    port-i386-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jan 21 18:00:00 +0000 2009
>Closed-Date:    Tue Mar 26 01:37:44 +0000 2024
>Last-Modified:  Tue Mar 26 01:37:44 +0000 2024
>Originator:     Alan Barrett
>Release:        NetBSD 5.99.1
>Organization:
Not much
>Environment:
System: NetBSD 5.99.10 i386
Architecture: i386
Machine: i386
>Description:

If I suspend the system via sysctl -w machdep.sleep_state=3 and
then resume, a constant stream of disk error messages appears.
The errors look like this:

wd0e: error reading fsbn blah blah retrying
wd0: (aborted command)
cgd1: error 5

There are several pairs of wd0e and wd0 messages for each cgd1 message.
The block numbers in the wd0e messages repeat a few times and then
change.  The errors scroll past rapidly and continuously.  The only
obvious way to recover is to power cycle the machine.

wd0 is an ordinary laptop SATA disk attached to an Intel
82801GBM/GHM controller (configured in the BIOS for compatibility
mode).  Here are some config messages:

    piixide0 at pci0 dev 31 function 2
    piixide0: Intel 82801GBM/GHM Serial ATA Controller (ICH7) (rev. 0x01)
    piixide0: bus-master DMA support present
    piixide0: primary channel wired to compatibility mode
    ioapic0: int14 0x69<vector=0x69,delmode=0x0,dest=0x0> 0x0<target=0x0>
    piixide0: primary channel interrupting at ioapic0 pin 14
    atabus0 at piixide0 channel 0

    wd0 at atabus0 drive 0: <Hitachi HTS542520K9SA00>
    wd0: drive supports 16-sector PIO transfers, LBA48 addressing
    wd0: 186 GB, 387621 cyl, 16 head, 63 sec, 512 bytes/sect x 390721968 sectors
    rnd: wd0 attached as an entropy source (collecting and estimating)
    wd0: 32-bit data port
    wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
    wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)

The disk has an MBR and a NetBSD disklabel.  wd0e is one of the disklabel
partitions.

cgd1 used wd0e as its backing store.

>How-To-Repeat:
Suspend with sysctl -w machdep.sleep_state=3, then resume.

>Fix:
Unknown

>Release-Note:

>Audit-Trail:
From: David Young <dyoung@pobox.com>
To: apb@cequrux.com
Cc: port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: port-i386/40449: disk errors after ACPI suspend/resume
Date: Wed, 21 Jan 2009 12:50:43 -0600

 On Wed, Jan 21, 2009 at 06:00:00PM +0000, apb@cequrux.com wrote:
 > >Number:         40449
 > >Category:       port-i386
 > >Synopsis:       disk errors after ACPI suspend/resume
 > >Confidential:   no
 > >Severity:       serious
 > >Priority:       high
 > >Responsible:    port-i386-maintainer
 > >State:          open
 > >Class:          sw-bug
 > >Submitter-Id:   net
 > >Arrival-Date:   Wed Jan 21 18:00:00 +0000 2009
 > >Originator:     Alan Barrett
 > >Release:        NetBSD 5.99.1
 > >Organization:
 > Not much
 > >Environment:
 > System: NetBSD 5.99.10 i386
 > Architecture: i386
 > Machine: i386
 > >Description:
 > 
 > If I suspend the system via sysctl -w machdep.sleep_state=3 and
 > then resume, a constant stream of disk error messages appears.
 > The errors look like this:
 > 
 > wd0e: error reading fsbn blah blah retrying
 > wd0: (aborted command)
 > cgd1: error 5
 > 
 > There are several pairs of wd0e and wd0 messages for each cgd1 message.
 > The block numbers in the wd0e messages repeat a few times and then
 > change.  The errors scroll past rapidly and continuously.  The only
 > obvious way to recover is to power cycle the machine.
 > 
 > wd0 is an ordinary laptop SATA disk attached to an Intel
 > 82801GBM/GHM controller (configured in the BIOS for compatibility
 > mode).  Here are some config messages:
 > 
 >     piixide0 at pci0 dev 31 function 2
 >     piixide0: Intel 82801GBM/GHM Serial ATA Controller (ICH7) (rev. 0x01)
 >     piixide0: bus-master DMA support present
 >     piixide0: primary channel wired to compatibility mode
 >     ioapic0: int14 0x69<vector=0x69,delmode=0x0,dest=0x0> 0x0<target=0x0>
 >     piixide0: primary channel interrupting at ioapic0 pin 14
 >     atabus0 at piixide0 channel 0
 > 
 >     wd0 at atabus0 drive 0: <Hitachi HTS542520K9SA00>
 >     wd0: drive supports 16-sector PIO transfers, LBA48 addressing
 >     wd0: 186 GB, 387621 cyl, 16 head, 63 sec, 512 bytes/sect x 390721968 sectors
 >     rnd: wd0 attached as an entropy source (collecting and estimating)
 >     wd0: 32-bit data port
 >     wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
 >     wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)
 > 
 > The disk has an MBR and a NetBSD disklabel.  wd0e is one of the disklabel
 > partitions.
 > 
 > cgd1 used wd0e as its backing store.
 > 
 > >How-To-Repeat:
 > suspend, then resume.

 You may be able to narrow this down by using drvctl -S/-Q to
 suspend/resume wd0 and its parents, beginning with wd0:

         drvctl -S wd0; drvctl -Q wd0
         drvctl -S atabus0; drvctl -Q atabus0
         drvctl -S piixide0; drvctl -Q piixide0
         drvctl -S pci0; drvctl -Q pci0

 Let us see if one of those steps will reliably reproduce the problem.
 If so, then it may help both to have a look at the affected devices'
 PCI configuration before and after suspension/resumption, using
 pcictl(8), and to look at the devices' suspend/resume routines.
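
The stepping procedure above could be scripted roughly as follows. This is a minimal sketch, not a tested diagnostic: the helper names (run, count_errors, step_devices), the DRYRUN switch, and the 5-second settle delay are assumptions made for illustration, and the error check simply counts "aborted command" lines in dmesg, which will undercount if the message buffer wraps. With DRYRUN=1 (the default here) the script only prints the drvctl commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: suspend and resume each device in turn, leaf first,
# and stop at the first device whose resume produces new disk errors.
# DRYRUN=1 (the default) only prints the commands; set DRYRUN=0 and run
# as root to execute them.

DRYRUN=${DRYRUN:-1}

# Run a command, or just print it when in dry-run mode.
run() {
    if [ "$DRYRUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Count "aborted command" lines currently in the kernel message buffer.
# In dry-run mode nothing is executed, so report 0.
count_errors() {
    if [ "$DRYRUN" = 1 ]; then
        echo 0
    else
        dmesg | grep -c 'aborted command'
    fi
}

# Suspend/resume each named device and compare the error count
# before and after.
step_devices() {
    for dev in "$@"; do
        before=$(count_errors)
        run drvctl -S "$dev"
        run drvctl -Q "$dev"
        run sleep 5    # assumed settle time so that errors can show up
        after=$(count_errors)
        if [ "$after" -gt "$before" ]; then
            echo "$dev: new disk errors after resume"
            return 1
        fi
        echo "$dev: ok"
    done
}

# Device order taken from the list above: the disk first, then its parents.
step_devices wd0 atabus0 piixide0 pci0
```

For the PCI configuration comparison, one option is to save the output of "pcictl pci0 dump -d 31 -f 2" (device and function numbers taken from the piixide0 attach line in the report) before and after the failing step and compare the two files with diff(1).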

 It may be desirable to suspend cgd1 before suspending its backing
 store.  I don't know if cgd(4) suspends and resumes, though.  Not
 all disk drivers will refrain from trying to issue a read/write to
 the h/w while suspended.

 Dave

 -- 
 David Young             OJC Technologies
 dyoung@ojctech.com      Urbana, IL * (217) 278-3933

State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 10 May 2009 13:58:31 +0000
State-Changed-Why:
feedback was requested


State-Changed-From-To: feedback->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 23 Dec 2013 22:50:28 +0000
State-Changed-Why:
feedback timeout (nearly five years)

Does anyone know if cgd(4) either (a) had detach support in 2009, or (b)
has detach support now? I recall this being an issue at some point.


State-Changed-From-To: open->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Tue, 26 Mar 2024 01:37:44 +0000
State-Changed-Why:
No feedback from submitter in over a decade.
Suspend/resume with cgd on wd has worked reliably for many years on
many machines.


>Unformatted: