NetBSD Problem Report #46683

From mm_lists@pulsar-zone.net  Tue Jul 10 11:50:55 2012
Return-Path: <mm_lists@pulsar-zone.net>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id DB00263B85F
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 10 Jul 2012 11:50:54 +0000 (UTC)
Message-Id: <201207101150.q6ABopx5022323@ginseng.pulsar-zone.net>
Date: Tue, 10 Jul 2012 07:50:51 -0400
From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@gnats.NetBSD.org
Subject: netbsd-6/amd64 cd0 hanging after burning

>Number:         46683
>Category:       kern
>Synopsis:       netbsd-6/amd64 cd0 hanging after burning
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jul 10 11:55:00 +0000 2012
>Last-Modified:  Mon Jul 16 10:50:02 +0000 2012
>Originator:     Matthew Mondor
>Release:        NetBSD 6.0_BETA2
>Organization:
>Environment:
System: NetBSD ninja.xisop 6.0_BETA2 NetBSD 6.0_BETA2 (GENERIC_MM) #16: Fri Jun 15 10:51:25 EDT 2012 root@ninja.xisop:/usr/obj/sys/arch/amd64/compile/GENERIC_MM amd64
Architecture: x86_64
Machine: amd64
>Description:

This problem is unfortunately intermittent.  When closing the session
after successfully burning a DVD, the drive sometimes freezes with its
light blinking.  When this occurs, the growisofs process eventually
times out and proceeds to the "reloading tray" step, which fails, and
growisofs then quits.

However, the device remains stuck with the blinking light on, and it
cannot be ejected.  A system reboot is necessary to use it again.

[...]
 99.87% done, estimate finish Tue Jul 10 06:36:56 2012
Total translation table size: 0
Total rockridge attributes bytes: 707
Total directory bytes: 0
Path table size(bytes): 10
Max brk space used 0
2142858 extents written (4185 MB)
builtin_dd: 2142864*2KB out @ average 10.3x1352KBps
/dev/rcd0d: flushing cache
/dev/rcd0d: updating RMA
/dev/rcd0d: closing session
^C/dev/rcd0d: reloading tray
:-( unable to reload tray: Input/output error
      922.78 real         1.40 user         9.11 sys
ninja# 

Possibly relevant dmesg parts:

ahcisata0 at pci0 dev 31 function 2: vendor 0x8086 product 0x1c02 (rev. 0x05)
ahcisata0: interrupting at ioapic0 pin 20
ahcisata0: 64-bit DMA
ahcisata0: AHCI revision 1.30, 6 ports, 32 slots, CAP 0xe730ff45<EMS,PSC,SSC,PMD,ISS=0x3=Gen3,SCLO,SAL,SALP,SSNTF,SNCQ,S64A>
atabus2 at ahcisata0 channel 0
atabus3 at ahcisata0 channel 1
atabus4 at ahcisata0 channel 4
[...]
atapibus0 at atabus4: 1 targets
cd0 at atapibus0 drive 0: <HL-DT-ST DVDRAM GH22NS90, K5BB9112905, HN00S30> cdrom removable
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
cd0(ahcisata0:4:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100) (using DMA)
[...]
cd0(ahcisata0:4:0):  DEFERRED ERROR, key = 0x2
cd0: dos partition I/O error

I have since forgotten the WCHAN in which the growisofs process was
blocked the last time this occurred, while it was stuck waiting for the
I/O error.  I'll post a follow-up with it the next time it happens.

If the device is used while it is wedged after growisofs exits, many
processes become temporarily blocked, depending on the syscalls they
make and/or possibly the devices they access.  These, and a growing
number of other processes, may remain blocked for a while, until an
I/O error occurs on the CD-related command (e.g. mount), at which point
all processes unblock as well.  Unfortunately I have not yet been able
to get the WCHAN in which they were blocked (even ps, run as the first
command, blocks then), and there are no new messages in dmesg after
these general lockups.  I may eventually be able to get into the
debugger to find out.
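
Because ps itself may block once the drive is wedged, one option is to
start sampling the WCHAN before the session is closed.  The following is
only a minimal sketch, not something that was run for this report.  It
assumes that "ps axl" prints a header line containing PID, WCHAN and STAT
columns with COMMAND as the last column; the function names
wchan_by_command and sample are hypothetical helpers introduced for
illustration.

```python
import subprocess
import time


def wchan_by_command(ps_output, needle):
    """Return (pid, wchan, stat) for each row whose command contains needle.

    Assumes the first line of ps_output is a header naming the columns and
    that COMMAND is the last column (an assumption about `ps axl` output).
    """
    lines = ps_output.splitlines()
    if not lines:
        return []
    header = lines[0].split()
    # Locate columns by name instead of hard-coding their positions.
    pid_i = header.index("PID")
    wchan_i = header.index("WCHAN")
    stat_i = header.index("STAT")
    cmd_i = len(header) - 1
    rows = []
    for line in lines[1:]:
        # Split at most cmd_i times so the command keeps its embedded spaces.
        fields = line.split(None, cmd_i)
        if len(fields) <= cmd_i or needle not in fields[cmd_i]:
            continue
        rows.append((int(fields[pid_i]), fields[wchan_i], fields[stat_i]))
    return rows


def sample(needle="growisofs", interval=5.0, count=3):
    """Print a timestamped WCHAN sample for matching processes."""
    for _ in range(count):
        out = subprocess.run(
            ["ps", "axl"], capture_output=True, text=True
        ).stdout
        for pid, wchan, stat in wchan_by_command(out, needle):
            # Flush each line so it is not lost if the sampler blocks later.
            print(time.strftime("%H:%M:%S"), pid, wchan, stat, flush=True)
        time.sleep(interval)
```

Calling sample(count=200) in a second terminal with its output redirected
to a file would keep a record of the wait channel over time, provided the
sampler itself does not block on the stuck device.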

>How-To-Repeat:

growisofs -Z /dev/rcd0d -r -J -joliet-long cd/

>Fix:

>Audit-Trail:
From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46683: netbsd-6/amd64 cd0 hanging after burning
Date: Mon, 16 Jul 2012 06:45:34 -0400

 On Tue, 10 Jul 2012 11:55:00 +0000 (UTC)
 Matthew Mondor <mm_lists@pulsar-zone.net> wrote:

 > I forgot since the last time it occurred the WCHAN of the locked
 > growisofs process when it's stuck until the I/O error occurs.  I'll
 > post a followup with it the next time it occurs.

 The lockup seems to have occurred again, and before growisofs times
 out, its WCHAN appears to be biowait for a while:

 $ ps axl | grep growisofs
    0 15120 24723    0  95 -20   22352    1908 biowait D<+  ttyp6   0:03.44 -reload 4 /dev/rcd0d 0 (growisofs)
    0 24723   417    0  85   0    9736    1036 wait    I+   ttyp6   0:00.00 /usr/bin/time /usr/pkg/bin/growisofs -Z /dev/rcd0d -r -J -joliet-long cd/ 

 Eventually, it then sits in xscmd for a while:

    0 15120 24723    0  95 -20   22352    1912 xscmd   D<+  ttyp6   0:03.44 -reload 4 /dev/rcd0d 0 (growisofs)

 Then, as before, growisofs eventually exits with an I/O error after at
 least 10 minutes, but the device remains unusable (and the system
 becomes unstable if one tries to use it) until a reboot.
 -- 
 Matt
