NetBSD Problem Report #17618

Received: (qmail 15454 invoked by uid 605); 17 Jul 2002 14:00:59 -0000
Message-Id: <200207171400.g6HE0vG17815@cyclops.waterside.net>
Date: Wed, 17 Jul 2002 10:00:57 -0400 (EDT)
From: rafal@netbsd.org
Sender: gnats-bugs-owner@netbsd.org
Reply-To: rafal@netbsd.org
To: gnats-bugs@gnats.netbsd.org
Subject: CDIOCRESET ioctl on Toshiba CDROM@ahc0 hangs machine if no media
X-Send-Pr-Version: 3.95

>Number:         17618
>Category:       port-i386
>Synopsis:       CDIOCRESET ioctl on Toshiba CDROM@ahc0 hangs machine if no media
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-i386-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 17 14:01:01 +0000 2002
>Closed-Date:    
>Last-Modified:  Mon Apr 12 23:47:00 +0000 2004
>Originator:     
>Release:        NetBSD 1.6D
>Organization:
procrastinators anonymous
>Environment:
System: NetBSD cyclops 1.6D NetBSD 1.6D (CYCLOPS.mp) #16: Tue Jul 9 22:05:56 EDT 2002 rafal@cyclops:/extra/src-current/sys/arch/i386/compile/CYCLOPS.mp i386
Architecture: i386
Machine: i386
>Description:
	The GNOME CD player applet would hang my machine hard if I removed
	the CD from the drive or started it when there was no CD in the
	drive.

	I've narrowed the problem down to a simple test program that could
	(probably) be trimmed even further.  The culprit is the CDIOCRESET
	ioctl, which in turn does a Bus Device Reset (BDR) at the SCSI level.
	This logs a message about sending the BDR and then hangs the box.
	On a wscons console the keyboard still responds, but I'm unable to
	suspend or kill the process or to switch to a different virtual
	console.  If in X, the keyboard and mouse *don't* respond, and I'm
	still unable to get to a different virtual console.

	The controller is the following:
ahc0 at pci0 dev 16 function 0
ahc0: interrupting at apic 2 int 16 (irq 11)
ahc0: aic7890/91 Wide Channel A, SCSI Id=7, 16/255 SCBs

	The CDROM is:
cd0 at scsibus0 target 5 lun 0: <TOSHIBA, CD-ROM XM-6401TA, 1015> SCSI2 5/cdrom removable
cd0: sync (50.0ns offset 16), 8-bit (20.000MB/s) transfers

	There's also a CD-R/RW on the bus, and I haven't tried with that
	device yet:
cd1 at scsibus0 target 4 lun 0: <PLEXTOR, CD-R   PX-W124TS, 1.07> SCSI2 5/cdrom removable
cd1: sync (50.0ns offset 8), 8-bit (20.000MB/s) transfers

	FWIW, the SCSI chain only has those two devices and a SCSI ZIP drive;
	the system disks are all IDE, so the BDR shouldn't be interfering with
	normal system disk I/O.

>How-To-Repeat:
	Given:
		/dev/cdrom points to a device for a CDROM hanging off an
		`ahc' controller, as below:

lrwxr-xr-x  1 root  wheel  10 Jul 10 22:42 /dev/cdrom@ -> /dev/rcd0d

		There is no media in that CDROM drive.

	Now:
		Run the following short program (best done on a wscons
		text console so results are more evident), and it should
		hang the PC (after printing the "Bus Device Reset" message
		from the ahc driver).

		I believe it should be sufficient to issue just the CDIOCRESET
		ioctl, without the CDIOCREADSUBCHANNEL first, but I have not
		verified this.

		(Note also that this is an SMP kernel and has had the signal
		 trampoline changes from the trunk pulled up.  I'm not sure
		 whether either is relevant; I'll see if I can reproduce this
		 with a "stock" uniprocessor kernel or an SMP kernel without
		 any hand-applied changes and append the results here later.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>		/* memset() */
#include <unistd.h>		/* sleep(), close() */
#include <fcntl.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/cdio.h>

int
main(void)
{
	int fd;
	struct ioc_read_subchannel subchnl;

	if ((fd = open("/dev/cdrom", O_RDONLY)) < 0) {
		perror("Open /dev/cdrom");
		exit(1);
	}

	/* Ask for the current play position; this fails with no media. */
	memset(&subchnl, 0, sizeof(struct ioc_read_subchannel));
	subchnl.address_format = CD_MSF_FORMAT;
	subchnl.data_format = CD_CURRENT_POSITION;
	subchnl.data_len = sizeof(struct cd_sub_channel_info);
	if ((subchnl.data = malloc(sizeof(struct cd_sub_channel_info))) == NULL) {
		perror("malloc");
		exit(1);
	}

	if (ioctl(fd, CDIOCREADSUBCHANNEL, &subchnl) == -1) {
		perror("read CD subchannel");

		printf("about to reset cdrom\n");
		fflush(stdout);
		sleep(3);

		/* CDIOCRESET sends a SCSI Bus Device Reset; this is what hangs the box. */
		if (ioctl(fd, CDIOCRESET, 0) == -1) {
			perror("CD reset");
		} else {
			printf("cdrom reset OK\n");
			fflush(stdout);
		}
	} else {
		printf("CD data read ok!\n");
	}

	/* Free the buffer exactly once, on every path. */
	free(subchnl.data);
	close(fd);
	exit(0);
}

>Fix:
	Unknown.

>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback 
State-Changed-By: tls 
State-Changed-When: Thu Apr 1 02:45:52 UTC 2004 
State-Changed-Why:  
Does this still hang the machine, even with the new ahc driver and other 
relevant changes? 

From: Rafal Boni <rafal@pobox.com>
To: gnats-bugs@gnats.netbsd.org
Cc:  
Subject: Re: port-i386/17618 
Date: Wed, 31 Mar 2004 23:08:11 -0500

 In message <20040401024629.8300.qmail@mail.netbsd.org>, you write: 

 -> Does this still hang the machine, even with the new ahc driver and other
 -> relevant changes?

 I haven't verified this super-recently, but it certainly did do so even
 after the new ahc driver.  Though, the last time I tried it was probably
 almost four months ago now (end of Nov '03 / beginning of Dec '03).  I'll
 re-check in the next day or two, but I'm not too hopeful.

 --rafal

 ----
 Rafal Boni                                                     rafal@pobox.com
   We are all worms.  But I do believe I am a glowworm.  -- Winston Churchill

From: Rafal Boni <rafal@pobox.com>
To: gnats-bugs@gnats.netbsd.org
Cc: tls@netbsd.org, port-i386-maintainer@netbsd.org, toddpw@netbsd.org
Subject: Re: port-i386/17618 
Date: Tue, 06 Apr 2004 01:07:54 -0400


 I've now verified that this still happens both with the most recent HEAD
 and 2.0 kernels (GENERIC and GENERIC.MP from the latest releng builds I
 could find for i386 for both HEAD and netbsd-2-0).

 A few interesting things learned along the way:
 	* It happens on both MP and UP kernels (the machine in question
 	  is an MP system).
 	* It happens on both the Toshiba *and* the Plextor CD-R/W attached
 	  to the ahc (I didn't verify this with all combinations of kernel
 	  type [MP vs. UP] and source branch, but did check at least two
 	  of them)
 	* It doesn't matter if there is (good) media or not in the drive;
 	  the CD reset ioctl will zorch the system in any case.
 	* When this happens, I get a steady stream of "Inactive SCB on
 	  pending list" kernel messages scrolling on the console.

 I believe toddpw also saw this same issue on one of his machines, so I'm
 CC'ing him as well (he seemed to be doing a little bit of digging, so I
 hope he finds something :-).

 --rafal

State-Changed-From-To: feedback->open 
State-Changed-By: rafal 
State-Changed-When: Tue Apr 6 05:10:49 UTC 2004 
State-Changed-Why:  
This is still a problem both in the netbsd-2-0 branch and in HEAD. 

From: toddpw@toddpw.org
To: gnats-bugs@NetBSD.org
Cc: toddpw@toddpw.org (Todd Whitesel)
Subject: Re: kern/17618
Date: Mon, 12 Apr 2004 16:45:59 -0700 (PDT)

 I see a similar issue when I'm ripping a CD with very bad blocks
 (an ATAPI drive just sits there trying to read the block forever).

 I placed a README/dmesg/console-log at ftp://ftp.toddpw.org/scb-bug/
 and did some reading of the driver; here's what I think is going on
 based on the output in 'scb-log.txt':

   1. userland issues an ioctl CDIOCRESET or SCIOCRESET
   2. dev/scsipi/cd.c calls cd_reset() which sends cmd with XS_CTL_RESET
   3. aic7xxx_osm.c:ahc_action() translates that to SCB_DEVICE_RESET
   4. aic7xxx.c:ahc_setup_initiator_msgout() prints
 	"ahc1:Bus Device Reset Message Sent"
   5. aic7xxx.c:ahc_search_untagged_queues() prints "Inactive SCB in untaggedQ"
   6. aic7xxx.c:ahc_abort_scbs() prints "Inactive SCB on pending list" forever,
 	causing an apparent hang if you are running X. However, from the text
 	console, DDB can be invoked successfully.

 Hypothesis:

 When we reset the CD drive, we should flush all operations-in-progress
 for that device, since it presumably did the same when it got the reset.
 However, in this case, something goes wrong and we fail to clear something,
 so we get infinite warnings about it -- consuming the machine.

 I skimmed through a diff of our driver and the FreeBSD -current version of
 the same file, but didn't spot anything obvious. Then again, I don't really
 know what I should be looking for: a new call to a clearing routine, a fix
 to the matching code that finds the SCBs to clear, or something completely
 different.

 -- 
 Todd Whitesel
 toddpw @ toddpw.org
>Unformatted:
 		kernel build from i386 smp branch with sigtramp changes 
 		from trunk applied.  Haven't yet tested with UP kernel.

 		Sources from ~ Jul 9th (late evening US/Eastern)

Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.