NetBSD Problem Report #17618
Received: (qmail 15454 invoked by uid 605); 17 Jul 2002 14:00:59 -0000
Message-Id: <200207171400.g6HE0vG17815@cyclops.waterside.net>
Date: Wed, 17 Jul 2002 10:00:57 -0400 (EDT)
From: rafal@netbsd.org
Sender: gnats-bugs-owner@netbsd.org
Reply-To: rafal@netbsd.org
To: gnats-bugs@gnats.netbsd.org
Subject: CDIOCRESET ioctl on Toshiba CDROM@ahc0 hangs machine if no media
X-Send-Pr-Version: 3.95
>Number: 17618
>Category: port-i386
>Synopsis: CDIOCRESET ioctl on Toshiba CDROM@ahc0 hangs machine if no media
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-i386-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Jul 17 14:01:01 +0000 2002
>Closed-Date:
>Last-Modified: Mon Apr 12 23:47:00 +0000 2004
>Originator:
>Release: NetBSD 1.6D
>Organization:
procrastinators anonymous
>Environment:
System: NetBSD cyclops 1.6D NetBSD 1.6D (CYCLOPS.mp) #16: Tue Jul 9 22:05:56 EDT 2002 rafal@cyclops:/extra/src-current/sys/arch/i386/compile/CYCLOPS.mp i386
Architecture: i386
Machine: i386
>Description:
The GNOME CD player applet would hang my machine hard if I removed
the CD from the drive or started it when there was no CD in the
drive.
I've narrowed the problem down to a simple test program that (probably)
could be trimmed even further. The culprit is the CDIOCRESET ioctl,
which in turn does a Bus Device Reset at the SCSI level. This logs
a message about sending the BDR and then hangs the box. On a wscons
console the keyboard still responds, but I'm unable to suspend or kill
the process or switch to a different virtual console. In X, the
keyboard and mouse *don't* respond, and I'm still unable to get to a
different virtual console.
The controller is the following:
ahc0 at pci0 dev 16 function 0
ahc0: interrupting at apic 2 int 16 (irq 11)
ahc0: aic7890/91 Wide Channel A, SCSI Id=7, 16/255 SCBs
The CDROM is:
cd0 at scsibus0 target 5 lun 0: <TOSHIBA, CD-ROM XM-6401TA, 1015> SCSI2 5/cdrom removable
cd0: sync (50.0ns offset 16), 8-bit (20.000MB/s) transfers
There's also a CD-R/RW on the bus, and I haven't tried with that
device yet:
cd1 at scsibus0 target 4 lun 0: <PLEXTOR, CD-R PX-W124TS, 1.07> SCSI2 5/cdrom removable
cd1: sync (50.0ns offset 8), 8-bit (20.000MB/s) transfers
FWIW, the SCSI chain only has those two devices and a SCSI ZIP drive;
the system disks are all IDE, so the BDR shouldn't be interfering with
normal system disk I/O.
>How-To-Repeat:
Given:
/dev/cdrom points to a device for a CDROM hanging off an
`ahc' controller, as below:
rwxr-xr-x 1 root wheel 10 Jul 10 22:42 /dev/cdrom@ -> /dev/rcd0d
There is no media in that CDROM drive.
Now:
Run the following short program (best done on a wscons
text console so results are more evident), and it should
hang the PC (after printing the "Bus Device Reset" message
from the ahc driver).
I believe it should be sufficient to just do the CDIOCRESET
ioctl, without the CDIOCREADSUBCHANNEL first, but have not
verified this.
(Note also that this is a SMP kernel, and has had the signal
trampoline changes from the trunk pulled up... I'm not sure
if either is relevant -- I'll see if I can reproduce this
with a "stock" uniprocessor kernel or an SMP kernel without
any hand-applied changes and append the results here later)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/cdio.h>

int
main(void)
{
    int fd;
    struct ioc_read_subchannel subchnl;

    if ((fd = open("/dev/cdrom", O_RDONLY)) < 0) {
        perror("Open /dev/cdrom");
        exit(1);
    }
    memset(&subchnl, 0, sizeof(struct ioc_read_subchannel));
    subchnl.address_format = CD_MSF_FORMAT;
    subchnl.data_format = CD_CURRENT_POSITION;
    subchnl.data_len = sizeof(struct cd_sub_channel_info);
    subchnl.data = malloc(sizeof(struct cd_sub_channel_info));
    if (ioctl(fd, CDIOCREADSUBCHANNEL, &subchnl) == -1) {
        /* The subchannel read failed; go on to the reset. */
        perror("read CD subchannel");
        printf("about to reset cdrom\n");
        fflush(stdout);
        sleep(3);
        /* This is the call that hangs the machine. */
        if (ioctl(fd, CDIOCRESET, 0) == -1) {
            perror("CD reset");
        } else {
            printf("cdrom reset OK\n");
            fflush(stdout);
        }
    } else {
        printf("CD data read ok!\n");
    }
    /* Freed exactly once, on both paths. */
    free(subchnl.data);
    exit(0);
}
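If the CDIOCRESET ioctl alone is enough to trigger the hang (unverified at
the time of the original report), a smaller test case might look like the
sketch below. The function name reset_only() is made up for this example.
The #if guard is only there so the file also compiles on systems without
<sys/cdio.h>, where the function simply fails after opening the device.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#if defined(__NetBSD__)
#include <sys/cdio.h>   /* CDIOCRESET */
#endif

/*
 * Open `path` and issue only the CDIOCRESET ioctl, with no
 * CDIOCREADSUBCHANNEL beforehand. Returns 0 if the reset ioctl
 * succeeded, -1 otherwise. reset_only() is a hypothetical helper
 * name, not part of any NetBSD API.
 */
int
reset_only(const char *path)
{
    int fd, rv;

    if ((fd = open(path, O_RDONLY)) < 0) {
        perror(path);
        return -1;
    }
#ifdef CDIOCRESET
    printf("about to reset %s\n", path);
    fflush(stdout);
    rv = ioctl(fd, CDIOCRESET, 0);
    if (rv == -1)
        perror("CD reset");
    else
        printf("cdrom reset OK\n");
#else
    /* No CDIOCRESET on this platform, so there is nothing to issue. */
    errno = ENOTTY;
    rv = -1;
#endif
    close(fd);
    return rv;
}
```

Calling reset_only("/dev/cdrom") from main() on the affected machine should
be enough to tell whether the subchannel read matters at all.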
>Fix:
Unknown.
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback
State-Changed-By: tls
State-Changed-When: Thu Apr 1 02:45:52 UTC 2004
State-Changed-Why:
Does this still hang the machine, even with the new ahc driver and other
relevant changes?
From: Rafal Boni <rafal@pobox.com>
To: gnats-bugs@gnats.netbsd.org
Cc:
Subject: Re: port-i386/17618
Date: Wed, 31 Mar 2004 23:08:11 -0500
--==_Exmh_43067288369P
Content-Type: text/plain; charset=us-ascii
In message <20040401024629.8300.qmail@mail.netbsd.org>, you write:
-> Does this still hang the machine, even with the new ahc driver and other
-> relevant changes?
I haven't verified this super-recently, but it certainly did do so even
after the new ahc driver. Though, the last time I tried it was probably
almost four months ago now (end of Nov '03 / beginning of Dec '03)... I'll
re-check in the next day or two, but I'm not too hopeful.
--rafal
----
Rafal Boni rafal@pobox.com
We are all worms. But I do believe I am a glowworm. -- Winston Churchill
--==_Exmh_43067288369P
Content-Type: application/pgp-signature
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (NetBSD)
Comment: Exmh version 2.5 07/13/2001
iD8DBQFAa5WrEeBxM8fTAkwRAiTkAKCxi8fS7YhAN637e2yE5Mh3crBl9ACfTB8+
KnpotaqAXenRsY1xSNas7+U=
=WyWi
-----END PGP SIGNATURE-----
--==_Exmh_43067288369P--
From: Rafal Boni <rafal@pobox.com>
To: gnats-bugs@gnats.netbsd.org
Cc: tls@netbsd.org, port-i386-maintainer@netbsd.org, toddpw@netbsd.org
Subject: Re: port-i386/17618
Date: Tue, 06 Apr 2004 01:07:54 -0400
--==_Exmh_177407374795P
Content-Type: text/plain; charset=us-ascii
I've now verified that this still happens both with the most recent HEAD
and 2.0 kernels (GENERIC and GENERIC.MP from the latest releng builds I
could find for i386 for both HEAD and netbsd-2-0).
A few interesting things learned along the way:
* It happens on both MP and UP kernels (the machine in question
is an MP system).
* It happens on both the Toshiba *and* the Plextor CD-R/W attached
to the ahc (I didn't verify this with all combinations of kernel
type [MP vs. UP] and source branch, but did check at least two
of them)
* It doesn't matter if there is (good) media or not in the drive;
the CD reset ioctl will zorch the system in any case.
* When this happens, I get a steady stream of "Inactive SCB on
pending list" kernel messages scrolling on the console.
I believe toddpw also saw this same issue on one of his machines, so I'm
CC'ing him as well (he seemed to be doing a little bit of digging, so I
hope he finds something :-).
--rafal
----
Rafal Boni rafal@pobox.com
We are all worms. But I do believe I am a glowworm. -- Winston Churchill
--==_Exmh_177407374795P
Content-Type: application/pgp-signature
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (NetBSD)
Comment: Exmh version 2.5 07/13/2001
iD8DBQFAcjspEeBxM8fTAkwRAsxsAKDdkNcqmdBoDb5SwgCAVPHn/2xgXwCgoV0g
8nPAWQDJlOD1RPqQH1/jnc4=
=0mYk
-----END PGP SIGNATURE-----
--==_Exmh_177407374795P--
State-Changed-From-To: feedback->open
State-Changed-By: rafal
State-Changed-When: Tue Apr 6 05:10:49 UTC 2004
State-Changed-Why:
This is still a problem both in netbsd-2-0 branch and in HEAD
From: toddpw@toddpw.org
To: gnats-bugs@NetBSD.org
Cc: toddpw@toddpw.org (Todd Whitesel)
Subject: Re: kern/17618
Date: Mon, 12 Apr 2004 16:45:59 -0700 (PDT)
I see a similar issue when I'm ripping a CD with very bad blocks
(an ATAPI drive just sits there trying to read the block forever).
I placed a README/dmesg/console-log at ftp://ftp.toddpw.org/scb-bug/
and did some reading of the driver; here's what I think is going on
based on the output in 'scb-log.txt':
1. userland issues an ioctl CDIOCRESET or SCIOCRESET
2. dev/scsipi/cd.c calls cd_reset() which sends cmd with XS_CTL_RESET
3. aic7xxx_osm.c:ahc_action() translates that to SCB_DEVICE_RESET
4. aic7xxx.c:ahc_setup_initiator_msgout() prints
"ahc1:Bus Device Reset Message Sent"
5. aic7xxx.c:ahc_search_untagged_queues() prints "Inactive SCB in untaggedQ"
6. aic7xxx.c:ahc_abort_scbs() prints "Inactive SCB on pending list" forever,
causing an apparent hang if you are running X. However, from the text
console, DDB can be invoked successfully.
Hypothesis:
When we reset the CD drive, we should flush all operations-in-progress
for that device, since it presumably did the same when it got the reset.
However, in this case, something goes wrong and we fail to clear something,
so we get infinite warnings about it -- consuming the machine.
I skimmed through a diff of our driver and the FreeBSD -current version of
the same file, but didn't spot anything obvious. Then again, I don't really
know what I should be looking for: a new call to a clearing routine, a fix
to the matching code that finds the SCB's to clear, or something completely
different.
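To make the hypothesis concrete, here is a toy model of the failure mode it
describes. This is not the aic7xxx code: struct entry, drain_pending() and
the unlink_inactive switch are all invented for illustration, and the real
driver's list handling may differ. The sketch only shows how a drain loop
that warns about an inactive entry but never clears it from the list would
keep printing the same warning.

```c
#include <stddef.h>

/* Toy stand-in for a queued command; NOT the driver's struct scb. */
struct entry {
    int active;
    struct entry *next;
};

/*
 * Drain a pending list: look at the head, count a warning if it is
 * inactive, then "complete" it. When unlink_inactive is 0, completion
 * leaves inactive entries on the list, so the same head is seen again
 * on every pass. max_passes caps the loop so the sketch terminates
 * (the real symptom has no such cap). Returns the number of warnings
 * that would have been printed.
 */
int
drain_pending(struct entry **head, int unlink_inactive, int max_passes)
{
    int warnings = 0;

    for (int pass = 0; *head != NULL && pass < max_passes; pass++) {
        struct entry *e = *head;

        if (!e->active)
            warnings++;         /* "Inactive SCB on pending list" */
        if (e->active || unlink_inactive)
            *head = e->next;    /* entry cleared from the list */
    }
    return warnings;
}
```

With an inactive entry at the head and unlink_inactive set to 0, the warning
count equals max_passes and the list never shrinks. With unlink_inactive set
to 1, the warning is counted once and the list drains. If the real bug has
this shape, the fix would be whatever makes the inactive SCB leave the
pending list.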
--
Todd Whitesel
toddpw @ toddpw.org
>Unformatted:
kernel build from i386 smp branch with sigtramp changes
from trunk applied. Haven't yet tested with UP kernel.
Sources from ~ Jul 9th (late evening US/Eastern)
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.