NetBSD Problem Report #31003
From igor@string1.ciencias.uniovi.es Tue Aug 16 23:23:39 2005
Return-Path: <igor@string1.ciencias.uniovi.es>
Received: from FRESNO.NET.UNIOVI.ES (fresno.net.uniovi.es [156.35.11.2])
by narn.netbsd.org (Postfix) with ESMTP id 1E74463B116
for <gnats-bugs@gnats.NetBSD.org>; Tue, 16 Aug 2005 23:23:39 +0000 (UTC)
Message-Id: <20050816232337.4067E3D83@string1.ciencias.uniovi.es>
Date: Wed, 17 Aug 2005 01:23:37 +0200
From: igor@string1.ciencias.uniovi.es
Sender: igor@string1.ciencias.uniovi.es
Reply-To: igor@string1.ciencias.uniovi.es
To: gnats-bugs@netbsd.org
Subject: umass(4) panic provoked by Plextor portable hard disk drive
X-Send-Pr-Version: 3.95
>Number: 31003
>Category: kern
>Synopsis: umass(4) panic provoked by Plextor portable hard disk drive
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Aug 16 23:24:00 +0000 2005
>Closed-Date: Sun Apr 28 07:22:51 +0000 2019
>Last-Modified: Sun Apr 28 07:22:51 +0000 2019
>Originator: Igor Sobrado
>Release: NetBSD 2.0.2
>Organization:
University of Oviedo
>Environment:
System: NetBSD altair.v6.local 2.0.2 NetBSD 2.0.2 (GENERIC_LAPTOP) #0: Wed Mar 23 08:59:09 UTC 2005 jmc@faith.netbsd.org:/home/builds/ab/netbsd-2-0-2-RELEASE/i386/200503220140Z-obj/home/builds/ab/netbsd-2-0-2-RELEASE/src/sys/arch/i386/compile/GENERIC_LAPTOP i386
Architecture: i386
Machine: i386
>Description:
The Plextor portable hard disk drive PX-PH08U is a member of a new
family of USB mass storage devices. The PX-PH08U is an 80 GB,
2.5 inch hard disk drive in an external USB enclosure. It does
not require an external power supply unit; as a consequence, the
disk is powered off as soon as the transition to suspend mode is
honored. I suspect that this may be related to the problem
outlined in this PR.
Setup used:
The PX-PH08U portable hard disk drive is a USB 2.0 device connected
to a Dell Latitude CPi R400GT laptop (BIOS rev. A14) on its USB 1.1
port. This laptop is running NetBSD 2.0.2 and has an internal
20 GB Hitachi HDD (IC25N020ATDA04). The Plextor portable hard disk
drive is identified as a USB mass storage device:
Aug 11 12:52:26 altair /netbsd: umass0 at uhub0 port 1 configuration 1 interface 0
Aug 11 12:52:26 altair /netbsd: umass0: Plextor S.A./N.V. PLEXTOR PX-PH, rev 2.00/3.02, addr 2
Aug 11 12:52:26 altair /netbsd: umass0: using SCSI over Bulk-Only
Aug 11 12:52:26 altair /netbsd: scsibus0 at umass0: 2 targets, 1 lun per target
The PX-PH08U portable hard disk drive contains a UFS-2 filesystem:
altair# disklabel sd0
# /dev/rsd0d:
type: SCSI
disk: PX-PH08U/T3
label:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 155127
total sectors: 156368016
rpm: 5400
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # microseconds
track-to-track seek: 0 # microseconds
drivedata: 0
4 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a: 156368016         0     4.2BSD   1024  8192 46936  # (Cyl. 0 - 155126)
 c: 156368016         0     unused      0     0        # (Cyl. 0 - 155126)
 d: 156368016         0     unused      0     0        # (Cyl. 0 - 155126)
I usually mount this filesystem in /mnt:
altair# mount /dev/sd0a /mnt
altair# df -k
Filesystem  1K-blocks    Used     Avail Capacity  Mounted on
/dev/wd0a       45903   19049     24558      43%  /
/dev/wd0f       31207   14472     15174      48%  /var
/dev/wd0e      370295  163482    188298      46%  /usr
/dev/wd0g    11476539  209415  10693297       1%  /home
/dev/wd0h      247007  113069    121587      48%  /usr/X11R6
/dev/wd0i       31207    4038     25608      13%  /usr/contrib
/dev/wd0j      986743       1    937404       0%  /usr/obj
/dev/wd0k     1973735  774224   1100824      41%  /usr/pkg
/dev/wd0l      349711  159174    173051      47%  /usr/pkgsrc
/dev/wd0m     1480391  693818    712553      49%  /usr/src
/dev/wd0n      986743  445625    491780      47%  /usr/xsrc
mfs:433         63959      27     60734       0%  /tmp
kernfs              1       1         0     100%  /kern
fdesc               1       1         0     100%  /dev
/dev/sd0a    73559093       1  69881137       0%  /mnt
Description of the problem:
When the computer goes into suspend mode (Fn+Suspend) the next messages
are registered in /var/log/messages:
Aug 16 23:27:26 altair /netbsd: umass0: BBB reset failed, STALLED
Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-in clear stall failed, STALLED
Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-out clear stall failed, STALLED
Aug 16 23:27:26 altair /netbsd: umass0: BBB reset failed, STALLED
Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-in clear stall failed, STALLED
Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-out clear stall failed, STALLED
Aug 16 23:27:26 altair /netbsd: umass0: BBB reset failed, STALLED
Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-in clear stall failed, STALLED
Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-out clear stall failed, STALLED
Aug 16 23:27:26 altair /netbsd: umass0: BBB reset failed, STALLED
Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-in clear stall failed, STALLED
Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-out clear stall failed, STALLED
Aug 16 23:27:26 altair /netbsd: umass0: BBB reset failed, STALLED
Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-in clear stall failed, STALLED
Aug 16 23:27:26 altair /netbsd: umass0: BBB bulk-out clear stall failed, STALLED
followed by this error:
Aug 16 23:22:34 altair /netbsd: umass0: at uhub0 port 1 (addr 2) disconnected
Aug 16 23:22:34 altair /netbsd: sd0(umass0:0:0:0): generic HBA error
Aug 16 23:22:34 altair /netbsd: uvm_fault(0xc0601680, 0, 0, 1) -> 0xe
Once rebooted, both the internal HDD filesystems and the portable
hard disk drive filesystem must be checked for consistency.
I have classified this PR as critical and high priority because it
can both damage filesystems (on portable hard disk drives and on
other system disks, as the filesystems cannot be cleanly unmounted)
and drop the system into the in-kernel debugger, stopping the computer.
>How-To-Repeat:
An easy way to reproduce the problem is to mount a filesystem
from the portable hard disk drive on a mount point (e.g., /mnt)
and then request a transition to suspend mode.
>Fix:
As a temporary workaround, the portable hard disk drive can be
unmounted when a client requests a transition to suspend mode. This
action can be configured for the corresponding apmd(8) transition in
the files in /etc/apm. To be clear, this workaround cannot be
considered a fix for production systems.
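The workaround described above could be sketched as a fragment of the
suspend script, assuming apmd(8) runs /etc/apm/suspend before the
transition. The function names and the MOUNT_POINT default are
hypothetical and chosen only for illustration; this is a sketch under
those assumptions, not a tested hook.

```sh
#!/bin/sh
# Hypothetical fragment for /etc/apm/suspend: unmount a removable
# file system before the machine suspends.
# MOUNT_POINT is an assumption (this PR mounts the disk on /mnt).
MOUNT_POINT="${MOUNT_POINT:-/mnt}"

# Succeeds if a file system is mounted on the given directory.
# Parses mount(8) lines of the form "/dev/sd0a on /mnt type ffs (local)".
is_mounted() {
    mount | awk -v mp="$1" '$2 == "on" && $3 == mp { found = 1 } END { exit !found }'
}

# Flush buffers and unmount; report a failure instead of hiding it.
unmount_before_suspend() {
    if is_mounted "$1"; then
        sync
        if umount "$1"; then
            echo "unmounted $1"
        else
            echo "could not unmount $1 (busy?)" >&2
            return 1
        fi
    else
        echo "$1 not mounted"
    fi
}

# In a real hook this call would be the last line of the script:
# unmount_before_suspend "$MOUNT_POINT"
```

If the file system is busy the unmount fails and the disk stays
mounted, so this only reduces the exposure; it does not address the
panic itself.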
>Release-Note:
>Audit-Trail:
From: Igor Sobrado <igor@string1.ciencias.uniovi.es>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/31003
Date: Wed, 17 Aug 2005 09:11:22 +0200
Two random notes on this PR:
1. The timestamps in the /var/log/messages output belong to
two different events. That is why the umass "BBB"
messages are dated Aug 16 23:27:26, while the umass/SCSI/UVM
errors are dated Aug 16 23:22:34.
2. Other USB mass storage devices (e.g., USB flash drives)
do not show this behavior.
Best regards,
Igor.
From: Igor Sobrado <igor@string1.ciencias.uniovi.es>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/31003
Date: Wed, 17 Aug 2005 09:53:11 +0200
I will try to attach a trace from the "db> " prompt in the next few hours
if this information is useful to isolate the problem.
Cheers,
Igor.
From: Igor Sobrado <igor@string1.ciencias.uniovi.es>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/31003
Date: Wed, 17 Aug 2005 10:02:11 +0200
Sorry for sending so many emails about this problem, but I have found
a serious error in the problem report and I must fix it. Where I wrote:
"When the computer goes into suspend mode (Fn+Suspend) the next messages
are registered in /var/log/messages:"
I really meant to write:
"When the computer RETURNS FROM suspend mode (Fn+Suspend)..."
The problem happens when the computer returns from suspend mode,
not when going into that mode. That is certainly not the same thing!
I will attach a trace as soon as possible.
Best wishes,
Igor.
From: Igor Sobrado <igor@string1.ciencias.uniovi.es>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/31003
Date: Wed, 17 Aug 2005 20:21:31 +0200
Ok, I have connected the PX-PH08U drive to the USB port on my laptop,
mounted its FFS2 filesystem on /mnt, and requested a transition to
suspend mode again. As expected, after resuming from the suspend state
the system entered the in-kernel debugger and the following error
was logged in /var/log/messages:
Aug 17 19:44:32 altair /netbsd: umass0: at uhub0 port 1 (addr 2) disconnected
Aug 17 19:44:32 altair /netbsd: sd0(umass0:0:0:0): generic HBA error
Aug 17 19:44:32 altair /netbsd: uvm_fault(0xc0601680, 0, 0, 1) -> 0xe
Aug 17 19:44:32 altair last message repeated 3 times
Now, I will try to play a bit with the in-kernel debugger:
db> trace
uvm_fault(0xc0601680, 0, 0, 1) -> 0xe
kernel: page fault trap, code=0
Faulted in DDB; continuing...
Sorry, not a very helpful output. All information was in /var/log/messages.
The output of ps shows:
db> ps
PID PPID PGRP UID S FLAGS LWPS COMMAND WAIT
[...]
>4 0 0 0 2 0x20200 1 usb0
Best wishes,
Igor.
From: Carl Brewer <carl@bl.echidna.id.au>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/31003
Date: Mon, 21 Nov 2005 16:30:42 +1100
Another data point for this: I have a similar setup, using ViPower
USB/PATA housings with large (160GB) IDE/PATA disks as backup
media for NetBSD LAN servers.
I get this error and server lockups:
Nov 19 03:13:30 mail /netbsd: umass0: BBB bulk-in clear stall failed, TIMEOUT
Nov 19 03:14:35 mail /netbsd: umass0: BBB bulk-out clear stall failed, TIMEOUT
Nov 19 03:16:45 mail /netbsd: umass0: BBB reset failed, TIMEOUT
Nov 19 03:17:50 mail /netbsd: umass0: BBB bulk-in clear stall failed, TIMEOUT
Nov 19 03:18:55 mail /netbsd: umass0: BBB bulk-out clear stall failed, TIMEOUT
Nov 19 03:21:05 mail /netbsd: umass0: BBB reset failed, TIMEOUT
Nov 19 03:22:10 mail /netbsd: umass0: BBB bulk-in clear stall failed, TIMEOUT
Nov 19 03:23:15 mail /netbsd: umass0: BBB bulk-out clear stall failed, TIMEOUT
Nov 19 03:25:25 mail /netbsd: umass0: BBB reset failed, TIMEOUT
Then the machine hangs and requires a power cycle.
The box is now :
NetBSD mail.cashmoredesign.com 2.1 NetBSD 2.1 (GENERIC) #0: Mon Oct 24 22:35:45 UTC 2005 jmc@faith.netbsd.org:/home/builds/ab/netbsd-2-1-RELEASE/i386/200510241747Z-obj/home/builds/ab/netbsd-2-1-RELEASE/src/sys/arch/i386/compile/GENERIC i386
From: Igor Sobrado <igor@string1.ciencias.uniovi.es>
To: gnats-bugs@netbsd.org
Cc: Carl Brewer <carl@bl.echidna.id.au>
Subject: Re: kern/31003
Date: Thu, 29 Dec 2005 18:54:02 +0100
Hi Carl.
Thanks a lot for the feedback about your USB/PATA hard disk drive.
Indeed, perhaps both problems are related.
NetBSD 3 behaves differently and, perhaps, this will help in
looking for a fix. I have attached the Plextor portable hard disk drive
to my laptop:
Dec 29 18:27:46 localhost /netbsd: umass0 at uhub0 port 1 configuration 1 interface 0
Dec 29 18:27:46 localhost /netbsd:
Dec 29 18:27:46 localhost /netbsd: umass0: Plextor S.A./N.V. PLEXTOR PX-PH, rev 2.00/3.02, addr 2
Dec 29 18:27:46 localhost /netbsd: umass0: using SCSI over Bulk-Only
Dec 29 18:27:46 localhost /netbsd: scsibus0 at umass0: 2 targets, 1 lun per target
Dec 29 18:27:46 localhost /netbsd: sd0 at scsibus0 target 0 lun 0: <PLEXTOR, PX-PH, 3.02> disk fixed
Dec 29 18:27:46 localhost /netbsd: sd0: 76351 MB, 155127 cyl, 16 head, 63 sec, 512 bytes/sect x 156368016 sectors
...and changed the laptop state to suspend mode (Fn+Suspend) again.
This is the sequence of events logged once the computer returns
from the suspend state:
Dec 29 18:28:58 localhost /netbsd: atabus0: resuming...
Dec 29 18:28:58 localhost /netbsd: atabus1: resuming...
Dec 29 18:28:59 localhost /netbsd: pms0: pms_synaptics_resume: reset on resume 0 0xaa 0x00
Dec 29 18:28:59 localhost /netbsd: cbb1: wait took 0.009072s
Dec 29 18:28:59 localhost /netbsd: umass0: at uhub0 port 1 (addr 2) disconnected
Dec 29 18:28:59 localhost /netbsd: sd0(umass0:0:0:0): generic HBA error
Dec 29 18:28:59 localhost /netbsd: sd0: cache synchronization failed
Dec 29 18:28:59 localhost /netbsd: sd0 detached
Dec 29 18:28:59 localhost /netbsd: scsibus0 detached
Dec 29 18:28:59 localhost /netbsd: umass0 detached
Dec 29 18:28:58 localhost /netbsd: umass0 at uhub0 port 1 configuration 1 interface 0
Dec 29 18:28:58 localhost /netbsd:
Dec 29 18:28:58 localhost /netbsd: umass0: Plextor S.A./N.V. PLEXTOR PX-PH, rev 2.00/3.02, addr 2
Dec 29 18:28:58 localhost /netbsd: umass0: using SCSI over Bulk-Only
Dec 29 18:28:58 localhost /netbsd: scsibus0 at umass0: 2 targets, 1 lun per target
Dec 29 18:28:58 localhost /netbsd: sd0 at scsibus0 target 0 lun 0: <PLEXTOR, PX-PH, 3.02> disk fixed
Dec 29 18:28:58 localhost /netbsd: sd0: 76351 MB, 155127 cyl, 16 head, 63 sec, 512 bytes/sect x 156368016 sectors
Dec 29 18:30:27 localhost syslogd: restart
Dec 29 18:30:27 localhost /netbsd: uvm_fault(0xcbd71708, 0, 0, 1) -> 0xe
Dec 29 18:30:27 localhost /netbsd: syncing disks... done
Dec 29 18:30:27 localhost /netbsd: unmounting file systems...unmount of /dev failed with error 16
Dec 29 18:30:27 localhost /netbsd: done
Dec 29 18:30:27 localhost /netbsd: WARNING: some file systems would not unmount
Dec 29 18:30:27 localhost /netbsd: rebooting...
Ok, this time at least I was able to reboot the computer, but the
screen was turned off. There is an *unrelated* problem with APM
in NetBSD: sometimes the screen does not turn on again once the
machine returns from a power management state. Will this problem
be fixed? It is a bit annoying if someone does not type "xset s off"
before starting a talk and the computer does not turn on again!!!
(no... it did not happen to me... yet!).
The key here is that the laptop can be rebooted and only the portable
hard disk drive requires a consistency check now (in any case, I do not
want to try this experiment many times!)
Cheers,
Igor.
From: Carl Brewer <carl@bl.echidna.id.au>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/31003
Date: Thu, 13 Apr 2006 16:58:00 +1000
For what it's worth, I still see random total machine
hangs with the same kind of hardware (ViPower housings and 160GB IDE
HDDs; the hardware has been swapped, but it is the same type, just new
disk drives, caddies, etc.) on 3.0/i386.
dmesg shows :
umass0 at uhub4 port 3 configuration 1 interface 0
umass0: ViPowER, Inc. ViPowER USB2.0 Storage Device, rev 2.00/0.01, addr 2
umass0: using SCSI over Bulk-Only
scsibus0 at umass0: 2 targets, 1 lun per target
sd0 at scsibus0 target 0 lun 0: <ST316002, 1A, 0\0000> disk fixed
sd0: 149 GB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 312581808 sectors
It shows a lot of this in dmesg :
sd0: dos partition I/O error
sd0(umass0:0:0:0): Check Condition on CDB: 0x28 00 00 00 00 00 00 00 01 00
SENSE KEY: Aborted Command
ASC/ASCQ: Logical Unit Communication CRC Error
sd0(umass0:0:0:0): Check Condition on CDB: 0x28 00 00 00 00 00 00 00 01 00
SENSE KEY: Aborted Command
ASC/ASCQ: Logical Unit Communication CRC Error
sd0(umass0:0:0:0): Check Condition on CDB: 0x28 00 00 00 00 00 00 00 01 00
SENSE KEY: Aborted Command
ASC/ASCQ: Logical Unit Communication CRC Error
sd0(umass0:0:0:0): Check Condition on CDB: 0x28 00 00 00 00 00 00 00 01 00
SENSE KEY: Aborted Command
ASC/ASCQ: Logical Unit Communication CRC Error
sd0(umass0:0:0:0): Check Condition on CDB: 0x28 00 00 00 00 00 00 00 01 00
SENSE KEY: Aborted Command
ASC/ASCQ: Logical Unit Communication CRC Error
Does the above suggest a hardware problem or a problem with
the umass driver?
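As an aid to reading the log above: the CDB that keeps failing starts
with opcode 0x28, which is READ(10) in the SCSI command set. In a
READ(10) CDB, bytes 2-5 carry the logical block address and bytes 7-8
the transfer length, both big-endian. A minimal sketch that decodes
the hex string as printed by the kernel follows; the helper name
decode_read10 is hypothetical and not part of any NetBSD tool.

```sh
#!/bin/sh
# Hypothetical helper: decode a READ(10) CDB given as ten hex bytes,
# in the form the kernel prints them ("0x28 00 00 ...").
decode_read10() {
    # Split the string into positional parameters, one per byte.
    set -- $1
    op=$(( 0x${1#0x} ))
    # 0x28 (decimal 40) is the READ(10) opcode.
    [ "$op" -eq 40 ] || { echo "not READ(10)"; return 1; }
    # Bytes 2-5 (parameters 3-6): big-endian logical block address.
    lba=$(( (0x$3 << 24) | (0x$4 << 16) | (0x$5 << 8) | 0x$6 ))
    # Bytes 7-8 (parameters 8-9): big-endian transfer length in blocks.
    len=$(( (0x$8 << 8) | 0x$9 ))
    echo "READ(10) lba=$lba blocks=$len"
}

# The CDB repeated in the dmesg output above.
decode_read10 "0x28 00 00 00 00 00 00 00 01 00"
```

For the CDB in the log this reports a one-block read at LBA 0, that
is, the first sector of the disk, which is consistent with the
"dos partition I/O error" message. It does not by itself distinguish
a hardware fault from a driver problem.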
Is anyone looking at this or doing anything similar? I'm
trying (but having no luck!) to use these removable HDDs
as backup devices, by mounting them, dumping the FS, and
umounting so the users at the site can take the HDD home as
an offsite backup.
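The mount, dump and umount cycle described above might look roughly
like the following sketch. The function name, the dump file name and
the example arguments are assumptions made for illustration; the
actual script used at the site is not shown in this PR.

```sh
#!/bin/sh
# Sketch only: device, mount point and dump file name are assumptions.
backup_to_removable() {
    dev="$1"    # removable disk partition, e.g. /dev/sd0a
    mp="$2"     # mount point, e.g. /mnt
    src="$3"    # file system to dump, e.g. /home

    mount "$dev" "$mp" || { echo "mount of $dev failed" >&2; return 1; }

    # Level-0 dump to a file on the removable disk. Remember the result
    # so that the disk is still unmounted when dump fails.
    rc=0
    dump -0 -a -f "$mp/backup.dump" "$src" || rc=$?

    sync
    umount "$mp" || { echo "umount of $mp failed; do not unplug" >&2; return 1; }
    return "$rc"
}

# Example invocation (commented out; it would touch real devices):
# backup_to_removable /dev/sd0a /mnt /home
```

Unmounting even after a failed dump keeps the window in which the
disk is mounted as short as possible, which matters if the machine
may hang while the umass device is attached.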
State-Changed-From-To: open->feedback
State-Changed-By: gutteridge@NetBSD.org
State-Changed-When: Thu, 21 Mar 2019 00:38:08 +0000
State-Changed-Why:
Is this still duplicable? There have been a lot of changes in the last
14 years. There was a related PR 31428 that I confirmed was fixed ten
years ago. I also just pushed 1TB of data to an external HD via umass
without issue.
State-Changed-From-To: feedback->closed
State-Changed-By: gutteridge@NetBSD.org
State-Changed-When: Sun, 28 Apr 2019 07:22:51 +0000
State-Changed-Why:
Feedback timeout. There have been improvements made since this was
filed, and I can't reproduce this with either an external USB hard
drive or a USB stick.
>Unformatted: