NetBSD Problem Report #5133

Received: (qmail 12222 invoked from network); 8 Mar 1998 04:04:11 -0000
Message-Id: <199803080405.UAA08722@nooksack.ldc.cs.wwu.edu>
Date: Sat, 7 Mar 1998 20:05:40 -0800 (PST)
From: revilak@umbsky.cc.umb.edu
To: gnats-bugs@gnats.netbsd.org
Subject: SCSI Errors and gradual drive corruption
X-Send-Pr-Version: www-1.0

>Number:         5133
>Category:       port-mac68k
>Synopsis:       SCSI Errors and gradual drive corruption
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-mac68k-maintainer
>State:          suspended
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Mar 07 20:05:00 +0000 1998
>Closed-Date:    
>Last-Modified:  Thu Apr 01 17:46:16 +0000 2004
>Originator:     Steve Revilak
>Release:        1.3
>Organization:
>Environment:
NetBSD  1.3 NetBSD 1.3 (GENERIC) #56: Wed Dec 31 13:40:30 PST 1997     allen@wormey:/usr/src/sys/arch/mac68k/compile/GENERIC mac68k


>Description:
Hardware description: Quadra 605 (with FPU).
Filesystem is placed on a Syquest EZ135 drive.


The most visible symptom is the appearance of these error messages:

Mar  6 13:39:40  /netbsd: dmaintr: discarded 32 b (last transfer was 1008 b).
Mar  6 13:39:40  /netbsd: esp0: !TC [intr 10, stat 87, step 4] prevphase 0, resid 3f0

When booted in single-user mode, this happens only occasionally.
In multi-user mode, the messages appear every 30-120 seconds,
sometimes in groups of several lines. It happens regardless of
what the operator is doing--running a program, working at the
command line, etc.

In single-user mode, the messages are a minor nuisance. In multi-user
mode, they flood the screen to the point of rendering the system unusable.

In addition, when presented with an operation that performs a
moderately large SCSI transaction (e.g. redirecting the output of
ls -lR to a text file from the top of the file system), the kernel
tends to panic and crash. Several of the dmaintr:/esp0: messages
appear before the system winds up at the debugger prompt:

panic:  blkfree:  freeing block data
Stopped at _Debugger + 0x6:   unlk  a6

I tried ls -lR > Filename four times. It crashed twice.

Finally, there is the less immediately obvious symptom of gradual
filesystem corruption. Under *light* single user use, after several
boots, the user will inevitably receive notice that the file system
is damaged and the advisory 'run FSCK_FFS by hand'. Upon a read-only
boot, running fsck_ffs presents the user with a host of errors:

CHECK BLOCKS.  UNKNOWN FILE TYPE I=nnnnn
CLEAR? [yn]

Followed by a corresponding string of:

UNALLOCATED I=XXXX OWNER=Root MODE=0 Size =0 MTIME=xxxxx NAME=xxxx
REMOVE [yn]

These appear in groups of (typically) 1-2 dozen each.  Sometimes more.

Since I had read a few things that tangentially suggest it is
better to respond 'y' at fsck prompts, I did so. A few days later,
so many items had been 'zapped' that it was necessary to reinstall
the filesystems in order to boot.  When answering 'n' to the above
prompts and then 'y' to the succeeding string of 'REATTACH' prompts,
it appears that fsck can manage to put the pieces back together.
I've only taken this approach recently, after having reinstalled
3 times.  I'll know more later.


>How-To-Repeat:
Searching the port-mac68k digest archives, I see that there are
several reports of this same problem, mostly among users of Syquest
removable drives.  I'd say the easiest way to reproduce this would
be to install NetBSD on a Syquest cart and boot multi user.  Spend
five minutes typing in vi to see how much work one is able to
accomplish without having the screen flooded with error messages.
Alternatively, from the '/' directory, issuing the command

# ls -lR > SomeFileNameHere

does a pretty good job of mucking things up.

To verify the file corruption, run fsck every other boot.  That
should reveal plenty.  The big problem here: in this state, the
system is neither highly stable nor (as a multi-user environment)
really usable.
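
A gentler, self-checking variant of the test above can be sketched in
Python. This is an illustration under stated assumptions, not something
that was run as part of this report: the function name write_and_verify,
the 900 KB size (chosen to resemble the 800-900 KB file mentioned in the
follow-up below), and the 8 KB chunk size are all hypothetical. It writes
a patterned file, forces it out with fsync, reads it back, and compares
SHA-256 digests. The read-back may be served from the buffer cache rather
than the disk, so a clean result does not prove the media is intact.

```python
import hashlib
import os
import tempfile


def write_and_verify(path, total_bytes=900 * 1024, chunk_size=8192):
    """Write patterned data to path, fsync it, read it back, compare digests.

    Returns True when the SHA-256 of what was written equals the SHA-256
    of what was read back. All parameters are illustrative defaults.
    """
    written = hashlib.sha256()
    with open(path, "wb") as f:
        remaining = total_bytes
        counter = 0
        while remaining > 0:
            n = min(chunk_size, remaining)
            # Each chunk repeats a 4-byte counter so chunks differ from each other.
            chunk = (counter.to_bytes(4, "big") * (n // 4 + 1))[:n]
            f.write(chunk)
            written.update(chunk)
            remaining -= n
            counter += 1
        f.flush()
        os.fsync(f.fileno())  # ask the kernel to push the data to the device

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            read_back.update(chunk)
    return written.hexdigest() == read_back.hexdigest()


if __name__ == "__main__":
    # Demonstration on a temporary file; on a real test, point the path
    # at a file on the suspect filesystem instead.
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "scsi_test.bin")
        print("data intact:", write_and_verify(target))
```

Running it in a loop while watching the console for dmaintr/esp0 messages
would approximate the repeated ls -lR runs described above.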

>Fix:
None found.  I've sent e-mail to those who posted similar problems
on port-mac68k to see if they have come up with anything.  I'm
waiting for replies and will gladly forward any info (provided you
let me know who I should be forwarding to...).

Additionally, I am willing to participate in any testing/bug-tracking
if it will help to resolve this matter.

Please feel free to contact me if there is any other useful
information I can provide.

>Release-Note:
>Audit-Trail:

From: Steve Revilak <revilak@umbsky.cc.umb.edu>
To: gnats-bugs@gnats.netbsd.org, gnats-admin@NetBSD.ORG
Cc:
Subject: Re: bin/5133: SCSI Errors and gradual drive corruption
Date: Sun, 8 Mar 1998 13:51:59 -0500

 }Thank you very much for your problem report.
 }It has the internal identification `bin/5133'.
 }The individual assigned to look at your
 }bug is: bin-bug-people (Utility Bug People).
 }
 }>Category:       bin
 }>Responsible:    bin-bug-people
 }>Synopsis:       SCSI Errors and gradual drive corruption
 }>Arrival-Date:   Sat Mar  7 20:05:00 1998

 It's been mentioned that lack of Apple Documentation has been a problem.
 Regarding SCSI, I don't know if the following will be helpful, but here it
 is anyway...

 http://devworld.apple.com/dev/techsupport/insidemac/Devices/Devices-2.html

 Steve Revilak
 revilak@umbsky.cc.umb.edu


Responsible-Changed-From-To: bin-bug-people->gnats-admin 
Responsible-Changed-By: fair 
Responsible-Changed-When: Fri Apr 10 02:02:25 PDT 1998 
Responsible-Changed-Why:  
This bug is more properly placed in the "port-mac68k" category, instead of 
in the "bin" category, which is for bugs in user-level programs; this is 
pretty clearly a kernel bug. Whether it's a Machine Independent (kern) or 
Machine Specific (port-mac68k in this case) problem in the SCSI drivers is 
a question best dealt with by the people more familiar with the NetBSD/mac68k 
port. 

One thing that would probably help diagnose the problem is the output of the 
"dmesg" command right after the system has come up in multi-user mode; this 
should contain the complete kernel autoconfiguration output (and any errors 
therein). 

W.R.T. "Apple Documentation": the problem is the lack of *hardware* documentation
of the various Macintosh models - this is an area that Inside Macintosh does
not really cover, alas (no doubt because such documentation would make both
alternative Operating Systems (e.g. NetBSD) and clone hardware much easier to
produce).

Generally, where Apple has used industry standard chips (e.g. the Zilog 8530
SCC for serial ports), we can get the documentation from the chip manufacturer.
Where Apple has used Application Specific Integrated Circuits (ASICs), we must
engage in reverse engineering - a tedious process that is mostly trial and
error, and successive approximation.

Date:    Tue, 14 Apr 1998 09:23:02 -0400
From:    Steve Revilak <revilak@umbsky.cc.umb.edu>
Subject: Re: bin/5133
To:      fair@NetBSD.ORG
In-Reply-To: <19980410092011.19286.qmail@mail.NetBSD.ORG>

Eric,

	Apologies that it took me a few days to do this.  I had to pick up a new
EZ drive cartridge and install NetBSD on it.  Here is the output of dmesg
as well as a copy of /var/log/messages.  I was in multi-user mode, booted
from the cartridge, for about 5 minutes.

	Please let me know if I can provide any other useful information.

	Thanks!

Steve Revilak
revilak@umbsky.cc.umb.edu



-------------------output of dmesg follows-------------
NetBSD 1.3 (GENERIC) #56: Wed Dec 31 13:40:30 PST 1997
    allen@wormey:/usr/src/sys/arch/mac68k/compile/GENERIC
Apple Macintosh Quadra 605/33  (68040)
cpu: delay factor 350
real mem = 20971520
avail mem = 16941056
using 204 buffers containing 835584 bytes of memory
mrg: 'Quadra/Centris 605 ROMs' ROM glue, tracing off, debug off, silent traps
mrg: I/O map kludge for ROMs that use hardware addresses directly.
adb: bus subsystem
Got following HwCfgFlags: 0xfc00, 0x 500183f, 0x23804926, 0x  a2f2c0
mrg: setup_egret:
mrg: setup_egret: done.
\^H\^H\^H\^H\^H\^H\^H\^H\^H\^Hadb: extended keyboard at 2
adb: 200 dpi mouse at 3
adb: WACOM ArtPad II at 4
adb: 200 dpi mouse at 15
mainbus0 (root)
obio0 at mainbus0
adb0 at obio0 (ADB event device)
asc0 at obio0: Apple Sound Chip
intvid0 at obio0: DAFB: Monitor sense 1.
intvid0: 640 x 480, monochrome
grf0 at intvid0
ite at grf0 not configured
esp0 at obio0 (quick): address 0x897000: NCR53C96, 16MHz, SCSI ID 7
scsibus0 at esp0: 8 targets
sd0 at scsibus0 targ 0 lun 0: <QUANTUM, FIREBALL_TM1280S, 300Z> SCSI2 0/direct fixed
sd0: 1222MB, 6810 cyl, 2 head, 183 sec, 512 bytes/sect x 2503872 sectors
sd1 at scsibus0 targ 2 lun 0: <SyQuest, EZ135S, 1_13> SCSI2 0/direct removable
sd1: 128MB, 3195 cyl, 1 head, 82 sec, 512 bytes/sect x 262144 sectors
cd0 at scsibus0 targ 3 lun 0: <PIONEER, CD-ROM DR-124X, 1.01> SCSI2 5/cdrom removable
zsc0 at obio0 chip type 0
zsc0 channel 0: d_speed   9600 DCD clk 0 CTS clk 0
zstty0 at zsc0 channel 0
zsc0 channel 1: d_speed   9600 DCD clk 0 CTS clk 0
zstty1 at zsc0 channel 1
nubus0 at mainbus0
fpu0 at mainbus0 (mc68040)
boot device: sd1
root on sd1a dumps on sd1b
PRAM time does not appear to have been read correctly.
PRAM: 0x83da82c4, macos_boottime: 0x35331f96.
root file system type: ffs
dmaintr: discarded 32 b (last transfer was 6128 b).
esp0: !TC [intr 10, stat 87, step 4] prevphase 0, resid 17f0
dmaintr: discarded 32 b (last transfer was 1008 b).
esp0: !TC [intr 10, stat 87, step 4] prevphase 0, resid 3f0
dmaintr: discarded 32 b (last transfer was 4080 b).
esp0: !TC [intr 10, stat 87, step 4] prevphase 0, resid ff0
dmaintr: discarded 48 b (last transfer was 2016 b).
esp0: !TC [intr 10, stat 87, step 4] prevphase 0, resid 7e0
dmaintr: discarded 32 b (last transfer was 2032 b).
esp0: !TC [intr 10, stat 87, step 4] prevphase 0, resid 7f0
dmaintr: discarded 32 b (last transfer was 5104 b).
esp0: !TC [intr 10, stat 87, step 4] prevphase 0, resid 13f0
dmaintr: discarded 32 b (last transfer was 1008 b).
esp0: !TC [intr 10, stat 87, step 4] prevphase 0, resid 3f0



-----------copy of /var/log/messages-----------------
Apr 14 05:37:40  syslogd: restart
Apr 14 05:37:40  /netbsd: NetBSD 1.3 (GENERIC) #56: Wed Dec 31 13:40:30 PST
1997
Apr 14 05:37:40  /netbsd:
allen@wormey:/usr/src/sys/arch/mac68k/compile/GENERIC
Apr 14 05:37:40  /netbsd: Apple Macintosh Quadra 605/33  (68040)
Apr 14 05:37:40  /netbsd: cpu: delay factor 350
Apr 14 05:37:40  /netbsd: real mem = 20971520
Apr 14 05:37:40  /netbsd: avail mem = 16941056
Apr 14 05:37:40  /netbsd: using 204 buffers containing 835584 bytes of
memory
Apr 14 05:37:40  /netbsd: mrg: 'Quadra/Centris 605 ROMs' ROM glue, tracing
off, debug off, silent traps
Apr 14 05:37:40  /netbsd: mrg: I/O map kludge for ROMs that use hardware
addresses directly.
Apr 14 05:37:40  /netbsd: adb: bus subsystem
Apr 14 05:37:40  /netbsd: Got following HwCfgFlags: 0xfc00, 0x 500183f,
0x23804926, 0x  a2f2c0
Apr 14 05:37:40  /netbsd: mrg: setup_egret:
Apr 14 05:37:40  /netbsd: mrg: setup_egret: done.
Apr 14 05:37:40  /netbsd: adb: extended keyboard at 2
Apr 14 05:37:40  /netbsd: adb: 200 dpi mouse at 3
Apr 14 05:37:40  /netbsd: adb: WACOM ArtPad II at 4
Apr 14 05:37:41  /netbsd: adb: 200 dpi mouse at 15
Apr 14 05:37:41  /netbsd: mainbus0 (root)
Apr 14 05:37:41  /netbsd: obio0 at mainbus0
Apr 14 05:37:41  /netbsd: adb0 at obio0 (ADB event device)
Apr 14 05:37:41  /netbsd: asc0 at obio0: Apple Sound Chip
Apr 14 05:37:41  /netbsd: intvid0 at obio0: DAFB: Monitor sense 1.
Apr 14 05:37:41  /netbsd: intvid0: 640 x 480, monochrome
Apr 14 05:37:41  /netbsd: grf0 at intvid0
Apr 14 05:37:41  /netbsd: ite at grf0 not configured
Apr 14 05:37:41  /netbsd: esp0 at obio0 (quick): address 0x897000:
NCR53C96, 16MHz, SCSI ID 7
Apr 14 05:37:41  /netbsd: scsibus0 at esp0: 8 targets
Apr 14 05:37:41  /netbsd: sd0 at scsibus0 targ 0 lun 0: <QUANTUM,
FIREBALL_TM1280S, 300Z> SCSI2 0/direct fixed
Apr 14 05:37:41  /netbsd: sd0: 1222MB, 6810 cyl, 2 head, 183 sec, 512
bytes/sect x 2503872 sectors
Apr 14 05:37:41  /netbsd: sd1 at scsibus0 targ 2 lun 0: <SyQuest, EZ135S,
1_13> SCSI2 0/direct removable
Apr 14 05:37:41  /netbsd: sd1: 128MB, 3195 cyl, 1 head, 82 sec, 512
bytes/sect x 262144 sectors
Apr 14 05:37:41  /netbsd: cd0 at scsibus0 targ 3 lun 0: <PIONEER, CD-ROM
DR-124X, 1.01> SCSI2 5/cdrom removable
Apr 14 05:37:41  /netbsd: zsc0 at obio0 chip type 0
Apr 14 05:37:41  /netbsd: zsc0 channel 0: d_speed   9600 DCD clk 0 CTS clk 0
Apr 14 05:37:41  /netbsd: zstty0 at zsc0 channel 0
Apr 14 05:37:41  /netbsd: zsc0 channel 1: d_speed   9600 DCD clk 0 CTS clk 0
Apr 14 05:37:41  /netbsd: zstty1 at zsc0 channel 1
Apr 14 05:37:41  /netbsd: nubus0 at mainbus0
Apr 14 05:37:41  /netbsd: fpu0 at mainbus0 (mc68040)
Apr 14 05:37:41  /netbsd: boot device: sd1
Apr 14 05:37:41  /netbsd: root on sd1a dumps on sd1b
Apr 14 05:37:41  /netbsd: PRAM time does not appear to have been read
correctly.
Apr 14 05:37:41  /netbsd: PRAM: 0x83da82c4, macos_boottime: 0x35331f96.
Apr 14 05:37:41  /netbsd: root file system type: ffs
Apr 14 05:37:41  /netbsd: dmaintr: discarded 32 b (last transfer was 6128
b).
Apr 14 05:37:41  /netbsd: esp0: !TC [intr 10, stat 87, step 4] prevphase 0,
resid 17f0
Apr 14 05:37:42  /netbsd: dmaintr: discarded 32 b (last transfer was 1008
b).
Apr 14 05:37:42  /netbsd: esp0: !TC [intr 10, stat 87, step 4] prevphase 0,
resid 3f0
Apr 14 05:37:42  /netbsd: dmaintr: discarded 32 b (last transfer was 4080
b).
Apr 14 05:37:42  /netbsd: esp0: !TC [intr 10, stat 87, step 4] prevphase 0,
resid ff0
Apr 14 05:37:42  /netbsd: dmaintr: discarded 48 b (last transfer was 2016
b).
Apr 14 05:37:42  /netbsd: esp0: !TC [intr 10, stat 87, step 4] prevphase 0,
resid 7e0
Apr 14 05:37:42  /netbsd: dmaintr: discarded 32 b (last transfer was 2032
b).
Apr 14 05:37:42  /netbsd: esp0: !TC [intr 10, stat 87, step 4] prevphase 0,
resid 7f0
Apr 14 05:37:42  /netbsd: dmaintr: discarded 32 b (last transfer was 5104
b).
Apr 14 05:37:42  /netbsd: esp0: !TC [intr 10, stat 87, step 4] prevphase 0,
resid 13f0
Apr 14 05:40:18  /netbsd: dmaintr: discarded 32 b (last transfer was 1008
b).
Apr 14 05:40:18  /netbsd: esp0: !TC [intr 10, stat 87, step 4] prevphase 0,
resid 3f0
Apr 14 05:40:48  /netbsd: dmaintr: discarded 32 b (last transfer was 1008
b).
Apr 14 05:40:48  /netbsd: esp0: !TC [intr 10, stat 87, step 4] prevphase 0,
resid 3f0
Apr 14 05:40:49  /netbsd: dmaintr: discarded 32 b (last transfer was 6128
b).
Apr 14 05:40:49  /netbsd: esp0: !TC [intr 10, stat 87, step 4] prevphase 0,
resid 17f0
Apr 14 05:40:49  /netbsd: dmaintr: discarded 32 b (last transfer was 1008
b).
Apr 14 05:40:49  /netbsd: esp0: !TC [intr 10, stat 87, step 4] prevphase 0,
resid 3f0
Apr 14 05:40:49  /netbsd: dmaintr: discarded 32 b (last transfer was 1008
b).
Apr 14 05:40:49  /netbsd: esp0: !TC [intr 10, stat 87, step 4] prevphase 0,
resid 3f0
Apr 14 05:42:27  reboot: rebooted by root
Apr 14 05:42:27  syslogd: exiting on signal 15
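
The dmaintr/esp0 pairs in the logs above can be cross-checked
mechanically. The sketch below pairs each dmaintr line with the esp0
line that follows it. The regular expressions and the function name
pair_errors are illustrative, and the code assumes the resid field is
printed in hexadecimal. For the sample lines, copied from the dmesg
output above, the hex resid value equals the decimal "last transfer"
size in every pair. What the driver means by either field is not
established in this report.

```python
import re

# Sample lines copied from the dmesg output in this report.
LOG = """\
dmaintr: discarded 32 b (last transfer was 6128 b).
esp0: !TC [intr 10, stat 87, step 4] prevphase 0, resid 17f0
dmaintr: discarded 32 b (last transfer was 1008 b).
esp0: !TC [intr 10, stat 87, step 4] prevphase 0, resid 3f0
dmaintr: discarded 48 b (last transfer was 2016 b).
esp0: !TC [intr 10, stat 87, step 4] prevphase 0, resid 7e0
"""

DMA_RE = re.compile(r"dmaintr: discarded (\d+) b \(last transfer was (\d+) b\)")
ESP_RE = re.compile(r"esp0: !TC .* resid ([0-9a-f]+)")


def pair_errors(text):
    """Return (discarded, last_transfer, resid) for each dmaintr/esp0 pair.

    resid is parsed as hexadecimal (an assumption about the log format).
    """
    pairs = []
    pending = None
    for line in text.splitlines():
        m = DMA_RE.search(line)
        if m:
            pending = (int(m.group(1)), int(m.group(2)))
            continue
        m = ESP_RE.search(line)
        if m and pending is not None:
            pairs.append(pending + (int(m.group(1), 16),))
            pending = None
    return pairs


pairs = pair_errors(LOG)
for discarded, last_transfer, resid in pairs:
    print(discarded, last_transfer, resid, last_transfer == resid)
```

Because the patterns use search rather than match, the same function
also works on syslog-prefixed lines, provided wrapped lines are rejoined
first.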


Responsible-Changed-From-To: gnats-admin->port-mac68k-maintainer 
Responsible-Changed-By: fair 
Responsible-Changed-When: Mon Dec 28 09:38:05 PST 1998 
Responsible-Changed-Why:  
This PR is the responsibility of the portmaster, 
not the GNATS database administrator. 

From: Pascal Cabaud <Pascal.Cabaud@wanadoo.fr>
To: <gnats-bugs@netbsd.org>
Cc:  
Subject: bin/5133
Date: Sun, 10 Dec 00 02:13:30 +0200

 Hello,

 I've experienced this problem with a SEAGATE, a QUANTUM and an IBM drive
 on two different Quadras. I note that the information given by df and du
 is totally inconsistent. I've tried both kernels but neither works. Only
 the IBM crashes, but all of them fail to store files correctly.

 Example: after an installation (base.tgz, etc.tgz, kern.tgz, comp.tgz,
 misc.tgz, text.tgz) and some light work in /home:
 ------ shell session --
 <pc @ diablotin 1:17:55 ~> df -k
 Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
 /dev/sd1a      108446    53115    44486    54%    /
 /dev/sd1g      305659   240284    34809    87%    /usr
 /dev/sd1e      157741   109560    32406    77%    /var
 /dev/sd1f      512701   341737   119693    74%    /home
 [...snip...]
 <pc @ diablotin 1:27:19 /var> du -sk .
 1114    .
 ------ shell session --
 So /var is about 1 MB according to 'du' and 109 MB according to 'df'.
 The real size is 50 MB.
 Another example, under the same conditions:
 ------ shell session --
 <pc @ zeus 1:20:13 ~> df -k
 Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
 /dev/sd0a      128374    71325    44211    61%    /
 /dev/sd0g      148263    75501    57935    56%    /usr
 /dev/sd0h      484646   436382     -201   100%    /var
 /dev/sd0f      435708   277108   115029    70%    /home
 [...snip...]
 <pc @ zeus 1:34:18 /var> du -sk .
 674     .
 ------ shell session --
 So my /var partition (again 50 MB) is clearly unwell: df reports it
 100% full, yet du finds only 674 KB.
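
The size of the df/du disagreement in the first example can be worked
out directly. The sketch below parses the df -k output quoted above and
compares the Used column for /var with the du -sk figure. The helper
name parse_df_used is hypothetical. Some gap between df and du is
normal, for example when deleted files are still held open, but here
the two figures differ roughly a hundredfold.

```python
# df -k output copied from the first shell session in this message.
DF_OUTPUT = """\
Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
/dev/sd1a      108446    53115    44486    54%    /
/dev/sd1g      305659   240284    34809    87%    /usr
/dev/sd1e      157741   109560    32406    77%    /var
/dev/sd1f      512701   341737   119693    74%    /home
"""

# Result of `du -sk .` run in /var, from the same session.
DU_VAR_KB = 1114


def parse_df_used(df_output, mount_point):
    """Return the Used column (in KB) for mount_point from df -k output."""
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) == 6 and fields[5] == mount_point:
            return int(fields[2])
    raise KeyError(mount_point)


df_used_kb = parse_df_used(DF_OUTPUT, "/var")
gap_kb = df_used_kb - DU_VAR_KB
ratio = df_used_kb / DU_VAR_KB

print(f"df says {df_used_kb} KB used, du says {DU_VAR_KB} KB")
print(f"unaccounted: {gap_kb} KB ({ratio:.0f}x the du figure)")
```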

 System information:
 ------ dmesg ----------
 NetBSD 1.4.2 (GENERIC) #4: Sat Mar 18 01:16:20 CST 2000
     fredb@corwin.home:/s/src/sys/arch/mac68k/compile/GENERIC
 Apple Macintosh Quadra 650  (68040)
 [...snip...]
 scsibus0 at esp0: 8 targets, 8 luns per target
 sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST3600N, 9422> SCSI2 0/direct 
 fixed
 sd0: 500MB, 1872 cyl, 7 head, 78 sec, 512 bytes/sect x 1025920 sectors
 ------ dmesg ---------

 ------ dmesg ---------
 NetBSD 1.4.1 (GENERIC) #0: Wed Aug 11 04:56:57 CDT 1999
     fredb@corwin.home:/usr/src/sys/arch/mac68k/compile/GENERIC
 Apple Macintosh Quadra 700  (68040)
 [...snip...]
 sd1 at scsibus0 targ 1 lun 0: <QUANTUM, FIREBALL540S, 1Q08> SCSI2
 0/direct fixed
 sd1: 520MB, 3835 cyl, 2 head, 138 sec, 512 bytes/sect x 1065235 sectors
 ------ dmesg ---------

 ------ dmesg ---------
 NetBSD 1.4.1 (GENERIC) #0: Wed Aug 11 04:56:57 CDT 1999
     fredb@corwin.home:/usr/src/sys/arch/mac68k/compile/GENERIC
 Apple Macintosh Quadra 700  (68040)
 [...snip...]
 sd0 at scsibus0 targ 0 lun 0: <IBM, DALS-3540, S60E> SCSI2 0/direct fixed
 sd0: 516MB, 4901 cyl, 2 head, 107 sec, 512 bytes/sect x 1056768 sectors
 sd1 at scsibus0 targ 1 lun 0: <QUANTUM, FIREBALL540S, 1Q08> SCSI2
 0/direct fixed
 sd1: 520MB, 3835 cyl, 2 head, 138 sec, 512 bytes/sect x 1065235 sectors
 ------ dmesg ---------

 A friend of mine encountered the same problem on a QUANTUM, without
 crashes:

 ------ dmesg ---------
 NetBSD 1.4.2 (GENERIC) #4: Sat Mar 18 01:16:20 CST 2000
     fredb@corwin.home:/s/src/sys/arch/mac68k/compile/GENERIC
 Apple Macintosh Quadra 800  (68040)
 [...snip...]
 sd0 at scsibus0 targ 0 lun 0: <QUANTUM, LP240S GM240S01X, 6.3> SCSI2
 0/direct fixed
 sd0: 234MB, 1818 cyl, 4 head, 65 sec, 512 bytes/sect x 479350 sectors
 ------ dmesg ---------

 I posted a more detailed message about this problem to the port-mac68k
 mailing list this week, before searching GNATS.

 Is there a list somewhere of disks that are reported to work well?

 pc
State-Changed-From-To: open->feedback 
State-Changed-By: tls 
State-Changed-When: Wed Mar 31 22:25:03 UTC 2004 
State-Changed-Why:  
I don't suppose there's any chance you can still reproduce this?  If you 
do still run NetBSD on a mac68k, I'd like to know if I can close this 
bug for 2.0. 

From: Scott Reynolds <scottr@clank.org>
To: gnats-bugs@gnats.netbsd.org
Cc: revilak@umbsky.cc.umb.edu, port-mac68k-maintainer@netbsd.org,
  tls@netbsd.org
Subject: Re: port-mac68k/5133
Date: Thu, 1 Apr 2004 09:54:57 -0600

 Nothing has changed in the Mac-specific part of the driver to resolve 
 this. It is an interaction between certain disks, the hacked on-board 
 SCSI controller, and our driver that causes the problem. Because I 
 don't actually have hardware to test this, though, I've been unable to 
 make any significant progress on resolving it.

 I suggest that the PR be suspended if the originator is unable to 
 assist in reproducing the issue.

State-Changed-From-To: feedback->suspended 
State-Changed-By: tls 
State-Changed-When: Thu Apr 1 17:45:23 UTC 2004 
State-Changed-Why:  
Port maintainer knows there's a bug, but has made no progress fixing it 
and does not anticipate progress in the near future; bug cannot be 
easily reproduced. 
>Unformatted:


 follow-up info copied here from PR 5175:

 From what I was able to gather, the esp0: errors were most prone to occur
 when using removable media with NetBSD, although there was a report or two
 of problems from those using slower hard drives.  After exchanging a bunch of
 messages, I decided to follow Colin Wood's theory--"It just doesn't like
 removables".  I dug up the 160 meg IBM drive that's been sitting in the
 back of my closet for some months now, ordered an enclosure, waited a week
 for it to come, then installed the drive and gave NetBSD another go.


 Running from a non-removable...works great, not a single recurrence of
 the error messages.  Daily checks with fsck tell me that it's staying
 healthy too--not experiencing the file corruption I was getting with the EZ
 drive.


 Next objective: mount the cart which contained the old filesystem and try a
 few tests, something which would involve the transfer of a fair amount of
 data.  When booting from the EZ drive, redirecting the output of ls -lR
 would land me in the debugger in roughly 3 out of 4 cases.  Redirecting ls
 -lR to a file on the syquest now will typically bring about a few
 dmaintr/esp messages (2-3), but I wasn't able to make it crash.  (Resultant
 file size was around 800-900 kilobytes).


 A few other things I noticed:


        * Drive speed may or may not be too much of an issue.  The hard drive
 I'm using now is fairly slow.  3600 rpm.  In the past I've benchmarked it
 against the Syquest, and it actually lost by a narrow margin.


        * When booting from the Syquest, dmaintr/esp0 messages would occur
 sporadically when in single-user mode, chronically when in multi-user.
 Could this have something to do with the system's attempt at accessing swap
 space?


        * Compiling source code on the syquest?  Might not be a bad test...
 I'll have to try this out in the next few days.


        In a nutshell, the more the drive was accessed, the more frequently
 problems occurred.  (Makes sense).  Also, FWIW, from what I've heard from
 others, these symptoms affect other models of Syquest drives in addition to
 the EZ135.  Zip drives appear to share many of the same maladies.


 Professionally, I'm a recording engineer.  In addition to shuttling 2" tape
 and pushing faders, I've spent a good deal of time in front of digital
 audio workstations.  One of the things this has taught me--SCSI voodoo is
 REAL.  It probably arises partially out of the evolution of the SCSI specs
 over time, and manufacturers' varying degrees of implementation of modes,
 flags, and commands.  I've seen drives that absolutely would not work
 reliably with system A run fine with system B.  Others would work to a
 point and then buckle.  (The ability of a drive to play back 8 channels of
 44.1 kHz/16 bit audio was always a good benchmark in my book.)


        Then of course, there's always the issue of T-cal (especially in older
 drives), cables, termination, and the degree of oxidization present on the
 connector contacts.  I'm surprised at how many times a can of Electro-Wash
 proved to be a solution.  So much for Voodoo....


        Could there be something along the lines of parameters unique to
 removables that esp0 simply does not like?  (In all honesty, I wish I knew
 more in this area than I do).


        My motive for setting up BSD was to use it as a learning environment for
 both Unix and sysadmin tasks (yeah, I'm a 'noodler').  Once I get a little
 more settled in I'd like to try modifying and compiling some kernels, but
 for the time being, I'm just going to let it sit.  Again, thanks to  those
 who took the time out to offer advice and suggestions.  Hopefully something
 here will be useful to others.

Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.