NetBSD Problem Report #40604
From www@NetBSD.org Tue Feb 10 22:53:26 2009
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id 4BB5963BAB8
for <gnats-bugs@gnats.netbsd.org>; Tue, 10 Feb 2009 22:53:26 +0000 (UTC)
Message-Id: <20090210225326.19D9463B400@narn.NetBSD.org>
Date: Tue, 10 Feb 2009 22:53:26 +0000 (UTC)
From: polimarco@gmail.com
Reply-To: polimarco@gmail.com
To: gnats-bugs@NetBSD.org
Subject: AlphaServer DS20E loses HDs and other Drives when adding 1 GB more RAM
X-Send-Pr-Version: www-1.0
>Number: 40604
>Category: port-alpha
>Synopsis: AlphaServer DS20E loses HDs and other Drives when adding 1 GB more RAM
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: thorpej
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Feb 10 22:55:00 +0000 2009
>Closed-Date: Sun Jul 25 21:49:49 +0000 2021
>Last-Modified: Sun Jul 25 21:49:49 +0000 2021
>Originator: Marco Poli
>Release: 5.0 RC1
>Organization:
Design Lab - dlab.poli.usp.br
>Environment:
>Description:
Ok, that's one of the weirdest bugs I have ever faced.
I recently got an AlphaServer DS20E with two 833 MHz CPUs, 5 hard drives and 3 power supplies. Nice server.
The machine came with 1 GB RAM, 4x256 MB DIMM boards placed in Bank 0.
I installed NetBSD 4.0.1 without any issues and immediately got the 5 Hard Drives in a RAID configuration, with root and swap under RAID1 and /usr under a RAID5. All working very nicely.
One day I received 8 more of those 256 MB DIMMs and hurried to upgrade the server. I installed them in Banks 1 and 2.
To my surprise, the next boot greeted me with these mysterious messages:
-----
probe(esiop0:0:0:0): request sense for a request sense ?
probe(esiop0:0:0:0): request sense failed with error 22
probe(esiop0:0:0:0): generic HBA error
-----
Those 3 messages repeat for each of my other 4 hard drives.
Then everything ends with the mysterious:
-----
WARNING: can't figure out what device matches "SCSI 1 7 0 0 0 0 0"
-----
That should be my boot and root device, dka0.
The next line asks me to set the root device, but when I hit any key, the "stray isa irq 1" line immediately appears 3 times:
----
root device:
stray isa irq 1
stray isa irq 1
stray isa irq 1
use one of: fxp0 fd0[a-h] cd0[a-h] ddb halt reboot
stray isa irq 1; stopped logging
----
As you can see, none of my hard drives are listed... The first time that happened I thought I had zapped the machine with static and physically damaged my SCSI bus, but after a boot into Linux, everything seemed just fine hardware-wise.
OK, so let's try to boot from the CD-ROM (dqa0 in my case): the same thing happens, but with:
----
WARNING: can't figure out what device matches "IDE 0 105 0 0 0 0 0"
----
Everything else is the same: the CD doesn't show up in the list of available root devices either.
When I remove the extra memory and leave only Bank 0 full, that is only 1 GB, everything gets back to normal.
Linux 2.6 works just fine with 2 GB or 3 GB of total memory, no issues noticed in the 2 or 3 days of uptime with this configuration.
This bug *might* be related to #38941, with the difference that in my case it never really hangs; it just ends up stuck at "no root disk". The install CD even gets the installation script running, but tells me there is nowhere to install to. I am able to enter ddb at any point, quit, and restart the installation script.
I can't say for sure. But I don't think it is related to #37915.
Machine is a:
---
COMPAQ AlphaServer DS20E 833 MHz, s/n ...
---
The SCSI device:
---
esiop0 at pci1 dev 7 function 0: Symbios Logic 53c895 (ultra2-wide scsi)
esiop0: using on-board RAM
esiop0: interrupting at dec 6600 irq 47
---
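The two boot-device strings in the warnings above seem to line up with how NetBSD names the controllers: "SCSI 1 7 ..." with esiop0 at pci1 dev 7 function 0, and "IDE 0 105 ..." with the IDE controller that the working boot log later in this report shows at pci0 dev 5 function 1. Below is a rough sketch of that reading. The field names, and the guess that the third field encodes function * 100 + device, are assumptions drawn only from these two strings; they are not confirmed by this report or checked against SRM or NetBSD documentation.

```python
def parse_bootdev(bootdev):
    """Split an SRM-style boot device string into guessed fields.

    Hypothetical helper: the field meanings are assumptions, not a
    documented format.
    """
    fields = bootdev.split()
    if len(fields) < 3:
        raise ValueError(f"too few fields: {bootdev!r}")
    protocol = fields[0]          # e.g. "SCSI" or "IDE"
    bus = int(fields[1])          # assumed: PCI bus (pci0, pci1, ...)
    slot = int(fields[2])         # assumed: function * 100 + device
    function, device = divmod(slot, 100)
    return {
        "protocol": protocol,
        "bus": bus,
        "device": device,
        "function": function,
        "rest": fields[3:],       # remaining fields left uninterpreted
    }


if __name__ == "__main__":
    for s in ("SCSI 1 7 0 0 0 0 0", "IDE 0 105 0 0 0 0 0"):
        print(s, "->", parse_bootdev(s))
```

If this reading is right, both warnings name controllers that are physically present, which would point at the attach or probe step rather than at a wrong boot string.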
I am sorry for any typos; I was unable to copy and paste the actual screen, so this is a hand-typed reproduction.
Please tell me if I can provide any other useful information.
Thanks!
>How-To-Repeat:
Try to boot a DS20E with more than 1 GB of memory, or with memory in banks other than Bank 0; I can't tell which of the two triggers it.
>Fix:
>Release-Note:
>Audit-Trail:
From: Havard Eidnes <he@NetBSD.org>
To: gnats-bugs@NetBSD.org, polimarco@gmail.com
Cc: port-alpha-maintainer@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: port-alpha/40604: AlphaServer DS20E loses HDs and other Drives
when adding 1 GB more RAM
Date: Wed, 11 Feb 2009 15:45:18 +0100 (CET)
> Ok, that's one of the weirdest bugs I have ever faced.
>
> I recently got a AlphaServer DS20E with 2 833 MHz CPUs, 5 Hard
> Drives and 3 Power Sources. Nice server.
>
> The machine came with 1 GB RAM, 4x256 MB DIMM boards placed in Bank 0.
>
> I installed NetBSD 4.0.1 without any issues and immediately got
> the 5 Hard Drives in a RAID configuration, with root and swap
> under RAID1 and /usr under a RAID5. All working very nicely.
>
> One day I received 8 more of that 256 MB memories, and hurried
> to upgrade the server. I installed the boards in Banks 1 and 2.
>
> What wasn't my surprise in the next boot, when I was faced with a mysterious
>
> -----
> probe(esiop0:0:0:0): request sense for a request sense ?
> probe(esiop0:0:0:0): request sense failed with error 22
> probe(esiop0:0:0:0): generic HBA error
> -----
> and that 3 messages repeat for each of my other 4 Hard Drives.
>
> and then everything closes with the misterious:
> -----
> WARNING: can't figure out what device matches "SCSI 1 7 0 0 0 0 0"
> -----
> That should be my boot and root device, dka0.
I experienced some similar weirdness on an Alpha DP264 box I
currently have in operation. I think my conclusion was that this
was due to some of the memory being bad. You could try to see if
this is the case by trying out the internal memory tester in the
SRM firmware, or the more brute-force approach of trying to run
the machine with only the new memory in bank 0, and see if it
then behaves any better (or worse).
However, your statement that it works fine with 2 or 3GB total
memory (presumably the same memory tested above) with no issues
while running Linux 2.6 makes this maybe implausible as an
explanation. Did you try to push the VM system while it ran
Linux? You could try pkgsrc/sysutils/memtester to exercise the
system a bit, even though it's not a "real" memory tester (when
you run it on a virtual memory system). You may need to adjust
the per-process memory limit up, and run sufficiently many
instances to put strain on larger portions of your memory.
Myself? I yanked out 1GB, so my DP264 box now runs with 1GB
memory. Admittedly not ideal if this is indeed a bug in NetBSD,
and not a hardware problem...
Regards,
- Håvard
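The suggestion above to run "sufficiently many instances" of memtester could be planned along these lines. This is only a sketch: the 2048 MB and 512 MB figures are hypothetical, the helper name is made up for illustration, and it assumes memtester's usual `memtester <size>M <loops>` invocation. The per-process memory limit may still need raising before the commands are run.

```python
def memtester_commands(total_mb, per_instance_mb, loops=1):
    """Build one memtester command line per instance.

    Hypothetical helper: splits total_mb into equal chunks so that
    several memtester processes together touch most of the memory.
    """
    if per_instance_mb <= 0:
        raise ValueError("per_instance_mb must be positive")
    count = total_mb // per_instance_mb
    return [
        ["memtester", f"{per_instance_mb}M", str(loops)]
        for _ in range(count)
    ]


if __name__ == "__main__":
    # Example sizes only; leave headroom for the kernel and other processes.
    for cmd in memtester_commands(2048, 512):
        print(" ".join(cmd))
```

The commands are only printed here, not executed, so they can be reviewed and started by hand (for example one per terminal) once the limits are set.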
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: port-alpha-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org
Subject: Re: port-alpha/40604: AlphaServer DS20E loses HDs and other Drives
when adding 1 GB more RAM
Date: Thu, 12 Feb 2009 16:21:15 +0100
On Tue, Feb 10, 2009 at 10:55:00PM +0000, polimarco@gmail.com wrote:
> [...]
> What wasn't my surprise in the next boot, when I was faced with a mysterious
>
> -----
> probe(esiop0:0:0:0): request sense for a request sense ?
> probe(esiop0:0:0:0): request sense failed with error 22
> probe(esiop0:0:0:0): generic HBA error
> -----
Do you have messages from the scsi layer or controller before this one ?
--
Manuel Bouyer, LIP6, Universite Paris VI. Manuel.Bouyer@lip6.fr
NetBSD: 26 ans d'experience feront toujours la difference
--
From: Marco Poli <polimarco@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-alpha/40604: AlphaServer DS20E loses HDs and other Drives
when adding 1 GB more RAM
Date: Thu, 12 Feb 2009 21:02:42 -0200
Hello Manuel,
as far as I can tell (I have no serial terminal and the install CD doesn't
have dmesg), the only ones are:
---
ahc0 at pci0 dev 6 function 0: Adaptec aic7895 Ultra SCSI adapter
ahc0: interrupting at dec 6600 irq 19
ahc0: aic7895C: Ultra Wide Channel A, SCSI Id=7, 32/253 SCBs
scsibus0 at ahc0: 16 targets, 8 luns per target
ahc1 at pci0 dev 6 function 1: Adaptec aic7895 Ultra SCSI adapter
ahc1: interrupting at dec 6600 irq 18
ahc1: aic7895C: Ultra Wide Channel B, SCSI Id=7, 32/253 SCBs
scsibus1 at ahc1: 16 targets, 8 luns per target
scsibus 0: waiting 2 seconds for devices to settle...
scsibus 1: waiting 2 seconds for devices to settle...
scsibus 2: waiting 2 seconds for devices to settle...
---
The working (1 GB boot) dmesg is:
---
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
NetBSD 4.0.1 (GENERIC.MP) #0: Tue Oct 7 20:56:07 PDT 2008
builds@wb27:/home/builds/ab/netbsd-4-0-1-RELEASE/alpha/200810080053Z-obj/home/builds/ab/netbsd-4-0-1-RELEASE/src/sys/arch/alpha/compile/GENERIC.MP
COMPAQ AlphaServer DS20E 833 MHz, s/n ...
8192 byte page size, 2 processors.
total memory = 1024 MB
(2752 KB reserved for PROM, 1021 MB used by NetBSD)
avail memory = 996 MB
mainbus0 (root)
cpu0 at mainbus0: ID 0 (primary), 21264B-4
cpu0: Architecture extensions: 1307<PAT,MVI,CIX,FIX,BWX>
cpu1 at mainbus0: ID 1, 21264B-4
cpu1: Architecture extensions: 1307<PAT,MVI,CIX,FIX,BWX>
tsc0 at mainbus0: 21272 Core Logic Chipset, Cchip rev 0
tsc0: 8 Dchips, 2 memory buses of 32 bytes
tsc0: arrays present: 1024MB, 0MB, 0MB, 0MB, Dchip 0 rev 1
tsp0 at tsc0
pci0 at tsp0 bus 0
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
sio0 at pci0 dev 5 function 0: Contaq Microsystems 82C693 PCI-ISA Bridge (rev. 0x00)
cypide0 at pci0 dev 5 function 1
cypide0: Cypress 82C693 IDE Controller (rev. 0x00)
cypide0: bus-master DMA support present
cypide0: primary channel wired to compatibility mode
cypide0: primary channel interrupting at isa irq 14
atabus0 at cypide0 channel 0
cypide1 at pci0 dev 5 function 2
cypide1: Cypress 82C693 IDE Controller (rev. 0x00)
cypide1: hardware does not support DMA
cypide1: primary channel wired to compatibility mode
cypide1: secondary channel interrupting at isa irq 15
atabus1 at cypide1 channel 0
ohci0 at pci0 dev 5 function 3: Contaq Microsystems 82C693 PCI-ISA Bridge (rev. 0x00)
ohci0: interrupting at isa irq 10
ohci0: OHCI version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
uhub0 at usb0
uhub0: Contaq Microsys OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
ahc0 at pci0 dev 6 function 0: Adaptec aic7895 Ultra SCSI adapter
ahc0: interrupting at dec 6600 irq 19
ahc0: aic7895C: Ultra Wide Channel A, SCSI Id=7, 32/253 SCBs
scsibus0 at ahc0: 16 targets, 8 luns per target
ahc1 at pci0 dev 6 function 1: Adaptec aic7895 Ultra SCSI adapter
ahc1: interrupting at dec 6600 irq 18
ahc1: aic7895C: Ultra Wide Channel B, SCSI Id=7, 32/253 SCBs
scsibus1 at ahc1: 16 targets, 8 luns per target
vga0 at pci0 dev 7 function 0: 3D Labs GLINT Permedia 3 (rev. 0x01)
wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
isa0 at sio0
lpt0 at isa0 port 0x3bc-0x3bf irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
attimer0 at isa0 port 0x40-0x43: AT Timer
pcppi0 at isa0 port 0x61
pcppi0: children must have an explicit unit
midi0 at pcppi0: PC speaker (CPU-intensive output)
spkr0 at pcppi0
isabeep0 at pcppi0
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
mcclock0 at isa0 port 0x70-0x71: mc146818 or compatible
pcppi0: attached to attimer0
tsp1 at tsc0
pci1 at tsp1 bus 0
pci1: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
esiop0 at pci1 dev 7 function 0: Symbios Logic 53c895 (ultra2-wide scsi)
esiop0: using on-board RAM
esiop0: interrupting at dec 6600 irq 47
scsibus2 at esiop0: 16 targets, 8 luns per target
fxp0 at pci1 dev 9 function 0: i82559 Ethernet, rev 8
fxp0: interrupting at dec 6600 irq 39
fxp0: Ethernet address 00:50:8b:ae:dc:8a
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
Kernelized RAIDframe activated
stray isa irq 14
scsibus0: waiting 2 seconds for devices to settle...
scsibus1: waiting 2 seconds for devices to settle...
scsibus2: waiting 2 seconds for devices to settle...
atapibus0 at atabus0: 2 targets
cd0 at atapibus0 drive 0: <CD-224E, , 9.5B> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2
cd0(cypide0:0:0): using PIO mode 4, DMA mode 2 (using DMA)
sd0 at scsibus2 target 0 lun 0: <COMPAQ, BD0366459B, B010> disk fixed
sd0: 34732 MB, 14002 cyl, 20 head, 254 sec, 512 bytes/sect x 71132000 sectors
sd0: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
sd1 at scsibus2 target 1 lun 0: <COMPAQ, BD0366459B, B010> disk fixed
sd1: 34732 MB, 14002 cyl, 20 head, 254 sec, 512 bytes/sect x 71132000 sectors
sd1: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
sd2 at scsibus2 target 2 lun 0: <COMPAQ, BD0366459B, B010> disk fixed
sd2: 34732 MB, 14002 cyl, 20 head, 254 sec, 512 bytes/sect x 71132000 sectors
sd2: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
sd3 at scsibus2 target 3 lun 0: <COMPAQ, BD0366459B, B010> disk fixed
sd3: 34732 MB, 14002 cyl, 20 head, 254 sec, 512 bytes/sect x 71132000 sectors
sd3: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
sd4 at scsibus2 target 4 lun 0: <COMPAQ, BD0366459B, B010> disk fixed
sd4: 34732 MB, 14002 cyl, 20 head, 254 sec, 512 bytes/sect x 71132000 sectors
sd4: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
raid0: RAID Level 1
raid0: Components: /dev/sd0a /dev/sd2a
raid0: Total Sectors: 4460160 (2177 MB)
raid1: RAID Level 1
raid1: Components: /dev/sd1a /dev/sd3a
raid1: Total Sectors: 4460160 (2177 MB)
raid2: RAID Level 5
raid2: Components: /dev/sd0d /dev/sd2d /dev/sd3d /dev/sd4d
raid2: Total Sectors: 200015040 (97663 MB)
root on raid0a dumps on raid0b
root file system type: ffs
WARNING: clock gained 2 days -- CHECK AND RESET THE DATE!
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
----
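One way to see which devices drop out with the extra memory installed would be to compare the attach lines of a working boot log against those of a failing one. The sketch below assumes both logs are available as text, and that attach lines have the "name at parent" shape seen in the dmesg above; the function names are made up for illustration.

```python
import re

# Matches autoconfiguration attach lines such as "sd0 at scsibus2 target 0 ...".
ATTACH_RE = re.compile(r"^([a-z]+[0-9]+) at ")


def attached_devices(dmesg_text):
    """Return the set of device names that appear in attach lines."""
    devices = set()
    for line in dmesg_text.splitlines():
        match = ATTACH_RE.match(line.strip())
        if match:
            devices.add(match.group(1))
    return devices


def missing_devices(working_log, failing_log):
    """Devices attached in the working log but absent from the failing one."""
    return sorted(attached_devices(working_log) - attached_devices(failing_log))


if __name__ == "__main__":
    working = (
        "scsibus2 at esiop0: 16 targets, 8 luns per target\n"
        "cd0 at atapibus0 drive 0: <CD-224E, , 9.5B> cdrom removable\n"
        "sd0 at scsibus2 target 0 lun 0: <COMPAQ, BD0366459B, B010> disk fixed\n"
    )
    # Shortened, made-up failing log for demonstration only.
    failing = "scsibus2 at esiop0: 16 targets, 8 luns per target\n"
    print(missing_devices(working, failing))
```

This only compares names; it says nothing about why a device failed to attach, so the probe error messages still need to be read by hand.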
One more thing that might be of interest: the /dev/md0a from the bootcd
works and gives me a working environment, but when I try to mount /dev/cd0a
/mnt, I get
stray isa irq 14
the cd spins and everything hangs.
From: Marco Poli <polimarco@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-alpha/40604: AlphaServer DS20E loses HDs and other Drives
when adding 1 GB more RAM
Date: Thu, 12 Feb 2009 21:36:33 -0200
Hello Havard!
I just switched Bank 0 with the 4 memory boards that used to be in
Bank 1. NetBSD booted fine.
I installed and ran sysutils/memtester, no errors reported.
---
On Wed, Feb 11, 2009 at 12:50 PM, Havard Eidnes <he@netbsd.org> wrote:
> The following reply was made to PR port-alpha/40604; it has been noted by GNATS.
>
> From: Havard Eidnes <he@NetBSD.org>
> To: gnats-bugs@NetBSD.org, polimarco@gmail.com
> Cc: port-alpha-maintainer@netbsd.org, netbsd-bugs@netbsd.org
> Subject: Re: port-alpha/40604: AlphaServer DS20E loses HDs and other Drives
> when adding 1 GB more RAM
> Date: Wed, 11 Feb 2009 15:45:18 +0100 (CET)
>
> > Ok, that's one of the weirdest bugs I have ever faced.
> >
> > I recently got a AlphaServer DS20E with 2 833 MHz CPUs, 5 Hard
> > Drives and 3 Power Sources. Nice server.
> >
> > The machine came with 1 GB RAM, 4x256 MB DIMM boards placed in Bank 0=
> .=
>
> >
> > I installed NetBSD 4.0.1 without any issues and immediately got
> > the 5 Hard Drives in a RAID configuration, with root and swap
> > under RAID1 and /usr under a RAID5. All working very nicely.
> >
> > One day I received 8 more of that 256 MB memories, and hurried
> > to upgrade the server. I installed the boards in Banks 1 and 2.
> >
> > What wasn't my surprise in the next boot, when I was faced with a mysterious
> >
> > -----
> > probe(esiop0:0:0:0): request sense for a request sense ?
> > probe(esiop0:0:0:0): request sense failed with error 22
> > probe(esiop0:0:0:0): generic HBA error
> > -----
> > and that 3 messages repeat for each of my other 4 Hard Drives.
> >
> > and then everything closes with the misterious:
> > -----
> > WARNING: can't figure out what device matches "SCSI 1 7 0 0 0 0 0"
> > -----
> > That should be my boot and root device, dka0.
>
> I experienced some similar weirdness on an Alpha DP264 box I
> currently have in operation. I think my conclusion was that this
> was due to some of the memory being bad. You could try to see if
> this is the case by trying out the internal memory tester in the
> SRM firmware, or the more brute-force approach of trying to run
> the machine with only the new memory in bank 0, and see if it
> then behaves any better (or worse).
>
> However, your statement that it works fine with 2 or 3GB total
> memory (presumably the same memory tested above) with no issues
> while running Linux 2.6 makes this maybe implausible as an
> explanation. Did you try to push the VM system while it ran
> Linux? You could try pkgsrc/sysutils/memtester to exercise the
> system a bit, even though it's not a "real" memory tester (when
> you run it on a virtual memory system). You may need to adjust
> the per-process memory limit up, and run sufficiently many
> instances to put strain on larger portions of your memory.
>
> Myself? I yanked out 1GB, so my DP264 box now runs with 1GB
> memory. Admittedly not ideal if this is indeed a bug in NetBSD,
> and not a hardware problem...
>
> Regards,
>
> - Håvard
>
>
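The suggestion quoted above (raise the per-process limit and run enough
memtester instances to cover most of RAM) can be planned with a little
arithmetic. The following is only a sketch: the function name, the 256 MB
reserve for the kernel and other processes, and the example sizes are
assumptions made for illustration. The only external fact relied on is
that memtester is invoked as "memtester <size>M <loops>".

```python
import math


def plan_memtester(total_mb, per_process_limit_mb, reserve_mb=256):
    """Return one memtester command line per instance.

    total_mb             -- installed RAM in MB
    per_process_limit_mb -- the per-process memory limit in MB (assumed known)
    reserve_mb           -- memory left untested for the rest of the system
                            (hypothetical default, tune to taste)
    """
    target_mb = total_mb - reserve_mb
    if target_mb <= 0 or per_process_limit_mb <= 0:
        raise ValueError("nothing left to test with these settings")
    # Enough instances that each one fits under the per-process limit.
    count = math.ceil(target_mb / per_process_limit_mb)
    size_mb = target_mb // count
    # "1" is the loop count: one pass per instance.
    return [["memtester", f"{size_mb}M", "1"] for _ in range(count)]


if __name__ == "__main__":
    for cmd in plan_memtester(total_mb=3072, per_process_limit_mb=1024):
        print(" ".join(cmd))
```

Each returned list could be handed to subprocess.Popen to start the
instances in parallel; that step is left out so the sketch has no side
effects.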
From: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-alpha/40604
Date: Thu, 7 Oct 2010 22:01:18 +0200
I just committed code to enable and decode the peripheral machine checks
on this kind of system. This clearly shows that this is not bad memory or
anything like that:
...
mlx0 at pci0 dev 9 function 0: Mylex RAID (v2 interface)
mlx0: interrupting at dec 6600 irq 23
sgmap_load: ----- buf = 0xfffffe0000031300 -----
sgmap_load: dmaoffset = 0x1300, buflen = 0x80
sgmap_load: va:endva = 0xfffffe0000030000:0xfffffe0000032000
sgmap_load: sgvalen = 0x2000, boundary = 0x0
sgmap_load: sgva = 0x0, pteidx = 0, pte = 0xfffffc00002d0000 (pt = 0xfffffc00002d0000)
sgmap_load: wbase = 0x800000, vpage = 0x0, DMA addr = 0x801300
sgmap_load: pa = 0xfffa6000, pte = 0xfffffc00002d0000, *pte = 0xfffa7
System Machine Check (660): Rev 0x1, Code 0x202, Flags 0x0
Software Error Summary Flags = 0x0000000000000001
CPU Device Interrupt Requests = 0x4000000000000000
DIR = 0x4000000000000000<Pchip 0 error>
Cchip Miscellaneous Register = 0x0000000100000030
Pchip 0 Error Register = 0x0070fffa73700041
error = 0x41<Error lost,Target abort>
address = 0xfffa7370, 0x0<No DAC>
command = 0x7<PCI memory write>
Pchip 1 Error Register = 0x0000000000000000
unexpected machine check:
mces = 0x1
vector = 0x660
param = 0xfffffc0000006080
pc = 0xfffffc00009db220
ra = 0xfffffc00009db1e8
code = 0x100000202
curlwp = 0xfffffc0001234a60
pid = 0.1, comm = system
panic: machine check
...
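The "Pchip 0 Error Register" line in the log above can be unpacked by
hand. The sketch below is only an illustration: the field positions
(error bits below bit 16, a 32-bit address starting at bit 16, a 4-bit
command starting at bit 52) are inferred from this single sample, not
taken from the hardware reference manual, and the name tables contain
only the values that appear in the log.

```python
# Field layout inferred from one log sample; treat it as an assumption.
ERROR_BITS = {0: "Error lost", 6: "Target abort"}   # only bits named in the log
PCI_COMMANDS = {0x7: "PCI memory write"}            # only command named in the log


def decode_perror(reg):
    """Split a 64-bit Pchip error register value into its visible fields."""
    error = reg & 0xFFFF                 # assumed: error summary below bit 16
    address = (reg >> 16) & 0xFFFFFFFF   # assumed: 32-bit address at bit 16
    command = (reg >> 52) & 0xF          # assumed: 4-bit PCI command at bit 52
    names = [name for bit, name in sorted(ERROR_BITS.items())
             if error & (1 << bit)]
    return {
        "error": error,
        "error_names": names,
        "address": address,
        "command": command,
        "command_name": PCI_COMMANDS.get(command, "unknown"),
    }


if __name__ == "__main__":
    fields = decode_perror(0x0070FFFA73700041)
    print(f"error   = {fields['error']:#x} {fields['error_names']}")
    print(f"address = {fields['address']:#x}")
    print(f"command = {fields['command']:#x} ({fields['command_name']})")
```

With the value from the log this yields error 0x41, address 0xfffa7370
and command 0x7, the same three fields the kernel printed.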
This only happens when using SG DMA, which is used for every address
above 1GB. I have no idea why it is failing; the code looks ok.
Maybe I am getting something wrong here, but why is the Pchip doing a PCI
memory write when it should be doing a DMA transfer to host memory?
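For readers following the sgmap_load lines in the log, the numbers can be
reproduced with a few lines of arithmetic. This is a sketch under stated
assumptions, not the kernel code: it assumes 8 KB pages and a PTE of the
form (page frame number << 1) | valid bit, both chosen because they match
the values printed in the log. The function and key names are made up for
illustration.

```python
PGSHIFT = 13                 # assumed: 8 KB pages, consistent with the log
PAGE_SIZE = 1 << PGSHIFT
PGOFSET = PAGE_SIZE - 1


def sgmap_numbers(buf, buflen, sgva, wbase, pa):
    """Recompute the values that the sgmap_load debug output prints."""
    dmaoffset = buf & PGOFSET                     # offset of buf within its page
    va = buf & ~PGOFSET                           # buf rounded down to a page
    endva = (buf + buflen + PGOFSET) & ~PGOFSET   # end rounded up to a page
    return {
        "dmaoffset": dmaoffset,
        "va": va,
        "endva": endva,
        "sgvalen": endva - va,
        "dma_addr": wbase + sgva + dmaoffset,     # address handed to the device
        "pte": ((pa >> PGSHIFT) << 1) | 1,        # assumed PTE encoding
        "buf_pa": pa + dmaoffset,                 # physical address of buf
    }


if __name__ == "__main__":
    n = sgmap_numbers(buf=0xFFFFFE0000031300, buflen=0x80,
                      sgva=0x0, wbase=0x800000, pa=0xFFFA6000)
    for key, value in n.items():
        print(f"{key:9s} = {value:#x}")
```

The results match the log: dmaoffset 0x1300, sgvalen 0x2000, DMA addr
0x801300 and *pte 0xfffa7. One more observation that follows from the
same numbers: the buffer occupies physical addresses 0xfffa7300 to
0xfffa737f, and the address in the Pchip error register, 0xfffa7370,
falls inside that range.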
Hans
--
%SYSTEM-F-ANARCHISM, The operating system has been overthrown
Responsible-Changed-From-To: port-alpha-maintainer->thorpej
Responsible-Changed-By: thorpej@NetBSD.org
Responsible-Changed-When: Tue, 13 Oct 2020 16:00:29 +0000
Responsible-Changed-Why:
Take.
State-Changed-From-To: open->analyzed
State-Changed-By: tnn@NetBSD.org
State-Changed-When: Sun, 18 Jul 2021 14:18:22 +0000
State-Changed-Why:
The problem is that the direct-mapped DMA window configured by SRM
is limited in size to 1 GiB. It is partially fixed in -current by
opening another DMA window from 1 GiB to 2 GiB.
There are still some issues in the common bus DMA code for alpha that
prevent this DMA window from being correctly used.
thorpej@ and I are working on a fix for the above.
SGDMA fallback for PCI seems broken on tsunami for a different reason.
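The effect of the window layout described above can be illustrated with a
small model. This is a sketch only: it assumes each direct-mapped window
covers a contiguous range of physical addresses and that anything not
fully inside a window falls back to scatter-gather mapping. The window
lists and the function name are hypothetical and do not reflect the
actual chipset register programming.

```python
GIB = 1 << 30

# Hypothetical layouts: (start, end) physical ranges reachable by direct DMA.
SRM_ONLY = [(0, 1 * GIB)]                              # single 1 GiB window
WITH_SECOND_WINDOW = [(0, 1 * GIB), (1 * GIB, 2 * GIB)]


def dma_path(pa, length, windows):
    """Return "direct" if the buffer fits in one window, else "sgmap"."""
    for start, end in windows:
        if start <= pa and pa + length <= end:
            return "direct"
    return "sgmap"


if __name__ == "__main__":
    for label, pa in (("512 MiB", GIB // 2),
                      ("1.5 GiB", 3 * GIB // 2),
                      ("2.5 GiB", 5 * GIB // 2)):
        print(label,
              dma_path(pa, 0x2000, SRM_ONLY),
              dma_path(pa, 0x2000, WITH_SECOND_WINDOW))
```

Under this model a machine with 1 GB of RAM never leaves the direct path,
which is consistent with the original report. A buffer at 1.5 GiB moves
from the scatter-gather path to the direct path once the second window
exists, while a buffer above 2 GiB still depends on scatter-gather
mapping.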
State-Changed-From-To: analyzed->feedback
State-Changed-By: thorpej@NetBSD.org
State-Changed-When: Sun, 18 Jul 2021 20:00:28 +0000
State-Changed-Why:
This issue should now be fixed. If you still have the system, could you
try out a NetBSD-current daily build once the fix has landed there?
From: "Jason R Thorpe" <thorpej@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/40604 CVS commit: src/sys/arch/alpha/pci
Date: Sun, 18 Jul 2021 19:58:34 +0000
Module Name: src
Committed By: thorpej
Date: Sun Jul 18 19:58:34 UTC 2021
Modified Files:
src/sys/arch/alpha/pci: tsp_dma.c
Log Message:
According to section 8.1.2.2 of the Tsunami/Typhoon hardware reference
manual (DS-0025A-TE), the SGMAP TLB is arranged as 168 locations of 4
consecutive quadwords. It seems that on some revisions of the Pchip,
SGMAP translation is not perfectly reliable unless we align the DMA
segments to the TLB's natural boundaries (observed on the API CS20).
N.B. the Titan (as observed on a Compaq DS25) does not seem to have this
problem, but we'll play it safe and run this way on both variants.
PR port-alpha/40604.
To generate a diff of this commit:
cvs rdiff -u -r1.20 -r1.21 src/sys/arch/alpha/pci/tsp_dma.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
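The alignment described in the log message can be sketched numerically.
The code below is an illustration of the idea, not the change made in
tsp_dma.c: it assumes each SGMAP PTE is one quadword mapping one 8 KB
page, so that a TLB location of 4 consecutive quadwords spans 32 KB of
DMA address space. The function name and constants are hypothetical.

```python
PAGE_SIZE = 8192                 # assumed: 8 KB pages
PTES_PER_TLB_LOCATION = 4        # "4 consecutive quadwords" per the log message
TLB_SPAN = PAGE_SIZE * PTES_PER_TLB_LOCATION   # 32 KB under these assumptions


def align_segment(sgva, length):
    """Widen [sgva, sgva + length) to the enclosing TLB-location boundaries.

    Returns (aligned_start, aligned_length).
    """
    start = sgva - (sgva % TLB_SPAN)                   # round start down
    end = -(-(sgva + length) // TLB_SPAN) * TLB_SPAN   # round end up
    return start, end - start


if __name__ == "__main__":
    for sgva, length in ((0x2000, 0x2000), (0x8000, 0x8000), (0x7000, 0x2000)):
        start, span = align_segment(sgva, length)
        print(f"sgva={sgva:#x} len={length:#x} -> start={start:#x} len={span:#x}")
```

A segment that already starts and ends on a 32 KB boundary is returned
unchanged; one that straddles a boundary, such as the last example,
grows to cover both TLB locations it touches.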
State-Changed-From-To: feedback->closed
State-Changed-By: thorpej@NetBSD.org
State-Changed-When: Sun, 25 Jul 2021 21:49:49 +0000
State-Changed-Why:
Feedback timeout. But all experience with NetBSD 9.99.87 shows that this
issue is fixed.
>Unformatted:
$NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.