NetBSD Problem Report #38643
From mouse@Sparkle.Rodents.Montreal.QC.CA Mon May 12 20:39:03 2008
Return-Path: <mouse@Sparkle.Rodents.Montreal.QC.CA>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id 2975E63B293
for <gnats-bugs@gnats.NetBSD.org>; Mon, 12 May 2008 20:39:03 +0000 (UTC)
Message-Id: <200805122039.QAA02019@Sparkle.Rodents.Montreal.QC.CA>
Date: Mon, 12 May 2008 16:39:01 -0400 (EDT)
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
Reply-To: mouse@Rodents.Montreal.QC.CA
To: gnats-bugs@gnats.NetBSD.org
Subject: [dM] st tape drive loses data
X-Send-Pr-Version: 3.95
>Number: 38643
>Category: kern
>Synopsis: [dM] st tape drive loses data
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: kern-bug-people
>State: feedback
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon May 12 20:40:02 +0000 2008
>Closed-Date:
>Last-Modified: Sat Dec 31 04:45:44 +0000 2022
>Originator: der Mouse
>Release: NetBSD 4.0
>Organization:
Dis-
>Environment:
System: not booted at the moment; see below
Architecture: i386
Machine: i386
>Description:
At work, we have a machine that's been running NetBSD/i386 3.0
for some time, and it's been working fine. One of the things
it does is run amanda, dumping to "virtual tapes" (files on
disk) on a RAIDed filesystem (firmware RAID, not RAIDframe - a
3ware). Another thing it does is to dump some of the resulting
files onto real tapes.
Recently, I put a 4.0 kernel and userland on it. Last night,
it did a dump-to-tape run, which failed oddly. On
investigating, I find there's a very peculiar problem with the
tape drive, which I can only ascribe to the switch to 4.0. (It
showed up when testing manually, and disappeared upon switching
back to the 3.0 kernel.)
Specifically, on a tape written with 10KB records, an attempt
to read 1048577 bytes reads one (10KB) record, returns it to
userland, and throws away the next 15 records. (Attempts to
read the tape with blocksize 10240 work fine; the data actually
are on the tape.)
The tape drive is
ahc1 at pci4 dev 2 function 0: Adaptec 2940 Pro Ultra SCSI adapter
ahc1: interrupting at ioapic0 pin 19 (irq 5)
ahc1: aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
scsibus0 at ahc1: 16 targets, 8 luns per target
...
st0 at scsibus0 target 5 lun 0: <QUANTUM, SDLT600, 1E1E> tape removable
st0: drive empty
st0: sync (50.00ns offset 8), 16-bit (40.000MB/s) transfers
and it works fine as far as I can tell under 3.0 (in particular
it does not exhibit this misbehaviour). Full dmesg included
below.
The kernel is the 4.0 GENERIC kernel,
builds@wb34:/home/builds/ab/netbsd-4-0-RELEASE/i386/200712160005Z-obj/home/builds/ab/netbsd-4-0-RELEASE/src/sys/arch/i386/compile/GENERIC
MD5 f201f3213ba5886a9b6f05a5492c6172.
See how-to-repeat - or my mail to tech-kern,
Message-Id: <200804292303.TAA20392@Sparkle.Rodents.Montreal.QC.CA>,
for more.
Here's a full dmesg from a 4.0 boot on the hardware in
question:
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
NetBSD 4.0 (GENERIC) #0: Sun Dec 16 00:20:10 PST 2007
builds@wb34:/home/builds/ab/netbsd-4-0-RELEASE/i386/200712160005Z-obj/home/builds/ab/netbsd-4-0-RELEASE/src/sys/arch/i386/compile/GENERIC
total memory = 1023 MB
rbus: rbus_min_start set to 0x40000000
avail memory = 995 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
BIOS32 rev. 0 found at 0xfd6e0
mainbus0 (root)
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel Xeon (686-class), 2790.83 MHz, id 0xf25
cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: features2 4400<CID,xTPR>
cpu0: "Intel(R) Xeon(TM) CPU 2.80GHz"
cpu0: I-cache 12K uOp cache 8-way, D-cache 8 KB 64B/line 4-way
cpu0: L2 cache 512 KB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: enabling thermal monitor 1 ... enabled.
cpu0: calibrating local timer
cpu0: apic clock running at 132 MHz
cpu0: 16 page colors
cpu1 at mainbus0: apid 1 (application processor)
cpu1: not started
ioapic0 at mainbus0 apid 2 (I/O APIC)
ioapic0: pa 0xfec00000, version 20, 24 pins
ioapic1 at mainbus0 apid 3 (I/O APIC)
ioapic1: pa 0xfec80000, version 20, 24 pins
ioapic2 at mainbus0 apid 4 (I/O APIC)
ioapic2: pa 0xfec80400, version 20, 24 pins
acpi0 at mainbus0: Advanced Configuration and Power Interface
acpi0: using Intel ACPI CA subsystem version 20060217
acpi0: X/RSDT: OemId <PTLTD , RSDT ,06040000>, AslId < LTP,00000000>
acpi0: SCI interrupting at int 9
acpi0: fixed-feature power button present
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
ACPI-Fast 24-bit timer
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
PNP0A03 [PCI/PCI-X Host Bridge] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
INT0800 [Intel FWH Random Number Generator] at acpi0 not configured
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
PNP0200 [AT DMA Controller] at acpi0 not configured
npx1 at acpi0 (PNP0C04)
npx1: io 0xf0-0xfe irq 13
npx1: reported by CPUID; using exception 16
PNP0000 [AT Interrupt Controller] at acpi0 not configured
PNP0B00 [AT Real-Time Clock] at acpi0 not configured
pcppi1 at acpi0 (PNP0800)
pcppi1: io 0x61
pcppi1: children must have an explicit unit
midi0 at pcppi1: PC speaker (CPU-intensive output)
sysbeep0 at pcppi1
attimer1 at acpi0 (PNP0100): AT Timer
attimer1: io 0x40-0x43 irq 0
PNP0A06 [Generic Container Device] at acpi0 not configured
pckbc1 at acpi0 (PNP0303): kbd port
pckbc1: io 0x60,0x64 irq 1
PNP0700 [PC standard floppy disk controller] at acpi0 not configured
PNP0501 [16550A-compatible COM port] at acpi0 not configured
PNP0501 [16550A-compatible COM port] at acpi0 not configured
PNP0401 [ECP printer port] at acpi0 not configured
acpibut0 at acpi0 (PNP0C0C): ACPI Power Button
pcppi1: attached to attimer1
pckbd0 at pckbc1 (kbd slot)
pckbc1: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: Intel E7501 MCH Host (rev. 0x01)
Intel E7500 MCH DRAM Controller (undefined subclass 0x00, revision 0x01) at pci0 dev 0 function 1 not configured
ppb0 at pci0 dev 2 function 0: Intel E7500 MCH HI_B vppb 1 (rev. 0x01)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
Intel 82870P2 P64H2 IOxAPIC (interrupt system, interface 0x20, revision 0x04) at pci1 dev 28 function 0 not configured
ppb1 at pci1 dev 29 function 0: Intel 82870P2 P64H2 PCI-PCI Bridge (rev. 0x04)
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled
Intel 82870P2 P64H2 IOxAPIC (interrupt system, interface 0x20, revision 0x04) at pci1 dev 30 function 0 not configured
ppb2 at pci1 dev 31 function 0: Intel 82870P2 P64H2 PCI-PCI Bridge (rev. 0x04)
pci3 at ppb2 bus 3
pci3: i/o space, memory space enabled
twe0 at pci3 dev 1 function 0: 3ware Escalade
twe0: interrupting at ioapic2 pin 0 (irq 10)
twe0: 12 ports, Firmware FE7S 1.05.00.065, BIOS BE7X 1.08.00.048
twe0: Monitor ME7X 1.01.00.038, PCB Rev5 , Achip 3.20 , Pchip 1.30-66
twe0: port 0: ST3300831AS 286168 MB
twe0: port 1: ST3300831AS 286168 MB
twe0: port 2: ST3300831AS 286168 MB
twe0: port 3: ST3300831AS 286168 MB
twe0: port 4: ST3300831AS 286168 MB
twe0: port 5: ST3300831AS 286168 MB
twe0: port 6: ST3300831AS 286168 MB
twe0: port 7: ST3300831AS 286168 MB
twe0: port 8: ST3300831AS 286168 MB
twe0: port 9: ST3300831AS 286168 MB
twe0: port 10: ST3300831AS 286168 MB
twe0: port 11: ST3300831AS 286168 MB
ld0 at twe0 unit 0: 64K stripe RAID5, status: Normal
ld0: 1956 GB, 255368 cyl, 255 head, 63 sec, 512 bytes/sect x 4102491904 sectors
ld1 at twe0 unit 2: 1024K stripe RAID0, status: Normal
ld1: 838 GB, 109443 cyl, 255 head, 63 sec, 512 bytes/sect x 1758210048 sectors
ld2 at twe0 unit 9: JBOD, status: Normal
ld2: 279 GB, 36481 cyl, 255 head, 63 sec, 512 bytes/sect x 586072368 sectors
Intel E7500 MCH HI_B vppb 2 (undefined subclass 0x00, revision 0x01) at pci0 dev 2 function 1 not configured
uhci0 at pci0 dev 29 function 0: Intel 82801CA USB Controller (rev. 0x02)
uhci0: interrupting at ioapic0 pin 16 (irq 10)
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1 at pci0 dev 29 function 1: Intel 82801CA USB Controller (rev. 0x02)
uhci1: interrupting at ioapic0 pin 19 (irq 5)
usb1 at uhci1: USB revision 1.0
uhub1 at usb1
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2 at pci0 dev 29 function 2: Intel 82801CA USB Controller (rev. 0x02)
uhci2: interrupting at ioapic0 pin 18 (irq 10)
usb2 at uhci2: USB revision 1.0
uhub2 at usb2
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
ppb3 at pci0 dev 30 function 0: Intel 82801BA Hub-PCI Bridge (rev. 0x42)
pci4 at ppb3 bus 4
pci4: i/o space, memory space enabled
ahc1 at pci4 dev 2 function 0: Adaptec 2940 Pro Ultra SCSI adapter
ahc1: interrupting at ioapic0 pin 19 (irq 5)
ahc1: aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
scsibus0 at ahc1: 16 targets, 8 luns per target
vga1 at pci4 dev 3 function 0: ATI Technologies Rage XL (rev. 0x27)
wsdisplay0 at vga1 kbdmux 1: console (80x25, vt100 emulation), using wskbd0
wsmux1: connecting to wsdisplay0
fxp0 at pci4 dev 4 function 0: i82550 Ethernet, rev 13
fxp0: interrupting at ioapic0 pin 20 (irq 11)
fxp0: Ethernet address 00:0e:0c:4d:ec:5b
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
wm0 at pci4 dev 5 function 0: Intel i82540EM 1000BASE-T Ethernet, rev. 2
wm0: interrupting at ioapic0 pin 23 (irq 10)
wm0: 32-bit 33MHz PCI bus
wm0: 64 word (6 address bits) MicroWire EEPROM
wm0: Ethernet address 00:0e:0c:4d:ec:11
makphy0 at wm0 phy 1: Marvell 88E1011 Gigabit PHY, rev. 3
makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
pcib0 at pci0 dev 31 function 0
pcib0: Intel 82801CA LPC Interface (rev. 0x02)
piixide0 at pci0 dev 31 function 1
piixide0: Intel 82801CA IDE Controller (ICH3) (rev. 0x02)
piixide0: bus-master DMA support present
piixide0: primary channel configured to compatibility mode
piixide0: primary channel interrupting at ioapic0 pin 14 (irq 14)
atabus0 at piixide0 channel 0
piixide0: secondary channel configured to compatibility mode
piixide0: secondary channel interrupting at ioapic0 pin 15 (irq 15)
atabus1 at piixide0 channel 1
Intel 82801CA SMBus Controller (SMBus serial bus, revision 0x02) at pci0 dev 31 function 3 not configured
isa0 at pcib0
lpt0 at isa0 port 0x378-0x37b irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
isapnp0: no ISA Plug 'n Play devices found
ioapic0: enabling
ioapic1: enabling
ioapic2: enabling
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
Kernelized RAIDframe activated
scsibus0: waiting 2 seconds for devices to settle...
wd0 at atabus0 drive 0: <WDC WD400JB-00JJA0>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 38166 MB, 77545 cyl, 16 head, 63 sec, 512 bytes/sect x 78165360 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1 at atabus0 drive 1: <FUJITSU MPD3043AT>
wd1: drive supports 16-sector PIO transfers, LBA addressing
wd1: 4125 MB, 8940 cyl, 15 head, 63 sec, 512 bytes/sect x 8448300 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 4 (Ultra/66)
wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
wd1(piixide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
st0 at scsibus0 target 5 lun 0: <QUANTUM, SDLT600, 1E1E> tape removable
st0: drive empty
st0: sync (50.00ns offset 8), 16-bit (40.000MB/s) transfers
boot device: wd1
root on wd1a dumps on wd1b
root file system type: ffs
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
>How-To-Repeat:
While I initially noticed the problem with other tools, I did
some tests with a fresh tape and very simple tools. I created
a ten-megabyte file full of distinctive data (given any
512-byte chunk, I could tell exactly where in the original file
it came from). I wrote this to the tape with
# dd bs=10240 if=file of=/dev/nrst0
which (reassuringly) reported 1024 records written. I then
rewound the tape and checked with
# dd bs=10240 if=/dev/nrst0 | cmp - file
to verify that it was in fact on the tape; except for dd's
report, this produced no output. Then I rewound again and
tried
# sh -c 'exec 3</dev/nrst0; dd bs=1048577 count=1 0<&3; dd bs=10240 0<&3' > file2
which is basically a command-line version of what I saw when I
ktraced the program that produced the odd behaviour that put me
onto this. This said
0+1 records in
0+1 records out
10240 bytes transferred in 0.063 secs (162539 bytes/sec)
1008+0 records in
1008+0 records out
10321920 bytes transferred in 0.482 secs (21414771 bytes/sec)
which is what I'd expect, except that the 1008 should be 1023
(and the total byte count, of course, should match). Looking
at the resulting file2, the first 20 blocks - the first 10K -
are exactly what they should be, but the next block is the one
from block offset 320, not 20, and it goes on from there;
something lost the next 15 tape records, as if the first dd had
actually read 16 10240-byte records, returned the first, and
thrown away the other 15.
Is the bug here with 4.0 or with my expectations? Certainly my
historical experience has been that reading a tape record with
a buffer far bigger than the record just reads the record into
the beginning of the buffer, not reads the record into the
beginning of the buffer and then reads and discards some
unobvious number of following records! (16 records is 320K,
less than a third of the 1048577-byte buffer size; I have no
idea why that many.)
My message to tech-kern appears to have drawn no response,
tending to argue against stupid pilot-error mistakes as the
cause.
I'm classing this as tech-kern, because I suspect a kernel bug
(the userland side of the above test is pretty much too simple
to break); serious, because it means most likely no 4.0 on this
machine until/unless it's fixed; medium, because it doesn't
seem to affect a lot of people or it would have been noticed
long ago. Adjust if you think it should be otherwise.
>Fix:
Unknown. I had a quick look but nothing was obvious.
I can test proposed fixes, but, depending on the time of day,
it may take a little while; the machine is in live production
and only practical window for test boots is weekday afternoons
(North American Eastern timezone, ie, roughly 16:00-22:00 UTC).
/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML mouse@rodents.montreal.qc.ca
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
>Release-Note:
>Audit-Trail:
From: Havard Eidnes <he@NetBSD.org>
To: gnats-bugs@NetBSD.org, mouse@Rodents.Montreal.QC.CA
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, yamt@NetBSD.org
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Mon, 01 Sep 2008 20:06:58 +0200 (CEST)
Hi,
prompted by discussion elsewhere, I had a brief glance at this
bug.
I don't have any suggestions for a fix, but I may have an
observation (which I'm sure you've thought of yourself as well,
but it's good to have it written down anyway).
On nearly all NetBSD ports, the MAXPHYS constant is 64KB, and any
physical I/O operation can at most transfer this much data in one
operation.
1048577 is an odd size to read, if you were expecting an integral
number of 10KB tape blocks -- it's 16 * 64KB + 1 (why the +1?).
I suspect your first big read operation is being broken up into
separate MAXPHYS-sized requests, instead of just a single block
being read. I'm not sufficiently familiar with the code to
suggest where that split is being done, though, although I think
I would take a long hard look at the changes to kern_physio.c
between the two releases, and perhaps have a chat with yamt@
(CC'ed here), as he has been in there making some significant
changes between netbsd-3-0-RELEASE and netbsd-4-0-RELEASE.
(...which means that even though I've taken a cursory look, I
wasn't able to pinpoint where the problem is...)
Regards,
- H=E5vard
From: David Holland <dholland-bugs@netbsd.org>
To: Havard Eidnes <he@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, mouse@Rodents.Montreal.QC.CA,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, yamt@NetBSD.org
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Mon, 1 Sep 2008 18:41:36 +0000
On Mon, Sep 01, 2008 at 08:06:58PM +0200, Havard Eidnes wrote:
> I don't have any suggestions for a fix, but I may have an
> observation (which I'm sure you've thought of yourself as well,
> but it's good to have it written down anyway).
I think I've found the problem, and if so this is the fix:
Index: st.c
===================================================================
RCS file: /cvsroot/src/sys/dev/scsipi/st.c,v
retrieving revision 1.205
diff -u -p -r1.205 st.c
--- st.c 8 Jun 2008 18:18:34 -0000 1.205
+++ st.c 1 Sep 2008 18:24:12 -0000
@@ -1125,7 +1125,7 @@ done:
/*
* Correctly set the buf to indicate a completed xfer
*/
- bp->b_resid = bp->b_bcount;
+ bp->b_resid = 0;
biodone(bp);
return;
}
This has been this way ~forever, and AFAIK it's always been wrong, but
changes in kern_physio before 4.0 have changed the resulting behavior
so that it could easily be consistent with the observed phenomena.
There's also another maybe related issue: line 413 of kern_physio.c
1.87 has
/* uio->uio_offset = ps->ps_endoffset; */
which yamt added, commented out, when cleaning up some grossness prior
to 4.0. I am pretty sure it's why part of the data is getting skipped
over, and I think it should be uncommented. yamt?
On the other hand, I could be completely wrong here too...
--
David A. Holland
dholland@netbsd.org
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: David Holland <dholland-bugs@NetBSD.org>
Cc: Havard Eidnes <he@NetBSD.org>, gnats-bugs@NetBSD.org,
mouse@Rodents.Montreal.QC.CA, kern-bug-people@NetBSD.org,
gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, yamt@NetBSD.org
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Mon, 1 Sep 2008 22:22:20 +0200
On Mon, Sep 01, 2008 at 06:41:36PM +0000, David Holland wrote:
> On Mon, Sep 01, 2008 at 08:06:58PM +0200, Havard Eidnes wrote:
> > I don't have any suggestions for a fix, but I may have an
> > observation (which I'm sure you've thought of yourself as well,
> > but it's good to have it written down anyway).
>
> I think I've found the problem, and if so this is the fix:
>
> Index: st.c
> ===================================================================
> RCS file: /cvsroot/src/sys/dev/scsipi/st.c,v
> retrieving revision 1.205
> diff -u -p -r1.205 st.c
> --- st.c 8 Jun 2008 18:18:34 -0000 1.205
> +++ st.c 1 Sep 2008 18:24:12 -0000
> @@ -1125,7 +1125,7 @@ done:
> /*
> * Correctly set the buf to indicate a completed xfer
> */
> - bp->b_resid = bp->b_bcount;
> + bp->b_resid = 0;
> biodone(bp);
> return;
> }
This is wrong; This code is used in error path (b_error is not 0), and
for the case (bp->b_bcount == 0). When b_error is not 0, bp->b_resid
has to be equal to bp->b_bcount or a KASSERT will be triggered later.
I think the code here does what's intended, but maybe the comment is
misleading.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: David Holland <dholland-bugs@netbsd.org>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: David Holland <dholland-bugs@NetBSD.org>, Havard Eidnes <he@NetBSD.org>,
gnats-bugs@NetBSD.org, mouse@Rodents.Montreal.QC.CA,
kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org, yamt@NetBSD.org
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Mon, 1 Sep 2008 21:59:40 +0000
On Mon, Sep 01, 2008 at 10:22:20PM +0200, Manuel Bouyer wrote:
> > - bp->b_resid = bp->b_bcount;
> > + bp->b_resid = 0;
> > biodone(bp);
> > return;
> > }
>
> This is wrong; This code is used in error path (b_error is not 0), and
> for the case (bp->b_bcount == 0). When b_error is not 0, bp->b_resid
> has to be equal to bp->b_bcount or a KASSERT will be triggered later.
>
> I think the code here does what's intended, but maybe the comment is
> misleading.
Bah.
Yes, it does, but it's very unclear. I think I will go clarify it a
little.
I guess it was too much to hope it would be this simple...
(any thoughts on the bit from kern_physio?)
--
David A. Holland
dholland@netbsd.org
From: David Holland <dholland-bugs@netbsd.org>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: David Holland <dholland-bugs@NetBSD.org>, Havard Eidnes <he@NetBSD.org>,
gnats-bugs@NetBSD.org, mouse@Rodents.Montreal.QC.CA,
kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org, yamt@NetBSD.org
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Tue, 2 Sep 2008 05:07:44 +0000
On Mon, Sep 01, 2008 at 09:59:40PM +0000, David Holland wrote:
> (any thoughts on the bit from kern_physio?)
no such luck; the value left in uio_offset is ignored.
--
David A. Holland
dholland@netbsd.org
From: David Holland <dholland-bugs@netbsd.org>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: David Holland <dholland-bugs@NetBSD.org>, Havard Eidnes <he@NetBSD.org>,
gnats-bugs@NetBSD.org, mouse@Rodents.Montreal.QC.CA,
kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org, yamt@NetBSD.org
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Tue, 2 Sep 2008 06:54:54 +0000
On Tue, Sep 02, 2008 at 05:07:44AM +0000, David Holland wrote:
> no such luck; the value left in uio_offset is ignored.
However, it looks like st.c flatly ignores the passed-in block number
(b_blkno) and always reads whatever's under the tape head.
So I think what's happening is that physio is firing off sixteen 64k
reads, since PHYSIO_CONCURRENCY is 16 and MAXPHYS is 64k; each one
reads one 10k block, positioning the tape such that the next data read
will be in the place observed. But physio notices the short count on
the first read and drops the other 15, so only 10k comes back from the
read system call, and while vn_read will update the fd's seek position
to 10240, and that position is passed down on the next read, st
ignores it.
This also explains why things broke going from 3.0 -> 4.0, because in
3.0 physio didn't support having multiple requests in flight at once.
The 1 byte extra after those sixteen reads should either be skipped
entirely or be a 17th read generating an EIO that gets dropped,
depending on timing. (Note that physio only checks for an error
*before* it calls physio_wait, which might be construed as a
shortcoming.)
So there are at least two things wrong: (1) physio assumes b_blkno is
honored, and st doesn't; and (2) physio assumes st will read 64k when
asked, but it in fact apparently only reads one 10k block at a time.
(What does it do if the tape is written in 16k blocks? Or worse, say,
80k blocks?)
I have no idea what the proper way to resolve these discrepancies is.
It appears the immediate problem can be hacked around by having st
allocate a buf and pass it to physio, because that will cause physio
to use only that buf instead of up to PHYSIO_CONCURRENCY of its own;
but that's hardly a fix and doesn't even cover all the possible
failure cases.
--
David A. Holland
dholland@netbsd.org
From: der Mouse <mouse@Rodents-Montreal.ORG>
To: David Holland <dholland-bugs@netbsd.org>, Havard Eidnes <he@netbsd.org>,
gnats-bugs@netbsd.org, mouse@Rodents.Montreal.QC.CA,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, yamt@netbsd.org,
Manuel Bouyer <bouyer@antioche.eu.org>
Cc:
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Tue, 2 Sep 2008 03:42:18 -0400 (EDT)
> However, it looks like st.c flatly ignores the passed-in block number
> (b_blkno) and always reads whatever's under the tape head.
I'd expect this, actually; tapes are not random access, and the Unix
tape model treats them accordingly. I don't consider this a problem
any more than I consider it a problem that lseek doesn't work on tty
devices.
> So I think what's happening is that physio is firing off sixteen 64k
> reads, since PHYSIO_CONCURRENCY is 16 and MAXPHYS is 64k;
Ooh. Yes, that explains it very neatly: where the data is going and
why exactly 15 records get lost.
> So there are at least two things wrong: (1) physio assumes b_blkno is
> honored, and st doesn't; and (2) physio assumes st will read 64k when
> asked, but it in fact apparently only reads one 10k block at a time.
> (What does it do if the tape is written in 16k blocks?
I would expect it to read 16k at a time. The traditional Unix tape
model is that a tape is a stream of records (with interspersed
filemarks, but let's not complicate things with them), each of which
has an inherent size recorded on the tape along with the data -
basically, the way half-inch nine-track tapes work. If you try to read
R byutes and the next record's size is N bytes, then if R>=N you get
the next record and a return value of N, while if R<N you get...I'm not
sure, probably either an error or a return value of N with the
remaining N-R bytes dropped.
Some tapes don't fit this model, notably quarter-inch cartridge tapes,
but it's still the paradigm most of the tape subsystems and drivers
I've seen are designed around - and the model all the tape software
I've seen expects.
> Or worse, say, 80k blocks?)
You lose. :( Trying to write - or read - tape records over MAXPHYS has
never worked, not clear back to 4.2, possibly even 4.1c. I remember
running into a 63K limit because of this on the VAX under 4.x.
Well, not with tape drives fitting the nine-track model. Classic
quarter-inch cartridge tapes (eg, the DC600A) are streams of 512-byte
records, with no record boundaries; trying to read or write huge
records (eg, 1MB) just sources or sinks enough 512-byte physical
records to satisfy the request.
> I have no idea what the proper way to resolve these discrepancies is.
> It appears the immediate problem can be hacked around by having st
> allocate a buf and pass it to physio, because that will cause physio
> to use only that buf instead of up to PHYSIO_CONCURRENCY of its own;
> but that's hardly a fix and doesn't even cover all the possible
> failure cases.
I offer the thought that maybe PHYSIO_CONCURRENCY should effectively be
forced to 1 for DV_TAPE devices (or, possibly, non-DV_DISK devices).
The physio model of breaking big requests up into multiple concurrent
small requests really doesn't make sense except for random-access
devices, so maybe it shouldn't be done except for random-access
devices.
However, this is coming from someone who doesn't really understand
4.0's physio paradigm, so it could be pure nonsense.
/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML mouse@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
From: der Mouse <mouse@Rodents-Montreal.ORG>
To: David Holland <dholland-bugs@netbsd.org>, Havard Eidnes <he@netbsd.org>,
gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, yamt@netbsd.org,
Manuel Bouyer <bouyer@antioche.eu.org>
Cc:
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Tue, 2 Sep 2008 23:46:26 -0400 (EDT)
>> So I think what's happening is that physio is firing off sixteen 64k
>> reads, since PHYSIO_CONCURRENCY is 16 and MAXPHYS is 64k;
> Ooh. Yes, that explains it very neatly: where the data is going and
> why exactly 15 records get lost.
As a quick test to see if this is what's wrong, I tried hacking
kern_physio to force concurrency to 0 (ie, effectively set
PHYSIO_CONCURRENCY to 1). It reduces the problem but does not cure it;
it now appears to read two records, rather than 16, when the big read
is attempted (I didn't have a scratch tape handy today; this is based
on tape block numbers reported by "mt status").
I don't expect to be in a position to pick this up again until
Thursday; then, I should be able to try having st.c provide a buf to
physio....
/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML mouse@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
From: der Mouse <mouse@Rodents-Montreal.ORG>
To: Manuel Bouyer <bouyer@antioche.eu.org>, Havard Eidnes <he@NetBSD.org>,
gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org,
gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, yamt@NetBSD.org
Cc:
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Thu, 4 Sep 2008 15:05:11 -0400 (EDT)
I tried (a) rolling back my concurrency change to physio and (b) making
stread and stwrite use statically allocated structs buf for their calls
to physio.
The results: the same as the concurrency change to physio. That is, it
loses one tape record, instead of 15, when reading. (I found a scratch
tape, so I was able to do a proper test, which also confirmed the
behaviour with the concurrency override in physio.)
In my Copious Spare Time, I hope to be able to try this with some of
the debugging printfs I see scattered around the st and scsipi code
turned on. If-&-when I manage that, I'll report back.
/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML mouse@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
From: der Mouse <mouse@Rodents-Montreal.ORG>
To: Manuel Bouyer <bouyer@antioche.eu.org>, Havard Eidnes <he@NetBSD.org>,
gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org,
gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, yamt@NetBSD.org
Cc:
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Mon, 8 Sep 2008 16:05:42 -0400 (EDT)
Today, I tried adding printfs in stread and ststrategy. These indicate
that, even when stread provides a buf, physio dispatches two requests
to the strategy routine:
stread: offset 0 resid 1048577
ststrategy: bcount 65536 blkno 0
ststrategy: bcount 65536 blkno 128
stread: offset 10240 resid 10240
ststrategy: bcount 10240 blkno 20
So it appears there is something wrong with physio. After staring at
it for a while, I came up with this
--- /dev/fd/4 Tue Dec 9 16:18:08 2003
+++ /dev/fd/5 Tue Dec 9 16:18:08 2003
@@ -336,6 +336,9 @@
if (error) {
goto done_locked;
}
+ if ((mbp->b_flags & B_ERROR) != 0) {
+ goto done_locked;
+ }
simple_unlock(&mbp->b_interlock);
if (obp != NULL) {
/*
which appears to fix this. (Otherwise, the main physio line blocks in
physio_wait before the first request comes off the workqueue and sets
B_ERROR in mbp, and the B_ERROR isn't noticed until after dispatching
the second request.) Combine this with making st.c use its own buffer
in stread and stwrite
--- /dev/fd/4 Tue Dec 9 16:18:08 2003
+++ /dev/fd/5 Tue Dec 9 16:18:08 2003
@@ -1379,8 +1379,9 @@
stread(dev_t dev, struct uio *uio, int iomode)
{
struct st_softc *st = st_cd.cd_devs[STUNIT(dev)];
+ static struct buf buf;
- return (physio(ststrategy, NULL, dev, B_READ,
+ return (physio(ststrategy, &buf, dev, B_READ,
st->sc_periph->periph_channel->chan_adapter->adapt_minphys, uio));
}
@@ -1388,8 +1389,9 @@
stwrite(dev_t dev, struct uio *uio, int iomode)
{
struct st_softc *st = st_cd.cd_devs[STUNIT(dev)];
+ static struct buf buf;
- return (physio(ststrategy, NULL, dev, B_WRITE,
+ return (physio(ststrategy, &buf, dev, B_WRITE,
st->sc_periph->periph_channel->chan_adapter->adapt_minphys, uio));
}
and, in initial tests, st is fine. I'll have to run with this for a
while to see if there are any subtle problems remaining.
/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML mouse@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
From: der Mouse <mouse@Rodents-Montreal.ORG>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Thu, 5 Aug 2010 23:08:42 -0400 (EDT)
The machine behind this bug report is now at 4.0.1, with the last set
of fixes described in the PR to physio and st, and it appears to be
working correctly (here meaning, as expected based on the tape model
outlined above). If (the -current equivalent of) those changes goes
in, I would say this PR can be closed.
/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML mouse@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 31 Dec 2022 04:45:44 +0000
State-Changed-Why:
Did this ever get committed? I thought so but there don't seem to be
commits in the history.
>Unformatted:
(Contact us)
$NetBSD: gnats-precook-prs,v 1.4 2018/12/21 14:20:20 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.