NetBSD Problem Report #38643

From mouse@Sparkle.Rodents.Montreal.QC.CA  Mon May 12 20:39:03 2008
Return-Path: <mouse@Sparkle.Rodents.Montreal.QC.CA>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 2975E63B293
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 12 May 2008 20:39:03 +0000 (UTC)
Message-Id: <200805122039.QAA02019@Sparkle.Rodents.Montreal.QC.CA>
Date: Mon, 12 May 2008 16:39:01 -0400 (EDT)
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
Reply-To: mouse@Rodents.Montreal.QC.CA
To: gnats-bugs@gnats.NetBSD.org
Subject: [dM] st tape drive loses data
X-Send-Pr-Version: 3.95

>Number:         38643
>Category:       kern
>Synopsis:       [dM] st tape drive loses data
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon May 12 20:40:02 +0000 2008
>Last-Modified:  Fri Aug 06 03:10:04 +0000 2010
>Originator:     der Mouse
>Release:        NetBSD 4.0
>Organization:
	Dis-
>Environment:
System: not booted at the moment; see below
Architecture: i386
Machine: i386
>Description:
	At work, we have a machine that's been running NetBSD/i386 3.0
	for some time, and it's been working fine.  One of the things
	it does is run amanda, dumping to "virtual tapes" (files on
	disk) on a RAIDed filesystem (firmware RAID, not RAIDframe - a
	3ware).  Another thing it does is to dump some of the resulting
	files onto real tapes.

	Recently, I put a 4.0 kernel and userland on it.  Last night,
	it did a dump-to-tape run, which failed oddly.  On
	investigating, I find there's a very peculiar problem with the
	tape drive, which I can only ascribe to the switch to 4.0.  (It
	showed up when testing manually, and disappeared upon switching
	back to the 3.0 kernel.)

	Specifically, on a tape written with 10KB records, an attempt
	to read 1048577 bytes reads one (10KB) record, returns it to
	userland, and throws away the next 15 records.  (Attempts to
	read the tape with blocksize 10240 work fine; the data actually
	are on the tape.)

	The tape drive is

	ahc1 at pci4 dev 2 function 0: Adaptec 2940 Pro Ultra SCSI adapter
	ahc1: interrupting at ioapic0 pin 19 (irq 5)
	ahc1: aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
	scsibus0 at ahc1: 16 targets, 8 luns per target
	...
	st0 at scsibus0 target 5 lun 0: <QUANTUM, SDLT600, 1E1E> tape removable
	st0: drive empty
	st0: sync (50.00ns offset 8), 16-bit (40.000MB/s) transfers

	and it works fine as far as I can tell under 3.0 (in particular
	it does not exhibit this misbehaviour).  Full dmesg included
	below.

	The kernel is the 4.0 GENERIC kernel,
	builds@wb34:/home/builds/ab/netbsd-4-0-RELEASE/i386/200712160005Z-obj/home/builds/ab/netbsd-4-0-RELEASE/src/sys/arch/i386/compile/GENERIC
	MD5 f201f3213ba5886a9b6f05a5492c6172.

	See how-to-repeat - or my mail to tech-kern, 
	Message-Id: <200804292303.TAA20392@Sparkle.Rodents.Montreal.QC.CA>,
	for more.

	Here's a full dmesg from a 4.0 boot on the hardware in
	question:

	Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
	    2006, 2007
	    The NetBSD Foundation, Inc.  All rights reserved.
	Copyright (c) 1982, 1986, 1989, 1991, 1993
	    The Regents of the University of California.  All rights reserved.

	NetBSD 4.0 (GENERIC) #0: Sun Dec 16 00:20:10 PST 2007
		builds@wb34:/home/builds/ab/netbsd-4-0-RELEASE/i386/200712160005Z-obj/home/builds/ab/netbsd-4-0-RELEASE/src/sys/arch/i386/compile/GENERIC
	total memory = 1023 MB
	rbus: rbus_min_start set to 0x40000000
	avail memory = 995 MB
	timecounter: Timecounters tick every 10.000 msec
	timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
	BIOS32 rev. 0 found at 0xfd6e0
	mainbus0 (root)
	cpu0 at mainbus0: apid 0 (boot processor)
	cpu0: Intel Xeon (686-class), 2790.83 MHz, id 0xf25
	cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
	cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
	cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
	cpu0: features2 4400<CID,xTPR>
	cpu0: "Intel(R) Xeon(TM) CPU 2.80GHz"
	cpu0: I-cache 12K uOp cache 8-way, D-cache 8 KB 64B/line 4-way
	cpu0: L2 cache 512 KB 64B/line 8-way
	cpu0: ITLB 4K/4M: 64 entries
	cpu0: DTLB 4K/4M: 64 entries
	cpu0: enabling thermal monitor 1 ... enabled.
	cpu0: calibrating local timer
	cpu0: apic clock running at 132 MHz
	cpu0: 16 page colors
	cpu1 at mainbus0: apid 1 (application processor)
	cpu1: not started
	ioapic0 at mainbus0 apid 2 (I/O APIC)
	ioapic0: pa 0xfec00000, version 20, 24 pins
	ioapic1 at mainbus0 apid 3 (I/O APIC)
	ioapic1: pa 0xfec80000, version 20, 24 pins
	ioapic2 at mainbus0 apid 4 (I/O APIC)
	ioapic2: pa 0xfec80400, version 20, 24 pins
	acpi0 at mainbus0: Advanced Configuration and Power Interface
	acpi0: using Intel ACPI CA subsystem version 20060217
	acpi0: X/RSDT: OemId <PTLTD ,  RSDT  ,06040000>, AslId < LTP,00000000>
	acpi0: SCI interrupting at int 9
	acpi0: fixed-feature power button present
	timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
	ACPI-Fast 24-bit timer
	ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
	ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
	ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
	ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
	ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
	ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
	ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
	ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
	PNP0A03 [PCI/PCI-X Host Bridge] at acpi0 not configured
	PNP0C0F [PCI interrupt link device] at acpi0 not configured
	PNP0C0F [PCI interrupt link device] at acpi0 not configured
	PNP0C0F [PCI interrupt link device] at acpi0 not configured
	PNP0C0F [PCI interrupt link device] at acpi0 not configured
	PNP0C0F [PCI interrupt link device] at acpi0 not configured
	PNP0C0F [PCI interrupt link device] at acpi0 not configured
	INT0800 [Intel FWH Random Number Generator] at acpi0 not configured
	PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
	PNP0200 [AT DMA Controller] at acpi0 not configured
	npx1 at acpi0 (PNP0C04)
	npx1: io 0xf0-0xfe irq 13
	npx1: reported by CPUID; using exception 16
	PNP0000 [AT Interrupt Controller] at acpi0 not configured
	PNP0B00 [AT Real-Time Clock] at acpi0 not configured
	pcppi1 at acpi0 (PNP0800)
	pcppi1: io 0x61
	pcppi1: children must have an explicit unit
	midi0 at pcppi1: PC speaker (CPU-intensive output)
	sysbeep0 at pcppi1
	attimer1 at acpi0 (PNP0100): AT Timer
	attimer1: io 0x40-0x43 irq 0
	PNP0A06 [Generic Container Device] at acpi0 not configured
	pckbc1 at acpi0 (PNP0303): kbd port
	pckbc1: io 0x60,0x64 irq 1
	PNP0700 [PC standard floppy disk controller] at acpi0 not configured
	PNP0501 [16550A-compatible COM port] at acpi0 not configured
	PNP0501 [16550A-compatible COM port] at acpi0 not configured
	PNP0401 [ECP printer port] at acpi0 not configured
	acpibut0 at acpi0 (PNP0C0C): ACPI Power Button
	pcppi1: attached to attimer1
	pckbd0 at pckbc1 (kbd slot)
	pckbc1: using irq 1 for kbd slot
	wskbd0 at pckbd0: console keyboard
	pci0 at mainbus0 bus 0: configuration mode 1
	pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
	pchb0 at pci0 dev 0 function 0
	pchb0: Intel E7501 MCH Host (rev. 0x01)
	Intel E7500 MCH DRAM Controller (undefined subclass 0x00, revision 0x01) at pci0 dev 0 function 1 not configured
	ppb0 at pci0 dev 2 function 0: Intel E7500 MCH HI_B vppb 1 (rev. 0x01)
	pci1 at ppb0 bus 1
	pci1: i/o space, memory space enabled
	Intel 82870P2 P64H2 IOxAPIC (interrupt system, interface 0x20, revision 0x04) at pci1 dev 28 function 0 not configured
	ppb1 at pci1 dev 29 function 0: Intel 82870P2 P64H2 PCI-PCI Bridge (rev. 0x04)
	pci2 at ppb1 bus 2
	pci2: i/o space, memory space enabled
	Intel 82870P2 P64H2 IOxAPIC (interrupt system, interface 0x20, revision 0x04) at pci1 dev 30 function 0 not configured
	ppb2 at pci1 dev 31 function 0: Intel 82870P2 P64H2 PCI-PCI Bridge (rev. 0x04)
	pci3 at ppb2 bus 3
	pci3: i/o space, memory space enabled
	twe0 at pci3 dev 1 function 0: 3ware Escalade
	twe0: interrupting at ioapic2 pin 0 (irq 10)
	twe0: 12 ports, Firmware FE7S 1.05.00.065, BIOS BE7X 1.08.00.048
	twe0: Monitor ME7X 1.01.00.038, PCB Rev5    , Achip 3.20    , Pchip 1.30-66 
	twe0: port 0: ST3300831AS                              286168 MB
	twe0: port 1: ST3300831AS                              286168 MB
	twe0: port 2: ST3300831AS                              286168 MB
	twe0: port 3: ST3300831AS                              286168 MB
	twe0: port 4: ST3300831AS                              286168 MB
	twe0: port 5: ST3300831AS                              286168 MB
	twe0: port 6: ST3300831AS                              286168 MB
	twe0: port 7: ST3300831AS                              286168 MB
	twe0: port 8: ST3300831AS                              286168 MB
	twe0: port 9: ST3300831AS                              286168 MB
	twe0: port 10: ST3300831AS                              286168 MB
	twe0: port 11: ST3300831AS                              286168 MB
	ld0 at twe0 unit 0: 64K stripe RAID5, status: Normal
	ld0: 1956 GB, 255368 cyl, 255 head, 63 sec, 512 bytes/sect x 4102491904 sectors
	ld1 at twe0 unit 2: 1024K stripe RAID0, status: Normal
	ld1: 838 GB, 109443 cyl, 255 head, 63 sec, 512 bytes/sect x 1758210048 sectors
	ld2 at twe0 unit 9: JBOD, status: Normal
	ld2: 279 GB, 36481 cyl, 255 head, 63 sec, 512 bytes/sect x 586072368 sectors
	Intel E7500 MCH HI_B vppb 2 (undefined subclass 0x00, revision 0x01) at pci0 dev 2 function 1 not configured
	uhci0 at pci0 dev 29 function 0: Intel 82801CA USB Controller (rev. 0x02)
	uhci0: interrupting at ioapic0 pin 16 (irq 10)
	usb0 at uhci0: USB revision 1.0
	uhub0 at usb0
	uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
	uhub0: 2 ports with 2 removable, self powered
	uhci1 at pci0 dev 29 function 1: Intel 82801CA USB Controller (rev. 0x02)
	uhci1: interrupting at ioapic0 pin 19 (irq 5)
	usb1 at uhci1: USB revision 1.0
	uhub1 at usb1
	uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
	uhub1: 2 ports with 2 removable, self powered
	uhci2 at pci0 dev 29 function 2: Intel 82801CA USB Controller (rev. 0x02)
	uhci2: interrupting at ioapic0 pin 18 (irq 10)
	usb2 at uhci2: USB revision 1.0
	uhub2 at usb2
	uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
	uhub2: 2 ports with 2 removable, self powered
	ppb3 at pci0 dev 30 function 0: Intel 82801BA Hub-PCI Bridge (rev. 0x42)
	pci4 at ppb3 bus 4
	pci4: i/o space, memory space enabled
	ahc1 at pci4 dev 2 function 0: Adaptec 2940 Pro Ultra SCSI adapter
	ahc1: interrupting at ioapic0 pin 19 (irq 5)
	ahc1: aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
	scsibus0 at ahc1: 16 targets, 8 luns per target
	vga1 at pci4 dev 3 function 0: ATI Technologies Rage XL (rev. 0x27)
	wsdisplay0 at vga1 kbdmux 1: console (80x25, vt100 emulation), using wskbd0
	wsmux1: connecting to wsdisplay0
	fxp0 at pci4 dev 4 function 0: i82550 Ethernet, rev 13
	fxp0: interrupting at ioapic0 pin 20 (irq 11)
	fxp0: Ethernet address 00:0e:0c:4d:ec:5b
	inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
	inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
	wm0 at pci4 dev 5 function 0: Intel i82540EM 1000BASE-T Ethernet, rev. 2
	wm0: interrupting at ioapic0 pin 23 (irq 10)
	wm0: 32-bit 33MHz PCI bus
	wm0: 64 word (6 address bits) MicroWire EEPROM
	wm0: Ethernet address 00:0e:0c:4d:ec:11
	makphy0 at wm0 phy 1: Marvell 88E1011 Gigabit PHY, rev. 3
	makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
	pcib0 at pci0 dev 31 function 0
	pcib0: Intel 82801CA LPC Interface (rev. 0x02)
	piixide0 at pci0 dev 31 function 1
	piixide0: Intel 82801CA IDE Controller (ICH3) (rev. 0x02)
	piixide0: bus-master DMA support present
	piixide0: primary channel configured to compatibility mode
	piixide0: primary channel interrupting at ioapic0 pin 14 (irq 14)
	atabus0 at piixide0 channel 0
	piixide0: secondary channel configured to compatibility mode
	piixide0: secondary channel interrupting at ioapic0 pin 15 (irq 15)
	atabus1 at piixide0 channel 1
	Intel 82801CA SMBus Controller (SMBus serial bus, revision 0x02) at pci0 dev 31 function 3 not configured
	isa0 at pcib0
	lpt0 at isa0 port 0x378-0x37b irq 7
	com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
	com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
	isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
	fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
	isapnp0: no ISA Plug 'n Play devices found
	ioapic0: enabling
	ioapic1: enabling
	ioapic2: enabling
	timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
	fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
	Kernelized RAIDframe activated
	scsibus0: waiting 2 seconds for devices to settle...
	wd0 at atabus0 drive 0: <WDC WD400JB-00JJA0>
	wd0: drive supports 16-sector PIO transfers, LBA addressing
	wd0: 38166 MB, 77545 cyl, 16 head, 63 sec, 512 bytes/sect x 78165360 sectors
	wd0: 32-bit data port
	wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
	wd1 at atabus0 drive 1: <FUJITSU MPD3043AT>
	wd1: drive supports 16-sector PIO transfers, LBA addressing
	wd1: 4125 MB, 8940 cyl, 15 head, 63 sec, 512 bytes/sect x 8448300 sectors
	wd1: 32-bit data port
	wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 4 (Ultra/66)
	wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
	wd1(piixide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
	st0 at scsibus0 target 5 lun 0: <QUANTUM, SDLT600, 1E1E> tape removable
	st0: drive empty
	st0: sync (50.00ns offset 8), 16-bit (40.000MB/s) transfers
	boot device: wd1
	root on wd1a dumps on wd1b
	root file system type: ffs
	wsdisplay0: screen 1 added (80x25, vt100 emulation)
	wsdisplay0: screen 2 added (80x25, vt100 emulation)
	wsdisplay0: screen 3 added (80x25, vt100 emulation)
	wsdisplay0: screen 4 added (80x25, vt100 emulation)
>How-To-Repeat:
	While I initially noticed the problem with other tools, I did
	some tests with a fresh tape and very simple tools.  I created
	a ten-megabyte file full of distinctive data (given any
	512-byte chunk, I could tell exactly where in the original file
	it came from).  I wrote this to the tape with

	# dd bs=10240 if=file of=/dev/nrst0

	which (reassuringly) reported 1024 records written.  I then
	rewound the tape and checked with

	# dd bs=10240 if=/dev/nrst0 | cmp - file

	to verify that it was in fact on the tape; except for dd's
	report, this produced no output.  Then I rewound again and
	tried

	# sh -c 'exec 3</dev/nrst0; dd bs=1048577 count=1 0<&3; dd bs=10240 0<&3' > file2

	which is basically a command-line version of what I saw when I
	ktraced the program that produced the odd behaviour that put me
	onto this.  This said

	0+1 records in
	0+1 records out
	10240 bytes transferred in 0.063 secs (162539 bytes/sec)
	1008+0 records in
	1008+0 records out
	10321920 bytes transferred in 0.482 secs (21414771 bytes/sec)

	which is what I'd expect, except that the 1008 should be 1023
	(and the total byte count, of course, should match).  Looking
	at the resulting file2, the first 20 blocks - the first 10K -
	are exactly what they should be, but the next block is the one
	from block offset 320, not 20, and it goes on from there;
	something lost the next 15 tape records, as if the first dd had
	actually read 16 10240-byte records, returned the first, and
	thrown away the other 15.

	Is the bug here with 4.0 or with my expectations?  Certainly my
	historical experience has been that reading a tape record with
	a buffer far bigger than the record just reads the record into
	the beginning of the buffer, not reads the record into the
	beginning of the buffer and then reads and discards some
	unobvious number of following records!  (16 records is 320K,
	less than a third of the 1048577-byte buffer size; I have no
	idea why that many.)

	My message to tech-kern appears to have drawn no response,
	tending to argue against stupid pilot-error mistakes as the
	cause.

	I'm classing this as tech-kern, because I suspect a kernel bug
	(the userland side of the above test is pretty much too simple
	to break); serious, because it means most likely no 4.0 on this
	machine until/unless it's fixed; medium, because it doesn't
	seem to affect a lot of people or it would have been noticed
	long ago.  Adjust if you think it should be otherwise.
>Fix:
	Unknown.  I had a quick look but nothing was obvious.

	I can test proposed fixes, but, depending on the time of day,
	it may take a little while; the machine is in live production
	and only practical window for test boots is weekday afternoons
	(North American Eastern timezone, ie, roughly 16:00-22:00 UTC).

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

>Audit-Trail:
From: Havard Eidnes <he@NetBSD.org>
To: gnats-bugs@NetBSD.org, mouse@Rodents.Montreal.QC.CA
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org, yamt@NetBSD.org
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Mon, 01 Sep 2008 20:06:58 +0200 (CEST)

 Hi,

 prompted by discussion elsewhere, I had a brief glance at this
 bug.

 I don't have any suggestions for a fix, but I may have an
 observation (which I'm sure you've thought of yourself as well,
 but it's good to have it written down anyway).

 On nearly all NetBSD ports, the MAXPHYS constant is 64KB, and any
 physical I/O operation can at most transfer this much data in one
 operation.

 1048577 is an odd size to read, if you were expecting an integral
 number of 10KB tape blocks -- it's 16 * 64KB + 1 (why the +1?).

 I suspect your first big read operation is being broken up into
 separate MAXPHYS-sized requests, instead of just a single block
 being read.  I'm not sufficiently familiar with the code to
 suggest where that split is being done, though, although I think
 I would take a long hard look at the changes to kern_physio.c
 between the two releases, and perhaps have a chat with yamt@
 (CC'ed here), as he has been in there making some significant
 changes between netbsd-3-0-RELEASE and netbsd-4-0-RELEASE.

 (...which means that even though I've taken a cursory look, I
 wasn't able to pinpoint where the problem is...)

 Regards,

 - H=E5vard

From: David Holland <dholland-bugs@netbsd.org>
To: Havard Eidnes <he@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, mouse@Rodents.Montreal.QC.CA,
	kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, yamt@NetBSD.org
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Mon, 1 Sep 2008 18:41:36 +0000

 On Mon, Sep 01, 2008 at 08:06:58PM +0200, Havard Eidnes wrote:
  > I don't have any suggestions for a fix, but I may have an
  > observation (which I'm sure you've thought of yourself as well,
  > but it's good to have it written down anyway).

 I think I've found the problem, and if so this is the fix:

 Index: st.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/scsipi/st.c,v
 retrieving revision 1.205
 diff -u -p -r1.205 st.c
 --- st.c	8 Jun 2008 18:18:34 -0000	1.205
 +++ st.c	1 Sep 2008 18:24:12 -0000
 @@ -1125,7 +1125,7 @@ done:
  	/*
  	 * Correctly set the buf to indicate a completed xfer
  	 */
 -	bp->b_resid = bp->b_bcount;
 +	bp->b_resid = 0;
  	biodone(bp);
  	return;
  }

 This has been this way ~forever, and AFAIK it's always been wrong, but
 changes in kern_physio before 4.0 have changed the resulting behavior
 so that it could easily be consistent with the observed phenomena.

 There's also another maybe related issue: line 413 of kern_physio.c
 1.87 has
 	/* uio->uio_offset = ps->ps_endoffset; */

 which yamt added, commented out, when cleaning up some grossness prior
 to 4.0. I am pretty sure it's why part of the data is getting skipped
 over, and I think it should be uncommented. yamt?

 On the other hand, I could be completely wrong here too...

 -- 
 David A. Holland
 dholland@netbsd.org

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: David Holland <dholland-bugs@NetBSD.org>
Cc: Havard Eidnes <he@NetBSD.org>, gnats-bugs@NetBSD.org,
        mouse@Rodents.Montreal.QC.CA, kern-bug-people@NetBSD.org,
        gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, yamt@NetBSD.org
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Mon, 1 Sep 2008 22:22:20 +0200

 On Mon, Sep 01, 2008 at 06:41:36PM +0000, David Holland wrote:
 > On Mon, Sep 01, 2008 at 08:06:58PM +0200, Havard Eidnes wrote:
 >  > I don't have any suggestions for a fix, but I may have an
 >  > observation (which I'm sure you've thought of yourself as well,
 >  > but it's good to have it written down anyway).
 > 
 > I think I've found the problem, and if so this is the fix:
 > 
 > Index: st.c
 > ===================================================================
 > RCS file: /cvsroot/src/sys/dev/scsipi/st.c,v
 > retrieving revision 1.205
 > diff -u -p -r1.205 st.c
 > --- st.c	8 Jun 2008 18:18:34 -0000	1.205
 > +++ st.c	1 Sep 2008 18:24:12 -0000
 > @@ -1125,7 +1125,7 @@ done:
 >  	/*
 >  	 * Correctly set the buf to indicate a completed xfer
 >  	 */
 > -	bp->b_resid = bp->b_bcount;
 > +	bp->b_resid = 0;
 >  	biodone(bp);
 >  	return;
 >  }

 This is wrong; This code is used in error path (b_error is not 0), and
 for the case (bp->b_bcount == 0). When b_error is not 0, bp->b_resid
 has to be equal to bp->b_bcount or a KASSERT will be triggered later.

 I think the code here does what's intended, but maybe the comment is
 misleading.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: David Holland <dholland-bugs@netbsd.org>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: David Holland <dholland-bugs@NetBSD.org>, Havard Eidnes <he@NetBSD.org>,
	gnats-bugs@NetBSD.org, mouse@Rodents.Montreal.QC.CA,
	kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
	netbsd-bugs@NetBSD.org, yamt@NetBSD.org
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Mon, 1 Sep 2008 21:59:40 +0000

 On Mon, Sep 01, 2008 at 10:22:20PM +0200, Manuel Bouyer wrote:
  > > -	bp->b_resid = bp->b_bcount;
  > > +	bp->b_resid = 0;
  > >  	biodone(bp);
  > >  	return;
  > >  }
  > 
  > This is wrong; This code is used in error path (b_error is not 0), and
  > for the case (bp->b_bcount == 0). When b_error is not 0, bp->b_resid
  > has to be equal to bp->b_bcount or a KASSERT will be triggered later.
  > 
  > I think the code here does what's intended, but maybe the comment is
  > misleading.

 Bah.

 Yes, it does, but it's very unclear. I think I will go clarify it a
 little.

 I guess it was too much to hope it would be this simple...

 (any thoughts on the bit from kern_physio?)

 -- 
 David A. Holland
 dholland@netbsd.org

From: David Holland <dholland-bugs@netbsd.org>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: David Holland <dholland-bugs@NetBSD.org>, Havard Eidnes <he@NetBSD.org>,
	gnats-bugs@NetBSD.org, mouse@Rodents.Montreal.QC.CA,
	kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
	netbsd-bugs@NetBSD.org, yamt@NetBSD.org
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Tue, 2 Sep 2008 05:07:44 +0000

 On Mon, Sep 01, 2008 at 09:59:40PM +0000, David Holland wrote:
  > (any thoughts on the bit from kern_physio?)

 no such luck; the value left in uio_offset is ignored.

 -- 
 David A. Holland
 dholland@netbsd.org

From: David Holland <dholland-bugs@netbsd.org>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: David Holland <dholland-bugs@NetBSD.org>, Havard Eidnes <he@NetBSD.org>,
	gnats-bugs@NetBSD.org, mouse@Rodents.Montreal.QC.CA,
	kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
	netbsd-bugs@NetBSD.org, yamt@NetBSD.org
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Tue, 2 Sep 2008 06:54:54 +0000

 On Tue, Sep 02, 2008 at 05:07:44AM +0000, David Holland wrote:
  > no such luck; the value left in uio_offset is ignored.

 However, it looks like st.c flatly ignores the passed-in block number
 (b_blkno) and always reads whatever's under the tape head.

 So I think what's happening is that physio is firing off sixteen 64k
 reads, since PHYSIO_CONCURRENCY is 16 and MAXPHYS is 64k; each one
 reads one 10k block, positioning the tape such that the next data read
 will be in the place observed. But physio notices the short count on
 the first read and drops the other 15, so only 10k comes back from the
 read system call, and while vn_read will update the fd's seek position
 to 10240, and that position is passed down on the next read, st
 ignores it.

 This also explains why things broke going from 3.0 -> 4.0, because in
 3.0 physio didn't support having multiple requests in flight at once.

 The 1 byte extra after those sixteen reads should either be skipped
 entirely or be a 17th read generating an EIO that gets dropped,
 depending on timing. (Note that physio only checks for an error
 *before* it calls physio_wait, which might be construed as a
 shortcoming.)

 So there are at least two things wrong: (1) physio assumes b_blkno is
 honored, and st doesn't; and (2) physio assumes st will read 64k when
 asked, but it in fact apparently only reads one 10k block at a time.
 (What does it do if the tape is written in 16k blocks? Or worse, say,
 80k blocks?)

 I have no idea what the proper way to resolve these discrepancies is.
 It appears the immediate problem can be hacked around by having st
 allocate a buf and pass it to physio, because that will cause physio
 to use only that buf instead of up to PHYSIO_CONCURRENCY of its own;
 but that's hardly a fix and doesn't even cover all the possible
 failure cases.

 -- 
 David A. Holland
 dholland@netbsd.org

From: der Mouse <mouse@Rodents-Montreal.ORG>
To: David Holland <dholland-bugs@netbsd.org>, Havard Eidnes <he@netbsd.org>,
        gnats-bugs@netbsd.org, mouse@Rodents.Montreal.QC.CA,
        kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org, yamt@netbsd.org,
        Manuel Bouyer <bouyer@antioche.eu.org>
Cc: 
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Tue, 2 Sep 2008 03:42:18 -0400 (EDT)

 > However, it looks like st.c flatly ignores the passed-in block number
 > (b_blkno) and always reads whatever's under the tape head.

 I'd expect this, actually; tapes are not random access, and the Unix
 tape model treats them accordingly.  I don't consider this a problem
 any more than I consider it a problem that lseek doesn't work on tty
 devices.

 > So I think what's happening is that physio is firing off sixteen 64k
 > reads, since PHYSIO_CONCURRENCY is 16 and MAXPHYS is 64k;

 Ooh.  Yes, that explains it very neatly: where the data is going and
 why exactly 15 records get lost.

 > So there are at least two things wrong: (1) physio assumes b_blkno is
 > honored, and st doesn't; and (2) physio assumes st will read 64k when
 > asked, but it in fact apparently only reads one 10k block at a time.
 > (What does it do if the tape is written in 16k blocks?

 I would expect it to read 16k at a time.  The traditional Unix tape
 model is that a tape is a stream of records (with interspersed
 filemarks, but let's not complicate things with them), each of which
 has an inherent size recorded on the tape along with the data -
 basically, the way half-inch nine-track tapes work.  If you try to read
 R byutes and the next record's size is N bytes, then if R>=N you get
 the next record and a return value of N, while if R<N you get...I'm not
 sure, probably either an error or a return value of N with the
 remaining N-R bytes dropped.

 Some tapes don't fit this model, notably quarter-inch cartridge tapes,
 but it's still the paradigm most of the tape subsystems and drivers
 I've seen are designed around - and the model all the tape software
 I've seen expects.

 > Or worse, say, 80k blocks?)

 You lose. :(  Trying to write - or read - tape records over MAXPHYS has
 never worked, not clear back to 4.2, possibly even 4.1c.  I remember
 running into a 63K limit because of this on the VAX under 4.x.

 Well, not with tape drives fitting the nine-track model.  Classic
 quarter-inch cartridge tapes (eg, the DC600A) are streams of 512-byte
 records, with no record boundaries; trying to read or write huge
 records (eg, 1MB) just sources or sinks enough 512-byte physical
 records to satisfy the request.

 > I have no idea what the proper way to resolve these discrepancies is.
 > It appears the immediate problem can be hacked around by having st
 > allocate a buf and pass it to physio, because that will cause physio
 > to use only that buf instead of up to PHYSIO_CONCURRENCY of its own;
 > but that's hardly a fix and doesn't even cover all the possible
 > failure cases.

 I offer the thought that maybe PHYSIO_CONCURRENCY should effectively be
 forced to 1 for DV_TAPE devices (or, possibly, non-DV_DISK devices).
 The physio model of breaking big requests up into multiple concurrent
 small requests really doesn't make sense except for random-access
 devices, so maybe it shouldn't be done except for random-access
 devices.

 However, this is coming from someone who doesn't really understand
 4.0's physio paradigm, so it could be pure nonsense.

 /~\ The ASCII				der Mouse
 \ / Ribbon Campaign
  X  Against HTML		mouse@rodents-montreal.org
 / \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From: der Mouse <mouse@Rodents-Montreal.ORG>
To: David Holland <dholland-bugs@netbsd.org>, Havard Eidnes <he@netbsd.org>,
        gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
        gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, yamt@netbsd.org,
        Manuel Bouyer <bouyer@antioche.eu.org>
Cc: 
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Tue, 2 Sep 2008 23:46:26 -0400 (EDT)

 >> So I think what's happening is that physio is firing off sixteen 64k
 >> reads, since PHYSIO_CONCURRENCY is 16 and MAXPHYS is 64k;
 > Ooh.  Yes, that explains it very neatly: where the data is going and
 > why exactly 15 records get lost.

 As a quick test to see if this is what's wrong, I tried hacking
 kern_physio to force concurrency to 0 (ie, effectively set
 PHYSIO_CONCURRENCY to 1).  It reduces the problem but does not cure it;
 it now appears to read two records, rather than 16, when the big read
 is attempted (I didn't have a scratch tape handy today; this is based
 on tape block numbers reported by "mt status").

 I don't expect to be in a position to pick this up again until
 Thursday; then, I should be able to try having st.c provide a buf to
 physio....

 /~\ The ASCII				der Mouse
 \ / Ribbon Campaign
  X  Against HTML		mouse@rodents-montreal.org
 / \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From: der Mouse <mouse@Rodents-Montreal.ORG>
To: Manuel Bouyer <bouyer@antioche.eu.org>, Havard Eidnes <he@NetBSD.org>,
        gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org,
        gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, yamt@NetBSD.org
Cc: 
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Thu, 4 Sep 2008 15:05:11 -0400 (EDT)

 I tried (a) rolling back my concurrency change to physio and (b) making
 stread and stwrite use statically allocated structs buf for their calls
 to physio.

 The results: the same as the concurrency change to physio.  That is, it
 loses one tape record, instead of 15, when reading.  (I found a scratch
 tape, so I was able to do a proper test, which also confirmed the
 behaviour with the concurrency override in physio.)

 In my Copious Spare Time, I hope to be able to try this with some of
 the debugging printfs I see scattered around the st and scsipi code
 turned on.  If-&-when I manage that, I'll report back.

 /~\ The ASCII				der Mouse
 \ / Ribbon Campaign
  X  Against HTML		mouse@rodents-montreal.org
 / \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From: der Mouse <mouse@Rodents-Montreal.ORG>
To: Manuel Bouyer <bouyer@antioche.eu.org>, Havard Eidnes <he@NetBSD.org>,
        gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org,
        gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, yamt@NetBSD.org
Cc: 
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Mon, 8 Sep 2008 16:05:42 -0400 (EDT)

 Today, I tried adding printfs in stread and ststrategy.  These indicate
 that, even when stread provides a buf, physio dispatches two requests
 to the strategy routine:

 stread: offset 0 resid 1048577
 ststrategy: bcount 65536 blkno 0
 ststrategy: bcount 65536 blkno 128
 stread: offset 10240 resid 10240
 ststrategy: bcount 10240 blkno 20

 So it appears there is something wrong with physio.  After staring at
 it for a while, I came up with this

 --- /dev/fd/4	Tue Dec  9 16:18:08 2003
 +++ /dev/fd/5	Tue Dec  9 16:18:08 2003
 @@ -336,6 +336,9 @@
  			if (error) {
  				goto done_locked;
  			}
 +			if ((mbp->b_flags & B_ERROR) != 0) {
 +				goto done_locked;
 +			}
  			simple_unlock(&mbp->b_interlock);
  			if (obp != NULL) {
  				/*

 which appears to fix this.  (Otherwise, the main physio line blocks in
 physio_wait before the first request comes off the workqueue and sets
 B_ERROR in mbp, and the B_ERROR isn't noticed until after dispatching
 the second request.)  Combine this with making st.c use its own buffer
 in stread and stwrite

 --- /dev/fd/4	Tue Dec  9 16:18:08 2003
 +++ /dev/fd/5	Tue Dec  9 16:18:08 2003
 @@ -1379,8 +1379,9 @@
  stread(dev_t dev, struct uio *uio, int iomode)
  {
  	struct st_softc *st = st_cd.cd_devs[STUNIT(dev)];
 +	static struct buf buf;

 -	return (physio(ststrategy, NULL, dev, B_READ,
 +	return (physio(ststrategy, &buf, dev, B_READ,
  	    st->sc_periph->periph_channel->chan_adapter->adapt_minphys, uio));
  }

 @@ -1388,8 +1389,9 @@
  stwrite(dev_t dev, struct uio *uio, int iomode)
  {
  	struct st_softc *st = st_cd.cd_devs[STUNIT(dev)];
 +	static struct buf buf;

 -	return (physio(ststrategy, NULL, dev, B_WRITE,
 +	return (physio(ststrategy, &buf, dev, B_WRITE,
  	    st->sc_periph->periph_channel->chan_adapter->adapt_minphys, uio));
  }


 and, in initial tests, st is fine.  I'll have to run with this for a
 while to see if there are any subtle problems remaining.

 /~\ The ASCII				der Mouse
 \ / Ribbon Campaign
  X  Against HTML		mouse@rodents-montreal.org
 / \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

From: der Mouse <mouse@Rodents-Montreal.ORG>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Thu, 5 Aug 2010 23:08:42 -0400 (EDT)

 The machine behind this bug report is now at 4.0.1, with the last set
 of fixes described in the PR to physio and st, and it appears to be
 working correctly (here meaning, as expected based on the tape model
 outlined above).  If (the -current equivalent of) those changes goes
 in, I would say this PR can be closed.

 /~\ The ASCII				  Mouse
 \ / Ribbon Campaign
  X  Against HTML		mouse@rodents-montreal.org
 / \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.