NetBSD Problem Report #48550

From martin@duskware.de  Sat Jan 25 14:41:20 2014
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "Postmaster NetBSD.org" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 5FAFDA6486
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 25 Jan 2014 14:41:20 +0000 (UTC)
Date: Sat, 25 Jan 2014 15:40:04 CET
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: panic when accessing CDs
X-Send-Pr-Version: 3.95

>Number:         48550
>Category:       kern
>Synopsis:       panic when accessing CDs
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jan 25 14:45:00 +0000 2014
>Closed-Date:    Fri Apr 18 06:25:22 +0000 2014
>Last-Modified:  Fri Apr 18 06:35:00 +0000 2014
>Originator:     Martin Husemann
>Release:        NetBSD 6.99.30
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD night-owl.duskware.de 6.99.30 NetBSD 6.99.30 (NIGHT-OWL) #222: Sat Jan 25 14:34:38 CET 2014 martin@night-owl.duskware.de:/usr/src/sys/arch/amd64/compile/NIGHT-OWL amd64
Architecture: x86_64
Machine: amd64
>Description:

When rebooting with a CD medium inside the CD drive, my kernel panics.

#0  0xffffffff804a6caf in cpu_reboot ()
#1  0xffffffff8066ac82 in vpanic ()
#2  0xffffffff8066ad3d in panic ()
#3  0xffffffff807501f3 in buf_mempoolidx ()
#4  0xffffffff80752b03 in allocbuf ()
#5  0xffffffff80752f2b in geteblk ()
#6  0xffffffff8065d4d2 in readdisklabel ()
#7  0xffffffff80221d4d in cdgetdisklabel ()
#8  0xffffffff80224fd5 in cdopen ()
#9  0xffffffff8065bdba in cdev_open ()
#10 0xffffffff8064343b in spec_open ()
#11 0xffffffff8077967f in VOP_OPEN ()
#12 0xffffffff807691b1 in vn_open ()
#13 0xffffffff8075f224 in do_open ()
#14 0xffffffff8075f378 in do_sys_openat ()
#15 0xffffffff807625d5 in sys_open ()
#16 0xffffffff806842ca in syscall ()
#17 0xffffffff801006a1 in Xsyscall ()

(gdb) p (char*)panicstr
$2 = 0xffffffff80dbe9a0 <scratchstr.10948> "buf mem pool index 12"

>How-To-Repeat:
See above. I don't know what is special here; it happens every time for me.

>Fix:
n/a

>Release-Note:

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/48550: panic when accessing CDs
Date: Wed, 29 Jan 2014 20:50:54 +0100

 Here is a better backtrace:

 #3  0xffffffff80750333 in buf_mempoolidx (size=<optimized out>, 
     size@entry=1572864) at ../../../../kern/vfs_bio.c:607
 #4  0xffffffff80752c43 in buf_roundsize (size=1572864)
     at ../../../../kern/vfs_bio.c:615
 #5  allocbuf (bp=bp@entry=0xfffffe813695eb48, size=size@entry=1572864, 
     preserve=preserve@entry=0) at ../../../../kern/vfs_bio.c:1245
 #6  0xffffffff8075306b in geteblk (size=1572864)
     at ../../../../kern/vfs_bio.c:1224
 #7  0xffffffff8065d5e2 in readdisklabel (dev=1539, 
     strat=strat@entry=0xffffffff80221391 <cdstrategy>, 
     lp=lp@entry=0xfffffe8136e0d8c0, osdep=osdep@entry=0xfffffe81070e3800)
     at ../../../../kern/subr_disk_mbr.c:435
 #8  0xffffffff80221d4d in cdgetdisklabel (cd=cd@entry=0xfffffe8136e0da80)
     at ../../../../dev/scsipi/cd.c:1766

 611     static u_long
 612     buf_roundsize(u_long size)
 613     {
 614             /* Round up to nearest power of 2 */
 615             return (1 << (buf_mempoolidx(size) + MEMPOOL_INDEX_OFFSET));
 616     }

 (gdb) p/x size
 $6 = 0x180000
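
The index 12 in the panic string follows from this size. Below is a minimal user-space sketch of the index computation, not the code from vfs_bio.c. The two constants (a smallest pool of 512 bytes and 8 pools in total) are assumptions of this sketch, and the sketch_ names are hypothetical. Dividing the requested size by the sector size in the label shown further down, 1572864 / 524288, gives 3, which suggests SCANBLOCKS is 3 here.

```c
/*
 * Assumed constants: the smallest pool holds 512 bytes (2^9) and there
 * are 8 pools. Both values are assumptions of this sketch and are not
 * quoted from vfs_bio.c.
 */
#define SKETCH_INDEX_OFFSET	9
#define SKETCH_NPOOLS		8

/*
 * Compute the pool index for a request of "size" bytes and store it in
 * *idx. Returns 0 if the index is usable, or -1 where the kernel would
 * panic because no pool is large enough.
 */
static int
sketch_mempoolidx(unsigned long size, unsigned int *idx)
{
	unsigned int n = 0;

	size -= 1;
	size >>= SKETCH_INDEX_OFFSET;
	while (size != 0) {
		size >>= 1;
		n++;
	}
	*idx = n;
	return (n >= SKETCH_NPOOLS) ? -1 : 0;
}
```

With a sane sector size of 2048 the same request would be 3 * 2048 = 6144 bytes, which maps to index 4 (an 8192-byte pool) under these assumptions.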


 and a few frames below:
 #7  0xffffffff8065d5e2 in readdisklabel (dev=1539, 
     strat=strat@entry=0xffffffff80221391 <cdstrategy>, 
     lp=lp@entry=0xfffffe8136e0d8c0, osdep=osdep@entry=0xfffffe81070e3800)
     at ../../../../kern/subr_disk_mbr.c:435
 435             a.bp = geteblk(SCANBLOCKS * (int)lp->d_secsize);

 (gdb) p *lp
 $10 = {d_magic = 2186691927, d_type = 13, d_subtype = 0, 
   d_typename = "optical media\000\000", d_un = {
     un_d_packname = "fictitious\000\000\000\000\000", un_b = {
       un_d_boot0 = 0x6f69746974636966 <Address 0x6f69746974636966 out of bounds>, un_d_boot1 = 0x7375 <Address 0x7375 out of bounds>}, 
     un_d_pad = 8028075807037679974}, d_secsize = 524288, d_nsectors = 100, 
   d_ntracks = 1, d_ncylinders = 255355, d_secpercyl = 100, 
   d_secperunit = 25535489, d_sparespertrack = 0, d_sparespercyl = 0, 
   d_acylinders = 0, d_rpm = 300, d_interleave = 1, d_trackskew = 0, 
   d_cylskew = 0, d_headswitch = 0, d_trkseek = 0, d_flags = 33, d_drivedata = {
     0, 0, 0, 0, 0}, d_spare = {0, 0, 0, 0, 0}, d_magic2 = 2186691927, 
   d_checksum = 24808, d_npartitions = 4, d_bbsize = 0, d_sbsize = 0, 
   d_partitions = {{p_size = 25535489, p_offset = 0, __partition_u2 = {
         fsize = 0, cdsession = 0}, p_fstype = 7 '\a', p_frag = 0 '\000', 
       __partition_u1 = {cpg = 0, sgs = 0}}, {p_size = 0, p_offset = 0, 
       __partition_u2 = {fsize = 0, cdsession = 0}, p_fstype = 0 '\000', 
       p_frag = 0 '\000', __partition_u1 = {cpg = 0, sgs = 0}}, {p_size = 0, 
       p_offset = 0, __partition_u2 = {fsize = 0, cdsession = 0}, 
       p_fstype = 0 '\000', p_frag = 0 '\000', __partition_u1 = {cpg = 0, 
         sgs = 0}}, {p_size = 25535489, p_offset = 0, __partition_u2 = {
         fsize = 0, cdsession = 0}, p_fstype = 24 '\030', p_frag = 0 '\000', 
       __partition_u1 = {cpg = 0, sgs = 0}}, {p_size = 0, p_offset = 0, 
       __partition_u2 = {fsize = 0, cdsession = 0}, p_fstype = 0 '\000', 
       p_frag = 0 '\000', __partition_u1 = {cpg = 0, 
         sgs = 0}} <repeats 12 times>}}


 So it seems the generation of the fictitious label has gone awfully wrong.
 The medium is some random NetBSD/alpha install medium in this case.

 Hardware in question:

 ahcisata0 at pci0 dev 31 function 2: vendor 0x8086 product 0x3b29 (rev. 0x05)
 ahcisata0: interrupting at ioapic0 pin 19
 ahcisata0: 64-bit DMA
 ahcisata0: AHCI revision 1.30, 4 ports, 32 slots, CAP 0xff20ff63<SXS,EMS,PSC,SSC,PMD,ISS=0x2=Gen2,SCLO,SAL,SALP,SSS,SMPS,SSNTF,SNCQ,S64A>
 atabus0 at ahcisata0 channel 0
 atabus1 at ahcisata0 channel 1
 atabus2 at ahcisata0 channel 4
 atabus3 at ahcisata0 channel 5
 ahcisata0 port 0: device present, speed: 3.0Gb/s
 ahcisata0 port 4: device present, speed: 1.5Gb/s
 atapibus0 at atabus2: 1 targets
 cd0 at atapibus0 drive 0: <HL-DT-STDVDRAM GT30N, KZJ9CM31623, 1.01> cdrom removable
 cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
 cd0(ahcisata0:4:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100) (using DMA)


 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/48550: panic when accessing CDs
Date: Tue, 4 Feb 2014 12:32:18 +0100

 The following patch fixes the issue (seems to be a strange medium/drive
 interaction) for me.

 While there, it makes sure blksize and last_lba are always initialized
 before calling read_cd_capacity(), as that function may leave "the
 defaults" untouched and still return 0.

 The limit to 16k is arbitrary and could be selected smaller.

 With the change I actually get the proper disklabel:

 # /dev/rcd0d:
 type: ATAPI
 disk: iso partition
 label: fictitious
 flags: removable
 bytes/sector: 2048
 sectors/track: 100
 tracks/cylinder: 1
 sectors/cylinder: 100
 cylinders: 255355
 total sectors: 25535489
 rpm: 300
 interleave: 1
 trackskew: 0
 cylinderskew: 0
 headswitch: 0           # microseconds
 track-to-track seek: 0  # microseconds
 drivedata: 0 

 4 partitions:
 #        size    offset     fstype [fsize bsize cpg/sgs]
  a:  25535489         0    ISO9660       0             # (Cyl.      0 - 255354*)
  d:  25535489         0        UDF                     # (Cyl.      0 - 255354*)


 Martin

 Index: cd.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/scsipi/cd.c,v
 retrieving revision 1.316
 diff -u -p -r1.316 cd.c
 --- cd.c	25 Oct 2013 11:35:55 -0000	1.316
 +++ cd.c	4 Feb 2014 11:25:40 -0000
 @@ -1822,7 +1822,8 @@ read_cd_capacity(struct scsipi_periph *p
  	*last_lba = _4btol(cap.addr);

  	/* blksize is 2048 for CD, but some drives give gibberish */
 -	if ((*blksize < 512) || ((*blksize & 511) != 0))
 +	if ((*blksize < 512) || ((*blksize & 511) != 0)
 +	    || (*blksize > 16*1024))
  		*blksize = 2048;	/* some drives lie ! */

  	/* recordables have READ_DISCINFO implemented */
 @@ -1874,8 +1875,8 @@ read_cd_capacity(struct scsipi_periph *p
  static u_long
  cd_size(struct cd_softc *cd, int flags)
  {
 -	u_int blksize;
 -	u_long last_lba, size;
 +	u_int blksize = 2048;
 +	u_long last_lba = 0, size;
  	int error;

  	error = read_cd_capacity(cd->sc_periph, &blksize, &last_lba);
 @@ -2978,7 +2979,7 @@ mmc_getdiscinfo(struct scsipi_periph *pe
  	struct scsipi_read_discinfo_data  di;
  	const uint32_t buffer_size = 1024;
  	uint32_t feat_tbl_len, pos;
 -	u_long   last_lba;
 +	u_long   last_lba = 0;
  	uint8_t  *buffer, *fpos;
  	int feature, last_feature, features_len, feature_cur, feature_len;
  	int lsb, msb, error, flags;
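
The tightened block size check from the first hunk can be tried in isolation. This is a small user-space sketch; the function name sanitize_blksize and the two macro names are hypothetical, while the limits (at least 512, a multiple of 512, at most 16k, fallback 2048) are taken from the diff above.

```c
/* Fallback and upper limit as used in the diff; the names are hypothetical. */
#define SKETCH_DEFAULT_BLKSIZE	2048u
#define SKETCH_MAX_BLKSIZE	(16u * 1024u)

/*
 * Return the block size to use: the reported value if it looks
 * plausible, otherwise the CD default of 2048 bytes.
 */
static unsigned int
sanitize_blksize(unsigned int blksize)
{
	if (blksize < 512 || (blksize & 511) != 0 ||
	    blksize > SKETCH_MAX_BLKSIZE)
		return SKETCH_DEFAULT_BLKSIZE;
	return blksize;
}
```

The bogus d_secsize of 524288 from the earlier label passes the first two tests (it is a multiple of 512), so only the new upper bound rejects it.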

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: Manuel Bouyer <bouyer@NetBSD.org>,
	henning petersen <henning.petersen@t-online.de>
Subject: Re: kern/48550: panic when accessing CDs
Date: Wed, 19 Mar 2014 16:44:18 +0100

 As henning petersen pointed out in 48664, this is a side effect of gcc 4.8.
 It turns out that newer gcc places the scsipi_read_cd_cap_data structure
 misaligned on the stack (formally it requires no alignment), and
 scsipi_command() doesn't seem to like this.

 This patch avoids that and makes the READ_CD_CAPACITY work:

 --8<--
 Index: scsipi_cd.h
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/scsipi/scsipi_cd.h,v
 retrieving revision 1.21
 diff -u -r1.21 scsipi_cd.h
 --- scsipi_cd.h	1 Apr 2009 12:19:04 -0000	1.21
 +++ scsipi_cd.h	19 Mar 2014 15:35:33 -0000
 @@ -285,7 +285,7 @@
  struct scsipi_read_cd_cap_data {
  	u_int8_t addr[4];
  	u_int8_t length[4];
 -} __packed;
 +} __packed __aligned(4);


  /* mod pages common to scsi and atapi */

 -->8--
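
The effect of the added attribute can be checked at compile time. This sketch assumes a GCC-compatible compiler in C11 mode and spells the attributes out directly instead of using the __packed and __aligned() macros; the struct names are hypothetical copies of the layout in the diff.

```c
#include <stdint.h>

/* Layout as before the patch: byte arrays only, packed. */
struct cap_packed {
	uint8_t addr[4];
	uint8_t length[4];
} __attribute__((packed));

/* Layout as after the patch: same members, alignment forced to 4. */
struct cap_packed_aligned {
	uint8_t addr[4];
	uint8_t length[4];
} __attribute__((packed, aligned(4)));

/* Without the extra attribute the compiler may place it at any address. */
_Static_assert(_Alignof(struct cap_packed) == 1,
    "byte-array struct has no alignment requirement");
_Static_assert(_Alignof(struct cap_packed_aligned) == 4,
    "aligned(4) forces 4-byte alignment");
_Static_assert(sizeof(struct cap_packed) == sizeof(struct cap_packed_aligned),
    "size is unchanged");
```

Since both members are byte arrays, the structure has an alignment requirement of 1 even without the packed attribute, which is why the compiler is free to put it at an odd stack address.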

 I'd suggest adding a KASSERT() to scsipi_command() to check this condition,
 if this is expected behaviour. If not, we need to search deeper in there.

 I'll commit the other parts (additional initialization and sanity checks)
 anyway. Manuel, can you comment on this?

 Martin

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48550 CVS commit: src/sys/dev/scsipi
Date: Wed, 19 Mar 2014 15:48:23 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Wed Mar 19 15:48:23 UTC 2014

 Modified Files:
 	src/sys/dev/scsipi: cd.c

 Log Message:
 PR kern/48550: additional initialization and sanity checking on the reported
 blocksize of the medium.


 To generate a diff of this commit:
 cvs rdiff -u -r1.317 -r1.318 src/sys/dev/scsipi/cd.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Jonathan A. Kollasch" <jakllsch@kollasch.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/48550: panic when accessing CDs
Date: Wed, 19 Mar 2014 11:13:21 -0500

 The AHCI 1.0 spec says that all PRDT data buffers must be word (16-bit)
 aligned.  Martin mentioned that his struct scsipi_read_cd_cap_data was
 at a 0x...f address.
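
A word-alignment test of the kind discussed here could look like the following. It is a user-space sketch with a hypothetical helper name; in the kernel the same expression might sit inside the KASSERT() Martin suggested, but where exactly such a check belongs is not settled in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if p is aligned to "align" bytes.
 * align must be a power of two (2 for the 16-bit requirement above).
 */
static bool
is_aligned(const void *p, size_t align)
{
	return ((uintptr_t)p & (align - 1)) == 0;
}
```

Any address ending in 0x...f is odd, so is_aligned(p, 2) is false for it.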

From: David Laight <david@l8s.co.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/48550: panic when accessing CDs
Date: Wed, 19 Mar 2014 19:00:59 +0000

 On Wed, Mar 19, 2014 at 03:45:00PM +0000, Martin Husemann wrote:
 ...
 >  As henning petersen pointed out in 48664, this is a side effect of gcc 4.8.
 >  Actually it turns out that the scsipi_read_cd_cap_data structure is
 >  misaligned on the stack (as it formally requires no alignement) by newer
 >  gcc and scsipi_command() doesn't seem to like this.
 >  
 >  This patch avoids that and makes the READ_CD_CAPACITY work:
 >  
 >  --8<--
 >  Index: scsipi_cd.h
 >  ===================================================================
 >  RCS file: /cvsroot/src/sys/dev/scsipi/scsipi_cd.h,v
 >  retrieving revision 1.21
 >  diff -u -r1.21 scsipi_cd.h
 >  --- scsipi_cd.h	1 Apr 2009 12:19:04 -0000	1.21
 >  +++ scsipi_cd.h	19 Mar 2014 15:35:33 -0000
 >  @@ -285,7 +285,7 @@
 >   struct scsipi_read_cd_cap_data {
 >   	u_int8_t addr[4];
 >   	u_int8_t length[4];
 >  -} __packed;
 >  +} __packed __aligned(4);

 ISTM that it might be worth removing the __packed and replacing
 the two fields with uint32_t - and then accessing with the correct
 endianness.
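
One possible reading of this suggestion, as a user-space sketch: the struct and function names are hypothetical, and the decoder is written by hand so that the example does not depend on any platform header. In kernel code a byte-order helper such as be32toh() would be the more natural choice.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical replacement layout: naturally aligned, no packing needed. */
struct cap_data_u32 {
	uint32_t addr;		/* last LBA, big-endian as received */
	uint32_t length;	/* block length, big-endian as received */
};

/* Decode a big-endian 32-bit value regardless of host byte order. */
static uint32_t
be32_to_host(uint32_t be)
{
	const uint8_t *p = (const uint8_t *)&be;

	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Copy an 8-byte response into the struct and decode the block length. */
static uint32_t
cap_blksize(const uint8_t raw[8])
{
	struct cap_data_u32 cap;

	memcpy(&cap, raw, sizeof(cap));
	return be32_to_host(cap.length);
}
```

Because the members are uint32_t, the compiler gives the structure 4-byte alignment on common ABIs without any attribute.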

 	David

 -- 
 David Laight: david@l8s.co.uk

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@NetBSD.org, henning petersen <henning.petersen@t-online.de>
Subject: Re: kern/48550: panic when accessing CDs
Date: Thu, 20 Mar 2014 18:02:58 +0100

 On Wed, Mar 19, 2014 at 04:44:18PM +0100, Martin Husemann wrote:
 > As henning petersen pointed out in 48664, this is a side effect of gcc 4.8.
 > Actually it turns out that the scsipi_read_cd_cap_data structure is
 > misaligned on the stack (as it formally requires no alignement) by newer
 > gcc and scsipi_command() doesn't seem to like this.
 > 
 > This patch avoids that and makes the READ_CD_CAPACITY work:
 > 
 > --8<--
 > Index: scsipi_cd.h
 > ===================================================================
 > RCS file: /cvsroot/src/sys/dev/scsipi/scsipi_cd.h,v
 > retrieving revision 1.21
 > diff -u -r1.21 scsipi_cd.h
 > --- scsipi_cd.h	1 Apr 2009 12:19:04 -0000	1.21
 > +++ scsipi_cd.h	19 Mar 2014 15:35:33 -0000
 > @@ -285,7 +285,7 @@
 >  struct scsipi_read_cd_cap_data {
 >  	u_int8_t addr[4];
 >  	u_int8_t length[4];
 > -} __packed;
 > +} __packed __aligned(4);
 >  
 >  
 >  /* mod pages common to scsi and atapi */
 > 
 > -->8--
 > 
 > I'd suggest to add KASSERT() to scsipi_command() and check this conditions,
 > if this is expected behaviour. If not, we need to search deeper in there.
 > 
 > I'll commit the other parts (additonal initialization and sanity checks
 > anyway). Manuel, can you comment on this?

 This may be a limitation from the underlying driver.
 If this is the case other hardware structures need to be audited too ...
 (or the underlying driver may need to do bounce-buffering if the
 buffer is not aligned, in case it comes from e.g. userland ?)
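
The bounce-buffering idea can be sketched in user space as follows. Everything here is hypothetical: the xfer_fn callback stands in for a device transfer, fake_read is a stand-in device that only records whether it was handed an aligned buffer, and malloc() is assumed to return memory aligned at least as strictly as "align" requires. A real driver would use its DMA-safe allocator instead.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device transfer: fills buf with len bytes, returns 0 on success. */
typedef int (*xfer_fn)(void *buf, size_t len);

static int fake_saw_aligned;	/* set by fake_read() for inspection */

/* Stand-in device: notes the buffer alignment and writes a fill pattern. */
static int
fake_read(void *buf, size_t len)
{
	fake_saw_aligned = (((uintptr_t)buf & 1) == 0);
	memset(buf, 0xAB, len);
	return 0;
}

/*
 * Run xfer on buf if it is aligned to "align" (a power of two);
 * otherwise run it on an aligned temporary copy and copy the result back.
 */
static int
xfer_with_bounce(void *buf, size_t len, size_t align, xfer_fn xfer)
{
	void *bounce;
	int error;

	if (len == 0 || ((uintptr_t)buf & (align - 1)) == 0)
		return xfer(buf, len);	/* no bounce buffer needed */

	bounce = malloc(len);
	if (bounce == NULL)
		return -1;
	memcpy(bounce, buf, len);	/* preserve contents for writes */
	error = xfer(bounce, len);
	if (error == 0)
		memcpy(buf, bounce, len);	/* hand read data back */
	free(bounce);
	return error;
}
```

The extra copies only happen on the misaligned path, so callers with properly aligned buffers pay nothing.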

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

State-Changed-From-To: open->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Fri, 18 Apr 2014 06:25:22 +0000
State-Changed-Why:
Workaround committed, successor PR for the underlying issue: 48754.


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48550 CVS commit: src/sys/dev/scsipi
Date: Fri, 18 Apr 2014 06:23:32 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Fri Apr 18 06:23:32 UTC 2014

 Modified Files:
 	src/sys/dev/scsipi: cd.c

 Log Message:
 Fix PR kern/48550 by aligning the single instance of scsipi_read_cd_cap_data
 that we found misaligned in the wild so far properly for the ahcisata
 driver. Also point at PR kern/48754 for the real issue.


 To generate a diff of this commit:
 cvs rdiff -u -r1.318 -r1.319 src/sys/dev/scsipi/cd.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:
