NetBSD Problem Report #10430

Received: (qmail 26683 invoked from network); 23 Jun 2000 16:26:22 -0000
Message-Id: <200006231626.e5NGQLh00601@edge.sky.yamashina.kyoto.jp>
Date: Sat, 24 Jun 2000 01:26:21 +0900 (JST)
From: Takahiro Kambe <taca@sky.yamashina.kyoto.jp>
Reply-To: taca@sky.yamashina.kyoto.jp
To: gnats-bugs@gnats.netbsd.org
Subject: Wd driver cannot handle bad144 table properly?
X-Send-Pr-Version: 3.95

>Number:         10430
>Category:       kern
>Synopsis:       Wd driver cannot handle bad144 table properly?
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Jun 23 16:27:00 +0000 2000
>Closed-Date:    
>Last-Modified:  Sun Jun 25 23:35:00 +0000 2000
>Originator:     Takahiro Kambe
>Release:        NetBSD-current 2000/6/14
>Organization:

>Environment:

System: NetBSD edge.sky.yamashina.kyoto.jp 1.4ZD NetBSD 1.4ZD (CF-M33) #32: Thu Jun 15 08:58:49 JST 2000 taca@edge.sky.yamashina.kyoto.jp:/usr/src/sys/arch/i386/compile/CF-M33 i386


>Description:
	Wd driver cannot handle bad144 table properly?
	In my experience, the kernel complains that the bad sector table is
	corrupted when the hard disk has more than 1024 cylinders.

wd0: bad sector table corrupted
wd0: bad sector table corrupted
boot device: wd0
root on wd0a dumps on wd0b
wd0: bad sector table corrupted
wd0: bad sector table corrupted
wd0: bad sector table corrupted
wd0: bad sector table corrupted

	Maybe the first two "corrupted" messages come from looking up the
	root and dump devices.  The rest are for the four partitions on wd0.

>How-To-Repeat:
	1. IDE hard disk (wd1) which has more than 1024 cylinders.
	2. Add "badsect" flag to the disklabel.
	3. Initialize bad144 table with bad144(8).

		# bad144 wd1 1234

	4. Reboot the system and see the boot message.

>Fix:
	Unknown.
>Release-Note:
>Audit-Trail:

From: Manuel Bouyer <bouyer@antioche.lip6.fr>
To: Takahiro Kambe <taca@sky.yamashina.kyoto.jp>
Cc: gnats-bugs@gnats.netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/10430: Wd driver cannot handle bad144 table properly?
Date: Fri, 23 Jun 2000 19:19:33 +0200

 On Sat, Jun 24, 2000 at 01:26:21AM +0900, Takahiro Kambe wrote:
 > 
 > 
 > >Description:
 > 	Wd driver cannot handle bad144 table properly?
 > 	In my experience, the kernel complains that the bad sector table is
 > 	corrupted when the hard disk has more than 1024 cylinders.

 This is possible, bad144 was created for disks that don't have automatic
 bad sectors forwarding. All newer disks have it, which is probably why
 bad144 wasn't updated for cyl > 1024 (it may also have issues
 for disks > 8G :) It should at least be converted to use LBA.

 --
 Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
 --
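
[Annotation, not part of the original message: a minimal sketch of the
on-disk table under discussion, under stated assumptions. The layout follows
the DEC standard 144 format as documented in bad144(8): 126 entries, each
holding a cylinder number plus a combined track/sector word. The names
`dkbad_sketch`, `geom`, `lba_to_entry` and `entry_to_lba` are hypothetical
and are not the kernel's code. A 16-bit cylinder field is not in itself
limited to 1024 cylinders, so this sketch does not identify where the limit
reported in this PR comes from; it only shows what a conversion between LBA
and table entries could look like.]

```c
#include <assert.h>
#include <stdint.h>

#define NBT_BAD 126	/* entries per table copy */

/* One table copy: 4 + 2 + 2 + 126 * 4 = 512 bytes. */
struct dkbad_sketch {
	int32_t  bt_csn;		/* cartridge serial number */
	uint16_t bt_mbz;		/* unused, should be zero */
	uint16_t bt_flag;
	struct {
		uint16_t bt_cyl;	/* cylinder of the bad sector */
		uint16_t bt_trksec;	/* (track << 8) | sector */
	} bt_bad[NBT_BAD];
};

/* Hypothetical geometry holder; a real driver would use the disklabel. */
struct geom {
	uint32_t ncyl, nheads, nsect;
};

/*
 * Convert a linear block number into a table entry.
 * Returns 0 on success, -1 if the result does not fit the entry's fields.
 */
int
lba_to_entry(const struct geom *g, uint32_t lba,
    uint16_t *cyl, uint16_t *trksec)
{
	assert(g->nheads > 0 && g->nsect > 0);

	uint32_t c = lba / (g->nheads * g->nsect);
	uint32_t h = (lba / g->nsect) % g->nheads;
	uint32_t s = lba % g->nsect;

	if (c > 0xffff || h > 0xff || s > 0xff)
		return -1;
	*cyl = (uint16_t)c;
	*trksec = (uint16_t)((h << 8) | s);
	return 0;
}

/* Inverse conversion: table entry back to a linear block number. */
uint32_t
entry_to_lba(const struct geom *g, uint16_t cyl, uint16_t trksec)
{
	return (uint32_t)cyl * g->nheads * g->nsect
	    + (uint32_t)(trksec >> 8) * g->nsect
	    + (uint32_t)(trksec & 0xff);
}
```

With an assumed geometry of 16 heads and 63 sectors per track, block 1234
(the value used in How-To-Repeat) maps to cylinder 1, track 3, sector 37,
and converts back to 1234.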

From: Takahiro Kambe <taca@sky.yamashina.kyoto.jp>
To: bouyer@antioche.lip6.fr
Cc: gnats-bugs@gnats.netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/10430: Wd driver cannot handle bad144 table properly?
Date: Sat, 24 Jun 2000 07:01:01 +0900

 In message <20000623191933.A4345@antioche.lip6.fr>
 	on Fri, 23 Jun 2000 19:19:33 +0200,
 	Manuel Bouyer <bouyer@antioche.lip6.fr> wrote:
 > This is possible, bad144 was created for disks that don't have automatic
 > bad sectors forwarding. All newer disks have it, which is probably why
 Yes, bad144 isn't powerful since it supports only a very limited number
 of bad sectors.

 But bad144 is still useful when a single bad sector was created by a
 sudden power failure, in my limited experience.

 > bad144 wasn't updated for cyl > 1024 (it may also have issues
 > for disks > 8G :) It should at least be converted to use LBA.
 It should be fixed (I hope), or the limitation should be noted in the
 manual page.

 Cheers.

 --
 Takahiro Kambe <taca@sky.yamashina.kyoto.jp>

From: woods@weird.com (Greg A. Woods)
To: gnats-bugs@gnats.netbsd.org, netbsd-bugs@netbsd.org
Cc:  
Subject: Re: kern/10430: Wd driver cannot handle bad144 table properly?
Date: Sat, 24 Jun 2000 13:59:09 -0400 (EDT)

 [ On Saturday, June 24, 2000 at 07:01:01 (+0900), Takahiro Kambe wrote: ]
 > Subject: Re: kern/10430: Wd driver cannot handle bad144 table properly?
 >
 > But bad144 is still useful when a single bad sector was created by a
 > sudden power failure, in my limited experience.

 I'm not sure I understand how this can be.

 If the sector has been damaged in such a way that it can no longer be
 written to then it should be automatically reallocated by the drive when
 it fails to verify after being written to (perhaps after several
 retries).

 I'm not sure what happens with the 'wd' drives these days but on SCSI
 drives one often has to turn on the AWRE (and ARRE) flags because
 they're usually not on in the factory default configuration (nor do any
 NetBSD SCSI drivers turn them on automatically so far as I know).
 Perhaps there's a similar "feature" on 'wd' drives?

 On the other hand if nothing's really wrong with the physical sector
 then all you have to do is write the correct data back to it.
 Allocating it as a "bad" sector is, it seems, counter-productive.

 -- 
 							Greg A. Woods

 +1 416 218-0098      VE3TCP      <gwoods@acm.org>      <robohack!woods>
 Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>

From: John Hawkinson <jhawk@MIT.EDU>
To: woods@weird.com (Greg A. Woods)
Cc: gnats-bugs@gnats.netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/10430: Wd driver cannot handle bad144 table properly?
Date: Sat, 24 Jun 2000 14:43:17 -0400 (EDT)

 It is unreasonable to expect automatic bad block mapping
 of drives to work reliably in all cases:

    a)	  bad-block maps can fill if disks are very bad. It is
           still desirable to be able to use NetBSD on those disks,
 	  and bad-block mapping should be possible at a higher layer.

    b)	  bad-block mapping by the hardware is transparent and there is
 	  little visibility into it. When it fails (i.e. one continually
 	  gets read errors on a given block), it is hard to know if the
 	  bad-block mapping is broken, or if a) is in force, or some
 	  unknown. Again, it is desirable to be able to recover from
 	  this case.

    c)	  "Trust no one"; "Be liberal in what you accept and conservative
 	  in what you send."

 bad144 should work in all cases, for these reasons.

 --jhawk
   (who has had to use multiple nested ccds to make "holes" in his IDE disks
   to avoid bad blocks because 1) bad144 didn't work 2) the drive did not
   auto-remap, and is therefore bitter.)

From: woods@weird.com (Greg A. Woods)
To: gnats-bugs@gnats.netbsd.org, netbsd-bugs@netbsd.org
Cc:  
Subject: Re: kern/10430: Wd driver cannot handle bad144 table properly?
Date: Sat, 24 Jun 2000 16:30:15 -0400 (EDT)

 [ On Saturday, June 24, 2000 at 14:43:17 (-0400), John Hawkinson wrote: ]
 > Subject: Re: kern/10430: Wd driver cannot handle bad144 table properly?
 >
 > It is unreasonable to expect automatic bad block mapping
 > of drives to work reliably in all cases:
 > 
 >    a)	  bad-block maps can fill if disks are very bad. It is
 >           still desirable to be able to use NetBSD on those disks,
 > 	  and bad-block mapping should be possible at a higher layer.

 Yeah, that's certainly possible -- time to toss the disk in the garbage
 compactor at that point though....

 >    b)	  bad-block mapping by the hardware is transparent and there is
 > 	  little visibility into it. When it fails (i.e. one continually
 > 	  gets read errors on a given block), it is hard to know if the
 > 	  bad-block mapping is broken, or if a) is in force, or some
 > 	  unknown. Again, it is desirable to be able to recover from
 > 	  this case.

 Yes, possible too, but again it's an indication of broken hardware (or
 at least buggy firmware on the hardware, which amounts to the same
 thing) that desperately needs to be replaced.

 This is pretty simple, but very low level, data integrity we're talking
 about here -- you might as well use water-based inks on papyrus in a
 monsoon if you don't replace known-to-be-broken hardware....

 >    c)	  "Trust no one"; "Be liberal in what you accept and conservative
 > 	  in what you send."
 > 
 > bad144 should work in all cases, for these reasons.

 I'm still not convinced.

 (It should still work for those devices which cannot by design remap...)

 > --jhawk
 >   (who has had to use multiple nested ccds to make "holes" in his IDE disks
 >   to avoid bad blocks because 1) bad144 didn't work 2) the drive did not
 >   auto-remap, and is therefore bitter.)

 Egads!  What masterful hackery!  ;-)

 -- 
 							Greg A. Woods

 +1 416 218-0098      VE3TCP      <gwoods@acm.org>      <robohack!woods>
 Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>

From: Manuel Bouyer <bouyer@antioche.lip6.fr>
To: John Hawkinson <jhawk@MIT.EDU>
Cc: "Greg A. Woods" <woods@weird.com>, gnats-bugs@gnats.netbsd.org,
   netbsd-bugs@netbsd.org
Subject: Re: kern/10430: Wd driver cannot handle bad144 table properly?
Date: Sun, 25 Jun 2000 14:39:37 +0200

 On Sat, Jun 24, 2000 at 02:43:17PM -0400, John Hawkinson wrote:
 > It is unreasonable to expect automatic bad block mapping
 > of drives to work reliably in all cases:
 > 
 >    a)	  bad-block maps can fill if disks are very bad. It is
 >           still desirable to be able to use NetBSD on those disks,
 > 	  and bad-block mapping should be possible at a higher layer.

 I consider that when the bad block map is full, the drive is dead (if it has
 that many bad blocks, there's no reason for it to stop developing new ones).

 > 
 >    b)	  bad-block mapping by the hardware is transparent and there is
 > 	  little visibility into it. When it fails (i.e. one continually

 This is what SMART is for on IDE drives; I'm not sure about SCSI.

 > 	  gets read errors on a given block), it is hard to know if the
 > 	  bad-block mapping is broken, or if a) is in force, or some
 > 	  unknown. Again, it is desirable to be able to recover from
 > 	  this case.
 > 
 >    c)	  "Trust no one"; "Be liberal in what you accept and conservative
 > 	  in what you send."
 > 
 > bad144 should work in all cases, for these reasons.

 Then should we port bad144 to SCSI and other supported disk types?
 Also note that there's a severe performance penalty with bad144, as it has
 to work in single-sector mode.

 --
 Manuel Bouyer <bouyer@antioche.eu.org>
 --

From: Takahiro Kambe <taca@sky.yamashina.kyoto.jp>
To: bouyer@antioche.lip6.fr
Cc: jhawk@MIT.EDU, woods@weird.com, gnats-bugs@gnats.netbsd.org,
   netbsd-bugs@netbsd.org
Subject: Re: kern/10430: Wd driver cannot handle bad144 table properly?
Date: Sun, 25 Jun 2000 22:13:58 +0900

 In message <20000625143937.A513@antioche.eu.org>
 	on Sun, 25 Jun 2000 14:39:37 +0200,
 	Manuel Bouyer <bouyer@antioche.lip6.fr> wrote:
 > > bad144 should work in all cases, for these reasons.
 > 
 > Then should we port bad144 to SCSI and other supported disk types?
 I have an adapter which attaches an IDE hard disk to a SCSI interface.
 It conforms to SCSI-2, but I don't think SCSI's reassign block command
 works with it.

 So, bad144 might be useful but I also think we might create another
 type of software bad block mapping scheme.

 > Also note that there's a severe performance penalty with bad144, as it has
 > to work in single-sector mode.
 Does it need to work in single-sector mode all the time?

 --
 Takahiro Kambe <taca@sky.yamashina.kyoto.jp>

From: woods@weird.com (Greg A. Woods)
To: gnats-bugs@gnats.netbsd.org, netbsd-bugs@netbsd.org
Cc:  
Subject: Re: kern/10430: Wd driver cannot handle bad144 table properly?
Date: Sun, 25 Jun 2000 12:01:33 -0400 (EDT)

 [ On Sunday, June 25, 2000 at 22:13:58 (+0900), Takahiro Kambe wrote: ]
 > Subject: Re: kern/10430: Wd driver cannot handle bad144 table properly?
 >
 > In message <20000625143937.A513@antioche.eu.org>
 > 	on Sun, 25 Jun 2000 14:39:37 +0200,
 > 	Manuel Bouyer <bouyer@antioche.lip6.fr> wrote:
 > > 
 > > Then should we port bad144 to SCSI and other supported disk types?
 >
 > I have an adapter which attaches an IDE hard disk to a SCSI interface.
 > It conforms to SCSI-2, but I don't think SCSI's reassign block command
 > works with it.
 > 
 > So, bad144 might be useful but I also think we might create another
 > type of software bad block mapping scheme.

 Indeed I have a couple of ESDI-SCSI interfaces (both by Emulex, IIRC,
 one 2-LUN unit from a Sun-3, and one 4-LUN unit from a 3B2).

 ESDI drives do need manual re-mapping of bad sectors, and IIRC bad144 is
 useful to work around a bad sector or two so that you can bridge over to
 a time when you can afford to re-format the disk with a new bad-sector
 map.

 I'm not sure what this means though....

 I should also note that I've had some luck with some types of bad
 sectors that I can simply gather up into a file I usually call /BAD on
 the filesystem.  So long as you don't do full low-level dumps of the
 disk and you can keep fsck from trying to read them, this is a workable
 way of hiding such things.

 A generic OS-level bad-block handling tool should not be called bad144
 though -- IIRC that name relates directly back to the interface in
 ST-506 drives and isn't even correct for ESDI.

 I really liked the generic "hdelog" device driver and "hdelogger",
 "hdeadd", and "hdefix" programs from AT&T UNIX.  Hard disk errors during
 normal multi-user mode were automatically caught and logged and often
 automatically remapped too.

 -- 
 							Greg A. Woods

 +1 416 218-0098      VE3TCP      <gwoods@acm.org>      <robohack!woods>
 Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>

From: "Charles M. Hannum" <root@ihack.net>
To: woods@weird.com
Cc: gnats-bugs@gnats.netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/10430: Wd driver cannot handle bad144 table properly?
Date: Sun, 25 Jun 2000 12:00:17 -0400 (EDT)

 > A generic OS-level bad-block handling tool should not be called bad144
 > though -- IIRC that name relates directly back to the interface in
 > ST-506 drives and isn't even correct for ESDI.

 Are you just making this stuff up??

 From the man page:

 `The format of the information is specified by DEC standard 144, as
 follows.'

 It has absolutely NOTHING to do with ST-506 or ESDI.


From: Takahiro Kambe <taca@sky.yamashina.kyoto.jp>
To: gnats-bugs@gnats.netbsd.org, netbsd-bugs@netbsd.org
Cc:  
Subject: Re: kern/10430: Wd driver cannot handle bad144 table properly?
Date: Mon, 26 Jun 2000 01:10:18 +0900

 In message <20000625160133.47BF5E2@proven.weird.com>
 	on Sun, 25 Jun 2000 12:01:33 -0400 (EDT),
 	woods@weird.com (Greg A. Woods) wrote:
 > I should also note that I've had some luck with some types of bad
 > sectors that I can simply gather up into a file I usually call /BAD on
 > the filesystem.  So long as you don't do full low-level dumps of the
 > disk and you can keep fsck from trying to read them, this is a workable
 > way of hiding such things.
 It was lucky that those bad sectors weren't in the swap area.  :-)

 --
 Takahiro Kambe <taca@sky.yamashina.kyoto.jp>

From: Manuel Bouyer <bouyer@antioche.lip6.fr>
To: Takahiro Kambe <taca@sky.yamashina.kyoto.jp>
Cc: jhawk@MIT.EDU, woods@weird.com, gnats-bugs@gnats.netbsd.org,
   netbsd-bugs@netbsd.org
Subject: Re: kern/10430: Wd driver cannot handle bad144 table properly?
Date: Sun, 25 Jun 2000 20:53:28 +0200

 On Sun, Jun 25, 2000 at 10:13:58PM +0900, Takahiro Kambe wrote:
 > I have an adapter which attaches an IDE hard disk to a SCSI interface.
 > It conforms to SCSI-2, but I don't think SCSI's reassign block command
 > works with it.
 > 
 > So, bad144 might be useful but I also think we might create another
 > type of software bad block mapping scheme.

 Then I think it should not be done in disk drivers, but at an upper level.

 > 
 > > Also note that there's a severe performance penalty with bad144, as it has
 > > to work in single-sector mode.
 > Does it need to work in single-sector mode all the time?

 No, only when in the area of the bad sector.

 --
 Manuel Bouyer <bouyer@antioche.eu.org>
 --
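
[Annotation, not part of the original message: a minimal sketch of the
"single-sector only near a bad sector" behaviour described above, under
stated assumptions. It assumes the driver keeps a sorted list of bad block
numbers terminated by -1, and that replacement sectors are allocated
backwards from the sector just before the last track, as bad144(8)
describes. The names `xfer_plan` and `plan_transfer` are hypothetical; this
is not the wd driver's code.]

```c
#include <assert.h>
#include <stdint.h>

enum xfer_mode { XFER_MULTI, XFER_SINGLE };

struct xfer_plan {
	int64_t        blkno;	/* first block to actually access */
	uint32_t       nblks;	/* blocks to move in this step */
	enum xfer_mode mode;
};

/*
 * Decide how to issue a transfer of nblks blocks starting at blkno.
 * badsect:    ascending bad block numbers, terminated by -1 (assumed).
 * secperunit: total sectors on the disk (assumed geometry input).
 * nsectors:   sectors per track (assumed geometry input).
 */
struct xfer_plan
plan_transfer(const int64_t *badsect, int64_t blkno, uint32_t nblks,
    int64_t secperunit, int64_t nsectors)
{
	struct xfer_plan p = { blkno, nblks, XFER_MULTI };

	assert(nblks > 0);

	for (int i = 0; badsect[i] != -1; i++) {
		int64_t diff = badsect[i] - blkno;

		if (diff < 0)
			continue;	/* bad sector lies before this transfer */
		if (diff == 0) {
			/* First block is bad: redirect to replacement i. */
			p.blkno = secperunit - nsectors - i - 1;
		}
		if (diff < (int64_t)nblks) {
			/* A bad sector is inside the range: go one at a time. */
			p.nblks = 1;
			p.mode = XFER_SINGLE;
		}
		break;	/* list is sorted, later entries are farther away */
	}
	return p;
}
```

Transfers that do not touch any listed sector keep their full length, so
only requests overlapping one of the at most 126 table entries fall back to
single-sector mode.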

From: Takahiro Kambe <taca@sky.yamashina.kyoto.jp>
To: bouyer@antioche.lip6.fr
Cc: jhawk@MIT.EDU, woods@weird.com, gnats-bugs@gnats.netbsd.org,
   netbsd-bugs@netbsd.org
Subject: Re: kern/10430: Wd driver cannot handle bad144 table properly?
Date: Mon, 26 Jun 2000 08:34:09 +0900

 In message <20000625205328.A435@antioche.eu.org>
 	on Sun, 25 Jun 2000 20:53:28 +0200,
 	Manuel Bouyer <bouyer@antioche.lip6.fr> wrote:
 > > So, bad144 might be useful but I also think we might create another
 > > type of software bad block mapping scheme.
 > 
 > Then I think it should not be done in disk drivers, but at an upper level.
 I agree.  But in the short term, the current bad144 should be fixed if
 that isn't too difficult.

 > > > Also note that there's a severe performance penalty with bad144, as it has
 > > > to work in single-sector mode.
 > > Does it need to work in single-sector mode all the time?
 > 
 > No, only when in the area of the bad sector.
 Then the performance penalty isn't so severe, since it affects at most
 126 sectors on a drive.

 --
 Takahiro Kambe <taca@sky.yamashina.kyoto.jp>
>Unformatted:
