NetBSD Problem Report #39686

From www@NetBSD.org  Fri Oct  3 16:38:13 2008
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 5B79763BA98
	for <gnats-bugs@gnats.netbsd.org>; Fri,  3 Oct 2008 16:38:13 +0000 (UTC)
Message-Id: <20081003163813.2CD4C63B884@narn.NetBSD.org>
Date: Fri,  3 Oct 2008 16:38:13 +0000 (UTC)
From: pettai@nordu.net
Reply-To: pettai@nordu.net
To: gnats-bugs@NetBSD.org
Subject: NetBSD 4.x has I/O problems on HP Compaq DL38[0|5] w/ Smart Array 6i controller
X-Send-Pr-Version: www-1.0

>Number:         39686
>Category:       kern
>Synopsis:       NetBSD 4.x has I/O problems on HP/Compaq ProLiant DL380, DL385 w/ Smart Array 6i (ciss) controller
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    mhitch
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Oct 03 16:40:00 +0000 2008
>Closed-Date:    Mon Nov 22 04:42:45 +0000 2010
>Last-Modified:  Mon Nov 22 04:42:45 +0000 2010
>Originator:     Fredrik Pettai
>Release:        NetBSD 4.99.72, both i386 and amd64
>Organization:
NORDUnet A/S
>Environment:
NetBSD  4.99.72 NetBSD 4.99.72 (GENERIC) #0: Tue Sep 30 19:32:05 PDT 2008  builds@wb28:/home/builds/ab/HEAD/amd64/200809300002Z-obj/home/builds/ab/HEAD/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
This problem has been found on both HP Compaq DL 380 and DL 385 with the Smart Array 6i controller. We first tried NetBSD 4.0 RELEASE i386/amd64, which gave the same symptoms.

Boot up the installation of NetBSD 4.99.72 (with "no ACPI" enabled or disabled), and during the installation (with progress) of the NetBSD base system the installer will show "- stalled -" instead of "xyz MiB/s". It usually happens after 79%+ of progress while installing (extracting via tar) the comp.tgz and xcomp.tgz packages.



>How-To-Repeat:
The general I/O performance against the disk(controller) is extremely slow. Extracting pkgsrc.tgz takes about an hour...
>Fix:

>Release-Note:

>Audit-Trail:

From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/39686: NetBSD 4.x has I/O problems on HP Compaq DL38[0|5]
 w/ Smart Array 6i controller
Date: Thu, 15 Apr 2010 15:11:30 -0600 (MDT)

 On Fri, 3 Oct 2008, pettai@nordu.net wrote:

 > This problem has been found on both HP Compaq DL 380 and DL 385 with the Smart Array 6i controller. We first tried NetBSD 4.0 RELEASE i386/amd64, which gave the same symptoms.
 >
 > Boot up the installation of NetBSD 4.99.72 (with "no ACPI" enabled or disabled), and during the installation (with progress) of the NetBSD base system the installer will show "- stalled -" instead of "xyz MiB/s". It usually happens after 79%+ of progress while installing (extracting via tar) the comp.tgz and xcomp.tgz packages.
 >
 >
 >
 >> How-To-Repeat:
 > The general I/O performance against the disk(controller) is extremely slow. Extracting pkgsrc.tgz takes about an hour...

    This will occur on any of the Compaq/HP servers with Smart Array 
 controllers that do not have a battery-backed cache with write caching 
 enabled.  Without battery backup, the controller will do no write caching 
 at all, and I presume turns off any write caching on the disk drives. 
 This causes any writes to the logical drives to be quite slow, since the 
 controller will not indicate the write has completed until all the data 
 has been written to all the associated disk drives.  I noticed this when I 
 was working on getting ciss(4) running on a DL360 G4 with a 5i or 6i 
 controller.  The DL360 that I requested for a later project had the 
 battery-backed cache option, and write performance was much better.

    The netbsd-users thread that led to this PR, 
 http://mail-index.netbsd.org/netbsd-users/2008/10/01/msg002083.html , 
 mentions that FreeBSD works "fine".  I was never able to find anything in 
 the FreeBSD driver, nor the Linux driver, nor the OpenBSD driver that 
 could have helped in this case.  I think I did do a simple test on Linux, 
 and found that if I wrote a very large amount of data using dd, it would 
 eventually get quite a bit slower.  My guess at the time was that Linux 
 was buffering even the dd output in memory making it appear to be much 
 faster than the disk writes actually were.

    With the mention of this problem recently, I started digging into it 
 again, using a ML370 G3.  This machine didn't have the integrated Smart 
 Array, but did have a Smart Array 642 without a battery-backed cache (I 
 had also added a Smart Array 5300, which does have battery-backed cache).

    I did verify that extracting pkgsrc.tar.gz onto a RAID1 volume did take 
 just over an hour, with the disks on the SA 642.  With the SA 5300, the 
 same extract took less than 3 minutes.  Then I disabled array acceleration 
 on that RAID1 volume, and was back to the 1 hour time to extract 
 pkgsrc.tar.gz.

    I had a FreeBSD livecd (FreeSBIE, based on FreeBSD 6.2), and found that 
 the pkgsrc.tar.gz extract took about 32 minutes.  While much better than 
 the 1 hour that NetBSD took, I wouldn't consider it good performance. 
 Wondering what the difference might be, I took a look at the number of 
 disk transfers for both NetBSD and FreeBSD, and found that NetBSD did 
 almost 2 times the number of transfers - which accounts for the difference 
 in speed.

    The next thing I tried was to use a log mount (WAPBL) on NetBSD [I'm 
 running with NetBSD-5.0_STABLE by the way].  Wow, what a difference!  The 
 pkgsrc.tar.gz extract time was 17 minutes.

    I'll be running some more tests with FreeBSD with the array acceleration 
 enabled to compare with the non-accelerated case, and will then probably 
 try the same variations with a Linux live CD I have.

    So, for the best performance with ciss(4), you really need to have a 
 battery-backed cache option.  Failing that, running NetBSD 5.0 (or later) 
 and using a log mounted file system should help (although the 
 pkgsrc.tar.gz extract is probably an extreme case where WAPBL helps 
 significantly).

 --
 Michael L. Hitch			mhitch@montana.edu
 Computer Consultant
 Information Technology Center
 Montana State University	Bozeman, MT	USA

From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc: pettai@nordu.net
Subject: Re: kern/39686: NetBSD 4.x has I/O problems on HP Compaq DL38[0|5]
 w/ Smart Array 6i controller
Date: Sun, 8 Aug 2010 20:51:06 -0600 (MDT)

    I've recently acquired a DL360 G4p with a 6i controller and no 
 battery-backed cache, and started looking into this again.  A tech-kern 
 thread started by Anders Magnusson,
 http://mail-index.NetBSD.org/tech-kern/2008/11/30/msg003704.html , which I 
 had forgotten about, had some information on a problem with ciss(4).  I 
 have been investigating and verified that ciss(4) was only getting one 
 command at a time.  Another test with a Linux Live CD did show me that 
 when Linux was doing a dd to the disk, it was getting around 150 commands 
 queued to the ciss adapter, and resulted in a respectable write transfer 
 rate (how that is actually working with no write cache is a mystery to 
 me).

    I was able to determine that the scsipi layer was only sending one 
 command at a time to the adapter.  It was doing this because it will only 
 send multiple commands to the adapter if all the commands use tagged 
 queueing.  Commenting out the check for non-tagged commands resulted in a 
 much better write performance under certain conditions - specifically by 
 using the log option on the file system.  [That will keep a lot of the 
 disk pages in the buffer cache, and writes them out asynchronously, 
 allowing multiple write commands to the disk.]
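
 The dispatch rule described above can be modelled roughly as follows. 
 This is only a sketch of the behaviour as described in this message, not 
 the scsipi code itself; the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the midlayer rule: a device may always have one
 * command outstanding, but a further command is sent only when every
 * command involved uses tagged queueing and the adapter has room.
 */
bool may_dispatch(int in_flight, int openings, bool all_tagged)
{
    assert(in_flight >= 0 && openings > 0);

    if (in_flight >= openings)
        return false;       /* adapter has no free slot */
    if (in_flight == 0)
        return true;        /* first command always goes out */
    return all_tagged;      /* more than one needs tagged queueing */
}
```

 With untagged commands this model never allows a second command to be 
 outstanding, which matches the one-command-at-a-time behaviour seen here.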

    I then determined that the reason for the untagged commands was that 
 ciss(4) was not telling the scsipi layer to use tagged queueing.  There's 
 a problem with doing that though - the 
 capabilities for the drive(s) don't include tagged queuing because the 
 implementation of the Inquiry command by Compaq/HP in the ciss adapter is, 
 uh, lacking.  It sets the CmdQue bit that indicates it supports tagged 
 queuing, but doesn't set the version field to indicate that the additional 
 flags in the inquiry data are valid, so NetBSD happily ignores the bit.  I'm 
 now working on getting ciss(4) to force tagged queueing, which should help 
 this performance problem.
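
 A minimal sketch of the inquiry-data check described above, and of the 
 kind of workaround being considered.  The struct and macro names are 
 hypothetical, not the scsipi definitions; the bit positions follow the 
 SCSI-2 standard INQUIRY layout (ANSI version in the low three bits of 
 byte 2, CmdQue in bit 1 of byte 7).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the first 8 bytes of standard INQUIRY data. */
struct inq_data {
    uint8_t device;
    uint8_t dev_qual2;
    uint8_t version;            /* low 3 bits: ANSI version */
    uint8_t response_format;
    uint8_t additional_length;
    uint8_t flags1;
    uint8_t flags2;
    uint8_t flags3;             /* bit 1: CmdQue */
};

#define INQ_ANSI_MASK 0x07      /* illustrative names, not kernel macros */
#define INQ_CMDQUE    0x02

/* Model of the midlayer: CmdQue is trusted only for version >= 2. */
bool inq_allows_tagged_queueing(const struct inq_data *inq)
{
    assert(inq != NULL);
    if ((inq->version & INQ_ANSI_MASK) < 2)
        return false;           /* flags3 treated as not valid */
    return (inq->flags3 & INQ_CMDQUE) != 0;
}

/* Model of the workaround: rewrite the ANSI version field to 2. */
void inq_force_scsi2(struct inq_data *inq)
{
    assert(inq != NULL);
    inq->version = (uint8_t)((inq->version & ~INQ_ANSI_MASK) | 2);
}
```

 In this model, inquiry data with CmdQue set and a version of 0 is rejected 
 until inq_force_scsi2() has been applied to it.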

 --
 Michael L. Hitch			mhitch@montana.edu
 Computer Consultant
 Information Technology Center
 Montana State University	Bozeman, MT	USA

Responsible-Changed-From-To: kern-bug-people->mhitch
Responsible-Changed-By: mhitch@NetBSD.org
Responsible-Changed-When: Mon, 09 Aug 2010 03:36:40 +0000
Responsible-Changed-Why:
I took it.


State-Changed-From-To: open->analyzed
State-Changed-By: mhitch@NetBSD.org
State-Changed-When: Mon, 09 Aug 2010 03:36:40 +0000
State-Changed-Why:
I have determined what the problem with the driver is.


From: Fredrik Pettai <pettai@nordu.net>
To: gnats-bugs@NetBSD.org
Cc: mhitch@NetBSD.org, kern-bug-people@NetBSD.org, netbsd-bugs@NetBSD.org,
        gnats-admin@NetBSD.org
Subject: Re: kern/39686 (NetBSD 4.x has I/O problems on HP/Compaq ProLiant DL380, DL385 w/ Smart Array 6i (ciss) controller)
Date: Mon, 6 Sep 2010 23:50:31 +0200

 I've tested mhitch@'s patch, and it gives the ciss controller much better
 (and reasonable) performance.
 mhitch@ please commit your patch (and request a pullup to netbsd-5).

From: "Michael L. Hitch" <mhitch@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/39686 (NetBSD 4.x has I/O problems on HP/Compaq ProLiant
 DL380, DL385 w/ Smart Array 6i (ciss) controller)
Date: Mon, 6 Sep 2010 15:54:03 -0600 (MDT)

 On Mon, 6 Sep 2010, Fredrik Pettai wrote:

 > I've tested mhitch@'s patch, and it gives the ciss controller much better
 > (and reasonable) performance.
 > mhitch@ please commit your patch (and request a pullup to netbsd-5).

    I'm actually working on a slightly different patch which also suppresses 
 an error message when the kernel tries to do disk cache flushes.

 --
 Michael L. Hitch			mhitch@NetBSD.org

From: "Michael L. Hitch" <mhitch@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/39686 CVS commit: src/sys/dev/ic
Date: Tue, 7 Sep 2010 18:19:17 +0000

 Module Name:	src
 Committed By:	mhitch
 Date:		Tue Sep  7 18:19:16 UTC 2010

 Modified Files:
 	src/sys/dev/ic: ciss.c

 Log Message:
 Fix a performance problem with the ciss(4) driver.  NetBSD does common
 queueing at the scsipi midlayer, and if the midlayer is not requested to
 enable tagged queueing, the midlayer will only queue one command to the
 adapter driver for each device.  The SmartArray adapter is capable of
 handling multiple commands, and in the rather common case where there is
 no battery backup and no write cache, doing single write commands is very
 slow.  The SmartArray adapter runs much better when several commands can
 be issued to a device.

 This has been observed and discussed in several list threads, notably:
 http://mail-index.NetBSD.org/netbsd-users/2008/10/01/msg002083.html
 http://mail-index.NetBSD.org/tech-kern/2008/11/30/msg003704.html

 This also addresses PR kern/39686.

 To enable tagged queueing, the adapter driver responds to the midlayer
 request to set the transfer mode.  However, the SmartArray does not respond
 to the SCSI INQUIRY command with an ANSII field of 2 or more, so the
 scsipi midlayer will ignore the CmdQue bit in the flags3 field of the
 inquiry data.  This fix will patch the inquiry data to set the ANSII field
 to 2, and respond to the midlayer request to set the transfer mode by
 requesting tagged queueing.

 In addition, the original port of the driver did not set up the adapter
 parameters correctly as mentioned in the second list thread mentioned
 above.  The adapt_openings is the total number of commands that the
 adapter will accept rather than the number of commands divided by the
 number of logical drives.  Also, the adapt_max_periph is the maximum number
 of commands which can be queued per peripheral device, not the number of
 logical drives [which in the case of a single logical drive limited the
 number of commands queued to 1].
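
 The effect of the two parameters can be illustrated with a small sketch. 
 The struct and function names are hypothetical, and the corrected 
 per-drive limit is assumed here to equal the adapter total; the log 
 message only says what the field means, not which value ciss.c assigns.

```c
#include <assert.h>

/* Hypothetical container for the two midlayer limits discussed above. */
struct queue_limits {
    int openings;       /* total commands the adapter will accept */
    int max_periph;     /* max commands queued per logical drive */
};

/* Original setup as described: total divided by drives, and drive count. */
struct queue_limits limits_before(int maxcmd, int ndrives)
{
    assert(maxcmd > 0 && ndrives > 0);
    struct queue_limits l = { maxcmd / ndrives, ndrives };
    return l;
}

/* Corrected setup; assumption: per-drive limit equals the adapter total. */
struct queue_limits limits_after(int maxcmd, int ndrives)
{
    assert(maxcmd > 0 && ndrives > 0);
    (void)ndrives;      /* no longer affects either limit */
    struct queue_limits l = { maxcmd, maxcmd };
    return l;
}
```

 For a single logical drive, limits_before() gives a per-drive limit of 1, 
 which is the case called out in the brackets above.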

 I'm also suppressing an error message for invalid commands if the error
 was due to the SCSI_SYNCHRONIZE_CACHE_10 command, since that command is
 not supported by the SmartArray adapter, but used with wapbl(4) meta-data
 journaling.  Setting the ANSII version to 2 to allow enabling tagged queueing
 also enables the use of the SCSI_SYNCHRONIZE_CACHE_10 command.
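
 The error-message suppression amounts to a filter on the opcode of the 
 failed command.  A sketch, with a hypothetical function and macro name; 
 0x35 is the standard SCSI opcode for SYNCHRONIZE CACHE (10).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OP_SYNCHRONIZE_CACHE_10 0x35    /* SCSI SYNCHRONIZE CACHE (10) */

/*
 * Hypothetical helper: decide whether an "invalid command" error from
 * the adapter is worth reporting.  Cache flushes are expected to fail
 * on this adapter, so they are not reported.
 */
bool should_log_invalid_command(uint8_t opcode)
{
    return opcode != OP_SYNCHRONIZE_CACHE_10;
}
```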


 To generate a diff of this commit:
 cvs rdiff -u -r1.22 -r1.23 src/sys/dev/ic/ciss.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Fredrik Pettai <pettai@nordu.net>
To: gnats-bugs@NetBSD.org
Cc: mhitch@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: PR/39686 CVS commit: src/sys/dev/ic
Date: Tue, 7 Sep 2010 20:41:18 +0200

 Many thanks for fixing this, mhitch!
 Will you request a pullup to netbsd-5 and close this ticket?


From: Fredrik Pettai <pettai@nordu.net>
To: gnats-bugs@NetBSD.org
Cc: mhitch@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: PR/39686 CVS commit: src/sys/dev/ic
Date: Thu, 9 Sep 2010 14:21:20 +0200

 I requested pullup to netbsd-5, as I couldn't see that it had been done.

State-Changed-From-To: analyzed->pending-pullups
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 26 Sep 2010 22:57:16 +0000
State-Changed-Why:
that's pullup-5 #1452.


From: "Jeff Rizzo" <riz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/39686 CVS commit: [netbsd-5] src/sys/dev/ic
Date: Sun, 21 Nov 2010 21:02:50 +0000

 Module Name:	src
 Committed By:	riz
 Date:		Sun Nov 21 21:02:49 UTC 2010

 Modified Files:
 	src/sys/dev/ic [netbsd-5]: ciss.c

 Log Message:
 Pull up following revision(s) (requested by pettai in ticket #1452):
 	sys/dev/ic/ciss.c: revision 1.23
 Fix a performance problem with the ciss(4) driver.  NetBSD does common
 queueing at the scsipi midlayer, and if the midlayer is not requested to
 enable tagged queueing, the midlayer will only queue one command to the
 adapter driver for each device.  The SmartArray adapter is capable of
 handling multiple commands, and in the rather common case where there is
 no battery backup and no write cache, doing single write commands is very
 slow.  The SmartArray adapter runs much better when several commands can
 be issued to a device.
 This has been observed and discussed in several list threads, notably:
 http://mail-index.NetBSD.org/netbsd-users/2008/10/01/msg002083.html
 http://mail-index.NetBSD.org/tech-kern/2008/11/30/msg003704.html
 This also addresses PR kern/39686.
 To enable tagged queueing, the adapter driver responds to the midlayer
 request to set the transfer mode.  However, the SmartArray does not respond
 to the SCSI INQUIRY command with an ANSII field of 2 or more, so the
 scsipi midlayer will ignore the CmdQue bit in the flags3 field of the
 inquiry data.  This fix will patch the inquiry data to set the ANSII field
 to 2, and respond to the midlayer request to set the transfer mode by
 requesting tagged queueing.
 In addition, the original port of the driver did not set up the adapter
 parameters correctly as mentioned in the second list thread mentioned
 above.  The adapt_openings is the total number of commands that the
 adapter will accept rather than the number of commands divided by the
 number of logical drives.  Also, the adapt_max_periph is the maximum number
 of commands which can be queued per peripheral device, not the number of
 logical drives [which in the case of a single logical drive limited the
 number of commands queued to 1].
 I'm also suppressing an error message for invalid commands if the error
 was due to the SCSI_SYNCHRONIZE_CACHE_10 command, since that command is
 not supported by the SmartArray adapter, but used with wapbl(4) meta-data
 journaling.  Setting the ANSII version to 2 to allow enabling tagged queueing
 also enables the use of the SCSI_SYNCHRONIZE_CACHE_10 command.


 To generate a diff of this commit:
 cvs rdiff -u -r1.14.4.1 -r1.14.4.2 src/sys/dev/ic/ciss.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 22 Nov 2010 04:42:45 +0000
State-Changed-Why:
pullup completed.


>Unformatted:
