NetBSD Problem Report #39686
From www@NetBSD.org Fri Oct 3 16:38:13 2008
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id 5B79763BA98
for <gnats-bugs@gnats.netbsd.org>; Fri, 3 Oct 2008 16:38:13 +0000 (UTC)
Message-Id: <20081003163813.2CD4C63B884@narn.NetBSD.org>
Date: Fri, 3 Oct 2008 16:38:13 +0000 (UTC)
From: pettai@nordu.net
Reply-To: pettai@nordu.net
To: gnats-bugs@NetBSD.org
Subject: NetBSD 4.x has I/O problems on HP Compaq DL38[0|5] w/ Smart Array 6i controller
X-Send-Pr-Version: www-1.0
>Number: 39686
>Category: kern
>Synopsis: NetBSD 4.x has I/O problems on HP/Compaq ProLiant DL380, DL385 w/ Smart Array 6i (ciss) controller
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: mhitch
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Oct 03 16:40:00 +0000 2008
>Closed-Date: Mon Nov 22 04:42:45 +0000 2010
>Last-Modified: Mon Nov 22 04:42:45 +0000 2010
>Originator: Fredrik Pettai
>Release: NetBSD 4.99.72, both i386 and amd64
>Organization:
NORDUnet A/S
>Environment:
NetBSD 4.99.72 NetBSD 4.99.72 (GENERIC) #0: Tue Sep 30 19:32:05 PDT 2008 builds@wb28:/home/builds/ab/HEAD/amd64/200809300002Z-obj/home/builds/ab/HEAD/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
This problem has been found on both HP Compaq DL 380 and DL 385 with the Smart Array 6i controller. We first tried NetBSD 4.0 RELEASE i386/amd64, which gave the same symptoms.
Boot the NetBSD 4.99.72 installer (with the "no ACPI" option either enabled or disabled). During installation of the NetBSD base system, the progress display will show "- stalled -" instead of "xyz MiB/s". It usually happens after 79%+ of progress while installing (extracting via tar) the comp.tgz and xcomp.tgz packages.
>How-To-Repeat:
The general I/O performance against the disk(controller) is extremely slow. Extracting pkgsrc.tgz takes about an hour...
>Fix:
>Release-Note:
>Audit-Trail:
From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/39686: NetBSD 4.x has I/O problems on HP Compaq DL38[0|5]
w/ Smart Array 6i controller
Date: Thu, 15 Apr 2010 15:11:30 -0600 (MDT)
On Fri, 3 Oct 2008, pettai@nordu.net wrote:
> This problem has been found on both HP Compaq DL 380 and DL 385 with the Smart Array 6i controller. We first tried NetBSD 4.0 RELEASE i386/amd64, which gave the same symptoms.
>
> Boot the NetBSD 4.99.72 installer (with the "no ACPI" option either enabled or disabled). During installation of the NetBSD base system, the progress display will show "- stalled -" instead of "xyz MiB/s". It usually happens after 79%+ of progress while installing (extracting via tar) the comp.tgz and xcomp.tgz packages.
>
>
>
>> How-To-Repeat:
> The general I/O performance against the disk(controller) is extremely slow. Extracting pkgsrc.tgz takes about an hour...
This will occur on any of the Compaq/HP servers with Smart Array
controllers that do not have a battery-backed cache with write caching
enabled. Without battery backup, the controller will do no write caching
at all, and I presume turns off any write caching on the disk drives.
This causes any writes to the logical drives to be quite slow, since the
controller will not indicate the write has completed until all the data
has been written to all the associated disk drives. I noticed this when I
was working on getting ciss(4) running on a DL360 G4 with a 5i or 6i
controller. The DL360 that I requested for a later project had the
battery-backed cache option, and write performance was much better.
The netbsd-users thread that led to this PR,
http://mail-index.netbsd.org/netbsd-users/2008/10/01/msg002083.html ,
mentions that FreeBSD works "fine". I was never able to find anything in
the FreeBSD driver, nor the Linux driver, nor the OpenBSD driver that
could have helped in this case. I think I did do a simple test on Linux,
and found that if I wrote a very large amount of data using dd, it would
eventually get quite a bit slower. My guess at the time was that Linux
was buffering even the dd output in memory making it appear to be much
faster than the disk writes actually were.
With the mention of this problem recently, I started digging into it
again, using a ML370 G3. This machine didn't have the integrated Smart
Array, but did have a Smart Array 642 without a battery-backed cache (I
had also added a Smart Array 5300, which does have battery-backed cache).
I did verify that extracting pkgsrc.tar.gz onto a RAID1 volume did take
just over an hour, with the disks on the SA 642. With the SA 5300, the
same extract took less than 3 minutes. Then I disabled array acceleration
on that RAID1 volume, and was back to the 1 hour time to extract
pkgsrc.tar.gz.
I had a FreeBSD livecd (FreeSBIE, based on FreeBSD 6.2), and found that
the pkgsrc.tar.gz extract took about 32 minutes. While much better than
the 1 hour that NetBSD took, I wouldn't consider it good performance.
Wondering what the difference might be, I took a look at the number of
disk transfers for both NetBSD and FreeBSD, and found that NetBSD did
almost 2 times the number of transfers - which accounts for the difference
in speed.
The next thing I tried was to use a log mount (WAPBL) on NetBSD [I'm
running with NetBSD-5.0_STABLE by the way]. Wow, what a difference! The
pkgsrc.tar.gz extract time was 17 minutes.
I'll be running some more tests with FreeBSD with the array acceleration
enabled to compare with the non-accelerated case, and will then probably
try the same variations with a Linux live CD I have.
So, for the best performance with ciss(4), you really need to have a
battery-backed cache option. Failing that, running NetBSD 5.0 (or later)
and using a log mounted file system should help (although the
pkgsrc.tar.gz extract is probably an extreme case where WAPBL helps
significantly).
--
Michael L. Hitch mhitch@montana.edu
Computer Consultant
Information Technology Center
Montana State University Bozeman, MT USA
From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc: pettai@nordu.net
Subject: Re: kern/39686: NetBSD 4.x has I/O problems on HP Compaq DL38[0|5]
w/ Smart Array 6i controller
Date: Sun, 8 Aug 2010 20:51:06 -0600 (MDT)
I've recently acquired a DL360 G4p with a 6i controller and no
battery-backed cache, and started looking into this again. A tech-kern
thread started by Anders Magnusson,
http://mail-index.NetBSD.org/tech-kern/2008/11/30/msg003704.html , which I
had forgotten about, had some information on a problem with ciss(4). I
have been investigating and verified that ciss(4) was only getting one
command at a time. Another test with a Linux Live CD did show me that
when Linux was doing a dd to the disk, it was getting around 150 commands
queued to the ciss adapter, which resulted in a respectable write transfer
rate (how that is actually working with no write cache is a mystery to
me).
I was able to determine that the scsipi layer was only sending one
command at a time to the adapter. It was doing this because it will only
send multiple commands to the adapter if all the commands use tagged
queueing. Commenting out the check for non-tagged commands resulted in a
much better write performance under certain conditions - specifically by
using the log option on the file system. [That keeps a lot of the
disk pages in the buffer cache and writes them out asynchronously,
allowing multiple write commands to the disk.]
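
As a rough illustration of why one-command-at-a-time dispatch hurts so
much on a controller without write cache, here is a minimal Python sketch.
It is not scsipi code: the function names, the 8 ms write latency and the
16 openings are invented for illustration, and it assumes the adapter
overlaps all queued commands perfectly, which real hardware will not do.

```python
import math


def effective_queue_depth(openings: int, tagged: bool) -> int:
    """Commands allowed in flight per device in this simplified model.

    Assumption: untagged commands are sent strictly one at a time,
    tagged commands may fill all available openings.
    """
    if openings < 1:
        raise ValueError("openings must be at least 1")
    return openings if tagged else 1


def total_write_time_ms(num_writes: int, latency_ms: int,
                        openings: int, tagged: bool) -> int:
    """Time to complete num_writes when each write takes latency_ms and
    writes that are in flight together complete together (idealized)."""
    depth = effective_queue_depth(openings, tagged)
    rounds = math.ceil(num_writes / depth)
    return rounds * latency_ms


if __name__ == "__main__":
    # Hypothetical numbers, not measurements from this PR.
    print(total_write_time_ms(10000, 8, openings=16, tagged=False))
    print(total_write_time_ms(10000, 8, openings=16, tagged=True))
```

With these made-up numbers the untagged case takes 80000 ms and the tagged
case 5000 ms. The model only shows the direction of the effect, not its
real size.
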
I then determined that the reason for the untagged commands was that
ciss(4) was not telling the scsipi layer to use tagged queueing. There's a
problem with doing that though - the capabilities for the drive(s) don't
include tagged queueing because the implementation of the Inquiry command
by Compaq/HP in the ciss adapter is, uh, lacking. It sets the CmdQue bit
that indicates it supports tagged queueing, but doesn't set the version
field to indicate that the additional flags in the inquiry data are valid,
so NetBSD happily ignores the bit. I'm now working on getting ciss(4) to
force tagged queueing, which should help this performance problem.
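
The Inquiry problem can be sketched in a few lines of Python. This is an
illustration only, not the driver code. It assumes the SCSI-2 standard
INQUIRY layout (ANSI version in the low three bits of byte 2, CmdQue as
bit 1 of byte 7); the constant and function names are hypothetical, and
the sample bytes are made up rather than captured from a Smart Array.
The second function shows one conceivable workaround, patching the
version field, which is also what the later commit message in this PR
describes.

```python
ANSI_VERSION_MASK = 0x07  # assumed: low 3 bits of byte 2
CMDQUE_BIT = 0x02         # assumed: bit 1 of byte 7


def tagged_queueing_visible(inquiry: bytes) -> bool:
    """True if a midlayer that only trusts byte 7 for version >= 2
    would see the CmdQue capability."""
    if len(inquiry) < 8:
        raise ValueError("need at least 8 bytes of inquiry data")
    version = inquiry[2] & ANSI_VERSION_MASK
    return version >= 2 and bool(inquiry[7] & CMDQUE_BIT)


def force_scsi2_version(inquiry: bytes) -> bytes:
    """Return a copy with the ANSI version raised to 2 if it was lower."""
    patched = bytearray(inquiry)
    if (patched[2] & ANSI_VERSION_MASK) < 2:
        patched[2] = (patched[2] & ~ANSI_VERSION_MASK & 0xFF) | 2
    return bytes(patched)


if __name__ == "__main__":
    # Made-up response: version 0, CmdQue set.
    raw = bytes([0x00, 0x00, 0x00, 0x02, 0x1F, 0x00, 0x00, 0x02])
    print(tagged_queueing_visible(raw))
    print(tagged_queueing_visible(force_scsi2_version(raw)))
```
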
--
Michael L. Hitch mhitch@montana.edu
Computer Consultant
Information Technology Center
Montana State University Bozeman, MT USA
Responsible-Changed-From-To: kern-bug-people->mhitch
Responsible-Changed-By: mhitch@NetBSD.org
Responsible-Changed-When: Mon, 09 Aug 2010 03:36:40 +0000
Responsible-Changed-Why:
I took it.
State-Changed-From-To: open->analyzed
State-Changed-By: mhitch@NetBSD.org
State-Changed-When: Mon, 09 Aug 2010 03:36:40 +0000
State-Changed-Why:
I have determined what the problem with the driver is.
From: Fredrik Pettai <pettai@nordu.net>
To: gnats-bugs@NetBSD.org
Cc: mhitch@NetBSD.org, kern-bug-people@NetBSD.org, netbsd-bugs@NetBSD.org,
gnats-admin@NetBSD.org
Subject: Re: kern/39686 (NetBSD 4.x has I/O problems on HP/Compaq ProLiant DL380, DL385 w/ Smart Array 6i (ciss) controller)
Date: Mon, 6 Sep 2010 23:50:31 +0200
I've tested mhitch@'s patch, and it gives the ciss controller much better
(and reasonable) performance.
mhitch@ please commit your patch (and request a pullup to netbsd-5).
From: "Michael L. Hitch" <mhitch@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/39686 (NetBSD 4.x has I/O problems on HP/Compaq ProLiant
DL380, DL385 w/ Smart Array 6i (ciss) controller)
Date: Mon, 6 Sep 2010 15:54:03 -0600 (MDT)
On Mon, 6 Sep 2010, Fredrik Pettai wrote:
> I've tested mhitch@'s patch, and it gives the ciss controller much better
> (and reasonable) performance.
> mhitch@ please commit your patch (and request a pullup to netbsd-5).
I'm actually working on a slightly different patch which also suppresses
an error message when the kernel tries to do disk cache flushes.
--
Michael L. Hitch mhitch@NetBSD.org
From: "Michael L. Hitch" <mhitch@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/39686 CVS commit: src/sys/dev/ic
Date: Tue, 7 Sep 2010 18:19:17 +0000
Module Name: src
Committed By: mhitch
Date: Tue Sep 7 18:19:16 UTC 2010
Modified Files:
src/sys/dev/ic: ciss.c
Log Message:
Fix a performance problem with the ciss(4) driver. NetBSD does common
queueing at the scsipi midlayer, and if the midlayer is not requested to
enable tagged queueing, the midlayer will only queue one command to the
adapter driver for each device. The SmartArray adapter is capable of
handling multiple commands, and in the rather common case where there is
no battery backup and no write cache, doing single write commands is very
slow. The SmartArray adapter runs much better when several commands can
be issued to a device.
This has been observed and discussed in several list threads, notably:
http://mail-index.NetBSD.org/netbsd-users/2008/10/01/msg002083.html
http://mail-index.NetBSD.org/tech-kern/2008/11/30/msg003704.html
This also addresses PR kern/39686.
To enable tagged queueing, the adapter driver responds to the midlayer
request to set the transfer mode. However, the SmartArray does not respond
to the SCSI INQUIRY command with an ANSII field of 2 or more, so the
scsipi midlayer will ignore the CmdQue bit in the flags3 field of the
inquiry data. This fix will patch the inquiry data to set the ANSII field
to 2, and respond to the midlayer request to set the transfer mode by
requesting tagged queueing.
In addition, the original port of the driver did not set up the adapter
parameters correctly, as described in the second list thread mentioned
above. The adapt_openings is the total number of commands that the
adapter will accept rather than the number of commands divided by the
number of logical drives. Also, the adapt_max_periph is the maximum number
of commands which can be queued per peripheral device, not the number of
logical drives [which in the case of a single logical drive limited the
number of commands queued to 1].
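
The effect of the two parameters can be worked through with a small
Python sketch. It is an illustration of the arithmetic described above,
not the contents of ciss.c: the class and function names and the figure
of 128 adapter commands are hypothetical, and setting the corrected
adapt_max_periph equal to the adapter command count is an assumption,
since the log message does not state the new value.

```python
from dataclasses import dataclass


@dataclass
class QueueLimits:
    adapt_openings: int    # total commands the adapter accepts
    adapt_max_periph: int  # max commands queued per peripheral

    def per_device_depth(self) -> int:
        """A single device can never exceed either limit."""
        return min(self.adapt_openings, self.adapt_max_periph)


def original_limits(max_cmds: int, logical_drives: int) -> QueueLimits:
    """Setup as the log message describes the original port."""
    if logical_drives < 1:
        raise ValueError("need at least one logical drive")
    return QueueLimits(adapt_openings=max_cmds // logical_drives,
                       adapt_max_periph=logical_drives)


def corrected_limits(max_cmds: int) -> QueueLimits:
    """Corrected meaning; the per-peripheral value is an assumption."""
    return QueueLimits(adapt_openings=max_cmds,
                       adapt_max_periph=max_cmds)


if __name__ == "__main__":
    # Hypothetical adapter with 128 commands and one logical drive.
    print(original_limits(128, 1).per_device_depth())
    print(corrected_limits(128).per_device_depth())
```

With one logical drive the original setup yields a per-device depth of 1,
matching the bracketed remark above; the corrected setup lets the single
drive use all 128 hypothetical commands.
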
I'm also suppressing an error message for invalid commands if the error
was due to the SCSI_SYNCHRONIZE_CACHE_10 command, since that command is
not supported by the SmartArray adapter, but used with wapbl(4) meta-data
journaling. Setting the ANSII version to 2 to allow enabling tagged queueing
also enables the use of the SCSI_SYNCHRONIZE_CACHE_10 command.
To generate a diff of this commit:
cvs rdiff -u -r1.22 -r1.23 src/sys/dev/ic/ciss.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Fredrik Pettai <pettai@nordu.net>
To: gnats-bugs@NetBSD.org
Cc: mhitch@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: PR/39686 CVS commit: src/sys/dev/ic
Date: Tue, 7 Sep 2010 20:41:18 +0200
Many thanks for fixing this, mhitch!
Will you request a pullup to netbsd-5 and close this ticket?
From: Fredrik Pettai <pettai@nordu.net>
To: gnats-bugs@NetBSD.org
Cc: mhitch@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: PR/39686 CVS commit: src/sys/dev/ic
Date: Thu, 9 Sep 2010 14:21:20 +0200
I requested pullup to netbsd-5, as I couldn't see that it had been done.
State-Changed-From-To: analyzed->pending-pullups
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 26 Sep 2010 22:57:16 +0000
State-Changed-Why:
that's pullup-5 #1452.
From: "Jeff Rizzo" <riz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/39686 CVS commit: [netbsd-5] src/sys/dev/ic
Date: Sun, 21 Nov 2010 21:02:50 +0000
Module Name: src
Committed By: riz
Date: Sun Nov 21 21:02:49 UTC 2010
Modified Files:
src/sys/dev/ic [netbsd-5]: ciss.c
Log Message:
Pull up following revision(s) (requested by pettai in ticket #1452):
sys/dev/ic/ciss.c: revision 1.23
To generate a diff of this commit:
cvs rdiff -u -r1.14.4.1 -r1.14.4.2 src/sys/dev/ic/ciss.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 22 Nov 2010 04:42:45 +0000
State-Changed-Why:
pullup completed.
>Unformatted: