NetBSD Problem Report #55192

From john@chi.zia.io  Tue Apr 21 10:18:21 2020
Return-Path: <john@chi.zia.io>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id D8B491A924D
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 21 Apr 2020 10:18:21 +0000 (UTC)
Message-Id: <202004211018.03LAIFZ8001217@chi.zia.io>
Date: Tue, 21 Apr 2020 10:18:15 GMT
From: john@ziaspace.com
Reply-To: john@ziaspace.com
To: gnats-bugs@NetBSD.org
Subject: mfii0 on NetBSD-9 can't seem to transfer more than 16 megabytes
X-Send-Pr-Version: 3.95

>Number:         55192
>Category:       kern
>Synopsis:       mfii0 on NetBSD-9 can't seem to transfer more than 16 megabytes
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Apr 21 10:20:00 +0000 2020
>Last-Modified:  Fri Aug 14 00:53:53 +0000 2020
>Originator:     John Klos
>Release:        NetBSD 9.0_STABLE
>Organization:

>Environment:


System: NetBSD chi.zia.io 9.0_STABLE NetBSD 9.0_STABLE (CHI) #0: Sat Feb 22 18:22:31 UTC 2020 john@chi.zia.io:/usr/obj-amd64/sys/arch/amd64/compile/CHI amd64
Architecture: x86_64
Machine: amd64
>Description:

Under NetBSD 9, a read from mfii0 hangs once a single transfer reaches
16 MB (four 4 MB blocks); smaller reads complete normally:

daisy# dd if=/dev/rsd0d of=/dev/null bs=4m count=1
1+0 records in
1+0 records out
4194304 bytes transferred in 0.057 secs (73584280 bytes/sec)
daisy# dd if=/dev/rsd0d of=/dev/null bs=4m count=2
2+0 records in
2+0 records out
8388608 bytes transferred in 0.038 secs (220752842 bytes/sec)
daisy# dd if=/dev/rsd0d of=/dev/null bs=4m count=3
3+0 records in
3+0 records out
12582912 bytes transferred in 0.112 secs (112347428 bytes/sec)
daisy# dd if=/dev/rsd0d of=/dev/null bs=4m count=4
(the process hangs, and no more I/O can be done with mfii0)
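
The stepwise test above can be scripted. This is only a sketch: the
function names transfer_bytes and probe are hypothetical, and the block
size is given in bytes (4194304) instead of bs=4m so it also runs with a
non-BSD dd. The count is printed before each read, so if dd blocks, the
last line printed shows the transfer size that triggered the hang.

```shell
#!/bin/sh
# Sketch only: step the dd block count upward, as in the transcript above.
# Function names are hypothetical, not part of any NetBSD tool.

BLOCK_BYTES=$((4 * 1024 * 1024))    # same size as dd's bs=4m

# Print the number of bytes requested for a given block count.
transfer_bytes() {
    echo $(( $1 * BLOCK_BYTES ))
}

# probe SRC MAX: read 1, 2, ... MAX blocks from SRC, one dd run per count.
# The count is echoed before dd starts, so a hang is visible in the output.
probe() {
    src=$1
    max=$2
    count=1
    while [ "$count" -le "$max" ]; do
        echo "count=$count ($(transfer_bytes "$count") bytes)"
        dd if="$src" of=/dev/null bs="$BLOCK_BYTES" count="$count" \
            2>/dev/null || return 1
        count=$((count + 1))
    done
}
```

For example, "probe /dev/zero 4" is a harmless dry run on any machine. On
the affected system the source would be /dev/rsd0d, where the count=4
step (16777216 bytes) is the one expected to hang.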

After the transfer hangs, the system stays responsive, but the kernel logs
repeated timeout errors like these:

[ 366.2511098] mfii0: cmd timeout ccb 0xffff9800829ed788
[ 366.3114282] mfii0: cmd timeout ccb 0xffff9800829eda40
[ 366.3717494] mfii0: cmd timeout ccb 0xffff9800829ece78
[ 366.4320696] mfii0: cmd timeout ccb 0xffff9800829ed6a0
[ 366.4923884] mfii0: cmd timeout ccb 0xffff9800829ed3e8
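
To gauge how many commands are timing out, the log lines can be
summarized. This is a rough sketch: summarize_timeouts is a hypothetical
helper name, and it assumes the message format shown above, with the ccb
address as the last field on the line.

```shell
#!/bin/sh
# Sketch only: count "cmd timeout" lines for one controller on stdin and
# report how many distinct ccb addresses they mention.
# summarize_timeouts is a hypothetical name; the device defaults to mfii0.
summarize_timeouts() {
    dev=${1:-mfii0}
    awk -v dev="$dev" '
        # Match only lines of the form "<dev>: cmd timeout ccb <address>".
        index($0, dev ": cmd timeout ccb ") {
            n++
            ccb[$NF] = 1        # last field is the ccb address
        }
        END {
            u = 0
            for (k in ccb) u++
            printf "%d timeouts, %d distinct ccbs\n", n, u
        }'
}
```

Typical use would be "dmesg | summarize_timeouts". If the number of
distinct ccbs is close to the number of timeouts, many separate commands
are stuck, not one command being retried.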

Control-T output (from an rsync process, not the dd command):

[ 736.7504063] load: 1.00  cmd: rsync 23 [uvn_fp2] 0.57u 2.64s 0% 8216k

Occasionally the kernel also logs:

[ 303.0409734] mfii0: workqueue busy: updates stopped

The hardware is fine: the same test on the same machine completes normally
under a NetBSD 8 kernel:

daisy# dd if=/dev/rsd0d of=/dev/null bs=4m count=10000
10000+0 records in
10000+0 records out
41943040000 bytes transferred in 43.908 secs (955248246 bytes/sec)

This is an amd64 Ryzen system with eight 8 TB HGST drives. The device 
configuration is the same for NetBSD 8 and NetBSD 9:

mfii0 at pci5 dev 0 function 0: "ServeRAID M5016", firmware 23.34.0-0019, 1024MB cache
mfii0: interrupting at ioapic1 pin 8
scsibus0 at mfii0: 64 targets, 8 luns per target
mfii0: physical disk inserted id 10 enclosure 252
mfii0: physical disk inserted id 11 enclosure 252
mfii0: physical disk inserted id 12 enclosure 252
mfii0: physical disk inserted id 13 enclosure 252
mfii0: physical disk inserted id 14 enclosure 252
mfii0: physical disk inserted id 15 enclosure 252
mfii0: physical disk inserted id 16 enclosure 252
mfii0: physical disk inserted id 17 enclosure 252

>How-To-Repeat:

>Fix:


>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: gnats-admin->kern-bug-people
Responsible-Changed-By: dholland@NetBSD.org
Responsible-Changed-When: Fri, 14 Aug 2020 00:53:53 +0000
Responsible-Changed-Why:
fix up broken PR


>Unformatted:
