NetBSD Problem Report #53807
From ryo_on@yk.rim.or.jp Fri Dec 21 16:20:23 2018
Return-Path: <ryo_on@yk.rim.or.jp>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 4B55A7A189
for <gnats-bugs@gnats.NetBSD.org>; Fri, 21 Dec 2018 16:20:23 +0000 (UTC)
Message-Id: <43LsN84RG1z1XRk1d@mail.SiriusCloud.jp>
Date: Sat, 22 Dec 2018 00:02:56 +0900
From: ryoon@NetBSD.org
Reply-To: ryoon@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: ena(4) on NetBSD/amd64 8.99.28 on Amazon Web Service EC2 c5.large instance causes kernel panic
X-Send-Pr-Version: 3.95
>Number: 53807
>Category: port-amd64
>Synopsis: ena(4) on NetBSD/amd64 8.99.28 on Amazon Web Service EC2 c5.large instance causes kernel panic
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-amd64-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Dec 21 16:25:00 +0000 2018
>Closed-Date: Sun Dec 29 08:18:01 +0000 2019
>Last-Modified: Sun Dec 29 08:18:01 +0000 2019
>Originator: Ryo ONODERA
>Release: NetBSD 8.99.28
>Organization:
>Environment:
System: NetBSD brownie 8.99.28 NetBSD 8.99.28 (DTRACE7) #9: Fri Dec 21 18:26:39 JST 2018 ryoon@brownie:/usr/world/8.99/amd64/obj/sys/arch/amd64/compile/DTRACE7 amd64
Architecture: x86_64
Machine: amd64
>Description:
NetBSD/amd64 8.99.28 on an Amazon Web Services (AWS) EC2 c5.large instance
panics in the kernel during ena(4) detection.
I have added the following line to my kernel configuration file:
ena* at pci? dev ? function ? # Amazon Elastic Network Adapter
From a screenshot of the c5.large instance console (manual transcript):
http://www.netbsd.org/~ryoon/ena/ena-panic-20181221.png
ena0 at pci0 dev 5 function 0: vendor 1d0f product ec20 (rev. 0x00)
pci0: Elastic Network Adapter (ENA)ena vDRV_MODULE_VER_MAJOR.DRV_MODULE_VER_MINOR.DRV_MODULE_VER_SUBMINOR
ena0: initialize 2 io queues
ena0: failed to allocate interrupt
ena0: Error with MSI-X enablement
ena0: Failed to enable and set the admin interrupts
uvm_fault(0xfffffffff8157a680, 0x0, 2) -> e
fatal page fault in supervisor mode
trap type 6 code 0x2 rip 0xffffffffff8022680c cs 0x8 rflags 0x10246 cr2 0 ilevel 0x8 rsp 0xffffffff81904638
curlwp 0xxxxxxxxx81457d00 pid 0.1 lowest kstack 0xffffffff819012c0
kernel: page fault trap, code=0
Stopped in pid 0.1 (system) at netbsd:mutex_enter+0xc: lock cmpxchgq %rcx,0(%rdi)
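The trap line above can be read mechanically. On x86, bit 0 of the page-fault error code distinguishes a protection violation from a non-present page, bit 1 distinguishes a write from a read, and bit 2 distinguishes user from supervisor mode. The sketch below decodes the transcribed values (code 0x2, cr2 0), assuming the manual transcript is accurate. The names pf_info, pf_decode and pf_describe are hypothetical helpers written for this note, not NetBSD interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural x86 page-fault error code bits (low three bits only). */
#define PF_PRESENT 0x01u /* set: protection violation; clear: page not present */
#define PF_WRITE   0x02u /* set: write access; clear: read access */
#define PF_USER    0x04u /* set: fault taken in user mode; clear: supervisor */

/* Hypothetical container for the decoded bits; not a NetBSD type. */
struct pf_info {
	bool present;
	bool write;
	bool user;
};

/* Hypothetical helper: split an error code into its three low bits. */
static struct pf_info
pf_decode(uint32_t code)
{
	struct pf_info pi;

	pi.present = (code & PF_PRESENT) != 0;
	pi.write = (code & PF_WRITE) != 0;
	pi.user = (code & PF_USER) != 0;
	return pi;
}

/* Hypothetical helper: format a one-line summary of the fault into buf. */
static int
pf_describe(uint32_t code, uint64_t cr2, char *buf, size_t len)
{
	struct pf_info pi = pf_decode(code);

	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s, %s, %s mode, address 0x%llx",
	    pi.write ? "write" : "read",
	    pi.present ? "protection violation" : "page not present",
	    pi.user ? "user" : "supervisor",
	    (unsigned long long)cr2);
}
```

Calling pf_describe(0x2, 0, buf, sizeof(buf)) describes a supervisor-mode write to a non-present page at address 0. Since the faulting instruction writes through 0(%rdi), this is consistent with mutex_enter() having been called with a NULL mutex pointer, although the transcript alone does not show which caller passed it.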
If you want to create an AMI for your own tests, see:
http://www.netbsd.org/~ryoon/ena/how-to-create-ami-for-aws-c5.txt
>How-To-Repeat:
Run NetBSD/amd64 8.99.28 on AWS EC2 c5.large instance.
>Fix:
I have no idea.
>Release-Note:
>Audit-Trail:
From: Ryo ONODERA <ryo@tetera.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-amd64/53807: ena(4) on NetBSD/amd64 8.99.28 on Amazon Web
Service EC2 c5.large instance causes kernel panic
Date: Wed, 26 Dec 2018 21:46:50 +0900 (JST)
Hi,
bus_space_map() returns non-zero in the code below, from src/sys/arch/x86/pci/msipic.c.
Changing #if 0 to #if 1 has no effect; I get the same result.
From src/sys/arch/x86/pci/msipic.c:
#if 0
	err = pci_mapreg_submap(pa, bar, memtype, BUS_SPACE_MAP_LINEAR,
	    roundup(table_size, PAGE_SIZE), table_offset,
	    &bstag, &bshandle, NULL, &bssize);
#else
	/*
	 * Workaround for PCI prefetchable bit. Some chips (e.g. Intel 82599)
	 * report SERR and MSI-X doesn't work. This problem might not be the
	 * driver's bug but our PCI common part or VMs' bug. Until we find a
	 * real reason, we ignore the prefetchable bit.
	 */
	if (pci_mapreg_info(pa->pa_pc, pa->pa_tag, bar, memtype,
	    &memaddr, NULL, &flags) != 0) {
		DPRINTF(("cannot get a map info.\n"));
		msipic_destruct_common_msi_pic(msix_pic);
		return NULL;
	}
	if ((flags & BUS_SPACE_MAP_PREFETCHABLE) != 0) {
		DPRINTF(( "clear prefetchable bit\n"));
		flags &= ~BUS_SPACE_MAP_PREFETCHABLE;
	}
	bssize = roundup(table_size, PAGE_SIZE);
	err = bus_space_map(pa->pa_memt, memaddr + table_offset, bssize, flags,
	    &bshandle);
	bstag = pa->pa_memt;
#endif
	if (err) {
		DPRINTF(("cannot map msix table.\n"));
		msipic_destruct_common_msi_pic(msix_pic);
		return NULL;
	}
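The arithmetic in the #else branch above can be sketched in isolation. The block below is a user-space illustration only, under these assumptions: a 4096-byte page, a table size computed as the number of entries times the 16-byte MSI-X entry size defined by PCI (the excerpt does not show how table_size is derived), and a stand-in flag value that is not taken from NetBSD headers. ROUNDUP, MAP_PREFETCHABLE and the msix_map_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE        4096u /* assumption: amd64 base page size */
#define MSIX_ENTRY_SIZE  16u   /* each MSI-X table entry is 16 bytes */
#define MAP_PREFETCHABLE 0x04  /* illustrative stand-in flag value */

/* Round x up to the next multiple of y (same idea as roundup()). */
#define ROUNDUP(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/* Size of the region to map: the table, padded to whole pages. */
static uint64_t
msix_map_size(uint32_t nentries)
{
	/* MSI-X allows between 1 and 2048 table entries. */
	assert(nentries >= 1 && nentries <= 2048);
	return ROUNDUP((uint64_t)nentries * MSIX_ENTRY_SIZE,
	    (uint64_t)PAGE_SIZE);
}

/* Address to map: BAR base plus the table offset within the BAR. */
static uint64_t
msix_map_addr(uint64_t memaddr, uint64_t table_offset)
{
	return memaddr + table_offset;
}

/* Mapping flags with the prefetchable bit cleared, as in the workaround. */
static int
msix_map_flags(int flags)
{
	return flags & ~MAP_PREFETCHABLE;
}
```

With these assumptions, a 3-entry table still maps one full page, and a table at offset 0x2000 in a BAR based at 0xfebf0000 (both made-up values) would be mapped at 0xfebf2000. If that address range cannot be mapped, bus_space_map() returns non-zero and the function bails out with "cannot map msix table", which matches the behaviour reported here.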
--
Ryo ONODERA // ryo@tetera.org
PGP fingerprint = 82A2 DC91 76E0 A10A 8ABB FD1B F404 27FA C7D1 15F3
From: Ryo ONODERA <ryoon@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-amd64/53807: ena(4) on NetBSD/amd64 8.99.28 on Amazon Web Service EC2 c5.large instance causes kernel panic
Date: Mon, 29 Jul 2019 11:57:27 +0900
Hi,
msaitoh@ provided me with a patch.
The following is his patch with my modification:
http://www.netbsd.org/~ryoon/ena-submap-20190127-0_20190713.dif
With this patch, ena(4) works fine on AWS EC2 c5 instances (amd64)
and on A1 instances (evbarm64; these keep working as before).
The dmesgs are here:
amd64:
https://dmesgd.nycbug.org/index.cgi?do=view&id=5054
evbarm64:
https://dmesgd.nycbug.org/index.cgi?do=view&id=5055
Could you enable ena(4) for NetBSD/amd64?
Thank you.
--
Ryo ONODERA // ryo@tetera.org
PGP fingerprint = 82A2 DC91 76E0 A10A 8ABB FD1B F404 27FA C7D1 15F3
From: Ryo ONODERA <ryoon@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-amd64/53807: ena(4) on NetBSD/amd64 8.99.28 on Amazon Web Service EC2 c5.large instance causes kernel panic
Date: Mon, 29 Jul 2019 12:22:48 +0900
Hi,
Also, please include the ena upstream information in doc/3RDPARTY:
Index: 3RDPARTY
===================================================================
RCS file: /cvsroot/src/doc/3RDPARTY,v
retrieving revision 1.1638
diff -u -r1.1638 3RDPARTY
--- 3RDPARTY 25 Jul 2019 08:59:32 -0000 1.1638
+++ 3RDPARTY 29 Jul 2019 03:21:36 -0000
@@ -2174,3 +2174,15 @@
Location: usr.bin/indent
Notes:
Tests are stored in tests/usr.bin/indent.
+
+Package: ena
+Version: 0.8.1
+Current Vers: 2.0.0
+Maintainer: Amazon.com
+Archive Site: https://github.com/amzn/amzn-drivers/tree/master/kernel/fbsd/ena
+Home Page: https://github.com/amzn/amzn-drivers/
+Mailing List: none
+Responsible:
+License: BSD-like (2 and 3-clause)
+Location: sys/external/bsd/ena-com
+Notes:
Thank you.
--
Ryo ONODERA // ryo@tetera.org
PGP fingerprint = 82A2 DC91 76E0 A10A 8ABB FD1B F404 27FA C7D1 15F3
From: SAITOH Masanobu <msaitoh@execsw.org>
To: gnats-bugs@netbsd.org, port-amd64-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, ryoon@NetBSD.org
Cc: msaitoh@execsw.org
Subject: Re: port-amd64/53807: ena(4) on NetBSD/amd64 8.99.28 on Amazon Web
Service EC2 c5.large instance causes kernel panic
Date: Thu, 1 Aug 2019 22:50:09 +0900
On 2019/07/29 13:40, Ryo ONODERA wrote:
> The following reply was made to PR port-amd64/53807; it has been noted by GNATS.
>
> From: Ryo ONODERA <ryoon@NetBSD.org>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: port-amd64/53807: ena(4) on NetBSD/amd64 8.99.28 on Amazon Web Service EC2 c5.large instance causes kernel panic
> Date: Mon, 29 Jul 2019 12:22:48 +0900
>
> Hi,
>
> And please include ena upstream information in doc/3RDPARTY
All done!
--
-----------------------------------------------
SAITOH Masanobu (msaitoh@execsw.org
msaitoh@netbsd.org)
State-Changed-From-To: open->closed
State-Changed-By: ryoon@NetBSD.org
State-Changed-When: Sun, 29 Dec 2019 08:18:01 +0000
State-Changed-Why:
Fixed in -current and netbsd-9. Thank you.
>Unformatted:
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.