NetBSD Problem Report #53807

From ryo_on@yk.rim.or.jp  Fri Dec 21 16:20:23 2018
Return-Path: <ryo_on@yk.rim.or.jp>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 4B55A7A189
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 21 Dec 2018 16:20:23 +0000 (UTC)
Message-Id: <43LsN84RG1z1XRk1d@mail.SiriusCloud.jp>
Date: Sat, 22 Dec 2018 00:02:56 +0900
From: ryoon@NetBSD.org
Reply-To: ryoon@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: ena(4) on NetBSD/amd64 8.99.28 on Amazon Web Service EC2 c5.large instance causes kernel panic
X-Send-Pr-Version: 3.95

>Number:         53807
>Category:       port-amd64
>Synopsis:       ena(4) on NetBSD/amd64 8.99.28 on Amazon Web Service EC2 c5.large instance causes kernel panic
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-amd64-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Dec 21 16:25:00 +0000 2018
>Closed-Date:    Sun Dec 29 08:18:01 +0000 2019
>Last-Modified:  Sun Dec 29 08:18:01 +0000 2019
>Originator:     Ryo ONODERA
>Release:        NetBSD 8.99.28
>Organization:

>Environment:


System: NetBSD brownie 8.99.28 NetBSD 8.99.28 (DTRACE7) #9: Fri Dec 21 18:26:39 JST 2018 ryoon@brownie:/usr/world/8.99/amd64/obj/sys/arch/amd64/compile/DTRACE7 amd64
Architecture: x86_64
Machine: amd64
>Description:
NetBSD/amd64 8.99.28 on an Amazon Web Services (AWS) EC2 c5.large instance
hits a kernel panic while attaching ena(4).

I have added the following line to my kernel configuration file:
ena*    at pci? dev ? function ?        # Amazon Elastic Network Adapter

From a screenshot of the c5.large instance console (manual transcription):
http://www.netbsd.org/~ryoon/ena/ena-panic-20181221.png

ena0 at pci0 dev 5 function 0: vendor 1d0f product ec20 (rev. 0x00)
pci0: Elastic Network Adapter (ENA)ena vDRV_MODULE_VER_MAJOR.DRV_MODULE_VER_MINOR.DRV_MODULE_VER_SUBMINOR
ena0: initialize 2 io queues
ena0: failed to allocate interrupt
ena0: Error with MSI-X enablement
ena0: Failed to enable and set the admin interrupts
uvm_fault(0xfffffffff8157a680, 0x0, 2) -> e
fatal page fault in supervisor mode
trap type 6 code 0x2 rip 0xffffffffff8022680c cs 0x8 rflags 0x10246 cr2 0 ilevel 0x8 rsp 0xffffffff81904638
curlwp 0xxxxxxxxx81457d00 pid 0.1 lowest kstack 0xffffffff819012c0
kernel: page fault trap, code=0
Stopped in pid 0.1 (system) at netbsd:mutex_enter+0xc: lock cmpxchgq %rcx,0(%rdi)
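In the transcript above, cr2 is 0 and the faulting instruction is
"lock cmpxchgq %rcx,0(%rdi)" inside mutex_enter(), which suggests that
mutex_enter() was called with a NULL mutex pointer, presumably on the
error path after the MSI-X setup failed. The following user-space sketch
is only an illustration of the defensive unwind pattern (record which
resources were actually created and release only those); every name in
it (fake_softc, fake_attach, fake_mutex_enter, ...) is hypothetical and
is not taken from the ena(4) sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel mutex; not a NetBSD API. */
struct fake_mutex {
	int owner;
};

static void
fake_mutex_enter(struct fake_mutex *m)
{
	/* The real mutex_enter() faults on a NULL lock, as in this PR. */
	assert(m != NULL);
	m->owner = 1;
}

static void
fake_mutex_exit(struct fake_mutex *m)
{
	m->owner = 0;
}

struct fake_softc {
	bool intr_ok;                  /* MSI-X vectors were allocated */
	struct fake_mutex *admin_lock; /* created only after interrupts succeed */
};

/*
 * Hypothetical attach routine: allocate interrupts, then create the
 * admin lock, then run an admin setup step that may also fail.
 */
static int
fake_attach(struct fake_softc *sc, bool intr_succeeds, bool admin_succeeds)
{
	sc->intr_ok = false;
	sc->admin_lock = NULL;

	if (!intr_succeeds) {
		fprintf(stderr, "failed to allocate interrupt\n");
		goto fail;
	}
	sc->intr_ok = true;

	sc->admin_lock = calloc(1, sizeof(*sc->admin_lock));
	if (sc->admin_lock == NULL)
		goto fail;

	if (!admin_succeeds) {
		fprintf(stderr, "admin setup failed\n");
		goto fail;
	}
	return 0;

fail:
	/* Touch the lock only if it was actually created. */
	if (sc->admin_lock != NULL) {
		fake_mutex_enter(sc->admin_lock);
		fake_mutex_exit(sc->admin_lock);
		free(sc->admin_lock);
		sc->admin_lock = NULL;
	}
	return -1;
}
```

Calling fake_attach(&sc, false, true) mimics the failure in this PR: it
returns -1 without ever dereferencing the never-created admin lock.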

If you want to create an AMI for your own testing, see:
http://www.netbsd.org/~ryoon/ena/how-to-create-ami-for-aws-c5.txt

>How-To-Repeat:
Run NetBSD/amd64 8.99.28 with ena(4) enabled in the kernel configuration
on an AWS EC2 c5.large instance.

>Fix:

I have no idea.

>Release-Note:

>Audit-Trail:
From: Ryo ONODERA <ryo@tetera.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-amd64/53807: ena(4) on NetBSD/amd64 8.99.28 on Amazon Web
 Service EC2 c5.large instance causes kernel panic
Date: Wed, 26 Dec 2018 21:46:50 +0900 (JST)

 Hi,

 bus_space_map() returns non-zero in the following code in
 src/sys/arch/x86/pci/msipic.c. Changing #if 0 to #if 1 (i.e. using
 pci_mapreg_submap() instead) has no effect; I get the same result.

 From src/sys/arch/x86/pci/msipic.c
 #if 0
         err = pci_mapreg_submap(pa, bar, memtype, BUS_SPACE_MAP_LINEAR,
             roundup(table_size, PAGE_SIZE), table_offset,
             &bstag, &bshandle, NULL, &bssize);
 #else
         /*
          * Workaround for PCI prefetchable bit. Some chips (e.g. Intel 82599)
          * report SERR and MSI-X doesn't work. This problem might not be the
          * driver's bug but our PCI common part or VMs' bug. Until we find a
          * real reason, we ignore the prefetchable bit.
          */
         if (pci_mapreg_info(pa->pa_pc, pa->pa_tag, bar, memtype,
                 &memaddr, NULL, &flags) != 0) {
                 DPRINTF(("cannot get a map info.\n"));
                 msipic_destruct_common_msi_pic(msix_pic);
                 return NULL;
         }
         if ((flags & BUS_SPACE_MAP_PREFETCHABLE) != 0) {
                 DPRINTF(( "clear prefetchable bit\n"));
                 flags &= ~BUS_SPACE_MAP_PREFETCHABLE;
         }
         bssize = roundup(table_size, PAGE_SIZE);
         err = bus_space_map(pa->pa_memt, memaddr + table_offset, bssize, flags,
             &bshandle);
         bstag = pa->pa_memt;
 #endif
         if (err) {
                 DPRINTF(("cannot map msix table.\n"));
                 msipic_destruct_common_msi_pic(msix_pic);
                 return NULL;
         }
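 The quoted code maps the MSI-X table at memaddr + table_offset with a
 size of roundup(table_size, PAGE_SIZE). To reason about why
 bus_space_map() fails, it helps to compute that exact range. One
 possible cause, which is an assumption and is not confirmed in this PR,
 is that the range overlaps a mapping the driver already holds for the
 same BAR, which an extent-style range allocator would reject. The sketch
 below only computes the requested range and checks it for overlap; the
 16-byte MSI-X entry size comes from the PCI specification, while the
 4 KiB page size and all names (msix_table_region, regions_overlap, ...)
 are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages, as on amd64 */
#define MSIX_ENTRY_SIZE  16u   /* each MSI-X table entry is 16 bytes */

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t
round_up(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Half-open physical address range [start, start + size). */
struct region {
	uint64_t start;
	uint64_t size;
};

/*
 * Range requested by the quoted msipic.c code:
 * start = memaddr + table_offset, size = roundup(table_size, PAGE_SIZE).
 */
static struct region
msix_table_region(uint64_t memaddr, uint64_t table_offset, unsigned nvectors)
{
	struct region r;

	r.start = memaddr + table_offset;
	r.size = round_up((uint64_t)nvectors * MSIX_ENTRY_SIZE,
	    SKETCH_PAGE_SIZE);
	return r;
}

/* True if the two half-open ranges share at least one byte. */
static bool
regions_overlap(struct region a, struct region b)
{
	return a.start < b.start + b.size && b.start < a.start + a.size;
}
```

 For example, with made-up values memaddr = 0xfebf0000, table_offset =
 0x2000 and 3 vectors, the requested range is one page at 0xfebf2000,
 which overlaps a driver mapping of the whole 0x4000-byte BAR but not a
 mapping limited to the first 0x2000 bytes.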

 --
 Ryo ONODERA // ryo@tetera.org
 PGP fingerprint = 82A2 DC91 76E0 A10A 8ABB  FD1B F404 27FA C7D1 15F3

From: Ryo ONODERA <ryoon@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-amd64/53807: ena(4) on NetBSD/amd64 8.99.28 on Amazon Web Service EC2 c5.large instance causes kernel panic
Date: Mon, 29 Jul 2019 11:57:27 +0900

 Hi,

 msaitoh@ provided me with a patch.
 The following is his patch with my modifications:

 http://www.netbsd.org/~ryoon/ena-submap-20190127-0_20190713.dif

 This works fine on AWS EC2 c5 instances (amd64)
 and A1 instances (evbarm64; works as before).

 The dmesgs are here:
 amd64:
 https://dmesgd.nycbug.org/index.cgi?do=view&id=5054

 evbarm64:
 https://dmesgd.nycbug.org/index.cgi?do=view&id=5055

 Could you enable ena(4) for NetBSD/amd64?

 Thank you.

 -- 
 Ryo ONODERA // ryo@tetera.org
 PGP fingerprint = 82A2 DC91 76E0 A10A 8ABB  FD1B F404 27FA C7D1 15F3

From: Ryo ONODERA <ryoon@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-amd64/53807: ena(4) on NetBSD/amd64 8.99.28 on Amazon Web Service EC2 c5.large instance causes kernel panic
Date: Mon, 29 Jul 2019 12:22:48 +0900

 Hi,

 Please also add the ena upstream information to doc/3RDPARTY:

 Index: 3RDPARTY
 ===================================================================
 RCS file: /cvsroot/src/doc/3RDPARTY,v
 retrieving revision 1.1638
 diff -u -r1.1638 3RDPARTY
 --- 3RDPARTY	25 Jul 2019 08:59:32 -0000	1.1638
 +++ 3RDPARTY	29 Jul 2019 03:21:36 -0000
 @@ -2174,3 +2174,15 @@
  Location:	usr.bin/indent
  Notes:
  Tests are stored in tests/usr.bin/indent.
 +
 +Package:	ena
 +Version:	0.8.1
 +Current Vers:	2.0.0
 +Maintainer:	Amazon.com
 +Archive Site:	https://github.com/amzn/amzn-drivers/tree/master/kernel/fbsd/ena
 +Home Page:	https://github.com/amzn/amzn-drivers/
 +Mailing List:	none
 +Responsible:
 +License:	BSD-like (2 and 3-clause)
 +Location:	sys/external/bsd/ena-com
 +Notes:

 Thank you.

 -- 
 Ryo ONODERA // ryo@tetera.org
 PGP fingerprint = 82A2 DC91 76E0 A10A 8ABB  FD1B F404 27FA C7D1 15F3

From: SAITOH Masanobu <msaitoh@execsw.org>
To: gnats-bugs@netbsd.org, port-amd64-maintainer@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, ryoon@NetBSD.org
Cc: msaitoh@execsw.org
Subject: Re: port-amd64/53807: ena(4) on NetBSD/amd64 8.99.28 on Amazon Web
 Service EC2 c5.large instance causes kernel panic
Date: Thu, 1 Aug 2019 22:50:09 +0900

 On 2019/07/29 13:40, Ryo ONODERA wrote:
 > The following reply was made to PR port-amd64/53807; it has been noted by GNATS.
 > 
 > From: Ryo ONODERA <ryoon@NetBSD.org>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: port-amd64/53807: ena(4) on NetBSD/amd64 8.99.28 on Amazon Web Service EC2 c5.large instance causes kernel panic
 > Date: Mon, 29 Jul 2019 12:22:48 +0900
 > 
 >  Hi,
 >  
 >  And please include ena upstream information in doc/3RDPARTY

 All done!

 -- 
 -----------------------------------------------
                 SAITOH Masanobu (msaitoh@execsw.org
                                  msaitoh@netbsd.org)

State-Changed-From-To: open->closed
State-Changed-By: ryoon@NetBSD.org
State-Changed-When: Sun, 29 Dec 2019 08:18:01 +0000
State-Changed-Why:
Fixed in -current and netbsd-9. Thank you.


>Unformatted:

Copyright © 1994-2017 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.