NetBSD Problem Report #56857

From bouyer@antioche.eu.org  Thu May 26 11:34:35 2022
Return-Path: <bouyer@antioche.eu.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 373D31A921F
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 26 May 2022 11:34:35 +0000 (UTC)
Message-Id: <20220526113430.2039810754@rochebonne.antioche.eu.org>
Date: Thu, 26 May 2022 13:34:30 +0200 (CEST)
From: bouyer@antioche.eu.org
Reply-To: bouyer@antioche.eu.org
To: gnats-bugs@NetBSD.org
Subject: ixg(4) doesn't work in legacy interrupt mode
X-Send-Pr-Version: 3.95

>Number:         56857
>Category:       kern
>Synopsis:       ixg(4) doesn't work in legacy interrupt mode
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    msaitoh
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu May 26 11:35:01 +0000 2022
>Closed-Date:    Mon May 30 07:43:28 +0000 2022
>Last-Modified:  Tue May 31 14:10:01 +0000 2022
>Originator:     Manuel Bouyer
>Release:        NetBSD 9.99.97
>Organization:
>Environment:
System: NetBSD bolero.soc.lip6.fr 9.99.97 NetBSD 9.99.97 (GENERIC) #4: Thu May 26 13:16:41 CEST 2022  bouyer@bip:/dsk/l1/misc/bouyer/tmp/amd64/obj/dsk/l1/misc/bouyer/HEAD/4commit/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
	Forcing a system to use legacy interrupts with this patch:
Index: pci_machdep.c
===================================================================
RCS file: /cvsroot/src/sys/arch/x86/pci/pci_machdep.c,v
retrieving revision 1.91
diff -u -p -u -r1.91 pci_machdep.c
--- pci_machdep.c	24 May 2022 14:00:23 -0000	1.91
+++ pci_machdep.c	26 May 2022 11:18:59 -0000
@@ -565,6 +565,9 @@ pci_attach_hook(device_t parent, device_
 		}
 	}

+	pba->pba_flags &= ~PCI_FLAGS_MSIX_OKAY;
+	pba->pba_flags &= ~PCI_FLAGS_MSI_OKAY;
+
 #endif /* __HAVE_PCI_MSI_MSIX */
 }

	causes the ixg(4) device to fail to receive packets (or to receive them
	with very high latency). Other devices, including nvme(4) and wm(4),
	work fine in legacy mode. This is a show stopper for Xen (see PR
	kern/55667).
	The device works fine with MSI-X. I couldn't test with MSI, as
	disabling only MSI-X causes the kernel to panic (probably unrelated to
	ixg(4)), but experiments with Xen suggest that it fails too.
	Here is how the device shows up (boot -vx):
[   1.1197710] ppb3 at pci4 dev 2 function 0: Intel product 2032 (rev. 0x07)
[   1.1197710] ppb3: PCI Express capability version 2 <Root Port of PCI-E Root Complex> x4 @ 8.0GT/s
[   1.1197710] pci5 at ppb3 bus 24
[   1.1197710] acpi0: MCFG: 024:00:0: Ok (cfg[0x100]=0x14020001 extconf=Y)
[   1.1197710] acpi0: MCFG: 024:00:1: Ok (cfg[0x100]=0x14020001 extconf=Y)
[   1.1197710] acpi0: MCFG: bus 24: valid devices
[   1.1197710] acpi0: MCFG: 024:00:0
[   1.1197710] acpi0: MCFG: 024:00:1
[   1.1197710] acpi0: acpimcfg_map_bus done
[   1.1197710] pci5: i/o space, memory space enabled
[   1.1197710] ixg0 at pci5 dev 0 function 0: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 4.0.1-k
[   1.1197710] ixg0: clearing prefetchable bit
[   1.1197710] ixg0: device X550
[   1.1197710] ixg0: NVM Image Version 2.00, PHY FW Revision 2.0b ID 0x9, NVM Map version 2.52, OEM NVM Image version 0.06, ETrackID 80000d62
[   1.1197710] ixg0: PBA number K15087-004
[   1.1197710] ixg0: failed to allocate MSI-X interrupt
[   1.1197710] allocated pic ioapic2 type level pin 2 level 6 to cpu0 slot 4 idt entry 98
[   1.1197710] ixg0: interrupting at ioapic2 pin 2
[   1.1197710] ixg0: Ethernet address 78:ac:44:86:b0:6c
[   1.1197710] ixg0: PHY: OUI 0x00aa00 model 0x0022, rev. 0
[   1.1197710] ixg0: PCI Express Bus: Speed 8.0GT/s Width x4
[   1.1197710] ixg0: feature cap 0x97a0<TEMP_SENSOR,LEGACY_TX,FDIR,MSI,MSIX,LEGACY_IRQ,RECOVERY_MODE>
[   1.1197710] ixg0: feature ena 0x9020<TEMP_SENSOR,LEGACY_IRQ,RECOVERY_MODE>

After a multiuser boot (including dhcp address request):
bolero# vmstat -i
interrupt              total rate
TLB shootdown            554    3
cpu0 timer             16580   97
ioapic0 pin 3            308    1
ioapic0 pin 16           314    1
ioapic2 pin 2            190    1
ixg0 Link event            2    0
ixg0 q0 IRQs on queue      1    0
ioapic4 pin 2           3854   22
Total                  21803  128

tcpdump shows that packets are sent.

>How-To-Repeat:
	Disable MSI and MSI-X with the above patch, then try to use the ixg device.
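	As a rough sketch of what the test patch achieves (not the kernel's
	actual code): once both capability bits are cleared, a driver that
	prefers MSI-X, then MSI, then INTx ends up on the legacy interrupt.
	The flag values and all sketch_* names below are made up for
	illustration; only the idea of clearing the two bits comes from the
	patch above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; the real flag definitions live in the
 * NetBSD PCI headers and may differ. */
#define SKETCH_FLAGS_MSI_OKAY	0x1u
#define SKETCH_FLAGS_MSIX_OKAY	0x2u

enum sketch_intr_type { SKETCH_INTX, SKETCH_MSI, SKETCH_MSIX };

/* Prefer MSI-X, then MSI, then legacy INTx, depending on what the bus allows. */
static enum sketch_intr_type
sketch_pick_intr_type(uint32_t bus_flags)
{
	if (bus_flags & SKETCH_FLAGS_MSIX_OKAY)
		return SKETCH_MSIX;
	if (bus_flags & SKETCH_FLAGS_MSI_OKAY)
		return SKETCH_MSI;
	return SKETCH_INTX;
}

/* Mimic the test patch: clear both capability bits. */
static uint32_t
sketch_force_legacy(uint32_t bus_flags)
{
	bus_flags &= ~SKETCH_FLAGS_MSIX_OKAY;
	bus_flags &= ~SKETCH_FLAGS_MSI_OKAY;
	/* Post-condition: neither MSI nor MSI-X is advertised any more. */
	assert((bus_flags &
	    (SKETCH_FLAGS_MSI_OKAY | SKETCH_FLAGS_MSIX_OKAY)) == 0);
	return bus_flags;
}
```

	Calling sketch_pick_intr_type(sketch_force_legacy(flags)) returns
	SKETCH_INTX for any input, which is the situation this PR is about.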
>Fix:

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->msaitoh
Responsible-Changed-By: bouyer@NetBSD.org
Responsible-Changed-When: Thu, 26 May 2022 11:44:25 +0000
Responsible-Changed-Why:
Hello,
can you please have a look?
thanks


From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/56857: ixg(4) doesn't work in legacy interrupt mode
Date: Sat, 28 May 2022 18:58:48 +0200

 On Thu, May 26, 2022 at 11:35:01AM +0000, bouyer@antioche.eu.org wrote:
 >  [...]
 > 
 > 	causes the ixg(4) device to fail to receive packets (or to receive them
 > 	with very high latency). Other devices, including nvme(4) and wm(4),
 > 	work fine in legacy mode. This is a show stopper for Xen (see PR
 > 	kern/55667).
 > 	The device works fine with MSI-X. I couldn't test with MSI, as
 > 	disabling only MSI-X causes the kernel to panic (probably unrelated to
 > 	ixg(4)), but experiments with Xen suggest that it fails too.

 Restricting the Xen dom0 to a single CPU makes ixg work with MSI.
 So maybe the issue is that too many queues are allocated when one interrupt
 per queue is not available?
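 To make the hypothesis concrete, here is a minimal sketch of a queue-count
 policy that would avoid it. This is not taken from the ixgbe driver: the
 sketch_* names are hypothetical, and reserving one MSI-X vector for link
 events is an assumption made for the example.

```c
#include <assert.h>

enum sketch_intr_type { SKETCH_INTX, SKETCH_MSI, SKETCH_MSIX };

/*
 * Pick how many queues to set up.  With a single shared interrupt
 * (INTx or MSI) there is no per-queue vector, so use one queue only.
 * With MSI-X, use at most one queue per CPU and per available vector.
 */
static int
sketch_queue_count(enum sketch_intr_type type, int ncpu, int nvectors)
{
	int usable;

	assert(ncpu >= 1);

	if (type != SKETCH_MSIX)
		return 1;

	/* Assumption: one vector is kept aside for link events. */
	usable = nvectors - 1;
	if (usable < 1)
		return 1;

	return usable < ncpu ? usable : ncpu;
}
```

 With this policy, a multi-CPU system falling back to MSI or INTx would get
 the same single queue as a dom0 restricted to one CPU.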

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Masanobu SAITOH <msaitoh@execsw.org>
To: gnats-bugs@netbsd.org, msaitoh@netbsd.org, gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org, bouyer@antioche.eu.org
Cc: msaitoh@execsw.org
Subject: Re: kern/56857: ixg(4) doesn't work in legacy interrupt mode
Date: Sun, 29 May 2022 13:43:10 +0900

 On 2022/05/29 2:00, Manuel Bouyer wrote:
 > The following reply was made to PR kern/56857; it has been noted by GNATS.
 > 
 > From: Manuel Bouyer <bouyer@antioche.eu.org>
 > To: gnats-bugs@netbsd.org
 > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
 > Subject: Re: kern/56857: ixg(4) doesn't work in legacy interrupt mode
 > Date: Sat, 28 May 2022 18:58:48 +0200
 > 
 >  On Thu, May 26, 2022 at 11:35:01AM +0000, bouyer@antioche.eu.org wrote:
 >  >  [...]
 >  > 
 >  > 	causes the ixg(4) device to fail to receive packets (or to receive them
 >  > 	with very high latency). Other devices, including nvme(4) and wm(4),
 >  > 	work fine in legacy mode. This is a show stopper for Xen (see PR
 >  > 	kern/55667).
 >  > 	The device works fine with MSI-X. I couldn't test with MSI, as
 >  > 	disabling only MSI-X causes the kernel to panic (probably unrelated to
 >  > 	ixg(4)), but experiments with Xen suggest that it fails too.
 >  
 >  Restricting the Xen dom0 to a single CPU makes ixg work with MSI.
 >  So maybe the issue is that too many queues are allocated when one interrupt
 >  per queue is not available?
 >  
 >  -- 
 >  Manuel Bouyer <bouyer@antioche.eu.org>
 >       NetBSD: 26 ans d'experience feront toujours la difference
 >  --

 I think I was able to reproduce the same problem with amd64/conf/GENERIC +
 your test patch (i.e. not dom0).

 An event counter of ixgbe:

 	ixg0 MAC Statistics Interrupt conditions zero          30    0 misc

 ixgbe_legacy_irq() increments this counter for each received packet and
 returns quickly.
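 The behaviour described above can be modelled in user space as follows.
 This is a toy sketch, not the driver's handler: struct sketch_dev, the
 SKETCH_CAUSE_RXQ0 bit and the read-to-clear cause field are assumptions
 made for the example. It only shows why a handler that finds no cause
 pending bumps an "interrupt conditions zero" counter and returns without
 processing any packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CAUSE_RXQ0	0x1u	/* hypothetical "RX queue 0" cause bit */

struct sketch_dev {
	uint32_t cause;		/* stands in for a hardware cause register */
	uint64_t intr_zero;	/* handler entered with no cause pending */
	uint64_t rx_handled;	/* handler entered and processed RX work */
};

/* Returns 1 if the interrupt was ours and handled, 0 otherwise. */
static int
sketch_legacy_intr(struct sketch_dev *dev)
{
	uint32_t cause;

	assert(dev != NULL);

	/* Model a read-to-clear register: reading consumes the causes. */
	cause = dev->cause;
	dev->cause = 0;

	if (cause == 0) {
		/* Nothing pending: count it and return quickly. */
		dev->intr_zero++;
		return 0;
	}

	if (cause & SKETCH_CAUSE_RXQ0)
		dev->rx_handled++;

	return 1;
}
```

 In this model, a device whose cause bits never show up for received
 packets would see intr_zero grow while rx_handled stays at zero, which
 matches the symptom reported by the event counter.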

 -- 
 -----------------------------------------------
                 SAITOH Masanobu (msaitoh@execsw.org
                                  msaitoh@netbsd.org)

From: "SAITOH Masanobu" <msaitoh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56857 CVS commit: src/sys/dev/pci/ixgbe
Date: Mon, 30 May 2022 05:07:38 +0000

 Module Name:	src
 Committed By:	msaitoh
 Date:		Mon May 30 05:07:38 UTC 2022

 Modified Files:
 	src/sys/dev/pci/ixgbe: ixgbe.c

 Log Message:
 Fix a bug that the legacy interrupt doesn't work when MSI-X allocation failed.
 Fixes PR kern/56857.


 To generate a diff of this commit:
 cvs rdiff -u -r1.314 -r1.315 src/sys/dev/pci/ixgbe/ixgbe.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org, msaitoh@netbsd.org, gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org, bouyer@antioche.eu.org, jdolecek@netbsd.org
Cc: 
Subject: Re: kern/56857: ixg(4) doesn't work in legacy interrupt mode
Date: Mon, 30 May 2022 07:49:57 +0200

 SUCCESS - networking is ok again with ixg*. Thanks for the quick fix.

 On 05/30/22 07:13, Masanobu SAITOH wrote:
 > Please test the latest -current (ixgbe.c rev. 1.315).
 >

State-Changed-From-To: open->closed
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Mon, 30 May 2022 07:43:28 +0000
State-Changed-Why:
Two people say it's fixed, so assume it is.


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56857 CVS commit: [netbsd-9] src/sys/dev/pci/ixgbe
Date: Tue, 31 May 2022 14:03:27 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Tue May 31 14:03:27 UTC 2022

 Modified Files:
 	src/sys/dev/pci/ixgbe [netbsd-9]: ixgbe.c ixgbe.h ixv.c

 Log Message:
 Pull up following revision(s) (requested by msaitoh in ticket #1458):

 	sys/dev/pci/ixgbe/ixv.c: revision 1.181
 	sys/dev/pci/ixgbe/ixgbe.c: revision 1.315
 	sys/dev/pci/ixgbe/ixgbe.h: revision 1.86

 Fix a bug that the legacy interrupt doesn't work when MSI-X allocation failed.
 Fixes PR kern/56857.

 Remove unused adapter->msix_mem.


 To generate a diff of this commit:
 cvs rdiff -u -r1.199.2.21 -r1.199.2.22 src/sys/dev/pci/ixgbe/ixgbe.c
 cvs rdiff -u -r1.56.2.7 -r1.56.2.8 src/sys/dev/pci/ixgbe/ixgbe.h
 cvs rdiff -u -r1.125.2.18 -r1.125.2.19 src/sys/dev/pci/ixgbe/ixv.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56857 CVS commit: [netbsd-8] src/sys/dev/pci/ixgbe
Date: Tue, 31 May 2022 14:07:52 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Tue May 31 14:07:52 UTC 2022

 Modified Files:
 	src/sys/dev/pci/ixgbe [netbsd-8]: ix_txrx.c ixgbe.c ixgbe.h ixv.c

 Log Message:
 Pull up following revision(s) (requested by msaitoh in ticket #1745):

 	sys/dev/pci/ixgbe/ix_txrx.c: revision 1.98
 	sys/dev/pci/ixgbe/ixv.c: revision 1.181
 	sys/dev/pci/ixgbe/ixgbe.c: revision 1.315
 	sys/dev/pci/ixgbe/ixgbe.h: revision 1.86

 bus_dmamem_unmap() before bus_dmamem_free(), otherwise we may give back memory
 which is still (and will stay) mapped.

 Fixes one instance of "panic: HYPERVISOR_mmu_update failed" on Xen.
 There may be others.

 Fix a bug that the legacy interrupt doesn't work when MSI-X allocation failed.
 Fixes PR kern/56857.

 Remove unused adapter->msix_mem.


 To generate a diff of this commit:
 cvs rdiff -u -r1.24.2.24 -r1.24.2.25 src/sys/dev/pci/ixgbe/ix_txrx.c
 cvs rdiff -u -r1.88.2.50 -r1.88.2.51 src/sys/dev/pci/ixgbe/ixgbe.c
 cvs rdiff -u -r1.24.6.24 -r1.24.6.25 src/sys/dev/pci/ixgbe/ixgbe.h
 cvs rdiff -u -r1.56.2.37 -r1.56.2.38 src/sys/dev/pci/ixgbe/ixv.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:
