NetBSD Problem Report #56857
From bouyer@antioche.eu.org Thu May 26 11:34:35 2022
Return-Path: <bouyer@antioche.eu.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 373D31A921F
for <gnats-bugs@gnats.NetBSD.org>; Thu, 26 May 2022 11:34:35 +0000 (UTC)
Message-Id: <20220526113430.2039810754@rochebonne.antioche.eu.org>
Date: Thu, 26 May 2022 13:34:30 +0200 (CEST)
From: bouyer@antioche.eu.org
Reply-To: bouyer@antioche.eu.org
To: gnats-bugs@NetBSD.org
Subject: ixg(4) doesn't work in legacy interrupt mode
X-Send-Pr-Version: 3.95
>Number: 56857
>Category: kern
>Synopsis: ixg(4) doesn't work in legacy interrupt mode
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: msaitoh
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu May 26 11:35:01 +0000 2022
>Closed-Date: Mon May 30 07:43:28 +0000 2022
>Last-Modified: Tue May 31 14:10:01 +0000 2022
>Originator: Manuel Bouyer
>Release: NetBSD 9.99.97
>Organization:
>Environment:
System: NetBSD bolero.soc.lip6.fr 9.99.97 NetBSD 9.99.97 (GENERIC) #4: Thu May 26 13:16:41 CEST 2022 bouyer@bip:/dsk/l1/misc/bouyer/tmp/amd64/obj/dsk/l1/misc/bouyer/HEAD/4commit/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
Forcing a system to use legacy interrupts with this patch:
Index: pci_machdep.c
===================================================================
RCS file: /cvsroot/src/sys/arch/x86/pci/pci_machdep.c,v
retrieving revision 1.91
diff -u -p -u -r1.91 pci_machdep.c
--- pci_machdep.c 24 May 2022 14:00:23 -0000 1.91
+++ pci_machdep.c 26 May 2022 11:18:59 -0000
@@ -565,6 +565,9 @@ pci_attach_hook(device_t parent, device_
}
}
+ pba->pba_flags &= ~PCI_FLAGS_MSIX_OKAY;
+ pba->pba_flags &= ~PCI_FLAGS_MSI_OKAY;
+
#endif /* __HAVE_PCI_MSI_MSIX */
}
causes the ixg(4) device to fail to receive packets (or to receive them
only with very high latency). Other devices, including nvme(4) and wm(4),
work fine in legacy mode. This is a show-stopper for Xen (see PR
kern/55667).
The device works fine with MSI-X. I couldn't test with MSI, as
disabling only MSI-X causes the kernel to panic (probably unrelated to
ixg(4)), but experiments with Xen suggest that it fails too.
Here is how the device shows up (boot -vx):
[ 1.1197710] ppb3 at pci4 dev 2 function 0: Intel product 2032 (rev. 0x07)
[ 1.1197710] ppb3: PCI Express capability version 2 <Root Port of PCI-E Root Complex> x4 @ 8.0GT/s
[ 1.1197710] pci5 at ppb3 bus 24
[ 1.1197710] acpi0: MCFG: 024:00:0: Ok (cfg[0x100]=0x14020001 extconf=Y)
[ 1.1197710] acpi0: MCFG: 024:00:1: Ok (cfg[0x100]=0x14020001 extconf=Y)
[ 1.1197710] acpi0: MCFG: bus 24: valid devices
[ 1.1197710] acpi0: MCFG: 024:00:0
[ 1.1197710] acpi0: MCFG: 024:00:1
[ 1.1197710] acpi0: acpimcfg_map_bus done
[ 1.1197710] pci5: i/o space, memory space enabled
[ 1.1197710] ixg0 at pci5 dev 0 function 0: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 4.0.1-k
[ 1.1197710] ixg0: clearing prefetchable bit
[ 1.1197710] ixg0: device X550
[ 1.1197710] ixg0: NVM Image Version 2.00, PHY FW Revision 2.0b ID 0x9, NVM Map version 2.52, OEM NVM Image version 0.06, ETrackID 80000d62
[ 1.1197710] ixg0: PBA number K15087-004
[ 1.1197710] ixg0: failed to allocate MSI-X interrupt
[ 1.1197710] allocated pic ioapic2 type level pin 2 level 6 to cpu0 slot 4 idt entry 98
[ 1.1197710] ixg0: interrupting at ioapic2 pin 2
[ 1.1197710] ixg0: Ethernet address 78:ac:44:86:b0:6c
[ 1.1197710] ixg0: PHY: OUI 0x00aa00 model 0x0022, rev. 0
[ 1.1197710] ixg0: PCI Express Bus: Speed 8.0GT/s Width x4
[ 1.1197710] ixg0: feature cap 0x97a0<TEMP_SENSOR,LEGACY_TX,FDIR,MSI,MSIX,LEGACY_IRQ,RECOVERY_MODE>
[ 1.1197710] ixg0: feature ena 0x9020<TEMP_SENSOR,LEGACY_IRQ,RECOVERY_MODE>
After a multiuser boot (including dhcp address request):
bolero# vmstat -i
interrupt                   total     rate
TLB shootdown                 554        3
cpu0 timer                  16580       97
ioapic0 pin 3                 308        1
ioapic0 pin 16                314        1
ioapic2 pin 2                 190        1
ixg0 Link event                 2        0
ixg0 q0 IRQs on queue           1        0
ioapic4 pin 2                3854       22
Total                       21803      128
tcpdump shows that packets are sent.
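The boot log above shows the usual fallback: MSI-X allocation fails ("failed to allocate MSI-X interrupt") and the driver ends up on a legacy INTx line ("interrupting at ioapic2 pin 2"). A minimal sketch of that selection order follows, assuming hypothetical flag names (SKETCH_MSIX_OKAY, SKETCH_MSI_OKAY) that only loosely model the PCI_FLAGS_MSIX_OKAY / PCI_FLAGS_MSI_OKAY bits cleared by the test patch. It is not the pci_intr_alloc(9) or ixgbe code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bus capability bits, loosely modelled on the
 * PCI_FLAGS_MSIX_OKAY / PCI_FLAGS_MSI_OKAY flags the test patch clears.
 * The values are made up for this sketch. */
#define SKETCH_MSIX_OKAY 0x1u
#define SKETCH_MSI_OKAY  0x2u

enum sketch_intr_mode { INTR_MSIX, INTR_MSI, INTR_INTX };

/*
 * Pick an interrupt type in the usual MSI-X -> MSI -> INTx order.
 * dev_has_msix / dev_has_msi say whether the device advertises the
 * capability (cf. "feature cap ... MSI,MSIX,LEGACY_IRQ" in the log).
 */
static enum sketch_intr_mode
sketch_choose_intr_mode(unsigned bus_flags, bool dev_has_msix,
    bool dev_has_msi)
{
	if ((bus_flags & SKETCH_MSIX_OKAY) != 0 && dev_has_msix)
		return INTR_MSIX;
	if ((bus_flags & SKETCH_MSI_OKAY) != 0 && dev_has_msi)
		return INTR_MSI;
	/* Last resort: one shared, level-triggered line via the I/O APIC. */
	return INTR_INTX;
}
```

With both bus flags cleared, as the test patch does, the sketch always returns INTR_INTX, which is the configuration in which ixg(4) stopped receiving.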
>How-To-Repeat:
Disable MSI and MSI-X with the patch above, then try to use the ixg device.
>Fix:
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->msaitoh
Responsible-Changed-By: bouyer@NetBSD.org
Responsible-Changed-When: Thu, 26 May 2022 11:44:25 +0000
Responsible-Changed-Why:
Hello,
can you please have a look?
Thanks
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/56857: ixg(4) doesn't work in legacy interrupt mode
Date: Sat, 28 May 2022 18:58:48 +0200
On Thu, May 26, 2022 at 11:35:01AM +0000, bouyer@antioche.eu.org wrote:
> [...]
>
> causes the ixg(4) device to fail to receive packets (or to receive them
> only with very high latency). Other devices, including nvme(4) and wm(4),
> work fine in legacy mode. This is a show-stopper for Xen (see PR
> kern/55667).
> The device works fine with MSI-X. I couldn't test with MSI, as
> disabling only MSI-X causes the kernel to panic (probably unrelated to
> ixg(4)), but experiments with Xen suggest that it fails too.
Restricting the Xen dom0 to a single CPU makes ixg work with MSI.
So maybe the issue is that too many queues are allocated when one interrupt
per queue is not available?
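The hypothesis can be illustrated with a queue-count helper. This is a hypothetical sketch, not the ixgbe code: it assumes MSI-X provides one vector per queue plus one for link events, and that MSI or INTx provide a single vector, so only one queue can be serviced.

```c
#include <assert.h>

/*
 * Hypothetical helper: how many RX/TX queue pairs a driver could
 * safely use given ncpu, the hardware queue limit and the number of
 * interrupt vectors it actually obtained.
 */
static unsigned
sketch_num_queues(unsigned ncpu, unsigned hw_max_queues, unsigned nvectors)
{
	unsigned nq = ncpu < hw_max_queues ? ncpu : hw_max_queues;

	/* MSI or INTx: a single shared vector can only drive one queue. */
	if (nvectors <= 1)
		return 1;
	/* MSI-X: reserve one vector for link/misc events (assumption). */
	if (nq > nvectors - 1)
		nq = nvectors - 1;
	return nq == 0 ? 1 : nq;
}
```

Restricting dom0 to a single CPU makes ncpu cap the result at 1, which gives the same outcome as the nvectors <= 1 branch.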
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
From: Masanobu SAITOH <msaitoh@execsw.org>
To: gnats-bugs@netbsd.org, msaitoh@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, bouyer@antioche.eu.org
Cc: msaitoh@execsw.org
Subject: Re: kern/56857: ixg(4) doesn't work in legacy interrupt mode
Date: Sun, 29 May 2022 13:43:10 +0900
On 2022/05/29 2:00, Manuel Bouyer wrote:
> [...]
I think I was able to reproduce the same problem with amd64/conf/GENERIC
plus your test patch (i.e. not in a dom0).
One of the ixgbe event counters:
ixg0 MAC Statistics Interrupt conditions zero 30 0 misc
ixgbe_legacy_irq() increments this counter for each received packet and
returns early instead of processing the packet.
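On a shared, level-triggered INTx line, a handler typically reads the device's interrupt cause register first and returns early if nothing is pending, so that other handlers sharing the line get a chance. A hypothetical sketch of that pattern follows; struct sketch_dev, its cause field and SKETCH_CAUSE_RX are made-up stand-ins for a read-to-clear cause register, not ixgbe_legacy_irq() itself. If the cause read back as zero every time a packet arrived, the packet would be left unprocessed while the zero counter climbed, which would be consistent with the counter above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device state for the sketch. */
struct sketch_dev {
	uint32_t cause;			/* pending interrupt causes */
	unsigned long intzero;		/* "Interrupt conditions zero" style counter */
	unsigned long rx_handled;	/* RX interrupts actually serviced */
};

#define SKETCH_CAUSE_RX 0x1u

/* Model of a read-to-clear cause register. */
static uint32_t
sketch_read_clear_cause(struct sketch_dev *d)
{
	uint32_t c = d->cause;

	d->cause = 0;
	return c;
}

/* Returns 1 if the interrupt was ours, 0 if nothing was pending. */
static int
sketch_legacy_intr(struct sketch_dev *d)
{
	uint32_t cause = sketch_read_clear_cause(d);

	if (cause == 0) {
		d->intzero++;	/* nothing pending: count it and bail out */
		return 0;
	}
	if (cause & SKETCH_CAUSE_RX)
		d->rx_handled++;
	return 1;
}
```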
--
-----------------------------------------------
SAITOH Masanobu (msaitoh@execsw.org
msaitoh@netbsd.org)
From: "SAITOH Masanobu" <msaitoh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56857 CVS commit: src/sys/dev/pci/ixgbe
Date: Mon, 30 May 2022 05:07:38 +0000
Module Name: src
Committed By: msaitoh
Date: Mon May 30 05:07:38 UTC 2022
Modified Files:
src/sys/dev/pci/ixgbe: ixgbe.c
Log Message:
Fix a bug where the legacy interrupt doesn't work when MSI-X allocation fails.
Fixes PR kern/56857.
To generate a diff of this commit:
cvs rdiff -u -r1.314 -r1.315 src/sys/dev/pci/ixgbe/ixgbe.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org, msaitoh@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, bouyer@antioche.eu.org, jdolecek@netbsd.org
Cc:
Subject: Re: kern/56857: ixg(4) doesn't work in legacy interrupt mode
Date: Mon, 30 May 2022 07:49:57 +0200
SUCCESS - networking is ok again with ixg*. Thanks for the quick fix.
On 05/30/22 07:13, Masanobu SAITOH wrote:
> Please test the latest -current (ixgbe.c rev. 1.315).
>
State-Changed-From-To: open->closed
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Mon, 30 May 2022 07:43:28 +0000
State-Changed-Why:
Two people say it's fixed, so assume it is.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56857 CVS commit: [netbsd-9] src/sys/dev/pci/ixgbe
Date: Tue, 31 May 2022 14:03:27 +0000
Module Name: src
Committed By: martin
Date: Tue May 31 14:03:27 UTC 2022
Modified Files:
src/sys/dev/pci/ixgbe [netbsd-9]: ixgbe.c ixgbe.h ixv.c
Log Message:
Pull up following revision(s) (requested by msaitoh in ticket #1458):
sys/dev/pci/ixgbe/ixv.c: revision 1.181
sys/dev/pci/ixgbe/ixgbe.c: revision 1.315
sys/dev/pci/ixgbe/ixgbe.h: revision 1.86
Fix a bug where the legacy interrupt doesn't work when MSI-X allocation fails.
Fixes PR kern/56857.
Remove unused adapter->msix_mem.
To generate a diff of this commit:
cvs rdiff -u -r1.199.2.21 -r1.199.2.22 src/sys/dev/pci/ixgbe/ixgbe.c
cvs rdiff -u -r1.56.2.7 -r1.56.2.8 src/sys/dev/pci/ixgbe/ixgbe.h
cvs rdiff -u -r1.125.2.18 -r1.125.2.19 src/sys/dev/pci/ixgbe/ixv.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56857 CVS commit: [netbsd-8] src/sys/dev/pci/ixgbe
Date: Tue, 31 May 2022 14:07:52 +0000
Module Name: src
Committed By: martin
Date: Tue May 31 14:07:52 UTC 2022
Modified Files:
src/sys/dev/pci/ixgbe [netbsd-8]: ix_txrx.c ixgbe.c ixgbe.h ixv.c
Log Message:
Pull up following revision(s) (requested by msaitoh in ticket #1745):
sys/dev/pci/ixgbe/ix_txrx.c: revision 1.98
sys/dev/pci/ixgbe/ixv.c: revision 1.181
sys/dev/pci/ixgbe/ixgbe.c: revision 1.315
sys/dev/pci/ixgbe/ixgbe.h: revision 1.86
bus_dmamem_unmap() before bus_dmamem_free(), otherwise we may give back memory
which is still (and will stay) mapped.
Fixes one instance of "panic: HYPERVISOR_mmu_update failed" on Xen.
There may be others.
Fix a bug where the legacy interrupt doesn't work when MSI-X allocation fails.
Fixes PR kern/56857.
Remove unused adapter->msix_mem.
To generate a diff of this commit:
cvs rdiff -u -r1.24.2.24 -r1.24.2.25 src/sys/dev/pci/ixgbe/ix_txrx.c
cvs rdiff -u -r1.88.2.50 -r1.88.2.51 src/sys/dev/pci/ixgbe/ixgbe.c
cvs rdiff -u -r1.24.6.24 -r1.24.6.25 src/sys/dev/pci/ixgbe/ixgbe.h
cvs rdiff -u -r1.56.2.37 -r1.56.2.38 src/sys/dev/pci/ixgbe/ixv.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
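The netbsd-8 pull-up also carries an ordering rule for DMA memory teardown: bus_dmamem_unmap() must come before bus_dmamem_free(). The sketch below only models that rule with hypothetical helpers (sketch_alloc, sketch_map, sketch_unmap, sketch_free); it does not call the real bus_dma(9) functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a DMA memory segment. */
struct sketch_dmamem {
	bool allocated;		/* models bus_dmamem_alloc() having run */
	bool mapped;		/* models bus_dmamem_map() having run */
};

static void
sketch_alloc(struct sketch_dmamem *m)
{
	m->allocated = true;
}

static void
sketch_map(struct sketch_dmamem *m)
{
	assert(m->allocated);	/* can only map allocated memory */
	m->mapped = true;
}

static void
sketch_unmap(struct sketch_dmamem *m)
{
	assert(m->mapped);
	m->mapped = false;
}

/* Refuses to give memory back while it is still mapped. */
static bool
sketch_free(struct sketch_dmamem *m)
{
	if (m->mapped)
		return false;
	m->allocated = false;
	return true;
}
```

sketch_free() rejecting a still-mapped segment stands in for the case the commit fixes, where memory was given back while still mapped; on Xen that showed up as one instance of "panic: HYPERVISOR_mmu_update failed".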
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.