NetBSD Problem Report #57878
From www@netbsd.org Thu Jan 25 23:13:53 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id D6A591A9238
for <gnats-bugs@gnats.NetBSD.org>; Thu, 25 Jan 2024 23:13:52 +0000 (UTC)
Message-Id: <20240125231351.435C51A9239@mollari.NetBSD.org>
Date: Thu, 25 Jan 2024 23:13:51 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: i915 calls agp_i810_chipset_flush with spin lock held but agp_i810_chipset_flush waits for xcall
X-Send-Pr-Version: www-1.0
>Number: 57878
>Category: kern
>Synopsis: i915 calls agp_i810_chipset_flush with spin lock held but agp_i810_chipset_flush waits for xcall
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Jan 25 23:15:00 +0000 2024
>Closed-Date: Sat Feb 03 15:21:01 +0000 2024
>Last-Modified: Sat Feb 03 15:21:01 +0000 2024
>Originator: Taylor R Campbell
>Release: current, 10
>Organization:
The NetAGP Foundation
>Environment:
>Description:
[ 1.051264] agp0 at pchb0: i855-family chipset
[ 1.051264] agp0: detected 32636k stolen memory
[ 1.051264] agp0: aperture at 0xf0000000, size 0x8000000
[ 1.051264] Intel 82855GM GMCH Memory Controller (miscellaneous system, revision 0x02) at pci0 dev 0 function 1 not configured
[ 1.051264] Intel 82855GM GMCH Configuration Process (miscellaneous system, revision 0x02) at pci0 dev 0 function 3 not configured
[ 1.051264] i915drmkms0 at pci0 dev 2 function 0: Intel 82855GM GMCH Integrated Graphics Device (rev. 0x02)
[ 1.051264] Intel 82855GM GMCH Integrated Graphics Device (miscellaneous display, revision 0x02) at pci0 dev 2 function 1 not configured
...
[ 93.755186] panic: kernel diagnostic assertion "ci->ci_mtx_count == -1" failed: file "/usr/src/sys/kern/kern_synch.c", line 762 mi_switch: cpu0: ci_mtx_count (-2) != -1 (block with spin-mutex held)
[ 93.755186] cpu0: Begin traceback...
[ 93.755186] vpanic(c13ec9a4,db9ddb1c,db9ddb64,c0d8486b,c13ec9a4,c12d733f,c13ecc68,c13ec7e4,2fa,c12b9380) at netbsd:vpanic+0x184
[ 93.755186] kern_assert(c13ec9a4,c12d733f,c13ecc68,c13ec7e4,2fa,c12b9380,0,fffffffe,c0d7fc5b,c32ad400) at netbsd:kern_assert+0x23
[ 93.755186] mi_switch(c32ad400,c0d80241,c2c55640,c166dec0,0,c13f4c9f,c1673d84,c1673d84,c1673d80,0) at netbsd:mi_switch+0x81a
[ 93.755186] sleepq_block(0,0,c12b86e0,0,0,c166dec0,0,33,c1673d80,db9ddbec) at netbsd:sleepq_block+0x22e
[ 93.755186] cv_wait(c1673d84,c1673d80,c4f68104,c4e76964,33,c1673d84,c2c7d960,c2f1b104,20000,db9ddc0c) at netbsd:cv_wait+0xe4
[ 93.755186] xc_wait(33,0,0,0,c2c7d960,c2c7b530,db9ddc28,c06756f9,c2f1b104,2) at netbsd:xc_wait+0x5e
[ 93.755186] agp_i810_chipset_flush(c2f1b104,2,0,c4e05200,0,db9ddc44,c0729312,c4e0526c,18,c4e05200) at netbsd:agp_i810_chipset_flush+0x85
[ 93.755186] intel_gtt_chipset_flush(c4e0526c,18,c4e05200,c3154ac0,c3154b64,db9ddc78,c07137f4,c2c7d960,2cc00,cc00) at netbsd:intel_gtt_chipset_flush+0x1a
[ 93.755186] intel_gt_flush_ggtt_writes(c2c7d960,2cc00,cc00,c4e05200,0,c4e052ac,c4e052b0,c071aa7a,c4e05200,0) at netbsd:intel_gt_flush_ggtt_writes+0x87
[ 93.755186] i915_gem_object_flush_write_domain(c4e05200,fffffffe,3,c4f68158,c078a8c5,c4e05318,ffff0000,c2c7b004,c4f68104,c4e05200) at netbsd:i915_gem_object_flush_write_domain+0x171
[ 93.755186] i915_gem_object_prepare_write(c4e05200,db9ddd24,0,0,0,0,d,0,7,c4f13340) at netbsd:i915_gem_object_prepare_write+0x180
[ 93.755186] i915_gem_pwrite_ioctl(c2c7b004,db9ddeac,c4dcc244,c4354e5c,c4e5aa80,0,db9dddc4,20,1d,db9dddc4) at netbsd:i915_gem_pwrite_ioctl+0x6fb
[ 93.755186] drm_ioctl(c4234180,8020645d,db9ddeac,db9ddf38,c0dcdbce,c4234180,8020645d,db9ddeac,db9ddeb0,c0d8ad5f) at netbsd:drm_ioctl+0x246
[ 93.755186] drm_ioctl_shim(c4234180,8020645d,db9ddeac,db9ddeb0,c0d8ad5f,c161c8e0,ffffffff,0,c4234180,0) at netbsd:drm_ioctl_shim+0x43
[ 93.755186] sys_ioctl(c32ad400,db9ddf68,db9ddf60,c4c05a68,0,c32ad400,c1651038,db9ddf68,0,0) at netbsd:sys_ioctl+0x35b
[ 93.755186] syscall() at netbsd:syscall+0x194
This happens because:
/* i915_gem_object_flush_write_domain */
...
spin_lock(&obj->vma.lock);
for_each_ggtt_vma(vma, obj) {
if (i915_vma_unset_ggtt_write(vma))
intel_gt_flush_ggtt_writes(vma->vm->gt);
}
spin_unlock(&obj->vma.lock);
/* agp_i810_chipset_flush */
...
xc_wait(xc_broadcast(0, &agp_flush_cache_xc,
NULL, NULL));
>How-To-Repeat:
launch X with a DIAGNOSTIC kernel on a machine with an affected Intel graphics chipset
>Fix:
change agp_i810_chipset_flush from xcall(9) to ipi(9)
>Release-Note:
>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Cc:
Subject: Re: kern/57878: i915 calls agp_i810_chipset_flush with spin lock held but agp_i810_chipset_flush waits for xcall
Date: Fri, 26 Jan 2024 12:47:36 +0000
This is a multi-part message in MIME format.
--=_doTlMKKkYbLBtme6fab/51tYZGWhbizH
The attached patch should fix this issue; awaiting confirmation on an
affected machine.
--=_doTlMKKkYbLBtme6fab/51tYZGWhbizH
Content-Type: text/plain; charset="ISO-8859-1"; name="pr57878-agpcacheflushipi"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="pr57878-agpcacheflushipi.patch"
From 96a70b66956066f08377d6275970b441b34dc73d Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh@NetBSD.org>
Date: Thu, 25 Jan 2024 23:17:34 +0000
Subject: [PATCH] agp_i810(4): Use ipi(9) for chipset flush on all CPUs, not
xcall(9).
i915 now calls into this with a spin lock held, so we have to use
ipi(9), which spin-waits for the other CPUs to complete, rather than
xcall(9), which may sleep-wait.
Fortunately, this is just to execute WBINVD on x86 (and if this code
ever runs on other architectures, which it probably doesn't, it'll be
a similar barrier instruction), so spinning to wait for that on all
CPUs isn't too costly.
PR kern/57878
XXX pullup-10
---
sys/dev/pci/agp_i810.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/sys/dev/pci/agp_i810.c b/sys/dev/pci/agp_i810.c
index adfb5716d4a3..4d8d9d8969f4 100644
--- a/sys/dev/pci/agp_i810.c
+++ b/sys/dev/pci/agp_i810.c
@@ -180,7 +180,7 @@ agp_i810_post_gtt_entry(struct agp_i810_softc *isc, off=
_t off)
}
=20
static void
-agp_flush_cache_xc(void *a __unused, void *b __unused)
+agp_flush_cache_ipi(void *cookie __unused)
{
=20
agp_flush_cache();
@@ -204,11 +204,17 @@ agp_i810_chipset_flush(struct agp_i810_softc *isc)
* XXX Come to think of it, do these chipsets appear in
* any multi-CPU systems?
*/
- if (cold)
+ if (cold) {
agp_flush_cache();
- else
- xc_wait(xc_broadcast(0, &agp_flush_cache_xc,
- NULL, NULL));
+ } else {
+ /*
+ * Caller may hold a spin lock, so use ipi(9)
+ * rather than xcall(9) here.
+ */
+ ipi_msg_t msg =3D { .func =3D agp_flush_cache_ipi };
+ ipi_broadcast(&msg, /*skip_self*/false);
+ ipi_wait(&msg);
+ }
WRITE4(AGP_I830_HIC, READ4(AGP_I830_HIC) | __BIT(31));
while (ISSET(READ4(AGP_I830_HIC), __BIT(31))) {
if (timo-- =3D=3D 0)
--=_doTlMKKkYbLBtme6fab/51tYZGWhbizH--
State-Changed-From-To: open->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Fri, 26 Jan 2024 12:51:38 +0000
State-Changed-Why:
awaiting feedback from user
From: Taylor R Campbell <riastradh@NetBSD.org>
To: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Cc:
Subject: Re: kern/57878: i915 calls agp_i810_chipset_flush with spin lock held but agp_i810_chipset_flush waits for xcall
Date: Fri, 26 Jan 2024 22:26:39 +0000
This is a multi-part message in MIME format.
--=_jGdl80rZzbJ3AV6nW+2j7DYaEAx46dnv
Correction: need kpreempt_disable/enable around ipi(9) if we _don't_
hold a spin lock when we call it.
--=_jGdl80rZzbJ3AV6nW+2j7DYaEAx46dnv
Content-Type: text/plain; charset="ISO-8859-1"; name="pr57878-agpcacheflushipi-v2"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="pr57878-agpcacheflushipi-v2.patch"
From eac456a8881930d2b1f66a87850eaa13be51d093 Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh@NetBSD.org>
Date: Thu, 25 Jan 2024 23:17:34 +0000
Subject: [PATCH] agp_i810(4): Use ipi(9) for chipset flush on all CPUs, not
xcall(9).
i915 now calls into this with a spin lock held, so we have to use
ipi(9), which spin-waits for the other CPUs to complete, rather than
xcall(9), which may sleep-wait.
Fortunately, this is just to execute WBINVD on x86 (and if this code
ever runs on other architectures, which it probably doesn't, it'll be
a similar barrier instruction), so spinning to wait for that on all
CPUs isn't too costly.
PR kern/57878
XXX pullup-10
---
sys/dev/pci/agp_i810.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/sys/dev/pci/agp_i810.c b/sys/dev/pci/agp_i810.c
index adfb5716d4a3..961a3a5b077b 100644
--- a/sys/dev/pci/agp_i810.c
+++ b/sys/dev/pci/agp_i810.c
@@ -180,7 +180,7 @@ agp_i810_post_gtt_entry(struct agp_i810_softc *isc, off=
_t off)
}
=20
static void
-agp_flush_cache_xc(void *a __unused, void *b __unused)
+agp_flush_cache_ipi(void *cookie __unused)
{
=20
agp_flush_cache();
@@ -204,11 +204,19 @@ agp_i810_chipset_flush(struct agp_i810_softc *isc)
* XXX Come to think of it, do these chipsets appear in
* any multi-CPU systems?
*/
- if (cold)
+ if (cold) {
agp_flush_cache();
- else
- xc_wait(xc_broadcast(0, &agp_flush_cache_xc,
- NULL, NULL));
+ } else {
+ /*
+ * Caller may hold a spin lock, so use ipi(9)
+ * rather than xcall(9) here.
+ */
+ ipi_msg_t msg =3D { .func =3D agp_flush_cache_ipi };
+ kpreempt_disable();
+ ipi_broadcast(&msg, /*skip_self*/false);
+ ipi_wait(&msg);
+ kpreempt_enable();
+ }
WRITE4(AGP_I830_HIC, READ4(AGP_I830_HIC) | __BIT(31));
while (ISSET(READ4(AGP_I830_HIC), __BIT(31))) {
if (timo-- =3D=3D 0)
--=_jGdl80rZzbJ3AV6nW+2j7DYaEAx46dnv--
State-Changed-From-To: feedback->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 29 Jan 2024 01:09:52 +0000
State-Changed-Why:
might be other problems with this path so feedback would still be good,
but this is definitely a fix for a real problem and I've confirmed it
doesn't break things worse than they were before
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57878 CVS commit: src/sys/dev/pci
Date: Mon, 29 Jan 2024 01:05:56 +0000
Module Name: src
Committed By: riastradh
Date: Mon Jan 29 01:05:55 UTC 2024
Modified Files:
src/sys/dev/pci: agp_i810.c
Log Message:
agp_i810(4): Use ipi(9) for chipset flush on all CPUs, not xcall(9).
i915 now calls into this with a spin lock held, so we have to use
ipi(9), which spin-waits for the other CPUs to complete, rather than
xcall(9), which may sleep-wait.
Fortunately, this is just to execute WBINVD on x86 (and if this code
ever runs on other architectures, which it probably doesn't, it'll be
a similar barrier instruction), so spinning to wait for that on all
CPUs isn't too costly.
PR kern/57878
XXX pullup-10
To generate a diff of this commit:
cvs rdiff -u -r1.125 -r1.126 src/sys/dev/pci/agp_i810.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 29 Jan 2024 02:09:25 +0000
State-Changed-Why:
pullup-10 #574
inapplicable to <10
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57878 CVS commit: [netbsd-10] src/sys/dev/pci
Date: Sat, 3 Feb 2024 14:17:03 +0000
Module Name: src
Committed By: martin
Date: Sat Feb 3 14:17:03 UTC 2024
Modified Files:
src/sys/dev/pci [netbsd-10]: agp_i810.c
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #574):
sys/dev/pci/agp_i810.c: revision 1.126
agp_i810(4): Use ipi(9) for chipset flush on all CPUs, not xcall(9).
i915 now calls into this with a spin lock held, so we have to use
ipi(9), which spin-waits for the other CPUs to complete, rather than
xcall(9), which may sleep-wait.
Fortunately, this is just to execute WBINVD on x86 (and if this code
ever runs on other architectures, which it probably doesn't, it'll be
a similar barrier instruction), so spinning to wait for that on all
CPUs isn't too costly.
PR kern/57878
To generate a diff of this commit:
cvs rdiff -u -r1.125 -r1.125.4.1 src/sys/dev/pci/agp_i810.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sat, 03 Feb 2024 15:21:01 +0000
State-Changed-Why:
fixed and pulled up
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.