NetBSD Problem Report #57878

From www@netbsd.org  Thu Jan 25 23:13:53 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id D6A591A9238
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 25 Jan 2024 23:13:52 +0000 (UTC)
Message-Id: <20240125231351.435C51A9239@mollari.NetBSD.org>
Date: Thu, 25 Jan 2024 23:13:51 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: i915 calls agp_i810_chipset_flush with spin lock held but agp_i810_chipset_flush waits for xcall
X-Send-Pr-Version: www-1.0

>Number:         57878
>Category:       kern
>Synopsis:       i915 calls agp_i810_chipset_flush with spin lock held but agp_i810_chipset_flush waits for xcall
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jan 25 23:15:00 +0000 2024
>Closed-Date:    Sat Feb 03 15:21:01 +0000 2024
>Last-Modified:  Sat Feb 03 15:21:01 +0000 2024
>Originator:     Taylor R Campbell
>Release:        current, 10
>Organization:
The NetAGP Foundation
>Environment:
>Description:
[     1.051264] agp0 at pchb0: i855-family chipset
[     1.051264] agp0: detected 32636k stolen memory
[     1.051264] agp0: aperture at 0xf0000000, size 0x8000000
[     1.051264] Intel 82855GM GMCH Memory Controller (miscellaneous system, revision 0x02) at pci0 dev 0 function 1 not configured
[     1.051264] Intel 82855GM GMCH Configuration Process (miscellaneous system, revision 0x02) at pci0 dev 0 function 3 not configured
[     1.051264] i915drmkms0 at pci0 dev 2 function 0: Intel 82855GM GMCH Integrated Graphics Device (rev. 0x02)
[     1.051264] Intel 82855GM GMCH Integrated Graphics Device (miscellaneous display, revision 0x02) at pci0 dev 2 function 1 not configured
...
[    93.755186] panic: kernel diagnostic assertion "ci->ci_mtx_count == -1" failed: file "/usr/src/sys/kern/kern_synch.c", line 762 mi_switch: cpu0: ci_mtx_count (-2) != -1 (block with spin-mutex held)
[    93.755186] cpu0: Begin traceback...
[    93.755186] vpanic(c13ec9a4,db9ddb1c,db9ddb64,c0d8486b,c13ec9a4,c12d733f,c13ecc68,c13ec7e4,2fa,c12b9380) at netbsd:vpanic+0x184
[    93.755186] kern_assert(c13ec9a4,c12d733f,c13ecc68,c13ec7e4,2fa,c12b9380,0,fffffffe,c0d7fc5b,c32ad400) at netbsd:kern_assert+0x23
[    93.755186] mi_switch(c32ad400,c0d80241,c2c55640,c166dec0,0,c13f4c9f,c1673d84,c1673d84,c1673d80,0) at netbsd:mi_switch+0x81a
[    93.755186] sleepq_block(0,0,c12b86e0,0,0,c166dec0,0,33,c1673d80,db9ddbec) at netbsd:sleepq_block+0x22e
[    93.755186] cv_wait(c1673d84,c1673d80,c4f68104,c4e76964,33,c1673d84,c2c7d960,c2f1b104,20000,db9ddc0c) at netbsd:cv_wait+0xe4
[    93.755186] xc_wait(33,0,0,0,c2c7d960,c2c7b530,db9ddc28,c06756f9,c2f1b104,2) at netbsd:xc_wait+0x5e
[    93.755186] agp_i810_chipset_flush(c2f1b104,2,0,c4e05200,0,db9ddc44,c0729312,c4e0526c,18,c4e05200) at netbsd:agp_i810_chipset_flush+0x85
[    93.755186] intel_gtt_chipset_flush(c4e0526c,18,c4e05200,c3154ac0,c3154b64,db9ddc78,c07137f4,c2c7d960,2cc00,cc00) at netbsd:intel_gtt_chipset_flush+0x1a
[    93.755186] intel_gt_flush_ggtt_writes(c2c7d960,2cc00,cc00,c4e05200,0,c4e052ac,c4e052b0,c071aa7a,c4e05200,0) at netbsd:intel_gt_flush_ggtt_writes+0x87
[    93.755186] i915_gem_object_flush_write_domain(c4e05200,fffffffe,3,c4f68158,c078a8c5,c4e05318,ffff0000,c2c7b004,c4f68104,c4e05200) at netbsd:i915_gem_object_flush_write_domain+0x171
[    93.755186] i915_gem_object_prepare_write(c4e05200,db9ddd24,0,0,0,0,d,0,7,c4f13340) at netbsd:i915_gem_object_prepare_write+0x180
[    93.755186] i915_gem_pwrite_ioctl(c2c7b004,db9ddeac,c4dcc244,c4354e5c,c4e5aa80,0,db9dddc4,20,1d,db9dddc4) at netbsd:i915_gem_pwrite_ioctl+0x6fb
[    93.755186] drm_ioctl(c4234180,8020645d,db9ddeac,db9ddf38,c0dcdbce,c4234180,8020645d,db9ddeac,db9ddeb0,c0d8ad5f) at netbsd:drm_ioctl+0x246
[    93.755186] drm_ioctl_shim(c4234180,8020645d,db9ddeac,db9ddeb0,c0d8ad5f,c161c8e0,ffffffff,0,c4234180,0) at netbsd:drm_ioctl_shim+0x43
[    93.755186] sys_ioctl(c32ad400,db9ddf68,db9ddf60,c4c05a68,0,c32ad400,c1651038,db9ddf68,0,0) at netbsd:sys_ioctl+0x35b
[    93.755186] syscall() at netbsd:syscall+0x194

This happens because:

/* i915_gem_object_flush_write_domain */
...
		spin_lock(&obj->vma.lock);
		for_each_ggtt_vma(vma, obj) {
			if (i915_vma_unset_ggtt_write(vma))
				intel_gt_flush_ggtt_writes(vma->vm->gt);
		}
		spin_unlock(&obj->vma.lock);

/* agp_i810_chipset_flush */
...
			xc_wait(xc_broadcast(0, &agp_flush_cache_xc,
				NULL, NULL));

>How-To-Repeat:
launch X with a DIAGNOSTIC kernel on a machine with an affected Intel graphics chipset
>Fix:
change agp_i810_chipset_flush from xcall(9) to ipi(9)

>Release-Note:

>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Cc: 
Subject: Re: kern/57878: i915 calls agp_i810_chipset_flush with spin lock held but agp_i810_chipset_flush waits for xcall
Date: Fri, 26 Jan 2024 12:47:36 +0000

 This is a multi-part message in MIME format.
 --=_doTlMKKkYbLBtme6fab/51tYZGWhbizH

 The attached patch should fix this issue; awaiting confirmation on an
 affected machine.

 --=_doTlMKKkYbLBtme6fab/51tYZGWhbizH
 Content-Type: text/plain; charset="ISO-8859-1"; name="pr57878-agpcacheflushipi"
 Content-Transfer-Encoding: quoted-printable
 Content-Disposition: attachment; filename="pr57878-agpcacheflushipi.patch"

 From 96a70b66956066f08377d6275970b441b34dc73d Mon Sep 17 00:00:00 2001
 From: Taylor R Campbell <riastradh@NetBSD.org>
 Date: Thu, 25 Jan 2024 23:17:34 +0000
 Subject: [PATCH] agp_i810(4): Use ipi(9) for chipset flush on all CPUs, not
  xcall(9).

 i915 now calls into this with a spin lock held, so we have to use
 ipi(9), which spin-waits for the other CPUs to complete, rather than
 xcall(9), which may sleep-wait.

 Fortunately, this is just to execute WBINVD on x86 (and if this code
 ever runs on other architectures, which it probably doesn't, it'll be
 a similar barrier instruction), so spinning to wait for that on all
 CPUs isn't too costly.

 PR kern/57878

 XXX pullup-10
 ---
  sys/dev/pci/agp_i810.c | 16 +++++++++++-----
  1 file changed, 11 insertions(+), 5 deletions(-)

 diff --git a/sys/dev/pci/agp_i810.c b/sys/dev/pci/agp_i810.c
 index adfb5716d4a3..4d8d9d8969f4 100644
 --- a/sys/dev/pci/agp_i810.c
 +++ b/sys/dev/pci/agp_i810.c
 @@ -180,7 +180,7 @@ agp_i810_post_gtt_entry(struct agp_i810_softc *isc, off=
 _t off)
  }
 =20
  static void
 -agp_flush_cache_xc(void *a __unused, void *b __unused)
 +agp_flush_cache_ipi(void *cookie __unused)
  {
 =20
  	agp_flush_cache();
 @@ -204,11 +204,17 @@ agp_i810_chipset_flush(struct agp_i810_softc *isc)
  		 * XXX Come to think of it, do these chipsets appear in
  		 * any multi-CPU systems?
  		 */
 -		if (cold)
 +		if (cold) {
  			agp_flush_cache();
 -		else
 -			xc_wait(xc_broadcast(0, &agp_flush_cache_xc,
 -				NULL, NULL));
 +		} else {
 +			/*
 +			 * Caller may hold a spin lock, so use ipi(9)
 +			 * rather than xcall(9) here.
 +			 */
 +			ipi_msg_t msg =3D { .func =3D agp_flush_cache_ipi };
 +			ipi_broadcast(&msg, /*skip_self*/false);
 +			ipi_wait(&msg);
 +		}
  		WRITE4(AGP_I830_HIC, READ4(AGP_I830_HIC) | __BIT(31));
  		while (ISSET(READ4(AGP_I830_HIC), __BIT(31))) {
  			if (timo-- =3D=3D 0)

 --=_doTlMKKkYbLBtme6fab/51tYZGWhbizH--

State-Changed-From-To: open->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Fri, 26 Jan 2024 12:51:38 +0000
State-Changed-Why:
awaiting feedback from user


From: Taylor R Campbell <riastradh@NetBSD.org>
To: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Cc: 
Subject: Re: kern/57878: i915 calls agp_i810_chipset_flush with spin lock held but agp_i810_chipset_flush waits for xcall
Date: Fri, 26 Jan 2024 22:26:39 +0000

 This is a multi-part message in MIME format.
 --=_jGdl80rZzbJ3AV6nW+2j7DYaEAx46dnv

 Correction: need kpreempt_disable/enable around ipi(9) if we _don't_
 hold a spin lock when we call it.

 --=_jGdl80rZzbJ3AV6nW+2j7DYaEAx46dnv
 Content-Type: text/plain; charset="ISO-8859-1"; name="pr57878-agpcacheflushipi-v2"
 Content-Transfer-Encoding: quoted-printable
 Content-Disposition: attachment; filename="pr57878-agpcacheflushipi-v2.patch"

 From eac456a8881930d2b1f66a87850eaa13be51d093 Mon Sep 17 00:00:00 2001
 From: Taylor R Campbell <riastradh@NetBSD.org>
 Date: Thu, 25 Jan 2024 23:17:34 +0000
 Subject: [PATCH] agp_i810(4): Use ipi(9) for chipset flush on all CPUs, not
  xcall(9).

 i915 now calls into this with a spin lock held, so we have to use
 ipi(9), which spin-waits for the other CPUs to complete, rather than
 xcall(9), which may sleep-wait.

 Fortunately, this is just to execute WBINVD on x86 (and if this code
 ever runs on other architectures, which it probably doesn't, it'll be
 a similar barrier instruction), so spinning to wait for that on all
 CPUs isn't too costly.

 PR kern/57878

 XXX pullup-10
 ---
  sys/dev/pci/agp_i810.c | 18 +++++++++++++-----
  1 file changed, 13 insertions(+), 5 deletions(-)

 diff --git a/sys/dev/pci/agp_i810.c b/sys/dev/pci/agp_i810.c
 index adfb5716d4a3..961a3a5b077b 100644
 --- a/sys/dev/pci/agp_i810.c
 +++ b/sys/dev/pci/agp_i810.c
 @@ -180,7 +180,7 @@ agp_i810_post_gtt_entry(struct agp_i810_softc *isc, off=
 _t off)
  }
 =20
  static void
 -agp_flush_cache_xc(void *a __unused, void *b __unused)
 +agp_flush_cache_ipi(void *cookie __unused)
  {
 =20
  	agp_flush_cache();
 @@ -204,11 +204,19 @@ agp_i810_chipset_flush(struct agp_i810_softc *isc)
  		 * XXX Come to think of it, do these chipsets appear in
  		 * any multi-CPU systems?
  		 */
 -		if (cold)
 +		if (cold) {
  			agp_flush_cache();
 -		else
 -			xc_wait(xc_broadcast(0, &agp_flush_cache_xc,
 -				NULL, NULL));
 +		} else {
 +			/*
 +			 * Caller may hold a spin lock, so use ipi(9)
 +			 * rather than xcall(9) here.
 +			 */
 +			ipi_msg_t msg =3D { .func =3D agp_flush_cache_ipi };
 +			kpreempt_disable();
 +			ipi_broadcast(&msg, /*skip_self*/false);
 +			ipi_wait(&msg);
 +			kpreempt_enable();
 +		}
  		WRITE4(AGP_I830_HIC, READ4(AGP_I830_HIC) | __BIT(31));
  		while (ISSET(READ4(AGP_I830_HIC), __BIT(31))) {
  			if (timo-- =3D=3D 0)

 --=_jGdl80rZzbJ3AV6nW+2j7DYaEAx46dnv--

State-Changed-From-To: feedback->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 29 Jan 2024 01:09:52 +0000
State-Changed-Why:
might be other problems with this path so feedback would still be good,
but this is definitely a fix for a real problem and I've confirmed it
doesn't break things worse than they were before


From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57878 CVS commit: src/sys/dev/pci
Date: Mon, 29 Jan 2024 01:05:56 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Jan 29 01:05:55 UTC 2024

 Modified Files:
 	src/sys/dev/pci: agp_i810.c

 Log Message:
 agp_i810(4): Use ipi(9) for chipset flush on all CPUs, not xcall(9).

 i915 now calls into this with a spin lock held, so we have to use
 ipi(9), which spin-waits for the other CPUs to complete, rather than
 xcall(9), which may sleep-wait.

 Fortunately, this is just to execute WBINVD on x86 (and if this code
 ever runs on other architectures, which it probably doesn't, it'll be
 a similar barrier instruction), so spinning to wait for that on all
 CPUs isn't too costly.

 PR kern/57878

 XXX pullup-10


 To generate a diff of this commit:
 cvs rdiff -u -r1.125 -r1.126 src/sys/dev/pci/agp_i810.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 29 Jan 2024 02:09:25 +0000
State-Changed-Why:
pullup-10 #574
inapplicable to <10


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57878 CVS commit: [netbsd-10] src/sys/dev/pci
Date: Sat, 3 Feb 2024 14:17:03 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sat Feb  3 14:17:03 UTC 2024

 Modified Files:
 	src/sys/dev/pci [netbsd-10]: agp_i810.c

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #574):

 	sys/dev/pci/agp_i810.c: revision 1.126

 agp_i810(4): Use ipi(9) for chipset flush on all CPUs, not xcall(9).

 i915 now calls into this with a spin lock held, so we have to use
 ipi(9), which spin-waits for the other CPUs to complete, rather than
 xcall(9), which may sleep-wait.

 Fortunately, this is just to execute WBINVD on x86 (and if this code
 ever runs on other architectures, which it probably doesn't, it'll be
 a similar barrier instruction), so spinning to wait for that on all
 CPUs isn't too costly.

 PR kern/57878


 To generate a diff of this commit:
 cvs rdiff -u -r1.125 -r1.125.4.1 src/sys/dev/pci/agp_i810.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sat, 03 Feb 2024 15:21:01 +0000
State-Changed-Why:
fixed and pulled up


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.