NetBSD Problem Report #57537
From mrg@eterna.com.au Sat Jul 22 06:40:49 2023
Return-Path: <mrg@eterna.com.au>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id F1DCC1A923E
for <gnats-bugs@gnats.NetBSD.org>; Sat, 22 Jul 2023 06:40:48 +0000 (UTC)
Message-Id: <20230722063734.3D4E81590CF@splode.eterna.com.au>
Date: Sat, 22 Jul 2023 16:37:34 +1000 (AEST)
From: mrg@eterna.com.au
Reply-To: mrg@eterna.com.au
To: gnats-bugs@NetBSD.org
Subject: radeon drm hangs with multiple glxgears active
X-Send-Pr-Version: 3.95
>Number: 57537
>Category: kern
>Synopsis: radeon drm hangs with multiple glxgears active
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: riastradh
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jul 22 06:45:00 +0000 2023
>Closed-Date: Wed Aug 02 13:09:07 +0000 2023
>Last-Modified: Wed Aug 02 13:09:07 +0000 2023
>Originator: matthew green
>Release: NetBSD 10.99.6 amd64
>Organization:
people's front against (bozotic) www (softwar foundation)
>Environment:
-10 or -current amd64.
radeon0 at pci1 dev 0 function 0: ATI Technologies Mobility Radeon HD 4670 (rev. 0x00)
[ ... ]
[drm] initializing kernel modesetting (RV730 0x1002:0x9488 0x1028:0x02FE 0x00).
[ .. normal looking messages, no errors ]
>Description:
on a system with radeon 4670 running 8 concurrent glxgears
pretty quickly has at least 3-5 of them lock up and stop
spinning the gear. (the system has 8 cpu threads.)
crash(8) shows that each of these glxgears has 3 lwps, two
are in lwp_park() and for the 5 stuck instances right now,
this is the kernel stack trace (all the same):
crash> bt/a ffff8fb035453140
trace: pid 1159 lid 1972 at 0xffff928243e406f0
sleepq_block() at sleepq_block+0x166
cv_wait_sig() at cv_wait_sig+0x55
ww_mutex_lock_wait_sig() at ww_mutex_lock_wait_sig+0xab
linux_ww_mutex_lock_interruptible() at linux_ww_mutex_lock_interruptible+0x1d3
ttm_eu_reserve_buffers() at ttm_eu_reserve_buffers+0x1a6
radeon_bo_list_validate() at radeon_bo_list_validate+0xab
radeon_cs_ioctl() at radeon_cs_ioctl+0x975
drm_ioctl() at drm_ioctl+0x260
drm_ioctl_shim() at drm_ioctl_shim+0x45
sys_ioctl() at sys_ioctl+0x5d3
syscall() at syscall+0x1ae
--- syscall (number 54) ---
syscall+0x1ae:
>How-To-Repeat:
boot on r600 system, run concurrent glxgears with
"vblank_mode=1" in the environment.
>Fix:
>Release-Note:
>Audit-Trail:
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: re: kern/57537: radeon drm hangs with multiple glxgears active
Date: Sat, 22 Jul 2023 16:59:59 +1000
i realised that i've been near this problem before, i've got
a patch that printf()s instead of panic()s here, and it's
firing in dmesg:
ww_mutex_lock_wait_sig:408: nopanic: ww mutex class mismatch: 0xffffffff8109e4c0 != 0xffffffff804b89d6
see patch below. i'll work on getting more info (stack trace
at the very least).
.mrg.
Index: sys/external/bsd/drm2/linux/linux_ww_mutex.c
===================================================================
RCS file: /cvsroot/src/sys/external/bsd/drm2/linux/linux_ww_mutex.c,v
retrieving revision 1.14
diff -p -u -r1.14 linux_ww_mutex.c
--- sys/external/bsd/drm2/linux/linux_ww_mutex.c 18 Mar 2022 23:33:41 -0000 1.14
+++ sys/external/bsd/drm2/linux/linux_ww_mutex.c 22 Jul 2023 06:57:19 -0000
@@ -398,9 +398,16 @@ ww_mutex_lock_wait_sig(struct ww_mutex *
KASSERT((mutex->wwm_state == WW_CTX) ||
(mutex->wwm_state == WW_WANTOWN));
KASSERT(mutex->wwm_u.ctx != ctx);
+#if 0
KASSERTMSG((ctx->wwx_class == mutex->wwm_u.ctx->wwx_class),
"ww mutex class mismatch: %p != %p",
ctx->wwx_class, mutex->wwm_u.ctx->wwx_class);
+#else
+ if (ctx->wwx_class != mutex->wwm_u.ctx->wwx_class)
+ printf("%s:%d: nopanic: ww mutex class mismatch: %p != %p\n",
+ __func__, __LINE__,
+ ctx->wwx_class, mutex->wwm_u.ctx->wwx_class);
+#endif
KASSERTMSG((mutex->wwm_u.ctx->wwx_ticket != ctx->wwx_ticket),
"ticket number reused: %"PRId64" (%p) %"PRId64" (%p)",
ctx->wwx_ticket, ctx,
@@ -751,9 +758,16 @@ retry: switch (mutex->wwm_state) {
* Owned by a higher-priority party. Tell the caller
* to unlock everything and start over.
*/
+#if 0
KASSERTMSG((ctx->wwx_class == mutex->wwm_u.ctx->wwx_class),
"ww mutex class mismatch: %p != %p",
ctx->wwx_class, mutex->wwm_u.ctx->wwx_class);
+#else
+ if (!(ctx->wwx_class == mutex->wwm_u.ctx->wwx_class))
+ printf("%s:%d: nopanic: ww mutex class mismatch: %p != %p\n",
+ __func__, __LINE__,
+ ctx->wwx_class, mutex->wwm_u.ctx->wwx_class);
+#endif
ret = -EDEADLK;
goto out_unlock;
}
Responsible-Changed-From-To: kern-bug-people->riastradh
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Sat, 29 Jul 2023 22:44:44 +0000
Responsible-Changed-Why:
my bug
State-Changed-From-To: open->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sat, 29 Jul 2023 22:44:44 +0000
State-Changed-Why:
fix committed
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57537 CVS commit: src/sys/external/bsd/drm2/linux
Date: Sat, 29 Jul 2023 22:43:56 +0000
Module Name: src
Committed By: riastradh
Date: Sat Jul 29 22:43:56 UTC 2023
Modified Files:
src/sys/external/bsd/drm2/linux: linux_ww_mutex.c
Log Message:
drm/linux_ww_mutex: Fix wait loops.
If cv_wait_sig returns because a signal is delivered, we may
nonetheless have been granted the lock. It is harmless for us to
ignore this fact in three of the four paths, but in
ww_mutex_state_wait_sig, we may now have ownership of the lock and
MUST NOT return failure because the caller MUST release the lock
before destroying the ww_acquire_ctx.
While here, restructure the other three loops for clarity, so they
match the structure of the fourth and so they have a little less
impenetrable negation.
PR kern/57537
XXX pullup-8
XXX pullup-9
XXX pullup-10
To generate a diff of this commit:
cvs rdiff -u -r1.14 -r1.15 src/sys/external/bsd/drm2/linux/linux_ww_mutex.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
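The failure mode described in the log message above can be illustrated with a
small userland model. This is only a sketch, not the code from
linux_ww_mutex.c: struct model_lock, struct wakeup, model_wait_sig(),
lock_wait_buggy() and lock_wait_fixed() are all hypothetical names, and
model_wait_sig() merely stands in for cv_wait_sig() by replaying a scripted
wakeup. The one thing it tries to show is the ordering of the two checks after
the wait returns: if the error is tested first, a wakeup that carries both a
signal and a lock grant returns failure while the caller already owns the lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a lock: just records which context owns it. */
struct model_lock {
	const void *owner;	/* owning context, or NULL if not ours */
};

/* One scripted wakeup from the (modelled) interruptible wait. */
struct wakeup {
	bool granted;		/* previous owner handed the lock to us */
	bool signalled;		/* a signal was delivered during the sleep */
};

/*
 * Stand-in for cv_wait_sig() (an assumption of this sketch): applies the
 * scripted grant, then reports EINTR if a signal was delivered.  Both
 * can happen in the same wakeup.
 */
static int
model_wait_sig(struct model_lock *lk, const void *self,
    const struct wakeup *w)
{
	if (w->granted)
		lk->owner = self;
	return w->signalled ? EINTR : 0;
}

/* Buggy ordering: the error is checked before ownership. */
static int
lock_wait_buggy(struct model_lock *lk, const void *self,
    const struct wakeup *script, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int error = model_wait_sig(lk, self, &script[i]);
		if (error)
			return -EINTR;	/* may fail while owning the lock */
		if (lk->owner == self)
			return 0;
	}
	return -EAGAIN;			/* script exhausted (model only) */
}

/* Fixed ordering: ownership is checked first and wins over the signal. */
static int
lock_wait_fixed(struct model_lock *lk, const void *self,
    const struct wakeup *script, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int error = model_wait_sig(lk, self, &script[i]);
		if (lk->owner == self)
			return 0;	/* we hold the lock: report success */
		if (error)
			return -EINTR;	/* interrupted and still not owner */
	}
	return -EAGAIN;			/* script exhausted (model only) */
}
```

Feeding both functions a script whose last wakeup has granted and signalled
both set, lock_wait_buggy() returns -EINTR with lk->owner already pointing at
the caller, so a caller that trusts the return value never releases the lock.
lock_wait_fixed() returns 0 for the same script, which tells the caller it must
unlock.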
State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sun, 30 Jul 2023 12:36:47 +0000
State-Changed-Why:
pullup-10 #298
(more work needed for netbsd-9 or netbsd-8)
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57537 CVS commit: [netbsd-10] src/sys/external/bsd/drm2/linux
Date: Tue, 1 Aug 2023 16:53:19 +0000
Module Name: src
Committed By: martin
Date: Tue Aug 1 16:53:18 UTC 2023
Modified Files:
src/sys/external/bsd/drm2/linux [netbsd-10]: linux_ww_mutex.c
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #298):
sys/external/bsd/drm2/linux/linux_ww_mutex.c: revision 1.15
drm/linux_ww_mutex: Fix wait loops.
If cv_wait_sig returns because a signal is delivered, we may
nonetheless have been granted the lock. It is harmless for us to
ignore this fact in three of the four paths, but in
ww_mutex_state_wait_sig, we may now have ownership of the lock and
MUST NOT return failure because the caller MUST release the lock
before destroying the ww_acquire_ctx.
While here, restructure the other three loops for clarity, so they
match the structure of the fourth and so they have a little less
impenetrable negation.
PR kern/57537
To generate a diff of this commit:
cvs rdiff -u -r1.14 -r1.14.4.1 \
src/sys/external/bsd/drm2/linux/linux_ww_mutex.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57537 CVS commit: [netbsd-9] src/sys/external/bsd/drm2/linux
Date: Tue, 1 Aug 2023 17:26:29 +0000
Module Name: src
Committed By: martin
Date: Tue Aug 1 17:26:28 UTC 2023
Modified Files:
src/sys/external/bsd/drm2/linux [netbsd-9]: linux_ww_mutex.c
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #1696):
sys/external/bsd/drm2/linux/linux_ww_mutex.c: revision 1.15
drm/linux_ww_mutex: Fix wait loops.
If cv_wait_sig returns because a signal is delivered, we may
nonetheless have been granted the lock. It is harmless for us to
ignore this fact in three of the four paths, but in
ww_mutex_state_wait_sig, we may now have ownership of the lock and
MUST NOT return failure because the caller MUST release the lock
before destroying the ww_acquire_ctx.
While here, restructure the other three loops for clarity, so they
match the structure of the fourth and so they have a little less
impenetrable negation.
PR kern/57537
To generate a diff of this commit:
cvs rdiff -u -r1.7.2.2 -r1.7.2.3 \
src/sys/external/bsd/drm2/linux/linux_ww_mutex.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57537 CVS commit: [netbsd-8] src/sys/external/bsd/drm2/linux
Date: Tue, 1 Aug 2023 17:29:15 +0000
Module Name: src
Committed By: martin
Date: Tue Aug 1 17:29:15 UTC 2023
Modified Files:
src/sys/external/bsd/drm2/linux [netbsd-8]: linux_ww_mutex.c
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #1876):
sys/external/bsd/drm2/linux/linux_ww_mutex.c: revision 1.15
drm/linux_ww_mutex: Fix wait loops.
If cv_wait_sig returns because a signal is delivered, we may
nonetheless have been granted the lock. It is harmless for us to
ignore this fact in three of the four paths, but in
ww_mutex_state_wait_sig, we may now have ownership of the lock and
MUST NOT return failure because the caller MUST release the lock
before destroying the ww_acquire_ctx.
While here, restructure the other three loops for clarity, so they
match the structure of the fourth and so they have a little less
impenetrable negation.
PR kern/57537
To generate a diff of this commit:
cvs rdiff -u -r1.2.10.5 -r1.2.10.6 \
src/sys/external/bsd/drm2/linux/linux_ww_mutex.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 02 Aug 2023 13:09:07 +0000
State-Changed-Why:
fixed and pulled up to 10, 9, and 8
>Unformatted:
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.