NetBSD Problem Report #56561

From www@netbsd.org  Mon Dec 20 15:36:54 2021
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 3F15C1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 20 Dec 2021 15:36:54 +0000 (UTC)
Message-Id: <20211220153653.4AB631A923B@mollari.NetBSD.org>
Date: Mon, 20 Dec 2021 15:36:53 +0000 (UTC)
From: prlw1@cam.ac.uk
Reply-To: prlw1@cam.ac.uk
To: gnats-bugs@NetBSD.org
Subject: cv_is_valid assertion failure in intel drm
X-Send-Pr-Version: www-1.0

>Number:         56561
>Category:       kern
>Synopsis:       cv_is_valid assertion failure in intel drm
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Dec 20 15:40:00 +0000 2021
>Closed-Date:    Sun Aug 28 13:37:41 +0000 2022
>Last-Modified:  Sun Aug 28 13:37:41 +0000 2022
>Originator:     Patrick Welche
>Release:        NetBSD-9.99.92/amd64 of 20 Dec 2021
>Organization:
>Environment:
>Description:
Trying out the new drm code on a

i915drmkms0 at pci0 dev 2 function 0: Intel UHD Graphics 620 (rev. 0x02)

which now has hardware acceleration(!) I happened across this panic:

(gdb) print panicstr
$1 = 0xffffffff810f8600 <scratchstr> "kernel diagnostic assertion \"cv_is_valid(cv)\" failed: file \"../../../../kern/kern_condvar.c\", line 511 "
(gdb) bt
#0  0xffffffff80222705 in cpu_reboot (howto=howto@entry=260, 
    bootstr=bootstr@entry=0x0) at ../../../../arch/amd64/amd64/machdep.c:713
#1  0xffffffff808c69c4 in kern_reboot (howto=howto@entry=260, 
    bootstr=bootstr@entry=0x0) at ../../../../kern/kern_reboot.c:73
#2  0xffffffff8090909a in vpanic (
    fmt=0xffffffff80d8e280 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0xffffb6813c48edc8) at ../../../../kern/subr_prf.c:290
#3  0xffffffff80a5bb47 in kern_assert (
    fmt=fmt@entry=0xffffffff80d8e280 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ") at ../../../../../../lib/libkern/kern_assert.c:51
#4  0xffffffff80894d9a in cv_broadcast (cv=0xffffb6814e554b50)
    at ../../../../kern/kern_condvar.c:511
#5  0xffffffff80a2be73 in linux___dma_fence_signal_wake (
    fence=0xffffa1db7f364840, timestamp=<optimized out>)
    at ../../../../external/bsd/drm2/linux/linux_dma_fence.c:1176
#6  0xffffffff80575b7d in signal_irq_work (work=0xffffb6801fac41e8)
    at ../../../../external/bsd/drm2/dist/drm/i915/gt/intel_breadcrumbs.c:218
#7  0xffffffff80a30939 in irq_work_intr (cookie=<optimized out>)
    at ../../../../external/bsd/drm2/linux/linux_irq_work.c:74
#8  0xffffffff808d4ff0 in softint_execute (s=5, l=0xffffa1debe5fe4c0)
    at ../../../../kern/kern_softint.c:565
#9  softint_dispatch (pinned=<optimized out>, s=5)
    at ../../../../kern/kern_softint.c:814
#10 0xffffffff8021d7ff in Xsoftintr ()
#11 0xa6658f2fddbb7892 in ?? ()
#12 0xa6658f2fddbb7892 in ?? ()
Backtrace stopped: Cannot access memory at address 0xffffb6813c48f000
(gdb) frame 4
#4  0xffffffff80894d9a in cv_broadcast (cv=0xffffb6814e554b50)
    at ../../../../kern/kern_condvar.c:511
511             KASSERT(cv_is_valid(cv));
(gdb) list
506      */
507     void
508     cv_broadcast(kcondvar_t *cv)
509     {
510
511             KASSERT(cv_is_valid(cv));
512
513             if (__predict_false(!LIST_EMPTY(CV_SLEEPQ(cv))))
514                     cv_wakeup_all(cv);
515     }
(gdb) print *cv
$1 = {cv_opaque = {0x0, 0xffffffff80d69100 <deadcv>}}
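
Note on the output above: the second slot of cv_opaque points at the symbol deadcv, which suggests the condvar had already been destroyed by the time cv_broadcast ran. The following is a minimal userland sketch of that kind of poison-on-destroy check. It assumes that destroying a condvar stores a sentinel pointer which the validity check then compares against; kern_condvar.c is not quoted here beyond line 511, and every model_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland model; this is not the kernel's kcondvar_t. */
struct model_cv {
	void *sleepq;		/* stand-in for the sleep queue head */
	const char *wmesg;	/* wait message; doubles as a liveness tag */
};

/* Sentinel whose address marks a destroyed condvar. */
static const char model_deadcv[] = "deadcv";

static void
model_cv_init(struct model_cv *cv, const char *wmesg)
{
	cv->sleepq = NULL;
	cv->wmesg = wmesg;
}

static void
model_cv_destroy(struct model_cv *cv)
{
	/* Poison the tag so that any later use can be detected. */
	cv->wmesg = model_deadcv;
}

static bool
model_cv_is_valid(const struct model_cv *cv)
{
	/* Pointer comparison against the sentinel, not a string compare. */
	return cv->wmesg != NULL && cv->wmesg != model_deadcv;
}
```

In this model, a broadcast on a destroyed condvar would trip an assertion on model_cv_is_valid(), which matches the shape of the panic above.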

>How-To-Repeat:
Not entirely sure: things have been pretty stable...
>Fix:

>Release-Note:

>Audit-Trail:
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56561: cv_is_valid assertion failure in intel drm
Date: Tue, 21 Dec 2021 10:18:42 +0000

 While "idle" (running X) same laptop/kernel just fell over with:

 Crash version 9.99.92, image version 9.99.92.
 crash: _kvm_kvatop(0)
 Kernel compiled without options LOCKDEBUG.
 System panicked: trap
 Backtrace from time of crash is available.
 crash> bt
 _KERNEL_OPT_NVGA_RASTERCONSOLE() at 0
 _KERNEL_OPT_PMS_DISABLE_POWERHOOK() at ffffa6814e59a000
 sys_reboot() at sys_reboot
 vpanic() at vpanic+0x154
 device_printf() at device_printf
 startlwp() at startlwp
 calltrap() at calltrap+0x19
 intel_disable_ddi() at intel_disable_ddi+0xa3
 intel_encoders_disable() at intel_encoders_disable+0x90
 hsw_crtc_disable() at hsw_crtc_disable+0x13
 intel_old_crtc_state_disables() at intel_old_crtc_state_disables+0x11c
 intel_atomic_commit_tail() at intel_atomic_commit_tail+0xf1b
 intel_atomic_commit() at intel_atomic_commit+0x29e
 drm_atomic_connector_commit_dpms() at drm_atomic_connector_commit_dpms+0xe0
 drm_mode_obj_set_property_ioctl() at drm_mode_obj_set_property_ioctl+0x162
 drm_connector_property_set_ioctl() at drm_connector_property_set_ioctl+0x27
 drm_ioctl() at drm_ioctl+0x2cb
 drm_ioctl_shim() at drm_ioctl_shim+0x2f
 sys_ioctl() at sys_ioctl+0x555
 syscall() at syscall+0x18c
 --- syscall (number 54) ---
 syscall+0x18c:

 (gdb couldn't make sense of the corefile(!))

 PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
 1511 >1511 7   1        40   ffffed90b3166240                  X
 0    > 189 7   2       240   ffffed90adf20bc0            ioflush
 0    > 163 7   0       200   ffffed90ada62580          nd6_timer
 0    > 202 7   3       240   ffffed90ad7f12c0               iic0
 0    > 124 1   7       201   ffffed90acf791c0             idle/7
 0    > 118 1   6       201   ffffed90acf2c140             idle/6
 0    > 112 1   5       201   ffffed90aceaf0c0             idle/5
 0    > 106 1   4       201   ffffed90ace52040             idle/4
 0    >   6 7   0       200   ffffed9408ffe4c0          softser/0
 0    >   5 7   0       200   ffffed9408ffe080          softclk/0

From: Taylor R Campbell <riastradh@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: prlw1@cam.ac.uk
Subject: Re: kern/56561: cv_is_valid assertion failure in intel drm
Date: Fri, 24 Dec 2021 13:47:11 +0000

 Do you have dmesg from the crash dump of the latest crash?  Wondering
 whether the original trap description is there.

From: Patrick Welche <prlw1@cam.ac.uk>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/56561: cv_is_valid assertion failure in intel drm
Date: Sun, 26 Dec 2021 17:54:13 +0000

 On Fri, Dec 24, 2021 at 01:47:11PM +0000, Taylor R Campbell wrote:
 > Do you have dmesg from the crash dump of the latest crash?  Wondering
 > whether the original trap description is there.

 Indeed it is! It turns out the trap happens more frequently, and
 usually when idle. This is with a 22 Dec 9.99.93 kernel:

 prevented execution of 0x0 (SMEP)
 fatal page fault in supervisor mode
 trap type 6 code 0x10 rip 0 cs 0x8 rflags 0x10246 cr2 0 ilevel 0 rsp 0xffff94014e33a898
 curlwp 0xffffc78220fd6b00 pid 1521.1521 lowest kstack 0xffff94014e3362c0
 panic: trap
 cpu1: Begin traceback...
 vpanic() at netbsd:vpanic+0x14a
 panic() at netbsd:panic+0x3c
 trap() at netbsd:trap+0xa7d
 --- trap (number 6) ---
 ?() at 0
 intel_disable_ddi() at netbsd:intel_disable_ddi+0xa3
 intel_encoders_disable() at netbsd:intel_encoders_disable+0x90
 hsw_crtc_disable() at netbsd:hsw_crtc_disable+0x13
 intel_old_crtc_state_disables() at netbsd:intel_old_crtc_state_disables+0x11c
 intel_atomic_commit_tail() at netbsd:intel_atomic_commit_tail+0xf1b
 intel_atomic_commit() at netbsd:intel_atomic_commit+0x29e
 drm_atomic_connector_commit_dpms() at netbsd:drm_atomic_connector_commit_dpms+0xe0
 drm_mode_obj_set_property_ioctl() at netbsd:drm_mode_obj_set_property_ioctl+0x162
 drm_connector_property_set_ioctl() at netbsd:drm_connector_property_set_ioctl+0x27
 drm_ioctl() at netbsd:drm_ioctl+0x2cb
 drm_ioctl_shim() at netbsd:drm_ioctl_shim+0x2f
 sys_ioctl() at netbsd:sys_ioctl+0x555
 syscall() at netbsd:syscall+0x18c
 --- syscall (number 54) ---
 netbsd:syscall+0x18c:
 cpu1: End traceback...

From: Taylor R Campbell <riastradh@NetBSD.org>
To: Patrick Welche <prlw1@cam.ac.uk>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/56561: cv_is_valid assertion failure in intel drm
Date: Sun, 26 Dec 2021 17:57:50 +0000

 > Date: Sun, 26 Dec 2021 17:54:13 +0000
 > From: Patrick Welche <prlw1@cam.ac.uk>
 > 
 > Indeed it is! It turns out the trap happens more frequently, and
 > usually when idle. This is with a 22 Dec 9.99.93 kernel:
 > 
 > prevented execution of 0x0 (SMEP)
 > [...]
 > intel_disable_ddi() at netbsd:intel_disable_ddi+0xa3

 Got a line number for intel_disable_ddi+0xa3?

From: Patrick Welche <prlw1@cam.ac.uk>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/56561: cv_is_valid assertion failure in intel drm
Date: Sun, 26 Dec 2021 18:11:16 +0000

 On Sun, Dec 26, 2021 at 05:57:50PM +0000, Taylor R Campbell wrote:
 > > Date: Sun, 26 Dec 2021 17:54:13 +0000
 > > From: Patrick Welche <prlw1@cam.ac.uk>
 > > 
 > > Indeed it is! It turns out the trap happens more frequently, and
 > > usually when idle. This is with a 22 Dec 9.99.93 kernel:
 > > 
 > > prevented execution of 0x0 (SMEP)
 > > [...]
 > > intel_disable_ddi() at netbsd:intel_disable_ddi+0xa3
 > 
 > Got a line number for intel_disable_ddi+0xa3?

 Sort of? A3=163

 Dump of assembler code for function intel_disable_ddi:
    0xffffffff804e9d6d <+0>:       push   %rbp
 ...
    0xffffffff804e9e0b <+158>:     call   0xffffffff80521597 <intel_edp_backlight_off>
    0xffffffff804e9e10 <+163>:     xor    %edx,%edx    <----
    0xffffffff804e9e12 <+165>:     mov    %r12,%rsi
    0xffffffff804e9e15 <+168>:     mov    %r15,%rdi
    0xffffffff804e9e18 <+171>:     pop    %r12
    0xffffffff804e9e1a <+173>:     pop    %r13
    0xffffffff804e9e1c <+175>:     pop    %r14
    0xffffffff804e9e1e <+177>:     pop    %r15
    0xffffffff804e9e20 <+179>:     pop    %rbp
    0xffffffff804e9e21 <+180>:     jmp    0xffffffff805215fd <intel_dp_sink_set_decompression_state>

 (I'm clearly not getting addr2line right...)
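
The offset in the traceback is hexadecimal, so +0xa3 is +163: the instruction immediately after the call at <+158>. A traceback records the return address, so the call of interest is the one just before the arrow. The sketch below extracts the offset from a frame string so it can be added to the function's start address; frame_offset is a hypothetical helper, not part of any NetBSD tool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "symbol+0xNN" as printed in the traceback and return the
 * offset, or -1 if the frame has no "+offset" part.
 * Hypothetical helper for illustration only.
 */
static long
frame_offset(const char *frame)
{
	const char *plus = strrchr(frame, '+');
	char *end;
	long off;

	if (plus == NULL)
		return -1;
	/* Base 16 makes strtol accept an optional "0x" prefix. */
	off = strtol(plus + 1, &end, 16);
	if (end == plus + 1)
		return -1;	/* no digits after the '+' */
	return off;
}
```

With the start address from the disassembly, 0xffffffff804e9d6d + 163 gives 0xffffffff804e9e10, which is the absolute address a tool such as addr2line would expect rather than the symbol+offset form.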

From: Taylor R Campbell <riastradh@NetBSD.org>
To: Patrick Welche <prlw1@cam.ac.uk>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/56561: cv_is_valid assertion failure in intel drm
Date: Sun, 26 Dec 2021 19:35:46 +0000

 This is a multi-part message in MIME format.
 --=_iH22Mq8cEGadrlwrn55fhppMX7+AHtfW

 Can you try the attached patch?

 You might be able to accelerate testing by explicitly asking to blank
 the screen, maybe by screenblank(1), xset dpms, or xscreensaver.

 --=_iH22Mq8cEGadrlwrn55fhppMX7+AHtfW
 Content-Type: text/plain; charset="ISO-8859-1"; name="intelpanel"
 Content-Transfer-Encoding: quoted-printable
 Content-Disposition: attachment; filename="intelpanel.patch"

 From 26fda8a04b487a3a59e6940cbd02faf0fcc68a8a Mon Sep 17 00:00:00 2001
 From: Taylor R Campbell <riastradh@NetBSD.org>
 Date: Sun, 26 Dec 2021 19:33:15 +0000
 Subject: [PATCH] i915: Unifdef cnp_enable/disable_backlight.

 Not sure why this was ifdef'd out in the first place!  Appears to
 have been a mistake in merge.
 ---
  sys/external/bsd/drm2/dist/drm/i915/display/intel_panel.c | 6 ++----
  1 file changed, 2 insertions(+), 4 deletions(-)

 diff --git a/sys/external/bsd/drm2/dist/drm/i915/display/intel_panel.c b/sys/external/bsd/drm2/dist/drm/i915/display/intel_panel.c
 index a5204f61f3ab..c0f192e7bc73 100644
 --- a/sys/external/bsd/drm2/dist/drm/i915/display/intel_panel.c
 +++ b/sys/external/bsd/drm2/dist/drm/i915/display/intel_panel.c
 @@ -831,7 +831,6 @@ static void bxt_disable_backlight(const struct drm_connector_state *old_conn_sta
  	}
  }
  
 -#ifndef __NetBSD__		/* XXX mipi */
  static void cnp_disable_backlight(const struct drm_connector_state *old_conn_state)
  {
  	struct intel_connector *connector = to_intel_connector(old_conn_state->connector);
 @@ -846,6 +845,7 @@ static void cnp_disable_backlight(const struct drm_connector_state *old_conn_sta
  		   tmp & ~BXT_BLC_PWM_ENABLE);
  }
  
 +#ifndef __NetBSD__		/* XXX mipi */
  static void pwm_disable_backlight(const struct drm_connector_state *old_conn_state)
  {
  	struct intel_connector *connector = to_intel_connector(old_conn_state->connector);
 @@ -1138,7 +1138,6 @@ static void bxt_enable_backlight(const struct intel_crtc_state *crtc_state,
  			pwm_ctl | BXT_BLC_PWM_ENABLE);
  }
  
 -#ifndef __NetBSD__		/* XXX mipi */
  static void cnp_enable_backlight(const struct intel_crtc_state *crtc_state,
  				 const struct drm_connector_state *conn_state)
  {
 @@ -1170,6 +1169,7 @@ static void cnp_enable_backlight(const struct intel_crtc_state *crtc_state,
  		   pwm_ctl | BXT_BLC_PWM_ENABLE);
  }
  
 +#ifndef __NetBSD__		/* XXX mipi */
  static void pwm_enable_backlight(const struct intel_crtc_state *crtc_state,
  				 const struct drm_connector_state *conn_state)
  {
 @@ -2008,10 +2008,8 @@ intel_panel_init_backlight_funcs(struct intel_panel *panel)
  		panel->backlight.hz_to_pwm = bxt_hz_to_pwm;
  	} else if (INTEL_PCH_TYPE(dev_priv) >= PCH_CNP) {
  		panel->backlight.setup = cnp_setup_backlight;
 -#ifndef __NetBSD__ /* XXX mipi */
  		panel->backlight.enable = cnp_enable_backlight;
  		panel->backlight.disable = cnp_disable_backlight;
 -#endif
  		panel->backlight.set = bxt_set_backlight;
  		panel->backlight.get = bxt_get_backlight;
  		panel->backlight.hz_to_pwm = cnp_hz_to_pwm;

 --=_iH22Mq8cEGadrlwrn55fhppMX7+AHtfW--
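
The patch removes the #ifndef around the assignments of panel->backlight.enable and panel->backlight.disable. With those assignments compiled out, the two hooks would have stayed unset, which is consistent with the earlier "prevented execution of 0x0 (SMEP)" and "rip 0" in the trap message. The sketch below models this, assuming the hook is invoked without a NULL check (the calling code is not shown in this report); all model_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a table of backlight hooks. */
struct model_backlight {
	void (*enable)(int *state);
	void (*disable)(int *state);
};

static void model_enable(int *state)  { *state = 1; }
static void model_disable(int *state) { *state = 0; }

/*
 * Fill in the hooks.  A nonzero skip_hooks models the build in which
 * the enable/disable assignments were compiled out.
 */
static void
model_init_hooks(struct model_backlight *bl, int skip_hooks)
{
	bl->enable = NULL;
	bl->disable = NULL;
	if (!skip_hooks) {
		bl->enable = model_enable;
		bl->disable = model_disable;
	}
}

/* Returns 0 on success, -1 if the disable hook was never set. */
static int
model_backlight_off(const struct model_backlight *bl, int *state)
{
	if (bl->disable == NULL)
		return -1;	/* an unguarded call here would jump to address 0 */
	bl->disable(state);
	return 0;
}
```

The guard in model_backlight_off exists only so the sketch can report the missing hook instead of faulting; the patch itself fixes the problem by installing the hooks.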

From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56561: cv_is_valid assertion failure in intel drm
Date: Mon, 27 Dec 2021 19:41:26 +0000

 On Sun, Dec 26, 2021 at 07:40:02PM +0000, Taylor R Campbell wrote:
 >  Can you try the attached patch?

 It seems to have fixed the trap - I observed a successful uneventful
 blanking of the screen with it!

From: Patrick Welche <prlw1@talktalk.net>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56561: cv_is_valid assertion failure in intel drm
Date: Wed, 12 Jan 2022 11:11:18 +0000

 On the same laptop, with 9.99.93 10 Jan 2022 code, I just observed:

 panic() at device_printf
 trap() at startlwp
 --- trap (number 6) ---
 i915_gem_evict_for_node() at i915_gem_evict_for_node+0xc9
 i915_gem_gtt_reserve() at i915_gem_gtt_reserve+0xdc
 i915_gem_gtt_insert() at i915_gem_gtt_insert+0x1f4
 i915_vma_pin() at i915_vma_pin+0xa10
 eb_lookup_vmas() at eb_lookup_vmas+0x5e5
 i915_gem_do_execbuffer() at i915_gem_do_execbuffer+0x6c4
 i915_gem_execbuffer2_ioctl() at i915_gem_execbuffer2_ioctl+0x1f9
 drm_ioctl() at drm_ioctl+0x214
 drm_ioctl_shim() at drm_ioctl_shim+0x2f
 sys_ioctl() at sys_ioctl+0x555
 syscall() at syscall+0x18c
 --- syscall (number 54) ---
 syscall+0x18c:

 (note to self: /usr/obj/crash/netbsd.1.core.gz)

From: Patrick Welche <prlw1@talktalk.net>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56561: cv_is_valid assertion failure in intel drm
Date: Mon, 7 Mar 2022 09:50:50 +0000

 I just observed the original cv_is_valid(cv) assertion as mentioned at
 the top of this bug on 9.99.93 of 1st March 2022.

 Just in case the line numbers changed:

 (gdb) thread apply all bt

 Thread 2.1 (<kvm>):
 #0  0xffffffff80222765 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at ../../../../arch/amd64/amd64/machdep.c:720
 #1  0xffffffff808af187 in kern_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at ../../../../kern/kern_reboot.c:73
 #2  0xffffffff808f43ca in vpanic (fmt=0xffffffff80d8e4a8 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0xffffde013c59edc8) at ../../../../kern/subr_prf.c:290
 #3  0xffffffff80a701f7 in kern_assert (fmt=fmt@entry=0xffffffff80d8e4a8 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ") at ../../../../../../lib/libkern/kern_assert.c:51
 #4  0xffffffff8087b1aa in cv_broadcast (cv=0xffffde014e40ab50) at ../../../../kern/kern_condvar.c:511
 #5  0xffffffff80a1a553 in linux___dma_fence_signal_wake (fence=0xffff969807f14a40, timestamp=<optimized out>) at ../../../../external/bsd/drm2/linux/linux_dma_fence.c:1176
 #6  0xffffffff8057e467 in signal_irq_work (work=0xffffde001faed1e8) at ../../../../external/bsd/drm2/dist/drm/i915/gt/intel_breadcrumbs.c:218
 #7  0xffffffff80a1f089 in irq_work_intr (cookie=<optimized out>) at ../../../../external/bsd/drm2/linux/linux_irq_work.c:74
 #8  0xffffffff808bdbd0 in softint_execute (s=5, l=0xffff969b483f24c0) at ../../../../kern/kern_softint.c:565
 #9  softint_dispatch (pinned=<optimized out>, s=5) at ../../../../kern/kern_softint.c:814
 #10 0xffffffff8021d7ff in Xsoftintr ()
 #11 0x07b77c698bd464af in ?? ()
 #12 0x07f77c298b9464ef in ?? ()
 Backtrace stopped: Cannot access memory at address 0xffffde013c59f000

From: "John D. Baker" <jdbaker@consolidated.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/56561: cv_is_valid assertion failure in intel drm
Date: Sat, 28 May 2022 11:19:15 -0500 (CDT)

 I have just run into this panic as well on 9.99.97/amd64.  The system
 was idle with my usual X arrangement.  When I returned to the machine,
 it had rebooted.

 The last bit of "dmesg" as recovered by 'crash':

 [...]
 [ 8348.0558266] heartbeat *
 [ 8348.0558266] heartbeat Idle? no
 [ 8348.0558266] heartbeat Signals:
 [ 8348.0558266] heartbeat       [2:2cac6*] @ 5990ms
 [ 8348.0558266] heartbeat       [2:2cac7] @ 4990ms
 [ 8348.0558266] i915drmkms0: notice: Resetting chip for stopped heartbeat on rcs0
 [ 8348.0558266] i915drmkms0: notice: xlock[1578] context reset due to GPU hang
 [ 9078.8967489] nfs server yggdrasil:/r0/home/jdbaker: not responding
 [ 9079.0665965] nfs server yggdrasil:/r0/home/jdbaker: is alive again
 [ 10316.4686326] nfs server yggdrasil:/r0/home/jdbaker: not responding
 [ 10316.5386338] nfs server yggdrasil:/r0/home/jdbaker: is alive again
 [ 13680.6386480] panic: kernel diagnostic assertion "cv_is_valid(cv)" failed: file "/x/current/src/sys/kern/kern_condvar.c", line 511
 [ 13680.6386480] cpu0: Begin traceback...
 [ 13680.6386480] vpanic() at netbsd:vpanic+0x183
 [ 13680.6386480] kern_assert() at netbsd:kern_assert+0x4b
 [ 13680.6386480] cv_broadcast() at netbsd:cv_broadcast+0x56
 [ 13680.6386480] linux___dma_fence_signal_wake() at netbsd:linux___dma_fence_signal_wake+0x13e
 [ 13680.6386480] signal_irq_work() at netbsd:signal_irq_work+0x2ca
 [ 13680.6386480] irq_work_intr() at netbsd:irq_work_intr+0x87
 [ 13680.6386480] softint_dispatch() at netbsd:softint_dispatch+0xf9
 [ 13680.6386480] DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xffffc580aed490f0
 [ 13680.6386480] Xsoftintr() at netbsd:Xsoftintr+0x4f
 [ 13680.6386480] --- interrupt ---
 [ 13680.6386480] bfbd3f7fb7d79fb7:
 [ 13680.6386480] cpu0: End traceback...

 [ 13680.6386480] dumping to dev 0,1 (offset=16867127, size=2086023):
 [ 13680.6386480] dump

 -- 
 |/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
 |\ / jdbaker[snail]consolidated[flyspeck]net  OpenBSD            FreeBSD
 | X  No HTML/proprietary data in email.   BSD just sits there and works!
 |/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645

From: "John D. Baker" <jdbaker@consolidated.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/56561: cv_is_valid assertion failure in intel drm
Date: Sun, 10 Jul 2022 16:15:30 -0500 (CDT)

 Just tried running a 9.99.98/amd64 system with i915drmkms on an
 82G41-based system and the panic still occurs:

 [...]
 [  8794.379988] heartbeat rcs0 heartbeat {prio:-2147483645} not ticking
 [  8794.379988] heartbeat 	Awake? 6
 [  8794.379988] heartbeat 	Barriers?: no
 [  8794.379988] heartbeat 	Latency: 18us
 [  8794.379988] heartbeat 	Heartbeat: 3000 ms ago
 [  8794.379988] heartbeat 	Reset count: 0 (global 0)
 [  8794.379988] heartbeat 	Requests:
 [  8794.379988] heartbeat 		active  2:b16a9*-  @ 6000ms: xlock[4402]
 [  8794.379988] heartbeat 		ring->start:  0x00004000
 [  8794.379988] heartbeat 		ring->head:   0x00001ca0
 [  8794.379988] heartbeat 		ring->tail:   0x00001e30
 [  8794.379988] heartbeat 		ring->emit:   0x00001e30
 [  8794.379988] heartbeat 		ring->space:  0x00003e30
 [  8794.379988] heartbeat 		ring->hwsp:   0x00002100
 [  8794.379988] heartbeat [head 1ca0, postfix 1d10, tail 1d28, batch 0x00000000_00d3c000]:
 [  8794.379988] warning: /x/current/src/sys/external/bsd/drm2/dist/drm/i915/gt/intel_engine_cs.c:1234: WARN_ON_ONCE(hex_dump_to_buffer(buf + pos, len - pos, rowsize, sizeof(u32), line, sizeof(line), 0) >= sizeof(line))
 [  8794.379988] heartbeat [0000] 22000002 0240007a 04e0ff1f 00000000 00000000 00000002 00000002 00000002
 [  8794.379988] 00000002 00000002 00000002 00000002 00000002 00000002 0
 [  8794.379988] heartbeat [0020] 00000002 00000002 00000002 00000002 00000002 00000002 00000002 00000002
 [  8794.379988] 00000002 0240007a 04e0ff1f 00000000 00000000 22000002 0
 [  8794.379988] heartbeat [0040] 00000002 0240007a 04e0ff1f 00000000 00000000 22000002 00000000 0000000c
 [  8794.379988] 0c31dc00 00000000 80018018 00c0d300 00000002 01008010 0
 [  8794.379988] heartbeat [0060] 0c31dc00 00000000 80018018 00c0d300 00000002 01008010 00010000 a9160b00
 [  8794.379988] 00000001 00000000

 [  8794.379988] heartbeat [0080] 00000001 00000000

 [  8794.379988] heartbeat 	On hold?: 0
 [  8794.379988] heartbeat 	MMIO base:  0x00002000
 [  8794.379988] heartbeat 	CCID: 0x00dc310d
 [  8794.379988] heartbeat 	RING_START: 0x00004000
 [  8794.379988] heartbeat 	RING_HEAD:  0x00001d10
 [  8794.379988] heartbeat 	RING_TAIL:  0x00001e30
 [  8794.379988] heartbeat 	RING_CTL:   0x00003001
 [  8794.379988] heartbeat 	RING_MODE:  0x00000040
 [  8794.379988] heartbeat 	ACTHD:  0x00000000_00d3c1bc
 [  8794.379988] heartbeat 	BBADDR: 0x00000000_00d3c1bb
 [  8794.379988] heartbeat 	DMA_FADDR: 0x00000000_00d3c380
 [  8794.379988] heartbeat 	IPEIR: 0x00000000
 [  8794.379988] heartbeat 	IPEHR: 0x60020100
 [  8794.379988] heartbeat 		E  2:b16a9*-  @ 6000ms: xlock[4402]
 [  8794.379988] heartbeat 		E  2:b16aa-  @ 5840ms: X[19491]
 [  8794.379988] heartbeat 		E  2:b16ab  @ 3000ms: [i915]
 [  8794.379988] heartbeat HWSP:
 [  8794.379988] heartbeat [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 [  8794.379988] 00000000 00000000 00000000 00000000 00000000 00000000 0
 [  8794.379988] heartbeat *
 [  8794.379988] heartbeat [0100] a8160b00 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 [  8794.379988] 00000000 00000000 00000000 00000000 00000000 00000000 0
 [  8794.379988] heartbeat [0120] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 [  8794.379988] 00000000 00000000 00000000 00000000 00000000 00000000 0
 [  8794.379988] heartbeat *
 [  8794.379988] heartbeat Idle? no
 [  8794.379988] heartbeat Signals:
 [  8794.379988] heartbeat 	[2:b16a9*] @ 6000ms
 [  8794.379988] heartbeat 	[2:b16aa] @ 5840ms
 [  8794.379988] i915drmkms0: notice: Resetting chip for stopped heartbeat on rcs0
 [  8794.379988] i915drmkms0: notice: xlock[4402] context reset due to GPU hang
 [...]
 [ 15477.672336] panic: kernel diagnostic assertion "cv_is_valid(cv)" failed: file "/x/current/src/sys/kern/kern_condvar.c", line 511 
 [ 15477.672336] cpu0: Begin traceback...
 [ 15477.672336] vpanic() at netbsd:vpanic+0x183
 [ 15477.672336] kern_assert() at netbsd:kern_assert+0x4b
 [ 15477.672336] cv_broadcast() at netbsd:cv_broadcast+0x56
 [ 15477.672336] linux___dma_fence_signal_wake() at netbsd:linux___dma_fence_signal_wake+0x13e
 [ 15477.672336] signal_irq_work() at netbsd:signal_irq_work+0x2ca
 [ 15477.672336] irq_work_intr() at netbsd:irq_work_intr+0x87
 [ 15477.672336] softint_dispatch() at netbsd:softint_dispatch+0xf9
 [ 15477.672336] DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xffffb300aed490f0
 [ 15477.672336] Xsoftintr() at netbsd:Xsoftintr+0x4f
 [ 15477.672336] --- interrupt ---
 [ 15477.672336] ff9dffe7fafbfeff:
 [ 15477.672336] cpu0: End traceback...

 [ 15477.672336] dumping to dev 0,1 (offset=16867127, size=2086023):
 [ 15477.672336] dump {subsequent boot dmesg begins here}

 Was playing a YouTube video through Firefox 101.0.1 (pkgsrc-2022Q2)
 at the time, but I've seen it before while the system was essentially
 idle (or as idle as it can be with Firefox running).

 -- 
 |/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
 |\ / jdbaker[snail]consolidated[flyspeck]net  OpenBSD            FreeBSD
 | X  No HTML/proprietary data in email.   BSD just sits there and works!
 |/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56561 CVS commit: src/sys/external/bsd/drm2/dist/drm/i915
Date: Mon, 11 Jul 2022 18:56:00 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Jul 11 18:56:00 UTC 2022

 Modified Files:
 	src/sys/external/bsd/drm2/dist/drm/i915: i915_request.c

 Log Message:
 i915: Defer destroying waitqueue until after callback is removed.

 Candidate fix for PR kern/56561.


 To generate a diff of this commit:
 cvs rdiff -u -r1.16 -r1.17 \
     src/sys/external/bsd/drm2/dist/drm/i915/i915_request.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
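
The log message describes an ordering change: the waitqueue is destroyed only after the callback that wakes it has been removed. The diff itself is not included in this report, so the following is only a single-threaded model of why the order matters, with a signal injected between the two teardown steps; all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_wq {
	bool destroyed;
	int wakeups;		/* wakeups delivered while still valid */
	int bad_wakeups;	/* wakeups delivered after destruction */
};

struct model_fence {
	void (*cb)(void *);	/* callback run when the fence signals */
	void *cb_arg;
};

static void
model_wake(void *arg)
{
	struct model_wq *wq = arg;

	if (wq->destroyed)
		wq->bad_wakeups++;	/* the kernel would assert here */
	else
		wq->wakeups++;
}

static void
model_fence_signal(struct model_fence *f)
{
	if (f->cb != NULL)
		f->cb(f->cb_arg);
}

/*
 * Tear down with a late signal arriving between the two steps.
 * Returns the number of wakeups that hit a destroyed waitqueue.
 */
static int
model_teardown(bool destroy_first)
{
	struct model_wq wq = { false, 0, 0 };
	struct model_fence f = { model_wake, &wq };

	if (destroy_first) {
		wq.destroyed = true;		/* destroy the waitqueue */
		model_fence_signal(&f);		/* late signal still finds the callback */
		f.cb = NULL;			/* remove the callback, too late */
	} else {
		f.cb = NULL;			/* remove the callback first */
		model_fence_signal(&f);		/* late signal finds nothing to call */
		wq.destroyed = true;		/* now safe to destroy */
	}
	return wq.bad_wakeups;
}
```

In the model, model_teardown(true) lets one wakeup reach a destroyed waitqueue, while model_teardown(false) lets none through.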

From: "John D. Baker" <jdbaker@consolidated.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/56561: cv_is_valid assertion failure in intel drm
Date: Mon, 18 Jul 2022 12:42:28 -0500 (CDT)

 After this commit:

   https://mail-index.netbsd.org/source-changes/2022/07/11/msg139718.html

 I've been running the resulting 9.99.98 (from sources around 13 July
 2022) for over 2 days straight without this issue recurring, so I'd say
 it's fixed.  (It would usually occur within a few hours of booting.)

 Now building 9.99.99 to see if the issues described here:

   https://mail-index.netbsd.org/current-users/2022/07/17/msg042673.html

 affect my i82G41 system.


 -- 
 |/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
 |\ / jdbaker[snail]consolidated[flyspeck]net  OpenBSD            FreeBSD
 | X  No HTML/proprietary data in email.   BSD just sits there and works!
 |/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645

State-Changed-From-To: open->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sun, 28 Aug 2022 13:37:41 +0000
State-Changed-Why:
fixed in i915_request.c 1.17 on 2022-07-11


>Unformatted:
