NetBSD Problem Report #55488

From jruohone@gmail.com  Tue Jul 14 05:04:01 2020
Return-Path: <jruohone@gmail.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 94E4A1A9213
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 14 Jul 2020 05:04:01 +0000 (UTC)
Message-Id: <20200714050355.EC58F1AECCE@kafka.localdomain>
Date: Tue, 14 Jul 2020 08:03:55 +0300 (EEST)
From: jruohonen@iki.fi
Sender: j ruohonen <jruohone@gmail.com>
Reply-To: jruohonen@iki.fi
To: gnats-bugs@NetBSD.org
Subject: Occasional panic upon resume (i915drmkms-related)
X-Send-Pr-Version: 3.95

>Number:         55488
>Category:       kern
>Synopsis:       Occasional panic upon resume (i915drmkms-related)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jul 14 05:05:00 +0000 2020
>Originator:     Jukka Ruohonen
>Release:        NetBSD 9.0_STABLE
>Organization:
>Environment:
NetBSD camus 9.0_STABLE NetBSD 9.0_STABLE (GENERIC_KASLR) #0: Sat Jul  4 18:46:38 EEST 2020 jruoho@camus:/usr/obj/sys/arch/amd64/compile/GENERIC_KASLR amd64
>Description:
The i915drmkms driver sometimes causes a panic when resuming from suspend:

[ 65582.854289] ioapic0 reenabling
[ 65589.565336] kern info: [drm] stuck on render ring
[ 65589.565336] kern info: [drm] GPU HANG: ecode 7:0:0xfffffffe, reason: Ring hung, action: reset
[ 65589.565336] kern info: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 65589.565336] kern info: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI ->  DRM/Intel
[ 65589.565336] kern info: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 65589.565336] kern info: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 65589.565336] kern info: [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 65589.565336] kern error: [drm:(/usr/src/sys/external/bsd/drm2/dist/drm/i915/i915_drv.c:841)i915_drm_resume] *ERROR* failed to re-initialize GPU, declaring wedged!
[ 65589.565336] drm/i915: Resetting chip after gpu hang
[ 65589.565336] uvm_fault(0xffffffff8ffe2c80, 0x0, 1) -> e
[ 65589.565336] fatal page fault in supervisor mode
[ 65589.565336] trap type 6 code 0 rip 0xffffffffc7b3e670 cs 0x8 rflags 0x10293 cr2 0x8 ilevel 0 rsp 0xffffb880aafa5d10
[ 65589.565336] curlwp 0xfffff1c521abe4a0 pid 0.89 lowest kstack 0xffffb880aafa32c0
[ 65589.565336] panic: trap
[ 65589.565336] cpu3: Begin traceback...
[ 65589.575344] vpanic() at netbsd:vpanic+0x160
[ 65589.575344] snprintf() at netbsd:snprintf
[ 65589.585351] startlwp() at netbsd:startlwp
[ 65589.585351] warning: /usr/src/sys/external/bsd/drm2/dist/drm/i915/intel_display.c:2457: WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex))alltraps() at netbsd:alltraps+0xbb
[ 65589.585351] intel_cleanup_ring_buffer() at netbsd:intel_cleanup_ring_buffer+0xec
[ 65589.595359] i915_gem_cleanup_ringbuffer() at netbsd:i915_gem_cleanup_ringbuffer+0x51
[ 65589.595359] i915_gem_init_hw() at netbsd:i915_gem_init_hw+0x58e
[ 65589.605367] i915_reset() at netbsd:i915_reset+0x89
[ 65589.605367] i915_handle_error() at netbsd:i915_handle_error+0x9ba
[ 65589.615374] linux_workqueue_thread() at netbsd:linux_workqueue_thread+0xdd
[ 65589.615374] cpu3: End traceback...
[ 65589.615374] dumping to dev 20,0 (offset=50954631, size=2019279):
[ 65589.615374] dump ehci1: config timeout
[ 65589.805522] ahcisata0 port 0: device present, speed: 6.0Gb/s
[ 65589.805522] autoconfiguration error: ahcisata0 port 0: clearing WDCTL_RST failed for drive 0
[ 65589.805522] ACPI Error: Mutex [ACPI_MTX_Caches] (0x4) is not acquired, cannot release (20190405/utmutex-369)
[ 65589.805522] ACPI Error: Could not allocate an object descriptor (20190405/utcopy-1050)
[ 65589.805522] ACPI Error: Aborting method \_PR.CPU0._CST due to previous error (AE_NO_MEMORY) (20190405/psparse-581)
[ 65589.805522] ACPI Error: Mutex [ACPI_MTX_Caches] (0x4) is not acquired, cannot release (20190405/utmutex-369)
[ 65589.805522] ACPI Error: Failed to extend the result stack (20190405/dswstate-184)
[ 65589.805522] ACPI Error: Aborting method \_PR.CPU0._CST due to previous error (AE_NO_MEMORY) (20190405/psparse-581)
[ 65589.805522] ACPI Error: Aborting method \_PR.CPU1._CST due to previous error (AE_NO_MEMORY) (20190405/psparse-581)
[ 65589.805522] ACPI Error: Mutex [ACPI_MTX_Caches] (0x4) is not acquired, cannot release (20190405/utmutex-369)
[ 65589.805522] ACPI Error: Mutex [ACPI_MTX_Namespace] (0x1) is not acquired, cannot release (20190405/utmutex-369)
[ 65589.805522] ACPI Error: Could not release AML Namespace mutex (20190405/exutils-151)
[ 65589.805522] ACPI Error: Mutex [ACPI_MTX_Interpreter] (0x0) is not acquired, cannot release (20190405/utmutex-369)
[ 65589.805522] ACPI Error: Could not release AML Interpreter mutex (20190405/exutils-156)
>How-To-Repeat:
1. Reproduction is difficult. The panic occurs in only about one in ten
suspend/resume cycles.

2. However, I have not seen the panic on identical hardware running
-current, so the problematic code may already have been fixed there.
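
Because the panic is intermittent, a handful of clean resumes on -current
does not prove much. The following is only a rough sketch of how many
panic-free suspend/resume cycles would be needed before a fix looks
plausible. It assumes (this is an assumption, not a measurement) that each
cycle panics independently with the roughly 1-in-10 rate estimated above;
the function names are hypothetical and not part of any NetBSD tool.

```python
import math


def p_at_least_one_panic(p_panic: float, cycles: int) -> float:
    """Probability of seeing at least one panic in `cycles` resumes,
    assuming independent cycles with per-cycle panic probability p_panic."""
    return 1.0 - (1.0 - p_panic) ** cycles


def cycles_for_confidence(p_panic: float, confidence: float) -> int:
    """Smallest number of cycles n with 1 - (1 - p_panic)**n >= confidence."""
    if not (0.0 < p_panic < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p_panic and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_panic))


if __name__ == "__main__":
    # Assumed rate: about 1 panic per 10 suspend/resume cycles (from item 1).
    n = cycles_for_confidence(0.1, 0.95)
    print(n, round(p_at_least_one_panic(0.1, n), 3))
```

Under these assumptions, about 29 consecutive panic-free cycles on -current
would be needed for 95% confidence that the bug is absent there; if the true
rate is lower than 1 in 10, more cycles are needed.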
>Fix:
N/A
