NetBSD Problem Report #56672

From manu@netbsd.org  Wed Jan 26 10:26:24 2022
Return-Path: <manu@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 14DAF1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 26 Jan 2022 10:26:24 +0000 (UTC)
Message-Id: <20220126102623.E8C3184CFD@mail.netbsd.org>
Date: Wed, 26 Jan 2022 10:26:23 +0000 (UTC)
From: manu@netbsd.org
Reply-To: manu@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: i915drmkms hangs on boot
X-Send-Pr-Version: 3.95

>Number:         56672
>Category:       kern
>Synopsis:       i915drmkms hangs on boot
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          feedback
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jan 26 10:30:00 +0000 2022
>Closed-Date:    
>Last-Modified:  Sun Jul 09 20:10:01 +0000 2023
>Originator:     Emmanuel Dreyfus
>Release:        NetBSD 9.99.93
>Organization:
Emmanuel Dreyfus
manu@netbsd.org
>Environment:
NetBSD basalte 9.99.93 NetBSD 9.99.93 (GENERIC) #96: Fri Jan  7 05:54:04 CET 2022  manu@basalte:/home2/manu/netbsd-src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
Since the recent DRM upgrade, I randomly get a hang at boot time during i915drmkms initialization.

Initial investigation with ddb shows that the i915_flip kernel thread has a corrupted backtrace. I suspect the other threads are waiting for it to complete, which will never happen because it crashed.
>How-To-Repeat:
Reboot; the machine sometimes hangs.
>Fix:
Not known yet

>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->feedback
State-Changed-By: maya@NetBSD.org
State-Changed-When: Sun, 22 May 2022 22:23:33 +0000
State-Changed-Why:
Does this still happen? Can you mention the latest dmesg messages, preferably with `boot -v -x`?


From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56672 (i915drmkms hangs on boot)
Date: Tue, 24 May 2022 08:48:41 +0000

 On Sun, May 22, 2022 at 10:23:34PM +0000, maya@NetBSD.org wrote:
 > Does this still happen? Can you mention the latest dmesg messages, preferably with `boot -v -x`?

 Yes, it still happens, but not with a serial console, hence I cannot
 copy/paste the whole output.

 Here is a case:
 intelfb: framebuffer at 0xc0040000, size 1200x1920, depth 32, stride 4800
 (hang with black screen until I hit the power button...)
 acpi0: power button pressed, shutting down!
 (...)
 {drm:netbsd:drm_atomic_helper_wait_for_flip_done+0x1ea} *ERROR*  [CRTC:51:pipe A] flip_done timed out

 Another case:
 panic: kernel assertion "(i * BITMAP_SIZE) < pp->pr_itemperpage" failed: file (...) subr_pool.c, line 450
 The backtrace shows we come from:
 (...)
 kmem_intr_alloc
 drm_client_modeset_probe
 drm_fb_helper_hotplug_event.part.0

 It seems to crash less often with -v -x. Many of the hangs must be caused
 by race conditions, because -v -x or a serial console causes a huge slowdown.

 Without -v -x, I most often hang on:
 i915drmkms0: interrupting at msi5 vec 0 (i915drmkms0)
 i915drmkms0: notice: Failed to load DMC firmware i915/kbl_dmc_ver1_04.bin: Disabling runtime power management


 -- 
 Emmanuel Dreyfus
 manu@netbsd.org

State-Changed-From-To: feedback->open
State-Changed-By: maya@NetBSD.org
State-Changed-When: Fri, 27 May 2022 01:13:55 +0000
State-Changed-Why:


State-Changed-From-To: open->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Thu, 15 Sep 2022 21:18:01 +0000
State-Changed-Why:
Can you please try with a LOCKDEBUG kernel?
(DIAGNOSTIC+DEBUG+LOCKDEBUG)
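A minimal sketch of how these options would appear in a kernel configuration file, in the same style as any other `options` line. The choice of base configuration (for example a copy of GENERIC) is an assumption not stated in this report, and any option already enabled in that base does not need to be repeated.

```
# Assumed additions to a copy of the kernel config in use; not taken from this report.
options 	DIAGNOSTIC	# enables KASSERT consistency checks
options 	DEBUG		# enables additional debugging code
options 	LOCKDEBUG	# enables lock debugging checks
```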


From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
	gnats-admin@netbsd.org, riastradh@NetBSD.org, manu@netbsd.org
Subject: Re: kern/56672 (i915drmkms hangs on boot)
Date: Sun, 9 Oct 2022 00:16:45 +0000

 On Thu, Sep 15, 2022 at 09:18:02PM +0000, riastradh@NetBSD.org wrote:
 > Can you please try with a LOCKDEBUG kernel?
 > (DIAGNOSTIC+DEBUG+LOCKDEBUG)

 After upgrading to -current from 20221002, I have seen no panic at boot 
 time for a while. The machine still crashes often at boot though. It
 happens at framebuffer switch while the screen is black, and if there
 is a panic, I cannot see it.


 -- 
 Emmanuel Dreyfus
 manu@netbsd.org

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56672: i915drmkms hangs on boot
Date: Wed, 31 May 2023 09:36:32 +0200

 FWIW, I'm also seeing a hang at boot on a Dell laptop with i915.
 It hangs after printing "i915drmkms0: notice: Failed to load DMC firmware".
 I've not seen the other issues described in this PR.

 I also have 2 other hosts (desktops) with i915 and have never seen any boot issue
 with them, so it seems to be hardware-specific.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 years of experience will always make the difference
 --

From: Taylor R Campbell <riastradh@NetBSD.org>
To: manu@NetBSD.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/56672: i915drmkms hangs on boot
Date: Sun, 9 Jul 2023 20:09:00 +0000

 Can you try enabling the new HEARTBEAT kernel option, and setting
 DDB_COMMANDONENTER="bt;ps;show all tstiles/t;sync", and see if you get
 more diagnostic information that way?

 This should be considerably less costly than LOCKDEBUG, in case that
 was a barrier to further feedback the last time around -- I intend to
 turn HEARTBEAT on by default once it has had a little more testing.

 options 	HEARTBEAT
 options 	HEARTBEAT_MAX_PERIOD_DEFAULT=15
 options 	DDB_COMMANDONENTER="bt;ps;show all tstiles/t;sync"

 Without more diagnostic information, there's not much more I can do
 about this.

>Unformatted:
