NetBSD Problem Report #57999

From www@netbsd.org  Tue Mar  5 01:50:56 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id E31E31A9239
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  5 Mar 2024 01:50:56 +0000 (UTC)
Message-Id: <20240305015054.F17801A923A@mollari.NetBSD.org>
Date: Tue,  5 Mar 2024 01:50:54 +0000 (UTC)
From: schaecsn@gmx.net
Reply-To: schaecsn@gmx.net
To: gnats-bugs@NetBSD.org
Subject: i915 heartbeat not ticking on Sandy Bridge
X-Send-Pr-Version: www-1.0

>Number:         57999
>Category:       kern
>Synopsis:       i915 heartbeat not ticking on Sandy Bridge
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Mar 05 01:55:00 +0000 2024
>Originator:     Stefan Schaeckeler
>Release:        10.0_RC4
>Organization:
>Environment:
NetBSD netbsd 10.0_RC4 NetBSD 10.0_RC4 (GENERIC) #0: Sun Feb 18 08:20:49 PST 2024  root@netbsd:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
>Description:
Searching for "heartbeat" turns up two existing i915 PRs:

- http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=57440 was closed because the problem was only seen with UEFI boots, not with BIOS boots. My computer supports only BIOS booting, so my problem must be a different one.

- The patch from http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=57268 is Haswell-specific.


Below is my report for NetBSD 10.0_RC4 on Sandy Bridge (Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz, id 0x206a7):

# lspci -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)

Relevant xorg.conf option:
 Option     "AccelMethod"                "uax"


The "heartbeat not ticking" error below occurred after X had been running for 5 days. In the preceding days, and especially in the hours just before the error, there were occasional graphics glitches (pixel noise: parts of the screen showed random pixels for a few minutes).

[ 495143.126497] heartbeat rcs0 heartbeat {prio:-2147483645} not ticking
[ 495143.126497] heartbeat      Awake? 4
[ 495143.126497] heartbeat      Barriers?: no
[ 495143.126497] heartbeat      Latency: 139us
[ 495143.126497] heartbeat      Heartbeat: 3000 ms ago
[ 495143.126497] heartbeat      Reset count: 0 (global 0)
[ 495143.126497] heartbeat      Requests:
[ 495143.126497] heartbeat              active  2:5235ed*-  @ 6000ms: X[1063]
[ 495143.126497] heartbeat              ring->start:  0x7fffa000
[ 495143.126497] heartbeat              ring->head:   0x00000c30
[ 495143.126497] heartbeat              ring->tail:   0x00000de8
[ 495143.126497] heartbeat              ring->emit:   0x00001860
[ 495143.126497] heartbeat              ring->space:  0x00003390
[ 495143.126497] heartbeat              ring->hwsp:   0x7fffe100
[ 495143.126497] heartbeat [head 0c68, postfix 0db0, tail 0de8, batch 0x00000000_01679000]:
[ 495143.126497] warning: /usr/src/sys/external/bsd/drm2/dist/drm/i915/gt/intel_engine_cs.c:1234: WARN_ON_ONCE(hex_dump_to_buffer(buf + pos, len - pos, rowsize, sizeof(u32), line, sizeof(line), 0) >= sizeof(line))
[ 495143.126497] heartbeat [0000] 0300007a 02001000 84f0ff7f 00000000 00000000 00000000 0300007a 00400000
[ 495143.126497] 84f0ff7f 00000000 00000000 00000000 0200007a 1c4c1400 8
[ 495143.126497] heartbeat [0020] 84f0ff7f 00000000 00000000 00000000 0200007a 1c4c1400 84f0ff7f 00000000
[ 495143.126497] 0300007a 02001000 84f0ff7f 00000000 00000000 00000000 0
[ 495143.126497] heartbeat [0040] 0300007a 02001000 84f0ff7f 00000000 00000000 00000000 0300007a 00400000
[ 495143.126497] 84f0ff7f 00000000 00000000 00000000 0200007a 01101000 8
[ 495143.126497] heartbeat [0060] 84f0ff7f 00000000 00000000 00000000 0200007a 01101000 84f0ff7f 00000000
[ 495143.126497] 01000011 20220000 ffffffff 01000011 28220000 0000df7f 0
[ 495143.126497] heartbeat [0080] 01000011 20220000 ffffffff 01000011 28220000 0000df7f 01004012 28220000
[ 495143.126497] 00f0ff7f 01000011 c0200000 00020002 0300007a 02001000 8
[ 495143.126497] heartbeat [00a0] 00f0ff7f 01000011 c0200000 00020002 0300007a 02001000 84f0ff7f 00000000
[ 495143.126497] 00000000 00000000 0300007a 00400000 84f0ff7f 00000000 0
[ 495143.126497] heartbeat [00c0] 00000000 00000000 0300007a 00400000 84f0ff7f 00000000 00000000 00000000
[ 495143.126497] 0200007a 01101000 84f0ff7f 00000000 0300007a 02001000 8
[ 495143.126497] heartbeat [00e0] 0200007a 01101000 84f0ff7f 00000000 0300007a 02001000 84f0ff7f 00000000
[ 495143.126497] 00000000 00000000 0300007a 00400000 84f0ff7f 00000000 0
[ 495143.126497] heartbeat [0100] 00000000 00000000 0300007a 00400000 84f0ff7f 00000000 00000000 00000000
[ 495143.126497] 0200007a 1c4c1400 84f0ff7f 00000000 00000000 0000000c 0
[ 495143.126497] heartbeat [0120] 0200007a 1c4c1400 84f0ff7f 00000000 00000000 0000000c 0ce19001 00000000
[ 495143.126497] 00018018 00906701 0200007a 02001000 00000000 00000000 0
[ 495143.126497] heartbeat [0140] 00018018 00906701 0200007a 02001000 00000000 00000000 0200007a 00400000
[ 495143.126497] 04f0ff7f 00000000 0200007a 21501000 04e1ff7f ed355200 0
[ 495143.126497] heartbeat [0160] 04f0ff7f 00000000 0200007a 21501000 04e1ff7f ed355200 00000001 00000000

[ 495143.126497] heartbeat      On hold?: 0
[ 495143.126497] heartbeat      MMIO base:  0x00002000
[ 495143.126497] heartbeat      CCID: 0x0190e10d
[ 495143.126497] heartbeat      RING_START: 0x7fffa000
[ 495143.126497] heartbeat      RING_HEAD:  0x00000db0
[ 495143.126497] heartbeat      RING_TAIL:  0x00000de8
[ 495143.126497] heartbeat      RING_CTL:   0x00003001
[ 495143.126497] heartbeat      RING_MODE:  0x00004040
[ 495143.126497] heartbeat      RING_IMR: fffffffe
[ 495143.126497] heartbeat      ACTHD:  0x00000000_016b8f24
[ 495143.126497] heartbeat      BBADDR: 0x00000000_016b8f25
[ 495143.126497] heartbeat      DMA_FADDR: 0x00000000_016b9100
[ 495143.126497] heartbeat      IPEIR: 0x00000000
[ 495143.126497] heartbeat      IPEHR: 0x23002000
[ 495143.126497] heartbeat              E  2:5235ed*-  @ 6000ms: X[1063]
[ 495143.126497] heartbeat HWSP:
[ 495143.126497] heartbeat [0000] 01000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 495143.126497] 00000000 00000000 00000000 00000000 00000000 00000000 0
[ 495143.126497] heartbeat [0020] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 495143.126497] 00000000 00000000 00000000 00000000 00000000 00000000 0
[ 495143.126497] heartbeat *
[ 495143.126497] heartbeat [0100] ec355200 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 495143.126497] 00000000 00000000 00000000 00000000 00000000 00000000 0
[ 495143.126497] heartbeat [0120] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 495143.126497] 00000000 00000000 00000000 00000000 00000000 00000000 0
[ 495143.126497] heartbeat *
[ 495143.126497] heartbeat Idle? no
[ 495143.126497] heartbeat Signals:
[ 495143.126497] heartbeat      [2:5235ed*] @ 6000ms
[ 495143.126497] i915drmkms0: notice: Resetting chip for stopped heartbeat on rcs0
[ 495143.126497] i915drmkms0: notice: X[1063] context reset due to GPU hang
[ 495143.126497] heartbeat bcs0 heartbeat {prio:-2147483645} not ticking
[ 495143.126497] heartbeat      Awake? 4
[ 495143.126497] heartbeat      Barriers?: no
[ 495143.126497] heartbeat      Latency: 179us
[ 495143.126497] heartbeat      Heartbeat: 3000 ms ago
[ 495143.126497] heartbeat      Reset count: 0 (global 1)
[ 495143.126497] heartbeat      Requests:
[ 495143.126497] heartbeat              active  4:1dd3927!-  @ 6000ms: X[1063]
[ 495143.126497] heartbeat              ring->start:  0x7fff3000
[ 495143.126497] heartbeat              ring->head:   0x00002310
[ 495143.126497] heartbeat              ring->tail:   0x00002640
[ 495143.126497] heartbeat              ring->emit:   0x00002748
[ 495143.126497] heartbeat              ring->space:  0x00003b88
[ 495143.126497] heartbeat              ring->hwsp:   0x7fff7100
[ 495143.126497] heartbeat [head 25b8, postfix 2630, tail 2640, batch 0x00000000_040cd000]:
[ 495143.126497] heartbeat [0000] 01402413 04020000 00000000 00000000 01402013 04020000 00000000 00000000
[ 495143.126497] 01000011 20220200 ffffffff 01000011 28220200 0000df7f 0
[ 495143.126497] heartbeat [0020] 01000011 20220200 ffffffff 01000011 28220200 0000df7f 01004012 28220200
[ 495143.126497] 00f0ff7f 01000011 c0200200 00020002 01402013 04020000 0
[ 495143.126497] heartbeat [0040] 00f0ff7f 01000011 c0200200 00020002 01402013 04020000 00000000 00000000
[ 495143.126497] 01402413 04020000 00000000 00000000 00000000 00000000 0
[ 495143.126497] heartbeat [0060] 01402413 04020000 00000000 00000000 00000000 00000000 01402013 04010000
[ 495143.126497] 2739dd01 00000001

[ 495143.126497] heartbeat [0080] 2739dd01 00000001

[ 495143.126497] heartbeat      On hold?: 0
[ 495143.126497] heartbeat      MMIO base:  0x00022000
[ 495143.126497] heartbeat      RING_START: 0x7fff3000
[ 495143.126497] heartbeat      RING_HEAD:  0x00002748
[ 495143.126497] heartbeat      RING_TAIL:  0x00002748
[ 495143.126497] heartbeat      RING_CTL:   0x00003001
[ 495143.126497] heartbeat      RING_MODE:  0x00000200 [idle]
[ 495143.126497] heartbeat      RING_IMR: ffbfffff
[ 495143.126497] heartbeat      ACTHD:  0x00000000_00002748
[ 495143.126497] heartbeat      BBADDR: 0x00000000_00000000
[ 495143.126497] heartbeat      DMA_FADDR: 0x00000000_7fff5748
[ 495143.126497] heartbeat      IPEIR: 0x00000000
[ 495143.126497] heartbeat      IPEHR: 0x01000000
[ 495143.126497] heartbeat              E  4:1dd3922!+  @ 6000ms: signaled
[ 495143.126497] heartbeat              E  4:1dd3923!+  @ 6000ms: signaled
[ 495143.126497] heartbeat              E  4:1dd3924!+  @ 6000ms: signaled
[ 495143.126497] heartbeat              E  4:1dd3925!+  @ 6000ms: signaled
[ 495143.126497] heartbeat              E  4:1dd3926!+  @ 6000ms: signaled
[ 495143.126497] heartbeat              E  4:1dd3927!+  @ 6000ms: signaled
[ 495143.126497] heartbeat              E  4:1dd3928!  @ 5990ms: X[1063]
[ 495143.126497] heartbeat              E  4:1dd3929!  @ 3000ms: [i915]
[ 495143.126497] heartbeat HWSP:
[ 495143.126497] heartbeat [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 495143.126497] 00000000 00000000 00000000 00000000 00000000 00000000 0
[ 495143.126497] heartbeat *
[ 495143.126497] heartbeat [0100] 2939dd01 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 495143.126497] 00000000 00000000 00000000 00000000 00000000 00000000 0
[ 495143.126497] heartbeat [0120] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 495143.126497] 00000000 00000000 00000000 00000000 00000000 00000000 0
[ 495143.126497] heartbeat *
[ 495143.126497] heartbeat Idle? yes
[ 495143.126497] i915drmkms0: notice: Resetting chip for stopped heartbeat on bcs0
[ 495143.126497] i915drmkms0: notice: X[1063] context reset due to GPU hang
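
The ring fields in the two dumps above are internally consistent, which can be checked with a short script. This is a minimal sketch that assumes the RING_CTL layout used by the i915 driver (ring length stored as size minus 4 KiB, valid bit in bit 0) and a ring-space formula modeled on what the upstream driver's __intel_ring_space() appears to do (free bytes from emit to head, keeping a 64-byte cacheline gap). The constant and function names below are illustrative only, not the driver's API.

```python
# Assumed RING_CTL layout and ring-space rule; names are hypothetical labels.
CACHELINE_BYTES = 64          # assumed gap kept between emit and head
PAGE_SIZE = 4096
RING_NR_PAGES = 0x001FF000    # assumed length field of RING_CTL


def ring_size_from_ctl(ctl: int) -> int:
    """Ring size in bytes, assuming RING_CTL = (size - PAGE_SIZE) | valid bit."""
    return (ctl & RING_NR_PAGES) + PAGE_SIZE


def ring_space(head: int, emit: int, size: int) -> int:
    """Free bytes between the software emit pointer and head (size is a power of two)."""
    assert size & (size - 1) == 0, "ring size must be a power of two"
    return (head - emit - CACHELINE_BYTES) & (size - 1)


def hw_idle(ring_head: int, ring_tail: int) -> bool:
    """The command streamer has consumed everything submitted when HEAD == TAIL."""
    return ring_head == ring_tail


# Values copied from the rcs0 and bcs0 dumps in this report.
size = ring_size_from_ctl(0x00003001)
print(f"ring size: {size:#x}")
for name, head, emit, hw_head, hw_tail in [
    ("rcs0", 0x0C30, 0x1860, 0x0DB0, 0x0DE8),
    ("bcs0", 0x2310, 0x2748, 0x2748, 0x2748),
]:
    state = "hw idle" if hw_idle(hw_head, hw_tail) else "hw busy"
    print(name, f"space={ring_space(head, emit, size):#x}", state)
```

Under these assumptions the ring is 16 KiB, rcs0 gives a space of 0x3390 (matching the logged ring->space) with the hardware still busy (RING_HEAD 0xdb0 has not reached RING_TAIL 0xde8), and bcs0 gives 0x3b88 and idle, matching "Idle? yes".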


I also had issues with the i915 driver on NetBSD 9. I spent a few hours debugging it, but realized that a lifetime would not be enough.

Is there a recommended add-in GPU for NetBSD? Something that just works without any graphics glitches? Nothing fancy, just something for watching videos with mpv and/or ffplay, and YouTube in Firefox.
>How-To-Repeat:
Run X on a Sandy Bridge GPU for several days.
>Fix:

Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.