NetBSD Problem Report #57325
From www@netbsd.org Thu Apr 6 02:21:39 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id DEAFE1A9239
for <gnats-bugs@gnats.NetBSD.org>; Thu, 6 Apr 2023 02:21:38 +0000 (UTC)
Message-Id: <20230406022137.6BC241A923A@mollari.NetBSD.org>
Date: Thu, 6 Apr 2023 02:21:37 +0000 (UTC)
From: germain@lanvaux.ca
Reply-To: germain@lanvaux.ca
To: gnats-bugs@NetBSD.org
Subject: Crash on boot w/ amdgpu driver on Lenovo ThinkCentre M75n
X-Send-Pr-Version: www-1.0
>Number: 57325
>Category: kern
>Synopsis: Crash on boot w/ amdgpu driver on Lenovo ThinkCentre M75n
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Apr 06 02:25:00 +0000 2023
>Last-Modified: Mon Aug 21 21:15:01 +0000 2023
>Originator: Germain Le Chapelain
>Release: Current (10.99.2)
>Organization:
Lanvaux Computer Games Limited
>Environment:
NetBSD germ2.lanvaux.fr 10.99.2 NetBSD 10.99.2 (GENERIC) #0: Mon Apr 3 01:29:03 PDT 2023 german@germ2.lanvaux.fr:/usr/src/sys/arch/amd64/compile/obj/GENERIC amd64
>Description:
It crashes on boot with a call stack following some errors in dmesg, notably `unable to locate a BIOS ROM'.
I'm still not 100% comfortable debugging the kernel, so all I have is a photo of the screen:
http://lanvaux.fr/.download/projs/support/IMG_5866.JPG
>How-To-Repeat:
Compile the current source w/ amdgpu & amdgpufb lines uncommented in `GENERIC' and install & run on a Lenovo ThinkCentre M75n
>Fix:
I'm sorry, I would like to help more.
I can keep looking further down the code, though I'm really not sure how far I can take it.
Further up, it *looked* like some function was returning IERRVAL (-1?) and another function was checking that value against 0 to determine whether there was a problem.
But that comes after the problem of `not being able to locate the BIOS'.
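The suspected mismatch can be illustrated in isolation. This is a minimal sketch, not code taken from the amdgpu driver: `find_resource`, `lookup_ok_buggy` and `lookup_ok_fixed` are hypothetical names, and `IERRVAL` is assumed to be -1 as guessed above. It shows how a failure reported as -1 slips past a caller that treats only 0 as an error.

```c
#include <stdbool.h>

/* Assumption: the error sentinel is -1, as guessed in the report. */
#define IERRVAL (-1)

/* Hypothetical lookup: a positive id on success, IERRVAL on failure. */
static int
find_resource(bool present)
{
	return present ? 42 : IERRVAL;
}

/* Buggy caller: only 0 counts as failure, so -1 looks like success. */
static bool
lookup_ok_buggy(bool present)
{
	return find_resource(present) != 0;
}

/* Consistent caller: compares against the sentinel the callee uses. */
static bool
lookup_ok_fixed(bool present)
{
	return find_resource(present) != IERRVAL;
}
```

With the resource absent, `lookup_ok_buggy(false)` reports success while `lookup_ok_fixed(false)` reports failure, which is the kind of disagreement described above.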
>Audit-Trail:
From: Germain Le Chapelain <germain.lechapelain@lanvaux.fr>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57325: Crash on boot w/ amdgpu driver on Lenovo ThinkCentre M75n
Date: Wed, 10 May 2023 21:45:28 -0700
Dear NetBSD,
So I've tried (and succeeded in) patching my kernel to get past this error.
The patches are here and here:
http://lanvaux.ca/.download/projs/support/patch
http://lanvaux.ca/.download/projs/support/patch.2 <- important one, gets past the error
However I am facing a new error:
"No console dev" panic or something of the like
Something tells me that I should enable ACPI proper without tampering w/ the original code, but it's still a shot in the dark.
I don't *think* I had a typo in my patch, and it checks for a checksum at the end of the function that previously just returned false.
I was aiming for the smallest change
(But that's not the smallest change in the original code.)
Thank you, see you! (Also the code has been updated upstream since.. but yeah .. a red herring I think.)
Kindest regards,
--
Germain
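For readers unfamiliar with the checksum mentioned above: a legacy PC option ROM image conventionally begins with the bytes 0x55 0xAA, and all of its bytes sum to zero modulo 256. The sketch below shows only that generic check. It is not the contents of patch.2, and `rom_image_valid` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Generic option ROM sanity check (illustrative only):
 * signature bytes 0x55 0xAA, then an 8-bit sum of zero over the image.
 */
static bool
rom_image_valid(const uint8_t *rom, size_t len)
{
	uint8_t sum = 0;
	size_t i;

	if (rom == NULL || len < 2)
		return false;
	if (rom[0] != 0x55 || rom[1] != 0xaa)
		return false;
	for (i = 0; i < len; i++)
		sum = (uint8_t)(sum + rom[i]);
	return sum == 0;
}
```

A function that used to return false unconditionally could instead end with a check of this shape, assuming the image has already been read into memory.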
From: Germain Le Chapelain <germain@lanvaux.ca>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57325: Crash on boot w/ amdgpu driver on Lenovo
ThinkCentre M75n
Date: Wed, 12 Jul 2023 17:40:44 -0700
The following patch gets as far as the problem in PR 57059.
I think it's good to check in!
Kindest regards,
--
Germain Le Chapelain <germain.lechapelain@lanvaux.fr>
Index: sys/arch/amd64/conf/GENERIC
===================================================================
RCS file: /cvsroot/src/sys/arch/amd64/conf/GENERIC,v
retrieving revision 1.602
diff -u -r1.602 GENERIC
--- sys/arch/amd64/conf/GENERIC 12 Apr 2023 06:39:15 -0000 1.602
+++ sys/arch/amd64/conf/GENERIC 13 Jul 2023 00:15:10 -0000
@@ -461,8 +461,8 @@
radeon* at pci? dev ? function ?
radeondrmkmsfb* at radeonfbbus?
-#amdgpu* at pci? dev ? function ?
-#amdgpufb* at amdgpufbbus?
+amdgpu* at pci? dev ? function ?
+amdgpufb* at amdgpufbbus?
nouveau* at pci? dev ? function ?
nouveaufb* at nouveaufbbus?
Index: sys/external/bsd/drm2/amdgpu/files.amdgpu
===================================================================
RCS file: /cvsroot/src/sys/external/bsd/drm2/amdgpu/files.amdgpu,v
retrieving revision 1.29
diff -u -r1.29 files.amdgpu
--- sys/external/bsd/drm2/amdgpu/files.amdgpu 24 Jul 2022 20:05:00 -0000 1.29
+++ sys/external/bsd/drm2/amdgpu/files.amdgpu 13 Jul 2023 00:15:12 -0000
@@ -32,7 +32,10 @@
makeoptions amdgpu "CPPFLAGS.amdgpu"+="-I$S/external/bsd/drm2/dist/drm/amd/display/amdgpu_dm"
makeoptions amdgpu "CPPFLAGS.amdgpu"+="-I$S/external/bsd/drm2/dist/drm/amd/display/dmub/inc"
+makeoptions amdgpu "CPPFLAGS.amdgpu"+="-DCONFIG_ACPI=1"
+makeoptions amdgpu "CPPFLAGS.amdgpu"+="-DNACPICA=1"
makeoptions amdgpu "CPPFLAGS.amdgpu"+="-DCONFIG_DRM_AMD_ACP=1"
+makeoptions amdgpu "CPPFLAGS.amdgpu"+="-DCONFIG_DRM_AMD_DC=1"
makeoptions amdgpu "CPPFLAGS.amdgpu"+="-DCONFIG_DRM_AMD_DC_DCN=1"
makeoptions amdgpu "CPPFLAGS.amdgpu"+="-DCONFIG_DRM_AMD_DC_HDCP=1"
makeoptions amdgpu "CPPFLAGS.amdgpu"+="-DCONFIG_PERF_EVENTS=0"
@@ -353,7 +356,7 @@
file external/bsd/drm2/dist/drm/amd/amdgpu/../powerplay/smumgr/amdgpu_vega20_smumgr.c amdgpu
file external/bsd/drm2/dist/drm/amd/amdgpu/../powerplay/smumgr/amdgpu_vegam_smumgr.c amdgpu
file external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_acp.c amdgpu
-#file external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_acpi.c amdgpu
+file external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_acpi.c amdgpu
file external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_afmt.c amdgpu
file external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_amdkfd.c amdgpu
file external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_arct_reg_init.c amdgpu
Index: sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_acpi.c
===================================================================
RCS file: /cvsroot/src/sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_acpi.c,v
retrieving revision 1.5
diff -u -r1.5 amdgpu_acpi.c
--- sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_acpi.c 27 Feb 2022 14:24:26 -0000 1.5
+++ sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_acpi.c 13 Jul 2023 00:15:12 -0000
@@ -40,6 +40,9 @@
#include "amd_acpi.h"
#include "atom.h"
+#include <linux/nbsd-namespace.h>
+#include <linux/nbsd-namespace-acpi.h>
+
struct amdgpu_atif_notification_cfg {
bool enabled;
int command_code;
@@ -362,6 +365,7 @@
return err;
}
+#ifndef __NetBSD__
/**
* amdgpu_atif_get_sbios_requests - get requested sbios event
*
@@ -487,6 +491,7 @@
*/
return NOTIFY_BAD;
}
+#endif
/* Call the ATCS method
*/
@@ -635,7 +640,12 @@
struct amdgpu_atcs *atcs = &adev->atcs;
/* Get the device handle */
+#if defined(__NetBSD__)
+ handle = (adev->pdev->pd_ad ? adev->pdev->pd_ad->ad_handle
+ : NULL);
+#else
handle = ACPI_HANDLE(&adev->pdev->dev);
+#endif
if (!handle)
return -EINVAL;
@@ -678,7 +688,12 @@
return -EINVAL;
/* Get the device handle */
+#if defined(__NetBSD__)
+ handle = (adev->pdev->pd_ad ? adev->pdev->pd_ad->ad_handle
+ : NULL);
+#else
handle = ACPI_HANDLE(&adev->pdev->dev);
+#endif
if (!handle)
return -EINVAL;
@@ -695,8 +710,13 @@
atcs_input.req_type = ATCS_PCIE_LINK_SPEED;
atcs_input.perf_req = perf_req;
+#if defined(__NetBSD__)
+ params.Length = sizeof(struct atcs_pref_req_input);
+ params.Pointer = &atcs_input;
+#else
params.length = sizeof(struct atcs_pref_req_input);
params.pointer = &atcs_input;
+#endif
while (retry--) {
info = amdgpu_atcs_call(handle, ATCS_FUNCTION_PCIE_PERFORMANCE_REQUEST, &params);
@@ -747,6 +767,9 @@
unsigned long val,
void *data)
{
+#ifdef __NetBSD__
+ return 0;
+#else
struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, acpi_nb);
struct acpi_bus_event *entry = (struct acpi_bus_event *)data;
@@ -761,6 +784,7 @@
/* Check for pending SBIOS requests */
return amdgpu_atif_handler(adev, entry);
+#endif
}
/* Call all ACPI methods here */
@@ -781,7 +805,12 @@
int ret;
/* Get the device handle */
+#if defined(__NetBSD__)
+ handle = (adev->pdev->pd_ad ? adev->pdev->pd_ad->ad_handle
+ : NULL);
+#else
handle = ACPI_HANDLE(&adev->pdev->dev);
+#endif
if (!adev->bios || !handle)
return 0;
@@ -863,7 +892,9 @@
out:
adev->acpi_nb.notifier_call = amdgpu_acpi_event;
+#ifndef __NetBSD__
register_acpi_notifier(&adev->acpi_nb);
+#endif
return ret;
}
@@ -889,6 +920,8 @@
*/
void amdgpu_acpi_fini(struct amdgpu_device *adev)
{
+#ifndef __NetBSD__
unregister_acpi_notifier(&adev->acpi_nb);
+#endif
kfree(adev->atif);
}
Index: sys/external/bsd/drm2/include/linux/nbsd-namespace-acpi.h
===================================================================
RCS file: /cvsroot/src/sys/external/bsd/drm2/include/linux/nbsd-namespace-acpi.h,v
retrieving revision 1.1
diff -u -r1.1 nbsd-namespace-acpi.h
--- sys/external/bsd/drm2/include/linux/nbsd-namespace-acpi.h 27 Feb 2022 14:22:42 -0000 1.1
+++ sys/external/bsd/drm2/include/linux/nbsd-namespace-acpi.h 13 Jul 2023 00:15:13 -0000
@@ -39,6 +39,7 @@
#define type Type
#define value Value
+#define acpi_get_name AcpiGetName
#define acpi_get_handle AcpiGetHandle
#define acpi_get_table AcpiGetTable
#define acpi_evaluate_object AcpiEvaluateObject
Index: sys/modules/amdgpu/Makefile
===================================================================
RCS file: /cvsroot/src/sys/modules/amdgpu/Makefile,v
retrieving revision 1.5
diff -u -r1.5 Makefile
--- sys/modules/amdgpu/Makefile 30 Jul 2022 03:29:52 -0000 1.5
+++ sys/modules/amdgpu/Makefile 13 Jul 2023 00:15:14 -0000
@@ -34,7 +34,10 @@
CPPFLAGS+= -I${S}/external/bsd/drm2/dist/drm/amd/display/modules/hdcp
CPPFLAGS+= -I${S}/external/bsd/drm2/dist/drm/amd/display/amdgpu_dm
CPPFLAGS+= -I${S}/external/bsd/drm2/dist/drm/amd/display/dmub/inc
+CPPFLAGS+= -DCONFIG_ACPI=1
+CPPFLAGS+= -DNACPICA=1
CPPFLAGS+= -DCONFIG_DRM_AMD_ACP=1
+CPPFLAGS+= -DCONFIG_DRM_AMD_DC=1
CPPFLAGS+= -DCONFIG_DRM_AMD_DC_DCN=1
CPPFLAGS+= -DCONFIG_DRM_AMD_DC_HDCP=1
CPPFLAGS+= -DCONFIG_PERF_EVENTS=0
@@ -143,6 +146,7 @@
# sed -ne 's,^file external/bsd/drm2/.*/\([^/ ]*\) .*,SRCS+= \1,gp' <files.amdgpu | sort -u
SRCS+= amdgpu_acp.c
SRCS+= amdgpu_acp_hw.c
+SRCS+= amdgpu_acpi.c
SRCS+= amdgpu_afmt.c
SRCS+= amdgpu_amd_powerplay.c
SRCS+= amdgpu_amdkfd.c
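The patch above repeats the same NetBSD handle lookup in three places. One possible tidy-up, sketched here under assumptions, is to fold it into a helper. The member names `pd_ad` and `ad_handle` come from the patch, but the `fake_*` types and the helper name are stand-ins invented for this example so that it compiles outside the kernel.

```c
#include <stddef.h>

/* Stand-in types, assumed for illustration; the real ones are in the kernel. */
typedef void *fake_acpi_handle;

struct fake_acpi_devnode {
	fake_acpi_handle ad_handle;
};

struct fake_pci_dev {
	struct fake_acpi_devnode *pd_ad;	/* NULL if no ACPI node */
};

/* Hypothetical helper: NULL when the PCI device has no ACPI node. */
static fake_acpi_handle
fake_pdev_acpi_handle(const struct fake_pci_dev *pdev)
{
	return pdev->pd_ad != NULL ? pdev->pd_ad->ad_handle : NULL;
}
```

Each call site would then keep its existing `if (!handle)` check unchanged.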
From: Germain Le Chapelain <germain@lanvaux.ca>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57325: Crash on boot w/ amdgpu driver on Lenovo
ThinkCentre M75n
Date: Thu, 13 Jul 2023 01:13:39 -0700
I forgot to mention my new call-stack:
http://lanvaux.fr/.download/projs/support/NetBSD-amdgpu/crashwithchanges.txt
It was on that basis that I thought it was like the aforementioned problem, but looking deeper I am not seeing it.
But I don't know how to check the last MIME attachment.
--
Germain Le Chapelain <germain.lechapelain@lanvaux.fr>
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57325 CVS commit: src/sys
Date: Fri, 14 Jul 2023 13:05:59 +0000
Module Name: src
Committed By: riastradh
Date: Fri Jul 14 13:05:59 UTC 2023
Modified Files:
src/sys/external/bsd/drm2/amdgpu: files.amdgpu
src/sys/modules/amdgpu: Makefile
Log Message:
amdgpu: Define CONFIG_DRM_AMD_DC to enable display core logic.
Probably resolves a host of issues with amdgpu not detecting
displays!
Noticed by rjs@.
PR kern/57059
PR kern/57325
PR kern/57452
XXX pullup-10
To generate a diff of this commit:
cvs rdiff -u -r1.29 -r1.30 src/sys/external/bsd/drm2/amdgpu/files.amdgpu
cvs rdiff -u -r1.5 -r1.6 src/sys/modules/amdgpu/Makefile
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
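As background on why a single missing define can have this much effect: code guarded by `#if defined(CONFIG_DRM_AMD_DC)` is dropped by the preprocessor unless the macro is defined at build time. The following is a minimal sketch of that pattern only. `asic_has_dc_support` and the `chip` enum are hypothetical stand-ins, not the driver's actual code.

```c
#include <stdbool.h>

/* Same effect as -DCONFIG_DRM_AMD_DC=1; remove to see the other path. */
#define CONFIG_DRM_AMD_DC 1

enum chip { CHIP_OLD, CHIP_NEW };	/* hypothetical chip families */

/* Reports display-core support only if it was compiled in. */
static bool
asic_has_dc_support(enum chip c)
{
	switch (c) {
#if defined(CONFIG_DRM_AMD_DC)
	case CHIP_NEW:
		return true;
#endif
	default:
		return false;
	}
}
```

Without the define, every chip falls through to `default`, so the display code is never selected even though the source files are built.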
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57325 CVS commit: [netbsd-10] src/sys
Date: Wed, 2 Aug 2023 10:28:08 +0000
Module Name: src
Committed By: martin
Date: Wed Aug 2 10:28:08 UTC 2023
Modified Files:
src/sys/external/bsd/drm2/amdgpu [netbsd-10]: files.amdgpu
src/sys/modules/amdgpu [netbsd-10]: Makefile
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #302):
sys/modules/amdgpu/Makefile: revision 1.6
sys/external/bsd/drm2/amdgpu/files.amdgpu: revision 1.30
amdgpu: Define CONFIG_DRM_AMD_DC to enable display core logic.
Probably resolves a host of issues with amdgpu not detecting
displays!
Noticed by rjs@.
PR kern/57059
PR kern/57325
PR kern/57452
To generate a diff of this commit:
cvs rdiff -u -r1.29 -r1.29.4.1 src/sys/external/bsd/drm2/amdgpu/files.amdgpu
cvs rdiff -u -r1.5 -r1.5.4.1 src/sys/modules/amdgpu/Makefile
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Germain Le Chapelain <germain.lechapelain@lanvaux.fr>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, tech-kern@netbsd.org
Subject: Re: kern/57325: Crash on boot w/ amdgpu driver on Lenovo
ThinkCentre M75n
Date: Mon, 21 Aug 2023 14:11:31 -0700
So I'm rolling with the patch at http://lanvaux.fr/.download/projs/support/NetBSD-amdgpu/diff
The change since I last reported is that I silenced further messages that are likewise either left out or rate-limited under Linux.
It no longer crashes with the patch; however, I am still getting these:
`warning: [drm] Fence fallback timer expired on ring gfx' (I put the dmesg output next to the diff.)
Along with screen corruption: missing parts of glyphs inside xterm (and, I believe, in widgets as well at times.)
Sorry, I forgot when I had last synced, and I have since nuked those sources. But it was fairly consistent.
I am retrying with netbsd-10.
I will also try taking a screenshot, but I suppose that timeout warning *has* to go one way or another -_-
Probably my change is wrong somewhere too: I either commented out something that should be properly converted, or return 0 early where something should actually be done (I think I checked this once and it checked out, but I can look again.)
--
Germain Le Chapelain <germain.lechapelain@lanvaux.fr>
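As an alternative to silencing a message outright, it could be rate-limited as mentioned above. The following is a minimal userland sketch of the idea with the clock passed in by the caller. `struct ratelimit` and `ratelimit_ok` are hypothetical names, and in the kernel one would presumably use an existing facility such as ratecheck(9) instead.

```c
#include <stdbool.h>
#include <time.h>

struct ratelimit {
	time_t interval;	/* minimum seconds between messages */
	time_t last;		/* time of the last message let through */
	bool primed;		/* false until the first message */
};

/* Returns true if a message may be emitted at time `now`. */
static bool
ratelimit_ok(struct ratelimit *rl, time_t now)
{
	if (rl->primed && now - rl->last < rl->interval)
		return false;
	rl->primed = true;
	rl->last = now;
	return true;
}
```

A caller would wrap the warning in `if (ratelimit_ok(&rl, time(NULL)))`, so repeated timeouts still show up in dmesg without flooding it.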
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.