NetBSD Problem Report #57325

From www@netbsd.org  Thu Apr  6 02:21:39 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id DEAFE1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Thu,  6 Apr 2023 02:21:38 +0000 (UTC)
Message-Id: <20230406022137.6BC241A923A@mollari.NetBSD.org>
Date: Thu,  6 Apr 2023 02:21:37 +0000 (UTC)
From: germain@lanvaux.ca
Reply-To: germain@lanvaux.ca
To: gnats-bugs@NetBSD.org
Subject: Crash on boot w/ amdgpu driver on Lenovo ThinkCentre M75n
X-Send-Pr-Version: www-1.0

>Number:         57325
>Category:       kern
>Synopsis:       Crash on boot w/ amdgpu driver on Lenovo ThinkCentre M75n
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Apr 06 02:25:00 +0000 2023
>Last-Modified:  Mon Aug 21 21:15:01 +0000 2023
>Originator:     Germain Le Chapelain
>Release:        Current (10.99.2)
>Organization:
Lanvaux Computer Games Limited
>Environment:
NetBSD germ2.lanvaux.fr 10.99.2 NetBSD 10.99.2 (GENERIC) #0: Mon Apr  3 01:29:03 PDT 2023  german@germ2.lanvaux.fr:/usr/src/sys/arch/amd64/compile/obj/GENERIC amd64

>Description:
The kernel crashes on boot with a call stack after some errors in dmesg, notably `unable to locate a BIOS ROM'.

I'm still not 100% on kernel debugging, so I have an actual screenshot here:

http://lanvaux.fr/.download/projs/support/IMG_5866.JPG
>How-To-Repeat:
Compile the current source with the amdgpu and amdgpufb lines uncommented in `GENERIC', then install and boot it on a Lenovo ThinkCentre M75n.
>Fix:
I'm sorry, I would like to help more.
I can keep digging through the code, though I'm really not sure how far I can take it.

Further up, it *looked* like some function was returning IERRVAL (-1?) while some other function was comparing that result against 0 to determine whether there was a problem.

But that comes after the problem of `not being able to locate the BIOS'.
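
As an illustration of the suspected mismatch described above, here is a minimal C sketch. It is not taken from the amdgpu sources: the function names, the SKETCH_ERRVAL value of -1 (the report only guesses at it), and the ROM size are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical error value; the report only guesses that it is -1. */
#define SKETCH_ERRVAL (-1)

/*
 * Hypothetical callee: returns a positive size on success and
 * SKETCH_ERRVAL when no ROM could be found.
 */
static int sketch_read_rom(bool rom_present)
{
	return rom_present ? 512 : SKETCH_ERRVAL;
}

/* Suspect caller: only 0 counts as failure, so -1 slips through. */
static bool sketch_caller_checks_zero(bool rom_present)
{
	int ret = sketch_read_rom(rom_present);

	return ret != 0;	/* true means "no problem detected" */
}

/* Stricter caller: any non-positive result counts as failure. */
static bool sketch_caller_checks_sign(bool rom_present)
{
	int ret = sketch_read_rom(rom_present);

	return ret > 0;
}
```

With no ROM present, sketch_caller_checks_zero() reports success while sketch_caller_checks_sign() reports failure, which is the kind of disagreement that could let a later stage run on missing data.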

>Audit-Trail:
From: Germain Le Chapelain <germain.lechapelain@lanvaux.fr>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57325: Crash on boot w/ amdgpu driver on Lenovo ThinkCentre M75n
Date: Wed, 10 May 2023 21:45:28 -0700

 Dear NetBSD,

 So I've tried (and succeeded in) patching my kernel to get past this error.

 The patches are here & here:
 http://lanvaux.ca/.download/projs/support/patch
 http://lanvaux.ca/.download/projs/support/patch.2 <- important one, gets past the error
 However, I am facing a new error:

 "No console dev" panic, or something of the like.

 Something tells me that I should enable ACPI properly without tampering with the original code, but it's still a shot in the dark.

 I don't *think* I had a typo in my patch, and it checks for a checksum at the end of the function that previously just returned false.

 I was aiming for the smallest change.
 (But that's not the smallest change in the original code.)

 Thank you, see you! (Also, the code has been updated upstream since.. but yeah.. a red herring, I think.)

 Kindest regards,
 -- 
 Germain
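
For readers unfamiliar with the checksum check mentioned in the message above, the following C sketch shows how a legacy PC option ROM image is conventionally validated. It is a generic illustration under stated assumptions, not the code from the linked patch: the function name is hypothetical, and the layout assumed is the traditional one (signature bytes 0x55 0xAA, length in 512-byte units in byte 2, all image bytes summing to 0 modulo 256).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch only: validate a legacy option ROM image held in memory.
 * `avail' is the number of bytes readable at `rom'.
 */
static bool sketch_rom_image_ok(const uint8_t *rom, size_t avail)
{
	size_t len, i;
	uint8_t sum = 0;

	if (rom == NULL || avail < 3)
		return false;

	/* Signature check. */
	if (rom[0] != 0x55 || rom[1] != 0xAA)
		return false;

	/* Byte 2 holds the image length in 512-byte units. */
	len = (size_t)rom[2] * 512;
	if (len == 0 || len > avail)
		return false;

	/* All bytes of the image must sum to zero modulo 256. */
	for (i = 0; i < len; i++)
		sum = (uint8_t)(sum + rom[i]);

	return sum == 0;
}
```

A function that used to return false unconditionally could end with a check of this shape instead, so that an image is only accepted when both the signature and the checksum agree.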

From: Germain Le Chapelain <germain@lanvaux.ca>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57325: Crash on boot w/ amdgpu driver on Lenovo
 ThinkCentre M75n
Date: Wed, 12 Jul 2023 17:40:44 -0700

 Below is the patch that gets me as far as the problem in PR 57059.

 I think it's good to check in!

 Kindest regards,
 -- 
 Germain Le Chapelain <germain.lechapelain@lanvaux.fr>

 Index: sys/arch/amd64/conf/GENERIC
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/amd64/conf/GENERIC,v
 retrieving revision 1.602
 diff -u -r1.602 GENERIC
 --- sys/arch/amd64/conf/GENERIC	12 Apr 2023 06:39:15 -0000	1.602
 +++ sys/arch/amd64/conf/GENERIC	13 Jul 2023 00:15:10 -0000
 @@ -461,8 +461,8 @@
  radeon* 	at pci? dev ? function ?
  radeondrmkmsfb* at radeonfbbus?

 -#amdgpu*	at pci? dev ? function ?
 -#amdgpufb*	at amdgpufbbus?
 +amdgpu*	at pci? dev ? function ?
 +amdgpufb*	at amdgpufbbus?

  nouveau*	at pci? dev ? function ?
  nouveaufb*	at nouveaufbbus?
 Index: sys/external/bsd/drm2/amdgpu/files.amdgpu
 ===================================================================
 RCS file: /cvsroot/src/sys/external/bsd/drm2/amdgpu/files.amdgpu,v
 retrieving revision 1.29
 diff -u -r1.29 files.amdgpu
 --- sys/external/bsd/drm2/amdgpu/files.amdgpu	24 Jul 2022 20:05:00 -0000	1.29
 +++ sys/external/bsd/drm2/amdgpu/files.amdgpu	13 Jul 2023 00:15:12 -0000
 @@ -32,7 +32,10 @@
  makeoptions	amdgpu	"CPPFLAGS.amdgpu"+="-I$S/external/bsd/drm2/dist/drm/amd/display/amdgpu_dm"
  makeoptions	amdgpu	"CPPFLAGS.amdgpu"+="-I$S/external/bsd/drm2/dist/drm/amd/display/dmub/inc"

 +makeoptions	amdgpu	"CPPFLAGS.amdgpu"+="-DCONFIG_ACPI=1"
 +makeoptions	amdgpu	"CPPFLAGS.amdgpu"+="-DNACPICA=1"
  makeoptions	amdgpu	"CPPFLAGS.amdgpu"+="-DCONFIG_DRM_AMD_ACP=1"
 +makeoptions	amdgpu	"CPPFLAGS.amdgpu"+="-DCONFIG_DRM_AMD_DC=1"
  makeoptions	amdgpu	"CPPFLAGS.amdgpu"+="-DCONFIG_DRM_AMD_DC_DCN=1"
  makeoptions	amdgpu	"CPPFLAGS.amdgpu"+="-DCONFIG_DRM_AMD_DC_HDCP=1"
  makeoptions	amdgpu	"CPPFLAGS.amdgpu"+="-DCONFIG_PERF_EVENTS=0"
 @@ -353,7 +356,7 @@
  file	external/bsd/drm2/dist/drm/amd/amdgpu/../powerplay/smumgr/amdgpu_vega20_smumgr.c	amdgpu
  file	external/bsd/drm2/dist/drm/amd/amdgpu/../powerplay/smumgr/amdgpu_vegam_smumgr.c	amdgpu
  file	external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_acp.c	amdgpu
 -#file	external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_acpi.c	amdgpu
 +file	external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_acpi.c	amdgpu
  file	external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_afmt.c	amdgpu
  file	external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_amdkfd.c	amdgpu
  file	external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_arct_reg_init.c	amdgpu
 Index: sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_acpi.c
 ===================================================================
 RCS file: /cvsroot/src/sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_acpi.c,v
 retrieving revision 1.5
 diff -u -r1.5 amdgpu_acpi.c
 --- sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_acpi.c	27 Feb 2022 14:24:26 -0000	1.5
 +++ sys/external/bsd/drm2/dist/drm/amd/amdgpu/amdgpu_acpi.c	13 Jul 2023 00:15:12 -0000
 @@ -40,6 +40,9 @@
  #include "amd_acpi.h"
  #include "atom.h"

 +#include <linux/nbsd-namespace.h>
 +#include <linux/nbsd-namespace-acpi.h>
 +
  struct amdgpu_atif_notification_cfg {
  	bool enabled;
  	int command_code;
 @@ -362,6 +365,7 @@
  	return err;
  }

 +#ifndef __NetBSD__
  /**
   * amdgpu_atif_get_sbios_requests - get requested sbios event
   *
 @@ -487,6 +491,7 @@
  	 */
  	return NOTIFY_BAD;
  }
 +#endif

  /* Call the ATCS method
   */
 @@ -635,7 +640,12 @@
  	struct amdgpu_atcs *atcs = &adev->atcs;

  	/* Get the device handle */
 +#if defined(__NetBSD__)
 +	handle = (adev->pdev->pd_ad ? adev->pdev->pd_ad->ad_handle
 +		  : NULL);
 +#else
  	handle = ACPI_HANDLE(&adev->pdev->dev);
 +#endif
  	if (!handle)
  		return -EINVAL;

 @@ -678,7 +688,12 @@
  		return -EINVAL;

  	/* Get the device handle */
 +#if defined(__NetBSD__)
 +	handle = (adev->pdev->pd_ad ? adev->pdev->pd_ad->ad_handle
 +		  : NULL);
 +#else
  	handle = ACPI_HANDLE(&adev->pdev->dev);
 +#endif
  	if (!handle)
  		return -EINVAL;

 @@ -695,8 +710,13 @@
  	atcs_input.req_type = ATCS_PCIE_LINK_SPEED;
  	atcs_input.perf_req = perf_req;

 +#if defined(__NetBSD__)
 +	params.Length = sizeof(struct atcs_pref_req_input);
 +	params.Pointer = &atcs_input;
 +#else
  	params.length = sizeof(struct atcs_pref_req_input);
  	params.pointer = &atcs_input;
 +#endif

  	while (retry--) {
  		info = amdgpu_atcs_call(handle, ATCS_FUNCTION_PCIE_PERFORMANCE_REQUEST, &params);
 @@ -747,6 +767,9 @@
  			     unsigned long val,
  			     void *data)
  {
 +#ifdef __NetBSD__
 +	return 0;
 +#else
  	struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, acpi_nb);
  	struct acpi_bus_event *entry = (struct acpi_bus_event *)data;

 @@ -761,6 +784,7 @@

  	/* Check for pending SBIOS requests */
  	return amdgpu_atif_handler(adev, entry);
 +#endif
  }

  /* Call all ACPI methods here */
 @@ -781,7 +805,12 @@
  	int ret;

  	/* Get the device handle */
 +#if defined(__NetBSD__)
 +	handle = (adev->pdev->pd_ad ? adev->pdev->pd_ad->ad_handle
 +		  : NULL);
 +#else
  	handle = ACPI_HANDLE(&adev->pdev->dev);
 +#endif

  	if (!adev->bios || !handle)
  		return 0;
 @@ -863,7 +892,9 @@

  out:
  	adev->acpi_nb.notifier_call = amdgpu_acpi_event;
 +#ifndef __NetBSD__
  	register_acpi_notifier(&adev->acpi_nb);
 +#endif

  	return ret;
  }
 @@ -889,6 +920,8 @@
   */
  void amdgpu_acpi_fini(struct amdgpu_device *adev)
  {
 +#ifndef __NetBSD__
  	unregister_acpi_notifier(&adev->acpi_nb);
 +#endif
  	kfree(adev->atif);
  }
 Index: sys/external/bsd/drm2/include/linux/nbsd-namespace-acpi.h
 ===================================================================
 RCS file: /cvsroot/src/sys/external/bsd/drm2/include/linux/nbsd-namespace-acpi.h,v
 retrieving revision 1.1
 diff -u -r1.1 nbsd-namespace-acpi.h
 --- sys/external/bsd/drm2/include/linux/nbsd-namespace-acpi.h	27 Feb 2022 14:22:42 -0000	1.1
 +++ sys/external/bsd/drm2/include/linux/nbsd-namespace-acpi.h	13 Jul 2023 00:15:13 -0000
 @@ -39,6 +39,7 @@
  #define	type		Type
  #define	value		Value

 +#define	acpi_get_name		AcpiGetName
  #define	acpi_get_handle		AcpiGetHandle
  #define	acpi_get_table		AcpiGetTable
  #define	acpi_evaluate_object	AcpiEvaluateObject
 Index: sys/modules/amdgpu/Makefile
 ===================================================================
 RCS file: /cvsroot/src/sys/modules/amdgpu/Makefile,v
 retrieving revision 1.5
 diff -u -r1.5 Makefile
 --- sys/modules/amdgpu/Makefile	30 Jul 2022 03:29:52 -0000	1.5
 +++ sys/modules/amdgpu/Makefile	13 Jul 2023 00:15:14 -0000
 @@ -34,7 +34,10 @@
  CPPFLAGS+=	-I${S}/external/bsd/drm2/dist/drm/amd/display/modules/hdcp
  CPPFLAGS+=	-I${S}/external/bsd/drm2/dist/drm/amd/display/amdgpu_dm
  CPPFLAGS+=	-I${S}/external/bsd/drm2/dist/drm/amd/display/dmub/inc
 +CPPFLAGS+=	-DCONFIG_ACPI=1
 +CPPFLAGS+=	-DNACPICA=1
  CPPFLAGS+=	-DCONFIG_DRM_AMD_ACP=1
 +CPPFLAGS+=	-DCONFIG_DRM_AMD_DC=1
  CPPFLAGS+=	-DCONFIG_DRM_AMD_DC_DCN=1
  CPPFLAGS+=	-DCONFIG_DRM_AMD_DC_HDCP=1
  CPPFLAGS+=	-DCONFIG_PERF_EVENTS=0
 @@ -143,6 +146,7 @@
  # sed -ne 's,^file	external/bsd/drm2/.*/\([^/ 	]*\)	.*,SRCS+=	\1,gp' <files.amdgpu | sort -u
  SRCS+=	amdgpu_acp.c
  SRCS+=	amdgpu_acp_hw.c
 +SRCS+=	amdgpu_acpi.c
  SRCS+=	amdgpu_afmt.c
  SRCS+=	amdgpu_amd_powerplay.c
  SRCS+=	amdgpu_amdkfd.c

From: Germain Le Chapelain <germain@lanvaux.ca>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57325: Crash on boot w/ amdgpu driver on Lenovo
 ThinkCentre M75n
Date: Thu, 13 Jul 2023 01:13:39 -0700

 I forgot to mention my new call-stack:

 http://lanvaux.fr/.download/projs/support/NetBSD-amdgpu/crashwithchanges.txt

 It was based on that call stack that I thought it was like the aforementioned problem, but looking deeper I am not seeing it.
 But I don't know how to check the last MIME attachment.

 -- 
 Germain Le Chapelain <germain.lechapelain@lanvaux.fr>

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57325 CVS commit: src/sys
Date: Fri, 14 Jul 2023 13:05:59 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Fri Jul 14 13:05:59 UTC 2023

 Modified Files:
 	src/sys/external/bsd/drm2/amdgpu: files.amdgpu
 	src/sys/modules/amdgpu: Makefile

 Log Message:
 amdgpu: Define CONFIG_DRM_AMD_DC to enable display core logic.

 Probably resolves a host of issues with amdgpu not detecting
 displays!

 Noticed by rjs@.

 PR kern/57059
 PR kern/57325
 PR kern/57452

 XXX pullup-10


 To generate a diff of this commit:
 cvs rdiff -u -r1.29 -r1.30 src/sys/external/bsd/drm2/amdgpu/files.amdgpu
 cvs rdiff -u -r1.5 -r1.6 src/sys/modules/amdgpu/Makefile

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57325 CVS commit: [netbsd-10] src/sys
Date: Wed, 2 Aug 2023 10:28:08 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Wed Aug  2 10:28:08 UTC 2023

 Modified Files:
 	src/sys/external/bsd/drm2/amdgpu [netbsd-10]: files.amdgpu
 	src/sys/modules/amdgpu [netbsd-10]: Makefile

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #302):

 	sys/modules/amdgpu/Makefile: revision 1.6
 	sys/external/bsd/drm2/amdgpu/files.amdgpu: revision 1.30

 amdgpu: Define CONFIG_DRM_AMD_DC to enable display core logic.

 Probably resolves a host of issues with amdgpu not detecting
 displays!
 Noticed by rjs@.

 PR kern/57059
 PR kern/57325
 PR kern/57452


 To generate a diff of this commit:
 cvs rdiff -u -r1.29 -r1.29.4.1 src/sys/external/bsd/drm2/amdgpu/files.amdgpu
 cvs rdiff -u -r1.5 -r1.5.4.1 src/sys/modules/amdgpu/Makefile

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Germain Le Chapelain <germain.lechapelain@lanvaux.fr>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org, tech-kern@netbsd.org
Subject: Re: kern/57325: Crash on boot w/ amdgpu driver on Lenovo
 ThinkCentre M75n
Date: Mon, 21 Aug 2023 14:11:31 -0700

 So I'm rolling with the patch at http://lanvaux.fr/.download/projs/support/NetBSD-amdgpu/diff
 The change since my last report is that I silenced further output that is also either left out or rate-limited under Linux.
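
To make "rate-limited" concrete, here is a small C sketch of the general technique: allow at most a fixed burst of messages per time window and drop the rest. It is an assumption-laden illustration, not the Linux or NetBSD implementation and not the code in the linked diff; the struct, its fields, and the function name are hypothetical, and time is passed in explicitly as an abstract tick count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state: at most `burst' messages per `interval' ticks. */
struct sketch_ratelimit {
	uint64_t interval;	/* window length in ticks */
	unsigned burst;		/* messages allowed per window */
	uint64_t window_start;	/* tick at which the current window began */
	unsigned printed;	/* messages let through in this window */
};

/* Returns true if a message may be printed at time `now'. */
static bool sketch_ratelimit_ok(struct sketch_ratelimit *rl, uint64_t now)
{
	/* Start a fresh window once the old one has elapsed. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}

	if (rl->printed >= rl->burst)
		return false;

	rl->printed++;
	return true;
}
```

A caller would wrap the noisy message in `if (sketch_ratelimit_ok(&rl, now))`, so that a warning repeated on every frame shows up only a few times per window instead of flooding the console.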

 It is not crashing with the patch; however, I am still getting these:

   `warning: [drm] Fence fallback timer expired on ring gfx' (I put the dmesg output next to the diff.)

 Along with screen corruption: missing parts of glyphs inside xterm (and, I believe, widgets as well at times).
 Sorry, I forgot when I had synced, and I have now nuked those sources.  But it was fairly consistent.

 I am retrying with netbsd-10.
 I will also try taking a screenshot, but I suppose that timeout warning *has* to go one way or another -_-

 My change is probably wrong somewhere too: either I commented out something that should be properly converted, or I return 0 early where something should actually be done (I think I checked this one and it checked out, but I can look again).

 -- 
 Germain Le Chapelain <germain.lechapelain@lanvaux.fr>

Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.