NetBSD Problem Report #55006

From www@netbsd.org  Sat Feb 22 20:51:46 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 4CCF71A9213
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 22 Feb 2020 20:51:46 +0000 (UTC)
Message-Id: <20200222205145.5A4A11A9217@mollari.NetBSD.org>
Date: Sat, 22 Feb 2020 20:51:45 +0000 (UTC)
From: jbryn@students.wcpss.net
Reply-To: jbryn@students.wcpss.net
To: gnats-bugs@NetBSD.org
Subject: unavoidable boot lockup and SpeedStep crash on ThinkPad W540
X-Send-Pr-Version: www-1.0

>Number:         55006
>Category:       port-amd64
>Synopsis:       unavoidable boot lockup and SpeedStep crash on ThinkPad W540
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-amd64-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Feb 22 20:55:00 +0000 2020
>Last-Modified:  Sun Mar 01 05:45:00 +0000 2020
>Originator:     Jackson Bryn
>Release:        9.0
>Organization:
N/A
>Environment:
Can not input
>Description:
This report describes two problems, each of which prevents the system from booting. The first can be worked around; the second cannot, so the system still fails to boot.

#1:
Occurs when SpeedStep is enabled in the BIOS. Kernel output slows to a crawl, and the kernel crashes when enabling acpiecdt. In MBR mode, the crash causes an automatic restart without entering the kernel debugger. In UEFI mode, however, the kernel does enter the debugger, which let me capture this:

Stopped in pid 0.1 (system) atfatal double fault in supervisor mode
[   1.0000030] trap type 13 code 0 rip 0xffffffff80223eb2 cs 0x8 rflags 0x10282 cr2 0xffffff0000001fc8 ilevel 0x8 rsp 0xffffff0000001fd0
[   1.0000030] curlwp 0xffffffff8145ce80 pid 0.1 lowest kstack 0xffffffff818e62c0
  kernel: double fault trap, code=0
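
The register dump above can be picked apart mechanically. The following is a minimal Python sketch, not part of the original report and not a NetBSD tool: the helper names parse_trap_line and decode_rflags are hypothetical, and the flag table covers only the standard x86-64 RFLAGS status and control bits. It only restates what the trap line already says and does not identify the root cause.

```python
import re

# The trap line exactly as quoted in the report above.
TRAP_LINE = (
    "trap type 13 code 0 rip 0xffffffff80223eb2 cs 0x8 rflags 0x10282 "
    "cr2 0xffffff0000001fc8 ilevel 0x8 rsp 0xffffff0000001fd0"
)

# Architectural x86-64 RFLAGS bit positions (bit 1 is reserved and always set).
RFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF",
    9: "IF", 10: "DF", 11: "OF", 16: "RF", 17: "VM", 18: "AC",
}


def parse_trap_line(line):
    """Return a dict mapping each 'name value' pair in the line to an int."""
    fields = {}
    for name, value in re.findall(r"(\w+) (0x[0-9a-fA-F]+|\d+)", line):
        fields[name] = int(value, 0)  # base 0 accepts both "13" and "0x8"
    return fields


def decode_rflags(value):
    """Return the names of the known RFLAGS bits set in value, lowest bit first."""
    return [name for bit, name in sorted(RFLAGS_BITS.items()) if (value >> bit) & 1]


fields = parse_trap_line(TRAP_LINE)
print("trap type:", fields["type"])
print("privilege level from cs:", fields["cs"] & 3)
print("rflags bits:", decode_rflags(fields["rflags"]))
print("rsp - cr2:", fields["rsp"] - fields["cr2"], "bytes")
```

For this trap line the sketch reports privilege level 0, which matches "supervisor mode" in the log, the flags SF, IF and RF, and a cr2 that lies 8 bytes below rsp. The last point might indicate that the fault hit during a push onto that stack, but that is only a guess until a backtrace is available.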

Note that I used the non-ASLR kernel. However, I cannot reach the debugger prompt to trace the issue, because the debugger keeps repeating the double fault message indefinitely.

To work around the issue, disable SpeedStep in the BIOS. The kernel then proceeds normally until it searches for the root device, which leads to the second problem.

#2:
When the kernel begins to search for the root device, it locks up. This is not an issue with the USB device driver: I disabled it and the problem still occurs.

In both MBR and UEFI mode, the lockup triggers neither a jump to the debugger nor an automatic restart. I have tried enabling and disabling the suspected culprits in the BIOS, but could not find any setting responsible. I need further guidance.

For those who can view images, this is where the kernel locks up:
https://i.ibb.co/tZJYQv2/BC3-E718-C-052-C-4589-B401-649-A7-EF78-BD9.jpg
Apologies for the unprofessional image host.
>How-To-Repeat:
Reproducing SpeedStep Crash:
------------------------------------------------
1. Enable SpeedStep in BIOS.
2. Boot NetBSD normally.
3. Watch NetBSD crash when detecting acpiecdt.

Reproducing Boot Lockup:
------------------------------------------------
1. Disable SpeedStep in BIOS to avoid the problem above.
2. Boot NetBSD normally.
3. Watch NetBSD hang.
>Fix:

>Audit-Trail:
From: Jackson Bryn _ Student - WakefieldHS <jbryn@students.wcpss.net>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: port-amd64/55006
Date: Sun, 1 Mar 2020 01:08:32 +0000

 At last, I finally found the culprit of the hang: acpiec and acpiecdt.
 Sounds like this laptop is very problematic with the way NetBSD handles the
 embedded controller. Anyone willing to help me debug EC writes/reads?
