NetBSD Problem Report #57162

From www@netbsd.org  Thu Jan  5 17:21:38 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id C62CB1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Thu,  5 Jan 2023 17:21:37 +0000 (UTC)
Message-Id: <20230105172136.919E81A923A@mollari.NetBSD.org>
Date: Thu,  5 Jan 2023 17:21:36 +0000 (UTC)
From: brian@bemorehuman.org
Reply-To: brian@bemorehuman.org
To: gnats-bugs@NetBSD.org
Subject: On Thinkpad E14 Gen2, acpibat seems broken
X-Send-Pr-Version: www-1.0

>Number:         57162
>Category:       kern
>Synopsis:       On Thinkpad E14 Gen2, acpibat seems broken
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jan 05 17:25:00 +0000 2023
>Last-Modified:  Sun Jul 30 12:05:03 +0000 2023
>Originator:     Brian Calhoun
>Release:        9.3
>Organization:
>Environment:
NetBSD localhost 9.3 NetBSD 9.3 (GENERIC) #0: Thu Aug 4 15:30:37 UTC 2022 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
I can't get 9.3 nor 10 Beta to boot from SSD bare metal on this thinkpad E14 Gen2. The boot process hangs, I believe, waiting for acpibat to do something. I was able to successfully install to the ssd from a usb stick-boot, but the boot from the ssd invariably fails. The boot process displays all the dkn wedge info then just stops. I remember one attempted boot (possibly with -v?) displayed a "waiting for response from acpibat0" once per second.

Not sure if related but during a boot from usb stick, I brought up a shell from within the installer and I typed "envstat" which hung the system (no output at all and had to force shutdown with power button).
>How-To-Repeat:
- install 9.3 (or 10-beta) from usb stick to SSD
- try to boot from SSD
- boot process will hang after dkn wedge info display
>Fix:
(temporary workaround) Thanks to tip from Nia, if I disable acpibat in the bootloader with the userconf command, the boot process will proceed as normal.

>Audit-Trail:
From: David Brownlee <abs@absd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/57162: On Thinkpad E14 Gen2, acpibat seems broken
Date: Fri, 23 Jun 2023 13:56:01 +0100

 --000000000000b6019205fecb870d
 Content-Type: text/plain; charset="UTF-8"

 Is this an AMD based ThinkPad? I'm seeing very similar behaviour on an AMD
 based T495
 - Needs "disable acpibat" to avoid hangs during boot
 - Also needs "disable thinkpad" to avoid envstat hanging
 - Hangs on poweroff (after umounting all disks)

 David

 --000000000000b6019205fecb870d
 Content-Type: text/html; charset="UTF-8"
 Content-Transfer-Encoding: quoted-printable

 <div dir=3D"ltr"><div>Is this an AMD based ThinkPad? I&#39;m seeing very si=
 milar behaviour on an AMD based T495</div><div>- Needs &quot;disable acpiba=
 t&quot; to avoid hangs during boot</div><div>- Also needs &quot;disable thi=
 nkpad&quot; to avoid envstat hanging</div><div>- Hangs on poweroff (after u=
 mounting all disks)</div><div><br></div><div>David<br></div></div>

 --000000000000b6019205fecb870d--

From: Brian Calhoun <brian@bemorehuman.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57162: On Thinkpad E14 Gen2, acpibat seems broken
Date: Sat, 24 Jun 2023 10:41:48 +1200

 The CPU on this E14 Thinkpad is Intel i5-1135g7.

 Brian

 On 24/06/23 01:00, David Brownlee wrote:
 > The following reply was made to PR kern/57162; it has been noted by GNATS.
 >
 > From: David Brownlee <abs@absd.org>
 > To: gnats-bugs@netbsd.org
 > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
 > Subject: Re: kern/57162: On Thinkpad E14 Gen2, acpibat seems broken
 > Date: Fri, 23 Jun 2023 13:56:01 +0100
 >
 >   --000000000000b6019205fecb870d
 >   Content-Type: text/plain; charset="UTF-8"
 >   
 >   Is this an AMD based ThinkPad? I'm seeing very similar behaviour on an AMD
 >   based T495
 >   - Needs "disable acpibat" to avoid hangs during boot
 >   - Also needs "disable thinkpad" to avoid envstat hanging
 >   - Hangs on poweroff (after umounting all disks)
 >   
 >   David
 >   
 >   --000000000000b6019205fecb870d
 >   Content-Type: text/html; charset="UTF-8"
 >   Content-Transfer-Encoding: quoted-printable
 >   
 >   <div dir=3D"ltr"><div>Is this an AMD based ThinkPad? I&#39;m seeing very si=
 >   milar behaviour on an AMD based T495</div><div>- Needs &quot;disable acpiba=
 >   t&quot; to avoid hangs during boot</div><div>- Also needs &quot;disable thi=
 >   nkpad&quot; to avoid envstat hanging</div><div>- Hangs on poweroff (after u=
 >   mounting all disks)</div><div><br></div><div>David<br></div></div>
 >   
 >   --000000000000b6019205fecb870d--
 >   

 -- 
 Everything is connected. All the time.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57162 CVS commit: src/sys/dev/acpi
Date: Tue, 18 Jul 2023 10:04:15 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Tue Jul 18 10:04:14 UTC 2023

 Modified Files:
 	src/sys/dev/acpi: acpi_ec.c

 Log Message:
 acpiec(4): Set sc_got_sci only when a transaction is over.

 Before, when the acpiec thread noticed an SCI had been requested and
 entered acpiec_gpe_state_machine to send the query command, it would
 see the SCI is still requested -- because it had yet to acknowledge
 it by setting the query command! -- and think the EC was asking for a
 _second_ SCI.

 So once the first SCI transaction was over, it would start a second
 one, even though the EC hadn't asked for another -- and this would
 wedge on some ECs.

 Now, acpiec_gpe_state_machine waits to see what state we transition
 to before taking the SCI bit to mean we need to notify the acpiec
 thread to handle another query.

 That way, when the acpiec thread enters acpiec_gpe_state_machine with
 EC_STATE_QUERY, it can send the query command first, with the side
 effect of clearing the SCI bit in subsequent reads of the status
 register, and it won't think another SCI has been requested until it
 returns to EC_STATE_FREE and sees the SCI bit set again in the status
 register.

 Possibly relevant PRs:

 PR kern/53135
 PR kern/52763
 PR kern/57162


 To generate a diff of this commit:
 cvs rdiff -u -r1.91 -r1.92 src/sys/dev/acpi/acpi_ec.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57162 CVS commit: [netbsd-10] src/sys
Date: Sun, 30 Jul 2023 12:01:54 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sun Jul 30 12:01:54 UTC 2023

 Modified Files:
 	src/sys/arch/amd64/conf [netbsd-10]: ALL
 	src/sys/arch/i386/conf [netbsd-10]: ALL
 	src/sys/dev/acpi [netbsd-10]: acpi_ec.c files.acpi

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #259):

 	sys/dev/acpi/acpi_ec.c: revision 1.102
 	sys/dev/acpi/acpi_ec.c: revision 1.103
 	sys/dev/acpi/acpi_ec.c: revision 1.104
 	sys/dev/acpi/acpi_ec.c: revision 1.105
 	sys/dev/acpi/acpi_ec.c: revision 1.106
 	sys/dev/acpi/acpi_ec.c: revision 1.107
 	sys/dev/acpi/acpi_ec.c: revision 1.108
 	sys/dev/acpi/acpi_ec.c: revision 1.90
 	sys/dev/acpi/acpi_ec.c: revision 1.91
 	sys/dev/acpi/acpi_ec.c: revision 1.92
 	sys/dev/acpi/acpi_ec.c: revision 1.93
 	sys/dev/acpi/acpi_ec.c: revision 1.94
 	sys/dev/acpi/files.acpi: revision 1.128
 	sys/dev/acpi/acpi_ec.c: revision 1.95
 	sys/dev/acpi/acpi_ec.c: revision 1.96
 	sys/dev/acpi/acpi_ec.c: revision 1.97
 	sys/arch/amd64/conf/ALL: revision 1.179
 	sys/dev/acpi/acpi_ec.c: revision 1.98
 	sys/dev/acpi/acpi_ec.c: revision 1.99
 	sys/dev/acpi/acpi_ec.c: revision 1.87
 	sys/dev/acpi/acpi_ec.c: revision 1.88
 	sys/dev/acpi/acpi_ec.c: revision 1.89
 	sys/arch/i386/conf/ALL: revision 1.511
 	sys/dev/acpi/acpi_ec.c: revision 1.100
 	sys/dev/acpi/acpi_ec.c: revision 1.101

 acpiec(4): Record device_t self.

 Not used yet, to be used soon for device_printf and to allow making
 some of the internal functions a little more type-safe later.
 acpiec(4): New ACPIEC_DEBUG option.

 Value is bit mask of debug messages to enable.

 Enable in x86/ALL kernels.

 No functional change intended when the option is off.

 acpiec(4): Clarify lock order and sprinkle lock assertions.
 No functional change intended.

 acpiec(4): Sprinkle comments.
 Note where this code is abusing cv_wait and needs a loop to handle
 spurious wakeups.
 No functional change intended.

 acpiec(4): Assert state is free when we start a transaction.
 No functional change intended.

 acpiec(4): Set sc_got_sci only when a transaction is over.

 Before, when the acpiec thread noticed an SCI had been requested and
 entered acpiec_gpe_state_machine to send the query command, it would
 see the SCI is still requested -- because it had yet to acknowledge
 it by setting the query command! -- and think the EC was asking for a
 _second_ SCI.

 So once the first SCI transaction was over, it would start a second
 one, even though the EC hadn't asked for another -- and this would
 wedge on some ECs.

 Now, acpiec_gpe_state_machine waits to see what state we transition
 to before taking the SCI bit to mean we need to notify the acpiec
 thread to handle another query.

 That way, when the acpiec thread enters acpiec_gpe_state_machine with
 EC_STATE_QUERY, it can send the query command first, with the side
 effect of clearing the SCI bit in subsequent reads of the status
 register, and it won't think another SCI has been requested until it
 returns to EC_STATE_FREE and sees the SCI bit set again in the status
 register.

 Possibly relevant PRs:
 PR kern/53135
 PR kern/52763
 PR kern/57162

 acpiec(4): Fix cv_wait loop around sc->sc_got_sci.

 That is, make it actually loop as required, so it gracefully handles
 spurious wakeups instead of barging into invalid states.

 acpiec(4): Fix interrupt wait loop in acpiec_gpe_query thread.

 acpiec(4): Fix cv_timedwait abuse in acpiec_read/write.

 acpiec(4): Don't touch sc->sc_state outside sc->sc_mtx.

 acpiec(4): Merge returns in acpiec_read/write.
 No functional change intended.

 acpiec(4): Factor wait logic out.
 No functional change intended.

 acpiec(4): Pass softc, not device_t, to acpiec_gpe_state_machine.
 Simpler, type-safer.
 No functional change intended.

 acpiec(4): Pass softc, not device_t, to acpiec_callout.
 Simpler.
 No functional change intended.

 acpiec(4): Pass softc, not device_t, to acpiec_gpe_handler.
 Simpler.
 No functional change intended.

 acpiec(4): Pass softc, not device_t, to acpiec_lock/unlock.
 Simpler, type-safer.
 No functional change intended.

 acpiec(4): Pass softc, not device_t, to acpiec_read/write.
 Simpler, type-safer.
 No functional change intended.

 acpiec(4): Pass softc, not device_t, to acpiec_gpe_query thread.
 Simpler.
 No functional change intended.

 acpiec(4): Pass softc, not device_t, to acpiec_space_handler.
 Better to keep the device_t isolated to public interfaces.  Simpler
 internally this way.
 No functional change intended.

 acpiec(4): Factor out if (state == FREE) cv_signal(sc_cv).

 In principle this could have a functional change, but at worst, it is
 to signal more wakeups than needed, which should always be safe.
 acpiec(4): Take a lock around acpiec_cold updates.

 Otherwise we race with readers -- probably harmlessly, but let's
 avoid the appearance of problems.
 XXX Maybe acpiec_suspend and acpiec_shutdown should interrupt
 transactions and force them to fail promptly?
 XXX This looks bad because acpiec_cold is global and sc->sc_mtx
 doesn't look like it's global, but we expect to have only one
 acpiec(4) device anyway from what I understand.  Maybe we should move
 acpiec_cold into the softc?

 acpiec(4): One more debug message about read/write polling timeout.


 To generate a diff of this commit:
 cvs rdiff -u -r1.174 -r1.174.4.1 src/sys/arch/amd64/conf/ALL
 cvs rdiff -u -r1.503 -r1.503.4.1 src/sys/arch/i386/conf/ALL
 cvs rdiff -u -r1.86 -r1.86.4.1 src/sys/dev/acpi/acpi_ec.c
 cvs rdiff -u -r1.126 -r1.126.4.1 src/sys/dev/acpi/files.acpi

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.