NetBSD Problem Report #57162
From www@netbsd.org Thu Jan 5 17:21:38 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id C62CB1A9239
for <gnats-bugs@gnats.NetBSD.org>; Thu, 5 Jan 2023 17:21:37 +0000 (UTC)
Message-Id: <20230105172136.919E81A923A@mollari.NetBSD.org>
Date: Thu, 5 Jan 2023 17:21:36 +0000 (UTC)
From: brian@bemorehuman.org
Reply-To: brian@bemorehuman.org
To: gnats-bugs@NetBSD.org
Subject: On Thinkpad E14 Gen2, acpibat seems broken
X-Send-Pr-Version: www-1.0
>Number: 57162
>Category: kern
>Synopsis: On Thinkpad E14 Gen2, acpibat seems broken
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Jan 05 17:25:00 +0000 2023
>Last-Modified: Sun Jul 30 12:05:03 +0000 2023
>Originator: Brian Calhoun
>Release: 9.3
>Organization:
>Environment:
NetBSD localhost 9.3 NetBSD 9.3 (GENERIC) #0: Thu Aug 4 15:30:37 UTC 2022 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
I can't get 9.3 nor 10 Beta to boot from SSD bare metal on this thinkpad E14 Gen2. The boot process hangs, I believe, waiting for acpibat to do something. I was able to successfully install to the ssd from a usb stick-boot, but the boot from the ssd invariably fails. The boot process displays all the dkn wedge info then just stops. I remember one attempted boot (possibly with -v?) displayed a "waiting for response from acpibat0" once per second.
Not sure if related but during a boot from usb stick, I brought up a shell from within the installer and I typed "envstat" which hung the system (no output at all and had to force shutdown with power button).
>How-To-Repeat:
- install 9.3 (or 10-beta) from usb stick to SSD
- try to boot from SSD
- boot process will hang after dkn wedge info display
>Fix:
(temporary workaround) Thanks to tip from Nia, if I disable acpibat in the bootloader with the userconf command, the boot process will proceed as normal.
>Audit-Trail:
From: David Brownlee <abs@absd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/57162: On Thinkpad E14 Gen2, acpibat seems broken
Date: Fri, 23 Jun 2023 13:56:01 +0100
--000000000000b6019205fecb870d
Content-Type: text/plain; charset="UTF-8"
Is this an AMD based ThinkPad? I'm seeing very similar behaviour on an AMD
based T495
- Needs "disable acpibat" to avoid hangs during boot
- Also needs "disable thinkpad" to avoid envstat hanging
- Hangs on poweroff (after umounting all disks)
David
--000000000000b6019205fecb870d
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr"><div>Is this an AMD based ThinkPad? I'm seeing very si=
milar behaviour on an AMD based T495</div><div>- Needs "disable acpiba=
t" to avoid hangs during boot</div><div>- Also needs "disable thi=
nkpad" to avoid envstat hanging</div><div>- Hangs on poweroff (after u=
mounting all disks)</div><div><br></div><div>David<br></div></div>
--000000000000b6019205fecb870d--
From: Brian Calhoun <brian@bemorehuman.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57162: On Thinkpad E14 Gen2, acpibat seems broken
Date: Sat, 24 Jun 2023 10:41:48 +1200
The CPU on this E14 Thinkpad is Intel i5-1135g7.
Brian
On 24/06/23 01:00, David Brownlee wrote:
> The following reply was made to PR kern/57162; it has been noted by GNATS.
>
> From: David Brownlee <abs@absd.org>
> To: gnats-bugs@netbsd.org
> Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
> Subject: Re: kern/57162: On Thinkpad E14 Gen2, acpibat seems broken
> Date: Fri, 23 Jun 2023 13:56:01 +0100
>
> --000000000000b6019205fecb870d
> Content-Type: text/plain; charset="UTF-8"
>
> Is this an AMD based ThinkPad? I'm seeing very similar behaviour on an AMD
> based T495
> - Needs "disable acpibat" to avoid hangs during boot
> - Also needs "disable thinkpad" to avoid envstat hanging
> - Hangs on poweroff (after umounting all disks)
>
> David
>
> --000000000000b6019205fecb870d
> Content-Type: text/html; charset="UTF-8"
> Content-Transfer-Encoding: quoted-printable
>
> <div dir=3D"ltr"><div>Is this an AMD based ThinkPad? I'm seeing very si=
> milar behaviour on an AMD based T495</div><div>- Needs "disable acpiba=
> t" to avoid hangs during boot</div><div>- Also needs "disable thi=
> nkpad" to avoid envstat hanging</div><div>- Hangs on poweroff (after u=
> mounting all disks)</div><div><br></div><div>David<br></div></div>
>
> --000000000000b6019205fecb870d--
>
--
Everything is connected. All the time.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57162 CVS commit: src/sys/dev/acpi
Date: Tue, 18 Jul 2023 10:04:15 +0000
Module Name: src
Committed By: riastradh
Date: Tue Jul 18 10:04:14 UTC 2023
Modified Files:
src/sys/dev/acpi: acpi_ec.c
Log Message:
acpiec(4): Set sc_got_sci only when a transaction is over.
Before, when the acpiec thread noticed an SCI had been requested and
entered acpiec_gpe_state_machine to send the query command, it would
see the SCI is still requested -- because it had yet to acknowledge
it by setting the query command! -- and think the EC was asking for a
_second_ SCI.
So once the first SCI transaction was over, it would start a second
one, even though the EC hadn't asked for another -- and this would
wedge on some ECs.
Now, acpiec_gpe_state_machine waits to see what state we transition
to before taking the SCI bit to mean we need to notify the acpiec
thread to handle another query.
That way, when the acpiec thread enters acpiec_gpe_state_machine with
EC_STATE_QUERY, it can send the query command first, with the side
effect of clearing the SCI bit in subsequent reads of the status
register, and it won't think another SCI has been requested until it
returns to EC_STATE_FREE and sees the SCI bit set again in the status
register.
Possibly relevant PRs:
PR kern/53135
PR kern/52763
PR kern/57162
To generate a diff of this commit:
cvs rdiff -u -r1.91 -r1.92 src/sys/dev/acpi/acpi_ec.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57162 CVS commit: [netbsd-10] src/sys
Date: Sun, 30 Jul 2023 12:01:54 +0000
Module Name: src
Committed By: martin
Date: Sun Jul 30 12:01:54 UTC 2023
Modified Files:
src/sys/arch/amd64/conf [netbsd-10]: ALL
src/sys/arch/i386/conf [netbsd-10]: ALL
src/sys/dev/acpi [netbsd-10]: acpi_ec.c files.acpi
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #259):
sys/dev/acpi/acpi_ec.c: revision 1.102
sys/dev/acpi/acpi_ec.c: revision 1.103
sys/dev/acpi/acpi_ec.c: revision 1.104
sys/dev/acpi/acpi_ec.c: revision 1.105
sys/dev/acpi/acpi_ec.c: revision 1.106
sys/dev/acpi/acpi_ec.c: revision 1.107
sys/dev/acpi/acpi_ec.c: revision 1.108
sys/dev/acpi/acpi_ec.c: revision 1.90
sys/dev/acpi/acpi_ec.c: revision 1.91
sys/dev/acpi/acpi_ec.c: revision 1.92
sys/dev/acpi/acpi_ec.c: revision 1.93
sys/dev/acpi/acpi_ec.c: revision 1.94
sys/dev/acpi/files.acpi: revision 1.128
sys/dev/acpi/acpi_ec.c: revision 1.95
sys/dev/acpi/acpi_ec.c: revision 1.96
sys/dev/acpi/acpi_ec.c: revision 1.97
sys/arch/amd64/conf/ALL: revision 1.179
sys/dev/acpi/acpi_ec.c: revision 1.98
sys/dev/acpi/acpi_ec.c: revision 1.99
sys/dev/acpi/acpi_ec.c: revision 1.87
sys/dev/acpi/acpi_ec.c: revision 1.88
sys/dev/acpi/acpi_ec.c: revision 1.89
sys/arch/i386/conf/ALL: revision 1.511
sys/dev/acpi/acpi_ec.c: revision 1.100
sys/dev/acpi/acpi_ec.c: revision 1.101
acpiec(4): Record device_t self.
Not used yet, to be used soon for device_printf and to allow making
some of the internal functions a little more type-safe later.
acpiec(4): New ACPIEC_DEBUG option.
Value is bit mask of debug messages to enable.
Enable in x86/ALL kernels.
No functional change intended when the option is off.
acpiec(4): Clarify lock order and sprinkle lock assertions.
No functional change intended.
acpiec(4): Sprinkle comments.
Note where this code is abusing cv_wait and needs a loop to handle
spurious wakeups.
No functional change intended.
acpiec(4): Assert state is free when we start a transaction.
No functional change intended.
acpiec(4): Set sc_got_sci only when a transaction is over.
Before, when the acpiec thread noticed an SCI had been requested and
entered acpiec_gpe_state_machine to send the query command, it would
see the SCI is still requested -- because it had yet to acknowledge
it by setting the query command! -- and think the EC was asking for a
_second_ SCI.
So once the first SCI transaction was over, it would start a second
one, even though the EC hadn't asked for another -- and this would
wedge on some ECs.
Now, acpiec_gpe_state_machine waits to see what state we transition
to before taking the SCI bit to mean we need to notify the acpiec
thread to handle another query.
That way, when the acpiec thread enters acpiec_gpe_state_machine with
EC_STATE_QUERY, it can send the query command first, with the side
effect of clearing the SCI bit in subsequent reads of the status
register, and it won't think another SCI has been requested until it
returns to EC_STATE_FREE and sees the SCI bit set again in the status
register.
Possibly relevant PRs:
PR kern/53135
PR kern/52763
PR kern/57162
acpiec(4): Fix cv_wait loop around sc->sc_got_sci.
That is, make it actually loop as required, so it gracefully handles
spurious wakeups instead of barging into invalid states.
acpiec(4): Fix interrupt wait loop in acpiec_gpe_query thread.
acpiec(4): Fix cv_timedwait abuse in acpiec_read/write.
acpiec(4): Don't touch sc->sc_state outside sc->sc_mtx.
acpiec(4): Merge returns in acpiec_read/write.
No functional change intended.
acpiec(4): Factor wait logic out.
No functional change intended.
acpiec(4): Pass softc, not device_t, to acpiec_gpe_state_machine.
Simpler, type-safer.
No functional change intended.
acpiec(4): Pass softc, not device_t, to acpiec_callout.
Simpler.
No functional change intended.
acpiec(4): Pass softc, not device_t, to acpiec_gpe_handler.
Simpler.
No functional change intended.
acpiec(4): Pass softc, not device_t, to acpiec_lock/unlock.
Simpler, type-safer.
No functional change intended.
acpiec(4): Pass softc, not device_t, to acpiec_read/write.
Simpler, type-safer.
No functional change intended.
acpiec(4): Pass softc, not device_t, to acpiec_gpe_query thread.
Simpler.
No functional change intended.
acpiec(4): Pass softc, not device_t, to acpiec_space_handler.
Better to keep the device_t isolated to public interfaces. Simpler
internally this way.
No functional change intended.
acpiec(4): Factor out if (state == FREE) cv_signal(sc_cv).
In principle this could have a functional change, but at worst, it is
to signal more wakeups than needed, which should always be safe.
acpiec(4): Take a lock around acpiec_cold updates.
Otherwise we race with readers -- probably harmlessly, but let's
avoid the appearance of problems.
XXX Maybe acpiec_suspend and acpiec_shutdown should interrupt
transactions and force them to fail promptly?
XXX This looks bad because acpiec_cold is global and sc->sc_mtx
doesn't look like it's global, but we expect to have only one
acpiec(4) device anyway from what I understand. Maybe we should move
acpiec_cold into the softc?
acpiec(4): One more debug message about read/write polling timeout.
To generate a diff of this commit:
cvs rdiff -u -r1.174 -r1.174.4.1 src/sys/arch/amd64/conf/ALL
cvs rdiff -u -r1.503 -r1.503.4.1 src/sys/arch/i386/conf/ALL
cvs rdiff -u -r1.86 -r1.86.4.1 src/sys/dev/acpi/acpi_ec.c
cvs rdiff -u -r1.126 -r1.126.4.1 src/sys/dev/acpi/files.acpi
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.