NetBSD Problem Report #52763
From www@NetBSD.org Sun Nov 26 11:28:37 2017
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 16B4C7A187
for <gnats-bugs@gnats.NetBSD.org>; Sun, 26 Nov 2017 11:28:37 +0000 (UTC)
Message-Id: <20171126112836.34D257A1E7@mollari.NetBSD.org>
Date: Sun, 26 Nov 2017 11:28:36 +0000 (UTC)
From: alnsn@NetBSD.org
Reply-To: alnsn@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: acpiec command timed out, state 2 on Intel E3-1245 v6, S1200SP
X-Send-Pr-Version: www-1.0
>Number: 52763
>Category: kern
>Synopsis: acpiec command timed out, state 2 on Intel E3-1245 v6, S1200SP
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Nov 26 11:30:00 +0000 2017
>Last-Modified: Sun Jul 30 12:05:00 +0000 2023
>Originator: Alexander Nasonov
>Release: NetBSD 8.0_BETA (GENERIC.201711180940Z)
>Organization:
XMM SWAP LTD
>Environment:
NetBSD hostname 8.0_BETA NetBSD 8.0_BETA (GENERIC.201711180940Z) amd64
>Description:
On a rented OVH box (no physical assess), I see these errors:
ACPI Exception: AE_ERROR, Returned by Handler for [EmbeddedControl] (20170303/evregion-317)
ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.H_EC.ECRD] (Node 0xfffffe886b4e1210), AE_ERROR (20170303/psparse-560)
ACPI Error: Method parse/execution failed [\_TZ.TZ00._TMP] (Node 0xfffffe886b45ec50), AE_ERROR (20170303/psparse-560)
acpitz0: levels: critical 119.0 C
ACPI Error: Result stack is empty! State=0xfffffe8139bfac10 (20170303/dswstate-110)
ACPI Exception: AE_AML_NO_RETURN_VALUE, Missing or null operand (20170303/dsutils-704)
ACPI Exception: AE_AML_NO_RETURN_VALUE, While creating Arg 0 (20170303/dsutils-830)
ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.H_EC.ECRD] (Node 0xfffffe886b4e1210), AE_AML_NO_RETURN_VALUE (20170303/psparse-560)
ACPI Error: Method parse/execution failed [\_TZ.TZ00._TMP] (Node 0xfffffe886b45ec50), AE_AML_NO_RETURN_VALUE (20170303/psparse-560)
acpitz1 at acpi0 (TZ01)
acpiec0: command timed out, state 2
ACPI Exception: AE_ERROR, Returned by Handler for [EmbeddedControl] (20170303/evregion-317)
ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.H_EC.ECRD] (Node 0xfffffe886b4e1210), AE_ERROR (20170303/psparse-560)
ACPI Error: Method parse/execution failed [\_TZ.TZ01._TMP] (Node 0xfffffe886b45eb10), AE_ERROR (20170303/psparse-560)
acpitz1: levels: critical 119.0 C, passive cooling
ACPI Error: Result stack is empty! State=0xfffffe8139bfa810 (20170303/dswstate-110)
ACPI Exception: AE_AML_NO_RETURN_VALUE, Missing or null operand (20170303/dsutils-704)
ACPI Exception: AE_AML_NO_RETURN_VALUE, While creating Arg 0 (20170303/dsutils-830)
ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.H_EC.ECRD] (Node 0xfffffe886b4e1210), AE_AML_NO_RETURN_VALUE (20170303/psparse-560)
ACPI Error: Method parse/execution failed [\_TZ.TZ01._TMP] (Node 0xfffffe886b45eb10), AE_AML_NO_RETURN_VALUE (20170303/psparse-560)
Full dmesg is here: https://www.netbsd.org/~alnsn/ovh-dmesg-netbsd8.out
Also, dmesg on debian https://www.netbsd.org/~alnsn/ovh-dmesg-debian.out
>How-To-Repeat:
>Fix:
Not known.
>Audit-Trail:
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52763 CVS commit: src/sys/dev/acpi
Date: Tue, 18 Jul 2023 10:04:15 +0000
Module Name: src
Committed By: riastradh
Date: Tue Jul 18 10:04:14 UTC 2023
Modified Files:
src/sys/dev/acpi: acpi_ec.c
Log Message:
acpiec(4): Set sc_got_sci only when a transaction is over.
Before, when the acpiec thread noticed an SCI had been requested and
entered acpiec_gpe_state_machine to send the query command, it would
see the SCI is still requested -- because it had yet to acknowledge
it by setting the query command! -- and think the EC was asking for a
_second_ SCI.
So once the first SCI transaction was over, it would start a second
one, even though the EC hadn't asked for another -- and this would
wedge on some ECs.
Now, acpiec_gpe_state_machine waits to see what state we transition
to before taking the SCI bit to mean we need to notify the acpiec
thread to handle another query.
That way, when the acpiec thread enters acpiec_gpe_state_machine with
EC_STATE_QUERY, it can send the query command first, with the side
effect of clearing the SCI bit in subsequent reads of the status
register, and it won't think another SCI has been requested until it
returns to EC_STATE_FREE and sees the SCI bit set again in the status
register.
Possibly relevant PRs:
PR kern/53135
PR kern/52763
PR kern/57162
To generate a diff of this commit:
cvs rdiff -u -r1.91 -r1.92 src/sys/dev/acpi/acpi_ec.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52763 CVS commit: [netbsd-10] src/sys
Date: Sun, 30 Jul 2023 12:01:54 +0000
Module Name: src
Committed By: martin
Date: Sun Jul 30 12:01:54 UTC 2023
Modified Files:
src/sys/arch/amd64/conf [netbsd-10]: ALL
src/sys/arch/i386/conf [netbsd-10]: ALL
src/sys/dev/acpi [netbsd-10]: acpi_ec.c files.acpi
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #259):
sys/dev/acpi/acpi_ec.c: revision 1.102
sys/dev/acpi/acpi_ec.c: revision 1.103
sys/dev/acpi/acpi_ec.c: revision 1.104
sys/dev/acpi/acpi_ec.c: revision 1.105
sys/dev/acpi/acpi_ec.c: revision 1.106
sys/dev/acpi/acpi_ec.c: revision 1.107
sys/dev/acpi/acpi_ec.c: revision 1.108
sys/dev/acpi/acpi_ec.c: revision 1.90
sys/dev/acpi/acpi_ec.c: revision 1.91
sys/dev/acpi/acpi_ec.c: revision 1.92
sys/dev/acpi/acpi_ec.c: revision 1.93
sys/dev/acpi/acpi_ec.c: revision 1.94
sys/dev/acpi/files.acpi: revision 1.128
sys/dev/acpi/acpi_ec.c: revision 1.95
sys/dev/acpi/acpi_ec.c: revision 1.96
sys/dev/acpi/acpi_ec.c: revision 1.97
sys/arch/amd64/conf/ALL: revision 1.179
sys/dev/acpi/acpi_ec.c: revision 1.98
sys/dev/acpi/acpi_ec.c: revision 1.99
sys/dev/acpi/acpi_ec.c: revision 1.87
sys/dev/acpi/acpi_ec.c: revision 1.88
sys/dev/acpi/acpi_ec.c: revision 1.89
sys/arch/i386/conf/ALL: revision 1.511
sys/dev/acpi/acpi_ec.c: revision 1.100
sys/dev/acpi/acpi_ec.c: revision 1.101
acpiec(4): Record device_t self.
Not used yet, to be used soon for device_printf and to allow making
some of the internal functions a little more type-safe later.
acpiec(4): New ACPIEC_DEBUG option.
Value is bit mask of debug messages to enable.
Enable in x86/ALL kernels.
No functional change intended when the option is off.
acpiec(4): Clarify lock order and sprinkle lock assertions.
No functional change intended.
acpiec(4): Sprinkle comments.
Note where this code is abusing cv_wait and needs a loop to handle
spurious wakeups.
No functional change intended.
acpiec(4): Assert state is free when we start a transaction.
No functional change intended.
acpiec(4): Set sc_got_sci only when a transaction is over.
Before, when the acpiec thread noticed an SCI had been requested and
entered acpiec_gpe_state_machine to send the query command, it would
see the SCI is still requested -- because it had yet to acknowledge
it by setting the query command! -- and think the EC was asking for a
_second_ SCI.
So once the first SCI transaction was over, it would start a second
one, even though the EC hadn't asked for another -- and this would
wedge on some ECs.
Now, acpiec_gpe_state_machine waits to see what state we transition
to before taking the SCI bit to mean we need to notify the acpiec
thread to handle another query.
That way, when the acpiec thread enters acpiec_gpe_state_machine with
EC_STATE_QUERY, it can send the query command first, with the side
effect of clearing the SCI bit in subsequent reads of the status
register, and it won't think another SCI has been requested until it
returns to EC_STATE_FREE and sees the SCI bit set again in the status
register.
Possibly relevant PRs:
PR kern/53135
PR kern/52763
PR kern/57162
acpiec(4): Fix cv_wait loop around sc->sc_got_sci.
That is, make it actually loop as required, so it gracefully handles
spurious wakeups instead of barging into invalid states.
acpiec(4): Fix interrupt wait loop in acpiec_gpe_query thread.
acpiec(4): Fix cv_timedwait abuse in acpiec_read/write.
acpiec(4): Don't touch sc->sc_state outside sc->sc_mtx.
acpiec(4): Merge returns in acpiec_read/write.
No functional change intended.
acpiec(4): Factor wait logic out.
No functional change intended.
acpiec(4): Pass softc, not device_t, to acpiec_gpe_state_machine.
Simpler, type-safer.
No functional change intended.
acpiec(4): Pass softc, not device_t, to acpiec_callout.
Simpler.
No functional change intended.
acpiec(4): Pass softc, not device_t, to acpiec_gpe_handler.
Simpler.
No functional change intended.
acpiec(4): Pass softc, not device_t, to acpiec_lock/unlock.
Simpler, type-safer.
No functional change intended.
acpiec(4): Pass softc, not device_t, to acpiec_read/write.
Simpler, type-safer.
No functional change intended.
acpiec(4): Pass softc, not device_t, to acpiec_gpe_query thread.
Simpler.
No functional change intended.
acpiec(4): Pass softc, not device_t, to acpiec_space_handler.
Better to keep the device_t isolated to public interfaces. Simpler
internally this way.
No functional change intended.
acpiec(4): Factor out if (state == FREE) cv_signal(sc_cv).
In principle this could have a functional change, but at worst, it is
to signal more wakeups than needed, which should always be safe.
acpiec(4): Take a lock around acpiec_cold updates.
Otherwise we race with readers -- probably harmlessly, but let's
avoid the appearance of problems.
XXX Maybe acpiec_suspend and acpiec_shutdown should interrupt
transactions and force them to fail promptly?
XXX This looks bad because acpiec_cold is global and sc->sc_mtx
doesn't look like it's global, but we expect to have only one
acpiec(4) device anyway from what I understand. Maybe we should move
acpiec_cold into the softc?
acpiec(4): One more debug message about read/write polling timeout.
To generate a diff of this commit:
cvs rdiff -u -r1.174 -r1.174.4.1 src/sys/arch/amd64/conf/ALL
cvs rdiff -u -r1.503 -r1.503.4.1 src/sys/arch/i386/conf/ALL
cvs rdiff -u -r1.86 -r1.86.4.1 src/sys/dev/acpi/acpi_ec.c
cvs rdiff -u -r1.126 -r1.126.4.1 src/sys/dev/acpi/files.acpi
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.