NetBSD Problem Report #52763

From www@NetBSD.org  Sun Nov 26 11:28:37 2017
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 16B4C7A187
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 26 Nov 2017 11:28:37 +0000 (UTC)
Message-Id: <20171126112836.34D257A1E7@mollari.NetBSD.org>
Date: Sun, 26 Nov 2017 11:28:36 +0000 (UTC)
From: alnsn@NetBSD.org
Reply-To: alnsn@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: acpiec command timed out, state 2 on Intel E3-1245 v6, S1200SP
X-Send-Pr-Version: www-1.0

>Number:         52763
>Category:       kern
>Synopsis:       acpiec command timed out, state 2 on Intel E3-1245 v6, S1200SP
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Nov 26 11:30:00 +0000 2017
>Last-Modified:  Sun Jul 30 12:05:00 +0000 2023
>Originator:     Alexander Nasonov
>Release:        NetBSD 8.0_BETA (GENERIC.201711180940Z)
>Organization:
XMM SWAP LTD
>Environment:
NetBSD hostname 8.0_BETA NetBSD 8.0_BETA (GENERIC.201711180940Z) amd64
>Description:
On a rented OVH box (no physical assess), I see these errors:


ACPI Exception: AE_ERROR, Returned by Handler for [EmbeddedControl] (20170303/evregion-317)
ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.H_EC.ECRD] (Node 0xfffffe886b4e1210), AE_ERROR (20170303/psparse-560)
ACPI Error: Method parse/execution failed [\_TZ.TZ00._TMP] (Node 0xfffffe886b45ec50), AE_ERROR (20170303/psparse-560)
acpitz0: levels: critical 119.0 C
ACPI Error: Result stack is empty! State=0xfffffe8139bfac10 (20170303/dswstate-110)
ACPI Exception: AE_AML_NO_RETURN_VALUE, Missing or null operand (20170303/dsutils-704)
ACPI Exception: AE_AML_NO_RETURN_VALUE, While creating Arg 0 (20170303/dsutils-830)
ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.H_EC.ECRD] (Node 0xfffffe886b4e1210), AE_AML_NO_RETURN_VALUE (20170303/psparse-560)
ACPI Error: Method parse/execution failed [\_TZ.TZ00._TMP] (Node 0xfffffe886b45ec50), AE_AML_NO_RETURN_VALUE (20170303/psparse-560)
acpitz1 at acpi0 (TZ01)
acpiec0: command timed out, state 2
ACPI Exception: AE_ERROR, Returned by Handler for [EmbeddedControl] (20170303/evregion-317)
ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.H_EC.ECRD] (Node 0xfffffe886b4e1210), AE_ERROR (20170303/psparse-560)
ACPI Error: Method parse/execution failed [\_TZ.TZ01._TMP] (Node 0xfffffe886b45eb10), AE_ERROR (20170303/psparse-560)
acpitz1: levels: critical 119.0 C, passive cooling
ACPI Error: Result stack is empty! State=0xfffffe8139bfa810 (20170303/dswstate-110)
ACPI Exception: AE_AML_NO_RETURN_VALUE, Missing or null operand (20170303/dsutils-704)
ACPI Exception: AE_AML_NO_RETURN_VALUE, While creating Arg 0 (20170303/dsutils-830)
ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.H_EC.ECRD] (Node 0xfffffe886b4e1210), AE_AML_NO_RETURN_VALUE (20170303/psparse-560)
ACPI Error: Method parse/execution failed [\_TZ.TZ01._TMP] (Node 0xfffffe886b45eb10), AE_AML_NO_RETURN_VALUE (20170303/psparse-560)

Full dmesg is here: https://www.netbsd.org/~alnsn/ovh-dmesg-netbsd8.out

Also, dmesg on debian https://www.netbsd.org/~alnsn/ovh-dmesg-debian.out
>How-To-Repeat:

>Fix:
Not known.

>Audit-Trail:
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52763 CVS commit: src/sys/dev/acpi
Date: Tue, 18 Jul 2023 10:04:15 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Tue Jul 18 10:04:14 UTC 2023

 Modified Files:
 	src/sys/dev/acpi: acpi_ec.c

 Log Message:
 acpiec(4): Set sc_got_sci only when a transaction is over.

 Before, when the acpiec thread noticed an SCI had been requested and
 entered acpiec_gpe_state_machine to send the query command, it would
 see the SCI is still requested -- because it had yet to acknowledge
 it by setting the query command! -- and think the EC was asking for a
 _second_ SCI.

 So once the first SCI transaction was over, it would start a second
 one, even though the EC hadn't asked for another -- and this would
 wedge on some ECs.

 Now, acpiec_gpe_state_machine waits to see what state we transition
 to before taking the SCI bit to mean we need to notify the acpiec
 thread to handle another query.

 That way, when the acpiec thread enters acpiec_gpe_state_machine with
 EC_STATE_QUERY, it can send the query command first, with the side
 effect of clearing the SCI bit in subsequent reads of the status
 register, and it won't think another SCI has been requested until it
 returns to EC_STATE_FREE and sees the SCI bit set again in the status
 register.

 Possibly relevant PRs:

 PR kern/53135
 PR kern/52763
 PR kern/57162


 To generate a diff of this commit:
 cvs rdiff -u -r1.91 -r1.92 src/sys/dev/acpi/acpi_ec.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52763 CVS commit: [netbsd-10] src/sys
Date: Sun, 30 Jul 2023 12:01:54 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sun Jul 30 12:01:54 UTC 2023

 Modified Files:
 	src/sys/arch/amd64/conf [netbsd-10]: ALL
 	src/sys/arch/i386/conf [netbsd-10]: ALL
 	src/sys/dev/acpi [netbsd-10]: acpi_ec.c files.acpi

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #259):

 	sys/dev/acpi/acpi_ec.c: revision 1.102
 	sys/dev/acpi/acpi_ec.c: revision 1.103
 	sys/dev/acpi/acpi_ec.c: revision 1.104
 	sys/dev/acpi/acpi_ec.c: revision 1.105
 	sys/dev/acpi/acpi_ec.c: revision 1.106
 	sys/dev/acpi/acpi_ec.c: revision 1.107
 	sys/dev/acpi/acpi_ec.c: revision 1.108
 	sys/dev/acpi/acpi_ec.c: revision 1.90
 	sys/dev/acpi/acpi_ec.c: revision 1.91
 	sys/dev/acpi/acpi_ec.c: revision 1.92
 	sys/dev/acpi/acpi_ec.c: revision 1.93
 	sys/dev/acpi/acpi_ec.c: revision 1.94
 	sys/dev/acpi/files.acpi: revision 1.128
 	sys/dev/acpi/acpi_ec.c: revision 1.95
 	sys/dev/acpi/acpi_ec.c: revision 1.96
 	sys/dev/acpi/acpi_ec.c: revision 1.97
 	sys/arch/amd64/conf/ALL: revision 1.179
 	sys/dev/acpi/acpi_ec.c: revision 1.98
 	sys/dev/acpi/acpi_ec.c: revision 1.99
 	sys/dev/acpi/acpi_ec.c: revision 1.87
 	sys/dev/acpi/acpi_ec.c: revision 1.88
 	sys/dev/acpi/acpi_ec.c: revision 1.89
 	sys/arch/i386/conf/ALL: revision 1.511
 	sys/dev/acpi/acpi_ec.c: revision 1.100
 	sys/dev/acpi/acpi_ec.c: revision 1.101

 acpiec(4): Record device_t self.

 Not used yet, to be used soon for device_printf and to allow making
 some of the internal functions a little more type-safe later.
 acpiec(4): New ACPIEC_DEBUG option.

 Value is bit mask of debug messages to enable.

 Enable in x86/ALL kernels.

 No functional change intended when the option is off.

 acpiec(4): Clarify lock order and sprinkle lock assertions.
 No functional change intended.

 acpiec(4): Sprinkle comments.
 Note where this code is abusing cv_wait and needs a loop to handle
 spurious wakeups.
 No functional change intended.

 acpiec(4): Assert state is free when we start a transaction.
 No functional change intended.

 acpiec(4): Set sc_got_sci only when a transaction is over.

 Before, when the acpiec thread noticed an SCI had been requested and
 entered acpiec_gpe_state_machine to send the query command, it would
 see the SCI is still requested -- because it had yet to acknowledge
 it by setting the query command! -- and think the EC was asking for a
 _second_ SCI.

 So once the first SCI transaction was over, it would start a second
 one, even though the EC hadn't asked for another -- and this would
 wedge on some ECs.

 Now, acpiec_gpe_state_machine waits to see what state we transition
 to before taking the SCI bit to mean we need to notify the acpiec
 thread to handle another query.

 That way, when the acpiec thread enters acpiec_gpe_state_machine with
 EC_STATE_QUERY, it can send the query command first, with the side
 effect of clearing the SCI bit in subsequent reads of the status
 register, and it won't think another SCI has been requested until it
 returns to EC_STATE_FREE and sees the SCI bit set again in the status
 register.

 Possibly relevant PRs:
 PR kern/53135
 PR kern/52763
 PR kern/57162

 acpiec(4): Fix cv_wait loop around sc->sc_got_sci.

 That is, make it actually loop as required, so it gracefully handles
 spurious wakeups instead of barging into invalid states.

 acpiec(4): Fix interrupt wait loop in acpiec_gpe_query thread.

 acpiec(4): Fix cv_timedwait abuse in acpiec_read/write.

 acpiec(4): Don't touch sc->sc_state outside sc->sc_mtx.

 acpiec(4): Merge returns in acpiec_read/write.
 No functional change intended.

 acpiec(4): Factor wait logic out.
 No functional change intended.

 acpiec(4): Pass softc, not device_t, to acpiec_gpe_state_machine.
 Simpler, type-safer.
 No functional change intended.

 acpiec(4): Pass softc, not device_t, to acpiec_callout.
 Simpler.
 No functional change intended.

 acpiec(4): Pass softc, not device_t, to acpiec_gpe_handler.
 Simpler.
 No functional change intended.

 acpiec(4): Pass softc, not device_t, to acpiec_lock/unlock.
 Simpler, type-safer.
 No functional change intended.

 acpiec(4): Pass softc, not device_t, to acpiec_read/write.
 Simpler, type-safer.
 No functional change intended.

 acpiec(4): Pass softc, not device_t, to acpiec_gpe_query thread.
 Simpler.
 No functional change intended.

 acpiec(4): Pass softc, not device_t, to acpiec_space_handler.
 Better to keep the device_t isolated to public interfaces.  Simpler
 internally this way.
 No functional change intended.

 acpiec(4): Factor out if (state == FREE) cv_signal(sc_cv).

 In principle this could have a functional change, but at worst, it is
 to signal more wakeups than needed, which should always be safe.
 acpiec(4): Take a lock around acpiec_cold updates.

 Otherwise we race with readers -- probably harmlessly, but let's
 avoid the appearance of problems.
 XXX Maybe acpiec_suspend and acpiec_shutdown should interrupt
 transactions and force them to fail promptly?
 XXX This looks bad because acpiec_cold is global and sc->sc_mtx
 doesn't look like it's global, but we expect to have only one
 acpiec(4) device anyway from what I understand.  Maybe we should move
 acpiec_cold into the softc?

 acpiec(4): One more debug message about read/write polling timeout.


 To generate a diff of this commit:
 cvs rdiff -u -r1.174 -r1.174.4.1 src/sys/arch/amd64/conf/ALL
 cvs rdiff -u -r1.503 -r1.503.4.1 src/sys/arch/i386/conf/ALL
 cvs rdiff -u -r1.86 -r1.86.4.1 src/sys/dev/acpi/acpi_ec.c
 cvs rdiff -u -r1.126 -r1.126.4.1 src/sys/dev/acpi/files.acpi

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.