NetBSD Problem Report #56568
From mlelstv@arnold.localdomain Wed Dec 22 15:39:55 2021
Return-Path: <mlelstv@arnold.localdomain>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id E44851A9239
for <gnats-bugs@gnats.NetBSD.org>; Wed, 22 Dec 2021 15:39:54 +0000 (UTC)
Message-Id: <20211222153950.1DF22BC141@arnold.localdomain>
Date: Wed, 22 Dec 2021 16:39:50 +0100 (CET)
From: mlelstv@netbsd.org
Reply-To: mlelstv@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: ipmi.c 1.7 causes large boot delays
X-Send-Pr-Version: 3.95
>Number: 56568
>Category: kern
>Synopsis: ipmi.c 1.7 causes large boot delays
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Dec 22 15:40:01 +0000 2021
>Last-Modified: Tue Dec 03 20:00:01 +0000 2024
>Originator: mlelstv@netbsd.org
>Release: NetBSD 9.99.92
>Organization:
>Environment:
System: NetBSD pussyfoot 9.99.92 NetBSD 9.99.92 (PUSSYFOOT_XEN3_DOM0) #4: Wed Dec 22 15:04:19 UTC 2021 mlelstv@slowpoke:/scratch2/obj.amd64/scratch/netbsd-current/src/sys/arch/amd64/compile/PUSSYFOOT_XEN3_DOM0 amd64
Architecture: x86_64
Machine: amd64
>Description:
ipmi(4) talks to a separate system (the BMC) when attaching. The communication
may be done very slowly, often it is passed through the keyboard controller
and can take several ten seconds.
The driver used to set the DV_ATTACH_INPROGRESS flag to signal
that the attachment hasn't completed when ipmi_attach returns.
The flag isn't used anymore by autoconf.
Changing this to config_pending_incr/descr has a different effect,
it makes the boot process wait for the attachment to finish. This is
neither wanted nor necessary, you only have to wait for it before
you can detach the driver again.
On this system (HP ML110 G4) the attachment takes 40 seconds.
>How-To-Repeat:
>Fix:
>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: mlelstv@netbsd.org,
Hauke Fath <hf@spg.tu-darmstadt.de>
Subject: Re: kern/56568: ipmi.c 1.7 causes large boot delays
Date: Fri, 24 Dec 2021 13:18:10 +0000
There seem to be different opinions about whether or how long ipmi
should delay boot -- in https://gnats.netbsd.org/56539 the problem was
that boot happened too fast and ipmi wasn't ready before wdogctl tried
to set it up!
I don't have an opinion about whether ipmi should delay boot or
whether there should be a different mechanism for wdogctl to wait.
Maybe ipmi_attach can publish the watchdog immediately, and
ipmi_watchdog_setmode can wait with cv_wait_sig for the thread to
finish whatever it needs to do first -- and, if we don't use
config_pending_incr/decr, then ipmi_detach will need to explicitly
wait for the thread to finish starting up before killing it.
(But abusing the autoconf DVF_ATTACH_INPROGRESS flag for this was
bogus and fundamentally buggy, so whatever change you make, please
don't bring that part back!)
From: Taylor R Campbell <campbell+netbsd-tech-kern@mumble.net>
To: Michael van Elst <mlelstv@NetBSD.org>,
Hauke Fath <hf@spg.tu-darmstadt.de>,
Edgar =?iso-8859-1?B?RnXf?= <ef@math.uni-bonn.de>,
Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/56568: ipmi.c 1.7 causes large boot delays
Date: Tue, 3 Dec 2024 01:00:57 +0000
This is a multi-part message in MIME format.
--=_wXwkmwLMJljksrXeYBkEj62oCldrq9cV
The attached patch -- compile-tested only -- tries to resolve the
present conflict between long ipmi(4) initialization time and early
use of wdogctl(8) at boot by:
(a) allowing boot to proceed with config_pending_decr sooner, before
device initialization has completed; and
(b) making watchdog configuration wait until initialization has
completed.
Testing welcome!
--=_wXwkmwLMJljksrXeYBkEj62oCldrq9cV
Content-Type: text/plain; charset="ISO-8859-1"; name="pr56568-ipmidelay"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="pr56568-ipmidelay.patch"
# HG changeset patch
# User Taylor R Campbell <riastradh@NetBSD.org>
# Date 1733184792 0
# Tue Dec 03 00:13:12 2024 +0000
# Branch trunk
# Node ID 32dee81c8e034a77811fddb38816c166f46bb530
# Parent 9e743b738fb8d4ce4e33ddccdc9acb6e8e81f9df
# EXP-Topic riastradh-pr56568-ipmidelay
ipmi(4): Don't hold up boot -- just hold up wdogctl(8).
While here, omit needless volatile qualifier on variables serialized
by mutex (or by config_pending_incr/decr -- ipmi_detach can't run
until ipmi_thread does config_pending_decr).
PR kern/56568: ipmi.c 1.7 causes large boot delays
diff -r 9e743b738fb8 -r 32dee81c8e03 sys/dev/ipmi.c
--- a/sys/dev/ipmi.c Sun Dec 01 11:10:34 2024 +0000
+++ b/sys/dev/ipmi.c Tue Dec 03 00:13:12 2024 +0000
@@ -1998,6 +1998,23 @@ ipmi_thread(void *cookie)
/* Map registers */
ipmi_map_regs(sc, ia);
=20
+ /* Setup Watchdog timer */
+ sc->sc_wdog.smw_name =3D device_xname(sc->sc_dev);
+ sc->sc_wdog.smw_cookie =3D sc;
+ sc->sc_wdog.smw_setmode =3D ipmi_watchdog_setmode;
+ sc->sc_wdog.smw_tickle =3D ipmi_watchdog_tickle;
+ sysmon_wdog_register(&sc->sc_wdog);
+
+ /* Set up a power handler so we can possibly sleep */
+ if (!pmf_device_register(self, ipmi_suspend, NULL))
+ aprint_error_dev(self, "couldn't establish a power handler=
\n");
+
+ /*
+ * Allow boot to proceed -- we'll do the rest asynchronously
+ * since it requires talking to the device.
+ */
+ config_pending_decr(self);
+
memset(&id, 0, sizeof(id));
if (ipmi_get_device_id(sc, &id))
aprint_error_dev(self, "Failed to re-query device ID\n");
@@ -2098,20 +2115,9 @@ ipmi_thread(void *cookie)
/* setup flag to exclude iic */
ipmi_enabled =3D 1;
=20
- /* Setup Watchdog timer */
- sc->sc_wdog.smw_name =3D device_xname(sc->sc_dev);
- sc->sc_wdog.smw_cookie =3D sc;
- sc->sc_wdog.smw_setmode =3D ipmi_watchdog_setmode;
- sc->sc_wdog.smw_tickle =3D ipmi_watchdog_tickle;
- sysmon_wdog_register(&sc->sc_wdog);
-
- /* Set up a power handler so we can possibly sleep */
- if (!pmf_device_register(self, ipmi_suspend, NULL))
- aprint_error_dev(self, "couldn't establish a power handler=
\n");
-
- config_pending_decr(self);
-
mutex_enter(&sc->sc_poll_mtx);
+ sc->sc_thread_ready =3D true;
+ cv_broadcast(&sc->sc_mode_cv);
while (sc->sc_thread_running) {
while (sc->sc_mode =3D=3D IPMI_MODE_COMMAND)
cv_wait(&sc->sc_mode_cv, &sc->sc_poll_mtx);
@@ -2259,6 +2265,15 @@ ipmi_watchdog_setmode(struct sysmon_wdog
else
sc->sc_wdog.smw_period =3D smwdog->smw_period;
=20
+ /* Wait until the device is initialized */
+ rc =3D 0;
+ mutex_enter(&sc->sc_poll_mtx);
+ while (sc->sc_thread_ready)
+ rc =3D cv_wait_sig(&sc->sc_mode_cv, &sc->sc_poll_mtx);
+ mutex_exit(&sc->sc_poll_mtx);
+ if (rc)
+ return rc;
+
mutex_enter(&sc->sc_cmd_mtx);
/* see if we can properly task to the watchdog */
rc =3D ipmi_sendcmd(sc, BMC_SA, BMC_LUN, APP_NETFN,
diff -r 9e743b738fb8 -r 32dee81c8e03 sys/dev/ipmivar.h
--- a/sys/dev/ipmivar.h Sun Dec 01 11:10:34 2024 +0000
+++ b/sys/dev/ipmivar.h Tue Dec 03 00:13:12 2024 +0000
@@ -101,8 +101,9 @@ struct ipmi_softc {
struct ipmi_bmc_args *sc_iowait_args;
=20
struct ipmi_sensor *current_sensor;
- volatile bool sc_thread_running;
- volatile bool sc_tickle_due;
+ bool sc_thread_running;
+ bool sc_thread_ready;
+ bool sc_tickle_due;
struct sysmon_wdog sc_wdog;
struct sysmon_envsys *sc_envsys;
envsys_data_t *sc_sensor;
--=_wXwkmwLMJljksrXeYBkEj62oCldrq9cV--
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Taylor R Campbell <campbell+netbsd-tech-kern@mumble.net>
Cc: Michael van Elst <mlelstv@NetBSD.org>, Hauke Fath <hf@spg.tu-darmstadt.de>,
Edgar =?iso-8859-1?B?RnXf?= <ef@math.uni-bonn.de>,
gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/56568: ipmi.c 1.7 causes large boot delays
Date: Tue, 3 Dec 2024 11:42:10 +0100
On Tue, Dec 03, 2024 at 01:00:57AM +0000, Taylor R Campbell wrote:
> The attached patch -- compile-tested only -- tries to resolve the
> present conflict between long ipmi(4) initialization time and early
> use of wdogctl(8) at boot by:
>
> (a) allowing boot to proceed with config_pending_decr sooner, before
> device initialization has completed; and
>
> (b) making watchdog configuration wait until initialization has
> completed.
>
> Testing welcome!
Looks fine on a supermicro board:
...
[ 1.0000000] ipmi0 at mainbus0
...
[ 4.8031702] root file system type: ffs
[ 4.8231724] kern.module.path=/stand/amd64/10.0/modules
[ 5.1331493] entropy: best effort
Tue Dec 3 11:39:01 MET 2024
rcorder: could not open /etc/rc.d/xenbackendd: No such file or directory
rcorder: could not open /etc/rc.d/xend: No such file or directory
Starting watchdog timer.
[ 13.1031692] ipmi0: ID 32.1 IPMI 2.0 Available
Starting root file system check:
"Starting watchdog timer" did cause the 8s pause
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Taylor R Campbell <campbell+netbsd-tech-kern@mumble.net>
Cc: Michael van Elst <mlelstv@NetBSD.org>, Hauke Fath <hf@spg.tu-darmstadt.de>,
Edgar =?iso-8859-1?B?RnXf?= <ef@math.uni-bonn.de>,
gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/56568: ipmi.c 1.7 causes large boot delays
Date: Tue, 3 Dec 2024 11:43:25 +0100
On Tue, Dec 03, 2024 at 11:42:10AM +0100, Manuel Bouyer wrote:
> On Tue, Dec 03, 2024 at 01:00:57AM +0000, Taylor R Campbell wrote:
> > The attached patch -- compile-tested only -- tries to resolve the
> > present conflict between long ipmi(4) initialization time and early
> > use of wdogctl(8) at boot by:
> >
> > (a) allowing boot to proceed with config_pending_decr sooner, before
> > device initialization has completed; and
> >
> > (b) making watchdog configuration wait until initialization has
> > completed.
> >
> > Testing welcome!
>
> Looks fine on a supermicro board:
> ...
> [ 1.0000000] ipmi0 at mainbus0
> ...
> [ 4.8031702] root file system type: ffs
> [ 4.8231724] kern.module.path=/stand/amd64/10.0/modules
> [ 5.1331493] entropy: best effort
> Tue Dec 3 11:39:01 MET 2024
> rcorder: could not open /etc/rc.d/xenbackendd: No such file or directory
> rcorder: could not open /etc/rc.d/xend: No such file or directory
> Starting watchdog timer.
> [ 13.1031692] ipmi0: ID 32.1 IPMI 2.0 Available
> Starting root file system check:
>
> "Starting watchdog timer" did cause the 8s pause
Forgot to mention; this is patched netbsd-10 kernel, not head
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: Edgar =?iso-8859-1?B?RnXf?= <ef@math.uni-bonn.de>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: Taylor R Campbell <campbell+netbsd-tech-kern@mumble.net>,
Michael van Elst <mlelstv@NetBSD.org>,
Hauke Fath <hf@spg.tu-darmstadt.de>, gnats-bugs@NetBSD.org,
netbsd-bugs@NetBSD.org
Subject: Re: kern/56568: ipmi.c 1.7 causes large boot delays
Date: Tue, 3 Dec 2024 12:46:04 +0100
> Looks fine on a supermicro board:
On my Dell 6515, with a patched 10.0, I get
[ 5.143346] wsmouse0 at ums0 mux 0
[ 5.273346] WARNING: findroot: double match for boot device (bootinfo/bootwedge:sd0 bootinfo/bootwedge:
sd1)
[ 5.273346] raid0: RAID Level 1
[ 5.273346] raid0: Components: /dev/dk1 /dev/dk3
[ 5.273346] raid0: Total Sectors: 930869120 (454525 MB)
[ 5.283346] WARNING: 5 errors while detecting hardware; check system log.
[ 5.283346] boot device: raid0
[ 5.283346] root on raid0a dumps on raid0b
[ 5.293346] root file system type: ffs
[ 5.293346] kern.module.path=/stand/amd64/10.0/modules
[ 14.753360] arp_llinfo_output: source can't be determined: dst=131.220.132.161
[ 15.753362] arp_llinfo_output: source can't be determined: dst=131.220.132.161
[ 16.073362] wsdisplay0: screen 1 added (default, vt100 emulation)
[ 16.073362] wsdisplay0: screen 2 added (default, vt100 emulation)
[ 16.073362] wsdisplay0: screen 3 added (default, vt100 emulation)
[ 16.073362] wsdisplay0: screen 4 added (default, vt100 emulation)
[ 16.753363] arp_llinfo_output: source can't be determined: dst=131.220.132.161
[ 17.753365] arp_llinfo_output: source can't be determined: dst=131.220.132.161
[ 21.793371] arp_llinfo_output: source can't be determined: dst=131.220.132.161
[ 22.793373] arp_llinfo_output: source can't be determined: dst=131.220.132.161
[ 23.793374] arp_llinfo_output: source can't be determined: dst=131.220.132.161
[ 24.793412] arp_llinfo_output: source can't be determined: dst=131.220.132.161
[ 27.793567] arp_llinfo_output: source can't be determined: dst=131.220.132.161
[ 36.794034] arp_llinfo_output: source can't be determined: dst=131.220.132.161
[ 41.424274] ipmi0: autoconfiguration error: get_sdr_partial: recvcmd fails
[ 41.424274] ipmi0: autoconfiguration error: get chunk : 21,16 fails
[ 41.424274] ipmi0: version 32.0 interface KCS iobase 0xca8/0x8 spacing 4
[ 41.424274] ipmi0: ID 32.1 IPMI 2.0 Available +SDRs
[ 41.424274] ipmi0: Additional Chassis Bridge IPMBRcv FRU SEL SDR Sensor
[ 41.424274] ipmi0: Manufacturer 002a2 Product 01a2
[ 41.424274] ipmi0: Firmware 7.0
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Edgar =?iso-8859-1?B?RnXf?= <ef@math.uni-bonn.de>
Cc: Taylor R Campbell <campbell+netbsd-tech-kern@mumble.net>,
Michael van Elst <mlelstv@NetBSD.org>,
Hauke Fath <hf@spg.tu-darmstadt.de>, gnats-bugs@NetBSD.org,
netbsd-bugs@NetBSD.org
Subject: Re: kern/56568: ipmi.c 1.7 causes large boot delays
Date: Tue, 3 Dec 2024 17:31:34 +0100
On Tue, Dec 03, 2024 at 12:46:04PM +0100, Edgar Fuß wrote:
> [...]
> [ 41.424274] ipmi0: autoconfiguration error: get_sdr_partial: recvcmd fails
> [ 41.424274] ipmi0: autoconfiguration error: get chunk : 21,16 fails
is this a regression ?
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: Edgar =?iso-8859-1?B?RnXf?= <ef@math.uni-bonn.de>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: Taylor R Campbell <campbell+netbsd-tech-kern@mumble.net>,
Michael van Elst <mlelstv@NetBSD.org>,
Hauke Fath <hf@spg.tu-darmstadt.de>, gnats-bugs@NetBSD.org,
netbsd-bugs@NetBSD.org
Subject: Re: kern/56568: ipmi.c 1.7 causes large boot delays
Date: Tue, 3 Dec 2024 17:35:47 +0100
> is this a regression ?
Yes, it is.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56568 CVS commit: src/sys/dev
Date: Tue, 3 Dec 2024 19:55:33 +0000
Module Name: src
Committed By: riastradh
Date: Tue Dec 3 19:55:33 UTC 2024
Modified Files:
src/sys/dev: ipmi.c ipmivar.h
Log Message:
ipmi(4): Don't hold up boot -- just hold up wdogctl(8).
While here, omit needless volatile qualifier on variables serialized
by mutex (or by config_pending_incr/decr -- ipmi_detach can't run
until ipmi_thread does config_pending_decr).
It's not entirely clear that we even need to hold up wdogctl(8).
Perhaps we can do that in parallel -- or really, interleaved -- with
scanning for SDR sensors. But this is a safer change for now without
requiring thought about that.
PR kern/56568: ipmi.c 1.7 causes large boot delays
To generate a diff of this commit:
cvs rdiff -u -r1.10 -r1.11 src/sys/dev/ipmi.c
cvs rdiff -u -r1.3 -r1.4 src/sys/dev/ipmivar.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.