NetBSD Problem Report #48142
From Wolfgang.Stukenbrock@nagler-company.com Wed Aug 21 12:57:07 2013
Return-Path: <Wolfgang.Stukenbrock@nagler-company.com>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 8B7D8718A5
for <gnats-bugs@gnats.NetBSD.org>; Wed, 21 Aug 2013 12:57:07 +0000 (UTC)
Message-Id: <20130821125653.7D875123B93@test-s0.nagler-company.com>
Date: Wed, 21 Aug 2013 14:56:53 +0200 (CEST)
From: Wolfgang.Stukenbrock@nagler-company.com
Reply-To: Wolfgang.Stukenbrock@nagler-company.com
To: gnats-bugs@gnats.NetBSD.org
Subject: i8254 timer stop working during boot - system lockup during boot
X-Send-Pr-Version: 3.95
>Number: 48142
>Category: port-amd64
>Synopsis: i8254 timer stop working during boot - system lockup during boot
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-amd64-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Aug 21 13:00:00 +0000 2013
>Last-Modified: Thu Aug 22 09:55:00 +0000 2013
>Originator: Dr. Wolfgang Stukenbrock
>Release: NetBSD 6.1
>Organization:
Dr. Nagler & Company GmbH
>Environment:
System: NetBSD test-s0 5.1.2 NetBSD 5.1.2 (NSW-WS) #3: Fri Dec 21 15:15:43 CET 2012 wgstuken@test-s0:/usr/src/sys/arch/amd64/compile/NSW-WS amd64
Architecture: x86_64
Machine: amd64
>Description:
While initializing the hardware on x86 systems, sys/arch/x86/x86/cpu.c calls i8254_delay() directly,
bypassing the DELAY() macro, which may call this function or something else, e.g. lapic_delay() from
sys/arch/x86/x86/lapic.c. I'm not sure whether this is by design or by mistake ...
Once the LAPIC is set up, the i8254 counter is loaded with '0', which, as far as I understand, means a full cycle.
OK, the time source then runs slower than before, but it is still running. All calls to i8254_delay() will wait
longer than before, but that does not really hurt here, and nobody has noticed this as a problem before.
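To put numbers on "full cycle": the 8254 treats an initial count of 0 as 65536, so with the nominal PC input clock of 1193182 Hz one full counter cycle lasts about 54.9 ms. The helpers below are a hypothetical illustration of that arithmetic, not code from clock.c; the clock value matches what NetBSD calls TIMER_FREQ but is hard-coded here as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Nominal 8254 input clock on PC hardware (assumed; NetBSD names it TIMER_FREQ). */
#define PIT_INPUT_HZ 1193182u

/* Hypothetical helper: input-clock ticks in one counter cycle.
 * The 8254 interprets an initial count of 0 as 65536 (2^16). */
static uint32_t pit_cycle_ticks(uint16_t reload)
{
    return reload == 0 ? 65536u : (uint32_t)reload;
}

/* Hypothetical helper: length of one counter cycle in microseconds (truncated). */
static uint32_t pit_cycle_usec(uint16_t reload)
{
    return (uint32_t)((uint64_t)pit_cycle_ticks(reload) * 1000000u / PIT_INPUT_HZ);
}
```

For comparison, a reload of 11932 (roughly 1193182 / 100, i.e. hz=100) gives about 10 ms per cycle, while a reload of 0 gives about 54.9 ms.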
On our Supermicro X8DAH (with one CPU only), some kernel configurations cause
the system to hang during startup while starting the "other" CPUs.
I've debugged it and found that the i8254 timer has stopped counting, for unknown reasons.
This happens after the ISA subsystem is initialized.
If i8254_delay() is used after this point, e.g. at the end of isaattach() for debugging purposes, it never returns.
At the start of that routine the timer is still working fine.
No indication of the problem is reported; the user just sits there wondering ...
The problem is triggered by the finsio driver on port 0x4e, but I'm not sure whether it is this driver's fault or
whether the timer registers are also visible on ports other than 0x40 and 0x43 on this board.
It has not been tested on other motherboards yet; perhaps others are affected too.
>How-To-Repeat:
Set up "finsio0 at isa? port 0x4e" in a kernel configuration on a Supermicro X8DAH board.
The kernel will freeze during startup.
>Fix:
I'm not 100% sure, because my knowledge of the constraints during startup is too limited.
Perhaps replacing the i8254_delay() call in arch/x86/x86/cpu.c with DELAY() would be a good idea.
It solves the problem for me, but I'm not sure whether there are other side effects.
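The first fix relies on DELAY() dispatching to whichever delay routine is currently active, instead of always to i8254_delay(). A minimal model of that dispatch, assuming a function pointer that is switched once the local APIC timer has been set up, is sketched below. The names delay_func, model_i8254_delay(), model_lapic_delay() and model_lapic_setup() are hypothetical stand-ins; the real x86 kernel code may be organized differently.

```c
#include <assert.h>

/* Records which backend handled the most recent delay request (illustration only). */
enum backend { BACKEND_NONE, BACKEND_I8254, BACKEND_LAPIC };
static enum backend last_backend = BACKEND_NONE;

/* Hypothetical stand-ins for i8254_delay() and lapic_delay(). */
static void model_i8254_delay(unsigned int usec) { (void)usec; last_backend = BACKEND_I8254; }
static void model_lapic_delay(unsigned int usec) { (void)usec; last_backend = BACKEND_LAPIC; }

/* Assumed dispatch pointer: starts on the i8254 and is switched at LAPIC setup. */
static void (*delay_func)(unsigned int) = model_i8254_delay;

#define DELAY(usec) ((*delay_func)(usec))

/* Hypothetical hook run once the LAPIC timer is usable. */
static void model_lapic_setup(void)
{
    delay_func = model_lapic_delay;
}
```

After model_lapic_setup(), DELAY(10) reaches the LAPIC routine, whereas a direct call such as model_i8254_delay(10) still bypasses the pointer, which is the situation described for cpu.c.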
Another way to work around the problem is to catch the case where the timer stops counting in i8254_delay() in
sys/arch/x86/isa/clock.c.
If we assume that each loop iteration takes longer than one timer tick, we can decrement the remaining counter by one
each time we read the same tick value again, which avoids an endless loop.
This approach slows down bringing up the "other" CPUs somewhat, but it also works fine, without
accessing "other resources" as the first fix would ...
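The second approach can be modelled outside the kernel. The sketch below reproduces only the shape of the loop as it appears in the patch context (a down-counting tick value, wrap handling via rtclock_tval, and the proposed rule that a repeated reading still consumes one tick); it also assumes initial_tick is advanced to cur_tick after each iteration. The tick source mock_gettick() is a hypothetical mock, not the kernel's gettick(), and the reload value 11932 is an assumption.

```c
#include <assert.h>

/* Hypothetical reload value; in the kernel rtclock_tval is set by the clock code. */
static int rtclock_tval = 11932;

/* Mock tick source: counts down by mock_step per read and wraps at rtclock_tval.
 * mock_step == 0 simulates a timer that has stopped counting. */
static unsigned int mock_tick = 5000;
static unsigned int mock_step = 0;

static unsigned int mock_gettick(void)
{
    if (mock_tick >= mock_step)
        mock_tick -= mock_step;
    else
        mock_tick = mock_tick + (unsigned int)rtclock_tval - mock_step;
    return mock_tick;
}

/* Busy-wait for 'ticks' timer ticks; returns the number of loop iterations. */
static unsigned int delay_ticks(int ticks)
{
    unsigned int initial_tick = mock_gettick();
    unsigned int iterations = 0;
    int remaining = ticks;

    while (remaining > 0) {
        unsigned int cur_tick = mock_gettick();
        if (cur_tick > initial_tick)          /* counter wrapped around */
            remaining -= rtclock_tval - (int)(cur_tick - initial_tick);
        else if (cur_tick == initial_tick)    /* proposed guard: timer stalled */
            remaining -= 1;
        else                                  /* normal down-count */
            remaining -= (int)(initial_tick - cur_tick);
        initial_tick = cur_tick;
        iterations++;
    }
    return iterations;
}
```

With the mock frozen (mock_step = 0), the loop ends after one iteration per requested tick instead of spinning forever; this is the bounded slowdown the report mentions.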
Remark: if compiled as XEN, the DELAY() macro would use xen_delay(). I'm not sure whether this is OK,
or whether sys/arch/x86/x86/cpu.c is part of a XEN kernel at all.
Remark: i8254_delay() is not called directly anywhere later. So the second approach will only slow down the
boot process. (At least I've found no other references to it in the sources.)
Here is a patch implementing the second approach for sys/arch/x86/isa/clock.c.
Feel free to use it, or to replace i8254_delay() with DELAY() in sys/arch/x86/x86/cpu.c instead.
Perhaps it would make sense to add some additional code to report the problem to the user the first time it happens,
but on very, very fast future systems such a message may be misleading ...
--- clock.c 2013/08/21 12:43:37 1.1
+++ clock.c 2013/08/21 12:48:26
@@ -482,6 +482,9 @@
cur_tick = gettick();
if (cur_tick > initial_tick)
delta = rtclock_tval - (cur_tick - initial_tick);
+// avoid looping forever if timer stops counting for any reason
+ else if (cur_tick == initial_tick)
+ delta = 1;
else
delta = initial_tick - cur_tick;
if (delta < 0 || delta >= rtclock_tval / 2) {
@@ -500,6 +503,9 @@
cur_tick = gettick();
if (cur_tick > initial_tick)
remaining -= rtclock_tval - (cur_tick - initial_tick);
+// avoid looping forever if timer stops counting for any reason
+ else if (cur_tick == initial_tick)
+ remaining -= 1;
else
remaining -= initial_tick - cur_tick;
#endif
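The one-time report suggested before the patch could look like the sketch below. It is hypothetical: the threshold STALL_REPORT_THRESHOLD, the function note_tick() and the use of printf() in place of the kernel's own console routines are assumptions. Requiring many consecutive identical readings rather than one is one way to reduce the false alarms anticipated on very fast CPUs, which may legitimately read the same tick value twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical threshold: consecutive identical readings before reporting a stall. */
#define STALL_REPORT_THRESHOLD 1000u

static unsigned int same_tick_run = 0;
static bool stall_reported = false;

/* Call once per loop iteration with the previous and current tick values.
 * Returns true only on the single call that prints the warning. */
static bool note_tick(unsigned int prev_tick, unsigned int cur_tick)
{
    if (cur_tick != prev_tick) {
        same_tick_run = 0;          /* timer moved: reset the run */
        return false;
    }
    if (++same_tick_run < STALL_REPORT_THRESHOLD || stall_reported)
        return false;
    stall_reported = true;          /* warn once per boot */
    printf("i8254: timer appears to have stopped counting\n");
    return true;
}
```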
>Audit-Trail:
From: David Laight <david@l8s.co.uk>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: port-amd64/48142: i8254 timer stop working during boot - system lockup during boot
Date: Thu, 22 Aug 2013 08:30:04 +0100
On Wed, Aug 21, 2013 at 01:00:01PM +0000, Wolfgang.Stukenbrock@nagler-company.com wrote:
> >Number: 48142
> >Category: port-amd64
> >Synopsis: i8254 timer stop working during boot - system lockup during boot
...
> The problem is triggered by the finsio driver on port 0x4e,
> but I'm not sure if it is the fault of this driver or
> if the timer registers are visible on other ports than 0x40 and
> 0x43 on this board too.
Why are you enabling the finsio driver?
Enabling 'random' ISA drivers will always cause grief, the 'grope'
code can easily destroy other hardware.
The finsio grope is very nasty - it does io-writes - so you really shouldn't
enable it unless you really need it and expect the grope to succeed.
Restoring the original value might help.
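A probe that restores what it overwrote might follow the pattern sketched below: read the port, write the probe value, check the result, then write the saved value back. This only helps for ports whose reads return the last value written; it cannot restore write-only registers. The port_io indirection and the byte-array mock are hypothetical so that the sketch runs outside the kernel, and this is not the finsio driver's actual probe sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical port I/O interface so the sketch can run against a mock. */
struct port_io {
    uint8_t (*in)(uint16_t port);
    void (*out)(uint16_t port, uint8_t value);
};

/* Write a probe value, read back the result, then restore the saved value.
 * Only meaningful for ports whose reads return the last value written. */
static uint8_t probe_and_restore(const struct port_io *io, uint16_t port,
                                 uint8_t probe_value)
{
    uint8_t saved = io->in(port);
    uint8_t seen;

    io->out(port, probe_value);
    seen = io->in(port);
    io->out(port, saved);
    return seen;
}

/* Mock: a byte array standing in for real I/O ports. */
static uint8_t mock_ports[256];
static uint8_t mock_in(uint16_t port) { return mock_ports[port & 0xff]; }
static void mock_out(uint16_t port, uint8_t v) { mock_ports[port & 0xff] = v; }
static const struct port_io mock_io = { mock_in, mock_out };
```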
I'd certainly add some debug code to read the 8254 timer ports during the
finsio grope code, to work out exactly when they are clobbered.
IIRC reads from the timer registers are always OK.
It might be worth doing byte reads of ports 0x40 through 0x4f just to
see if the timer registers are aliased.
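A debug check of this kind could latch and read counter 0 before and after each step of the finsio probe. The sketch assumes the usual PC layout (counter 0 data port at 0x40, control word at 0x43) and a counter programmed for LSB-then-MSB access; the port_io indirection and the mock timer are hypothetical so that the code runs outside the kernel, where the accesses would go through the kernel's port I/O routines instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PIT_CNTR0       0x40  /* counter 0 data port */
#define PIT_MODE        0x43  /* control word register */
#define PIT_LATCH_CNTR0 0x00  /* control word: counter 0, counter-latch command */

/* Hypothetical port I/O interface so the sketch can run against a mock. */
struct port_io {
    uint8_t (*in)(uint16_t port);
    void (*out)(uint16_t port, uint8_t value);
};

/* Latch counter 0 and read it as LSB then MSB. */
static uint16_t pit_read_counter0(const struct port_io *io)
{
    uint8_t lo, hi;

    io->out(PIT_MODE, PIT_LATCH_CNTR0);
    lo = io->in(PIT_CNTR0);
    hi = io->in(PIT_CNTR0);
    return (uint16_t)(lo | (hi << 8));
}

/* True if counter 0 changes within 'tries' further readings. */
static bool pit_counter0_running(const struct port_io *io, int tries)
{
    uint16_t first = pit_read_counter0(io);

    for (int i = 0; i < tries; i++)
        if (pit_read_counter0(io) != first)
            return true;
    return false;
}

/* Mock timer: decrements on every latch while running. */
static uint16_t mock_count = 1000;
static bool mock_running = true;
static uint16_t mock_latched;
static int mock_byte;  /* 0: next read returns LSB, 1: MSB */

static void mock_out(uint16_t port, uint8_t v)
{
    if (port == PIT_MODE && v == PIT_LATCH_CNTR0) {
        if (mock_running)
            mock_count--;
        mock_latched = mock_count;
        mock_byte = 0;
    }
}

static uint8_t mock_in(uint16_t port)
{
    if (port != PIT_CNTR0)
        return 0xff;
    uint8_t b = mock_byte ? (uint8_t)(mock_latched >> 8) : (uint8_t)mock_latched;
    mock_byte ^= 1;
    return b;
}

static const struct port_io mock_io = { mock_in, mock_out };
```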
David
--
David Laight: david@l8s.co.uk
From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: port-amd64-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org
Subject: Re: port-amd64/48142: i8254 timer stop working during boot - system lockup during boot
Date: Thu, 22 Aug 2013 11:52:25 +0200
Hi,
I've enabled it to see whether it would be found or not.
This is the only way I know of to detect "all" supported ISA hardware
(sensors in most cases) on a motherboard. Even the hardware manuals for many
motherboards say nothing about the available hardware or the
addresses where it is located.
Of course, the problem goes away if finsio is disabled.
And of course it is not a good idea to have useless parts configured in a
kernel, especially parts without any type indication that need to be
accessed in order to find out whether they are present (ISA,
VMEbus devices, ...).
But I think it would be very nice to either have a workaround if the
timer has died, or at least to run into a panic.
It is not a good idea for the system to lock up silently during boot.
I can't see any warnings about potential problems in the finsio man page
either.
I've just tested "the problem" on an Intel S3000 board, and it also
locks up if finsio is enabled.
So either the 8254 has 16 or more registers, or it is common for the chip to
respond at multiple addresses, with some address lines not
decoded to keep the hardware small... (I don't have an 8254 manual at hand
at the moment to check this ...)
If the finsio driver could test for this situation
first, that would be a very good idea, perhaps together with a warning message
during boot if the mapping problem is detected.
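The partial-decoding theory can be made concrete. The 8254 itself has only two register-select inputs (A0, A1), choosing counter 0, 1, 2 or the control word. If a board decoded the chip over a wider block, for example the whole of 0x40-0x4f as assumed below, every port in that block would land on register (port & 3). Assuming, as is usual for Super I/O chips, that the data port sits at index port + 1, a write to 0x4f would then reach the control word register. This is a hypothesis to verify with the read test suggested earlier in the thread, not a description of the X8DAH or S3000 chipsets; the window bounds and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum pit_reg { PIT_COUNTER0, PIT_COUNTER1, PIT_COUNTER2, PIT_CONTROL, PIT_NONE };

/* Assumed decode window; a real board may decode 0x40-0x43 only, or another range. */
#define PIT_ALIAS_BASE 0x40u
#define PIT_ALIAS_END  0x4fu

/* Which 8254 register a port would hit if only A0/A1 select inside the window. */
static enum pit_reg pit_alias(uint16_t port)
{
    if (port < PIT_ALIAS_BASE || port > PIT_ALIAS_END)
        return PIT_NONE;
    return (enum pit_reg)(port & 3u);
}

/* True if a Super I/O probe at index port 'idx' (data at idx + 1) could touch the PIT. */
static bool sio_probe_hits_pit(uint16_t idx)
{
    return pit_alias(idx) != PIT_NONE ||
           pit_alias((uint16_t)(idx + 1)) != PIT_NONE;
}
```

A driver-side check along these lines could refuse, or at least warn about, probe addresses inside the assumed window before issuing any writes.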
A warning/comment in the kernel config file that the finsio driver
may blow something up would also be nice. This would warn "normal" users that
there are potential problems and that they should try deactivating the driver
if the system runs into problems.
If the problem is fixed (or will be fixed) in the next update and/or next release
of the finsio driver, I think you can close this PR.
best regards
W. Stukenbrock
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.