NetBSD Problem Report #37611
From martin@duskware.de Tue Dec 25 12:41:06 2007
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id 2B6FC63B852
for <gnats-bugs@gnats.NetBSD.org>; Tue, 25 Dec 2007 12:41:06 +0000 (UTC)
Message-Id: <20071225124103.B1C2533AA2@mail.duskware.de>
Date: Tue, 25 Dec 2007 13:41:03 +0100 (CET)
From: martin@duskware.de
Reply-To: martin@duskware.de
To: gnats-bugs@NetBSD.org
Subject: macs with cuda adb don't boot -current
X-Send-Pr-Version: 3.95
>Number: 37611
>Category: port-mac68k
>Synopsis: macs with cuda adb don't boot -current
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: nat
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Dec 25 12:45:01 +0000 2007
>Closed-Date: Mon Nov 20 10:36:55 +0000 2023
>Last-Modified: Mon Nov 20 10:36:55 +0000 2023
>Originator: martin@duskware.de
>Release: NetBSD 4.99.45
>Organization:
>Environment:
System: NetBSD mac-beth.duskware.de 4.99.45 NetBSD 4.99.45 (MAC-BETH) #14: Tue Dec 25 13:12:03 CET 2007 martin@night-porter.duskware.de:/usr/src/sys/arch/mac68k/compile/MAC-BETH mac68k
Architecture: m68k
Machine: mac68k
>Description:
Trying to boot -current hangs in adb_read_date_time().
I suspect it hangs in this loop:
case ADB_HW_IISI:
output[0] = 0x02; /* 2 byte message */
output[1] = 0x01; /* to pram/rtc device */
output[2] = 0x03; /* read date/time */
result = send_adb_IIsi((u_char *)output, (u_char *)output,
(void *)adb_op_comprout, __UNVOLATILE(&flag), (int)0);
if (result != 0) /* exit if not sent */
return -1;
--->>> while (0 == flag) /* wait for result */
;
*curtime = (long)(*(long *)(output + 1));
but either ddb is lying to me or I'm overlooking something, as it says the
hang is a bit earlier (which does not make sense).
>How-To-Repeat:
Just boot -current on a machine with cuda?
>Fix:
The patch below avoids the problem, but there is more to cleanup.
Index: adb_direct.c
===================================================================
RCS file: /cvsroot/src/sys/arch/mac68k/dev/adb_direct.c,v
retrieving revision 1.60
diff -c -u -p -r1.60 adb_direct.c
cvs diff: conflicting specifications of output style
--- adb_direct.c 3 Dec 2007 15:33:52 -0000 1.60
+++ adb_direct.c 25 Dec 2007 12:33:22 -0000
@@ -2720,6 +2720,8 @@ adb_read_date_time(unsigned long *curtim
int result;
volatile int flag = 0;
+ return -1;
+
switch (adbHardware) {
case ADB_HW_II:
return -1;
First using "volatile int flag" and then passing __UNVOLATILE(&flag) around
looks like a poor man's condvar to me. Second the code plays tricks behind
the compilers back:
void
adb_op_comprout(void)
{
__asm("movw #1,%a2@ | update flag value");
}
(this is the code that sets flag to 1)
>Release-Note:
>Audit-Trail:
From: Scott Reynolds <scottr@NetBSD.org>
To: gnats-bugs@NetBSD.org,
Martin Husemann <martin@duskware.de>
Cc:
Subject: Re: port-mac68k/37611: macs with cuda adb don't boot -current
Date: Sun, 30 Mar 2008 21:18:44 -0500
I've looked at this several times with a varying degree of
thoroughness, but haven't come with anything conclusive.
While there are certainly similarities to a condvar or a semaphore,
this is a somewhat unique situation in that the ADB completion code is
shared by both the MRG (ROM) and direct drivers. Further, the flag is
set in an interrupt context, either directly or indirectly, so certain
accommodations must be made.
Simply put, it shouldn't be an issue to pass the address with
__UNVOLATILE since it seems to only be dereferenced in compiled code
within the function where it's declared volatile.
This doesn't answer the question as to why it's broken; there could
certainly be something that I'm missing. I am interested in knowing
which machine you're having trouble with, though.
--scott
From: Martin Husemann <martin@duskware.de>
To: Scott Reynolds <scottr@NetBSD.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-mac68k/37611: macs with cuda adb don't boot -current
Date: Mon, 31 Mar 2008 17:29:18 +0200
On Sun, Mar 30, 2008 at 09:18:44PM -0500, Scott Reynolds wrote:
> This doesn't answer the question as to why it's broken; there could
> certainly be something that I'm missing. I am interested in knowing
> which machine you're having trouble with, though.
I have no idea how ROM interoperates here, so I am probably missing big
parts of the picture.
Could you explain how adb_op_comprout() is supposed to work? AFAICT it changes
a random word on the stack, probably something where previous versions of the
compiler used to store something important. I suppose it should modify
that "something" via a global pointer instead.
Martin
From: Scott Reynolds <scottr@NetBSD.org>
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-mac68k/37611: macs with cuda adb don't boot -current
Date: Tue, 1 Apr 2008 00:44:22 -0500
On Mar 31, 2008, at 10:29 AM, Martin Husemann wrote:
> On Sun, Mar 30, 2008 at 09:18:44PM -0500, Scott Reynolds wrote:
>> This doesn't answer the question as to why it's broken; there could
>> certainly be something that I'm missing. I am interested in knowing
>> which machine you're having trouble with, though.
>
> I have no idea how ROM interoperates here, so I am probably missing
> big
> parts of the picture.
>
> Could you explain how adb_op_comprout() is supposed to work? AFAICT
> it changes
> a random word on the stack, probably something where previous
> versions of the
> compiler used to store something important. I suppose it should modify
> that "something" via a global pointer instead.
The completion routine is called at interrupt time with the a2
register set to the completion data buffer. This data buffer is
registered on a per-call basis to the various "send" functions. You
will find that most of the time the completion data buffer is actually
associated with the specific ADB command sent (it's a part of struct
adbCommand); this allows multiple asynchronous commands to be sent,
e.g. one to the keyboard and one to the mouse, which can respond with
completion independently of each other.
There is another case in adb_op_sync() similar to the one you see in
adb_read_date_time(). The adb_op_sync() version times out, though. I
suspect that's what is needed here, especially if the TOD clock is
malfunctioning somehow, or doesn't exist.
From: Martin Husemann <martin@duskware.de>
To: Scott Reynolds <scottr@NetBSD.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-mac68k/37611: macs with cuda adb don't boot -current
Date: Tue, 1 Apr 2008 11:59:04 +0200
On Tue, Apr 01, 2008 at 12:44:22AM -0500, Scott Reynolds wrote:
> The completion routine is called at interrupt time with the a2
> register set to the completion data buffer.
I see the call now. I'll instrument a bit and see where the compData
pointer gets lost.
Is there any good reason why comprout does not use standard C calling
conventions?
Martin
From: Martin Husemann <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/37611 CVS commit: src/sys/arch/mac68k/dev
Date: Tue, 1 Apr 2008 12:02:52 +0000 (UTC)
Module Name: src
Committed By: martin
Date: Tue Apr 1 12:02:52 UTC 2008
Modified Files:
src/sys/arch/mac68k/dev: adb_direct.c
Log Message:
Add timeout to busy loops waiting for ADB command completition.
Turns the hard hang in PR port-mac68k/37611 into a warning.
To generate a diff of this commit:
cvs rdiff -r1.60 -r1.61 src/sys/arch/mac68k/dev/adb_direct.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Martin Husemann <martin@duskware.de>
To: Scott Reynolds <scottr@NetBSD.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: port-mac68k/37611: macs with cuda adb don't boot -current
Date: Tue, 1 Apr 2008 14:13:09 +0200
On Tue, Apr 01, 2008 at 12:44:22AM -0500, Scott Reynolds wrote:
> The completion routine is called at interrupt time with the a2
> register set to the completion data buffer.
See below for why this does not currently work. Minor nit: shouldn't
it do a
movl #1,%a2@
instead? We pass an int pointer, and after the command is done the flag
value is 0x00010000 ;-)
> There is another case in adb_op_sync() similar to the one you see in
> adb_read_date_time(). The adb_op_sync() version times out, though.
Ok, I copied over the timeout code, and now it only warns loud if something
goes wrong, but does not hang the whole kernel any more. Way more conveninent
while debugging the root cause.
I traced where this all fails: we end up in adb_pass_up(), comprout and
compdata are still correct. We then defer to the softint handler - but since
the caller is busy waiting, we have no chance to ever run that handler before
we timeout. Setting adb_polling=1 during this call makes it work early in
boot, but unfortunately later hangs the whole kernel when running ntpdate
(can't even break into ddb) - so I guess that does something very bad to the
adb bus (which I otherwise do not use, I have serial console).
Maybe adb commands should have a per command flag that indicates we are
polling for the result, in addition to the global flag? Or I just did
something wrong when playing with the global flag.
Martin
From: Scott Reynolds <scottr@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/37611 CVS commit: src/sys/arch/mac68k/dev
Date: Thu, 3 Apr 2008 05:03:23 +0000 (UTC)
Module Name: src
Committed By: scottr
Date: Thu Apr 3 05:03:23 UTC 2008
Modified Files:
src/sys/arch/mac68k/dev: adb.c adb_direct.c adbvar.h
Log Message:
Factor out ADB spin-wait timeout loop, and use it for synchronous
operations to access the PRAM RTC, etc. in the IIsi and Cuda cases.
Thanks to Martin Husemann for pointing out the flaw.
This is a slightly more thorough workaround for the issue Martin found
in PR 37611, as it affected more than just adb_read_date_time().
To generate a diff of this commit:
cvs rdiff -r1.51 -r1.52 src/sys/arch/mac68k/dev/adb.c
cvs rdiff -r1.61 -r1.62 src/sys/arch/mac68k/dev/adb_direct.c
cvs rdiff -r1.24 -r1.25 src/sys/arch/mac68k/dev/adbvar.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Responsible-Changed-From-To: port-mac68k-maintainer->nat
Responsible-Changed-By: nat@NetBSD.org
Responsible-Changed-When: Sun, 12 Nov 2023 09:00:44 +0000
Responsible-Changed-Why:
Take.
State-Changed-From-To: open->feedback
State-Changed-By: nat@NetBSD.org
State-Changed-When: Sun, 12 Nov 2023 09:00:44 +0000
State-Changed-Why:
Martin is this ok to close or is it still an issue?
State-Changed-From-To: feedback->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Mon, 20 Nov 2023 10:36:55 +0000
State-Changed-Why:
The commit fixed it, thanks!
(do I now get a price for longes bug in feedback state?)
>Unformatted:
(Contact us)
$NetBSD: gnats-precook-prs,v 1.4 2018/12/21 14:20:20 maya Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.