NetBSD Problem Report #35553
From mark@mcs.vuw.ac.nz Tue Feb 6 10:35:10 2007
Return-Path: <mark@mcs.vuw.ac.nz>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id 6B6B163BAF4
for <gnats-bugs@gnats.NetBSD.org>; Tue, 6 Feb 2007 10:35:10 +0000 (UTC)
Message-Id: <200702061035.l16AZ7TX013853@turakirae.mcs.vuw.ac.nz>
Date: Tue, 6 Feb 2007 23:35:07 +1300 (NZDT)
From: Mark Davies <mark@mcs.vuw.ac.nz>
Reply-To: mark@mcs.vuw.ac.nz
To: gnats-bugs@NetBSD.org
Subject: azalia hangs an Optiplex 745
X-Send-Pr-Version: 3.95
>Number: 35553
>Category: kern
>Synopsis: uhci and azalia hangs an Optiplex 745
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Feb 06 10:40:00 +0000 2007
>Closed-Date:
>Last-Modified: Tue Aug 06 14:21:37 +0000 2019
>Originator: Mark Davies
>Release: NetBSD 4.99.8
>Organization:
Dept. of Comp. Sci., Victoria Uni. of Wellington, New Zealand.
>Environment:
System: NetBSD lap3.home.vuw.ac.nz 4.99.8 NetBSD 4.99.8 (MCS_LAPTOP) #15: Fri Jan 12 09:16:07 NZDT 2007 mark@lap3.home.vuw.ac.nz:/mnt/SAVE/build.obj/mnt/src/src/sys/arch/i386/compile/MCS_LAPTOP i386
Architecture: i386
Machine: i386
>Description:
Trying to run a current i386 GENERIC kernel on a Dell Optiplex 745,
the system hangs while still initialising. Replacing with a kernel
without azalia support the system boots.
>How-To-Repeat:
Install current on an Optiplex 745, reboot running a GENERIC kernel
and watch it identify and initialise devices finishing with the cd:
[...]
cd0(piixide0:1:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
but then hang before it gets to printing:
boot device: wd0
root on wd0a dumps on wd0b
Then reboot running a kernel built from the following config:
include "arch/i386/conf/GENERIC"
no azalia* at pci?
and watch it come all the way up to multiuser.
>Fix:
no idea.
>Release-Note:
>Audit-Trail:
From: "TAMURA Kent" <kent@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Fri, 9 Feb 2007 01:42:32 +0900
Could you track down what code in azalia stops the machine
by inserting printf()s?
--
TAMURA Kent <kent_2007 at hauN.org> <kent at NetBSD.org>
From: Mark Davies <mark@mcs.vuw.ac.nz>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Mon, 12 Feb 2007 11:17:02 +1300
On Friday 09 February 2007 05:45, TAMURA Kent wrote:
> Could you track down what code in azalia stops the machine
> by inserting printf()s?
Well I put a printf at the start of every function in azalia.c and
azalia_codec.c and nothing was printed around the point that the
machine hangs. The only thing I did notice was that the last azalia
related function called (back at device config) was
azalia_query_devinfo() which was called more than a screenful of times
in a row - I don't know if that is normal. Don't know what is called
before that as it disappears off the screen too quickly.
cheers
mark
From: "TAMURA Kent" <kent@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Tue, 6 Mar 2007 02:13:55 +0900
> azalia_query_devinfo() which was called more than a screenful of times
> in a row - I don't know if that is normal. Don't know what is called
It's normal.
How about removing a call for audio_attach_mi() in azalia_attach_intr()?
--
TAMURA Kent <kent_2007 at hauN.org> <kent at NetBSD.org>
From: Mark Davies <mark@mcs.vuw.ac.nz>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Wed, 7 Mar 2007 10:11:02 +1300
On Tue, 06 Mar 2007, TAMURA Kent wrote:
> It's normal.
> How about removing a call for audio_attach_mi() in
> azalia_attach_intr()?
I removed that and it still hung so working backwards I removed the
calls to azalia_stream_init() and it still hung. So I removed the
loop calling azalia_codec_init() and it booted successfully.
cheers
mark
From: "TAMURA Kent" <kent@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Sun, 11 Mar 2007 01:45:09 +0900
> I removed that and it still hung so working backwards I removed the
> calls to azalia_stream_init() and it still hung. So I removed the
> loop calling azalia_codec_init() and it booted successfully.
Thanks.
Could you track down the cause in azalia_codec_init()?
Inserting "return -1;" is good to abort safely.
--
TAMURA Kent <kent_2007 at hauN.org> <kent at NetBSD.org>
From: Mark Davies <mark@mcs.vuw.ac.nz>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Wed, 21 Mar 2007 00:10:03 +1200
On Sunday 11 March 2007, TAMURA Kent wrote:
> > I removed that and it still hung so working backwards I removed
> > the calls to azalia_stream_init() and it still hung. So I
> > removed the loop calling azalia_codec_init() and it booted
> > successfully.
>
> Thanks.
> Could you track down the cause in azalia_codec_init()?
> Inserting "return -1;" is good to abort safely.
It's in azalia_codec_comresp() apparently.
    DPRINTF(("%s: information of codec[%d] follows:\n",
        XNAME(this->az), addr));
    return -1;    /* first test point */
    /* codec vendor/device/revision */
    err = this->comresp(this, CORB_NID_ROOT, CORB_GET_PARAMETER,
        COP_REVISION_ID, &rev);
    if (err)
        return err;
    return -1;    /* second test point */
With the first "return -1" the system boots, with the second it hangs.
cheers
mark
From: "TAMURA Kent" <kent@NetBSD.org>
To: gnats-bugs@netbsd.org, mark@mcs.vuw.ac.nz
Cc:
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Tue, 8 May 2007 14:15:41 +0900
Ok, thanks.
The kernel stack seems to be corrupted somewhere.
Could you change FLAGBUFLEN value in sys/dev/pci/azalia.h from 256 to
100, please?
--
TAMURA Kent <kent_2007 at hauN.org> <kent at NetBSD.org>
From: Mark Davies <mark@mcs.vuw.ac.nz>
To: "TAMURA Kent" <kent@netbsd.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Wed, 9 May 2007 09:51:26 +1200
On Tuesday 08 May 2007, TAMURA Kent wrote:
> The kernel stack seems to be corrupted somewhere.
> Could you change FLAGBUFLEN value in sys/dev/pci/azalia.h from 256
> to 100, please?
Appears to make no difference.
cheers
mark
From: Mark Davies <mark@mcs.vuw.ac.nz>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Fri, 8 Jun 2007 14:25:10 +1200
On Tue, 08 May 2007, TAMURA Kent wrote:
> The kernel stack seems to be corrupted somewhere.
On poking some more, it seems that the machine isn't completely hung. If
I insert or remove a USB device, the appropriate attach/detach
messages are printed on the console; it just doesn't make any progress
in finding the root device and running init.
cheers
mark
From: Mark Davies <mark@mcs.vuw.ac.nz>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Mon, 6 Aug 2007 15:19:45 +1200
Just to update this, should note that disabling _either_ azalia or
ehci on the 745 gets a system that doesn't hang, so it's presumably
some interaction between them.
If you disable ehci the azalia reports as:
azalia0 at pci0 dev 27 function 0: Generic High Definition Audio Controller
azalia0: interrupting at ioapic0 pin 16 (irq 11)
azalia0: host: Intel 82801H High Definition Audio Controller (rev. 2)
azalia0: host: High Definition Audio rev. 1.0
azalia0: codec[0]: Analog Devices AD1983 (rev. 4.0)
azalia0: codec[0]: High Definition Audio rev. 1.0
azalia0: playback: max channels=2, encodings=1<PCM>
azalia0: playback: PCM formats=e007f<24bit,20bit,16bit,48kHz,44.1kHz,32kHz,22.05kHz,16kHz,11.025kHz,8kHz>
azalia0: recording: max channels=2, encodings=1<PCM>
azalia0: recording: PCM formats=e007f<24bit,20bit,16bit,48kHz,44.1kHz,32kHz,22.05kHz,16kHz,11.025kHz,8kHz>
audio0 at azalia0: full duplex, independent
and mixerctl gives:
outputs.dac02.source=hdaudio
outputs.green05.source=dac03
outputs.green05.mute=off
outputs.green05=125,125
outputs.black06.source=dac03
outputs.black06.mute=off
outputs.black06=125,125
outputs.black06.boost=on
outputs.unknown07.mute=off
outputs.unknown07=125
inputs.sel0b.source=dac03
inputs.sel0c.source=black08
outputs.sel0c=85,85
inputs.sel0d.source=blue09
inputs.beep10.mute=off
inputs.beep10=119
outputs.sel11.mute=off
outputs.sel11=123,123
outputs.sel12.mute=off
outputs.sel12=123,123
outputs.sel13.mute=off
outputs.sel13=123,123
inputs.sel14.source=sel0c
outputs.sel14.mute=off
outputs.sel14=119,119
playback.mode=03
so looks like azalia_codec.c still needs to know about the particular
controls etc for the AD1983.
cheers
mark
From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Thu, 23 Aug 2007 18:20:30 -0600 (MDT)
On Fri, 8 Jun 2007, Mark Davies wrote:
> On poking some more, it seems that the machine isn't completely hung. If
> I insert or remove a USB device, the appropriate attach/detach
> messages are printed on the console; it just doesn't make any progress
> in finding the root device and running init.
I've started poking around with my Optiplex 745, and it's hung waiting
for a pending deferred autoconf function to complete. The pending count
is 1, so I'd guess it's the last one queued. I'm going to dig into what
that function is and where it hangs up.
--
Michael L. Hitch mhitch@montana.edu
Computer Consultant
Information Technology Center
Montana State University Bozeman, MT USA
From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Fri, 24 Aug 2007 07:24:01 -0600 (MDT)
On Thu, 23 Aug 2007, Michael L. Hitch wrote:
> I've started poking around with my Optiplex 745, and it's hung waiting
> for a pending deferred autoconf function to complete. The pending count
> is 1, so I'd guess it's the last one queued. I'm going to dig into what
> that function is and where it hangs up.
I think I see what's going on with the hang. Each USB controller starts a
kernel process to handle events, and increments the config_pending
semaphore. When the process starts, it calls a discovery routine, and
decrements the semaphore when the discovery returns. The process for
uhci0 never completes the discovery routine because when the azalia
initializes the codec, it somehow interferes with the uhci0 controller,
which halts. If ehci has been disabled, then there is no interference
and all the usb event processes complete their initialization and
everything proceeds normally. (Well, almost normally - I find that my
USB keyboard fails to attach if ehci is disabled. I can usually get it
to attach by plugging it into a different port, although it sometimes
takes several attempts.)
So all these problems seem to come back to the uhci/ehci interaction with
azalia and bge. One thing to note is that all three of these devices are
sharing the same interrupt.
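
The pending-count mechanism described above can be modelled in a few lines of C. This is only an illustrative sketch, not the kernel's code: `struct boot_state`, `pending_incr()`, `pending_decr()`, `can_mount_root()` and `simulate_boot()` are hypothetical names invented for the example, and the "halted" controller is simply modelled as one whose discovery pass never returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the pending counter; not kernel code. */
struct boot_state {
    int pending;    /* deferred autoconf tasks still outstanding */
};

/* A controller attaches and queues its event process. */
static void pending_incr(struct boot_state *bs) { bs->pending++; }

/* An event process finished its first discovery pass. */
static void pending_decr(struct boot_state *bs) { bs->pending--; }

/* Root device selection proceeds only when nothing is pending. */
static bool can_mount_root(const struct boot_state *bs)
{
    return bs->pending == 0;
}

/*
 * Simulate boot with ncontrollers USB controllers. halted_idx is the
 * index of a controller whose discovery never returns (-1 for none).
 * Returns true if boot would proceed to mounting root.
 */
static bool simulate_boot(int ncontrollers, int halted_idx)
{
    struct boot_state bs = { 0 };

    assert(ncontrollers >= 0);
    for (int i = 0; i < ncontrollers; i++)
        pending_incr(&bs);
    for (int i = 0; i < ncontrollers; i++) {
        if (i == halted_idx)
            continue;    /* discovery never completes */
        pending_decr(&bs);
    }
    return can_mount_root(&bs);
}
```

In this model `simulate_boot(3, -1)` reaches the root mount, while `simulate_boot(3, 0)` is left with a count of 1, which matches the pending count observed on the hung machine.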
--
Michael L. Hitch mhitch@montana.edu
Computer Consultant
Information Technology Center
Montana State University Bozeman, MT USA
From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, mark@mcs.vuw.ac.nz
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Mon, 27 Aug 2007 13:55:37 -0600 (MDT)
On Fri, 24 Aug 2007, Michael L. Hitch wrote:
> So all these problems seem to come back to the uhci/ehci interaction with
> azalia and bge. One thing to note is that all three of these devices are
> sharing the same interrupt.
And indeed, the interrupt sharing is causing the hang. When azalia0 is
running the codec initialization, interrupts have been enabled and the
initialization is using interrupts. Each time azalia0 interrupts, the
interrupt handlers for azalia0, bge0, and uhci0 are called. Normally this
shouldn't cause any undue problems other than extra processing per
interrupt, but the uhci interrupt handler does not deal intelligently with
the shared interrupt. By the time the azalia codec initialization runs,
something appears to have halted the uhci0 controller, and its status
register contains UHCI_STS_HCH. That status doesn't actually indicate an
interrupt as best as I have been able to determine, but the interrupt
handler looks at it, outputs a message about the controller halted, and
disables access to the controller. After the azalia codec initialization
is complete, all the USB event processes start running. Each process has to
complete its initial discovery task before the autoconfig stuff is
completed, and the process for uhci0 never completes (presumably because
access to uhci0 was disabled by the uhci interrupt handler), and the
system hangs waiting for it to complete.
When the azalia is disabled, the system will come up because all the USB
event tasks complete the initial discovery. However, when bge0
interrupts, it also causes the uhci interrupt handler to run and detects
the uhci0 controller halted status.
I've gotten around this problem by changing the uhci interrupt to not
check for the UHCI_STS_HCH as a valid interrupt bit, and now my system
boots and runs normally.
I'm trying to understand how uhci0 (and uhci1 as well) get halted, but
haven't figured that out yet.
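
The workaround described above (not treating UHCI_STS_HCH as a valid interrupt bit) can be sketched as follows. This is a standalone model, not the actual uhci driver change: `struct fake_uhci`, `intr_naive()` and `intr_masked()` are hypothetical names, and the `STS_*` values are assumed to follow the UHCI status register layout. Whether HCH can ever raise an interrupt on its own is taken from the observation above, not verified here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Status bits; values assumed to follow the UHCI USBSTS layout. */
#define STS_USBINT 0x0001   /* transfer completion */
#define STS_USBEI  0x0002   /* transfer error */
#define STS_RD     0x0004   /* resume detect */
#define STS_HSE    0x0008   /* host system error */
#define STS_HCPE   0x0010   /* host controller process error */
#define STS_HCH    0x0020   /* controller halted (treated as state only) */

/* Bits this sketch accepts as genuine interrupt causes. */
#define STS_INTR_SOURCES \
    (STS_USBINT | STS_USBEI | STS_RD | STS_HSE | STS_HCPE)

/* Hypothetical stand-in for the controller's software state. */
struct fake_uhci {
    uint16_t status;    /* simulated status register */
    bool disabled;      /* set when the handler gives up on the device */
};

/* Behaviour as described in the report: HCH alone claims the interrupt. */
static int intr_naive(struct fake_uhci *sc)
{
    assert(sc != NULL);
    if (sc->status == 0)
        return 0;               /* not ours */
    if (sc->status & STS_HCH)
        sc->disabled = true;    /* "controller halted", access disabled */
    return 1;
}

/* Workaround: ignore HCH when deciding whether the interrupt is ours. */
static int intr_masked(struct fake_uhci *sc)
{
    assert(sc != NULL);
    uint16_t pending = sc->status & STS_INTR_SOURCES;

    if (pending == 0)
        return 0;               /* shared interrupt from another device */
    /* A real handler would acknowledge and process 'pending' here. */
    return 1;
}
```

With only `STS_HCH` set, which is the state seen when azalia0 or bge0 raises the shared interrupt, `intr_naive()` disables the controller while `intr_masked()` reports that the interrupt belongs to another device and leaves the controller alone.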
--
Michael L. Hitch mhitch@montana.edu
Computer Consultant
Information Technology Center
Montana State University Bozeman, MT USA
State-Changed-From-To: open->feedback
State-Changed-By: jmcneill@NetBSD.org
State-Changed-When: Mon, 07 Sep 2009 17:07:25 +0000
State-Changed-Why:
This should be resolved in NetBSD -current using the new hdaudio(4) driver, can
you give it a try?
From: Mark Davies <mark@ecs.vuw.ac.nz>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/35553 (azalia hangs an Optiplex 745)
Date: Thu, 7 Jan 2010 12:19:39 +1300
On Tuesday 08 September 2009 05:07:26 you wrote:
> This should be resolved in NetBSD -current using the new hdaudio(4)
> driver, can you give it a try?
It's unlikely that hdaudio fixes this, as it's actually a uhci and shared
interrupts issue (see the description from Michael L. Hitch towards the end
of this PR).
cheers
mark
From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/35553 (azalia hangs an Optiplex 745)
Date: Wed, 6 Jan 2010 16:14:54 -0700 (MST)
On Wed, 6 Jan 2010, Mark Davies wrote:
> On Tuesday 08 September 2009 05:07:26 you wrote:
> > This should be resolved in NetBSD -current using the new hdaudio(4)
> > driver, can you give it a try?
>
> It's unlikely that hdaudio fixes this, as it's actually a uhci and shared
> interrupts issue (see the description from Michael L. Hitch towards the end
> of this PR).
Actually, it was the uhci halting as a result of shared interrupts and
the azalia driver doing some autoconfig stuff with interrupts. The
hdaudio driver may not do this, and might not hang on boot.
--
Michael L. Hitch mhitch@montana.edu
Computer Consultant
Information Technology Center
Montana State University Bozeman, MT USA
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Thu, 7 Jan 2010 10:58:58 +0100
This patch also solves part of the problem described in PR 40159.
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Thu, 22 Jul 2010 14:46:45 +0200
(side remark: ignore my previous comment in this ticket, bogus diagnostic)
While it would be preferable to fix the root cause and avoid disabling
the host controller, if we do disable it, we need to signal uhub_explore
and have it abort discovery.
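
One possible shape for that signalling, as a rough sketch only: the interrupt handler sets a flag when it disables the controller, the exploration loop checks the flag and bails out, and the event process releases its pending count whether discovery succeeded or was aborted. All names here (`struct fake_hc`, `dying`, `explore()`, `event_thread_first_pass()`) are hypothetical and do not correspond to the real uhub_explore interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical controller state shared with the interrupt handler. */
struct fake_hc {
    bool dying;     /* set when the handler disables the controller */
    int ports;      /* number of root hub ports to explore */
};

/* Returns the number of ports explored, or -1 if discovery was aborted. */
static int explore(const struct fake_hc *hc)
{
    int explored = 0;

    for (int port = 0; port < hc->ports; port++) {
        if (hc->dying)
            return -1;  /* controller disabled: abort discovery */
        explored++;     /* a real driver would probe the port here */
    }
    return explored;
}

/* First pass of the event process: always release the pending count. */
static int event_thread_first_pass(const struct fake_hc *hc, int *pending)
{
    assert(hc != NULL && pending != NULL && *pending > 0);
    int rv = explore(hc);

    (*pending)--;       /* released on success and on abort alike */
    return rv;
}
```

The point of the sketch is the unconditional decrement: even when exploration is aborted, the boot path waiting on the pending count is no longer blocked.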
Martin
From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Thu, 22 Jul 2010 10:42:28 -0600 (MDT)
On Thu, 22 Jul 2010, Martin Husemann wrote:
> While it would be preferable to fix the root cause and avoid disabling
> the host controller, if we do disable it, we need to signal uhub_explore
> and have it abort discovery.
I was never able to determine where (or how) the controller was getting
halted.
Someone had posted a message about something very similar, where they
were getting the controller halted, and had proposed a fix to restart
the controller when it detected a halt. I don't think I ever saw a
response to that message, and nothing more was done with it. I was always
going to try to see if that change helps on the Optiplex 745, but never
think about it when I'm updating that machine. I don't recall at the
moment who posted that message, or what list it was on. I should still
have it, so I'll try to dig it out.
--
Michael L. Hitch mhitch@montana.edu
Computer Consultant
Information Technology Center
Montana State University Bozeman, MT USA
From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Thu, 22 Jul 2010 11:00:42 -0600 (MDT)
On Thu, 22 Jul 2010, Michael L. Hitch wrote:
> Someone had posted a message about something very similar, where they
> were getting the controller halted, and had proposed a fix to restart
> the controller when it detected a halt. I don't think I ever saw a
> response to that message, and nothing more was done with it. I was always
> going to try to see if that change helps on the Optiplex 745, but never
> think about it when I'm updating that machine. I don't recall at the
> moment who posted that message, or what list it was on. I should still
> have it, so I'll try to dig it out.
And that would be Manuel Bouyer on tech-kern:
http://mail-index.netbsd.org/tech-kern/2008/06/02/msg001554.html
--
Michael L. Hitch mhitch@montana.edu
Computer Consultant
Information Technology Center
Montana State University Bozeman, MT USA
State-Changed-From-To: feedback->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 31 Jan 2011 02:32:23 +0000
State-Changed-Why:
this is really a usb controller problem, it's not fixed, and it should get
fixed.
>Unformatted: