NetBSD Problem Report #35553

From mark@mcs.vuw.ac.nz  Tue Feb  6 10:35:10 2007
Return-Path: <mark@mcs.vuw.ac.nz>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 6B6B163BAF4
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  6 Feb 2007 10:35:10 +0000 (UTC)
Message-Id: <200702061035.l16AZ7TX013853@turakirae.mcs.vuw.ac.nz>
Date: Tue, 6 Feb 2007 23:35:07 +1300 (NZDT)
From: Mark Davies <mark@mcs.vuw.ac.nz>
Reply-To: mark@mcs.vuw.ac.nz
To: gnats-bugs@NetBSD.org
Subject: azalia hangs an Optiplex 745
X-Send-Pr-Version: 3.95

>Number:         35553
>Category:       kern
>Synopsis:       uhci and azalia hangs an Optiplex 745
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Feb 06 10:40:00 +0000 2007
>Closed-Date:    
>Last-Modified:  Tue Aug 06 14:21:37 +0000 2019
>Originator:     Mark Davies
>Release:        NetBSD 4.99.8
>Organization:
Dept. of Comp. Sci., Victoria Uni. of Wellington, New Zealand.
>Environment:


System: NetBSD lap3.home.vuw.ac.nz 4.99.8 NetBSD 4.99.8 (MCS_LAPTOP) #15: Fri Jan 12 09:16:07 NZDT 2007 mark@lap3.home.vuw.ac.nz:/mnt/SAVE/build.obj/mnt/src/src/sys/arch/i386/compile/MCS_LAPTOP i386
Architecture: i386
Machine: i386
>Description:
	Trying to run a current i386 GENERIC kernel on a Dell Optiplex 745,
	the system hangs while still initialising. With a kernel built
	without azalia support, the system boots.

>How-To-Repeat:
	Install current on an Optiplex 745, reboot running a GENERIC kernel
	and watch it identify and initialise devices finishing with the cd:

	[...]
cd0(piixide0:1:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)

	but then hang before it gets to printing:

boot device: wd0
root on wd0a dumps on wd0b


	Then reboot running a kernel built from the following config:

 include "arch/i386/conf/GENERIC"
 no azalia* at pci?

	and watch it come all the way up to multiuser.

>Fix:
	no idea.


>Release-Note:

>Audit-Trail:
From: "TAMURA Kent" <kent@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Fri, 9 Feb 2007 01:42:32 +0900

 Could you track down what code in azalia stops the machine
 by inserting printf()s?

 -- 
 TAMURA Kent <kent_2007 at hauN.org> <kent at NetBSD.org>

From: Mark Davies <mark@mcs.vuw.ac.nz>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Mon, 12 Feb 2007 11:17:02 +1300

 On Friday 09 February 2007 05:45, TAMURA Kent wrote:
 >  Could you track down what code in azalia stops the machine
 >  by inserting printf()s?

 Well, I put a printf at the start of every function in azalia.c and
 azalia_codec.c, and nothing was printed around the point that the
 machine hangs.  The only thing I did notice was that the last
 azalia-related function called (back at device config) was
 azalia_query_devinfo(), which was called more than a screenful of
 times in a row - I don't know if that is normal.  I don't know what
 is called before that, as it scrolls off the screen too quickly.

 cheers
 mark

From: "TAMURA Kent" <kent@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Tue, 6 Mar 2007 02:13:55 +0900

 >  azalia_query_devinfo() which was called more than a screenful times
 >  in a row - I don't know if that is normal.  Don't know what is called

 It's normal.
 How about removing the call to audio_attach_mi() in azalia_attach_intr()?

 -- 
 TAMURA Kent <kent_2007 at hauN.org> <kent at NetBSD.org>

From: Mark Davies <mark@mcs.vuw.ac.nz>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Wed, 7 Mar 2007 10:11:02 +1300

 On Tue, 06 Mar 2007, TAMURA Kent wrote:
 >  It's normal.
 >  How about removing a call for audio_attach_mi() in
 > azalia_attach_intr()?

 I removed that and it still hung so working backwards I removed the 
 calls to azalia_stream_init() and it still hung.  So I removed the 
 loop calling azalia_codec_init() and it booted successfully.

 cheers
 mark

From: "TAMURA Kent" <kent@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Sun, 11 Mar 2007 01:45:09 +0900

 >  I removed that and it still hung so working backwards I removed the
 >  calls to azalia_stream_init() and it still hung.  So I removed the
 >  loop calling azalia_codec_init() and it booted successfully.

 Thanks.
 Could you track down the cause in azalia_codec_init()?
 Inserting "return -1;" is good to abort safely.

 -- 
 TAMURA Kent <kent_2007 at hauN.org> <kent at NetBSD.org>

From: Mark Davies <mark@mcs.vuw.ac.nz>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Wed, 21 Mar 2007 00:10:03 +1200

 On Sunday 11 March 2007, TAMURA Kent wrote:
 >  >  I removed that and it still hung so working backwards I removed
 >  > the calls to azalia_stream_init() and it still hung.  So I
 >  > removed the loop calling azalia_codec_init() and it booted
 >  > successfully.
 >
 >  Thanks.
 >  Could you track down the cause in azalia_codec_init()?
 >  Inserting "return -1;" is good to abort safely.

 It's in azalia_codec_comresp() apparently.



 	DPRINTF(("%s: information of codec[%d] follows:\n",
 	    XNAME(this->az), addr));
 	return -1;	/* first test return (tried separately) */
 	/* codec vendor/device/revision */
 	err = this->comresp(this, CORB_NID_ROOT, CORB_GET_PARAMETER,
 	    COP_REVISION_ID, &rev);
 	if (err)
 		return err;
 	return -1;	/* second test return (tried separately) */


 With the first "return -1" the system boots, with the second it hangs.

 cheers
 mark

From: "TAMURA Kent" <kent@NetBSD.org>
To: gnats-bugs@netbsd.org, mark@mcs.vuw.ac.nz
Cc: 
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Tue, 8 May 2007 14:15:41 +0900

 Ok, thanks.

 The kernel stack seems to be corrupted somewhere.
 Could you change FLAGBUFLEN value in sys/dev/pci/azalia.h from 256 to
 100, please?

 -- 
 TAMURA Kent <kent_2007 at hauN.org> <kent at NetBSD.org>

From: Mark Davies <mark@mcs.vuw.ac.nz>
To: "TAMURA Kent" <kent@netbsd.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Wed, 9 May 2007 09:51:26 +1200

 On Tuesday 08 May 2007, TAMURA Kent wrote:
 > The kernel stack seems to be corrupted somewhere.
 > Could you change FLAGBUFLEN value in sys/dev/pci/azalia.h from 256
 > to 100, please?

 Appears to make no difference.

 cheers
 mark

From: Mark Davies <mark@mcs.vuw.ac.nz>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Fri, 8 Jun 2007 14:25:10 +1200

 On Tue, 08 May 2007, TAMURA Kent wrote:
 >  The kernel stack seems to be corrupted somewhere.

 On poking some more, it seems that the machine isn't completely hung.  If
 I insert or remove a USB device, the appropriate attach/detach
 messages are printed on the console; it just doesn't make any progress
 in finding the root device and running init.

 cheers
 mark

From: Mark Davies <mark@mcs.vuw.ac.nz>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Mon, 6 Aug 2007 15:19:45 +1200

 Just to update this: disabling _either_ azalia or ehci on the 745
 gets a system that doesn't hang, so it's presumably some interaction
 between them.

 If you disable ehci the azalia reports as:

 azalia0 at pci0 dev 27 function 0: Generic High Definition Audio Controller
 azalia0: interrupting at ioapic0 pin 16 (irq 11)
 azalia0: host: Intel 82801H High Definition Audio Controller (rev. 2)
 azalia0: host: High Definition Audio rev. 1.0

 azalia0: codec[0]: Analog Devices AD1983 (rev. 4.0)
 azalia0: codec[0]: High Definition Audio rev. 1.0
 azalia0: playback: max channels=2, encodings=1<PCM>
 azalia0: playback: PCM formats=e007f<24bit,20bit,16bit,48kHz,44.1kHz,32kHz,22.05kHz,16kHz,11.025kHz,8kHz>
 azalia0: recording: max channels=2, encodings=1<PCM>
 azalia0: recording: PCM formats=e007f<24bit,20bit,16bit,48kHz,44.1kHz,32kHz,22.05kHz,16kHz,11.025kHz,8kHz>
 audio0 at azalia0: full duplex, independent

 and mixerctl gives:

 outputs.dac02.source=hdaudio
 outputs.green05.source=dac03
 outputs.green05.mute=off
 outputs.green05=125,125
 outputs.black06.source=dac03
 outputs.black06.mute=off
 outputs.black06=125,125
 outputs.black06.boost=on
 outputs.unknown07.mute=off
 outputs.unknown07=125
 inputs.sel0b.source=dac03
 inputs.sel0c.source=black08
 outputs.sel0c=85,85
 inputs.sel0d.source=blue09
 inputs.beep10.mute=off
 inputs.beep10=119
 outputs.sel11.mute=off
 outputs.sel11=123,123
 outputs.sel12.mute=off
 outputs.sel12=123,123
 outputs.sel13.mute=off
 outputs.sel13=123,123
 inputs.sel14.source=sel0c
 outputs.sel14.mute=off
 outputs.sel14=119,119
 playback.mode=03


 So it looks like azalia_codec.c still needs to know about the particular
 controls etc. for the AD1983.

 cheers
 mark

From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Thu, 23 Aug 2007 18:20:30 -0600 (MDT)

 On Fri, 8 Jun 2007, Mark Davies wrote:

 > On poking some more seems that the machine isn't completely hung.  If
 > I insert or remove a usb device the appropriate attach/detach
 > messages are printed on the console it just doesn't make any progress
 > in finding the root device and running init.

 I've started poking around with my Optiplex 745, and it's hung waiting
 for a pending deferred autoconf function to complete.  The pending count
 is 1, so I'd guess it's the last one queued.  I'm going to dig into what
 that function is and where it hangs up.

 --
 Michael L. Hitch			mhitch@montana.edu
 Computer Consultant
 Information Technology Center
 Montana State University	Bozeman, MT	USA

From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Fri, 24 Aug 2007 07:24:01 -0600 (MDT)

 On Thu, 23 Aug 2007, Michael L. Hitch wrote:

 > I've started poking around with my Optiplex 745, and it's hung waiting
 > for a pending deferred autoconf function to complete.  The pending count
 > is 1, so I'd guess it's the last one queued.  I'm going to dig into what
 > that function is and where it hangs up.

 I think I see what's going on with the hang.  Each USB controller starts a
 kernel process to handle events, and increments the config_pending 
 semaphore.  When the process starts, it calls a discovery routine, and
 decrements the semaphore when the discovery returns.  The process for 
 uhci0 never completes the discovery routine because when the azalia 
 initializes the codec, it somehow interferes with the uhci0 controller,
 which halts.  If ehci has been disabled, then there is no interference
 and all the usb event processes complete their initialization and 
 everything proceeds normally.  (Well, almost normally - I find that my
 USB keyboard fails to attach if ehci is disabled.  I can usually get it
 to attach by plugging it into a different port, although it sometimes
 takes several attempts.)

 So all these problems seem to come back to the uhci/ehci interaction with
 azalia and bge.  One thing to note is that all three of these devices are
 sharing the same interrupt.
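
 The accounting described above can be modelled in user space. The
 sketch below is a hypothetical illustration, not the kernel's code:
 struct boot_state, pending_incr(), pending_decr(), usb_event_thread()
 and can_mount_root() are made-up names, and only the idea of a pending
 counter that must reach zero before boot continues is taken from the
 description.

```c
#include <stdbool.h>

/* Hypothetical model of the boot-time counter described above. */
struct boot_state {
	int config_pending;	/* deferred autoconf tasks still outstanding */
};

/* Called when a USB controller queues its event process. */
static void
pending_incr(struct boot_state *bs)
{
	bs->config_pending++;
}

/* Called when a deferred task has finished. */
static void
pending_decr(struct boot_state *bs)
{
	if (bs->config_pending > 0)
		bs->config_pending--;
}

/*
 * Models the first pass of one controller's event process.  Returns
 * true when discovery completed and the counter was decremented,
 * false when the controller was unusable and discovery never finished.
 */
static bool
usb_event_thread(struct boot_state *bs, bool controller_usable)
{
	if (!controller_usable)
		return false;	/* stands in for discovery never returning */
	pending_decr(bs);
	return true;
}

/* Boot only proceeds to the root device once nothing is pending. */
static bool
can_mount_root(const struct boot_state *bs)
{
	return bs->config_pending == 0;
}
```

 With two controllers queued and one of them unusable, config_pending
 stays at 1 and can_mount_root() never becomes true, which matches the
 observed hang.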

 --
 Michael L. Hitch			mhitch@montana.edu
 Computer Consultant
 Information Technology Center
 Montana State University	Bozeman, MT	USA

From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, mark@mcs.vuw.ac.nz
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Mon, 27 Aug 2007 13:55:37 -0600 (MDT)

 On Fri, 24 Aug 2007, Michael L. Hitch wrote:

 > So all these problems seem to come back to the uhci/ehci interaction with
 > azalia and bge.  One thing to note is that all three of these devices are
 > sharing the same interrupt.

    And indeed, the interrupt sharing is causing the hang.  When azalia0 is
 running the codec initialization, interrupts have been enabled and the
 initialization is using interrupts.  Each time azalia0 interrupts, the
 interrupt handlers for azalia0, bge0, and uhci0 are called.  Normally this
 shouldn't cause any undue problems other than extra processing per
 interrupt, but the uhci interrupt handler does not deal intelligently with
 the shared interrupt.  By the time the azalia codec initialization runs,
 something appears to have halted the uhci0 controller, and its status
 register contains UHCI_STS_HCH.  That status doesn't actually indicate an
 interrupt as best as I have been able to determine, but the interrupt
 handler looks at it, outputs a message about the controller halted, and
 disables access to the controller.  After the azalia codec initialization
 is complete, all the USB event processes start running.  Each process has to
 complete its initial discovery task before the autoconfig stuff is
 completed, and the process for uhci0 never completes (presumably because
 access to uhci0 was disabled by the uhci interrupt handler), and the
 system hangs waiting for it to complete.

    When the azalia is disabled, the system will come up because all the USB
 event tasks complete the initial discovery.  However, when bge0
 interrupts, it also causes the uhci interrupt handler to run, which
 detects the uhci0 controller halted status.

    I've gotten around this problem by changing the uhci interrupt handler
 to not treat UHCI_STS_HCH as a valid interrupt bit, and now my system
 boots and runs normally.
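
 The difference between the two behaviours can be sketched in a small
 user-space model. This is an illustration under stated assumptions, not
 the actual change: struct uhci_model, MODEL_INTR_BITS and both handler
 functions are hypothetical names, and the status bit values follow the
 UHCI status register layout as I understand it, so they should be
 checked against the real header before being relied on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status register bits (assumed values; verify against uhcireg.h). */
#define UHCI_STS_USBINT	0x0001	/* transfer completion interrupt */
#define UHCI_STS_USBEI	0x0002	/* transfer error interrupt */
#define UHCI_STS_RD	0x0004	/* resume detect */
#define UHCI_STS_HSE	0x0008	/* host system error */
#define UHCI_STS_HCPE	0x0010	/* host controller process error */
#define UHCI_STS_HCH	0x0020	/* host controller halted */

/* Hypothetical mask: bits treated as real interrupt causes (no HCH). */
#define MODEL_INTR_BITS	(UHCI_STS_USBINT | UHCI_STS_USBEI | \
			 UHCI_STS_RD | UHCI_STS_HSE | UHCI_STS_HCPE)

/* Hypothetical stand-in for the driver's per-controller state. */
struct uhci_model {
	uint16_t sts;	/* simulated status register */
	bool dying;	/* set once access to the controller is disabled */
};

/* Behaviour as described: HCH alone is enough to claim the interrupt. */
static bool
uhci_intr_original(struct uhci_model *sc)
{
	uint16_t status = sc->sts & (MODEL_INTR_BITS | UHCI_STS_HCH);

	if (status == 0)
		return false;	/* not ours; other handlers on the line run */
	sc->sts &= (uint16_t)~(status & MODEL_INTR_BITS);	/* acknowledge */
	if (status & UHCI_STS_HCH)
		sc->dying = true;	/* "controller halted": disable access */
	return true;
}

/* Workaround as described: HCH is not treated as an interrupt cause. */
static bool
uhci_intr_workaround(struct uhci_model *sc)
{
	uint16_t status = sc->sts & MODEL_INTR_BITS;

	if (status == 0)
		return false;	/* e.g. the line was raised by azalia0 or bge0 */
	sc->sts &= (uint16_t)~status;	/* acknowledge */
	return true;
}
```

 In this model an interrupt raised by another device on the shared line,
 arriving while only UHCI_STS_HCH is set, marks the controller as dying
 under the first handler and is simply declined by the second.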

    I'm trying to understand how uhci0 (and uhci1 as well) get halted, but 
 haven't figured that out yet.

 --
 Michael L. Hitch			mhitch@montana.edu
 Computer Consultant
 Information Technology Center
 Montana State University	Bozeman, MT	USA

State-Changed-From-To: open->feedback
State-Changed-By: jmcneill@NetBSD.org
State-Changed-When: Mon, 07 Sep 2009 17:07:25 +0000
State-Changed-Why:
This should be resolved in NetBSD -current using the new hdaudio(4) driver, can
you give it a try?


From: Mark Davies <mark@ecs.vuw.ac.nz>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/35553 (azalia hangs an Optiplex 745)
Date: Thu, 7 Jan 2010 12:19:39 +1300

 On Tuesday 08 September 2009 05:07:26 you wrote:
 > This should be resolved in NetBSD -current using the new hdaudio(4)
 >  driver, can you give it a try?

 It's unlikely that hdaudio fixes this as it's actually a uhci and shared
 interrupts issue (see the description from Michael L. Hitch towards the end 
 of this PR).

 cheers
 mark

From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/35553 (azalia hangs an Optiplex 745)
Date: Wed, 6 Jan 2010 16:14:54 -0700 (MST)

 On Wed, 6 Jan 2010, Mark Davies wrote:

 > On Tuesday 08 September 2009 05:07:26 you wrote:
 > > This should be resolved in NetBSD -current using the new hdaudio(4)
 > >  driver, can you give it a try?
 >
 > It's unlikely that hdaudio fixes this as it's actually a uhci and shared
 > interrupts issue (see the description from Michael L. Hitch towards the end
 > of this PR).

    Actually, it was the uhci halting as a result of shared interrupts and 
 the azalia driver doing some autoconfig stuff with interrupts.  The 
 hdaudio driver may not do this, and might not hang on boot.

 --
 Michael L. Hitch			mhitch@montana.edu
 Computer Consultant
 Information Technology Center
 Montana State University	Bozeman, MT	USA

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Thu, 7 Jan 2010 10:58:58 +0100

 This patch also solves part of the problem described in PR 40159.

 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Thu, 22 Jul 2010 14:46:45 +0200

 (side remark: ignore my previous comment in this ticket, bogus diagnostic)

 While it would be preferable to fix the root cause and avoid disabling
 the host controller, if we do disable it, we need to signal uhub_explore
 and have it abort discovery.
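
 One possible shape for that, as a user-space sketch only: the names
 struct hc_model, explore_ports() and event_thread_first_pass() are
 hypothetical and do not reflect the real uhub_explore interface. The
 assumption is that discovery checks a "dying" flag and returns early,
 and that the pending count is dropped whether or not discovery
 succeeded.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a host controller and its boot accounting. */
struct hc_model {
	bool dying;		/* set when the controller has been disabled */
	int config_pending;	/* deferred autoconf tasks still outstanding */
	int ports_probed;	/* how many ports discovery actually visited */
};

/* Walks the root hub ports; gives up as soon as the controller is dying. */
static int
explore_ports(struct hc_model *hc, int nports)
{
	for (int port = 1; port <= nports; port++) {
		if (hc->dying)
			return -1;	/* abort discovery instead of waiting */
		hc->ports_probed++;	/* real code would probe the port here */
	}
	return 0;
}

/* First pass of the event process: always releases the pending count. */
static int
event_thread_first_pass(struct hc_model *hc, int nports)
{
	int err = explore_ports(hc, nports);

	hc->config_pending--;	/* boot may continue even after an abort */
	return err;
}
```

 With this arrangement a disabled controller costs its attached devices
 but no longer blocks the search for the root device.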

 Martin

From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Thu, 22 Jul 2010 10:42:28 -0600 (MDT)

 On Thu, 22 Jul 2010, Martin Husemann wrote:

 > While it would be preferable to fix the root cause and avoid disabling
 > the host controller, if we do disable it, we need to signal uhub_explore
 > and have it abort discovery.

    I was never able to determine where (or how) the controller was getting
 halted.

    Someone had posted a message about something very similar, where they
 were getting the controller halted, and had proposed a fix to restart
 the controller when it detected a halt.  I don't think I ever saw a 
 response to that message, and nothing more was done with it.  I was always 
 going to try to see if that change helps on the Optiplex 745, but never
 think about it when I'm updating that machine.  I don't recall at the 
 moment who posted that message, or what list it was on.  I should still 
 have it, so I'll try to dig it out.

 --
 Michael L. Hitch			mhitch@montana.edu
 Computer Consultant
 Information Technology Center
 Montana State University	Bozeman, MT	USA

From: "Michael L. Hitch" <mhitch@lightning.msu.montana.edu>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/35553: azalia hangs an Optiplex 745
Date: Thu, 22 Jul 2010 11:00:42 -0600 (MDT)

 On Thu, 22 Jul 2010, Michael L. Hitch wrote:

 >    Someone had posted a message about something very similar, where they
 > were getting the controller halted, and had proposed a fix to restart
 > the controller when it detected a halt.  I don't think I ever saw a
 > response to that message, and nothing more was done with it.  I was always
 > going to try to see if that change helps on the Optiplex 745, but never
 > think about it when I'm updating that machine.  I don't recall at the
 > moment who posted that message, or what list it was on.  I should still
 > have it, so I'll try to dig it out.

    And that would be Manuel Bouyer on tech-kern: 
 http://mail-index.netbsd.org/tech-kern/2008/06/02/msg001554.html

 --
 Michael L. Hitch			mhitch@montana.edu
 Computer Consultant
 Information Technology Center
 Montana State University	Bozeman, MT	USA

State-Changed-From-To: feedback->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 31 Jan 2011 02:32:23 +0000
State-Changed-Why:
this is really a usb controller problem, it's not fixed, and it should get
fixed.


>Unformatted:
