NetBSD Problem Report #46232

From mm_lists@pulsar-zone.net  Tue Mar 20 17:01:32 2012
Return-Path: <mm_lists@pulsar-zone.net>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id AC4C063E275
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 20 Mar 2012 17:01:31 +0000 (UTC)
Message-Id: <201203201701.q2KH1c9H022293@ginseng.pulsar-zone.net>
Date: Tue, 20 Mar 2012 13:01:37 -0400
From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@gnats.NetBSD.org
Subject: netbsd-6 pad(4) panic: locking against myself

>Number:         46232
>Category:       kern
>Synopsis:       netbsd-6 pad(4) panic: locking against myself
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    mrg
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Mar 20 17:05:01 +0000 2012
>Closed-Date:    Mon Apr 16 06:19:58 +0000 2012
>Last-Modified:  Tue Apr 17 17:05:01 +0000 2012
>Originator:     Matthew Mondor
>Release:        NetBSD 6.0_BETA
>Organization:
>Environment:
System: NetBSD ninja.xisop 6.0_BETA NetBSD 6.0_BETA (GENERIC_MM) #2: Tue Mar 20 12:14:05 EDT 2012 root@ninja.xisop:/usr/obj/sys/arch/amd64/compile/GENERIC_MM amd64
Architecture: x86_64
Machine: amd64
>Description:

I can reliably reproduce a pad(4) panic.

Mutex error: mutex_vector_enter: locking against myself
	wait/spin: 0/0

...
lockdebug_abort()
mutex_vector_enter()
audio_clear()
audio_pint()
pad_read()
spec_read()
VOP_READ()
vn_read()
dofileread()
sys_read()
syscall()

BTW, must I do something special for ddb to do VGA_POST?  The kernel
was built with support for it, but if within X11 I can't see ddb
entering.  I had crash dumps enabled because of that instead, but I
wasn't patient enough to wait for the long dump to finish...

>How-To-Repeat:

Shell 1:

$ audioplay -d /dev/sound1 foo.wav

Shall 2:

$ dd if=/dev/pad0 of=foo.pcm

>Fix:

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->mrg
Responsible-Changed-By: mrg@NetBSD.org
Responsible-Changed-When: Wed, 21 Mar 2012 00:37:25 +0000
Responsible-Changed-Why:
i'll take a look.


From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46232: netbsd-6 pad(4) panic: locking against myself
Date: Thu, 22 Mar 2012 02:34:26 -0400

 On Tue, 20 Mar 2012 17:05:01 +0000 (UTC)
 Matthew Mondor <mm_lists@pulsar-zone.net> wrote:

 When attempting to reproduce the test today in order to see if the
 ddb.commandonenter=call ddb_vgapost  sysctl worked, I initially failed
 to crash the system, with instead a locked dd process stuck in an
 endless uninterruptible loop (and unkillable even with SIGKILL), and
 stuck audio writer process.

 There were two differences from my other tests in this case:

 - Playing a mono rather than stereo sample
 - Using -d /dev/sound1 rather than
   relinking /dev/sound, /dev/audio, /dev/audioctl to sound1, audio1 and
   audioctl1, respectively.

 With the symlinks pointing to the 1 devices, and playing stereo
 content, I still could reproduce the panic, and fortunately ddb_vgapost
 works.

 Thanks,
 -- 
 Matt

From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46232: netbsd-6 pad(4) panic: locking against myself
Date: Thu, 22 Mar 2012 05:30:36 -0400

 On Tue, 20 Mar 2012 17:05:01 +0000 (UTC)
 Matthew Mondor <mm_lists@pulsar-zone.net> wrote:

 > >Synopsis:       netbsd-6 pad(4) panic: locking against myself

 I had not noticed when posting the PR, but kern/39204 might be related.
 -- 
 Matt

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46232: netbsd-6 pad(4) panic: locking against myself
Date: Mon, 26 Mar 2012 17:21:47 +1100

 this bug is most certainly not related to prior bugs or crashes in
 pad itself, but is an audiomp issue.

 looking at the code is fairly obvious.  pad_read() takes the lock
 thread lock, does a bunch of processing, and then right before it
 calls into the audio interrupt it takes the intr lock.  then we
 traverse down audio_pint()->audio_clear(), and audio clear expects
 to need to take the intr_lock here.

 there's a bunch of bad uses of audio_clear() in audio.c.


 .mrg.

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
    Matthew Mondor <mm_lists@pulsar-zone.net>
Subject: re: kern/46232: netbsd-6 pad(4) panic: locking against myself
Date: Mon, 26 Mar 2012 17:46:33 +1100

 could you please try this patch?  thanks.


 .mrg.


 Index: audio.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/audio.c,v
 retrieving revision 1.258
 diff -p -r1.258 audio.c
 *** audio.c	21 Feb 2012 20:53:34 -0000	1.258
 --- audio.c	26 Mar 2012 06:45:55 -0000
 *************** int	audio_initbufs(struct audio_softc *)
 *** 241,246 ****
 --- 241,247 ----
   void	audio_calcwater(struct audio_softc *);
   int	audio_drain(struct audio_softc *);
   void	audio_clear(struct audio_softc *);
 + void	audio_clear_intr_unlocked(struct audio_softc *sc);
   static inline void audio_pint_silence
   	(struct audio_softc *, struct audio_ringbuffer *, uint8_t *, int);

 *************** audio_clear(struct audio_softc *sc)
 *** 1992,1998 ****

   	KASSERT(mutex_owned(sc->sc_lock));

 - 	mutex_enter(sc->sc_intr_lock);
   	if (sc->sc_rbus) {
   		cv_broadcast(&sc->sc_rchan);
   		sc->hw_if->halt_input(sc->hw_hdl);
 --- 1993,1998 ----
 *************** audio_clear(struct audio_softc *sc)
 *** 2005,2010 ****
 --- 2005,2018 ----
   		sc->sc_pbus = false;
   		sc->sc_pr.pause = false;
   	}
 + }
 + 
 + void
 + audio_clear_intr_unlocked(struct audio_softc *sc)
 + {
 + 
 + 	mutex_enter(sc->sc_intr_lock);
 + 	audio_clear(sc);
   	mutex_exit(sc->sc_intr_lock);
   }

 *************** audiosetinfo(struct audio_softc *sc, str
 *** 3643,3649 ****
   	setmode = 0;
   	if (nr > 0) {
   		if (!cleared) {
 ! 			audio_clear(sc);
   			cleared = true;
   		}
   		modechange = true;
 --- 3651,3657 ----
   	setmode = 0;
   	if (nr > 0) {
   		if (!cleared) {
 ! 			audio_clear_intr_unlocked(sc);
   			cleared = true;
   		}
   		modechange = true;
 *************** audiosetinfo(struct audio_softc *sc, str
 *** 3651,3657 ****
   	}
   	if (np > 0) {
   		if (!cleared) {
 ! 			audio_clear(sc);
   			cleared = true;
   		}
   		modechange = true;
 --- 3659,3665 ----
   	}
   	if (np > 0) {
   		if (!cleared) {
 ! 			audio_clear_intr_unlocked(sc);
   			cleared = true;
   		}
   		modechange = true;
 *************** audiosetinfo(struct audio_softc *sc, str
 *** 3660,3666 ****

   	if (SPECIFIED(ai->mode)) {
   		if (!cleared) {
 ! 			audio_clear(sc);
   			cleared = true;
   		}
   		modechange = true;
 --- 3668,3674 ----

   	if (SPECIFIED(ai->mode)) {
   		if (!cleared) {
 ! 			audio_clear_intr_unlocked(sc);
   			cleared = true;
   		}
   		modechange = true;
 *************** audiosetinfo(struct audio_softc *sc, str
 *** 3754,3760 ****

   	if (SPECIFIED(p->port)) {
   		if (!cleared) {
 ! 			audio_clear(sc);
   			cleared = true;
   		}
   		error = au_set_port(sc, &sc->sc_outports, p->port);
 --- 3762,3768 ----

   	if (SPECIFIED(p->port)) {
   		if (!cleared) {
 ! 			audio_clear_intr_unlocked(sc);
   			cleared = true;
   		}
   		error = au_set_port(sc, &sc->sc_outports, p->port);
 *************** audiosetinfo(struct audio_softc *sc, str
 *** 3763,3769 ****
   	}
   	if (SPECIFIED(r->port)) {
   		if (!cleared) {
 ! 			audio_clear(sc);
   			cleared = true;
   		}
   		error = au_set_port(sc, &sc->sc_inports, r->port);
 --- 3771,3777 ----
   	}
   	if (SPECIFIED(r->port)) {
   		if (!cleared) {
 ! 			audio_clear_intr_unlocked(sc);
   			cleared = true;
   		}
   		error = au_set_port(sc, &sc->sc_inports, r->port);
 *************** audiosetinfo(struct audio_softc *sc, str
 *** 3825,3831 ****
   		/* Block size specified explicitly. */
   		if (ai->blocksize == 0) {
   			if (!cleared) {
 ! 				audio_clear(sc);
   				cleared = true;
   			}
   			sc->sc_blkset = false;
 --- 3833,3839 ----
   		/* Block size specified explicitly. */
   		if (ai->blocksize == 0) {
   			if (!cleared) {
 ! 				audio_clear_intr_unlocked(sc);
   				cleared = true;
   			}
   			sc->sc_blkset = false;
 *************** audiosetinfo(struct audio_softc *sc, str
 *** 3836,3842 ****
   			/* check whether new blocksize changes actually */
   			if (hw->round_blocksize == NULL) {
   				if (!cleared) {
 ! 					audio_clear(sc);
   					cleared = true;
   				}
   				sc->sc_pr.blksize = ai->blocksize;
 --- 3844,3850 ----
   			/* check whether new blocksize changes actually */
   			if (hw->round_blocksize == NULL) {
   				if (!cleared) {
 ! 					audio_clear_intr_unlocked(sc);
   					cleared = true;
   				}
   				sc->sc_pr.blksize = ai->blocksize;
 *************** audiosetinfo(struct audio_softc *sc, str
 *** 3849,3855 ****
   				if (pblksize != sc->sc_pr.blksize ||
   				    rblksize != sc->sc_rr.blksize) {
   					if (!cleared) {
 ! 						audio_clear(sc);
   						cleared = true;
   					}
   					sc->sc_pr.blksize = ai->blocksize;
 --- 3857,3863 ----
   				if (pblksize != sc->sc_pr.blksize ||
   				    rblksize != sc->sc_rr.blksize) {
   					if (!cleared) {
 ! 						audio_clear_intr_unlocked(sc);
   						cleared = true;
   					}
   					sc->sc_pr.blksize = ai->blocksize;

State-Changed-From-To: open->feedback
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Mon, 26 Mar 2012 07:00:56 +0000
State-Changed-Why:
patch sent for testing.


From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46232: netbsd-6 pad(4) panic: locking against myself
Date: Mon, 26 Mar 2012 13:23:21 -0400

 On Mon, 26 Mar 2012 06:50:05 +0000 (UTC)
 matthew green <mrg@eterna.com.au> wrote:

 >  could you please try this patch?  thanks.

 With that patch I didn't experience a panic so far.  However:

 Try #1: killing dd was not possible, it went in some 100% system-time
 busy loop after the recording session once I tried to kill it.  The
 resulting file only had about 1 second of data, the rest of the file
 was zeros.

 Try #2: killing dd was successful.  The file content however was
 irregular, with many large zero sections.  Thus playing back the
 results with the proper parameters, it was correct stereo output, but
 filled with zero gaps, so playing with regular cuts.

 Thanks,
 -- 
 Matt

State-Changed-From-To: feedback->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 26 Mar 2012 17:37:17 +0000
State-Changed-Why:
patch improved it, still borked though


From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
    Matthew Mondor <mm_lists@pulsar-zone.net>
Subject: re: kern/46232: netbsd-6 pad(4) panic: locking against myself
Date: Tue, 27 Mar 2012 04:47:48 +1100

 > From: Matthew Mondor <mm_lists@pulsar-zone.net>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: kern/46232: netbsd-6 pad(4) panic: locking against myself
 > Date: Mon, 26 Mar 2012 13:23:21 -0400
 > 
 >  On Mon, 26 Mar 2012 06:50:05 +0000 (UTC)
 >  matthew green <mrg@eterna.com.au> wrote:
 >  
 >  >  could you please try this patch?  thanks.
 >  
 >  With that patch I didn't experience a panic so far.  However:
 >  
 >  Try #1: killing dd was not possible, it went in some 100% system-time
 >  busy loop after the recording session once I tried to kill it.  The
 >  resulting file only had about 1 second of data, the rest of the file
 >  was zeros.
 >  
 >  Try #2: killing dd was successful.  The file content however was
 >  irregular, with many large zero sections.  Thus playing back the
 >  results with the proper parameters, it was correct stereo output, but
 >  filled with zero gaps, so playing with regular cuts.

 yeah - these both are different bugs.  one of them might be the
 same as 39204 (#2).  so there are generic pad(4) bugs but they
 weren't introduced by audiomp it seems.  :-(

 thanks for testing, but i'm not sure what the real problem is for
 either of the remaining issues.


 .mrg.

State-Changed-From-To: open->closed
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Mon, 26 Mar 2012 18:27:13 +0000
State-Changed-Why:
this problem fixed.  the other pad problems should be in other PRs.
seems that PR 39204 covers #2, but the unkillable dd is not covered.


From: "matthew green" <mrg@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/46232 CVS commit: src/sys/dev
Date: Mon, 26 Mar 2012 18:26:11 +0000

 Module Name:	src
 Committed By:	mrg
 Date:		Mon Mar 26 18:26:10 UTC 2012

 Modified Files:
 	src/sys/dev: audio.c

 Log Message:
 deal with PR#46232:  ensure locking around audio_clear() is handled
 consistently and avoid locking against self.


 To generate a diff of this commit:
 cvs rdiff -u -r1.258 -r1.259 src/sys/dev/audio.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46232 (netbsd-6 pad(4) panic: locking against myself)
Date: Mon, 26 Mar 2012 14:39:10 -0400

 On Mon, 26 Mar 2012 18:27:14 +0000 (UTC)
 mrg@NetBSD.org wrote:

 > Synopsis: netbsd-6 pad(4) panic: locking against myself
 > 
 > State-Changed-From-To: open->closed
 > State-Changed-By: mrg@NetBSD.org
 > State-Changed-When: Mon, 26 Mar 2012 18:27:13 +0000
 > State-Changed-Why:
 > this problem fixed.  the other pad problems should be in other PRs.
 > seems that PR 39204 covers #2, but the unkillable dd is not covered.

 Should this be in pending-pullups for netbsd-6 though?

 Thanks,
 -- 
 Matt

State-Changed-From-To: closed->pending-pullups
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Mon, 26 Mar 2012 19:49:52 +0000
State-Changed-Why:
right, awaiting on netbsd-6 pullups.


From: "Jeff Rizzo" <riz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/46232 CVS commit: [netbsd-6] src/sys/dev
Date: Mon, 9 Apr 2012 17:46:59 +0000

 Module Name:	src
 Committed By:	riz
 Date:		Mon Apr  9 17:46:58 UTC 2012

 Modified Files:
 	src/sys/dev [netbsd-6]: audio.c

 Log Message:
 Pull up following revision(s) (requested by mrg in ticket #164):
 	sys/dev/audio.c: revision 1.259
 deal with PR#46232:  ensure locking around audio_clear() is handled
 consistently and avoid locking against self.


 To generate a diff of this commit:
 cvs rdiff -u -r1.257.2.1 -r1.257.2.2 src/sys/dev/audio.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: PR/46232 CVS commit: [netbsd-6] src/sys/dev
Date: Sat, 14 Apr 2012 21:31:38 -0400

 On Mon,  9 Apr 2012 17:50:07 +0000 (UTC)
 "Jeff Rizzo" <riz@netbsd.org> wrote:

 >  Module Name:	src
 >  Committed By:	riz
 >  Date:		Mon Apr  9 17:46:58 UTC 2012
 >  
 >  Modified Files:
 >  	src/sys/dev [netbsd-6]: audio.c
 >  
 >  Log Message:
 >  Pull up following revision(s) (requested by mrg in ticket #164):
 >  	sys/dev/audio.c: revision 1.259
 >  deal with PR#46232:  ensure locking around audio_clear() is handled
 >  consistently and avoid locking against self.

 I confirm that this fix works.  Unfortunately pad(4) still doesn't work
 as well as it did on netbsd-5, but I agree that this is another issue.

 Thanks,
 -- 
 Matt

State-Changed-From-To: pending-pullups->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 16 Apr 2012 06:19:58 +0000
State-Changed-Why:
pullup done.
Can you make sure that the other issues get their own PRs?
(and preferably post a crossreference in here for the record too)


From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46232 (netbsd-6 pad(4) panic: locking against myself)
Date: Mon, 16 Apr 2012 18:03:42 -0400

 On Mon, 16 Apr 2012 06:20:00 +0000 (UTC)
 dholland@NetBSD.org wrote:

 > Synopsis: netbsd-6 pad(4) panic: locking against myself
 > 
 > State-Changed-From-To: pending-pullups->closed
 > State-Changed-By: dholland@NetBSD.org
 > State-Changed-When: Mon, 16 Apr 2012 06:19:58 +0000
 > State-Changed-Why:
 > pullup done.
 > Can you make sure that the other issues get their own PRs?
 > (and preferably post a crossreference in here for the record too)

 I think that kern/39204 touches this other issue.  However, its
 synopsis might be a little confusing as I do get PCM data, albeit
 damaged data...
 -- 
 Matt

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46232 (netbsd-6 pad(4) panic: locking against myself)
Date: Tue, 17 Apr 2012 17:02:47 +0000

 On Mon, Apr 16, 2012 at 10:05:04PM +0000, Matthew Mondor wrote:
  >  > Can you make sure that the other issues get their own PRs?
  >  > (and preferably post a crossreference in here for the record too)
  >  
  >  I think that kern/39204 touches this other issue.  However, its
  >  synopsis might be a little confusing as I do get PCM data, albeit
  >  damaged data...

 Thanks.

 (synopses can be edited)

 -- 
 David A. Holland
 dholland@netbsd.org

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.