NetBSD Problem Report #45647

From martin@duskware.de  Thu Nov 24 03:39:00 2011
Return-Path: <martin@duskware.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 30D5D63C3D5
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 24 Nov 2011 03:39:00 +0000 (UTC)
Message-Id: <20111124033900.30D5D63C3D5@www.NetBSD.org>
Date: Thu, 24 Nov 2011 03:39:00 +0000 (UTC)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@gnats.NetBSD.org
Subject: kill from ddb broken
X-Send-Pr-Version: 3.95

>Number:         45647
>Category:       kern
>Synopsis:       kill from ddb broken
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Nov 24 03:40:00 +0000 2011
>Last-Modified:  Fri Dec 02 10:25:01 +0000 2011
>Originator:     Martin Husemann
>Release:        NetBSD 5.99.56
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD night-owl.duskware.de 5.99.56 NetBSD 5.99.56 (NIGHT-OWL) #46: Thu Nov 10 22:27:20 CET 2011 martin@night-owl.duskware.de:/usr/src/sys/arch/amd64/compile/NIGHT-OWL amd64
Architecture: x86_64
Machine: amd64
>Description:

Trying to kill a process from ddb fails:

db{1}> kill 0t507
Skipping crash dump on recursive panic
panic: kernel diagnostic assertion "mutex_owned(proc_lock)" failed: file "../../../../kern/kern_proc.c", line 590 


>How-To-Repeat:

Enter ddb and try to kill any running process with the kill command.

>Fix:
n/a

>Audit-Trail:
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/45647: kill from ddb broken
Date: Thu, 24 Nov 2011 18:07:55 +0100

 On Thu, Nov 24, 2011 at 03:40:00AM +0000, martin@NetBSD.org wrote:
 > [...]
 > Trying to kill a process from ddb fails:
 > 
 > db{1}> kill 0t507
 > Skipping crash dump on recursive panic
 > panic: kernel diagnostic assertion "mutex_owned(proc_lock)" failed: file "../../../../kern/kern_proc.c", line 590 

 FWIW that's not new, it's also there in netbsd-5.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: kern/45647: kill from ddb broken
Date: Thu, 01 Dec 2011 15:42:50 +1100

 > Trying to kill a process from ddb fails:
 > 
 > db{1}> kill 0t507
 > Skipping crash dump on recursive panic
 > panic: kernel diagnostic assertion "mutex_owned(proc_lock)" failed: file "../../../../kern/kern_proc.c", line 590 

 this is non trivial to solve.

 simply taking proc_lock (and more!) isn't going to work if some cpu
 was stopped mid-lock of these.

 perhaps a solution would be to have ddb create a thread that will go
 away that will send the signal asap...


 .mrg.
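
The hand-off idea above (let the debugger only record the request, and let a short-lived thread deliver the signal once normal scheduling resumes) can be sketched in user space. This is a rough model under stated assumptions, not NetBSD kernel code: pthreads and C11 atomics stand in for kernel primitives, and every name here (model_target, model_kill_request, model_ddb_request_kill, model_kill_helper, MODEL_SIGKILL) is hypothetical. Who creates the helper thread, and when, is left out of the model.

```c
/* User-space model only; not NetBSD kernel code. All names are hypothetical. */
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_SIGKILL 9

struct model_target {
	int pid;
	int pending_signal;	/* 0 means nothing has been posted */
	pthread_mutex_t lock;	/* models the locks that signal delivery needs */
};

/* Single-slot mailbox; static storage, so it starts out as NULL (empty). */
static _Atomic(struct model_target *) model_kill_request;

/*
 * Debugger side: record the request without taking any lock.
 * Returns 0 if the request was queued, -1 if the slot is already occupied.
 */
static int
model_ddb_request_kill(struct model_target *t)
{
	struct model_target *expected = NULL;

	return atomic_compare_exchange_strong(&model_kill_request,
	    &expected, t) ? 0 : -1;
}

/*
 * Helper thread body: runs outside the debugger, so blocking on a lock
 * is acceptable here. It empties the slot, delivers, and then exits.
 */
static void *
model_kill_helper(void *arg)
{
	struct model_target *t;

	(void)arg;
	t = atomic_exchange(&model_kill_request, NULL);
	if (t != NULL) {
		pthread_mutex_lock(&t->lock);
		t->pending_signal = MODEL_SIGKILL;
		pthread_mutex_unlock(&t->lock);
	}
	return NULL;
}
```

In this model the debugger side never waits: it either claims the empty slot or reports that a request is already pending. All lock acquisition happens in the helper, which is the property the suggestion relies on.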

From: Martin Husemann <martin@duskware.de>
To: matthew green <mrg@eterna.com.au>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/45647: kill from ddb broken
Date: Thu, 1 Dec 2011 07:18:35 +0100

 On Thu, Dec 01, 2011 at 03:42:50PM +1100, matthew green wrote:
 > perhaps a solution would be to have ddb create a thread that will go
 > away that will send the signal asap...

 I'm not sure that is worth the trouble - in the case at hand it turned out
 to be stuck in kernel and would have been unkillable anyway.
 I'm also fine with just removing the command, or making it work in "the easy
 case" (if that is detectable) but fail gracefully otherwise.

 Martin

From: Jared McNeill <jmcneill@invisible.ca>
To: matthew green <mrg@eterna.com.au>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org, 
    netbsd-bugs@netbsd.org
Subject: re: kern/45647: kill from ddb broken
Date: Thu, 1 Dec 2011 07:18:43 -0500 (EST)

 -----BEGIN PGP SIGNED MESSAGE-----
 Hash: SHA1


 You could probably just mutex_tryenter(&proc_lock) and print a message if 
 it fails.

 On Thu, 1 Dec 2011, matthew green wrote:

 >
 >> Trying to kill a process from ddb fails:
 >>
 >> db{1}> kill 0t507
 >> Skipping crash dump on recursive panic
 >> panic: kernel diagnostic assertion "mutex_owned(proc_lock)" failed: file "../../../../kern/kern_proc.c", line 590
 >
 > this is non trivial to solve.
 >
 > simply taking proc_lock (and more!) isn't going to work if some cpu
 > was stopped mid-lock of these.
 >
 > perhaps a solution would be to have ddb create a thread that will go
 > away that will send the signal asap...
 >
 >
 > .mrg.
 >
 >
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (NetBSD)

 iQEcBAEBAgAGBQJO13CjAAoJEKdMfxFXhnem9X4H/32c6aj8urcvgjuUmVDzgWZN
 WtmhiUafeJqlQ6aoeQpxJrj17tYrTWO5vtQ8/12/2h47kzc1hC3uubGG+RDOY6ci
 kyaaBmJKiZ6tQ74BtOOvs/4tYfqHFq5rS1bQLsF4t+Z/UPUpqvT8+Zgn6iVPVw0c
 1dMXGlugVDJ5kOGtjkBxe1npY7AVmGQmR7ayv8A30ID6Dx3BUGyzX/pMXtkboAGQ
 mEV7tG3AcpvQSTspe6zIzAqXnua3pZxjdb4/B1YOqzr8Z+F+Y0Sd5ZO5SFTwFDSa
 cM2vtOk/LSvTCwmA+ZSEY6WVzhFSabO9M4wSMtjTXdLWzkX7bQwIJWHDkuoKTE4=
 =pnQa
 -----END PGP SIGNATURE-----

From: matthew green <mrg@eterna.com.au>
To: Jared McNeill <jmcneill@invisible.ca>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
    gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: re: kern/45647: kill from ddb broken
Date: Fri, 02 Dec 2011 12:07:27 +1100

 > You could probably just mutex_tryenter(&proc_lock) and print a message if 
 > it fails.

 yeah, and you also need to take p->p_lock ... and i didn't look
 to see if anything else was necessary.

 i'll try that.


 .mrg.
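
The try-lock approach discussed above (attempt both locks without blocking, and print a message instead of panicking if either is busy) can be sketched as follows. This is a user-space model under stated assumptions, not a patch: pthread_mutex_trylock stands in for mutex_tryenter, and model_proc, model_proc_lock, and model_ddb_kill are hypothetical names. It only models the two locks named in the thread, and it does not show that a successful try-lock is sufficient inside a stopped kernel.

```c
/*
 * User-space model only: pthread mutexes stand in for the kernel's
 * proc_lock and p->p_lock. None of this is NetBSD kernel code.
 */
#include <pthread.h>
#include <stdio.h>

struct model_proc {			/* hypothetical stand-in for struct proc */
	int pid;
	int pending_signal;		/* 0 means no signal has been posted */
	pthread_mutex_t p_lock;		/* models p->p_lock */
};

/* Models the global proc_lock. */
static pthread_mutex_t model_proc_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Try to post a signal without ever blocking.
 * Returns 0 on success, -1 if either lock is busy.
 */
static int
model_ddb_kill(struct model_proc *p, int signo)
{
	if (pthread_mutex_trylock(&model_proc_lock) != 0) {
		printf("kill: proc_lock is busy, not sending signal\n");
		return -1;
	}
	if (pthread_mutex_trylock(&p->p_lock) != 0) {
		printf("kill: p_lock of pid %d is busy, not sending signal\n",
		    p->pid);
		/* Back out the lock already taken before giving up. */
		pthread_mutex_unlock(&model_proc_lock);
		return -1;
	}

	/* A real kernel would call its signal-delivery routine here. */
	p->pending_signal = signo;

	/* Release in the reverse order of acquisition. */
	pthread_mutex_unlock(&p->p_lock);
	pthread_mutex_unlock(&model_proc_lock);
	return 0;
}
```

The point of the sketch is the failure path: when a lock is held, the command reports the problem and returns, leaving the target untouched, rather than tripping an assertion.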

From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: jmcneill@invisible.ca
Cc: mrg@eterna.com.au, gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: re: kern/45647: kill from ddb broken
Date: Fri,  2 Dec 2011 03:27:23 +0000 (UTC)

 > You could probably just mutex_tryenter(&proc_lock) and print a message if 
 > it fails.

 i don't think tryenter is expected to handle cases like this.
 and of course, the mutex is not the only problem.

 IMO we should just close this PR.
 features like this have always been "works only if you are lucky enough".

 YAMAMOTO Takashi

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/45647: kill from ddb broken
Date: Fri, 2 Dec 2011 10:16:42 +0100

 On Fri, Dec 02, 2011 at 03:30:04AM +0000, YAMAMOTO Takashi wrote:
 >  IMO we should just close this PR.
 >  features like this have always been "works only if you are lucky enough".

 Fine, but this feature never ever works, so can we remove it?

 Martin

From: matthew green <mrg@eterna.com.au>
To: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
    gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
    jmcneill@invisible.ca
Subject: re: kern/45647: kill from ddb broken
Date: Fri, 02 Dec 2011 21:21:58 +1100

 > > You could probably just mutex_tryenter(&proc_lock) and print a message if 
 > > it fails.
 > 
 > i don't think tryenter is expected to handle cases like this.
 > and of course, the mutex is not the only problem.
 > 
 > IMO we should just close this PR.
 > features like this have always been "works only if you are lucky enough".

 they never work for DIAG kernels now.  it's pretty annoying.  a
 "might work sometimes" fix is much better than simply failing.


 .mrg.

