NetBSD Problem Report #54802

From martin@aprisoft.de  Fri Dec 27 10:55:18 2019
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id EC6847A167
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 27 Dec 2019 10:55:17 +0000 (UTC)
Message-Id: <20191227105508.0D3C95CC7A2@emmas.aprisoft.de>
Date: Fri, 27 Dec 2019 11:55:08 +0100 (CET)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: gdb -p kills the kernel
X-Send-Pr-Version: 3.95

>Number:         54802
>Category:       kern
>Synopsis:       gdb -p kills the kernel
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kamil
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Dec 27 11:00:00 +0000 2019
>Closed-Date:    Fri Jan 03 02:36:17 +0000 2020
>Last-Modified:  Fri Jan 03 02:36:17 +0000 2020
>Originator:     Martin Husemann
>Release:        NetBSD 9.99.30
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD seven-days-to-the-wolves.aprisoft.de 9.99.30 NetBSD 9.99.30 (GENERIC) #326: Fri Dec 27 08:14:06 CET 2019 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:

Trying to debug a qemu hang (PR 54789) I attached gdb to the (only) qemu 
process running via gdb -p $pid.

This made the kernel crash immediately, but unfortunately the panic is not
100% reproducible.

Unfortunately there is no crash dump: ddb locked up after I tried to continue.

Panic message (from memory): one of the two KASSERT(l->l_stat != LSZOMB);
checks in kern_lwp.c failed (in lwp_addref or lwp_delref), with a backtrace
through various ptrace_* functions.

>How-To-Repeat:

Download a recentish sparc64 iso and then:

 qemu-system-sparc64 -prom-env 'auto-boot?=false' -m 128 \
        -drive file=wd1.img,if=ide,index=0,media=disk,snapshot=off,format=raw \
        -drive file=wd0.img,if=ide,index=1,media=disk,snapshot=off,format=raw \
        -cdrom $ISO -nographic
 boot cdrom:f

In another window, do:

  gdb -p $( pgrep qemu )

and be unlucky.


>Fix:
n/a

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->kamil
Responsible-Changed-By: kamil@NetBSD.org
Responsible-Changed-When: Fri, 27 Dec 2019 13:13:24 +0100
Responsible-Changed-Why:
I have got a reproducer. It is a bug in PT_LWPSTATUS/NEXT when iterating over LWPs.
For some reason a zombie LWP is picked and the kernel panics on addref().
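
The failure mode described above can be sketched with a small userland
model. This is not the kernel code and not the committed fix (neither is
shown in this report); struct model_lwp, model_addref() and
next_live_lwp() are hypothetical names, and the array is assumed to be
sorted by LWP id. It only illustrates why a "next LWP" iteration has to
skip zombies before taking a reference:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland model of an LWP; not the kernel's struct lwp. */
enum model_stat { MODEL_RUNNING, MODEL_SLEEPING, MODEL_ZOMBIE };

struct model_lwp {
	int lid;		/* LWP id, assumed ascending in the array */
	enum model_stat stat;	/* current state */
	int refcnt;		/* references held by the "debugger" */
};

/* Mirrors the idea of the KASSERT in the report: never reference a zombie. */
static void
model_addref(struct model_lwp *l)
{
	assert(l->stat != MODEL_ZOMBIE);
	l->refcnt++;
}

/*
 * Return the first non-zombie LWP whose id is greater than prev_lid,
 * with a reference taken, or NULL if none is left.
 * Zombies are skipped before model_addref() is called, so the
 * assertion above cannot fire from this iteration.
 */
static struct model_lwp *
next_live_lwp(struct model_lwp *lwps, size_t n, int prev_lid)
{
	for (size_t i = 0; i < n; i++) {
		if (lwps[i].lid <= prev_lid)
			continue;	/* already visited */
		if (lwps[i].stat == MODEL_ZOMBIE)
			continue;	/* exiting LWP, must not be referenced */
		model_addref(&lwps[i]);
		return &lwps[i];
	}
	return NULL;
}
```

Without the zombie check in the loop, the model would call
model_addref() on an exiting LWP and trip the assertion, which matches
the panic described in this PR.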


State-Changed-From-To: open->closed
State-Changed-By: kamil@NetBSD.org
State-Changed-When: Fri, 03 Jan 2020 03:36:17 +0100
State-Changed-Why:
Fixed.


>Unformatted:
