NetBSD Problem Report #54922

From kardel@kardel.name  Sun Feb  2 13:30:49 2020
Return-Path: <kardel@kardel.name>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id EE5101A9213
	for <gnats-bugs@gnats.NetBSD.org>; Sun,  2 Feb 2020 13:30:48 +0000 (UTC)
Message-Id: <20200202133044.1FECCDA0D92@pip.kardel.name>
Date: Sun,  2 Feb 2020 14:30:44 +0100 (CET)
From: kardel@netbsd.org
Reply-To: kardel@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: 9.99.45@20200202 panic: diagnostic assertion linux ldconfig triggers vpp != NULL in exit1()->radixtree.c line 674
X-Send-Pr-Version: 3.95

>Number:         54922
>Category:       kern
>Synopsis:       linux ldconfig triggers vpp != NULL in exit1()->radixtree.c line 674
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    ad
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Feb 02 16:38:23 +0000 2020
>Closed-Date:    Fri Feb 21 09:04:30 +0000 2020
>Last-Modified:  Fri Feb 21 09:04:30 +0000 2020
>Originator:     Frank Kardel
>Release:        NetBSD 9.99.45
>Organization:

>Environment:


System: NetBSD pip 9.99.45 NetBSD 9.99.45 (PIPGEN) #1: Sun Feb 2 10:30:02 CET 2020 kardel@pip:/src/NetBSD/act/src/obj.amd64/sys/arch/amd64/compile/PIPGEN amd64
Architecture: x86_64
Machine: amd64
>Description:
	When bulk building pkgsrc from 20200202 on -current@20200202, a Linux ldconfig
	binary trips the vpp != NULL assertion at radixtree.c line 674
	(radix_tree_remove_node(), reached via sys_exit()/exit1()).
Stack (hand copied, as usual; ddb> sync trips over "locking against myself", probably kernel_lock):
radix_tree_remove_node()
exit1()
sys_exit()
linux_syscall()
Stopped in pid 4056.4056 (ldconfig) at netbsd:breakpoint() ...

The LID of 4056 looks suspicious to me unless this is the new LID logic, but that would violate
a rustc assumption if I scanned the commit mails correctly.
>How-To-Repeat:
	Bulk build pkgsrc on a kernel from 20200202.
>Fix:
	?

>Release-Note:

>Audit-Trail:
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54922: 9.99.45@20200202 panic: diagnostic assertion linux
 ldconfig triggers vpp != NULL in exit1()->radixtree.c line 674
Date: Tue, 4 Feb 2020 23:36:14 +0000

 It looks like I missed two cases where the LID is being changed on the fly:
 in linux_e_proc_exec() and linux_e_proc_fork().  Wouldn't think compat code
 would be doing something like that, but there it is.  Ugh.

 Andrew

Responsible-Changed-From-To: kern-bug-people->ad
Responsible-Changed-By: ad@NetBSD.org
Responsible-Changed-When: Wed, 05 Feb 2020 00:04:24 +0000
Responsible-Changed-Why:
Will take a look.
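
The diagnosis above (an LWP's LID rewritten on the fly while the per-process
LID tree still holds it under the old key) can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: the toy_*
names, the fixed-size array standing in for the radix tree, and TOY_MAXKEY are
all hypothetical, and only the "lid - 1" keying is taken from the patch posted
later in this PR. The failed removal at the end corresponds to the situation
that the vpp != NULL assertion appears to catch in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAXKEY 64	/* toy capacity, not a kernel limit */

struct toy_lwp {
	int l_lid;
};

struct toy_proc {
	int p_pid;
	/* Stand-in for the per-process LID tree; slot index is lid - 1. */
	struct toy_lwp *p_lwptree[TOY_MAXKEY];
};

/* Store l under key; fails if the key is out of range or already taken. */
static int
toy_tree_insert(struct toy_proc *p, uint64_t key, struct toy_lwp *l)
{
	if (key >= TOY_MAXKEY || p->p_lwptree[key] != NULL)
		return -1;
	p->p_lwptree[key] = l;
	return 0;
}

/* Remove and return the entry under key, or NULL if there is none. */
static struct toy_lwp *
toy_tree_remove(struct toy_proc *p, uint64_t key)
{
	struct toy_lwp *l;

	if (key >= TOY_MAXKEY)
		return NULL;
	l = p->p_lwptree[key];
	p->p_lwptree[key] = NULL;
	return l;
}

/*
 * Register an LWP under LID 1, overwrite l_lid with the PID without
 * touching the tree, then remove by the current LID as an exit path
 * would.  Returns 1 if the removal finds nothing (the stale-key case),
 * 0 if it finds an entry, -1 if the setup itself failed.
 */
static int
toy_stale_key_demo(void)
{
	struct toy_proc p = { .p_pid = 40 };
	struct toy_lwp l = { .l_lid = 1 };

	if (toy_tree_insert(&p, (uint64_t)(l.l_lid - 1), &l) != 0)
		return -1;
	l.l_lid = p.p_pid;	/* LID changed, tree not re-keyed */
	return toy_tree_remove(&p, (uint64_t)(l.l_lid - 1)) == NULL;
}
```

Calling toy_stale_key_demo() reports the stale-key case: the entry still sits
under the old key, so a removal keyed on the new LID comes back empty.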


From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, kardel@netbsd.org
Subject: Re: kern/54922: 9.99.45@20200202 panic: diagnostic assertion linux
 ldconfig triggers vpp != NULL in exit1()->radixtree.c line 674
Date: Wed, 5 Feb 2020 01:04:57 +0100

 --gKMricLos+KVdGMg
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 On Tue, Feb 04, 2020 at 11:40:01PM +0000, Andrew Doran wrote:
 >  It looks like I missed two cases where the LID is being changed on the fly:
 >  in linux_e_proc_exec() and linux_e_proc_fork().  Wouldn't think compat code
 >  would be doing something like that, but there it is.  Ugh.

 I've run into this as well. The attached patch seems to work.

 Joerg

 --gKMricLos+KVdGMg
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="linux-radix-hack.diff"

 diff -r 7bd8a382b8db sys/compat/linux/common/linux_exec.c
 --- a/sys/compat/linux/common/linux_exec.c	Thu Jan 30 21:39:28 2020 +0100
 +++ b/sys/compat/linux/common/linux_exec.c	Wed Feb 05 01:04:22 2020 +0100
 @@ -130,6 +130,10 @@
  	KASSERT(p->p_nlwps == 1);
  	l = LIST_FIRST(&p->p_lwps);
  	mutex_enter(p->p_lock);
 +	if (l->l_lid != p->p_pid) {
 +		radix_tree_remove_node(&p->p_lwptree, (uint64_t)(l->l_lid - 1));
 +		radix_tree_insert_node(&p->p_lwptree, (uint64_t)(p->p_pid - 1), l);
 +	}
  	l->l_lid = p->p_pid;
  	mutex_exit(p->p_lock);
  }
 @@ -152,6 +156,10 @@

  	KASSERT(p2->p_nlwps == 1);
  	l2 = LIST_FIRST(&p2->p_lwps);
 +	if (l2->l_lid != p2->p_pid) {
 +		radix_tree_remove_node(&p2->p_lwptree, (uint64_t)(l2->l_lid - 1));
 +		radix_tree_insert_node(&p2->p_lwptree, (uint64_t)(p2->p_pid - 1), l2);
 +	}
  	l2->l_lid = p2->p_pid;
  	led1 = l1->l_emuldata;
  	led2 = l2->l_emuldata;

 --gKMricLos+KVdGMg--

From: Andrew Doran <ad@netbsd.org>
To: Joerg Sonnenberger <joerg@bec.de>
Cc: gnats-bugs@netbsd.org, kardel@netbsd.org
Subject: Re: kern/54922: 9.99.45@20200202 panic: diagnostic assertion linux
 ldconfig triggers vpp != NULL in exit1()->radixtree.c line 674
Date: Sat, 8 Feb 2020 21:09:18 +0000

 Thank you Joerg.  Here is a more complete patch with locking etc., largely
 cut'n'paste.  I can't commit it right now; if someone else wants to, please
 do so.

 	http://www.netbsd.org/~ad/2020/linux.diff

 Andrew


From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54922 CVS commit: src/sys
Date: Sat, 15 Feb 2020 17:13:55 +0000

 Module Name:	src
 Committed By:	ad
 Date:		Sat Feb 15 17:13:55 UTC 2020

 Modified Files:
 	src/sys/compat/linux/common: linux_exec.c
 	src/sys/kern: kern_exec.c kern_lwp.c
 	src/sys/sys: lwp.h

 Log Message:
 PR kern/54922: 9.99.45@20200202 panic: diagnostic assertion linux ldconfig triggers vpp != NULL in exit1()->radixtree.c line 674

 Create an lwp_renumber() from the code in emulexec() and use in
 linux_e_proc_exec() and linux_e_proc_fork() too.


 To generate a diff of this commit:
 cvs rdiff -u -r1.120 -r1.121 src/sys/compat/linux/common/linux_exec.c
 cvs rdiff -u -r1.491 -r1.492 src/sys/kern/kern_exec.c
 cvs rdiff -u -r1.225 -r1.226 src/sys/kern/kern_lwp.c
 cvs rdiff -u -r1.200 -r1.201 src/sys/sys/lwp.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
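
The commit message describes factoring the renumbering into a single
lwp_renumber() helper. The committed code is not reproduced in this PR, so the
following is only a user-space sketch of the idea, assuming a toy array in
place of the radix tree. The sk_* names and SK_MAXKEY are hypothetical; the
"lid - 1" keying follows the patch posted earlier in this PR. Locking (the
kernel code holds p->p_lock around such updates) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAXKEY 64	/* toy capacity, not a kernel limit */

struct sk_lwp {
	int l_lid;
};

struct sk_proc {
	int p_pid;
	/* Stand-in for the per-process LID tree; slot index is lid - 1. */
	struct sk_lwp *p_lwptree[SK_MAXKEY];
};

/* Return the LWP registered under lid, or NULL if there is none. */
static struct sk_lwp *
sk_lookup(const struct sk_proc *p, int lid)
{
	if (lid < 1 || lid > SK_MAXKEY)
		return NULL;
	return p->p_lwptree[lid - 1];
}

/*
 * Change l's LID and re-key the lookup structure in one step, so the
 * two can never disagree.  Returns 0 on success, -1 if either LID is
 * out of range, l is not registered under its current LID, or the new
 * LID is already in use.  Nothing is modified on failure.
 */
static int
sk_renumber(struct sk_proc *p, struct sk_lwp *l, int new_lid)
{
	if (new_lid < 1 || new_lid > SK_MAXKEY)
		return -1;
	if (sk_lookup(p, l->l_lid) != l)
		return -1;
	if (l->l_lid == new_lid)
		return 0;
	if (sk_lookup(p, new_lid) != NULL)
		return -1;

	p->p_lwptree[new_lid - 1] = l;
	p->p_lwptree[l->l_lid - 1] = NULL;
	l->l_lid = new_lid;
	return 0;
}

/*
 * Renumber the only LWP of a process to the PID, as the Linux
 * emulation hooks do, and check that lookups agree afterwards.
 * Returns 0 if everything is consistent, -1 otherwise.
 */
static int
sk_renumber_demo(void)
{
	struct sk_proc p = { .p_pid = 40 };
	struct sk_lwp l = { .l_lid = 1 };

	p.p_lwptree[0] = &l;
	if (sk_renumber(&p, &l, p.p_pid) != 0)
		return -1;
	if (sk_lookup(&p, p.p_pid) != &l || sk_lookup(&p, 1) != NULL)
		return -1;
	return 0;
}
```

Keeping the field update and the re-keying inside one function is the point of
the design: callers such as the exec and fork hooks cannot update one without
the other.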

State-Changed-From-To: open->feedback
State-Changed-By: ad@NetBSD.org
State-Changed-When: Sat, 15 Feb 2020 17:32:53 +0000
State-Changed-Why:
I think this should be fixed now.


From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54922 (linux ldconfig triggers vpp != NULL in
 exit1()->radixtree.c line 674)
Date: Fri, 21 Feb 2020 09:40:52 +0100

 It is ok now.

 Thanks!



State-Changed-From-To: feedback->closed
State-Changed-By: wiz@NetBSD.org
State-Changed-When: Fri, 21 Feb 2020 09:04:30 +0000
State-Changed-Why:
Confirmed fixed, thanks!


>Unformatted:

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.