NetBSD Problem Report #54922
From kardel@kardel.name Sun Feb 2 13:30:49 2020
Return-Path: <kardel@kardel.name>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id EE5101A9213
for <gnats-bugs@gnats.NetBSD.org>; Sun, 2 Feb 2020 13:30:48 +0000 (UTC)
Message-Id: <20200202133044.1FECCDA0D92@pip.kardel.name>
Date: Sun, 2 Feb 2020 14:30:44 +0100 (CET)
From: kardel@netbsd.org
Reply-To: kardel@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: 9.99.45@20200202 panic: diagnostic assertion linux ldconfig triggers vpp != NULL in exit1()->radixtree.c line 674
X-Send-Pr-Version: 3.95
>Number: 54922
>Category: kern
>Synopsis: linux ldconfig triggers vpp != NULL in exit1()->radixtree.c line 674
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: ad
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Feb 02 16:38:23 +0000 2020
>Closed-Date: Fri Feb 21 09:04:30 +0000 2020
>Last-Modified: Fri Feb 21 09:04:30 +0000 2020
>Originator: Frank Kardel
>Release: NetBSD 9.99.45
>Organization:
>Environment:
System: NetBSD pip 9.99.45 NetBSD 9.99.45 (PIPGEN) #1: Sun Feb 2 10:30:02 CET 2020 kardel@pip:/src/NetBSD/act/src/obj.amd64/sys/arch/amd64/compile/PIPGEN amd64
Architecture: x86_64
Machine: amd64
>Description:
When bulk building pkgsrc from 20200202 on -current@20200202, the Linux binary ldconfig
triggers the vpp != NULL assertion in sys_exit()/exit1()/radix_tree_remove_node() (radixtree.c line 674).
Stack (hand copied because, as usual, ddb> sync trips over "locking against myself", probably kernel_lock):
radix_tree_remove_node()
exit1()
sys_exit()
linux_syscall()
Stopped in pid 4056.4056 (ldconfig) at netbsd:breakpoint() ...
The LID of 4056 looks suspicious to me unless this is the new LID logic, but that would violate
a rustc assumption if I scanned the commit mails correctly.
>How-To-Repeat:
Bulk build pkgsrc on a kernel from 20200202.
>Fix:
?
>Release-Note:
>Audit-Trail:
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/54922: 9.99.45@20200202 panic: diagnostic assertion linux
ldconfig triggers vpp != NULL in exit1()->radixtree.c line 674
Date: Tue, 4 Feb 2020 23:36:14 +0000
It looks like I missed two cases where the LID is being changed on the fly:
in linux_e_proc_exec() and linux_e_proc_fork(). Wouldn't think compat code
would be doing something like that, but there it is. Ugh.
Andrew
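To illustrate the failure mode described above, here is a minimal standalone sketch in C. It is not NetBSD kernel code: the toy_* names and the fixed-size array standing in for the per-process LWP tree are assumptions made for illustration, and the lid - 1 keying is borrowed from the patch posted later in this PR. It shows that changing the LID field without re-indexing leaves the entry under the old key, so a later removal by the new LID finds nothing, which is presumably the condition the vpp != NULL assertion catches.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXLWP 64	/* hypothetical capacity of the toy index */

struct toy_lwp {
	int lid;			/* LWP ID, 1-based */
};

struct toy_proc {
	int pid;
	struct toy_lwp *tree[TOY_MAXLWP];	/* stand-in for the LWP tree */
};

/* Index an LWP under key lid - 1. */
static void
toy_insert(struct toy_proc *p, struct toy_lwp *l)
{
	p->tree[l->lid - 1] = l;
}

/* Remove by LID; returns the removed entry, or NULL if the slot was empty. */
static struct toy_lwp *
toy_remove(struct toy_proc *p, int lid)
{
	struct toy_lwp *l = p->tree[lid - 1];

	p->tree[lid - 1] = NULL;
	return l;
}

/* The problematic pattern: the field changes, the index does not. */
static void
toy_renumber_buggy(struct toy_lwp *l, int newlid)
{
	l->lid = newlid;
}

/* Exit path: returns 1 if the LWP was found under its current LID, else 0. */
static int
toy_exit_ok(struct toy_proc *p, struct toy_lwp *l)
{
	return toy_remove(p, l->lid) != NULL;
}
```

With an LWP inserted under LID 1 and then renumbered to the PID via toy_renumber_buggy(), toy_exit_ok() returns 0, because the entry still sits under the old key.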
Responsible-Changed-From-To: kern-bug-people->ad
Responsible-Changed-By: ad@NetBSD.org
Responsible-Changed-When: Wed, 05 Feb 2020 00:04:24 +0000
Responsible-Changed-Why:
Will take a look.
From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, kardel@netbsd.org
Subject: Re: kern/54922: 9.99.45@20200202 panic: diagnostic assertion linux
ldconfig triggers vpp != NULL in exit1()->radixtree.c line 674
Date: Wed, 5 Feb 2020 01:04:57 +0100
On Tue, Feb 04, 2020 at 11:40:01PM +0000, Andrew Doran wrote:
> It looks like I missed two cases where the LID is being changed on the fly:
> in linux_e_proc_exec() and linux_e_proc_fork(). Wouldn't think compat code
> would be doing something like that, but there it is. Ugh.
I've run into this as well. The attached patch seems to work.
Joerg
Content-Disposition: attachment; filename="linux-radix-hack.diff"
diff -r 7bd8a382b8db sys/compat/linux/common/linux_exec.c
--- a/sys/compat/linux/common/linux_exec.c Thu Jan 30 21:39:28 2020 +0100
+++ b/sys/compat/linux/common/linux_exec.c Wed Feb 05 01:04:22 2020 +0100
@@ -130,6 +130,10 @@
 	KASSERT(p->p_nlwps == 1);
 	l = LIST_FIRST(&p->p_lwps);
 	mutex_enter(p->p_lock);
+	if (l->l_lid != p->p_pid) {
+		radix_tree_remove_node(&p->p_lwptree, (uint64_t)(l->l_lid - 1));
+		radix_tree_insert_node(&p->p_lwptree, (uint64_t)(p->p_pid - 1), l);
+	}
 	l->l_lid = p->p_pid;
 	mutex_exit(p->p_lock);
 }
@@ -152,6 +156,10 @@
 
 	KASSERT(p2->p_nlwps == 1);
 	l2 = LIST_FIRST(&p2->p_lwps);
+	if (l2->l_lid != p2->p_pid) {
+		radix_tree_remove_node(&p2->p_lwptree, (uint64_t)(l2->l_lid - 1));
+		radix_tree_insert_node(&p2->p_lwptree, (uint64_t)(p2->p_pid - 1), l2);
+	}
 	l2->l_lid = p2->p_pid;
 	led1 = l1->l_emuldata;
 	led2 = l2->l_emuldata;
From: Andrew Doran <ad@netbsd.org>
To: Joerg Sonnenberger <joerg@bec.de>
Cc: gnats-bugs@netbsd.org, kardel@netbsd.org
Subject: Re: kern/54922: 9.99.45@20200202 panic: diagnostic assertion linux
ldconfig triggers vpp != NULL in exit1()->radixtree.c line 674
Date: Sat, 8 Feb 2020 21:09:18 +0000
Thank you Joerg. Here is a more complete patch with locking etc., largely
cut'n'paste. I can't commit it right now; if someone else wants to, please
do so.
http://www.netbsd.org/~ad/2020/linux.diff
Andrew
From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/54922 CVS commit: src/sys
Date: Sat, 15 Feb 2020 17:13:55 +0000
Module Name: src
Committed By: ad
Date: Sat Feb 15 17:13:55 UTC 2020
Modified Files:
src/sys/compat/linux/common: linux_exec.c
src/sys/kern: kern_exec.c kern_lwp.c
src/sys/sys: lwp.h
Log Message:
PR kern/54922: 9.99.45@20200202 panic: diagnostic assertion linux ldconfig triggers vpp != NULL in exit1()->radixtree.c line 674
Create an lwp_renumber() from the code in emulexec() and use in
linux_e_proc_exec() and linux_e_proc_fork() too.
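The idea of a single renumbering helper can be sketched as follows. This is a hedged illustration only, not the committed lwp_renumber(): the demo_* names, the array-backed index, the return codes, and the absence of locking are all assumptions. The point it shows is that the index update and the LID field update happen in one place, so callers cannot change one without the other.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SLOTS 64	/* hypothetical capacity of the demo index */

struct demo_lwp {
	int lid;			/* LWP ID, 1-based */
};

struct demo_proc {
	struct demo_lwp *slots[DEMO_SLOTS];	/* slot lid - 1 holds the LWP */
};

/*
 * Move l from its current LID to newlid, keeping index and field in sync.
 * Returns 0 on success, or -1 if newlid is out of range, l is not indexed
 * under its current LID, or newlid is already taken by another LWP.
 * On failure nothing is modified.
 */
static int
demo_lwp_renumber(struct demo_proc *p, struct demo_lwp *l, int newlid)
{
	if (newlid < 1 || newlid > DEMO_SLOTS)
		return -1;
	if (l->lid < 1 || l->lid > DEMO_SLOTS || p->slots[l->lid - 1] != l)
		return -1;
	if (l->lid == newlid)
		return 0;		/* nothing to do */
	if (p->slots[newlid - 1] != NULL)
		return -1;

	p->slots[l->lid - 1] = NULL;	/* drop the old key */
	p->slots[newlid - 1] = l;	/* index under the new key */
	l->lid = newlid;		/* only now update the field */
	return 0;
}
```

In a real kernel the equivalent operation would additionally have to run under the appropriate process lock, which this sketch leaves out.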
To generate a diff of this commit:
cvs rdiff -u -r1.120 -r1.121 src/sys/compat/linux/common/linux_exec.c
cvs rdiff -u -r1.491 -r1.492 src/sys/kern/kern_exec.c
cvs rdiff -u -r1.225 -r1.226 src/sys/kern/kern_lwp.c
cvs rdiff -u -r1.200 -r1.201 src/sys/sys/lwp.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: ad@NetBSD.org
State-Changed-When: Sat, 15 Feb 2020 17:32:53 +0000
State-Changed-Why:
I think this should be fixed now.
From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/54922 (linux ldconfig triggers vpp != NULL in
exit1()->radixtree.c line 674)
Date: Fri, 21 Feb 2020 09:40:52 +0100
It is ok now.
Thanks!
State-Changed-From-To: feedback->closed
State-Changed-By: wiz@NetBSD.org
State-Changed-When: Fri, 21 Feb 2020 09:04:30 +0000
State-Changed-Why:
Confirmed fixed, thanks!
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.