NetBSD Problem Report #56264

From www@netbsd.org  Mon Jun 21 09:55:34 2021
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 6556E1A921F
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 21 Jun 2021 09:55:34 +0000 (UTC)
Message-Id: <20210621095532.F1A201A923D@mollari.NetBSD.org>
Date: Mon, 21 Jun 2021 09:55:32 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: diagnostic assertion "l->l_stat == LSONPROC" failed on RPI3
X-Send-Pr-Version: www-1.0

>Number:         56264
>Category:       port-arm
>Synopsis:       diagnostic assertion "l->l_stat == LSONPROC" failed on RPI3
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-arm-maintainer
>State:          analyzed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jun 21 10:00:00 +0000 2021
>Closed-Date:    
>Last-Modified:  Sun Sep 19 04:39:53 +0000 2021
>Originator:     Rin Okuyama
>Release:        9.99.85
>Organization:
Department of Physics, Meiji University
>Environment:
GENERIC64 kernel for NetBSD/aarch64{,eb} 9.99.85 as of 2021-06-21
GENERIC and RPI2 kernels for 32-bit arm are also potentially affected.
>Description:
The GENERIC64 kernel for aarch64{,eb} fails to boot on RPI3 with this panic:

----
panic: kernel diagnostic assertion "l->l_stat == LSONPROC" failed: file "../../../../kern/kern_sleepq.c", line 227
----

The backtrace is:

----
[   1.6097589] cpu1: Begin traceback...
[   1.6097589] trace fp ffffc0006eba79b0
[   1.6197592] fp ffffc0006eba79e0 vpanic() at ffffc0000052cacc netbsd:vpanic+0x14c
[   1.6197592] fp ffffc0006eba7a40 kern_assert() at ffffc0000078a6c8 netbsd:kern_assert+0x58
[   1.6297644] fp ffffc0006eba7ad0 sleepq_enqueue() at ffffc000004f5724 netbsd:sleepq_enqueue+0x174
[   1.6397642] fp ffffc0006eba7b10 cv_enter() at ffffc000004bbda0 netbsd:cv_enter+0xf0
[   1.6497665] fp ffffc0006eba7b50 cv_wait() at ffffc000004bbfc8 netbsd:cv_wait+0x38
[   1.6597679] fp ffffc0006eba7b80 xc_wait() at ffffc00000536490 netbsd:xc_wait+0xb0
[   1.6697755] fp ffffc0006eba7bc0 percpu_backend_alloc() at ffffc000005237f4 netbsd:percpu_backend_alloc+0x154
[   1.6797713] fp ffffc0006eba7c40 vmem_xalloc() at ffffc000005345b8 netbsd:vmem_xalloc+0x578
[   1.6897752] fp ffffc0006eba7d10 vmem_alloc() at ffffc00000534ac4 netbsd:vmem_alloc+0x84
[   1.6997755] fp ffffc0006eba7d70 percpu_create() at ffffc00000523cc0 netbsd:percpu_create+0x40
[   1.7097788] fp ffffc0006eba7e00 pic_add() at ffffc0000000314c netbsd:pic_add+0xfc
[   1.7097788] fp ffffc0006eba7e40 bcm2836mp_intr_init() at ffffc0000001adb0 netbsd:bcm2836mp_intr_init+0x90
[   1.7297831] fp ffffc0006eba7e90 arm_fdt_cpu_hatch() at ffffc00000068ef8 netbsd:arm_fdt_cpu_hatch+0x24
[   1.7397826] fp ffffc0006eba7eb0 cpu_hatch() at ffffc0000009a0dc netbsd:cpu_hatch+0xbc
[   1.7397826] fp 0000000000000000 cpu_mpstart() at ffffc00000001a88 netbsd:cpu_mpstart+0x19c
[   1.7497887] cpu1: End traceback...
----

This failure was previously reported as kern/55889

----
http://gnats.netbsd.org/55889
----

and was fixed for a while, but it started failing again with this commit:

----
http://www.nerv.org/netbsd/changeset.cgi?id=20210609T232251Z.095141ac330a067f4506a48b81678d3ed2bf99f9#src/sys/dev/dev_verbose.h
----

If dev_verbose.h is reverted to rev 1.5, the kernel boots to multiuser.
Even so, I don't think that dev_verbose.h is to blame.

The GENERIC kernel for evbarmv7hf{,eb} boots just fine on RPI3, but if
this KASSERT is inserted into bcm2836mp_intr_init():

----
Index: sys/arch/arm/broadcom/bcm2835_intr.c
===================================================================
RCS file: /home/netbsd/src/sys/arch/arm/broadcom/bcm2835_intr.c,v
retrieving revision 1.38
diff -p -u -r1.38 bcm2835_intr.c
--- sys/arch/arm/broadcom/bcm2835_intr.c	8 Mar 2021 14:22:42 -0000	1.38
+++ sys/arch/arm/broadcom/bcm2835_intr.c	21 Jun 2021 09:23:06 -0000
@@ -867,6 +867,8 @@ bcm2836mp_intr_init(void *priv, struct c
 	const cpuid_t cpuid = ci->ci_core_id;
 	struct pic_softc * const pic = &bcm2836mp_pic[cpuid];

+	KASSERT(curlwp->l_stat == LSONPROC);
+
 	KASSERT(cpuid < BCM2836_NCPUS);

 #if defined(MULTIPROCESSOR)
----

it also fires for the 32-bit kernel. So the only reason the 32-bit kernel
works and the 64-bit kernel does not is that percpu_create() *happens*
not to block there. I guess this may also be why old versions of
dev_verbose.h happen to work.

Therefore, I think that pic_add() should not be called from
arm_fdt_cpu_hatch() via bcm2836mp_intr_init(). However, I'm not sure
where it should be called instead.
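
The reasoning above can be illustrated with a small userland model. This
is only a sketch, not kernel code: the names model_lwp, model_can_sleep()
and model_alloc() are hypothetical, and the state MODEL_LS_OTHER merely
stands for "anything other than LSONPROC", which is an assumption about
what curlwp looks like while a secondary CPU is hatching. It shows why an
allocation that happens not to wait passes, while one that has to wait
trips the check.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for LWP states; not the kernel's definitions. */
enum model_lwp_state { MODEL_LS_OTHER, MODEL_LSONPROC };

struct model_lwp {
	enum model_lwp_state l_stat;
};

/* Models the condition asserted in sleepq_enqueue(). */
static bool
model_can_sleep(const struct model_lwp *l)
{
	return l->l_stat == MODEL_LSONPROC;
}

/*
 * Models an allocation like percpu_create(): it reaches the sleep
 * queue only when it has to wait (xc_wait() -> cv_wait() in the
 * backtrace). Returns false where the real kernel would panic.
 */
static bool
model_alloc(const struct model_lwp *l, bool must_wait)
{
	if (!must_wait)
		return true;	/* fast path: the check is never reached */
	return model_can_sleep(l);
}
```

With a model_lwp whose l_stat is MODEL_LS_OTHER, model_alloc(l, false)
succeeds and model_alloc(l, true) fails, which mirrors the 32-bit and
64-bit behaviour described above.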
>How-To-Repeat:
Boot the GENERIC64 kernel (either aarch64 or aarch64eb) on RPI3.
>Fix:
N/A

>Release-Note:

>Audit-Trail:
From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56264 CVS commit: src/sys/arch/arm/broadcom
Date: Wed, 1 Sep 2021 03:08:08 +0000

 Module Name:	src
 Committed By:	rin
 Date:		Wed Sep  1 03:08:08 UTC 2021

 Modified Files:
 	src/sys/arch/arm/broadcom: bcm2835_intr.c

 Log Message:
 PR port-arm/56264

 Register all PICs when bcmicu1 is attached, in order to avoid calling
 pic_add() from cpu_hatch(), which blocks on the aarch64 kernel on RPI3.
 This prevented the MP kernel from booting due to the KASSERT failure
 described in the PR.

 This is a kind of a workaround; the real fix should be to

 (a) reorganize cpu_hatch() for aarch64 and arm:
 http://mail-index.netbsd.org/port-arm/2021/06/21/msg007320.html

 (b) or change MI abstraction of ``MP ready'':
 http://mail-index.netbsd.org/port-arm/2021/06/22/msg007327.html

 Still, this workaround carries no penalty, and it is not good to leave
 RPI3 broken for months...

 Tested on RPI3 (aarch64 MP, armv7hf MP) as well as RPI1 (armv6hf UP).


 To generate a diff of this commit:
 cvs rdiff -u -r1.38 -r1.39 src/sys/arch/arm/broadcom/bcm2835_intr.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
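
The ordering idea behind the workaround in the commit log above can be
sketched in userland C. This is a simplified model under stated
assumptions, not the committed change: model_pics, model_attach() and
model_hatch() are hypothetical names, MODEL_NCPUS is an illustrative
value standing in for BCM2836_NCPUS, and setting the "registered" flag
stands in for the pic_add() call that may sleep.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NCPUS 4	/* illustrative; stands in for BCM2836_NCPUS */

struct model_pic {
	bool registered;	/* the possibly-sleeping setup is done */
	bool hatched;		/* the per-CPU, non-sleeping setup is done */
};

static struct model_pic model_pics[MODEL_NCPUS];

/*
 * Attach path: runs where sleeping is allowed, so it registers the
 * PICs for every CPU up front. Returns how many were newly registered.
 */
static size_t
model_attach(void)
{
	size_t n = 0;

	for (size_t i = 0; i < MODEL_NCPUS; i++) {
		if (!model_pics[i].registered) {
			model_pics[i].registered = true; /* models pic_add() */
			n++;
		}
	}
	return n;
}

/*
 * Hatch path: must not sleep, so it only touches a PIC that the
 * attach path already registered. Returns false otherwise.
 */
static bool
model_hatch(size_t cpuid)
{
	if (cpuid >= MODEL_NCPUS || !model_pics[cpuid].registered)
		return false;
	model_pics[cpuid].hatched = true;
	return true;
}
```

In this model, model_hatch() fails for any CPU until model_attach() has
run once, after which hatching needs no work that could block.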

State-Changed-From-To: open->analyzed
State-Changed-By: rin@NetBSD.org
State-Changed-When: Sun, 19 Sep 2021 04:39:53 +0000
State-Changed-Why:
Workaround committed.


>Unformatted:

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.