NetBSD Problem Report #55895

From gson@gson.org  Sun Dec 27 12:01:13 2020
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id C61901A921F
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 27 Dec 2020 12:01:13 +0000 (UTC)
Message-Id: <20201227120108.99507253EDE@guava.gson.org>
Date: Sun, 27 Dec 2020 14:01:08 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: panic: kernel diagnostic assertion "(flags & (PR_NOWAIT|PR_LIMITFAIL)) != 0" failed
X-Send-Pr-Version: 3.95

>Number:         55895
>Category:       port-sparc
>Synopsis:       panic: kernel diagnostic assertion "(flags & (PR_NOWAIT|PR_LIMITFAIL)) != 0" failed
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    chs
>State:          feedback
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Dec 27 12:05:00 +0000 2020
>Closed-Date:    
>Last-Modified:  Mon Jan 11 06:17:53 +0000 2021
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current
>Organization:
>Environment:
System: NetBSD
Architecture: sparc
Machine: sparc
>Description:

The TNF sparc testbed has recorded six sporadic panics with the panic
message in the subject line between August and December 2020.

Here is a link to the log and the panic message for each one:

  http://releng.netbsd.org/b5reports/sparc/commits-2020.08.html#2020.08.10.11.09.15
  detect_unused_tests: [ 50699.6707740] panic: kernel diagnostic assertion "(flags & (PR_NOWAIT|PR_LIMITFAIL)) != 0" failed: file "/tmp/bracket/build/2020.08.10.11.09.15-sparc/src/sys/kern/subr_pool.c", line 1181 

  http://releng.netbsd.org/b5reports/sparc/commits-2020.09.html#2020.09.12.12.11.19
  grow_16M_v1_4096: [ 13917.1767735] panic: kernel diagnostic assertion "(flags & (PR_NOWAIT|PR_LIMITFAIL)) != 0" failed: file "/tmp/build/2020.09.12.12.11.19-sparc/src/sys/kern/subr_pool.c", line 1181 

  http://releng.netbsd.org/b5reports/sparc/commits-2020.09.html#2020.09.13.13.03.15
  ldp_regen: [ 26053.9547610] panic: kernel diagnostic assertion "(flags & (PR_NOWAIT|PR_LIMITFAIL)) != 0" failed: file "/tmp/bracket/build/2020.09.13.13.03.15-sparc/src/sys/kern/subr_pool.c", line 1181

  http://releng.netbsd.org/b5reports/sparc/commits-2020.10.html#2020.10.01.02.00.04
  [  18.0342095] panic: kernel diagnostic assertion "(flags & (PR_NOWAIT|PR_LIMITFAIL)) != 0" failed: file "/tmp/build/2020.10.01.02.00.04-sparc/src/sys/kern/subr_pool.c", line 1181 

  http://releng.netbsd.org/b5reports/sparc/commits-2020.11.html#2020.11.05.00.41.04
  crossping: [ 27861.1173080] panic: kernel diagnostic assertion "(flags & (PR_NOWAIT|PR_LIMITFAIL)) != 0" failed: file "/tmp/build/2020.11.05.00.41.04-sparc/src/sys/kern/subr_pool.c", line 1181 

  http://releng.netbsd.org/b5reports/sparc/commits-2020.12.html#2020.12.26.22.28.35
  ipsec_tunnel_ipv4_ah_hmacripemd160: [ 10851.7832570] panic: kernel diagnostic assertion "(flags & (PR_NOWAIT|PR_LIMITFAIL)) != 0" failed: file "/tmp/build/2020.12.26.22.28.35-sparc/src/sys/kern/subr_pool.c", line 1181 

I'm filing this as category "kern" rather than "port-sparc" because I
suspect it's an MI issue that just happens to hit the sparc testbed
because it has less (emulated) RAM than most.  A crash with the same
panic message has also been reported on evbearm6:

  https://mail-index.netbsd.org/current-users/2018/11/04/msg034522.html

and analyzed:

  https://mail-index.netbsd.org/current-users/2018/11/04/msg034523.html

>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->chs
Responsible-Changed-By: chs@NetBSD.org
Responsible-Changed-When: Thu, 07 Jan 2021 13:04:42 +0000
Responsible-Changed-Why:
I'll fix it


From: Chuck Silvers <chuq@chuq.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
	gnats-admin@netbsd.org, Andreas Gustafsson <gson@gson.org>
Subject: Re: port-sparc/55895 (panic: kernel diagnostic assertion "(flags &
 (PR_NOWAIT|PR_LIMITFAIL)) != 0" failed)
Date: Thu, 7 Jan 2021 10:54:33 -0800

 all of these assertion failures have this stack trace:

     ipsec_tunnel_ipv4_ah_hmacripemd160: [ 10851.7832570] panic: kernel diagnostic assertion "(flags & (PR_NOWAIT|PR_LIMITFAIL)) != 0" failed: file "/tmp/build/2020.12.26.22.28.35-sparc/src/sys/kern/subr_pool.c", line 1181 
 [ 10851.7955840] cpu0: Begin traceback...
 [ 10851.7955840] 0x0(0xf0474818, 0xf584e920, 0xf0577c00, 0xf0578c00, 0x104, 0xf0578b50) at netbsd:kern_assert+0x38
 [ 10851.7955840] kern_assert(0xf0474818, 0xf0474808, 0xf04c5438, 0xf04c4c58, 0x49d, 0x0) at netbsd:pool_get+0x818
 [ 10851.7955840] pool_get(0xf055d108, 0x1, 0xf04c4c58, 0xf0474808, 0xf055d180, 0x0) at netbsd:pmap_pmap_pool_ctor+0xc4
 [ 10851.7955840] pmap_pmap_pool_ctor(0x0, 0xf0a72220, 0x1, 0x0, 0xf055cd38, 0xf0a72000) at netbsd:pool_cache_get_slow+0x18c
 [ 10851.7955840] pool_cache_get_slow(0xf055ccc0, 0xf055cec0, 0xf0a72220, 0xf584ea7c, 0x0, 0x1) at netbsd:pool_cache_get_paddr+0x14c
 [ 10851.7955840] pool_cache_get_paddr(0xf055ccc0, 0x1, 0x0, 0x0, 0xf055cec0, 0xf055ccc0) at netbsd:pmap_create+0x10
 [ 10851.7955840] pmap_create(0xf0d68a5c, 0x3, 0x0, 0xf029115c, 0x0, 0xf0d68ad0) at netbsd:uvmspace_init+0x5c
 [ 10851.8042170] uvmspace_init(0xf0d68a50, 0x0, 0x1000, 0xf0000000, 0x1, 0xf05730c0) at netbsd:uvmspace_alloc+0x28
 [ 10851.8042170] uvmspace_alloc(0xf0d68a50, 0xf0000000, 0x1, 0xf584ec04, 0x1000, 0xf584d000) at netbsd:uvmspace_exec+0x54
 [ 10851.8042170] uvmspace_exec(0xf0d4b180, 0x1000, 0xf0000000, 0x1, 0x0, 0xf0d68108) at netbsd:execve_runproc+0x838
 [ 10851.8042170] execve_runproc(0xf0d4b180, 0xf584ecf8, 0x0, 0x0, 0xf0d4b180, 0xf0d9ba90) at netbsd:execve1+0x44
 [ 10851.8042170] execve1(0xf0d4b180, 0x1, 0xeffff350, 0xffffffff, 0xeffff050, 0xeffff99c) at netbsd:sys_execve+0x24
 [ 10851.8042170] sys_execve(0xf0d4b180, 0xf584ef30, 0xf584ef28, 0xeffff350, 0x0, 0x8573836) at netbsd:syscall+0xe0
 [ 10851.8042170] syscall(0xc3b, 0xf584efb0, 0xedc5e124, 0x3b, 0x3, 0xf0d4b180) at netbsd:memfault_sun4m+0x3f8
 [ 10851.8042170] cpu0: End traceback...

 here is the pool_get() call:

 			upt = pool_get(&L1_pool, flags);

 and here is the page allocator for L1_pool:

 void *
 pgt_page_alloc(struct pool *pp, int flags)
 {
 	int cacheit = (CACHEINFO.c_flags & CACHE_PAGETABLES) != 0;
 	struct vm_page *pg;
 	vaddr_t va;
 	paddr_t pa;

 	/* Allocate a page of physical memory */
 	if ((pg = uvm_pagealloc(NULL, 0, NULL, 0)) == NULL)
 		return (NULL);

 ...
 }


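 the problem is that pgt_page_alloc() does not retry the page allocation if
 uvm_pagealloc() fails while PR_WAITOK is set in flags, so pool_get() ends up
 returning NULL to a caller that asked to wait, and the assertion fires.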
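
To make the failure mode concrete, here is a minimal userland sketch of the
check that fires. It is not the kernel code: the flag values, the
page_alloc_fn type and the alloc_always_fails(), alloc_always_succeeds(),
may_fail() and assertion_would_fire() helpers are all hypothetical, and the
model assumes the pool has no free items and simply passes on whatever
pgt_page_alloc() would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values only; the real definitions live in the kernel headers. */
#define PR_WAITOK    0x01
#define PR_NOWAIT    0x02
#define PR_LIMITFAIL 0x04

/* Hypothetical type for a backing page allocator taking pool flags. */
typedef void *(*page_alloc_fn)(int flags);

static char fake_page[16];

/* Models a page allocator that gives up immediately, whatever the flags say. */
static void *alloc_always_fails(int flags) { (void)flags; return NULL; }

/* Models a page allocator that always finds a page. */
static void *alloc_always_succeeds(int flags) { (void)flags; return fake_page; }

/* A NULL result is only legitimate if the caller allowed failure. */
static bool may_fail(int flags)
{
	return (flags & (PR_NOWAIT | PR_LIMITFAIL)) != 0;
}

/*
 * Simplified model of the check from the panic message: returns true when
 * the allocation failed although the caller did not permit failure.
 */
static bool assertion_would_fire(page_alloc_fn alloc, int flags)
{
	assert(alloc != NULL);
	void *page = alloc(flags);
	return page == NULL && !may_fail(flags);
}
```

In this model, a failing allocator combined with PR_WAITOK is the only case
that trips the check; the same failure with PR_NOWAIT or PR_LIMITFAIL is a
legitimate NULL return.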

From: "Chuck Silvers" <chs@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55895 CVS commit: src/sys/arch/sparc/sparc
Date: Mon, 11 Jan 2021 06:12:43 +0000

 Module Name:	src
 Committed By:	chs
 Date:		Mon Jan 11 06:12:43 UTC 2021

 Modified Files:
 	src/sys/arch/sparc/sparc: pmap.c

 Log Message:
 in pgt_page_alloc(), wait and retry the page allocation if PR_WAITOK.
 fixes PR 55895.
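
The log message describes a wait-and-retry loop. A minimal userland sketch of
that pattern follows; it is not the committed diff. struct fake_page_source,
fake_pagealloc(), fake_wait() and page_alloc_retry() are hypothetical
stand-ins (for uvm_pagealloc() and for a kernel sleep, presumably something
like uvm_wait()), and the flag values are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values only; the real definitions live in the kernel headers. */
#define PR_WAITOK 0x01
#define PR_NOWAIT 0x02

/* Hypothetical page source that fails a fixed number of times, then succeeds. */
struct fake_page_source {
	int failures_left;	/* remaining allocation attempts that will fail */
	int waits;		/* how many times the caller waited for memory */
	char page[16];		/* the "page" handed out on success */
};

/* Stand-in for the physical page allocation. */
static void *fake_pagealloc(struct fake_page_source *src)
{
	assert(src != NULL);
	if (src->failures_left > 0) {
		src->failures_left--;
		return NULL;
	}
	return src->page;
}

/* Stand-in for sleeping until memory may be available again. */
static void fake_wait(struct fake_page_source *src)
{
	src->waits++;
}

/*
 * Retry the allocation while the caller said it can wait; return NULL
 * right away only when PR_WAITOK is not set.
 */
static void *page_alloc_retry(struct fake_page_source *src, int flags)
{
	void *pg;

	while ((pg = fake_pagealloc(src)) == NULL) {
		if ((flags & PR_WAITOK) == 0)
			return NULL;
		fake_wait(src);
	}
	return pg;
}
```

With this shape, a PR_WAITOK caller never sees NULL, which is what the
assertion in pool_get() expects, while a PR_NOWAIT caller still gets an
immediate failure.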


 To generate a diff of this commit:
 cvs rdiff -u -r1.369 -r1.370 src/sys/arch/sparc/sparc/pmap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: chs@NetBSD.org
State-Changed-When: Mon, 11 Jan 2021 06:17:53 +0000
State-Changed-Why:
does the patch I committed fix the problem for you?
(I understand that because the bug only triggers quite infrequently,
it may take several months before it's clear whether it's really gone.)


>Unformatted:
