NetBSD Problem Report #54880
From mlelstv@tazz.1st.de Mon Jan 20 18:32:14 2020
Return-Path: <mlelstv@tazz.1st.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 8BEA37A153
for <gnats-bugs@gnats.NetBSD.org>; Mon, 20 Jan 2020 18:32:14 +0000 (UTC)
Message-Id: <20200120183150.08B15CCAE7@tazz.1st.de>
Date: Mon, 20 Jan 2020 19:08:08 +0100 (CET)
From: mlelstv@serpens.de
Reply-To: mlelstv@serpens.de
To: gnats-bugs@NetBSD.org
Subject: -current hangs in mountroot
X-Send-Pr-Version: 3.95
>Number: 54880
>Category: kern
>Synopsis: -current hangs in mountroot
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Jan 20 18:35:00 +0000 2020
>Closed-Date: Fri Apr 24 13:56:29 +0000 2020
>Last-Modified: Fri Apr 24 13:56:29 +0000 2020
>Originator: Michael van Elst
>Release: NetBSD 9.99.39
>Organization:
>Environment:
System: NetBSD tazz 9.99.39 NetBSD 9.99.39 (GENERIC) #33: Mon Jan 20 16:34:50 UTC 2020 mlelstv@slowpoke:/scratch2/obj.amd64/scratch/netbsd-current/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
When booting -current the system stops when trying to mount root.
I've added debug printfs and a call to db_stacktrace():
[ 12.4556277] uvm_km_alloc(65536): no VM
[ 12.7114329] vm_map_lock_try(0xffffffff81d79820) = false (busy=0x0)
[ 12.7228386] uvm_map() at netbsd:uvm_map+0x6b
[ 12.7343650] uvm_km_alloc() at netbsd:uvm_km_alloc+0xff
[ 12.7463020] pool_grow() at netbsd:pool_grow+0x88
[ 12.7585233] pool_get() at netbsd:pool_get+0xa8
[ 12.7703830] allocbuf() at netbsd:allocbuf+0xe4
[ 12.7822311] getblk() at netbsd:getblk+0x143
[ 12.7940378] bio_doread() at netbsd:bio_doread+0x1d
[ 12.8060855] bread() at netbsd:bread+0x18
[ 12.8178598] lfs_mountfs() at netbsd:lfs_mountfs+0x9c
[ 12.8301169] lfs_mountroot() at netbsd:lfs_mountroot+0x6b
[ 12.8420919] vfs_mountroot() at netbsd:vfs_mountroot+0xf1
[ 12.8540742] main() at netbsd:main+0x4c6
This is caused by vm_map_lock_try() calling rw_tryenter(), which is
defective on amd64 without LOCKDEBUG. As a result, uvm_km_alloc()
and thus pool_get() fail, and allocbuf() retries the allocation indefinitely.
rw_tryenter() is implemented as an assembler stub; here is the
writer case:
/*
* Writer: if the compare-and-set fails, don't bother retrying.
*/
2: movq CPUVAR(CURLWP), %rcx
xorq %rax, %rax
orq $RW_WRITE_LOCKED, %rcx
LOCK
cmpxchgq %rcx, (%rdi)
movl $0, %eax
setz %al
The owner field, addressed by %rdi, is atomically compared against zero
and, if it matches, overwritten with (curlwp | RW_WRITE_LOCKED).
However, without LOCKDEBUG, the owner field is initialized to RW_NODEBUG,
not zero. The comparison therefore always fails and the new value is never
written. The new value would also lack the RW_NODEBUG flag.
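The failure can be reproduced in user space with C11 atomics. This is only a minimal sketch: the sketch_* names and the flag values are placeholders chosen for illustration (they are not copied from rwlock.h), and the curlwp value is an arbitrary stand-in for an LWP pointer.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag values for illustration; not copied from rwlock.h. */
#define RW_WRITE_LOCKED 0x04UL
#define RW_NODEBUG      0x10UL

/* Hypothetical stand-in for the lock's owner word. */
typedef struct {
	_Atomic uintptr_t owner;
} sketch_rwlock;

/*
 * Mirrors the writer case of the stub: a single compare-and-swap that
 * succeeds only if the owner word is exactly zero. No retry on failure.
 */
static bool
sketch_tryenter_writer(sketch_rwlock *rw, uintptr_t curlwp)
{
	uintptr_t expected = 0;

	return atomic_compare_exchange_strong(&rw->owner, &expected,
	    curlwp | RW_WRITE_LOCKED);
}

/*
 * Initialize an unheld lock with the given owner word, then attempt the
 * writer try-lock once. Returns whether the attempt succeeded.
 */
static bool
sketch_try_after_init(uintptr_t init)
{
	sketch_rwlock rw;

	atomic_init(&rw.owner, init);
	/* 0x1000 is an arbitrary stand-in for curlwp. */
	return sketch_tryenter_writer(&rw, (uintptr_t)0x1000);
}
```

Under these assumptions, sketch_try_after_init(0) succeeds, while sketch_try_after_init(RW_NODEBUG) fails even though nobody holds the lock, which matches the behaviour seen in vm_map_lock_try().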
The rw_enter() stub has the same flaw, but it only handles the initial
check; when that fails, it continues with the C version, rw_vector_enter().
The C code handles the RW_NODEBUG case.
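The difference between the two entry points can be sketched as a fast path with a fallback. This is a hypothetical user-space model, not the code in kern_rwlock.c: the sketch_* names and flag values are placeholders, and the slow path here returns false where the real rw_vector_enter() would block or spin.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag values for illustration; not copied from rwlock.h. */
#define RW_WRITE_LOCKED 0x04UL
#define RW_NODEBUG      0x10UL

/*
 * Assumed shape of a slow path that tolerates RW_NODEBUG: ignore the
 * flag when deciding whether the lock is free, and keep it in the new
 * owner word.
 */
static bool
sketch_slow_enter_writer(_Atomic uintptr_t *owner, uintptr_t curlwp)
{
	uintptr_t cur = atomic_load(owner);

	for (;;) {
		if ((cur & ~RW_NODEBUG) != 0)
			return false;	/* held; the real code would wait */
		uintptr_t next = curlwp | RW_WRITE_LOCKED | (cur & RW_NODEBUG);
		if (atomic_compare_exchange_weak(owner, &cur, next))
			return true;
		/* A failed CAS reloaded cur; re-evaluate and retry. */
	}
}

/* Fast path as in the stub, then fall back to the slow path. */
static bool
sketch_enter_writer(_Atomic uintptr_t *owner, uintptr_t curlwp)
{
	uintptr_t expected = 0;

	if (atomic_compare_exchange_strong(owner, &expected,
	    curlwp | RW_WRITE_LOCKED))
		return true;
	return sketch_slow_enter_writer(owner, curlwp);
}
```

With the owner word initialized to RW_NODEBUG, the fast path fails but the fallback still acquires the lock. rw_tryenter() has no such fallback, which is why only the try variant breaks.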
The error was introduced with rwlock.h 1.13; previously, RW_NODEBUG was
set to zero when not compiling with LOCKDEBUG.
>How-To-Repeat:
Boot -current. The hang occurs when the buffer pool needs to grow
and other buffers cannot be freed.
>Fix:
>Release-Note:
>Audit-Trail:
From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/54880 CVS commit: src/sys/kern
Date: Mon, 20 Jan 2020 18:48:16 +0000
Module Name: src
Committed By: ad
Date: Mon Jan 20 18:48:16 UTC 2020
Modified Files:
src/sys/kern: kern_rwlock.c
Log Message:
PR kern/54880: -current hangs in mountroot
- Don't set the RW_NODEBUG flag on init, since assembly stubs can't handle it.
- rw_downgrade(): fix a case where the RW_NODEBUG flag was lost.
To generate a diff of this commit:
cvs rdiff -u -r1.61 -r1.62 src/sys/kern/kern_rwlock.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: maya@NetBSD.org
State-Changed-When: Tue, 21 Apr 2020 17:08:42 +0000
State-Changed-Why:
Is this still an issue?
From: Michael van Elst <mlelstv@serpens.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/54880 (-current hangs in mountroot)
Date: Tue, 21 Apr 2020 22:15:31 +0200
On Tue, Apr 21, 2020 at 05:08:42PM +0000, maya@NetBSD.org wrote:
> Synopsis: -current hangs in mountroot
>
> State-Changed-From-To: open->feedback
> State-Changed-By: maya@NetBSD.org
> State-Changed-When: Tue, 21 Apr 2020 17:08:42 +0000
> State-Changed-Why:
> Is this still an issue?
No.
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
State-Changed-From-To: feedback->closed
State-Changed-By: maya@NetBSD.org
State-Changed-When: Fri, 24 Apr 2020 13:56:29 +0000
State-Changed-Why:
Reported fixed. Thanks mlelstv and ad!
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.