NetBSD Problem Report #54880

From mlelstv@tazz.1st.de  Mon Jan 20 18:32:14 2020
Return-Path: <mlelstv@tazz.1st.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 8BEA37A153
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 20 Jan 2020 18:32:14 +0000 (UTC)
Message-Id: <20200120183150.08B15CCAE7@tazz.1st.de>
Date: Mon, 20 Jan 2020 19:08:08 +0100 (CET)
From: mlelstv@serpens.de
Reply-To: mlelstv@serpens.de
To: gnats-bugs@NetBSD.org
Subject: -current hangs in mountroot
X-Send-Pr-Version: 3.95

>Number:         54880
>Category:       kern
>Synopsis:       -current hangs in mountroot
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jan 20 18:35:00 +0000 2020
>Closed-Date:    Fri Apr 24 13:56:29 +0000 2020
>Last-Modified:  Fri Apr 24 13:56:29 +0000 2020
>Originator:     Michael van Elst
>Release:        NetBSD 9.99.39
>Organization:

>Environment:


System: NetBSD tazz 9.99.39 NetBSD 9.99.39 (GENERIC) #33: Mon Jan 20 16:34:50 UTC 2020 mlelstv@slowpoke:/scratch2/obj.amd64/scratch/netbsd-current/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:

When booting -current the system stops when trying to mount root.
I've added debug printfs and a call to db_stacktrace().

[  12.4556277] uvm_km_alloc(65536): no VM
[  12.7114329] vm_map_lock_try(0xffffffff81d79820) = false (busy=0x0)
[  12.7228386] uvm_map() at netbsd:uvm_map+0x6b
[  12.7343650] uvm_km_alloc() at netbsd:uvm_km_alloc+0xff
[  12.7463020] pool_grow() at netbsd:pool_grow+0x88
[  12.7585233] pool_get() at netbsd:pool_get+0xa8
[  12.7703830] allocbuf() at netbsd:allocbuf+0xe4
[  12.7822311] getblk() at netbsd:getblk+0x143
[  12.7940378] bio_doread() at netbsd:bio_doread+0x1d
[  12.8060855] bread() at netbsd:bread+0x18
[  12.8178598] lfs_mountfs() at netbsd:lfs_mountfs+0x9c
[  12.8301169] lfs_mountroot() at netbsd:lfs_mountroot+0x6b
[  12.8420919] vfs_mountroot() at netbsd:vfs_mountroot+0xf1
[  12.8540742] main() at netbsd:main+0x4c6


This is caused by vm_map_lock_try() calling rw_tryenter(), which is
defective on amd64 without LOCKDEBUG. As a result uvm_km_alloc(),
and thus pool_get(), fails, and allocbuf() retries forever.
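
The failure chain above can be modelled in a few lines of user-space C.
This is only a sketch under stated assumptions, not the actual allocbuf()
or uvm_km_alloc() logic: broken_trylock(), alloc_nowait() and
alloc_with_retry() are hypothetical names, and the retry count is capped
so that the demonstration terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a try-lock that can never succeed. */
static bool
broken_trylock(void)
{
	return false;
}

/* Hypothetical allocator: it must take the map lock before handing out memory. */
static void *
alloc_nowait(size_t size)
{
	static char backing[65536];

	if (!broken_trylock())
		return NULL;	/* corresponds to the "no VM" message */
	return size <= sizeof(backing) ? backing : NULL;
}

/*
 * Retry the allocation up to max_attempts times.
 * Returns the number of attempts used, or -1 if every attempt failed.
 */
static int
alloc_with_retry(size_t size, int max_attempts, void **out)
{
	assert(out != NULL && max_attempts > 0);

	for (int i = 1; i <= max_attempts; i++) {
		*out = alloc_nowait(size);
		if (*out != NULL)
			return i;
	}
	return -1;
}
```

Because the try-lock never succeeds, no number of attempts helps. A caller
that retries without a cap spins forever, which matches the observed hang.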

rw_tryenter() is implemented as an assembler stub. Here is the
writer case:

        /*
         * Writer: if the compare-and-set fails, don't bother retrying.
         */
2:      movq    CPUVAR(CURLWP), %rcx
        xorq    %rax, %rax
        orq     $RW_WRITE_LOCKED, %rcx
        LOCK
        cmpxchgq %rcx, (%rdi)
        movl    $0, %eax
        setz    %al

The owner field, addressed by %rdi, is atomically compared against zero
and, if it is zero, overwritten with (curlwp | RW_WRITE_LOCKED).

However, without LOCKDEBUG, the owner field is initialized as RW_NODEBUG,
not zero. The check always fails and the new value is never written. The new
value would also lack the RW_NODEBUG flag.

The rw_enter() stub has the same flaw, but it only performs the first
check; when that fails it falls through to the C version, rw_vector_enter(),
which handles the RW_NODEBUG case.


The error was introduced with rwlock.h 1.13; previously RW_NODEBUG was
set to zero when not compiling with LOCKDEBUG.



>How-To-Repeat:

Boot -current. The hang occurs when the buffer pool needs to be grown
and other buffers cannot be freed.

>Fix:


>Release-Note:

>Audit-Trail:
From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54880 CVS commit: src/sys/kern
Date: Mon, 20 Jan 2020 18:48:16 +0000

 Module Name:	src
 Committed By:	ad
 Date:		Mon Jan 20 18:48:16 UTC 2020

 Modified Files:
 	src/sys/kern: kern_rwlock.c

 Log Message:
 PR kern/54880: -current hangs in mountroot

 - Don't set the RW_NODEBUG flag on init, since assembly stubs can't handle it.
 - rw_downgrade(): fix a case where the RW_NODEBUG flag was lost.


 To generate a diff of this commit:
 cvs rdiff -u -r1.61 -r1.62 src/sys/kern/kern_rwlock.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: maya@NetBSD.org
State-Changed-When: Tue, 21 Apr 2020 17:08:42 +0000
State-Changed-Why:
Is this still an issue?


From: Michael van Elst <mlelstv@serpens.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/54880 (-current hangs in mountroot)
Date: Tue, 21 Apr 2020 22:15:31 +0200

 On Tue, Apr 21, 2020 at 05:08:42PM +0000, maya@NetBSD.org wrote:
 > Synopsis: -current hangs in mountroot
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: maya@NetBSD.org
 > State-Changed-When: Tue, 21 Apr 2020 17:08:42 +0000
 > State-Changed-Why:
 > Is this still an issue?

 No.

 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

State-Changed-From-To: feedback->closed
State-Changed-By: maya@NetBSD.org
State-Changed-When: Fri, 24 Apr 2020 13:56:29 +0000
State-Changed-Why:
Reported fixed. Thanks mlelstv and ad!


>Unformatted:
