NetBSD Problem Report #56952
From dholland@netbsd.org Wed Aug 3 20:15:53 2022
Return-Path: <dholland@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 60E041A921F
for <gnats-bugs@gnats.NetBSD.org>; Wed, 3 Aug 2022 20:15:53 +0000 (UTC)
Message-Id: <20220803201552.AB7BE84D66@mail.netbsd.org>
Date: Wed, 3 Aug 2022 20:15:52 +0000 (UTC)
From: dholland@NetBSD.org
Reply-To: dholland@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: UVM deadlock in madvise vs. munmap
X-Send-Pr-Version: 3.95
>Number: 56952
>Category: kern
>Synopsis: UVM deadlock in madvise vs. munmap
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Aug 03 20:20:00 +0000 2022
>Closed-Date: Tue May 16 05:35:57 +0000 2023
>Last-Modified: Tue May 16 05:35:57 +0000 2023
>Originator: David A. Holland
>Release: NetBSD 9.99.97 (20220602)
>Organization:
>Environment:
System: NetBSD valkyrie 9.99.97 NetBSD 9.99.97 (VALKYRIE_LOCKDEBUG) #1: Wed Jun 22 23:56:00 EDT 2022 dholland@valkyrie:/usr/src/sys/arch/amd64/compile/VALKYRIE_LOCKDEBUG amd64
Architecture: x86_64
Machine: amd64
>Description:
I have a few times hit a deadlock while running some database stress
tests, and today caught it with UVM_PAGE_TRKOWN enabled.
The dead state is as follows:
Thread 1 is in madvise(MADV_DONTNEED) and is holding a read lock on
the process's map. It is waiting in putpages to chuck one of the pages.
Thread 2 is in uvm_fault_internal; it is holding the page and trying
to get a read lock on the map.
Thread 3 is in munmap; it is waiting for a write lock on the map, and
that converts this into a deadlock.
(This is all in one process.)
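The dead state above can be written down as a wait-for graph, which
makes the cycle explicit. The following is only a sketch: the thread
labels, the WAITS_FOR table, and find_cycle are made up for
illustration and are not kernel data structures. The edge from thread 2
to thread 3 assumes that a queued writer keeps new readers out of the
map lock.

```python
# Toy wait-for graph for the dead state described above.
# All names here are hypothetical labels, not kernel identifiers.
WAITS_FOR = {
    # madvise holds the map read lock and wants the page held by the fault
    "thread1_madvise": ["thread2_fault"],
    # the fault holds the page; its read lock is refused behind the writer
    "thread2_fault": ["thread3_munmap"],
    # munmap wants the write lock, which needs thread 1's read lock gone
    "thread3_munmap": ["thread1_madvise"],
}


def find_cycle(graph):
    """Return one cycle as a list of nodes (first node repeated), or None."""
    visiting = []   # current depth-first path
    done = set()    # nodes fully explored without finding a cycle

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in graph.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in graph:
        cycle = visit(start)
        if cycle:
            return cycle
    return None
```

Calling find_cycle(WAITS_FOR) walks thread 1, thread 2, thread 3 and
arrives back at thread 1. With thread 3 removed, thread 2 would not be
waiting on anyone and no cycle would remain, which matches the
observation that the pending writer is what turns this into a deadlock.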
Taylor constructed the following narrative for how it got this way
(any transcription errors are my fault):
<Riastradh> Presumably you have an object foo which is mapped at
0xdeadbee000 in the address space
<Riastradh> 1. Someone tried to read from page 0xdeadbef000, say,
which is the range [0x1000, 0x2000) in foo.
<Riastradh> They consulted the map which determined that range in foo.
<Riastradh> They released the map lock, then allocated a page and
punched it into foo, and they want to reacquire the map lock to
punch it into the pmap.
<Riastradh> 2. Someone else tried to madvise(MADV_DONTNEED) some
range, say [0xdeadbee000, 0xdeadbf6000), in foo, and chuck all the
pages.
<Riastradh> Took the map read lock to find that 0xdeadbef000 is mapped to
foo@0x1000, entered genfs_io_chuck_all_the_pages or whatever, and
then started waiting for the page that (1) allocated for
foo@0x1000.
<Riastradh> Except I got the order wrong again and this last player
actually started first, but whatever.
<Riastradh> 3. At the same time, someone else tried to unmap
0xdeadbef000, which requires taking a _write_ lock.
<Riastradh> which threw a wrench in the whole thing
<Riastradh> So, one obvious possibility is: make uvm_map_clean drop
the map lock while doing genfs_io_chuck_all_the_pages.
<Riastradh> (pgo_put)
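The narrative above can be replayed against a toy lock to see why the
fault thread's second read-lock attempt fails even though the lock is
only held by a reader. This is a sketch under stated assumptions:
MapLock and replay are hypothetical names, the lock is modeled as
writer-preferring (a queued writer keeps new readers out), and nothing
here blocks or uses real threads.

```python
class MapLock:
    """Toy writer-preferring reader/writer lock; try_* calls never block."""

    def __init__(self):
        self.readers = set()
        self.writer = None
        self.queued_writers = []

    def try_read(self, who):
        # Refuse new readers while a writer holds the lock or is queued.
        if self.writer is not None or self.queued_writers:
            return False
        self.readers.add(who)
        return True

    def unlock_read(self, who):
        self.readers.discard(who)

    def try_write(self, who):
        # A writer needs the lock completely free; otherwise it queues up.
        if self.writer is None and not self.readers:
            if who in self.queued_writers:
                self.queued_writers.remove(who)
            self.writer = who
            return True
        if who not in self.queued_writers:
            self.queued_writers.append(who)
        return False


def replay():
    """Replay the narrative; report which acquisitions succeed."""
    map_lock = MapLock()
    page_owner = None  # who holds the busy page at foo@0x1000

    # Fault thread: look up the address, drop the map lock, busy the page.
    assert map_lock.try_read("fault")
    map_lock.unlock_read("fault")
    page_owner = "fault"

    # madvise thread: take the read lock, then find the page busy.
    madvise_got_map = map_lock.try_read("madvise")
    madvise_got_page = page_owner is None

    # munmap thread: wants the write lock, but madvise is still a reader.
    munmap_got_map = map_lock.try_write("munmap")

    # Fault thread again: wants the read lock back, but a writer is queued.
    fault_got_map = map_lock.try_read("fault")

    return {
        "madvise holds map": madvise_got_map,
        "madvise got page": madvise_got_page,
        "munmap got map": munmap_got_map,
        "fault got map": fault_got_map,
    }
```

In this model only the first entry comes out true: madvise holds the
map and waits for the page, while munmap and the fault thread are both
stuck on the map lock. Swapping the order of the first two steps gives
the same end state.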
>How-To-Repeat:
>Fix:
Oof.
>Release-Note:
>Audit-Trail:
From: "Chuck Silvers" <chs@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56952 CVS commit: src/sys/uvm
Date: Mon, 15 May 2023 01:42:42 +0000
Module Name: src
Committed By: chs
Date: Mon May 15 01:42:42 UTC 2023
Modified Files:
src/sys/uvm: uvm_map.c
Log Message:
uvm: avoid a deadlock in uvm_map_clean()
The locking order between map locks and page "busy" locks
is that the page "busy" lock comes first, but uvm_map_clean()
breaks this rule by holding a map locked (as reader) while
waiting for page "busy" locks.
If another thread is in the page-fault path holding a page
"busy" lock while waiting for the map lock (as a reader)
and at the same time a third thread is blocked waiting for
the map lock as a writer (which blocks the page-fault thread),
then these three threads will all deadlock with each other.
Fix this by marking the map "busy" (to block any modifications)
and unlocking the map lock before possibly waiting for any
page "busy" locks.
Martin Pieuchot reported that the same problem existed in OpenBSD;
he applied this fix there after several people tested it.
fixes PR 56952
To generate a diff of this commit:
cvs rdiff -u -r1.405 -r1.406 src/sys/uvm/uvm_map.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
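The idea in the log message above (mark the map busy, then drop the map
lock before waiting for pages) can be sketched with a toy model. This
is not the committed code: BusyMap and replay_with_fix are hypothetical
names, the lock mode used while marking the map busy is not modeled,
and it is an assumption of the sketch that a writer who finds the map
busy waits without queueing on the lock itself.

```python
class BusyMap:
    """Toy map lock with a separate 'busy' marker that blocks writers only."""

    def __init__(self):
        self.readers = set()
        self.writer_queued = False
        self.busy_owner = None

    def try_read(self, who):
        # As in a writer-preferring lock, a queued writer shuts out readers.
        if self.writer_queued:
            return False
        self.readers.add(who)
        return True

    def unlock_read(self, who):
        self.readers.discard(who)

    def try_write(self, who):
        # ASSUMPTION: a writer that finds the map busy waits elsewhere and
        # does not queue on the lock, so readers are not shut out.
        if self.busy_owner is not None:
            return False
        if self.readers:
            self.writer_queued = True
            return False
        self.writer_queued = False
        return True  # the held write lock is not tracked; the replay ends here


def replay_with_fix():
    m = BusyMap()
    page_owner = "fault"  # the fault thread already holds the page busy

    # madvise: look up the range, mark the map busy, drop the lock.
    m.try_read("madvise")
    m.busy_owner = "madvise"
    m.unlock_read("madvise")

    # munmap: held off by the busy marker, not queued on the lock.
    munmap_early = m.try_write("munmap")

    # fault: can retake the read lock, finish, and release the page.
    fault_read = m.try_read("fault")
    if fault_read:
        m.unlock_read("fault")
        page_owner = None

    # madvise: the page is free now, so it finishes and clears the marker.
    madvise_done = page_owner is None
    if madvise_done:
        m.busy_owner = None

    # munmap: nothing is in the way any more.
    munmap_late = m.try_write("munmap")
    return munmap_early, fault_read, madvise_done, munmap_late
```

In this model the munmap thread is still held off while madvise works,
but it no longer stands between the fault thread and its read lock, so
the fault completes, the page is released, and all three threads make
progress in turn.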
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56952 CVS commit: [netbsd-10] src/sys/uvm
Date: Mon, 15 May 2023 10:32:53 +0000
Module Name: src
Committed By: martin
Date: Mon May 15 10:32:53 UTC 2023
Modified Files:
src/sys/uvm [netbsd-10]: uvm_map.c
Log Message:
Pull up following revision(s) (requested by chs in ticket #167):
sys/uvm/uvm_map.c: revision 1.406
uvm: avoid a deadlock in uvm_map_clean()
The locking order between map locks and page "busy" locks
is that the page "busy" lock comes first, but uvm_map_clean()
breaks this rule by holding a map locked (as reader) while
waiting for page "busy" locks.
If another thread is in the page-fault path holding a page
"busy" lock while waiting for the map lock (as a reader)
and at the same time a third thread is blocked waiting for
the map lock as a writer (which blocks the page-fault thread),
then these three threads will all deadlock with each other.
Fix this by marking the map "busy" (to block any modifications)
and unlocking the map lock before possibly waiting for any
page "busy" locks.
Martin Pieuchot reported that the same problem existed in OpenBSD;
he applied this fix there after several people tested it.
fixes PR 56952
To generate a diff of this commit:
cvs rdiff -u -r1.403 -r1.403.2.1 src/sys/uvm/uvm_map.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Tue, 16 May 2023 05:35:57 +0000
State-Changed-Why:
fixed and pulled up
thanks!
>Unformatted:
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.