NetBSD Problem Report #56952

From dholland@netbsd.org  Wed Aug  3 20:15:53 2022
Return-Path: <dholland@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 60E041A921F
	for <gnats-bugs@gnats.NetBSD.org>; Wed,  3 Aug 2022 20:15:53 +0000 (UTC)
Message-Id: <20220803201552.AB7BE84D66@mail.netbsd.org>
Date: Wed,  3 Aug 2022 20:15:52 +0000 (UTC)
From: dholland@NetBSD.org
Reply-To: dholland@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: UVM deadlock in madvise vs. munmap
X-Send-Pr-Version: 3.95

>Number:         56952
>Category:       kern
>Synopsis:       UVM deadlock in madvise vs. munmap
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Aug 03 20:20:00 +0000 2022
>Closed-Date:    Tue May 16 05:35:57 +0000 2023
>Last-Modified:  Tue May 16 05:35:57 +0000 2023
>Originator:     David A. Holland
>Release:        NetBSD 9.99.97 (20220602)
>Organization:
>Environment:
System: NetBSD valkyrie 9.99.97 NetBSD 9.99.97 (VALKYRIE_LOCKDEBUG) #1: Wed Jun 22 23:56:00 EDT 2022  dholland@valkyrie:/usr/src/sys/arch/amd64/compile/VALKYRIE_LOCKDEBUG amd64
Architecture: x86_64
Machine: amd64
>Description:

I have a few times hit a deadlock while running some database stress
tests, and today caught it with UVM_PAGE_TRKOWN enabled.

The dead state is as follows:

Thread 1 is in madvise(MADV_DONTNEED) and is holding a read lock on
the process's map. It is waiting in putpages to chuck one of the pages.

Thread 2 is in uvm_fault_internal; it is holding the page and trying
to get a read lock on the map.

Thread 3 is in munmap; it is waiting for a write lock on the map. The
pending writer blocks thread 2's read lock request, and that converts
this into a deadlock.

(This is all in one process.)
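
The cycle can be checked mechanically. The following is a toy Python
model, not kernel code. It assumes the map lock is a writer-preferring
reader/writer lock, so a queued writer holds back new readers; that is
what makes thread 2 wait even though thread 1 only holds a read lock.
The thread labels and helper names are made up for illustration.

```python
# Toy model of the dead state; names are illustrative, not UVM identifiers.

def reader_admitted(write_held: bool, writers_queued: int) -> bool:
    """Assumed writer-preferring policy: a new reader gets in only if no
    writer holds the lock and no writer is queued for it."""
    return not write_held and writers_queued == 0

def find_cycle(wait_for: dict) -> list:
    """Follow wait-for edges from each thread; return the first cycle, or []."""
    for start in wait_for:
        path = [start]
        node = wait_for.get(start)
        while node is not None and node not in path:
            path.append(node)
            node = wait_for.get(node)
        if node is not None:
            return path[path.index(node):]
    return []

# Thread 1 (madvise) holds the map read lock; thread 3 (munmap) is queued
# as a writer, so thread 2's read request is evaluated with one writer queued.
t2_gets_read_lock = reader_admitted(write_held=False, writers_queued=1)

wait_for = {
    "T1 madvise": "T2 fault",   # waits for the busy page that T2 owns
    "T3 munmap": "T1 madvise",  # write lock needs T1's read lock released
}
if not t2_gets_read_lock:
    wait_for["T2 fault"] = "T3 munmap"  # stuck behind the queued writer

print(find_cycle(wait_for))  # ['T1 madvise', 'T2 fault', 'T3 munmap']
```

Without the queued writer (writers_queued=0) thread 2 would be admitted,
the third edge would not exist, and find_cycle would return an empty list.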

Taylor constructed the following narrative for how it got this way
(any transcription errors are my fault):

<Riastradh> Presumably you have an object foo which is mapped at
   0xdeadbee000 in the address space
<Riastradh> 1. Someone tried to read from page 0xdeadbef000, say,
   which is the range [0x1000, 0x2000) in foo.
<Riastradh> They consulted the map which determined that range in foo.
<Riastradh> They released the map lock, then allocated a page and
   punched it into foo, and they want to reacquire the map lock to
   punch it into the pmap.
<Riastradh> 2. Someone else tried to madvise(MADV_DONTNEED) some
   range, say [0xdeadbee000, 0xdeadbf6000), in foo, and chuck all the
   pages.
<Riastradh> Took the map read lock to determine that 0xdeadbef000 is mapped to
   foo@0x1000, entered genfs_io_chuck_all_the_pages or whatever, and
   then started waiting for the page that (1) allocated for
   foo@0x1000.
<Riastradh> Except I got the order wrong again and this last player
   actually started first, but whatever.
<Riastradh> 3. At the same time, someone else tried to unmap
   0xdeadbef000, which requires taking a _write_ lock.
<Riastradh> which threw a wrench in the whole thing
<Riastradh> So, one obvious possibility is: make uvm_map_clean drop
   the map lock while doing genfs_io_chuck_all_the_pages.
<Riastradh> (pgo_put)
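
The addresses in the narrative above can be checked with a little
arithmetic. This sketch assumes 4 KiB pages (as on amd64) and that foo
is mapped starting at its offset 0; both are assumptions, and the
variable names are only for illustration.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB pages, as on amd64

map_base = 0xdeadbee000   # where foo is mapped (offset 0 of foo assumed)
fault_va = 0xdeadbef000   # address touched by the faulting thread
madv_start, madv_end = 0xdeadbee000, 0xdeadbf6000  # madvise range, end exclusive

# Offset of the faulting page within foo, and the page-sized range it covers.
fault_off = fault_va - map_base
page_range = (fault_off, fault_off + PAGE_SIZE)

# Size of the madvise range in pages, and whether it covers the faulting page.
madv_pages = (madv_end - madv_start) // PAGE_SIZE
overlaps = madv_start <= fault_va < madv_end

print(hex(page_range[0]), hex(page_range[1]), madv_pages, overlaps)
# 0x1000 0x2000 8 True
```

So the madvise call spans eight pages of foo, and the second of them is
the page the faulting thread has just allocated.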


>How-To-Repeat:

>Fix:

Oof.

>Release-Note:

>Audit-Trail:
From: "Chuck Silvers" <chs@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56952 CVS commit: src/sys/uvm
Date: Mon, 15 May 2023 01:42:42 +0000

 Module Name:	src
 Committed By:	chs
 Date:		Mon May 15 01:42:42 UTC 2023

 Modified Files:
 	src/sys/uvm: uvm_map.c

 Log Message:
 uvm: avoid a deadlock in uvm_map_clean()

 The locking order between map locks and page "busy" locks
 is that the page "busy" lock comes first, but uvm_map_clean()
 breaks this rule by holding a map locked (as reader) while
 waiting for page "busy" locks.

 If another thread is in the page-fault path holding a page
 "busy" lock while waiting for the map lock (as a reader)
 and at the same time a third thread is blocked waiting for
 the map lock as a writer (which blocks the page-fault thread),
 then these three threads will all deadlock with each other.

 Fix this by marking the map "busy" (to block any modifications)
 and unlocking the map lock before possibly waiting for any
 page "busy" locks.

 Martin Pieuchot reported that the same problem existed in OpenBSD;
 he applied this fix there after several people tested it.

 fixes PR 56952


 To generate a diff of this commit:
 cvs rdiff -u -r1.405 -r1.406 src/sys/uvm/uvm_map.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
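
The approach in the log message above can be sketched as a toy
threaded model in Python. This is not the uvm_map.c change. ToyMap, its
method names, and the writer_blocked test hook are all hypothetical, and
the model assumes that the busy mark is set while the lock is held
exclusively and that a writer who finds the map busy sleeps without
queueing on the lock.

```python
import threading

class ToyMap:
    """Toy vm_map: writer-preferring rwlock plus a 'busy' mark (illustrative)."""
    def __init__(self):
        self.cv = threading.Condition()
        self.readers, self.writers_queued = 0, 0
        self.writer = self.busy = False
        self.writer_blocked = threading.Event()  # hook: a writer hit the busy mark

    def lock_read(self):
        with self.cv:
            # Writer preference: readers wait behind any queued writer.
            self.cv.wait_for(lambda: not self.writer and self.writers_queued == 0)
            self.readers += 1

    def unlock_read(self):
        with self.cv:
            self.readers -= 1
            self.cv.notify_all()

    def lock_write(self):
        with self.cv:
            while True:
                if self.busy:
                    # Sleep on the busy mark without queueing on the rwlock,
                    # so readers (the fault path) can still get in.
                    self.writer_blocked.set()
                    self.cv.wait_for(lambda: not self.busy)
                self.writers_queued += 1
                self.cv.wait_for(
                    lambda: self.busy or not (self.writer or self.readers))
                self.writers_queued -= 1
                if not self.busy:
                    self.writer = True
                    return
                self.cv.notify_all()  # left the queue; let readers proceed

    def unlock_write(self):
        with self.cv:
            self.writer = False
            self.cv.notify_all()

    def busy_and_unlock(self):
        with self.cv:
            self.busy, self.writer = True, False
            self.cv.notify_all()

    def unbusy(self):
        with self.cv:
            self.busy = False
            self.cv.notify_all()

m, page = ToyMap(), threading.Lock()   # page stands in for the page "busy" lock
page_owned, map_busy = threading.Event(), threading.Event()
done = []

def fault():                 # thread 2: owns the page, then wants a read lock
    page.acquire()
    page_owned.set()
    m.writer_blocked.wait()  # wait until munmap is already blocked
    m.lock_read()
    m.unlock_read()
    done.append("fault")
    page.release()

def madvise():               # thread 1: mark busy, unlock, then wait for page
    page_owned.wait()
    m.lock_write()
    m.busy_and_unlock()
    map_busy.set()
    with page:
        done.append("madvise")
    m.unbusy()

def munmap():                # thread 3: wants the write lock
    map_busy.wait()
    m.lock_write()
    done.append("munmap")
    m.unlock_write()

threads = [threading.Thread(target=f, daemon=True)
           for f in (madvise, fault, munmap)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)
print(done)
```

With the events forcing the same interleaving as in the report, all
three threads finish and the script should print
['fault', 'madvise', 'munmap']: the fault path gets its read lock because
the blocked writer is parked on the busy mark rather than on the lock.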

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56952 CVS commit: [netbsd-10] src/sys/uvm
Date: Mon, 15 May 2023 10:32:53 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Mon May 15 10:32:53 UTC 2023

 Modified Files:
 	src/sys/uvm [netbsd-10]: uvm_map.c

 Log Message:
 Pull up following revision(s) (requested by chs in ticket #167):

 	sys/uvm/uvm_map.c: revision 1.406

 uvm: avoid a deadlock in uvm_map_clean()

 The locking order between map locks and page "busy" locks
 is that the page "busy" lock comes first, but uvm_map_clean()
 breaks this rule by holding a map locked (as reader) while
 waiting for page "busy" locks.

 If another thread is in the page-fault path holding a page
 "busy" lock while waiting for the map lock (as a reader)
 and at the same time a third thread is blocked waiting for
 the map lock as a writer (which blocks the page-fault thread),
 then these three threads will all deadlock with each other.

 Fix this by marking the map "busy" (to block any modifications)
 and unlocking the map lock before possibly waiting for any
 page "busy" locks.

 Martin Pieuchot reported that the same problem existed in OpenBSD;
 he applied this fix there after several people tested it.

 fixes PR 56952


 To generate a diff of this commit:
 cvs rdiff -u -r1.403 -r1.403.2.1 src/sys/uvm/uvm_map.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Tue, 16 May 2023 05:35:57 +0000
State-Changed-Why:
fixed and pulled up
thanks!


>Unformatted:

Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.