NetBSD Problem Report #52706
From www@NetBSD.org Tue Nov 7 16:47:17 2017
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id AAAEF7A1C8
for <gnats-bugs@gnats.NetBSD.org>; Tue, 7 Nov 2017 16:47:17 +0000 (UTC)
Message-Id: <20171107164716.C8FC97A222@mollari.NetBSD.org>
Date: Tue, 7 Nov 2017 16:47:16 +0000 (UTC)
From: n54@gmx.com
Reply-To: n54@gmx.com
To: gnats-bugs@NetBSD.org
Subject: "UVM: p...], uid 0 killed: out of swap" on boot
X-Send-Pr-Version: www-1.0
>Number: 52706
>Category: kern
>Synopsis: "UVM: p...], uid 0 killed: out of swap" on boot
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: chs
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Nov 07 16:50:00 +0000 2017
>Closed-Date: Tue Aug 20 13:15:59 +0000 2019
>Last-Modified: Tue Aug 20 13:15:59 +0000 2019
>Originator: Kamil Rytarowski
>Release: NetBSD/amd64 8.99.5
>Organization:
TNF
>Environment:
NetBSD chieftec 8.99.5 NetBSD 8.99.5 (GENERIC) #4: Sun Nov 5 07:38:39
CET 2017
root@chieftec:/public/netbsd-root/sys/arch/amd64/compile/GENERIC amd64
>Description:
I can reproduce a random process being killed due to "out of swap" on a regular boot. It appears to break random programs, sometimes causing the boot process to halt.
UVM: pid 11. (sh), uid 0 killed: out of swap
[1] Killed (local line; loc...
Enter pathname of shell or RETURN for /bin/sh:
>How-To-Repeat:
Boot NetBSD/amd64 8.99.5.
>Fix:
N/A
>Release-Note:
>Audit-Trail:
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/52706: "UVM: p...], uid 0 killed: out of swap" on boot
Date: Wed, 15 Nov 2017 12:20:47 +0000
I just saw this with a NetBSD-8.99.6/amd64 13 Nov 2017 kernel.
From: Patrick Welche <prlw1@cam.ac.uk>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/52706: "UVM: p...], uid 0 killed: out of swap" on boot
Date: Wed, 15 Nov 2017 20:26:11 +0000
And just in case: a 15 Nov, subr_pool 1.216 kernel too... (twice)
From: "Chuck Silvers" <chs@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52706 CVS commit: src/sys/arch/x86/x86
Date: Mon, 20 Nov 2017 20:57:59 +0000
Module Name: src
Committed By: chs
Date: Mon Nov 20 20:57:58 UTC 2017
Modified Files:
src/sys/arch/x86/x86: pmap.c
Log Message:
In pmap_enter_ma(), only try to allocate pves if we might need them,
and even if that fails, only fail the operation if we later discover
that we really do need them. This implements the requirement that
pmap_enter(PMAP_CANFAIL) must not fail when replacing an existing
mapping with the first mapping of a new page, which is an unintended
consequence of the changes from the rmind-uvmplock branch in 2011.
The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing
pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write).
If that fails and leaves the old pmap entry in place, then UVM won't hold
the right locks when it eventually retries. This entanglement of the UVM and
pmap locking was done in rmind-uvmplock in order to improve performance,
but it also means that the UVM state and pmap state need to be kept in sync
more than they did before. It would be possible to handle this in the UVM code
instead of in the pmap code, but these pmap changes improve the handling of
low memory situations in general, and handling this in UVM would be clunky,
so this seemed like the better way to go.
This somewhat indirectly fixes PR 52706, as well as the failing assertion
about "uvm_page_locked_p(old_pg)". (but only on x86, various other platforms
will need their own changes to handle this issue.)
To generate a diff of this commit:
cvs rdiff -u -r1.265 -r1.266 src/sys/arch/x86/x86/pmap.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
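[Editor's note: the strategy described in the log above can be illustrated with a much-simplified, hypothetical model. All names and structure below are invented for illustration and are not the real NetBSD pmap code; the point is only the control flow: allocate a pv entry opportunistically, and fail a PMAP_CANFAIL enter only if a pv entry turns out to be truly required.]

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the rev. 1.266 strategy (illustrative names,
 * not the real kernel code).
 *
 * alloc_ok models whether the pv allocator can satisfy us right now;
 * page_has_other_mappings models whether the new page already appears
 * on some pv list. The first mapping of a page needs no fresh pv
 * entry, because the entry freed by the replaced mapping is reusable.
 */
static int
fake_pmap_enter(bool alloc_ok, bool page_has_other_mappings, bool canfail)
{
	bool have_pve = alloc_ok;                /* step 1: try, tolerate failure */
	bool need_pve = page_has_other_mappings; /* step 2: do we really need it? */

	if (need_pve && !have_pve) {
		if (canfail)
			return ENOMEM;  /* caller drops locks, waits, retries */
		/* a non-CANFAIL caller would sleep for memory here */
	}
	return 0;                       /* mapping entered successfully */
}
```

This captures the requirement stated in the log: replacing an existing mapping with the first mapping of a new page must not fail, even under memory pressure, while an additional mapping of an already-mapped page may still fail with ENOMEM and be retried.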
From: "Chuck Silvers" <chs@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52706 CVS commit: src/sys/uvm
Date: Mon, 20 Nov 2017 21:06:54 +0000
Module Name: src
Committed By: chs
Date: Mon Nov 20 21:06:54 UTC 2017
Modified Files:
src/sys/uvm: uvm_fault.c
Log Message:
In uvm_fault_upper_enter(), if pmap_enter(PMAP_CANFAIL) fails, assert that
the pmap did not leave around a now-stale pmap mapping for an old page.
If such a pmap mapping still existed after we unlocked the vm_map,
the UVM code would not know later that it would need to lock the
lower layer object while calling the pmap to remove or replace that
stale pmap mapping. See PR 52706 for further details.
To generate a diff of this commit:
cvs rdiff -u -r1.201 -r1.202 src/sys/uvm/uvm_fault.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
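[Editor's note: the invariant asserted by the uvm_fault.c change above can be sketched as follows. This is a hypothetical single-mapping model with invented names, not the real UVM code: if pmap_enter(PMAP_CANFAIL) fails while replacing a mapping, the pmap must have removed the old mapping rather than leaving it in place, so the later retry needs no extra lower-layer locking.]

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pmap with a single virtual address slot. */
struct fake_pmap {
	const void *mapped_pg;  /* page currently mapped at our one va */
};

/* Replace the current mapping with new_pg; on failure the old mapping
 * must already be gone (this models the post-rev-1.266 pmap behavior). */
static int
fake_enter(struct fake_pmap *pm, const void *new_pg, bool alloc_ok)
{
	pm->mapped_pg = NULL;   /* old mapping is removed up front */
	if (!alloc_ok)
		return ENOMEM;  /* CANFAIL failure, but nothing stale left */
	pm->mapped_pg = new_pg;
	return 0;
}

static int
fake_fault_upper_enter(struct fake_pmap *pm, const void *new_pg, bool alloc_ok)
{
	int error = fake_enter(pm, new_pg, alloc_ok);
	if (error != 0) {
		/* the assertion added in uvm_fault.c rev. 1.202, in spirit */
		assert(pm->mapped_pg == NULL);
		/* unlock, wait for memory, and retry the fault would follow */
	}
	return error;
}
```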
Responsible-Changed-From-To: kern-bug-people->chs
Responsible-Changed-By: chs@NetBSD.org
Responsible-Changed-When: Mon, 20 Nov 2017 22:27:42 +0000
Responsible-Changed-Why:
I'm leading the charge on this one
From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/52706 ("UVM: p...], uid 0 killed: out of swap" on boot)
Date: Wed, 29 Nov 2017 00:17:14 +0100
The problem is still there, however less frequent.
$ uname -a
NetBSD chieftec 8.99.7 NetBSD 8.99.7 (GENERIC) #12: Fri Nov 24 21:50:24
CET 2017
root@chieftec:/public/netbsd-root/sys/arch/amd64/compile/GENERIC amd64
From: "Chuck Silvers" <chs@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52706 CVS commit: src/sys/arch
Date: Sat, 27 Jan 2018 23:07:36 +0000
Module Name: src
Committed By: chs
Date: Sat Jan 27 23:07:36 UTC 2018
Modified Files:
src/sys/arch/alpha/alpha: pmap.c
src/sys/arch/m68k/m68k: pmap_motorola.c
src/sys/arch/powerpc/oea: pmap.c
src/sys/arch/sparc64/sparc64: pmap.c
Log Message:
apply the change from arch/x86/x86/pmap.c rev. 1.266 commitid vZRjvmxG7YTHLOfA:
In pmap_enter_ma(), only try to allocate pves if we might need them,
and even if that fails, only fail the operation if we later discover
that we really do need them. If we are replacing an existing mapping,
reuse the pv structure where possible.
This implements the requirement that pmap_enter(PMAP_CANFAIL) must not fail
when replacing an existing mapping with the first mapping of a new page,
which is an unintended consequence of the changes from the rmind-uvmplock
branch in 2011.
The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing
pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write).
If that fails and leaves the old pmap entry in place, then UVM won't hold
the right locks when it eventually retries. This entanglement of the UVM and
pmap locking was done in rmind-uvmplock in order to improve performance,
but it also means that the UVM state and pmap state need to be kept in sync
more than they did before. It would be possible to handle this in the UVM code
instead of in the pmap code, but these pmap changes improve the handling of
low memory situations in general, and handling this in UVM would be clunky,
so this seemed like the better way to go.
This somewhat indirectly fixes PR 52706 on the remaining platforms where
this problem existed.
To generate a diff of this commit:
cvs rdiff -u -r1.261 -r1.262 src/sys/arch/alpha/alpha/pmap.c
cvs rdiff -u -r1.69 -r1.70 src/sys/arch/m68k/m68k/pmap_motorola.c
cvs rdiff -u -r1.94 -r1.95 src/sys/arch/powerpc/oea/pmap.c
cvs rdiff -u -r1.307 -r1.308 src/sys/arch/sparc64/sparc64/pmap.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52706 CVS commit: [netbsd-8] src/sys
Date: Tue, 27 Feb 2018 09:07:34 +0000
Module Name: src
Committed By: martin
Date: Tue Feb 27 09:07:33 UTC 2018
Modified Files:
src/sys/arch/alpha/alpha [netbsd-8]: pmap.c
src/sys/arch/m68k/m68k [netbsd-8]: pmap_motorola.c
src/sys/arch/powerpc/oea [netbsd-8]: pmap.c
src/sys/arch/sparc64/sparc64 [netbsd-8]: pmap.c
src/sys/arch/x86/x86 [netbsd-8]: pmap.c
src/sys/dev/dtv [netbsd-8]: dtv_scatter.c
src/sys/dev/marvell [netbsd-8]: mvxpsec.c
src/sys/kern [netbsd-8]: subr_extent.c subr_pool.c uipc_mbuf.c
src/sys/opencrypto [netbsd-8]: crypto.c
src/sys/sys [netbsd-8]: mbuf.h pool.h
src/sys/ufs/chfs [netbsd-8]: chfs_malloc.c
src/sys/uvm [netbsd-8]: uvm_fault.c
Log Message:
Pull up following revision(s) (requested by mrg in ticket #593):
sys/dev/marvell/mvxpsec.c: revision 1.2
sys/arch/m68k/m68k/pmap_motorola.c: revision 1.70
sys/opencrypto/crypto.c: revision 1.102
sys/arch/sparc64/sparc64/pmap.c: revision 1.308
sys/ufs/chfs/chfs_malloc.c: revision 1.5
sys/arch/powerpc/oea/pmap.c: revision 1.95
sys/sys/pool.h: revision 1.80,1.82
sys/kern/subr_pool.c: revision 1.209-1.216,1.219-1.220
sys/arch/alpha/alpha/pmap.c: revision 1.262
sys/kern/uipc_mbuf.c: revision 1.173
sys/uvm/uvm_fault.c: revision 1.202
sys/sys/mbuf.h: revision 1.172
sys/kern/subr_extent.c: revision 1.86
sys/arch/x86/x86/pmap.c: revision 1.266 (via patch)
sys/dev/dtv/dtv_scatter.c: revision 1.4
Allow only one pending call to a pool's backing allocator at a time.
Candidate fix for problems with hanging after kva fragmentation related
to PR kern/45718.
Proposed on tech-kern:
https://mail-index.NetBSD.org/tech-kern/2017/10/23/msg022472.html
Tested by bouyer@ on i386.
This makes one small change to the semantics of pool_prime and
pool_setlowat: they may fail with EWOULDBLOCK instead of ENOMEM, if
there is a pending call to the backing allocator in another thread but
we are not actually out of memory. That is unlikely because nearly
always these are used during initialization, when the pool is not in
use.
Define the new flag too for previous commit.
pool_grow can now fail even when sleeping is ok. Catch this case in pool_get
and retry.
Assert that pool_get failure happens only with PR_NOWAIT.
This would have caught the mistake I made last week leading to null
pointer dereferences all over the place, a mistake which I evidently
poorly scheduled alongside maxv's change to the panic message on x86
for null pointer dereferences.
Since pr_lock is now used to wait for two things (PR_GROWING and
PR_WANTED), we need to loop for the condition we want.
make the KASSERTMSG/panic strings consistent as '%s: [%s], __func__, wchan'
Handle the ERESTART case from pool_grow()
don't pass 0 to the pool flags
Guess pool_cache_get(pc, 0) means PR_WAITOK here.
Earlier on in the same context we use kmem_alloc(sz, KM_SLEEP).
use PR_WAITOK everywhere.
use PR_NOWAIT.
Don't use 0 for PR_NOWAIT
use PR_NOWAIT instead of 0
panic ex nihilo -- PR_NOWAITing for zerot
Add assertions that either PR_WAITOK or PR_NOWAIT are set.
- fix an assert; we can reach there if we are nowait or limitfail.
- when priming the pool and failing with ERESTART, don't decrement the number
of pages; this avoids the issue of returning an ERESTART when we get to 0,
and is more correct.
- simplify the pool_grow code, and don't wakeup things if we ENOMEM.
In pmap_enter_ma(), only try to allocate pves if we might need them,
and even if that fails, only fail the operation if we later discover
that we really do need them. This implements the requirement that
pmap_enter(PMAP_CANFAIL) must not fail when replacing an existing
mapping with the first mapping of a new page, which is an unintended
consequence of the changes from the rmind-uvmplock branch in 2011.
The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing
pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write).
If that fails and leaves the old pmap entry in place, then UVM won't hold
the right locks when it eventually retries. This entanglement of the UVM and
pmap locking was done in rmind-uvmplock in order to improve performance,
but it also means that the UVM state and pmap state need to be kept in sync
more than they did before. It would be possible to handle this in the UVM code
instead of in the pmap code, but these pmap changes improve the handling of
low memory situations in general, and handling this in UVM would be clunky,
so this seemed like the better way to go.
This somewhat indirectly fixes PR 52706, as well as the failing assertion
about "uvm_page_locked_p(old_pg)". (but only on x86, various other platforms
will need their own changes to handle this issue.)
In uvm_fault_upper_enter(), if pmap_enter(PMAP_CANFAIL) fails, assert that
the pmap did not leave around a now-stale pmap mapping for an old page.
If such a pmap mapping still existed after we unlocked the vm_map,
the UVM code would not know later that it would need to lock the
lower layer object while calling the pmap to remove or replace that
stale pmap mapping. See PR 52706 for further details.
hopefully workaround the irregularly "fork fails in init" problem.
if a pool is growing, and the grower is PR_NOWAIT, mark this.
if another caller wants to grow the pool and is also PR_NOWAIT,
busy-wait for the original caller, which should either succeed
or hard-fail fairly quickly.
implement the busy-wait by unlocking and relocking this pool's
mutex and returning ERESTART. other methods (such as having
the caller do this) were significantly more code and this hack
is fairly localised.
ok chs@ riastradh@
Don't release the lock in the PR_NOWAIT allocation. Move flags setting
after the acquiring the mutex. (from Tobias Nygren)
apply the change from arch/x86/x86/pmap.c rev. 1.266 commitid vZRjvmxG7YTHLOfA:
In pmap_enter_ma(), only try to allocate pves if we might need them,
and even if that fails, only fail the operation if we later discover
that we really do need them. If we are replacing an existing mapping,
reuse the pv structure where possible.
This implements the requirement that pmap_enter(PMAP_CANFAIL) must not fail
when replacing an existing mapping with the first mapping of a new page,
which is an unintended consequence of the changes from the rmind-uvmplock
branch in 2011.
The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing
pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write).
If that fails and leaves the old pmap entry in place, then UVM won't hold
the right locks when it eventually retries. This entanglement of the UVM and
pmap locking was done in rmind-uvmplock in order to improve performance,
but it also means that the UVM state and pmap state need to be kept in sync
more than they did before. It would be possible to handle this in the UVM code
instead of in the pmap code, but these pmap changes improve the handling of
low memory situations in general, and handling this in UVM would be clunky,
so this seemed like the better way to go.
This somewhat indirectly fixes PR 52706 on the remaining platforms where
this problem existed.
To generate a diff of this commit:
cvs rdiff -u -r1.261 -r1.261.8.1 src/sys/arch/alpha/alpha/pmap.c
cvs rdiff -u -r1.69 -r1.69.8.1 src/sys/arch/m68k/m68k/pmap_motorola.c
cvs rdiff -u -r1.94 -r1.94.8.1 src/sys/arch/powerpc/oea/pmap.c
cvs rdiff -u -r1.307 -r1.307.6.1 src/sys/arch/sparc64/sparc64/pmap.c
cvs rdiff -u -r1.245.6.1 -r1.245.6.2 src/sys/arch/x86/x86/pmap.c
cvs rdiff -u -r1.3 -r1.3.2.1 src/sys/dev/dtv/dtv_scatter.c
cvs rdiff -u -r1.1 -r1.1.12.1 src/sys/dev/marvell/mvxpsec.c
cvs rdiff -u -r1.80 -r1.80.8.1 src/sys/kern/subr_extent.c
cvs rdiff -u -r1.207 -r1.207.6.1 src/sys/kern/subr_pool.c
cvs rdiff -u -r1.172 -r1.172.6.1 src/sys/kern/uipc_mbuf.c
cvs rdiff -u -r1.78.2.4 -r1.78.2.5 src/sys/opencrypto/crypto.c
cvs rdiff -u -r1.170.2.1 -r1.170.2.2 src/sys/sys/mbuf.h
cvs rdiff -u -r1.79 -r1.79.10.1 src/sys/sys/pool.h
cvs rdiff -u -r1.4 -r1.4.30.1 src/sys/ufs/chfs/chfs_malloc.c
cvs rdiff -u -r1.199.6.2 -r1.199.6.3 src/sys/uvm/uvm_fault.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
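[Editor's note: the subr_pool behavior pulled up above — only one pending call into the backing allocator, with a second PR_NOWAIT grower getting ERESTART and pool_get() retrying — can be sketched with a hypothetical single-threaded model. All names below are invented for illustration; the real code coordinates threads under the pool mutex.]

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef ERESTART
#define ERESTART 85	/* not in ISO C; value is arbitrary for this model */
#endif

struct fake_pool {
	bool growing;		/* models PR_GROWING: a grow is in flight */
	int  items;		/* free items on the pool's freelist */
	int  grow_budget;	/* grows the backing allocator will still allow */
};

static int
fake_pool_grow(struct fake_pool *pp)
{
	if (pp->growing)
		return ERESTART;  /* someone else is growing; caller retries */
	if (pp->grow_budget == 0)
		return ENOMEM;    /* backing allocator is really out */
	pp->growing = true;
	pp->grow_budget--;
	pp->items += 4;           /* pretend one backing page yields 4 items */
	pp->growing = false;
	return 0;
}

static int
fake_pool_get(struct fake_pool *pp)
{
	for (;;) {
		if (pp->items > 0) {
			pp->items--;
			return 0;
		}
		int error = fake_pool_grow(pp);
		if (error == ERESTART)
			continue;	/* busy-wait: the other grower should
					 * succeed or hard-fail quickly */
		if (error != 0)
			return error;	/* hard failure under PR_NOWAIT */
	}
}
```

The design choice noted in the log applies here too: handling ERESTART inside pool_get keeps the hack localized, rather than making every caller aware of the retry.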
State-Changed-From-To: open->feedback
State-Changed-By: maya@NetBSD.org
State-Changed-When: Sun, 18 Aug 2019 06:43:25 +0000
State-Changed-Why:
I don't recall seeing reports of similar things. Chuck, are there remaining issues? What are they?
State-Changed-From-To: feedback->closed
State-Changed-By: kamil@NetBSD.org
State-Changed-When: Tue, 20 Aug 2019 15:15:59 +0200
State-Changed-Why:
Probably fixed. No longer reported since then.
>Unformatted: