NetBSD Problem Report #56849
From www@netbsd.org Sun May 22 18:14:26 2022
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 541B51A9242
for <gnats-bugs@gnats.NetBSD.org>; Sun, 22 May 2022 18:14:26 +0000 (UTC)
Message-Id: <20220522181424.AF07C1A9243@mollari.NetBSD.org>
Date: Sun, 22 May 2022 18:14:24 +0000 (UTC)
From: tgl@sss.pgh.pa.us
Reply-To: tgl@sss.pgh.pa.us
To: gnats-bugs@NetBSD.org
Subject: Wacko kernel memory accounting in current/hppa
X-Send-Pr-Version: www-1.0
>Number: 56849
>Category: port-hppa
>Synopsis: Wacko kernel memory accounting in current/hppa
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-hppa-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun May 22 18:15:00 +0000 2022
>Closed-Date: Sun May 29 06:28:33 +0000 2022
>Last-Modified: Sun May 29 06:28:33 +0000 2022
>Originator: Tom Lane
>Release: HEAD/202205210400Z
>Organization:
PostgreSQL Global Development Group
>Environment:
NetBSD sss2.sss.pgh.pa.us 9.99.96 NetBSD 9.99.96 (TGL) #0: Sat May 21 16:50:57 EDT 2022 tgl@nuc1.sss.pgh.pa.us:/home/tgl/netbsd-H-202205210400Z/obj.hppa/sys/arch/hppa/compile/TGL hppa
This kernel is custom only to the extent of having a particular root device selection in its config, so I don't have to intervene manually when netbooting (root filesystem is on a USB drive that the boot ROM doesn't know how to boot from)
>Description:
During and after a fairly long, disk-access-intensive test suite, I observe completely wacko reports from top(1) and ps(1) about the kernel's memory consumption. With the machine sitting
idle post-run, top says
load averages: 0.00, 0.00, 0.00; up 0+19:01:17 12:37:47
19 processes: 18 sleeping, 1 on CPU
CPU states: 0.0% user, 0.0% nice, 1.5% system, 0.0% interrupt, 98.5% idle
Memory: 38M Act, 4852K Inact, 26M Wired, 8456K Exec, 32M File, 296M Free
Swap: 1024M Total, 9068K Used, 1015M Free
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
0 root 125 0 0K 79G vdrain 60:25 0.00% 0.00% [system]
...
"79G"? This machine only has 512M RAM, plus the 1024M swap partition.
The "Memory:" line is less obviously implausible, but it's still not right, because those numbers only sum to about 405M.
"ps auxww" seems to be looking at the same incorrect estimate, though
it presents it much differently:
USER PID %CPU %MEM VSZ RSS TTY STAT STARTED TIME COMMAND
root 0 0.0 15781.9 0 3050660 ? DKl 5:37PM 60:26.58 [system]
...
... or wait, that %MEM estimate tracks pretty closely with 79G, but
that RSS value only means 3G doesn't it?
These numbers appear to have crept up slowly since boot but then
stabilized at what I'm showing here. I'm guessing some sort of
pseudo-leak in memory accounting, but not actual memory consumption.
This test suite is extremely file-access-heavy, so something wrong in
vnode accounting could fit the facts perhaps. The suite ran a good
deal slower than I was hoping for, almost double the time it takes
under HP-UX on the same hardware, so I'm wondering if this bogus
accounting is having real performance effects somewhere.
For comparison's sake I looked at a nearby machine running 9.2/amd64,
and there, top and ps agree that the kernel is using about 26M which
seems generally sane.
>How-To-Repeat:
I was running Postgres "make check-world", but I doubt it's specific to this exact workload.
>Fix:
Nick Hudson speculates there's something wrong in sys/arch/hppa/hppa/pmap.c
>Release-Note:
>Audit-Trail:
From: Tom Lane <tgl@sss.pgh.pa.us>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-hppa/56849: Wacko kernel memory accounting in current/hppa
Date: Sun, 22 May 2022 20:15:58 -0400
I got a chance to reboot the machine, and I can now say that the
numbers are weird even very shortly after boot:
load averages: 0.13, 0.16, 0.08; up 0+00:03:58 18:=
47:42
20 processes: 19 sleeping, 1 on CPU
CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% i=
dle
Memory: 57M Act, 26M Wired, 16M Exec, 21M File, 380M Free
Swap: 1024M Total, 1024M Free
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAN=
D
0 root 96 0 0K 656M usbevt 0:02 0.00% 0.00% [syste=
m]
USER PID %CPU %MEM VSZ RSS TTY STAT STARTED TIME COMMAND
root 0 0.0 128.2 0 671900 ? DKl 6:44PM 0:02.75 [system]
... and it goes up from there.
regards, tom lane
From: Nick Hudson <nick.hudson@gmx.co.uk>
To: gnats-bugs@netbsd.org, port-hppa-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: port-hppa/56849: Wacko kernel memory accounting in current/hppa
Date: Tue, 24 May 2022 06:50:08 +0100
This is a multi-part message in MIME format.
--------------WIqpHylpkgtuYOD7Yk76jtfL
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Candidate fix...
Nick
--------------WIqpHylpkgtuYOD7Yk76jtfL
Content-Type: text/x-patch; charset=UTF-8; name="pr56849.diff"
Content-Disposition: attachment; filename="pr56849.diff"
Content-Transfer-Encoding: base64
SW5kZXg6IHN5cy9hcmNoL2hwcGEvaHBwYS9wbWFwLmMKPT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PQpSQ1MgZmls
ZTogL2N2c3Jvb3Qvc3JjL3N5cy9hcmNoL2hwcGEvaHBwYS9wbWFwLmMsdgpyZXRyaWV2aW5n
IHJldmlzaW9uIDEuMTE2CmRpZmYgLXUgLXAgLXIxLjExNiBwbWFwLmMKLS0tIHN5cy9hcmNo
L2hwcGEvaHBwYS9wbWFwLmMJOSBBcHIgMjAyMiAyMzozODozMiAtMDAwMAkxLjExNgorKysg
c3lzL2FyY2gvaHBwYS9ocHBhL3BtYXAuYwkyNCBNYXkgMjAyMiAwNTo0OTowNSAtMDAwMApA
QCAtMTEwOSw3ICsxMTA5LDcgQEAgcG1hcF9ib290c3RyYXAodmFkZHJfdCB2c3RhcnQpCiAJ
CWVsc2UgaWYgKHZhID09IHV2bV9sd3BfZ2V0dWFyZWEoJmx3cDApICsgVVNQQUNFIC0gUEFH
RV9TSVpFKQogCQkJcHJvdCA9IFVWTV9QUk9UX05PTkU7CiAjZW5kaWYKLQkJcG1hcF9rZW50
ZXJfcGEodmEsIHZhLCBwcm90LCAwKTsKKwkJcG1hcF9rZW50ZXJfcGEodmEsIHZhLCBwcm90
LCBQTUFQX0RJUkVDVE1BUCk7CiAJfQogCiAJLyogWFhYTkggdXBkYXRlICovCkBAIC0xOTkw
LDggKzE5OTMsMTEgQEAgcG1hcF9rZW50ZXJfcGEodmFkZHJfdCB2YSwgcGFkZHJfdCBwYSwg
dgogCSAgICBwbWFwX3Byb3QocG1hcF9rZXJuZWwoKSwgcHJvdCAmIFZNX1BST1RfQUxMKSk7
CiAJaWYgKElTX0lPUEFHRV9QKHBhKSB8fCAoZmxhZ3MgJiBQTUFQX05PQ0FDSEUpKQogCQlw
dGUgfD0gUFRFX1BST1QoVExCX1VOQ0FDSEVBQkxFKTsKLQlwbWFwX2tlcm5lbCgpLT5wbV9z
dGF0cy53aXJlZF9jb3VudCsrOwotCXBtYXBfa2VybmVsKCktPnBtX3N0YXRzLnJlc2lkZW50
X2NvdW50Kys7CisKKwlpZiAoKGZsYWdzICYgUE1BUF9ESVJFQ1RNQVApID09IDApIHsKKwkJ
cG1hcF9rZXJuZWwoKS0+cG1fc3RhdHMud2lyZWRfY291bnQrKzsKKwkJcG1hcF9rZXJuZWwo
KS0+cG1fc3RhdHMucmVzaWRlbnRfY291bnQrKzsKKwl9CiAJaWYgKG9wdGUpCiAJCXBtYXBf
cHRlX2ZsdXNoKHBtYXBfa2VybmVsKCksIHZhLCBvcHRlKTsKIApAQCAtMjA3MSw2ICsyMDc3
LDkgQEAgcG1hcF9rcmVtb3ZlKHZhZGRyX3QgdmEsIHZzaXplX3Qgc2l6ZSkKIAkJcG1hcF9w
dGVfZmx1c2gocG1hcCwgdmEsIHB0ZSk7CiAJCXBtYXBfcHRlX3NldChwZGUsIHZhLCAwKTsK
IAorCQlwbWFwLT5wbV9zdGF0cy53aXJlZF9jb3VudC0tOworCQlwbWFwLT5wbV9zdGF0cy5y
ZXNpZGVudF9jb3VudC0tOworCiAJCXBnID0gcG1hcF9pbml0aWFsaXplZCA/IFBIWVNfVE9f
Vk1fUEFHRShQVEVfUEFHRShwdGUpKSA6IE5VTEw7CiAJCWlmIChwZyAhPSBOVUxMKSB7CiAJ
CQlzdHJ1Y3Qgdm1fcGFnZV9tZCAqIGNvbnN0IG1kID0gVk1fUEFHRV9UT19NRChwZyk7Cklu
ZGV4OiBzeXMvYXJjaC9ocHBhL2luY2x1ZGUvcG1hcC5oCj09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0KUkNTIGZp
bGU6IC9jdnNyb290L3NyYy9zeXMvYXJjaC9ocHBhL2luY2x1ZGUvcG1hcC5oLHYKcmV0cmll
dmluZyByZXZpc2lvbiAxLjQwCmRpZmYgLXUgLXAgLXIxLjQwIHBtYXAuaAotLS0gc3lzL2Fy
Y2gvaHBwYS9pbmNsdWRlL3BtYXAuaAkxNCBNYXIgMjAyMCAxNDowNTo0MiAtMDAwMAkxLjQw
CisrKyBzeXMvYXJjaC9ocHBhL2luY2x1ZGUvcG1hcC5oCTI0IE1heSAyMDIyIDA1OjQ5OjA1
IC0wMDAwCkBAIC03Miw2ICs3Miw4IEBAIHN0cnVjdCBwbWFwIHsKICNkZWZpbmUJSFBQQV9N
QVhfUElECTB4ZmZmYQogI2RlZmluZQlIUFBBX1NJRF9NQVgJMHg3ZmZkCiAKKyNkZWZpbmUJ
UE1BUF9ESVJFQ1RNQVAJMHgxMDAwMDAwMCAvKiBrZW50ZXJfcGEgKi8KKwogLyoKICAqIERP
TidUIENIQU5HRSBUSElTIC0gdGhpcyBpcyBhc3N1bWVkIGluIGxvdHMgb2YgcGxhY2VzLgog
ICovCg==
--------------WIqpHylpkgtuYOD7Yk76jtfL--
From: Tom Lane <tgl@sss.pgh.pa.us>
To: gnats-bugs@netbsd.org, Nick Hudson <nick.hudson@gmx.co.uk>
Cc: port-hppa-maintainer@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: port-hppa/56849: Wacko kernel memory accounting in current/hppa
Date: Tue, 24 May 2022 16:55:11 -0400
Nick Hudson <nick.hudson@gmx.co.uk> writes:
> Candidate fix...
I can confirm that with this patch installed, top and ps report a
reasonably stable kernel size, ranging from ~80M just after boot
to ~95M at peak load. This is a bit more than I was expecting, but
it's probably fine given that I kicked some things up after hitting
ENFILE errors:
$ cat /etc/sysctl.conf
...
# tgl additions
kern.ipc.semmni=3D100
kern.ipc.semmns=3D1000
kern.maxvnodes=3D50000
kern.maxproc=3D1000
kern.maxfiles=3D5000
I poked around with some of the other monitoring tools mentioned in
the NetBSD Guide, and I noticed that "sysstat bufcache" is a tad
confused about metadata buffers:
/0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
Load Average |||||||||||||||||||||||||||||||||||||||||||||||
2144 metadata buffers using 98681168592896 kBytes of memo=
ry (19777529200%).
55451 pages for cached file data using 221804 kBytes of memory (44%)=
.
15502 pages for executables using 62008 kBytes of memory (12%)=
.
35216 pages for anon (non-file) data 140864 kBytes of memory (28%)=
.
495 free pages 1980 kBytes of memory ( 0%)=
.
File System Bufs used % kB in use % Bufsize kB % Util %
/home 1227 57 12852 0 12852 0 100
/usr 524 24 4460 0 4498 0 99
/var 36 1 226 0 226 0 100
/ 19 0 186 0 186 0 100
Total: 1806 84 17724 0 17762 0 100
error reading kmem for vaddr at 0xda030 (Bad address)
That error message at the bottom of the screen might or might not
be related to the silly stat. I'm not sure to what extent these
numbers should be expected to match what "top" shows, but they
don't seem crazy on their own with the exception of that one.
regards, tom lane
State-Changed-From-To: open->feedback
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Sat, 28 May 2022 06:39:43 +0000
State-Changed-Why:
Changes committed. Other statistics problem exist elsewhere too, I think.
OK to close?
From: Tom Lane <tgl@sss.pgh.pa.us>
To: gnats-bugs@netbsd.org, skrll@NetBSD.org
Cc: port-hppa-maintainer@netbsd.org, netbsd-bugs@netbsd.org,
gnats-admin@netbsd.org
Subject: Re: port-hppa/56849 (Wacko kernel memory accounting in current/hppa)
Date: Sat, 28 May 2022 12:57:49 -0400
skrll@NetBSD.org writes:
> Changes committed. Other statistics problem exist elsewhere too, I think.
> OK to close?
OK by me; this is an improvement, even if it doesn't fix every
problem in the area.
regards, tom lane
State-Changed-From-To: feedback->closed
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Sun, 29 May 2022 06:28:33 +0000
State-Changed-Why:
Submitter agreed to close
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.