NetBSD Problem Report #58552
From root@netbsd.org Sun Aug 4 08:26:27 2024
Return-Path: <root@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits)
client-signature RSA-PSS (2048 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id EFAE11A923C
for <gnats-bugs@gnats.NetBSD.org>; Sun, 4 Aug 2024 08:26:26 +0000 (UTC)
Message-Id: <20240804082625.BBF941985D9@morden.netbsd.org>
Date: Sun, 4 Aug 2024 08:26:25 +0000 (UTC)
From: spz@NetBSD.org
Reply-To: spz@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: panic via genfs_getpages - ufs_bmaparray
X-Send-Pr-Version: 3.95
>Number: 58552
>Category: kern
>Synopsis: panic via genfs_getpages - ufs_bmaparray
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Aug 04 08:30:00 +0000 2024
>Last-Modified: Fri Aug 30 17:25:00 +0000 2024
>Originator: S.P.Zeidler
>Release: NetBSD 10.0_STABLE 20240527
>Organization:
The NetBSD Foundation
>Environment:
System: NetBSD morden.netbsd.org 10.0_STABLE NetBSD 10.0_STABLE (NBFTP) #0: Tue May 28 07:18:01 UTC 2024 spz@franklin.NetBSD.org:/home/netbsd/10/amd64/obj/sys/arch/amd64/compile/NBFTP amd64
Architecture: x86_64
Machine: amd64
>Description:
[ 5864557.5601213] uvm_fault(0xfffff48ffb0e0668, 0x826ea1000, 1) -> e
[ 5864557.5701215] fatal page fault in supervisor mode
[ 5864557.5701215] trap type 6 code 0 rip 0xffffffff808d0cd4 cs 0x8 rflags 0x10202 cr2 0x826ea10f8 ilevel 0 rsp 0xffff828c9c4b2338
[ 5864557.5701215] curlwp 0xfffff48b150b5680 pid 9552.9552 lowest kstack 0xffff828c9c4ae2c0
kernel: page fault trap, code=0
Stopped in pid 9552.9552 (rsync) at netbsd:incore+0x32: cmpq %rsi,d8(%rax)
incore() at netbsd:incore+0x32
ufs_bmaparray() at netbsd:ufs_bmaparray+0x16c
ufs_bmap() at netbsd:ufs_bmap+0x4d
VOP_BMAP() at netbsd:VOP_BMAP+0x6a
genfs_getpages() at netbsd:genfs_getpages+0xd55
VOP_GETPAGES() at netbsd:VOP_GETPAGES+0x58
ra_startio() at netbsd:ra_startio+0x94
uvm_ra_request() at netbsd:uvm_ra_request+0x124
uvn_get() at netbsd:uvn_get+0xa7
ubc_fault() at netbsd:ubc_fault+0x11e
uvm_fault_internal() at netbsd:uvm_fault_internal+0x3eb
trap() at netbsd:trap+0x2f9
--- trap (number 6) ---
copyout() at netbsd:copyout+0x33
ubc_uiomove() at netbsd:ubc_uiomove+0x104
ffs_read() at netbsd:ffs_read+0xd0
VOP_READ() at netbsd:VOP_READ+0x42
vn_read() at netbsd:vn_read+0x18e
dofileread() at netbsd:dofileread+0x79
sys_read() at netbsd:sys_read+0x49
syscall() at netbsd:syscall+0x1fc
--- syscall (number 3) ---
netbsd:syscall+0x1fc:
ds 2310
es 7048
fs 7098
gs 7040
rdi fffff48d36a6d940
rsi fffffffffffffff4
rbp ffff828c9c4b2380
rbx 0
rdx ffff82805d132000
rcx fffff48b150b5680
rax 826ea1020
r8 0
r9 ffff828c9c4b23dc
r10 fffffffffffffff4
r11 2
r12 fffff48d36a6d940
r13 fffffffffffffff4
r14 0
r15 fffff48b3807f200
rip ffffffff808d0cd4 incore+0x32
cs 8
rflags 10202
rsp ffff828c9c4b2338
ss 10
netbsd:incore+0x32: cmpq %rsi,d8(%rax)
a savecore is available, ask for location and access
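
The register dump can be cross-checked against the trap line. The sketch
below is an illustration and not part of the original report. It assumes
that ddb prints the displacement "d8" in hexadecimal and that %rsi holds
the second argument of incore() under the amd64 calling convention. All
function names in it are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective address of a "disp(%base)" memory operand, as in
 * "cmpq %rsi,d8(%rax)": base register value plus displacement.
 */
static uint64_t
effective_address(uint64_t base, uint64_t disp)
{
	return base + disp;
}

/* Reinterpret a raw 64-bit register value as a signed integer. */
static int64_t
as_signed(uint64_t reg)
{
	return (int64_t)reg;
}

/* Check the values from the dump against each other. */
static void
check_trap_values(void)
{
	const uint64_t rax = 0x826ea1020ULL;		/* from the register dump */
	const uint64_t cr2 = 0x826ea10f8ULL;		/* faulting address */
	const uint64_t rsi = 0xfffffffffffffff4ULL;	/* from the register dump */

	/* Assumption: the "d8" displacement is hexadecimal 0xd8. */
	assert(effective_address(rax, 0xd8) == cr2);

	/* %rsi read as a signed value is a small negative number. */
	assert(as_signed(rsi) == -12);
}
```

Under these assumptions the faulting address in cr2 is exactly %rax plus
the displacement, so the bad pointer is the one held in %rax, and %rsi
read as a signed value is -12.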
>How-To-Repeat:
hopefully not too often
>Fix:
>Audit-Trail:
From: "J. Hannken-Illjes" <hannken@mailbox.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/58552: panic via genfs_getpages - ufs_bmaparray
Date: Thu, 29 Aug 2024 11:25:59 +0200
From gdb frame #9 "incore(0xfffff48d36a6d940, -12)" we traverse the
hash list "bufhashtbl[3581645]->lh_first == 0xfffff48e57516230".
This buffer looks invalid:
b_iodone = 0xffff828027d87100,
b_error = 794242176,
b_resid = -32128,
b_flags = 1131827200,
b_prio = -32128,
b_bufsize = 660073344,
b_bcount = -32128,
...
The buffer is the 3rd element of "bufpl" item header 0xfffff4907870d150,
page 0xfffff48e57516000. All 15 buffers from this page are allocated.
Page 0xfffff48e57516000 is a large page from the direct map, not sure
if it matters (direct map 0xfffff484f8e00000 .. 0xfffff49138dfffff).
Printing the entire page:
0xfffff48e57516000: 0xffff82804a3f6380 0xffff82802784de80
0xfffff48e57516010: 0xffff82804e020600 0xffff828036284d80
0xfffff48e57516020: 0xffff82803c3f8d00 0xffff828031b42c80
0xfffff48e57516030: 0xffff82803054dc00 0xffff828036e98f80
...
0xfffff48e57516230: 0xffff82804d2e0e80 0xffff8280311de600
0xfffff48e57516240: 0xffff82803144c580 0xffff828027d87100
0xfffff48e57516250: 0xffff82802f572c80 0xffff828043765000
0xfffff48e57516260: 0xffff82802757eb80 0xffff8280501ea300
0xfffff48e57516270: 0xffff82802c916e00 0xffff828048faa400
0xfffff48e57516280: 0xffff8280547ebb80 0xffff828040165300
0xfffff48e57516290: 0xffff82804191ed80 0xffff82803eb2dd00
...
0xfffff48e57516fc0: 0xffff828037f58180 0xffff828034a14900
0xfffff48e57516fd0: 0xffff82804ee23080 0xffff8280519c0000
0xfffff48e57516fe0: 0xffff82803c38fb80 0xffff82802bc49700
0xfffff48e57516ff0: 0xffff828052c3f680 0xffff828033468200
All these entries are valid "struct vm_page *" pointers so it looks like
this page allocated to the "bufpl" got overwritten with 512 pointers
to vm pages.
Which operation creates an array of at least 512 "struct vm_page"
pointers and therefore is a candidate for trashing?
--
J. Hannken-Illjes
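
The figure of 512 pointers follows from the page geometry. The sketch
below is an illustration added for clarity, not code from the kernel. It
assumes a 4096-byte base page and 8-byte pointers, as on amd64, and its
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED	4096u	/* assumption: amd64 base page size */
#define PTR_SIZE_ASSUMED	8u	/* assumption: 64-bit pointers */

/* Number of pointers that fit in one page. */
static unsigned
pointers_per_page(void)
{
	return PAGE_SIZE_ASSUMED / PTR_SIZE_ASSUMED;
}

/* Start address of the page containing addr. */
static uint64_t
page_base(uint64_t addr)
{
	return addr & ~(uint64_t)(PAGE_SIZE_ASSUMED - 1);
}

/* Index of the pointer-sized slot within its page that addr falls in. */
static unsigned
pointer_slot(uint64_t addr)
{
	return (unsigned)((addr & (PAGE_SIZE_ASSUMED - 1)) / PTR_SIZE_ASSUMED);
}
```

With these assumptions, pointers_per_page() is 512, the buffer at
0xfffff48e57516230 lies in the page starting at 0xfffff48e57516000, and
its first word occupies pointer slot 70 of that page. The dump runs from
offset 0x000 to 0xff8, which is slots 0 through 511.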
From: Taylor R Campbell <riastradh@NetBSD.org>
To: "J. Hannken-Illjes" <hannken@mailbox.org>
Cc: "S.P.Zeidler" <spz@NetBSD.org>,
gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58552: panic via genfs_getpages - ufs_bmaparray
Date: Fri, 30 Aug 2024 01:09:02 +0000
> Date: Thu, 29 Aug 2024 11:25:59 +0200
> From: "J. Hannken-Illjes" <hannken@mailbox.org>
>
> All these entries are valid "struct vm_page *" pointers so it looks like
> this page allocated to the "bufpl" got overwritten with 512 pointers
> to vm pages.
>
> Which operation creates an array of at least 512 "struct vm_page"
> pointers and therefore is a candidate for trashing?
genfs_getpages potentially does this, say for fsync or msync of a 2MB
range:
   308 	const int pgs_size = sizeof(struct vm_page *) *
   309 	    ((endoffset - startoffset) >> PAGE_SHIFT);
   310 	struct vm_page **pgs, *pgs_onstack[UBC_MAX_PAGES];
   311
   312 	if (pgs_size > sizeof(pgs_onstack)) {
   313 		pgs = kmem_zalloc(pgs_size, async ? KM_NOSLEEP : KM_SLEEP);
https://nxr.netbsd.org/xref/src/sys/miscfs/genfs/genfs_io.c?r=1.104#308
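
The arithmetic behind the 2MB figure can be sketched as follows. This is
an illustration that mirrors the pgs_size expression quoted above, not
the kernel code itself. It assumes PAGE_SHIFT is 12 and that
sizeof(struct vm_page *) is 8, as on amd64. The function name and the
constants are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED	12	/* assumption: 4096-byte pages */
#define VM_PAGE_PTR_SIZE	8	/* stand-in for sizeof(struct vm_page *) */

/*
 * Size in bytes of an array holding one page pointer per page in the
 * byte range [startoffset, endoffset).
 */
static size_t
pgs_size_for_range(uint64_t startoffset, uint64_t endoffset)
{
	return VM_PAGE_PTR_SIZE *
	    (size_t)((endoffset - startoffset) >> PAGE_SHIFT_ASSUMED);
}
```

For a 2MB range this gives 4096 bytes, that is 512 pointers, which under
the same assumptions is exactly one page.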
From: "J. Hannken-Illjes" <hannken@mailbox.org>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: "S.P.Zeidler" <spz@NetBSD.org>, gnats-bugs@NetBSD.org,
netbsd-bugs@NetBSD.org
Subject: Re: kern/58552: panic via genfs_getpages - ufs_bmaparray
Date: Fri, 30 Aug 2024 16:48:58 +0200
On Fri, Aug 30, 2024 at 01:09:02AM +0000, Taylor R Campbell wrote:
> > Date: Thu, 29 Aug 2024 11:25:59 +0200
> > From: "J. Hannken-Illjes" <hannken@mailbox.org>
> >
> > All these entries are valid "struct vm_page *" pointers so it looks like
> > this page allocated to the "bufpl" got overwritten with 512 pointers
> > to vm pages.
> >
> > Which operation creates an array of at least 512 "struct vm_page"
> > pointers and therefore is a candidate for trashing?
>
> genfs_getpages potentially does this, say for fsync or msync of a 2MB
> range:
>
> 308 const int pgs_size = sizeof(struct vm_page *) *
> 309 ((endoffset - startoffset) >> PAGE_SHIFT);
> 310 struct vm_page **pgs, *pgs_onstack[UBC_MAX_PAGES];
> 311
> 312 if (pgs_size > sizeof(pgs_onstack)) {
> 313 pgs = kmem_zalloc(pgs_size, async ? KM_NOSLEEP : KM_SLEEP);
>
> https://nxr.netbsd.org/xref/src/sys/miscfs/genfs/genfs_io.c?r=1.104#308
Where is the path from fsync/msync to VOP_GETPAGES? Looks like both end up
in VOP_PUTPAGES where requests seem bound to MAXPHYS.
--
J. Hannken-Illjes
From: Taylor R Campbell <riastradh@NetBSD.org>
To: "J. Hannken-Illjes" <hannken@mailbox.org>
Cc: "S.P.Zeidler" <spz@NetBSD.org>,
gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58552: panic via genfs_getpages - ufs_bmaparray
Date: Fri, 30 Aug 2024 17:22:03 +0000
> Date: Fri, 30 Aug 2024 16:48:58 +0200
> From: "J. Hannken-Illjes" <hannken@mailbox.org>
>
> Where is the path from fsync/msync to VOP_GETPAGES? Looks like both end up
> in VOP_PUTPAGES where requests seem bound to MAXPHYS.
Sorry, I was confusing getpages and putpages in my head while skimming
this code for patterns like `struct vm_page \*\*' and `pgs.*alloc'.
(The other candidate I found was in but I don't think it's relevant
here.)
Perhaps mlock/mlockall could trigger a 2MB getpages?
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.