NetBSD Problem Report #58552

From root@netbsd.org  Sun Aug  4 08:26:27 2024
Return-Path: <root@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits)
	 client-signature RSA-PSS (2048 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id EFAE11A923C
	for <gnats-bugs@gnats.NetBSD.org>; Sun,  4 Aug 2024 08:26:26 +0000 (UTC)
Message-Id: <20240804082625.BBF941985D9@morden.netbsd.org>
Date: Sun,  4 Aug 2024 08:26:25 +0000 (UTC)
From: spz@NetBSD.org
Reply-To: spz@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: panic via genfs_getpages - ufs_bmaparray
X-Send-Pr-Version: 3.95

>Number:         58552
>Category:       kern
>Synopsis:       panic via genfs_getpages - ufs_bmaparray
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Aug 04 08:30:00 +0000 2024
>Last-Modified:  Fri Aug 30 17:25:00 +0000 2024
>Originator:     S.P.Zeidler
>Release:        NetBSD 10.0_STABLE 20240527
>Organization:
	The NetBSD Foundation
>Environment:
System: NetBSD morden.netbsd.org 10.0_STABLE NetBSD 10.0_STABLE (NBFTP) #0: Tue May 28 07:18:01 UTC 2024 spz@franklin.NetBSD.org:/home/netbsd/10/amd64/obj/sys/arch/amd64/compile/NBFTP amd64
Architecture: x86_64
Machine: amd64
>Description:
[ 5864557.5601213] uvm_fault(0xfffff48ffb0e0668, 0x826ea1000, 1) -> e
[ 5864557.5701215] fatal page fault in supervisor mode
[ 5864557.5701215] trap type 6 code 0 rip 0xffffffff808d0cd4 cs 0x8 rflags 0x10202 cr2 0x826ea10f8 ilevel 0 rsp 0xffff828c9c4b2338
[ 5864557.5701215] curlwp 0xfffff48b150b5680 pid 9552.9552 lowest kstack 0xffff828c9c4ae2c0
kernel: page fault trap, code=0
Stopped in pid 9552.9552 (rsync) at     netbsd:incore+0x32:     cmpq    %rsi,d8(%rax)
incore() at netbsd:incore+0x32
ufs_bmaparray() at netbsd:ufs_bmaparray+0x16c
ufs_bmap() at netbsd:ufs_bmap+0x4d
VOP_BMAP() at netbsd:VOP_BMAP+0x6a
genfs_getpages() at netbsd:genfs_getpages+0xd55
VOP_GETPAGES() at netbsd:VOP_GETPAGES+0x58
ra_startio() at netbsd:ra_startio+0x94
uvm_ra_request() at netbsd:uvm_ra_request+0x124
uvn_get() at netbsd:uvn_get+0xa7
ubc_fault() at netbsd:ubc_fault+0x11e
uvm_fault_internal() at netbsd:uvm_fault_internal+0x3eb
trap() at netbsd:trap+0x2f9
--- trap (number 6) ---
copyout() at netbsd:copyout+0x33
ubc_uiomove() at netbsd:ubc_uiomove+0x104
ffs_read() at netbsd:ffs_read+0xd0
VOP_READ() at netbsd:VOP_READ+0x42
vn_read() at netbsd:vn_read+0x18e
dofileread() at netbsd:dofileread+0x79
sys_read() at netbsd:sys_read+0x49
syscall() at netbsd:syscall+0x1fc
--- syscall (number 3) ---
netbsd:syscall+0x1fc:
ds          2310
es          7048
fs          7098
gs          7040
rdi         fffff48d36a6d940
rsi         fffffffffffffff4
rbp         ffff828c9c4b2380
rbx         0
rdx         ffff82805d132000
rcx         fffff48b150b5680
rax         826ea1020
r8          0
r9          ffff828c9c4b23dc
r10         fffffffffffffff4
r11         2
r12         fffff48d36a6d940
r13         fffffffffffffff4
r14         0
r15         fffff48b3807f200
rip         ffffffff808d0cd4    incore+0x32
cs          8
rflags      10202
rsp         ffff828c9c4b2338
ss          10
netbsd:incore+0x32:     cmpq    %rsi,d8(%rax)
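
The register dump can be cross-checked with a little arithmetic. The sketch below is not kernel code; it only restates the numbers printed above, assuming the ddb displacement `d8` is hexadecimal (the match with cr2 supports this). The macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the register dump in this report. */
#define DUMP_RAX  0x826ea1020ULL         /* base register of the faulting cmpq */
#define DUMP_RSI  0xfffffffffffffff4ULL  /* second argument passed to incore() */
#define DUMP_CR2  0x826ea10f8ULL         /* faulting address reported by trap() */
#define CMPQ_DISP 0xd8ULL                /* assumed hex displacement in d8(%rax) */

/* Effective address of the memory operand d8(%rax). */
static uint64_t
cmpq_operand_address(uint64_t rax)
{
	return rax + CMPQ_DISP;
}

/*
 * Reinterpret a 64-bit register value as signed.  The conversion is
 * implementation-defined before C23, but yields the two's complement
 * value on amd64 compilers.
 */
static int64_t
as_signed(uint64_t reg)
{
	return (int64_t)reg;
}

/* Both relations hold for the dump above. */
static void
check_dump(void)
{
	assert(cmpq_operand_address(DUMP_RAX) == DUMP_CR2);
	assert(as_signed(DUMP_RSI) == -12);
}
```

So the fault address is exactly rax plus the displacement, meaning the bad pointer was already in rax when incore() dereferenced it, and rsi holds -12 as a signed 64-bit value.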

a savecore is available, ask for location and access
>How-To-Repeat:
	hopefully not too often
>Fix:


>Audit-Trail:
From: "J. Hannken-Illjes" <hannken@mailbox.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/58552: panic via genfs_getpages - ufs_bmaparray
Date: Thu, 29 Aug 2024 11:25:59 +0200

 From gdb frame #9 "incore(0xfffff48d36a6d940, -12)" we traverse the
 hash list "bufhashtbl[3581645]->lh_first == 0xfffff48e57516230".
 This buffer looks invalid:

   b_iodone = 0xffff828027d87100,
   b_error = 794242176,
   b_resid = -32128,
   b_flags = 1131827200,
   b_prio = -32128,
   b_bufsize = 660073344,
   b_bcount = -32128,
   ...

 The buffer is the 3rd element of "bufpl" item header 0xfffff4907870d150,
 page 0xfffff48e57516000.  All 15 buffers from this page are allocated.

 Page 0xfffff48e57516000 is a large page from the direct map, not sure
 if it matters (direct map 0xfffff484f8e00000 .. 0xfffff49138dfffff).

 Printing the entire page:

 0xfffff48e57516000:     0xffff82804a3f6380      0xffff82802784de80
 0xfffff48e57516010:     0xffff82804e020600      0xffff828036284d80
 0xfffff48e57516020:     0xffff82803c3f8d00      0xffff828031b42c80
 0xfffff48e57516030:     0xffff82803054dc00      0xffff828036e98f80
 ...
 0xfffff48e57516230:     0xffff82804d2e0e80      0xffff8280311de600
 0xfffff48e57516240:     0xffff82803144c580      0xffff828027d87100
 0xfffff48e57516250:     0xffff82802f572c80      0xffff828043765000
 0xfffff48e57516260:     0xffff82802757eb80      0xffff8280501ea300
 0xfffff48e57516270:     0xffff82802c916e00      0xffff828048faa400
 0xfffff48e57516280:     0xffff8280547ebb80      0xffff828040165300
 0xfffff48e57516290:     0xffff82804191ed80      0xffff82803eb2dd00
 ...
 0xfffff48e57516fc0:     0xffff828037f58180      0xffff828034a14900
 0xfffff48e57516fd0:     0xffff82804ee23080      0xffff8280519c0000
 0xfffff48e57516fe0:     0xffff82803c38fb80      0xffff82802bc49700
 0xfffff48e57516ff0:     0xffff828052c3f680      0xffff828033468200

 All these entries are valid "struct vm_page *" pointers so it looks like
 this page allocated to the "bufpl" got overwritten with 512 pointers
 to vm pages.

 Which operation creates an array of at least 512 "struct vm_page"
 pointers and therefore is a candidate for trashing?
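
 The figure of 512 follows from the page geometry. A minimal sketch, assuming 4 KiB pages and 8-byte pointers as on amd64; the helper names are hypothetical and exist only for this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096u  /* assumption: 4 KiB base page on amd64 */
#define ASSUMED_PTR_SIZE  8u     /* assumption: 64-bit pointers */

/* Number of pointer-sized slots that fit in one page. */
static size_t
pointers_per_page(size_t page_size, size_t ptr_size)
{
	return page_size / ptr_size;
}

/* Address of the pointer slot with the given index inside a page. */
static uint64_t
slot_address(uint64_t page_base, size_t index, size_t ptr_size)
{
	return page_base + (uint64_t)index * ptr_size;
}

/* Index of the pointer slot that covers a given address in the page. */
static size_t
slot_index(uint64_t page_base, uint64_t addr, size_t ptr_size)
{
	return (size_t)((addr - page_base) / ptr_size);
}
```

 With these assumptions a page holds 512 slots, the last one at 0xfffff48e57516ff8 (the second column of the final dump row), and the buffer at 0xfffff48e57516230 starts at slot 70.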

 -- 
 J. Hannken-Illjes

From: Taylor R Campbell <riastradh@NetBSD.org>
To: "J. Hannken-Illjes" <hannken@mailbox.org>
Cc: "S.P.Zeidler" <spz@NetBSD.org>,
	gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58552: panic via genfs_getpages - ufs_bmaparray
Date: Fri, 30 Aug 2024 01:09:02 +0000

 > Date: Thu, 29 Aug 2024 11:25:59 +0200
 > From: "J. Hannken-Illjes" <hannken@mailbox.org>
 > 
 > All these entries are valid "struct vm_page *" pointers so it looks like
 > this page allocated to the "bufpl" got overwritten with 512 pointers
 > to vm pages.
 > 
 > Which operation creates an array of at least 512 "struct vm_page"
 > pointers and therefore is a candidate for trashing?

 genfs_getpages potentially does this, say for fsync or msync of a 2MB
 range:

     308 	const int pgs_size = sizeof(struct vm_page *) *
     309 	    ((endoffset - startoffset) >> PAGE_SHIFT);
     310 	struct vm_page **pgs, *pgs_onstack[UBC_MAX_PAGES];
     311 
     312 	if (pgs_size > sizeof(pgs_onstack)) {
     313 		pgs = kmem_zalloc(pgs_size, async ? KM_NOSLEEP : KM_SLEEP);

 https://nxr.netbsd.org/xref/src/sys/miscfs/genfs/genfs_io.c?r=1.104#308
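
 The size computation quoted above can be replayed in userland for a 2MB range. This is only a sketch: the page shift of 12, the 8-byte pointer size and the on-stack array length of 8 are assumptions, and the `sketch_` names do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT    12  /* assumption: 4 KiB pages */
#define SKETCH_PTR_SIZE      8u  /* assumption: sizeof(struct vm_page *) on amd64 */
#define SKETCH_UBC_MAX_PAGES 8u  /* assumption: length of the on-stack array */

/* Mirrors the quoted pgs_size expression for a byte range. */
static size_t
sketch_pgs_size(uint64_t startoffset, uint64_t endoffset)
{
	return SKETCH_PTR_SIZE *
	    (size_t)((endoffset - startoffset) >> SKETCH_PAGE_SHIFT);
}

/* Nonzero when the array would not fit in the on-stack buffer. */
static int
sketch_needs_heap(size_t pgs_size)
{
	return pgs_size > SKETCH_PTR_SIZE * SKETCH_UBC_MAX_PAGES;
}
```

 Under these assumptions a 2MB range covers 512 pages, so the pointer array is 4096 bytes, exactly one page, and it takes the kmem_zalloc branch rather than the on-stack array.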

From: "J. Hannken-Illjes" <hannken@mailbox.org>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: "S.P.Zeidler" <spz@NetBSD.org>, gnats-bugs@NetBSD.org,
	netbsd-bugs@NetBSD.org
Subject: Re: kern/58552: panic via genfs_getpages - ufs_bmaparray
Date: Fri, 30 Aug 2024 16:48:58 +0200

 On Fri, Aug 30, 2024 at 01:09:02AM +0000, Taylor R Campbell wrote:
 > > Date: Thu, 29 Aug 2024 11:25:59 +0200
 > > From: "J. Hannken-Illjes" <hannken@mailbox.org>
 > > 
 > > All these entries are valid "struct vm_page *" pointers so it looks like
 > > this page allocated to the "bufpl" got overwritten with 512 pointers
 > > to vm pages.
 > > 
 > > Which operation creates an array of at least 512 "struct vm_page"
 > > pointers and therefore is a candidate for trashing?
 > 
 > genfs_getpages potentially does this, say for fsync or msync of a 2MB
 > range:
 > 
 >     308 	const int pgs_size = sizeof(struct vm_page *) *
 >     309 	    ((endoffset - startoffset) >> PAGE_SHIFT);
 >     310 	struct vm_page **pgs, *pgs_onstack[UBC_MAX_PAGES];
 >     311 
 >     312 	if (pgs_size > sizeof(pgs_onstack)) {
 >     313 		pgs = kmem_zalloc(pgs_size, async ? KM_NOSLEEP : KM_SLEEP);
 > 
 > https://nxr.netbsd.org/xref/src/sys/miscfs/genfs/genfs_io.c?r=1.104#308

 Where is the path from fsync/msync to VOP_GETPAGES?  Looks like both end up
 in VOP_PUTPAGES where requests seem bound to MAXPHYS.

 -- 
 J. Hannken-Illjes

From: Taylor R Campbell <riastradh@NetBSD.org>
To: "J. Hannken-Illjes" <hannken@mailbox.org>
Cc: "S.P.Zeidler" <spz@NetBSD.org>,
	gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58552: panic via genfs_getpages - ufs_bmaparray
Date: Fri, 30 Aug 2024 17:22:03 +0000

 > Date: Fri, 30 Aug 2024 16:48:58 +0200
 > From: "J. Hannken-Illjes" <hannken@mailbox.org>
 > 
 > Where is the path from fsync/msync to VOP_GETPAGES?  Looks like both end up
 > in VOP_PUTPAGES where requests seem bound to MAXPHYS.

 Sorry, I was confusing getpages and putpages in my head while skimming
 this code for patterns like `struct vm_page \*\*' and `pgs.*alloc'.
 (The other candidates I found was in but I don't think it's relevant
 here.)

 Perhaps mlock/mlockall could trigger a 2MB getpages?
