NetBSD Problem Report #55658

From www@netbsd.org  Mon Sep 14 08:51:09 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id B3A3D1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 14 Sep 2020 08:51:09 +0000 (UTC)
Message-Id: <20200914085108.9BD4A1A923A@mollari.NetBSD.org>
Date: Mon, 14 Sep 2020 08:51:08 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: rumpvfs:t_etfs triggers bug in ufs_balloc_range() for 16KB page
X-Send-Pr-Version: www-1.0

>Number:         55658
>Category:       kern
>Synopsis:       rumpvfs:t_etfs triggers bug in ufs_balloc_range() for 16KB page
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Sep 14 08:55:00 +0000 2020
>Last-Modified:  Thu Aug 03 03:25:01 +0000 2023
>Originator:     Rin Okuyama
>Release:        9.99.72
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD obs266 9.99.72 NetBSD 9.99.72 (OBS266) #385: Mon Sep 14 17:37:35 JST 2020  rin@latipes:/sys/arch/evbppc/compile/OBS266 evbppc
>Description:
On powerpc/ibm4xx (16KB page), tests/rump/rumpvfs/t_etfs causes kernel
freeze where it falls into infinite recursion in uvm_fault_internal()
called by DSI (data storage interruption) *write* handler:

----
# cd /usr/tests/rump/rumpvfs && atf-run t_etfs
...
tc-se:1+0 records in
tc-se:1+0 records out
tc-se:1 bytes transferred in 0.-45 secs (0 bytes/sec)
(freeze here)
~Stopped in pid 297.274 (t_etfs) at      netbsd:comintr+0x890:   stw     r23, 0x6
c(r25)
db> bt
0x06c7b590: at intr_deliver+0x7c
0x06c7b5c0: at pic_handle_intr+0xf8
0x06c7b610: at cpu_lwp_bootstrap+0x4d4
0x06c7b6e0: at genfs_getpages+0x92c
0x06c7b7e0: at VOP_GETPAGES+0x58
0x06c7b820: at uvn_get+0x8c
0x06c7b860: at ubc_fault+0x1a8
0x06c7b8e0: at uvm_fault_internal+0x4dc
0x06c7ba30: at trap+0x338
0x06c7bae0: kernel DSI write trap @ 0x84204ffc by memcpy+0x84: srr1=0x8030
            r1=0x6c7bbb0 cr=0x40222434 xer=0 ctr=0x400 dear=0x84204ffc esr=0x800
000 pid=0x1
0x06c7bbb0: at main+0x790
0x06c7bbd0: at copyin+0x1a4
0x06c7bc50: at copyin_vmspace+0x4c
0x06c7bc90: at uiomove+0x188
0x06c7bcd0: at ubc_uiomove+0x1b8
0x06c7bd40: at ffs_write+0x67c
0x06c7bdd0: at VOP_WRITE+0x3c
0x06c7be00: at vn_write+0x140
0x06c7be30: at dofilewrite+0x8c
0x06c7be80: at sys_pwrite+0xdc
0x06c7beb0: at syscall+0x2e4
0x06c7bf20: user SC trap #174 by 0xfda0cd4c: srr1=0xc030
            r1=0xfacdbf40 cr=0x28002482 xer=0x20000000 ctr=0xfda0cd44 esr=0x800000 pid=0x42
----

This turned out to be a bug in ufs_balloc_range() for machines with
16KB (and above) page, introduced by this commit:

http://mail-index.netbsd.org/source-changes/2005/07/17/0011.html
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/ufs/ufs/ufs_inode.c#rev1.50

> --- src/sys/ufs/ufs/ufs_inode.c	2005/07/10 01:08:52	1.49
> +++ src/sys/ufs/ufs/ufs_inode.c	2005/07/17 09:13:35	1.50
> ...
> @@ -273,9 +273,11 @@ ufs_balloc_range(struct vnode *vp, off_t
>  
>  	simple_lock(&uobj->vmobjlock);
>  	for (i = 0; i < npages; i++) {
> -		pgs[i]->flags &= ~PG_RDONLY;
>  		if (error) {
>  			pgs[i]->flags |= PG_RELEASED;
> +		} else if (off <= pagestart + (i << PAGE_SHIFT) &&
> +		    pagestart + ((i + 1) << PAGE_SHIFT) <= off + len) {
> +			pgs[i]->flags &= ~PG_RDONLY;
>  		}
>  	}
>  	if (error) {

Comment (added later) explains this code as:

>         /*
>          * if the allocation succeeded, mark all the pages dirty
>          * and clear PG_RDONLY on any pages that are now fully backed
>          * by disk blocks. ...
>          */

By inserting debug printf:

----
--- ufs_inode.c	5 Sep 2020 16:30:13 -0000	1.112
+++ ufs_inode.c	14 Sep 2020 08:26:12 -0000
@@ -275,8 +275,22 @@ ufs_balloc_range(struct vnode *vp, off_t
 		if (!error) {
 			if (off <= pagestart + (i << PAGE_SHIFT) &&
 			    pagestart + ((i + 1) << PAGE_SHIFT) <= eob) {
+#if 1 /* XXX */
+
+printf("%s: OK: pa %#jx off %#jx eob %#jx\n",
+    __func__, (intmax_t)pgs[i]->phys_addr, (intmax_t)off, (intmax_t)eob);
+
+#endif
 				pgs[i]->flags &= ~PG_RDONLY;
 			}
+			else {
+#if 1 /* XXX */
+
+printf("%s: NG: pa %#jx off %#jx eob %#jx\n",
+    __func__, (intmax_t)pgs[i]->phys_addr, (intmax_t)off, (intmax_t)eob);
+
+#endif
+			}
 			uvm_pagemarkdirty(pgs[i], UVM_PAGE_STATUS_DIRTY);
 		}
 		uvm_pagelock(pgs[i]);
----

ufs_balloc_range() called for 8KB region:

----
# cd /usr/tests/rump/rumpvfs && atf-run t_etfs
...
tc-se:1+0 records in
tc-se:1+0 records out
tc-se:1 bytes transferred in 0.-45 secs (0 bytes/sec)
ufs_balloc_range: NG: pa 0x6c6c000 off 0x60000000000 eob 0x60000002000
                         ^^^^^^^^^                                ^^^^
----

For machines with page size <= 8KB, this changes a target page writable
(confirmed on other ports with 4 and 8KB pages with debug printf above).

However, for machines with 16KB page, the condition above will never be
satisfied. As a result, uvm_fault_internal() will be falling into
infinite recursion, trying to turn that page writable. This ends up with
kernel freeze with TLB filled up with invalided entry for that page:

(continue from previous)
----
Stopped in pid 297.274 (t_etfs) at      netbsd:comintr+0x890:   stw     r23, 0x6
c(r25)
db> machine tlb
tlb  0  PID   1 EPN 0x00000000 16MB   RPN 0x00000000  ZONE  0N  EX WR
tlb  1  PID   1 EPN 0x08000000 16MB   RPN 0xef000000  ZONE  0N     WR  I G
tlb* 2  PID   1 EPN 0x84204000 16kB   RPN 0x06c6c000  ZONE  0N
                                          ^^^^^^^^^^               ^^ (RO)
tlb* 3  PID   1 EPN 0x84204000 16kB   RPN 0x06c6c000  ZONE  0N
tlb* 4  PID   1 EPN 0x84204000 16kB   RPN 0x06c6c000  ZONE  0N
...
----
(* means invalid entry)
>How-To-Repeat:
On powerpc/ibm4xx (our only port with 16KB page),

# cd /usr/tests/rump/rumpvfs && atf-run t_etfs
>Fix:
I'm not 100% sure whether this change is correct, but with this patch,
that test does no longer cause kernel freeze on powerpc/ibm4xx, and no
regression has been observed over a month for ibm4xx and other ports:

http://www.netbsd.org/~rin/ufs_inode_20200914.patch
----
Index: ufs_inode.c
===================================================================
RCS file: /home/netbsd/src/sys/ufs/ufs/ufs_inode.c,v
retrieving revision 1.112
diff -p -u -r1.112 ufs_inode.c
--- ufs_inode.c	5 Sep 2020 16:30:13 -0000	1.112
+++ ufs_inode.c	14 Sep 2020 08:03:05 -0000
@@ -260,7 +260,7 @@ ufs_balloc_range(struct vnode *vp, off_t

 	/*
 	 * if the allocation succeeded, mark all the pages dirty
-	 * and clear PG_RDONLY on any pages that are now fully backed
+	 * and clear PG_RDONLY on any pages that are now backed
 	 * by disk blocks.  if the allocation failed, we do not invalidate
 	 * the pages since they might have already existed and been dirty,
 	 * in which case we need to keep them around.  if we created the pages,
@@ -274,7 +274,7 @@ ufs_balloc_range(struct vnode *vp, off_t
 		KASSERT((pgs[i]->flags & PG_RELEASED) == 0);
 		if (!error) {
 			if (off <= pagestart + (i << PAGE_SHIFT) &&
-			    pagestart + ((i + 1) << PAGE_SHIFT) <= eob) {
+			    pagestart + (i << PAGE_SHIFT) <= eob) {
 				pgs[i]->flags &= ~PG_RDONLY;
 			}
 			uvm_pagemarkdirty(pgs[i], UVM_PAGE_STATUS_DIRTY);
----

>Audit-Trail:
From: Chuck Silvers <chuq@chuq.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55658: rumpvfs:t_etfs triggers bug in ufs_balloc_range()
 for 16KB page
Date: Mon, 14 Sep 2020 10:42:40 -0700

 This isn't the right fix for this problem.  Pages which are only
 partially backed by allocated blocks in the file system must not
 be mapped writable (at the pmap level) for user mappings,
 because the page fault that is taken by user code writing to
 the mapping of such a partially-backed page is the point where
 we are able to allocate the rest of the blocks for the page.
 If we allow user code to modify the page without a page fault,
 then when the page is cleaned, any modifications to the parts
 of the page that have no blocks allocated will be discarded.

 I think the right fix will be to continue setting PG_RDONLY as
 we do currently so that user mappings will behave correctly,
 and instead change ubc_fault_page() to ignore the PG_RDONLY flag
 and always pmap_enter() the page with the permissions of the
 original access_type.  It is the file system's responsibility
 to allocate blocks for any part of the file that is being modified
 by write() before calling into UBC to fill the pages for that
 range with the desired data.

 Note that I haven't tested this and there may well be
 other adjustments that need to be made elsewhere.

 -Chuck

From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, Chuck Silvers <chuq@chuq.com>
Cc: 
Subject: Re: kern/55658: rumpvfs:t_etfs triggers bug in ufs_balloc_range() for
 16KB page
Date: Wed, 16 Sep 2020 01:09:23 +0900

 On 2020/09/15 2:45, Chuck Silvers wrote:
 >   This isn't the right fix for this problem.  Pages which are only
 >   partially backed by allocated blocks in the file system must not
 >   be mapped writable (at the pmap level) for user mappings,
 >   because the page fault that is taken by user code writing to
 >   the mapping of such a partially-backed page is the point where
 >   we are able to allocate the rest of the blocks for the page.
 >   If we allow user code to modify the page without a page fault,
 >   then when the page is cleaned, any modifications to the parts
 >   of the page that have no blocks allocated will be discarded.
 >   
 >   I think the right fix will be to continue setting PG_RDONLY as
 >   we do currently so that user mappings will behave correctly,
 >   and instead change ubc_fault_page() to ignore the PG_RDONLY flag
 >   and always pmap_enter() the page with the permissions of the
 >   original access_type.  It is the file system's responsibility
 >   to allocate blocks for any part of the file that is being modified
 >   by write() before calling into UBC to fill the pages for that
 >   range with the desired data.
 >   
 >   Note that I haven't tested this and there may well be
 >   other adjustments that need to be made elsewhere.

 Thank you very much for kind explanation!

 I've modified ubc_fault_page() accordingly:

 http://www.netbsd.org/~rin/uvm_bio_20200915.patch

 --- sys/uvm/uvm_bio.c	9 Jul 2020 09:24:32 -0000	1.121
 +++ sys/uvm/uvm_bio.c	15 Sep 2020 02:24:33 -0000
 @@ -235,9 +235,7 @@ static inline int
   ubc_fault_page(const struct uvm_faultinfo *ufi, const struct ubc_map *umap,
       struct vm_page *pg, vm_prot_t prot, vm_prot_t access_type, vaddr_t va)
   {
 -	vm_prot_t mask;
   	int error;
 -	bool rdonly;

   	KASSERT(rw_write_held(pg->uobject->vmobjlock));

 @@ -280,11 +278,8 @@ ubc_fault_page(const struct uvm_faultinf
   	    pg->offset < umap->writeoff ||
   	    pg->offset + PAGE_SIZE > umap->writeoff + umap->writelen);

 -	rdonly = uvm_pagereadonly_p(pg);
 -	mask = rdonly ? ~VM_PROT_WRITE : VM_PROT_ALL;
 -
   	error = pmap_enter(ufi->orig_map->pmap, va, VM_PAGE_TO_PHYS(pg),
 -	    prot & mask, PMAP_CANFAIL | (access_type & mask));
 +	    prot, PMAP_CANFAIL | access_type);

   	uvm_pagelock(pg);
   	uvm_pageactivate(pg);
 ----

 With this patch, rumpvfs:t_etfs test successfully passes on ibm4xx.

 Also, I've confirmed that the system boots multiuser and there's no
 regression in ATF for ibm4xx, as well as for macppc (normal 4KB page).

 Does this patch seem OK to you? Is there anything else to be checked
 before commit?

 Thanks,
 rin

From: Chuck Silvers <chuq@chuq.com>
To: Rin Okuyama <rokuyama.rk@gmail.com>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/55658: rumpvfs:t_etfs triggers bug in ufs_balloc_range()
 for 16KB page
Date: Mon, 21 Sep 2020 00:42:20 -0700

 On Wed, Sep 16, 2020 at 01:09:23AM +0900, Rin Okuyama wrote:
 > Does this patch seem OK to you? Is there anything else to be checked
 > before commit?

 The one thing that I can think of to be concerned about is that
 now that we track page dirtiness via the radix tree tag,
 we have to make sure that a clean page is not mapped writable,
 so it would be good to have an assertion in ubc_fault_page() like:

 	KASSERT((access_type & VM_PROT_WRITE) == 0 ||
 		uvm_pagegetdirty(pg) != UVM_PAGE_STATUS_CLEAN);

 -Chuck

From: Rin Okuyama <rokuyama.rk@gmail.com>
To: Chuck Silvers <chuq@chuq.com>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/55658: rumpvfs:t_etfs triggers bug in ufs_balloc_range() for
 16KB page
Date: Thu, 1 Oct 2020 14:11:54 +0900

 On 2020/09/21 16:42, Chuck Silvers wrote:
 > On Wed, Sep 16, 2020 at 01:09:23AM +0900, Rin Okuyama wrote:
 >> Does this patch seem OK to you? Is there anything else to be checked
 >> before commit?
 > 
 > The one thing that I can think of to be concerned about is that
 > now that we track page dirtiness via the radix tree tag,
 > we have to make sure that a clean page is not mapped writable,
 > so it would be good to have an assertion in ubc_fault_page() like:
 > 
 > 	KASSERT((access_type & VM_PROT_WRITE) == 0 ||
 > 		uvm_pagegetdirty(pg) != UVM_PAGE_STATUS_CLEAN);
 > 

 Thank you very much for your comment, and I'm so sorry for the late reply.

 I've added this KASSERT to ubc_fault_page(), and confirmed that it does
 not fire during full ATF runs at least for amd64, powerpc/oea, and
 powerpc/ibm4xx.

 I will commit it soon, probably in this weekend.

 Thanks,
 rin

From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55658 CVS commit: src/sys/uvm
Date: Mon, 5 Oct 2020 04:48:24 +0000

 Module Name:	src
 Committed By:	rin
 Date:		Mon Oct  5 04:48:24 UTC 2020

 Modified Files:
 	src/sys/uvm: uvm_bio.c

 Log Message:
 PR kern/55658

 ubc_fault_page(): Ignore PG_RDONLY flag and always pmap_enter() the page
 with the permissions of the original access_type.

 It is the file system's responsibility to allocate blocks that is being
 modified by write(), before calling into UBC to fill the pages for that
 range. KASSERT() is added there to confirm that no clean page is mapped
 writable.

 Fix infinite loop in uvm_fault_internal(), observed on 16KB-page systems,
 where it continues to try to make a partially-backed page writable.

 No regression in ATF and KASSERT() does not fire on several architectures,
 as far as I can see.

 Fix suggested by chs. Thanks!


 To generate a diff of this commit:
 cvs rdiff -u -r1.121 -r1.122 src/sys/uvm/uvm_bio.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55658 CVS commit: src/sys/uvm
Date: Sun, 18 Oct 2020 08:52:15 +0000

 Module Name:	src
 Committed By:	rin
 Date:		Sun Oct 18 08:52:15 UTC 2020

 Modified Files:
 	src/sys/uvm: uvm_bio.c

 Log Message:
 PR kern/55658

 Revert rev 1.122:
 http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/uvm/uvm_bio.c#rev1.122

 If this commit is applied to NFS client, changes to files in client
 side are sometimes invisible in server side, which results in file
 corruption.

 Demonstrated by test code provided by Anthony Mallet:
 https://mail-index.netbsd.org/current-users/2020/10/17/msg039708.html

 Whether the test case above passes or not depends on architectures
 and size of NFS I/O specified by -r and -w options of mount_nfs(8)
 (the default size is 32KB for x86 and 8KB for other archs).

 Whereas it fails on amd64 and i386 with the default size, it passes
 on other archs (aarch64, arm, alpha, m68k, and powerpc at least) with
 their default. On most ports, it fails with some I/O sizes.

 However, the condition for failure is still unclear; whereas it fails
 with 2KB I/O size on amiga (m68k, 8KB page), it passes with same I/O
 size on alpha (8KB page). It may depends on some VM parameters or
 details in pmap implementation, or some race conditions are involved.

 Great thanks to Anthony Mallet for providing the test code, and sorry
 everyone for breakage.


 To generate a diff of this commit:
 cvs rdiff -u -r1.122 -r1.123 src/sys/uvm/uvm_bio.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55658 CVS commit: src/tests/rump/rumpvfs
Date: Thu, 3 Aug 2023 03:21:56 +0000

 Module Name:	src
 Committed By:	rin
 Date:		Thu Aug  3 03:21:56 UTC 2023

 Modified Files:
 	src/tests/rump/rumpvfs: t_etfs.c

 Log Message:
 t_etfs: Skip large_blk for system with page size > 8192

 Work around kernel freeze reported as PR kern/55658 for
 powerpc/ibm4xx (16K pages)


 To generate a diff of this commit:
 cvs rdiff -u -r1.11 -r1.12 src/tests/rump/rumpvfs/t_etfs.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.