NetBSD Problem Report #50162
From www@NetBSD.org Sat Aug 22 15:18:17 2015
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 23563A65BA
for <gnats-bugs@gnats.NetBSD.org>; Sat, 22 Aug 2015 15:18:17 +0000 (UTC)
Message-Id: <20150822151815.73DECA65E6@mollari.NetBSD.org>
Date: Sat, 22 Aug 2015 15:18:15 +0000 (UTC)
From: riz@NetBSD.org
Reply-To: riz@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: Lurking byteswap problem in FFS_EI ?
X-Send-Pr-Version: www-1.0
>Number: 50162
>Category: kern
>Synopsis: Lurking byteswap problem in FFS_EI ?
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Aug 22 15:20:00 +0000 2015
>Last-Modified: Sat Sep 05 23:40:00 +0000 2015
>Originator: Jeff Rizzo
>Release: 7.99.21
>Organization:
>Environment:
NetBSD odroid.lan 7.0 NetBSD 7.99.21 (ODROID-iscsi) #13: Fri Aug 21 15:19:14 PDT 2015 riz@cassava.tastylime.net:/scratch/evbarm7/obj/sys/arch/evbarm/compile/ODROID-iscsi evbarm
>Description:
I have a USB drive I move between little-endian and big-endian systems (evbarm and macppc), and I started seeing occasional panics:
panic: kernel diagnostic assertion "oldsize != VSIZENOTSET || pgend > oldsize" failed: file "/src/src/sys/uvm/uvm_vnode.c", line 355
cpu0: Begin traceback...
0xbf6e3a2c: netbsd:db_panic+0xc
0xbf6e3a5c: netbsd:vpanic+0x1b0
0xbf6e3a74: netbsd:__udivmoddi4
0xbf6e3ac4: netbsd:uvm_vnp_setsize+0x170
0xbf6e3b04: netbsd:ffs_loadvnode+0x104
0xbf6e3b6c: netbsd:vcache_get+0x2d0
0xbf6e3c24: netbsd:ufs_lookup+0x884
0xbf6e3c5c: netbsd:VOP_LOOKUP+0x48
0xbf6e3cac: netbsd:lookup_once+0x19c
0xbf6e3d7c: netbsd:namei_tryemulroot+0x528
0xbf6e3db4: netbsd:namei+0x34
0xbf6e3ddc: netbsd:fd_nameiat.isra.0+0x64
0xbf6e3e4c: netbsd:do_sys_statat+0x6c
0xbf6e3f04: netbsd:sys___lstat50+0x2c
0xbf6e3f7c: netbsd:syscall+0x84
0xbf6e3fac: netbsd:swi_handler+0xa0
cpu0: End traceback...
Because the byteswapped file system is only one of many on this machine, it took several of these panics and much discussion with others before I realized the panics were probably happening on the byteswapped file system. Code inspection also showed that "pgend" was negative in the assertion above. I added the following local patch to get a little more information:
Index: sys/uvm/uvm_vnode.c
===================================================================
RCS file: /cvsroot/src/sys/uvm/uvm_vnode.c,v
retrieving revision 1.99
diff -u -r1.99 uvm_vnode.c
--- sys/uvm/uvm_vnode.c 30 Jul 2012 23:56:48 -0000 1.99
+++ sys/uvm/uvm_vnode.c 22 Aug 2015 14:58:54 -0000
@@ -346,7 +346,7 @@
 	 * toss some pages...
 	 */
-	KASSERT(newsize != VSIZENOTSET);
+	KASSERTMSG((newsize >= 0), "newsize is %"PRIx64, newsize);
 	KASSERT(vp->v_size <= vp->v_writesize);
 	KASSERT(vp->v_size == vp->v_writesize ||
 	    newsize == vp->v_writesize || newsize <= vp->v_size);
and started seeing panics like this more frequently:
panic: kernel diagnostic assertion "(newsize >= 0)" failed: file "/home/riz/src/sys/uvm/uvm_vnode.c", line 349 newsize is d400000000000000
Stopped in pid 1987.1 (find) at netbsd:cpu_Debugger+0x4: bx r14
db{3}> bt
0xa19d3a5c: netbsd:vpanic+0xc
0xa19d3a74: netbsd:__udivmoddi4
0xa19d3ac4: netbsd:uvm_vnp_setsize+0x19c
0xa19d3b04: netbsd:ffs_loadvnode+0x104
0xa19d3b6c: netbsd:vcache_get+0x2d0
0xa19d3c24: netbsd:ufs_lookup+0x884
0xa19d3c5c: netbsd:VOP_LOOKUP+0x48
0xa19d3cac: netbsd:lookup_once+0x19c
0xa19d3d7c: netbsd:namei_tryemulroot+0x528
0xa19d3db4: netbsd:namei+0x34
0xa19d3ddc: netbsd:fd_nameiat.isra.0+0x64
0xa19d3e4c: netbsd:do_sys_statat+0x6c
0xa19d3f04: netbsd:sys___lstat50+0x2c
0xa19d3f7c: netbsd:syscall+0x84
0xa19d3fac: netbsd:swi_handler+0xa0
db{3}>
"newsize" varies, but clearly looks like it's been the victim of byteswapping (or lack thereof) at some point.
>How-To-Repeat:
Mount a big-endian FFSv2 file system on a little-endian system and do cvs updates, pkgsrc bulk builds, and other fs-intensive activities on it.
>Fix:
None given.
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/50162: Lurking byteswap problem in FFS_EI ?
Date: Sun, 23 Aug 2015 03:50:55 +0000
On Sat, Aug 22, 2015 at 03:20:00PM +0000, riz@NetBSD.org wrote:
> "newsize" varies, but clearly looks like it's been the victim of
> byteswapping (or lack thereof) at some point.
Something I should have thought of earlier: any indication if the
on-disk size is also flipped?
Given that we couldn't find a nonswapping path in the load code, I
wonder if it's being written out wrong.
--
David A. Holland
dholland@netbsd.org
From: Jeff Rizzo <riz@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/50162: Lurking byteswap problem in FFS_EI ?
Date: Sat, 22 Aug 2015 22:49:20 -0700
On 8/22/15 8:55 PM, David Holland wrote:
> Something I should have thought of earlier: any indication if the
> on-disk size is also flipped?
When I was dumping the file system so I could re-newfs it little-endian,
it filled up a 20G partition for what was supposed to be 1.7G of data.
I'm not certain that indicates problems with the files on disk, but it's
certainly possible.
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/50162: Lurking byteswap problem in FFS_EI ?
Date: Sat, 5 Sep 2015 23:37:33 +0000
On Sun, Aug 23, 2015 at 05:50:01AM +0000, Jeff Rizzo wrote:
> > Something I should have thought of earlier: any indication if the
> > on-disk size is also flipped?
>
> When I was dumping the file system so I could re-newfs it little-endian,
> it filled up a 20G partition for what was supposed to be 1.7G of data.
> I'm not certain that indicates problems with the files on disk, but it's
> certainly possible.
That strongly suggests that *something* is pretty badly wrong.
--
David A. Holland
dholland@netbsd.org