NetBSD Problem Report #50162

From www@NetBSD.org  Sat Aug 22 15:18:17 2015
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 23563A65BA
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 22 Aug 2015 15:18:17 +0000 (UTC)
Message-Id: <20150822151815.73DECA65E6@mollari.NetBSD.org>
Date: Sat, 22 Aug 2015 15:18:15 +0000 (UTC)
From: riz@NetBSD.org
Reply-To: riz@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: Lurking byteswap problem in FFS_EI ?
X-Send-Pr-Version: www-1.0

>Number:         50162
>Category:       kern
>Synopsis:       Lurking byteswap problem in FFS_EI ?
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Aug 22 15:20:00 +0000 2015
>Last-Modified:  Sat Sep 05 23:40:00 +0000 2015
>Originator:     Jeff Rizzo
>Release:        7.99.21
>Organization:
>Environment:
NetBSD odroid.lan 7.0 NetBSD 7.99.21 (ODROID-iscsi) #13: Fri Aug 21 15:19:14 PDT 2015  riz@cassava.tastylime.net:/scratch/evbarm7/obj/sys/arch/evbarm/compile/ODROID-iscsi evbarm
>Description:
I have a USB drive that I move between little-endian and big-endian systems (evbarm and macppc), and I started seeing occasional panics:

panic: kernel diagnostic assertion "oldsize != VSIZENOTSET || pgend > oldsize" failed: file "/src/src/sys/uvm/uvm_vnode.c", line 355
cpu0: Begin traceback...
0xbf6e3a2c: netbsd:db_panic+0xc
0xbf6e3a5c: netbsd:vpanic+0x1b0
0xbf6e3a74: netbsd:__udivmoddi4
0xbf6e3ac4: netbsd:uvm_vnp_setsize+0x170
0xbf6e3b04: netbsd:ffs_loadvnode+0x104
0xbf6e3b6c: netbsd:vcache_get+0x2d0
0xbf6e3c24: netbsd:ufs_lookup+0x884
0xbf6e3c5c: netbsd:VOP_LOOKUP+0x48
0xbf6e3cac: netbsd:lookup_once+0x19c
0xbf6e3d7c: netbsd:namei_tryemulroot+0x528
0xbf6e3db4: netbsd:namei+0x34
0xbf6e3ddc: netbsd:fd_nameiat.isra.0+0x64
0xbf6e3e4c: netbsd:do_sys_statat+0x6c
0xbf6e3f04: netbsd:sys___lstat50+0x2c
0xbf6e3f7c: netbsd:syscall+0x84
0xbf6e3fac: netbsd:swi_handler+0xa0
cpu0: End traceback...

Since the byteswapped file system is only one of many, it took a number of these panics and much discussion with others before I realized the panics were probably happening on the byteswapped file system. Code inspection also showed that, in the assertion above, "pgend" was negative. I added the following local patch to get a little more information:

Index: sys/uvm/uvm_vnode.c
===================================================================
RCS file: /cvsroot/src/sys/uvm/uvm_vnode.c,v
retrieving revision 1.99
diff -u -r1.99 uvm_vnode.c
--- sys/uvm/uvm_vnode.c 30 Jul 2012 23:56:48 -0000      1.99
+++ sys/uvm/uvm_vnode.c 22 Aug 2015 14:58:54 -0000
@@ -346,7 +346,7 @@
         * toss some pages...
         */

-       KASSERT(newsize != VSIZENOTSET);
+       KASSERTMSG((newsize >= 0), "newsize is %"PRIx64, newsize);
        KASSERT(vp->v_size <= vp->v_writesize);
        KASSERT(vp->v_size == vp->v_writesize ||
            newsize == vp->v_writesize || newsize <= vp->v_size);

and started seeing panics like this one more frequently:

panic: kernel diagnostic assertion "(newsize >= 0)" failed: file "/home/riz/src/sys/uvm/uvm_vnode.c", line 349 newsize is d400000000000000
Stopped in pid 1987.1 (find) at netbsd:cpu_Debugger+0x4:        bx      r14
db{3}> bt
0xa19d3a5c: netbsd:vpanic+0xc
0xa19d3a74: netbsd:__udivmoddi4
0xa19d3ac4: netbsd:uvm_vnp_setsize+0x19c
0xa19d3b04: netbsd:ffs_loadvnode+0x104
0xa19d3b6c: netbsd:vcache_get+0x2d0
0xa19d3c24: netbsd:ufs_lookup+0x884
0xa19d3c5c: netbsd:VOP_LOOKUP+0x48
0xa19d3cac: netbsd:lookup_once+0x19c
0xa19d3d7c: netbsd:namei_tryemulroot+0x528
0xa19d3db4: netbsd:namei+0x34
0xa19d3ddc: netbsd:fd_nameiat.isra.0+0x64
0xa19d3e4c: netbsd:do_sys_statat+0x6c
0xa19d3f04: netbsd:sys___lstat50+0x2c
0xa19d3f7c: netbsd:syscall+0x84
0xa19d3fac: netbsd:swi_handler+0xa0
db{3}>

"newsize" varies, but clearly looks like it's been the victim of byteswapping (or lack thereof) at some point.
>How-To-Repeat:
Mount a big-endian FFSv2 file system on a little-endian system and do cvs updates, pkgsrc bulk builds, and other file-system-intensive work on it.

>Fix:
None given.

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/50162: Lurking byteswap problem in FFS_EI ?
Date: Sun, 23 Aug 2015 03:50:55 +0000

 On Sat, Aug 22, 2015 at 03:20:00PM +0000, riz@NetBSD.org wrote:
  > "newsize" varies, but clearly looks like it's been the victim of
  > byteswapping (or lack thereof) at some point.

 Something I should have thought of earlier: any indication if the
 on-disk size is also flipped?

 Given that we couldn't find a nonswapping path in the load code, I
 wonder if it's being written out wrong.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Jeff Rizzo <riz@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/50162: Lurking byteswap problem in FFS_EI ?
Date: Sat, 22 Aug 2015 22:49:20 -0700

 On 8/22/15 8:55 PM, David Holland wrote:
 > From: David Holland <dholland-bugs@netbsd.org>
 >
 >   
 >   Something I should have thought of earlier: any indication if the
 >   on-disk size is also flipped?

 When I was dumping the file system so I could re-newfs it little-endian, 
 it filled up a 20G partition for what was supposed to be 1.7G of data.  
 I'm not certain that indicates problems with the files on disk, but it's 
 certainly possible.
 >   
 >   Given that we couldn't find a nonswapping path in the load code, I
 >   wonder if it's being written out wrong.

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/50162: Lurking byteswap problem in FFS_EI ?
Date: Sat, 5 Sep 2015 23:37:33 +0000

 On Sun, Aug 23, 2015 at 05:50:01AM +0000, Jeff Rizzo wrote:
  >  >   Something I should have thought of earlier: any indication if the
  >  >   on-disk size is also flipped?
  >  
  >  When I was dumping the file system so I could re-newfs it little-endian, 
  >  it filled up a 20G partition for what was supposed to be 1.7G of data.  
  >  I'm not certain that indicates problems with the files on disk, but it's 
  >  certainly possible.

 That strongly suggests that *something* is pretty badly wrong.

 -- 
 David A. Holland
 dholland@netbsd.org

Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.