NetBSD Problem Report #46472

From www@NetBSD.org  Mon May 21 13:57:15 2012
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 1ADB163BA27
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 21 May 2012 13:57:15 +0000 (UTC)
Message-Id: <20120521135714.1482463B86B@www.NetBSD.org>
Date: Mon, 21 May 2012 13:57:14 +0000 (UTC)
From: jdbaker@mylinuxisp.com
Reply-To: jdbaker@mylinuxisp.com
To: gnats-bugs@NetBSD.org
Subject: 5.1_STABLE/i386 panic after recent pull-ups
X-Send-Pr-Version: www-1.0

>Number:         46472
>Category:       kern
>Synopsis:       5.1_STABLE/i386 panic after recent pull-ups
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon May 21 14:00:00 +0000 2012
>Closed-Date:    Tue May 22 21:49:54 +0000 2012
>Last-Modified:  Tue May 22 21:49:54 +0000 2012
>Originator:     John D. Baker
>Release:        NetBSD 5.1_STABLE/i386
>Organization:
>Environment:
NetBSD slate.technoskunk.fur 5.1_STABLE NetBSD 5.1_STABLE (SLATE) #38: Sun May 20 13:15:13 CDT 2012  sysop@slate.technoskunk.fur:/d0/build/netbsd-5/obj/i386/sys/arch/i386/compile/SLATE i386

NetBSD slate.technoskunk.fur 5.1_STABLE NetBSD 5.1_STABLE (GENERIC) #1: Sun May 20 16:16:30 CDT 2012  sysop@verthandi.technoskunk.fur:/d0/build/netbsd-5/obj/i386/sys/arch/i386/compile/GENERIC i386

>Description:
Following the recent pull-ups to the netbsd-5 branch, I updated my tree
and rebuilt.  Since then, I've experienced panics logged on reboot as:

May 20 16:43:05 slate savecore: reboot after panic: panic: rename: EXDEV
May 20 16:43:05 slate savecore: writing compressed core to /var/crash/netbsd.2.core.gz
May 20 16:43:23 slate savecore: writing compressed kernel to /var/crash/netbsd.2.gz
May 20 16:43:23 slate savecore: (null): Bad address

The resulting compressed kernel file is only 10 bytes.  Attempting to
run 'gdb' on the core file with "/netbsd" as the kernel causes 'gdb' to
reject the core file as an unrecognized format, not a core file.

Thinkpad T42 1.7GHz, 2GB.

I'll set ddb.onpanic=1 to see if I can get more data.  As it was, the
machine just froze for a few seconds and then rebooted.

=> Applying pkgsrc patches for openmotif-2.3.3nb1
panic: rename: EXDEV
fatal breakpoint trap in supervisor mode
trap type 1 code eip c053836c cs 8 eflags 246 cr2 ccdab000 ilevel 0
Stopped in pid 3922.1 (patch) at      netbsd:breakpoint+0x4:  popl    %ebp
db{0}> bt
breakpoint(c0775cab,ce971a88,c07d0d00,c3003000,cf088cd4,1,ce971a7c,c04d299c,c305ff48,c307e82c) at netbsd:breakpoint+0x4
panic(c07640dd,0,0,2,ce971b44,ce971b74,ce971c48,1,ced9b39c,ce971c34) at netbsd:panic+0x1b0
ufs_rename(ce971bac,cd91b000,ce971bcc,c046196c,c083c1e0,408418,c06c91a0,cd953e68,cf087b80,ce971c90) at netbsd:ufs_rename+0x55f
VOP_RENAME(cd953e68,cf087b80,ce971c90,ced9b39c,0,cd971c48,ce971c0c,c046f0aa,cd91b000,0) at netbsd:VOP_RENAME+0x7c
do_sys_rename(bb903040,bb91a040,0,0,ce84b7e0,c0541274,ce84b7e0,ce971d00) at netbsd:do_sys_rename+0x59e
sys_rename(ce84b7e0,ce971d00,ce971d28,bb936000,cdaf384c,80,bb903040,bb91a040,0) at netbsd:sys_rename+0x26
syscall(ce971d48,b3,ab,1f,1f,80517fc,bb91a041,bfbfe6f8,bb91a040,0) at netbsd:syscall+0xc4
db{0}>


I should note this is a custom kernel in which I enable options FFS_EI
and options APPLE_UFS.  Will try again with a GENERIC kernel.

Same result with GENERIC: the same backtrace, with only small variations
in a couple of the offsets from the start of functions.

It seems to be something 'patch' is doing.  'mv' works as expected, but
if I make two files "foo" and "bar", save the output of
'diff -u bar foo >foo.diff' and then run 'patch < foo.diff', the machine
panics.  Upon reboot, the "foo.diff" file is corrupted with what appears
to be the text/data segment of a library(?).



>How-To-Repeat:
Unknown.  I have three machines running 5.1 stable.  Two have been updated
with the latest pullups.  One panics on rename via 'patch' as described
above.  The other displays no problems.  Both are self-hosted.  I brought
in the kernel and release sets from the unaffected machine to install on
the problem machine in case it was an issue with my build environment.
The result is the same.

I blew away the build directories on the problem machine to see about
rebuilding from top to bottom, but it froze/panicked when the first
"./configure" script got around to running its first "config.status"
script.  No doubt it employed "patch" to do some of its work.  I still
have "ddb.onpanic=1" so have no core-dump from the most recent events.

I am working on updating the third machine to see if any other machine
is affected.
>Fix:

>Release-Note:

>Audit-Trail:
From: Julian Coleman <jdc@coris.org.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46472: 5.1_STABLE/i386 panic after recent pull-ups
Date: Mon, 21 May 2012 16:31:04 +0100

 Hi,

 > I'll set ddb.onpanic=1 to see if I can get more data.  As it was, the
 > machine just froze for a few seconds and then rebooted.

 > ufs_rename(ce971bac,cd91b000,ce971bcc,c046196c,c083c1e0,408418,c06c91a0,cd953e68,cf087b80,ce971c90) at netbsd:ufs_rename+0x55f
 > VOP_RENAME(cd953e68,cf087b80,ce971c90,ced9b39c,0,cd971c48,ce971c0c,c046f0aa,cd91b000,0) at netbsd:VOP_RENAME+0x7c
 > do_sys_rename(bb903040,bb91a040,0,0,ce84b7e0,c0541274,ce84b7e0,ce971d00) at netbsd:do_sys_rename+0x59e
 > sys_rename(ce84b7e0,ce971d00,ce971d28,bb936000,cdaf384c,80,bb903040,bb91a040,0) at netbsd:sys_rename+0x26

 It would be very helpful if you could add:

   makeoptions	DEBUG="-g"

 to your kernel configuration file, rebuild, crash, and then feed the resulting
 values into gdb to retrieve the line numbers.  For example:

   gdb netbsd.gdb
   (gdb) list *(ufs_rename+0x55f)
   (gdb) list *(VOP_RENAME+0x7c)

 for the crash above.

 Thanks,

 J

 -- 
   My other computer also runs NetBSD    /        Sailing at Newbiggin
         http://www.netbsd.org/        /   http://www.newbigginsailingclub.org/

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc: jdbaker@mylinuxisp.com
Subject: Re: kern/46472: 5.1_STABLE/i386 panic after recent pull-ups
Date: Mon, 21 May 2012 18:20:59 +0200

 John,

 could you add this diff to src/sys/ufs/ufs/ufs_vnops.c and post the result.
 Before the panic two or three vnodes should be printed.

 The output from `mount -v' is interesting too.

 --
 Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)


 Index: ufs_vnops.c
 ===================================================================
 RCS file: /cvsroot/src/sys/ufs/ufs/ufs_vnops.c,v
 retrieving revision 1.169.4.2
 diff -p -u -2 -r1.169.4.2 ufs_vnops.c
 --- ufs_vnops.c	19 May 2012 17:28:29 -0000	1.169.4.2
 +++ ufs_vnops.c	21 May 2012 16:17:27 -0000
 @@ -1672,6 +1672,10 @@ ufs_rename(void *v)
  	 */
   	if (txp == NULL) {
 - 		if (tdp->i_dev != ip->i_dev)
 + 		if (tdp->i_dev != ip->i_dev) {
 +			printf("tdp->i_dev != ip->i_dev\n");
 +			VOP_PRINT(tdp);
 +			VOP_PRINT(ip);
  			panic("rename: EXDEV");
 +		}
  		/*
  		 * Account for ".." in new directory.
 @@ -1721,6 +1725,11 @@ ufs_rename(void *v)
  		VN_KNOTE(tdvp, NOTE_WRITE);
  	} else {
 - 		if (txp->i_dev != tdp->i_dev || txp->i_dev != ip->i_dev)
 + 		if (txp->i_dev != tdp->i_dev || txp->i_dev != ip->i_dev) {
 +			printf("txp->i_dev != tdp->i_dev || txp->i_dev != ip->i_dev\n");
 +			VOP_PRINT(txp);
 +			VOP_PRINT(tdp);
 +			VOP_PRINT(ip);
  			panic("rename: EXDEV");
 +		}
  		/*
  		 * Short circuit rename(foo, foo).

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc: jdbaker@mylinuxisp.com,
 Brian Buhrow <buhrow@lothlorien.nfbcal.org>
Subject: Re: kern/46472: 5.1_STABLE/i386 panic after recent pull-ups
Date: Mon, 21 May 2012 18:41:13 +0200

 ... and this diff could fix it -- Brian?

 --
 Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)


 Index: ufs_vnops.c
 ===================================================================
 RCS file: /cvsroot/src/sys/ufs/ufs/ufs_vnops.c,v
 retrieving revision 1.169.4.2
 diff -p -u -2 -r1.169.4.2 ufs_vnops.c
 --- ufs_vnops.c	19 May 2012 17:28:29 -0000	1.169.4.2
 +++ ufs_vnops.c	21 May 2012 16:39:47 -0000
 @@ -1388,4 +1388,5 @@ ufs_rename(void *v)
   	if (fdvp->v_mount != tdvp->v_mount) {
  		error = EXDEV;
 +		goto abort;
  	}


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46472: 5.1_STABLE/i386 panic after recent pull-ups
Date: Mon, 21 May 2012 17:10:44 +0000

 On Mon, May 21, 2012 at 04:45:03PM +0000, J. Hannken-Illjes wrote:
  >  @@ -1388,4 +1388,5 @@ ufs_rename(void *v)
  >    	if (fdvp->v_mount != tdvp->v_mount) {
  >   		error = EXDEV;
  >  +		goto abort;
  >   	}

 That should do it :-/

 The goto is in the version from -current, so it must have been a merge
 glitch of some kind.

 (Also, -current has an EXDEV check in do_sys_rename that's not in
 netbsd-5. But I guess this isn't relevant.)

 -- 
 David A. Holland
 dholland@netbsd.org
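
The control-flow slip discussed in this exchange can be modeled in ordinary
userland C.  What follows is only a sketch under stated assumptions, not
kernel code: struct fake_vnode, mount_id, WOULD_PANIC, and both function
names are hypothetical stand-ins.  It shows how setting error = EXDEV
without a jump lets execution reach a later consistency check, while the
version with the goto returns the error to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode: only the mount identity matters here. */
struct fake_vnode {
	int mount_id;
};

/* Sentinel for "the late consistency check fired" (a panic in the kernel). */
#define WOULD_PANIC (-1)

/* Models the broken merge: error is set, but control falls through. */
static int
rename_missing_goto(const struct fake_vnode *fdvp, const struct fake_vnode *tdvp)
{
	int error = 0;

	assert(fdvp != NULL && tdvp != NULL);
	if (fdvp->mount_id != tdvp->mount_id) {
		error = EXDEV;
		/* no jump here, so execution continues below */
	}
	(void)error;	/* the recorded error is never acted upon */

	/* Later code assumes a single device and checks that assumption. */
	if (fdvp->mount_id != tdvp->mount_id)
		return WOULD_PANIC;
	return 0;
}

/* Models the corrected code: the error path leaves immediately. */
static int
rename_with_goto(const struct fake_vnode *fdvp, const struct fake_vnode *tdvp)
{
	int error = 0;

	assert(fdvp != NULL && tdvp != NULL);
	if (fdvp->mount_id != tdvp->mount_id) {
		error = EXDEV;
		goto abort;
	}
	if (fdvp->mount_id != tdvp->mount_id)
		return WOULD_PANIC;	/* unreachable once the goto is present */
	return 0;
abort:
	return error;
}
```

Calling both functions with two vnodes whose mount_id values differ gives
WOULD_PANIC from the first and EXDEV from the second, which is the
difference the one-line patch makes.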

From: buhrow@lothlorien.nfbcal.org (Brian Buhrow)
To: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>, gnats-bugs@NetBSD.org
Cc: jdbaker@mylinuxisp.com, buhrow@lothlorien.nfbcal.org
Subject: Re: kern/46472: 5.1_STABLE/i386 panic after recent pull-ups
Date: Mon, 21 May 2012 13:20:36 -0700

 	Hello.  I've been able to reproduce the panic on my test machine and
 can confirm that Hannken's patch fixes the problem.  Here's an explanation
 of what's going on.
 	When patch patches a file, it creates a temporary file in /tmp and
 tries to rename it over the file being patched.  (This is after it renames
 the original file to original.orig.)  If /tmp is on a different device from
 the file being patched, rename fails with EXDEV, patch copies the
 temporary file to the original file name, and all works as expected.
 With the goto missing, rename never returned EXDEV; instead it fell
 through the remaining logic until a later check noticed the device
 mismatch and the panic ensued.  My guess is that the difference between
 the machine that worked and the one that didn't was that /tmp was on the
 same device as the source tree on the machine that worked, and on a
 different device from the source tree on the machine that didn't.
 	David is right that this was undoubtedly a merging error when I
 backported the original changes from -current.  I apologize for the error
 and am glad the fix is so simple.
 	John, can you confirm that this patch fixes things for you?
 If it does, we'll get the patch  committed.

 -thanks
 -Brian

 [For the curious, here is a partial trace of patch doing its work.]
 I created:
 foo
 bar
 Ran:
 diff -u foo bar > foo.diff
 Then:
 patch < foo.diff
 (My /tmp is on a different device than the test filesystem I used for this
 test.)

 ... begin trace ...

    517      1 patch    CALL  rename(0xbb915098,0xbfbfe6d4)
    517      1 patch    NAMI  "foo"
    517      1 patch    NAMI  "foo.orig"
    517      1 patch    RET   rename 0
    517      1 patch    CALL  rename(0xbb903040,0xbb915098)
    517      1 patch    NAMI  "/tmp/patcho00000517aa"
    517      1 patch    NAMI  "foo"
    517      1 patch    RET   rename -1 errno 18 Cross-device link
    517      1 patch    CALL  open(0xbb915098,0x601,0x1b6)
    517      1 patch    NAMI  "foo"
    517      1 patch    RET   open 4
    517      1 patch    CALL  open(0xbb903040,0,0)
    517      1 patch    NAMI  "/tmp/patcho00000517aa"
    517      1 patch    RET   open 6


 . . . end trace . . .

 	Before the patch, that second rename would have caused a panic.

 -Brian
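
The fallback visible in the trace above (rename fails with EXDEV, then the
destination and the temporary file are opened for a copy) can be sketched
in C as follows.  This is a minimal illustration, not the source of
patch(1): the names copy_file and move_file are hypothetical, and the final
unlink of the source is an assumption, since the trace excerpt does not
show what patch does with its temporary file.  The open flags used for the
destination correspond to the 0x601 and 0x1b6 arguments in the trace
(O_WRONLY|O_CREAT|O_TRUNC on NetBSD, mode 0666).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Copy src to dst byte for byte; returns 0 on success, -1 on error. */
static int
copy_file(const char *src, const char *dst)
{
	char buf[4096];
	ssize_t n;
	int in, out, rv = -1;

	if ((in = open(src, O_RDONLY)) == -1)
		return -1;
	if ((out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0666)) == -1) {
		close(in);
		return -1;
	}
	while ((n = read(in, buf, sizeof(buf))) > 0) {
		if (write(out, buf, (size_t)n) != n)
			goto done;	/* short or failed write */
	}
	if (n == 0)
		rv = 0;		/* clean end of file */
done:
	close(in);
	if (close(out) == -1)
		rv = -1;
	return rv;
}

/* Move src to dst: rename first, fall back to copy + unlink on EXDEV only. */
static int
move_file(const char *src, const char *dst)
{
	assert(src != NULL && dst != NULL);
	if (rename(src, dst) == 0)
		return 0;
	if (errno != EXDEV)
		return -1;	/* any other failure is reported as-is */
	if (copy_file(src, dst) == -1)
		return -1;
	return unlink(src);	/* assumption: discard the source after copying */
}
```

The fallback only works if the kernel reports EXDEV to the caller, which is
why a rename that panics instead of failing breaks an otherwise ordinary
userland pattern.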

State-Changed-From-To: open->feedback
State-Changed-By: buhrow@NetBSD.org
State-Changed-When: Mon, 21 May 2012 20:36:51 +0000
State-Changed-Why:
Hannken suggested a patch which fixes the problem.  We're
waiting on the original submitter to confirm that it fixes the issue.


From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
Cc: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>, gnats-bugs@NetBSD.org
Subject: Re: kern/46472: 5.1_STABLE/i386 panic after recent pull-ups
Date: Tue, 22 May 2012 00:38:09 -0500 (CDT)

 On Mon, 21 May 2012, Brian Buhrow wrote:

 > My guess is that the difference between the machine
 > that worked and the one that didn't was that /tmp was on the same device as
 > the source tree for the machine that worked  and /tmp was on a different
 > device from the source tree on the machine that didn't.

 Actually, that was a false conclusion on my part.  On the failing machine,
 I did my test in my pkgsrc build area which is on a different filesystem
 from /tmp.  On the "working" machine, I did the test IN /tmp, so it
 succeeded.  Had I done the test elsewhere, that machine too would likely
 have crashed.

 > 	John, can you confirm that this patch fixes things for you?
 > If it does, we'll get the patch  committed.

 I've applied the patch and am rebuilding now (on a completely different
 machine running 6.0beta since the others are at risk).

 -- 
 |/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
 |\ / jdbaker[snail]mylinuxisp[flyspeck]com    OpenBSD            FreeBSD
 | X  No HTML/proprietary data in email.   BSD just sits there and works!
 |/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645
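
Whether a given test directory would exercise the cross-device path can be
checked from userland before running the test.  The sketch below compares
the st_dev field reported by stat(2) for two paths; same_device is a
hypothetical helper name, and comparing st_dev is an approximation of the
kernel's own mount comparison rather than the check ufs_rename performs.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Returns 1 if a and b are on the same device, 0 if not, -1 if stat fails. */
static int
same_device(const char *a, const char *b)
{
	struct stat sa, sb;

	assert(a != NULL && b != NULL);
	if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
		return -1;
	return sa.st_dev == sb.st_dev;
}
```

For example, same_device("/tmp", ".") returning 0 in the directory where
the test is run would indicate that a rename from /tmp into that directory
crosses devices, which is the case that mattered in this report.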

From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, buhrow@NetBSD.org
Subject: Re: kern/46472 (5.1_STABLE/i386 panic after recent pull-ups)
Date: Tue, 22 May 2012 09:36:41 -0500 (CDT)

 On Mon, 21 May 2012, buhrow@NetBSD.org wrote:

 > State-Changed-From-To: open->feedback
 > State-Changed-By: buhrow@NetBSD.org
 > State-Changed-When: Mon, 21 May 2012 20:36:51 +0000
 > State-Changed-Why:
 > Hannken suggested a patch which fixes the problem.  We're
 > waiting on the original submitter to confirm that it fixes the issue.

 Yes, this appears to fix the problem.

 -- 
 |/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
 |\ / jdbaker[snail]mylinuxisp[flyspeck]com    OpenBSD            FreeBSD
 | X  No HTML/proprietary data in email.   BSD just sits there and works!
 |/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645

State-Changed-From-To: feedback->pending-pullups
State-Changed-By: buhrow@NetBSD.org
State-Changed-When: Tue, 22 May 2012 21:42:51 +0000
State-Changed-Why:
Awaiting final pullups to the NetBSD-5 branch.


From: "Jeff Rizzo" <riz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/46472 CVS commit: [netbsd-5] src/sys/ufs/ufs
Date: Tue, 22 May 2012 21:44:38 +0000

 Module Name:	src
 Committed By:	riz
 Date:		Tue May 22 21:44:38 UTC 2012

 Modified Files:
 	src/sys/ufs/ufs [netbsd-5]: ufs_vnops.c

 Log Message:
 Pull up following patch (requested by buhrow in ticket #1763):
 	sys/ufs/ufs/ufs_vnops.c: patch

 Make sure we return EXDEV on cross-device links rather than panicking
 the system.  This corrects a pasting error from the merged patches
 in ticket 1759.

 Thanks to hannken@ for figuring out the error.
 Fixes pr kern/46472
 Tested by buhrow@


 To generate a diff of this commit:
 cvs rdiff -u -r1.169.4.2 -r1.169.4.3 src/sys/ufs/ufs/ufs_vnops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: buhrow@NetBSD.org
State-Changed-When: Tue, 22 May 2012 21:49:54 +0000
State-Changed-Why:
Patches pulled up to NetBSD-5; panic resolved.


>Unformatted:

Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.