NetBSD Problem Report #46472
From www@NetBSD.org Mon May 21 13:57:15 2012
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
by www.NetBSD.org (Postfix) with ESMTP id 1ADB163BA27
for <gnats-bugs@gnats.NetBSD.org>; Mon, 21 May 2012 13:57:15 +0000 (UTC)
Message-Id: <20120521135714.1482463B86B@www.NetBSD.org>
Date: Mon, 21 May 2012 13:57:14 +0000 (UTC)
From: jdbaker@mylinuxisp.com
Reply-To: jdbaker@mylinuxisp.com
To: gnats-bugs@NetBSD.org
Subject: 5.1_STABLE/i386 panic after recent pull-ups
X-Send-Pr-Version: www-1.0
>Number: 46472
>Category: kern
>Synopsis: 5.1_STABLE/i386 panic after recent pull-ups
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon May 21 14:00:00 +0000 2012
>Closed-Date: Tue May 22 21:49:54 +0000 2012
>Last-Modified: Tue May 22 21:49:54 +0000 2012
>Originator: John D. Baker
>Release: NetBSD 5.1_STABLE/i386
>Organization:
>Environment:
NetBSD slate.technoskunk.fur 5.1_STABLE NetBSD 5.1_STABLE (SLATE) #38: Sun May 20 13:15:13 CDT 2012 sysop@slate.technoskunk.fur:/d0/build/netbsd-5/obj/i386/sys/arch/i386/compile/SLATE i386
NetBSD slate.technoskunk.fur 5.1_STABLE NetBSD 5.1_STABLE (GENERIC) #1: Sun May 20 16:16:30 CDT 2012 sysop@verthandi.technoskunk.fur:/d0/build/netbsd-5/obj/i386/sys/arch/i386/compile/GENERIC i386
>Description:
Following the recent pull-ups to the netbsd-5 branch, I updated my tree
and rebuilt. Since then, I've experienced panics logged on reboot as:
May 20 16:43:05 slate savecore: reboot after panic: panic: rename: EXDEV
May 20 16:43:05 slate savecore: writing compressed core to /var/crash/netbsd.2.core.gz
May 20 16:43:23 slate savecore: writing compressed kernel to /var/crash/netbsd.2.gz
May 20 16:43:23 slate savecore: (null): Bad address
The resulting compressed kernel file is only 10 bytes. Attempting to
run 'gdb' on the core file with "/netbsd" as the kernel causes 'gdb' to
report that the core file is in an unrecognized format and is not a core file.
Thinkpad T42 1.7GHz, 2GB.
I'll set ddb.onpanic=1 to see if I can get more data. As it was, the
machine just froze for a few seconds and then rebooted.
=> Applying pkgsrc patches for openmotif-2.3.3nb1
panic: rename: EXDEV
fatal breakpoint trap in supervisor mode
trap type 1 code eip c053836c cs 8 eflags 246 cr2 ccdab000 ilevel 0
Stopped in pid 3922.1 (patch) at netbsd:breakpoint+0x4: popl %ebp
db{0}> bt
breakpoint(c0775cab,ce971a88,c07d0d00,c3003000,cf088cd4,1,ce971a7c,c04d299c,c305ff48,c307e82c) at netbsd:breakpoint+0x4
panic(c07640dd,0,0,2,ce971b44,ce971b74,ce971c48,1,ced9b39c,ce971c34) at netbsd:panic+0x1b0
ufs_rename(ce971bac,cd91b000,ce971bcc,c046196c,c083c1e0,408418,c06c91a0,cd953e68,cf087b80,ce971c90) at netbsd:ufs_rename+0x55f
VOP_RENAME(cd953e68,cf087b80,ce971c90,ced9b39c,0,cd971c48,ce971c0c,c046f0aa,cd91b000,0) at netbsd:VOP_RENAME+0x7c
do_sys_rename(bb903040,bb91a040,0,0,ce84b7e0,c0541274,ce84b7e0,ce971d00) at netbsd:do_sys_rename+0x59e
sys_rename(ce84b7e0,ce971d00,ce971d28,bb936000,cdaf384c,80,bb903040,bb91a040,0) at netbsd:sys_rename+0x26
syscall(ce971d48,b3,ab,1f,1f,80517fc,bb91a041,bfbfe6f8,bb91a040,0) at netbsd:syscall+0xc4
db{0}>
I should note this is a custom kernel in which I enable "options FFS_EI"
and "options APPLE_UFS". Will try again with a GENERIC kernel.
Same result with GENERIC. The backtrace is the same, apart from variations
in a couple of the offsets from the start of functions.
It seems to be something 'patch' is doing. 'mv' works as expected, but
if I make two files "foo" and "bar", save the output of
'diff -u bar foo >foo.diff' and then run 'patch < foo.diff', the machine
panics. Upon reboot, the "foo.diff" file is corrupted with what appears
to be text/data segment of library(?).
>How-To-Repeat:
Unknown. I have three machines running 5.1 stable. Two have been updated
with the latest pullups. One panics on rename via 'patch' as described
above. The other displays no problems. Both are self-hosted. I brought
in the kernel and release sets from the unaffected machine to install on
the problem machine in case it was an issue with my build environment.
The result is the same.
I blew away the build directories on the problem machine to see about
rebuilding from top to bottom, but it froze/panicked when the first
"./configure" script got around to running its first "config.status"
script. No doubt it employed "patch" to do some of its work. I still
have "ddb.onpanic=1" so have no core-dump from the most recent events.
I am working on updating the third machine to see if any other machine
is affected.
>Fix:
>Release-Note:
>Audit-Trail:
From: Julian Coleman <jdc@coris.org.uk>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/46472: 5.1_STABLE/i386 panic after recent pull-ups
Date: Mon, 21 May 2012 16:31:04 +0100
Hi,
> I'll set ddb.onpanic=1 to see if I can get more data. As it was, the
> machine just froze for a few seconds and then rebooted.
> ufs_rename(ce971bac,cd91b000,ce971bcc,c046196c,c083c1e0,408418,c06c91a0,cd953e68,cf087b80,ce971c90) at netbsd:ufs_rename+0x55f
> VOP_RENAME(cd953e68,cf087b80,ce971c90,ced9b39c,0,cd971c48,ce971c0c,c046f0aa,cd91b000,0) at netbsd:VOP_RENAME+0x7c
> do_sys_rename(bb903040,bb91a040,0,0,ce84b7e0,c0541274,ce84b7e0,ce971d00) at netbsd:do_sys_rename+0x59e
> sys_rename(ce84b7e0,ce971d00,ce971d28,bb936000,cdaf384c,80,bb903040,bb91a040,0) at netbsd:sys_rename+0x26
It would be very helpful if you could add:
makeoptions DEBUG="-g"
to your kernel configuration file, rebuild, crash, and then feed the resulting
values into gdb to retrieve the line numbers. For example:
gdb netbsd.gdb
(gdb) list *(ufs_rename+0x55f)
(gdb) list *(VOP_RENAME+0x7c)
for the crash above.
Thanks,
J
--
My other computer also runs NetBSD / Sailing at Newbiggin
http://www.netbsd.org/ / http://www.newbigginsailingclub.org/
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc: jdbaker@mylinuxisp.com
Subject: Re: kern/46472: 5.1_STABLE/i386 panic after recent pull-ups
Date: Mon, 21 May 2012 18:20:59 +0200
John,
could you add this diff to src/sys/ufs/ufs/ufs_vnops.c and post the result.
Before the panic two or three vnodes should be printed.
The output from `mount -v' is interesting too.
--
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Index: ufs_vnops.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ufs/ufs_vnops.c,v
retrieving revision 1.169.4.2
diff -p -u -2 -r1.169.4.2 ufs_vnops.c
--- ufs_vnops.c 19 May 2012 17:28:29 -0000 1.169.4.2
+++ ufs_vnops.c 21 May 2012 16:17:27 -0000
@@ -1672,6 +1672,10 @@ ufs_rename(void *v)
*/
if (txp == NULL) {
- if (tdp->i_dev != ip->i_dev)
+ if (tdp->i_dev != ip->i_dev) {
+ printf("tdp->i_dev != ip->i_dev\n");
+ VOP_PRINT(tdp);
+ VOP_PRINT(ip);
panic("rename: EXDEV");
+ }
/*
* Account for ".." in new directory.
@@ -1721,6 +1725,11 @@ ufs_rename(void *v)
VN_KNOTE(tdvp, NOTE_WRITE);
} else {
- if (txp->i_dev != tdp->i_dev || txp->i_dev != ip->i_dev)
+ if (txp->i_dev != tdp->i_dev || txp->i_dev != ip->i_dev) {
+ printf("txp->i_dev != tdp->i_dev || txp->i_dev != ip->i_dev\n");
+ VOP_PRINT(txp);
+ VOP_PRINT(tdp);
+ VOP_PRINT(ip);
panic("rename: EXDEV");
+ }
/*
* Short circuit rename(foo, foo).
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc: jdbaker@mylinuxisp.com,
Brian Buhrow <buhrow@lothlorien.nfbcal.org>
Subject: Re: kern/46472: 5.1_STABLE/i386 panic after recent pull-ups
Date: Mon, 21 May 2012 18:41:13 +0200
... and this diff could fix it -- Brian?
--
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Index: ufs_vnops.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ufs/ufs_vnops.c,v
retrieving revision 1.169.4.2
diff -p -u -2 -r1.169.4.2 ufs_vnops.c
--- ufs_vnops.c 19 May 2012 17:28:29 -0000 1.169.4.2
+++ ufs_vnops.c 21 May 2012 16:39:47 -0000
@@ -1388,4 +1388,5 @@ ufs_rename(void *v)
if (fdvp->v_mount != tdvp->v_mount) {
error = EXDEV;
+ goto abort;
}
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/46472: 5.1_STABLE/i386 panic after recent pull-ups
Date: Mon, 21 May 2012 17:10:44 +0000
On Mon, May 21, 2012 at 04:45:03PM +0000, J. Hannken-Illjes wrote:
> @@ -1388,4 +1388,5 @@ ufs_rename(void *v)
> if (fdvp->v_mount != tdvp->v_mount) {
> error = EXDEV;
> + goto abort;
> }
That should do it :-/
The goto is in the version from -current, so it must have been a merge
glitch of some kind.
(Also, -current has an EXDEV check in do_sys_rename that's not in
netbsd-5. But I guess this isn't relevant.)
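To make the effect of the lost line concrete, here is a minimal userland sketch of the control flow, not the kernel code itself. The names `toy_rename`, `toy_result`, and the `with_goto` switch are hypothetical, and mounts are modelled as plain integers. It only shows that setting `error = EXDEV` without jumping away lets execution continue into code that assumes a single filesystem.

```c
#include <assert.h>
#include <errno.h>

/* Outcome of one toy rename: the error code and whether the body ran. */
struct toy_result {
	int error;
	int reached_body;
};

/*
 * Hypothetical model of the top of ufs_rename(). Mounts are plain
 * integers here; with_goto selects the fixed (1) or broken (0) variant.
 */
static struct toy_result
toy_rename(int from_mount, int to_mount, int with_goto)
{
	struct toy_result r = { 0, 0 };

	/* Toy mount handles are small non-negative integers. */
	assert(from_mount >= 0 && to_mount >= 0);

	if (from_mount != to_mount) {
		r.error = EXDEV;
		if (with_goto)
			goto abort;	/* the line missing from the pull-up */
	}

	/*
	 * Everything past this point assumes a single filesystem. In the
	 * kernel, this is where the i_dev comparison finds the mismatch
	 * and panics; the toy only records that control got this far.
	 */
	r.reached_body = 1;
	return r;

abort:
	return r;
}
```

With these assumptions, `toy_rename(1, 2, 0)` reaches the body despite the mount mismatch, while `toy_rename(1, 2, 1)` leaves early with `EXDEV` and never touches it.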
--
David A. Holland
dholland@netbsd.org
From: buhrow@lothlorien.nfbcal.org (Brian Buhrow)
To: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>, gnats-bugs@NetBSD.org
Cc: jdbaker@mylinuxisp.com, buhrow@lothlorien.nfbcal.org
Subject: Re: kern/46472: 5.1_STABLE/i386 panic after recent pull-ups
Date: Mon, 21 May 2012 13:20:36 -0700
Hello. I've been able to reproduce the panic on my test machine and
can confirm that Hannken's patch fixes the problem. Here's an explanation
of what's going on.
When patch patches a file, it creates a temporary file in /tmp and
tries to rename it onto the file being patched. (This is after it renames
the original file to original.orig.) If /tmp is on a different device from
the file being patched, then rename will fail with EXDEV and patch will
copy the temporary file to the original file name and all will work as
expected. The missing goto caused rename to never return EXDEV, but rather
to fall through the logic until the error was noticed at a critical moment
and the panic ensued. My guess is that the difference between the machine
that worked and the one that didn't was that /tmp was on the same device as
the source tree for the machine that worked and /tmp was on a different
device from the source tree on the machine that didn't.
David is right that this was undoubtedly a merging error when I
backported the original changes from -current. I apologize for the error
and am glad the fix is so simple.
John, can you confirm that this patch fixes things for you?
If it does, we'll get the patch committed.
-thanks
-Brian
[For the curious, here is a partial trace of patch doing its work.]
I created:
foo
bar
Ran:
diff -u foo bar > foo.diff
Then:
patch < foo.diff
(My /tmp is on a different device than the test filesystem I used for this
test.)
... begin trace ...
517 1 patch CALL rename(0xbb915098,0xbfbfe6d4)
517 1 patch NAMI "foo"
517 1 patch NAMI "foo.orig"
517 1 patch RET rename 0
517 1 patch CALL rename(0xbb903040,0xbb915098)
517 1 patch NAMI "/tmp/patcho00000517aa"
517 1 patch NAMI "foo"
517 1 patch RET rename -1 errno 18 Cross-device link
517 1 patch CALL open(0xbb915098,0x601,0x1b6)
517 1 patch NAMI "foo"
517 1 patch RET open 4
517 1 patch CALL open(0xbb903040,0,0)
517 1 patch NAMI "/tmp/patcho00000517aa"
517 1 patch RET open 6
. . . end trace . . .
Before the patch, that second rename would have caused a panic.
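The trace shows the usual userland pattern that depends on rename(2) returning EXDEV: try the rename, and copy the file if the two names are on different devices. A minimal sketch of that pattern follows. It is not patch's actual source; `move_file` and `copy_file` are hypothetical names, and removing the source after the copy is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Copy src to dst byte for byte; returns 0 on success, -1 on failure. */
static int
copy_file(const char *src, const char *dst)
{
	FILE *in, *out;
	char buf[4096];
	size_t n;
	int rv = 0;

	if ((in = fopen(src, "rb")) == NULL)
		return -1;
	if ((out = fopen(dst, "wb")) == NULL) {
		fclose(in);
		return -1;
	}
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n) {
			rv = -1;
			break;
		}
	}
	if (ferror(in))
		rv = -1;
	fclose(in);
	if (fclose(out) != 0)
		rv = -1;
	return rv;
}

/*
 * Move src to dst. Try rename(2) first; if it fails with EXDEV
 * (cross-device link), fall back to copy and remove. Any other
 * rename error is passed back to the caller unchanged.
 */
static int
move_file(const char *src, const char *dst)
{
	assert(src != NULL && dst != NULL);

	if (rename(src, dst) == 0)
		return 0;
	if (errno != EXDEV)
		return -1;
	if (copy_file(src, dst) != 0)
		return -1;
	return remove(src);
}
```

The fallback branch only runs if the kernel actually hands EXDEV back to the caller, which is the behaviour the one-line fix restores.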
-Brian
State-Changed-From-To: open->feedback
State-Changed-By: buhrow@NetBSD.org
State-Changed-When: Mon, 21 May 2012 20:36:51 +0000
State-Changed-Why:
Hannken suggested a patch which fixes the problem. We're
waiting on the original submitter to confirm that it fixes the issue.
From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
Cc: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>, gnats-bugs@NetBSD.org
Subject: Re: kern/46472: 5.1_STABLE/i386 panic after recent pull-ups
Date: Tue, 22 May 2012 00:38:09 -0500 (CDT)
On Mon, 21 May 2012, Brian Buhrow wrote:
> My guess is that the difference between the machine
> that worked and the one that didn't was that /tmp was on the same device as
> the source tree for the machine that worked and /tmp was on a different
> device from the source tree on the machine that didn't.
Actually, that was a false conclusion on my part. On the failing machine,
I did my test in my pkgsrc build area which is on a different filesystem
from /tmp. On the "working" machine, I did the test IN /tmp, so it
succeeded. Had I done the test elsewhere, that machine too would likely
have crashed.
> John, can you confirm that this patch fixes things for you?
> If it does, we'll get the patch committed.
I've applied the patch and am rebuilding now (on a completely different
machine running 6.0beta since the others are at risk).
--
|/"\ John D. Baker, KN5UKS NetBSD Darwin/MacOS X
|\ / jdbaker[snail]mylinuxisp[flyspeck]com OpenBSD FreeBSD
| X No HTML/proprietary data in email. BSD just sits there and works!
|/ \ GPGkeyID: D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645
From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, buhrow@NetBSD.org
Subject: Re: kern/46472 (5.1_STABLE/i386 panic after recent pull-ups)
Date: Tue, 22 May 2012 09:36:41 -0500 (CDT)
On Mon, 21 May 2012, buhrow@NetBSD.org wrote:
> State-Changed-From-To: open->feedback
> State-Changed-By: buhrow@NetBSD.org
> State-Changed-When: Mon, 21 May 2012 20:36:51 +0000
> State-Changed-Why:
> Hannken suggested a patch which fixes the problem. We're
> waiting on the original submitter to confirm that it fixes the issue.
Yes, this appears to fix the problem.
--
|/"\ John D. Baker, KN5UKS NetBSD Darwin/MacOS X
|\ / jdbaker[snail]mylinuxisp[flyspeck]com OpenBSD FreeBSD
| X No HTML/proprietary data in email. BSD just sits there and works!
|/ \ GPGkeyID: D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645
State-Changed-From-To: feedback->pending-pullups
State-Changed-By: buhrow@NetBSD.org
State-Changed-When: Tue, 22 May 2012 21:42:51 +0000
State-Changed-Why:
Awaiting final pullups to the NetBSD-5 branch.
From: "Jeff Rizzo" <riz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/46472 CVS commit: [netbsd-5] src/sys/ufs/ufs
Date: Tue, 22 May 2012 21:44:38 +0000
Module Name: src
Committed By: riz
Date: Tue May 22 21:44:38 UTC 2012
Modified Files:
src/sys/ufs/ufs [netbsd-5]: ufs_vnops.c
Log Message:
Pull up following patch (requested by buhrow in ticket #1763):
sys/ufs/ufs/ufs_vnops.c: patch
Make sure we return EXDEV on cross-device links rather than panicking
the system. This corrects a pasting error from the merged patches
in ticket 1759.
Thanks to hannken@ for figuring out the error.
Fixes pr kern/46472
Tested by buhrow@
To generate a diff of this commit:
cvs rdiff -u -r1.169.4.2 -r1.169.4.3 src/sys/ufs/ufs/ufs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: buhrow@NetBSD.org
State-Changed-When: Tue, 22 May 2012 21:49:54 +0000
State-Changed-Why:
Patches pulled up to NetBSD-5; panic resolved.
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.