NetBSD Problem Report #47480
From wiz@yt.nih.at Mon Jan 21 12:55:44 2013
Return-Path: <wiz@yt.nih.at>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
by www.NetBSD.org (Postfix) with ESMTP id 1BE0963E6F1
for <gnats-bugs@gnats.NetBSD.org>; Mon, 21 Jan 2013 12:55:44 +0000 (UTC)
Message-Id: <20130121122520.0E21139FE73@yt.nih.at>
Date: Mon, 21 Jan 2013 13:25:20 +0100 (CET)
From: Thomas Klausner <wiz@NetBSD.org>
Reply-To: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@NetBSD.org
Subject: tmpfs panic: panic: kernel diagnostic assertion "cookie != TMPFS_DIRCOOKIE_DOT" failed
X-Send-Pr-Version: 3.95
>Number: 47480
>Category: kern
>Synopsis: tmpfs panic: panic: kernel diagnostic assertion "cookie != TMPFS_DIRCOOKIE_DOT" failed
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: rmind
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Jan 21 13:00:00 +0000 2013
>Closed-Date: Fri Nov 08 15:56:19 +0000 2013
>Last-Modified: Fri Nov 08 15:56:19 +0000 2013
>Originator: Thomas Klausner
>Release: NetBSD 6.99.16
>Organization:
>Environment:
System: NetBSD yt.nih.at 6.99.16 NetBSD 6.99.16 (YT) #37: Mon Jan 21 01:22:59 CET 2013 wiz@yt.nih.at:/usr/src/sys/arch/amd64/compile/obj/YT amd64
Architecture: x86_64
Machine: amd64
>Description:
Last night my amd64/6.99.16 system panicked during a bulk build on a tmpfs.
My setup is
tmpfs on /home/wiz/sandbox type tmpfs (local)
/bin on /home/wiz/sandbox/bin type null (read-only, local)
/sbin on /home/wiz/sandbox/sbin type null (read-only, local)
/lib on /home/wiz/sandbox/lib type null (read-only, local)
/libexec on /home/wiz/sandbox/libexec type null (read-only, local)
/usr/X11R7 on /home/wiz/sandbox/usr/X11R7 type null (read-only, local)
/usr/bin on /home/wiz/sandbox/usr/bin type null (read-only, local)
/usr/games on /home/wiz/sandbox/usr/games type null (read-only, local)
/usr/include on /home/wiz/sandbox/usr/include type null (read-only, local)
/usr/lib on /home/wiz/sandbox/usr/lib type null (read-only, local)
/usr/libdata on /home/wiz/sandbox/usr/libdata type null (read-only, local)
/usr/libexec on /home/wiz/sandbox/usr/libexec type null (read-only, local)
/usr/share on /home/wiz/sandbox/usr/share type null (read-only, local)
/usr/sbin on /home/wiz/sandbox/usr/sbin type null (read-only, local)
/var/mail on /home/wiz/sandbox/var/mail type null (read-only, local)
/usr/src on /home/wiz/sandbox/usr/src type null (read-only, local)
/usr/pkgsrc on /home/wiz/sandbox/usr/pkgsrc type null (local)
/usr/xsrc on /home/wiz/sandbox/usr/xsrc type null (read-only, local)
/packages/6.99.16 on /home/wiz/sandbox/packages type null (local)
/distfiles on /home/wiz/sandbox/distfiles type null (local)
I was sleeping. When I woke up I found the machine had rebooted and
/var/log/messages contained:
Jan 21 01:58:50 yt savecore: reboot after panic: panic: kernel diagnostic assertion "cookie != TMPFS_DIRCOOKIE_DOT" failed: filAeR N"I/aNrGc:h iSvPeL/ cNvOsT/ sLrOcW/EsRyEsD/ OsN/t mSpYfs/CtAmLpLf s.h"6,7 l4i nEeX I9T3 4 6
This actually looks like two interspersed strings; one might be:
file "/archive/cvs/src/sys/[f?]s/tmpfs/tmpfs.h [4936?]
Line 4930 of that file here is
KASSERT(cookie != TMPFS_DIRCOOKIE_DOT);
The other might be:
[W?]ARNING: SPL NOT LOWERED ON SY[S?]CALL EXIT
/var/crash contains:
-rw------- 1 root wheel 193984319 Jan 21 01:59 netbsd.7.core.gz
-rw------- 1 root wheel 373764 Jan 21 01:59 netbsd.7.gz
which don't look nearly big enough to be usable, but if someone has specific ideas
about what to look for, send me gdb commands.
For the record, the kernel was
-rwxr-xr-x 1 root wheel 9923436 Jan 21 01:23 /netbsd.6.99.16h.otus-cvs.32tx.debug4reallythistime (don't ask)
>How-To-Repeat:
Do bulk builds on tmpfs.
Wait.
Have bad luck.
>Fix:
Yes please.
>Release-Note:
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/47480: tmpfs panic: panic: kernel diagnostic assertion
"cookie != TMPFS_DIRCOOKIE_DOT" failed
Date: Wed, 23 Jan 2013 09:40:54 +0000
On Mon, Jan 21, 2013 at 01:00:01PM +0000, Thomas Klausner wrote:
> This actually looks like two interspersed strings on might be:
>
> file "/archive/cvs/src/sys/[f?]s/tmpfs/tmpfs.h [4936?]
> Line 4930 of that file here is
> KASSERT(cookie != TMPFS_DIRCOOKIE_DOT);
tmpfs.h doesn't have anywhere near that many lines. It's on line 93,
though.
> The other might be:
> [W?]ARNING: SPL NOT LOWERED ON SY[S?]CALL EXIT
As I was saying elsewhere, it's rather interesting that these should
come up together.
Does leaving the kernel while holding a spinlock cause this warning or
some other one? (In other words, are we looking specifically for a spl
leak, which there probably aren't too many opportunities for, or is it
just a dangling lock?)
I was going to suggest that somewhere there's a broken error path that
both leaks spl (or a spinlock) and also corrupts something, but it
doesn't look that simple. There are very few places where this code is
called and most of them are using a pointer popped directly off a
tailq that is checked for NULL, so it's very unlikely that it's that
unless you have major memory corruption.
On further review I think this is the same as kern/41068, which I
believed at the time to be caused by getting a 64-bit tmpfs_dirent
pointer whose lower 32 bits are all 0.
(I still don't understand why the simple rework I suggested in
kern/41068 won't do.)
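To illustrate the hypothesis above, here is a minimal sketch of how a
cookie derived from a truncated pointer could land on a reserved value.
It assumes the cookie is simply the low 31 bits of the address; the
function names, the mask, and the value 0 for the "." cookie are
assumptions made for illustration, not the actual tmpfs code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for the reserved "." cookie (illustrative only). */
#define DIRCOOKIE_DOT 0

/*
 * Hypothetical helper: derive a directory cookie by keeping only the
 * low 31 bits of a 64-bit address.
 */
static int32_t
cookie_from_address(uint64_t addr)
{
    int32_t cookie = (int32_t)(addr & 0x7FFFFFFFu);

    /* The mask clears the sign bit, so the result is never negative. */
    assert(cookie >= 0);
    return cookie;
}

/* Returns 1 if the derived cookie equals the reserved "." cookie. */
static int
collides_with_dot(uint64_t addr)
{
    return cookie_from_address(addr) == DIRCOOKIE_DOT;
}
```

Under these assumptions, any address whose low 32 bits are all zero
maps to the reserved cookie, and two addresses that differ only in
their upper bits map to the same cookie.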
--
David A. Holland
dholland@netbsd.org
From: Thomas Klausner <wiz@NetBSD.org>
To: NetBSD bugtracking <gnats-bugs@NetBSD.org>
Cc:
Subject: Re: kern/47480: tmpfs panic: panic: kernel diagnostic assertion
"cookie != TMPFS_DIRCOOKIE_DOT" failed
Date: Sun, 3 Mar 2013 13:18:25 +0100
So I had another tmpfs panic today.
Let me know if it belongs in a different PR.
The panic message was:
Mar 3 13:15:43 yt savecore: reboot after panic: kernel diagnostic assertion "(node)->tn_spec.tn_dir.tn_readdir_lastp == NULL || tmpfs_dircookie((node)->tn_spec.tn_dir.tn_readdir_lastp) == (node)->tn_spec.tn_dir.tn_readdir_lastn" failed: file "/archive/cvs/src/sys/fs/tmpfs/tmpfs.h", line 357
Thomas
From: "Mindaugas Rasiukevicius" <rmind@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/47480 CVS commit: src/sys/fs/tmpfs
Date: Fri, 8 Nov 2013 15:44:23 +0000
Module Name: src
Committed By: rmind
Date: Fri Nov 8 15:44:23 UTC 2013
Modified Files:
src/sys/fs/tmpfs: tmpfs.h tmpfs_rename.c tmpfs_subr.c tmpfs_vfsops.c
tmpfs_vnops.c
Log Message:
tmpfs: replace the broken tmpfs_dircookie() logic which uses the node
address truncated to 31 bits (required for 32-bit readdir compatibility,
e.g. linux32). Instead, assign 2^31 range using the following logic:
- The first half of the 2^31 is assigned incrementally (the fast path).
- When exceeded, use the second half of 2^31, but manage with vmem(9).
It will require 2 billion files per-directory to trigger vmem(9) usage.
Also, while here, add some fixes for tmpfs_unmount().
Should fix PR/47739, PR/47480, PR/46088 and PR/41068.
Thanks to wiz@ for stress testing.
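The two-tier assignment described in the log message above can be
sketched roughly as follows. This is a userland illustration, not the
committed code: all names, the starting value, and the "no number"
marker are assumptions, and the slow path is a simple counter standing
in for vmem(9), so unlike vmem(9) it never reuses freed numbers.

```c
#include <assert.h>
#include <stdint.h>

#define SEQ_START 3u          /* assumed: a few low values are reserved */
#define SEQ_HALF  (1u << 30)  /* end of the first half (exclusive) */
#define SEQ_END   (1u << 31)  /* end of the whole range (exclusive) */
#define SEQ_NONE  0u          /* hypothetical "no number available" marker */

struct dir_seq {
    uint32_t next_fast;  /* next unused number in the first half */
    uint32_t next_slow;  /* next unused number in the second half */
};

static void
dir_seq_init(struct dir_seq *d)
{
    d->next_fast = SEQ_START;
    d->next_slow = SEQ_HALF;
}

/* Stand-in for the slow path; the commit manages this half with vmem(9). */
static uint32_t
dir_seq_alloc_slow(struct dir_seq *d)
{
    if (d->next_slow < SEQ_END)
        return d->next_slow++;
    return SEQ_NONE;
}

/* Fast path: hand out numbers incrementally until the first half is used up. */
static uint32_t
dir_seq_alloc(struct dir_seq *d)
{
    uint32_t seq;

    if (d->next_fast < SEQ_HALF) {
        seq = d->next_fast++;
        assert(seq >= SEQ_START);
        return seq;
    }
    return dir_seq_alloc_slow(d);
}
```

Because the numbers come from a per-directory counter rather than from
a memory address, they cannot collide with each other or with the
reserved values, which is the property the old scheme lacked.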
To generate a diff of this commit:
cvs rdiff -u -r1.45 -r1.46 src/sys/fs/tmpfs/tmpfs.h
cvs rdiff -u -r1.4 -r1.5 src/sys/fs/tmpfs/tmpfs_rename.c
cvs rdiff -u -r1.82 -r1.83 src/sys/fs/tmpfs/tmpfs_subr.c
cvs rdiff -u -r1.52 -r1.53 src/sys/fs/tmpfs/tmpfs_vfsops.c
cvs rdiff -u -r1.105 -r1.106 src/sys/fs/tmpfs/tmpfs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Responsible-Changed-From-To: kern-bug-people->rmind
Responsible-Changed-By: rmind@NetBSD.org
Responsible-Changed-When: Fri, 08 Nov 2013 15:56:19 +0000
Responsible-Changed-Why:
State-Changed-From-To: open->closed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Fri, 08 Nov 2013 15:56:19 +0000
State-Changed-Why:
Should be fixed in -current. Please let us know if you ever see a
similar problem again.
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.