NetBSD Problem Report #47480

From wiz@yt.nih.at  Mon Jan 21 12:55:44 2013
Return-Path: <wiz@yt.nih.at>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 1BE0963E6F1
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 21 Jan 2013 12:55:44 +0000 (UTC)
Message-Id: <20130121122520.0E21139FE73@yt.nih.at>
Date: Mon, 21 Jan 2013 13:25:20 +0100 (CET)
From: Thomas Klausner <wiz@NetBSD.org>
Reply-To: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@NetBSD.org
Subject: tmpfs panic: panic: kernel diagnostic assertion "cookie != TMPFS_DIRCOOKIE_DOT" failed
X-Send-Pr-Version: 3.95

>Number:         47480
>Category:       kern
>Synopsis:       tmpfs panic: panic: kernel diagnostic assertion "cookie != TMPFS_DIRCOOKIE_DOT" failed
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    rmind
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jan 21 13:00:00 +0000 2013
>Closed-Date:    Fri Nov 08 15:56:19 +0000 2013
>Last-Modified:  Fri Nov 08 15:56:19 +0000 2013
>Originator:     Thomas Klausner
>Release:        NetBSD 6.99.16
>Organization:
>Environment:


System: NetBSD yt.nih.at 6.99.16 NetBSD 6.99.16 (YT) #37: Mon Jan 21 01:22:59 CET 2013 wiz@yt.nih.at:/usr/src/sys/arch/amd64/compile/obj/YT amd64
Architecture: x86_64
Machine: amd64
>Description:
Last night my amd64/6.99.16 system panicked during a bulk build on a tmpfs.

My setup is
tmpfs on /home/wiz/sandbox type tmpfs (local)
/bin on /home/wiz/sandbox/bin type null (read-only, local)
/sbin on /home/wiz/sandbox/sbin type null (read-only, local)
/lib on /home/wiz/sandbox/lib type null (read-only, local)
/libexec on /home/wiz/sandbox/libexec type null (read-only, local)
/usr/X11R7 on /home/wiz/sandbox/usr/X11R7 type null (read-only, local)
/usr/bin on /home/wiz/sandbox/usr/bin type null (read-only, local)
/usr/games on /home/wiz/sandbox/usr/games type null (read-only, local)
/usr/include on /home/wiz/sandbox/usr/include type null (read-only, local)
/usr/lib on /home/wiz/sandbox/usr/lib type null (read-only, local)
/usr/libdata on /home/wiz/sandbox/usr/libdata type null (read-only, local)
/usr/libexec on /home/wiz/sandbox/usr/libexec type null (read-only, local)
/usr/share on /home/wiz/sandbox/usr/share type null (read-only, local)
/usr/sbin on /home/wiz/sandbox/usr/sbin type null (read-only, local)
/var/mail on /home/wiz/sandbox/var/mail type null (read-only, local)
/usr/src on /home/wiz/sandbox/usr/src type null (read-only, local)
/usr/pkgsrc on /home/wiz/sandbox/usr/pkgsrc type null (local)
/usr/xsrc on /home/wiz/sandbox/usr/xsrc type null (read-only, local)
/packages/6.99.16 on /home/wiz/sandbox/packages type null (local)
/distfiles on /home/wiz/sandbox/distfiles type null (local)

I was sleeping. When I woke up I found the machine had rebooted and
/var/log/messages contained:
Jan 21 01:58:50 yt savecore: reboot after panic: panic: kernel diagnostic assertion "cookie != TMPFS_DIRCOOKIE_DOT" failed: filAeR N"I/aNrGc:h iSvPeL/ cNvOsT/ sLrOcW/EsRyEsD/ OsN/t mSpYfs/CtAmLpLf s.h"6,7  l4i nEeX I9T3  4  6

This actually looks like two interspersed strings. One might be:

file "/archive/cvs/src/sys/[f?]s/tmpfs/tmpfs.h [4936?]
Line 4930 of that file here is
        KASSERT(cookie != TMPFS_DIRCOOKIE_DOT);

The other might be:
[W?]ARNING: SPL NOT LOWERED ON SY[S?]CALL EXIT


/var/crash contains:
-rw-------   1 root  wheel    193984319 Jan 21 01:59 netbsd.7.core.gz
-rw-------   1 root  wheel       373764 Jan 21 01:59 netbsd.7.gz
which don't look nearly big enough to be usable, but if someone has specific ideas
about what to look for, send me gdb commands.
For the record, the kernel was
-rwxr-xr-x  1 root  wheel  9923436 Jan 21 01:23 /netbsd.6.99.16h.otus-cvs.32tx.debug4reallythistime (don't ask)
>How-To-Repeat:
Do bulk builds on tmpfs.
Wait.
Have bad luck.
>Fix:
Yes please.

>Release-Note:

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/47480: tmpfs panic: panic: kernel diagnostic assertion
 "cookie != TMPFS_DIRCOOKIE_DOT" failed
Date: Wed, 23 Jan 2013 09:40:54 +0000

 On Mon, Jan 21, 2013 at 01:00:01PM +0000, Thomas Klausner wrote:
  > This actually looks like two interspersed strings on might be:
  > 
  > file "/archive/cvs/src/sys/[f?]s/tmpfs/tmpfs.h [4936?]
  > Line 4930 of that file here is
  >         KASSERT(cookie != TMPFS_DIRCOOKIE_DOT);

 tmpfs.h doesn't have anywhere near that many lines. It's on line 93,
 though.

  > The other might be:
  > [W?]ARNING: SPL NOT LOWERED ON SY[S?]CALL EXIT

 As I was saying elsewhere, it's rather interesting that these should
 come up together.

 Does leaving the kernel while holding a spinlock cause this warning or
 some other one? (In other words, are we looking specifically for a spl
 leak, which there probably aren't too many opportunities for, or is it
 just a dangling lock?)

 I was going to suggest that somewhere there's a broken error path that
 both leaks spl (or a spinlock) and also corrupts something, but it
 doesn't look that simple. There are very few places where this code is
 called and most of them are using a pointer popped directly off a
 tailq that is checked for NULL, so it's very unlikely that it's that
 unless you have major memory corruption.

 On further review I think this is the same as kern/41068, which I
 believed at the time to be caused by getting a 64-bit tmpfs_dirent
 pointer whose lower 32 bits are all 0.

 (I still don't understand why the simple rework I suggested in
 kern/41068 won't do.)

 -- 
 David A. Holland
 dholland@netbsd.org
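
The hypothesis in the reply above (a 64-bit tmpfs_dirent pointer whose
low bits are all zero) can be illustrated with a small userland sketch.
It is not the kernel code: the function names are hypothetical, the
plain 31-bit mask is a simplification of whatever tmpfs_dircookie()
actually did at the time, and the value 0 for the "." cookie is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of the reserved "." cookie (not stated in this PR). */
#define SKETCH_DIRCOOKIE_DOT	0

/*
 * Hypothetical stand-in for an address-derived cookie: keep only the
 * low 31 bits of the directory entry's address so that the result fits
 * a 32-bit readdir offset.
 */
static int64_t
sketch_dircookie(uint64_t dirent_addr)
{
	return (int64_t)(dirent_addr & UINT64_C(0x7FFFFFFF));
}

/* True if the truncated address is indistinguishable from ".". */
static bool
sketch_cookie_is_dot(uint64_t dirent_addr)
{
	return sketch_dircookie(dirent_addr) == SKETCH_DIRCOOKIE_DOT;
}
```

Under these assumptions an entry at an address such as
0xffff800100000000 would yield the "." cookie and trip the assertion,
and two distinct entries whose addresses differ only above bit 30 would
share one cookie.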

From: Thomas Klausner <wiz@NetBSD.org>
To: NetBSD bugtracking <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: kern/47480: tmpfs panic: panic: kernel diagnostic assertion
 "cookie != TMPFS_DIRCOOKIE_DOT" failed
Date: Sun, 3 Mar 2013 13:18:25 +0100

 So I had another tmpfs panic today.

 Let me know if it belongs in a different PR.

 The panic message was:
 Mar  3 13:15:43 yt savecore: reboot after panic: kernel diagnostic assertion "(node)->tn_spec.tn_dir.tn_readdir_lastp == NULL || tmpfs_dircookie((node)->tn_spec.tn_dir.tn_readdir_lastp) == (node)->tn_spec.tn_dir.tn_readdir_lastn" failed: file "/archive/cvs/src/sys/fs/tmpfs/tmpfs.h", line 357 

  Thomas


From: "Mindaugas Rasiukevicius" <rmind@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/47480 CVS commit: src/sys/fs/tmpfs
Date: Fri, 8 Nov 2013 15:44:23 +0000

 Module Name:	src
 Committed By:	rmind
 Date:		Fri Nov  8 15:44:23 UTC 2013

 Modified Files:
 	src/sys/fs/tmpfs: tmpfs.h tmpfs_rename.c tmpfs_subr.c tmpfs_vfsops.c
 	    tmpfs_vnops.c

 Log Message:
 tmpfs: replace the broken tmpfs_dircookie() logic which uses the node
 address truncated to 31 bits (required for 32-bit readdir compatibility,
 e.g. linux32).  Instead, assign 2^31 range using the following logic:
 - The first half of the 2^31 is assigned incrementally (the fast path).
 - When exceeded, use the second half of 2^31, but manage with vmem(9).

 It will require 2 billion files per-directory to trigger vmem(9) usage.
 Also, while here, add some fixes for tmpfs_unmount().

 Should fix PR/47739, PR/47480, PR/46088 and PR/41068.
 Thanks to wiz@ for stress testing.


 To generate a diff of this commit:
 cvs rdiff -u -r1.45 -r1.46 src/sys/fs/tmpfs/tmpfs.h
 cvs rdiff -u -r1.4 -r1.5 src/sys/fs/tmpfs/tmpfs_rename.c
 cvs rdiff -u -r1.82 -r1.83 src/sys/fs/tmpfs/tmpfs_subr.c
 cvs rdiff -u -r1.52 -r1.53 src/sys/fs/tmpfs/tmpfs_vfsops.c
 cvs rdiff -u -r1.105 -r1.106 src/sys/fs/tmpfs/tmpfs_vnops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
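
The two-range scheme described in the log message above can be
sketched in userland roughly as follows. This is only an illustration
under stated assumptions, not the committed code: all names are
hypothetical, the ranges are shrunk to a few values, a tiny in-use
table stands in for vmem(9), and the choice never to reuse fast-path
cookies is this sketch's own simplification.

```c
#include <assert.h>
#include <stdint.h>

/* Tiny stand-in for the second half of the range (vmem(9) in the kernel). */
#define SKETCH_SLOW_SLOTS	8

struct sketch_dir {
	uint32_t next_seq;	/* next never-used cookie in the first half */
	uint32_t seq_limit;	/* first cookie of the second half */
	uint8_t  slow_used[SKETCH_SLOW_SLOTS];	/* in-use map, second half */
};

static void
sketch_dir_init(struct sketch_dir *d, uint32_t first, uint32_t seq_limit)
{
	d->next_seq = first;	/* values below 'first' are treated as reserved */
	d->seq_limit = seq_limit;
	for (uint32_t i = 0; i < SKETCH_SLOW_SLOTS; i++)
		d->slow_used[i] = 0;
}

/* Returns a cookie unique within the directory, or 0 if none is left. */
static uint32_t
sketch_cookie_alloc(struct sketch_dir *d)
{
	/* Fast path: hand out the first half incrementally. */
	if (d->next_seq < d->seq_limit)
		return d->next_seq++;

	/* Slow path: search the managed second half for a free value. */
	for (uint32_t i = 0; i < SKETCH_SLOW_SLOTS; i++) {
		if (!d->slow_used[i]) {
			d->slow_used[i] = 1;
			return d->seq_limit + i;
		}
	}
	return 0;
}

/* Only second-half cookies are returned to the pool in this sketch. */
static void
sketch_cookie_free(struct sketch_dir *d, uint32_t cookie)
{
	if (cookie >= d->seq_limit &&
	    cookie - d->seq_limit < SKETCH_SLOW_SLOTS)
		d->slow_used[cookie - d->seq_limit] = 0;
}
```

Because cookies come from a counter or a managed range rather than
from a memory address, they cannot collide with a reserved value or
with each other, whatever addresses the allocator hands out.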

Responsible-Changed-From-To: kern-bug-people->rmind
Responsible-Changed-By: rmind@NetBSD.org
Responsible-Changed-When: Fri, 08 Nov 2013 15:56:19 +0000
Responsible-Changed-Why:


State-Changed-From-To: open->closed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Fri, 08 Nov 2013 15:56:19 +0000
State-Changed-Why:
Should be fixed in -current.  Please let us know if you ever see a
similar problem again.


>Unformatted:
