NetBSD Problem Report #49171

From apb@cequrux.com  Mon Sep  1 19:59:30 2014
Return-Path: <apb@cequrux.com>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id A6C06B6BA1
	for <gnats-bugs@gnats.NetBSD.org>; Mon,  1 Sep 2014 19:59:30 +0000 (UTC)
Message-Id: <20140901195924.19A161D2CCDF@apb-laptoy.apb.alt.za>
Date: Mon,  1 Sep 2014 21:59:23 +0200 (SAST)
From: apb@cequrux.com
To: gnats-bugs@NetBSD.org
Subject: panic when closing a pty
X-Send-Pr-Version: 3.95

>Number:         49171
>Category:       kern
>Synopsis:       panic when closing a pty
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    hannken
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Sep 01 20:00:00 +0000 2014
>Closed-Date:    Sun Oct 19 10:38:12 +0000 2014
>Last-Modified:  Sun Oct 19 10:38:12 +0000 2014
>Originator:     Alan Barrett
>Release:        NetBSD 7.99.1
>Organization:
Not much
>Environment:
NetBSD 7.99.1 i386
>Description:
Sometimes, when I exit from a shell inside a virtual window inside
screen, I get a panic, apparently from ptyfs_reclaim passing a NULL
struct mount * pointer as the first arg to vcache_remove.

This is a new problem since the changes to ptyfs a few weeks ago.

>How-To-Repeat:
Install screen from pkgsrc/misc/screen.

Run screen inside an xterm.

Open several shell windows inside screen.

Use some of the shell windows actively, and let some stay idle for a
while.

Switch to an idle window and press ^D (end of file).  This will
sometimes exit the shell and close the screen window, as desired, but it
will sometimes crash.

Here's a backtrace:

breakpoint(c0d93734,c104f880,c0cc9c94,dd8dcd9c,0,0,8,dd8dcd90,c0a6a9b4,c0cc9c94) at breakpoint+0x4
vpanic(c0cc9c94,dd8dcd9c,dd8dcdd0,c09e6dae,c0cc9c94,c0cc9e45,c0ccb859,c0d9680c,591,c597dc38) at vpanic+0x121
kvtopte.part.1(c0cc9c94,c0cc9e45,c0ccb859,c0d9680c,591,c597dc38,cfc039ec,d2bcbe30,0,d2bcbe30) at kvtopte.part.1
vcache_remove(0,d2bcbe30,8,c9b9a580,dd8dce08,c0a008e7,dd8dcdfc,0,c0cb8e14,c0cb8df0) at vcache_remove+0x13f
ptyfs_reclaim(dd8dcdfc,0,c0cb8e14,c0cb8df0,c9b9a580,1,dd8dce40,c09e3eed,c9b9a580,dd8dce33) at ptyfs_reclaim+0x2d
VOP_RECLAIM(c9b9a580,dd8dce33,ffffffff,c53e9560,0,0,4,18dce50,c53e9560,c9b9a580) at VOP_RECLAIM+0x4a
vclean(ce0e11c0,cddb62d0,dd8dce80,c09e612e,c9b9a580,509,0,dd8dce70,c104bc0c,4) at vclean+0xdd
vgone(c9b9a580,509,0,dd8dce70,c104bc0c,4,c9b9a580,cddb62d0,c53e9560,c8219040) at vgone+0x3c
vrevoke(cddb62d0,dd8dceb0,c0a001e7,dd8dcea0,8,c0637190,c0cb9000,cddb62d0,1,c8219040) at vrevoke+0x92
genfs_revoke(dd8dcea0,8,c0637190,c0cb9000,cddb62d0,1,c8219040,dd8dcf24,c06371a3,cddb62d0) at genfs_revoke+0x1a
VOP_REVOKE(cddb62d0,1,1,c7479a40,0,c0643d08,c7a7c588,c7a7c588,cddb62d0,c8219054) at VOP_REVOKE+0x4a
exit1(c53e9560,0,c53e9560,dd8dcfa8,dd8dcf9c,c08dbcd3,c53e9560,dd8dcf68,dd8dcf60,81c7000) at exit1+0x677
sys_exit(c53e9560,dd8dcf68,dd8dcf60,81c7000,c8437370,c0f5a92c,dd8dcf68,0,0,0) at sys_exit+0x36
syscall() at syscall+0x83
--- syscall (number 1) ---

Notice the NULL first argument to vcache_remove.  This NULL is passed
to hash32_buf which tries to access memory through the pointer, and
triggers a panic.

Let's examine the pointer passed to ptyfs_reclaim:

crash> exa/xl dd8dcdfc
dd8dcdfc:       c0cb8df0

c0cb8df0 should be a pointer to a vnode.

crash> show vnode c0cb8df0
crash> show vnode/f c0cb8df0

No output.  I wonder why.  At least it's not a pointer to a
completely zeroed struct vnode:

crash> exa/m c0cb8df0,10
vop_reclaim_desc:       20000000 a69fd9c0 00000000 0c8ecbc0      ...............
vop_reclaim_desc+0x10:  ffffffff ffffffff ffffffff 04000000     ................
vop_reclaim_vp_offsets+0x4:     ffffffff 1f000000 b29fd9c0 00010000     ................
vop_inactive_desc+0xc:  308ecbc0 ffffffff ffffffff ffffffff     0...............

In case it makes any difference, I use init.chroot to run almost
everything except the kernel in a chroot; the /dev in the chroot is
a symlink to /dev.@machine, which resolves to /dev.i386 due to magic
symlinks.  mount(8) inside the chroot shows:

    ptyfs on /dev.i386/pts type ptyfs (local)

>Fix:

>Release-Note:

>Audit-Trail:
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: Alan Barrett <apb@cequrux.com>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/49171: panic when closing a pty
Date: Fri, 3 Oct 2014 16:38:39 +0200

 On 02 Oct 2014, at 13:21, Alan Barrett <apb@cequrux.com> wrote:

 > Here's another instance of the same or a related problem.
 > Sources checked out from CVS with -D '2014-09-26 00:00 UTC'.
 >=20
 > The panic message is:
 >=20
 > kernel diagnostic assertion "node !=3D NULL" failed: file =
 "src/sys/kern/vfs_vnode.c", line 1426
 >=20
 > The backtrace is:
 >=20
 > #10 0xc08b9f40 in vpanic (
 >   fmt=3Dfmt@entry=3D0xc0cca854 "kernel %sassertion \"%s\" failed: file =
 \"%s\", line %d ",
 >   ap=3Dap@entry=3D0xdda78d9c =
 "\005\252\314\300\031\304\314\300Hn=EF=BF=BD\300\222\005")
 >   at src/sys/kern/subr_prf.c:338
 > #11 0xc0a6b604 in kern_assert (
 >   fmt=3Dfmt@entry=3D0xc0cca854 "kernel %sassertion \"%s\" failed: file =
 \"%s\", line %d ")
 >   at src/sys/lib/libkern/kern_assert.c:51
 > #12 0xc09e782e in vcache_remove (mp=3D0x0, key=3D0xc789a478, =
 key_len=3D8)
 >   at src/sys/kern/vfs_vnode.c:1426
 > #13 0xc07d3e7e in ptyfs_reclaim (v=3D0xdda78dfc)
 >   at src/sys/fs/ptyfs/ptyfs_vnops.c:228
 > #14 0xc0a01367 in VOP_RECLAIM (vp=3Dvp@entry=3D0xcc37fdc4)
 >   at src/sys/kern/vnode_if.c:1136
 > #15 0xc09e496d in vclean (vp=3Dvp@entry=3D0xcc37fdc4)
 >   at src/sys/kern/vfs_vnode.c:1032
 > #16 0xc09e6b0b in vgone (vp=3D0xcc37fdc4)
 >   at src/sys/kern/vfs_vnode.c:1145
 > #17 0xc09e6bae in vrevoke (vp=3D0xcc33d218)
 >   at src/sys/kern/vfs_vnode.c:1129
 > #18 0xc036ec3f in genfs_revoke (v=3D0xdda78ea0)
 >   at src/sys/miscfs/genfs/genfs_vnops.c:276
 > #19 0xc0a00c67 in VOP_REVOKE (vp=3Dvp@entry=3D0xcc33d218, =
 flags=3Dflags@entry=3D1)
 >   at src/sys/kern/vnode_if.c:656
 > #20 0xc0638173 in exit1 (l=3Dl@entry=3D0xd0c44a80, rv=3D0)
 >   at src/sys/kern/kern_exit.c:395
 > #21 0xc0638473 in sys_exit (l=3D0xd0c44a80, uap=3D0xdda78f68, =
 retval=3D0xdda78f60)
 >   at src/sys/kern/kern_exit.c:181
 > #22 0xc08dc743 in sy_call (rval=3D0xdda78f60, uap=3D0xdda78f68, =
 l=3D0xd0c44a80,
 >   sy=3D<optimized out>) at src/sys/sys/syscallvar.h:61
 > #23 sy_invoke (code=3D1, rval=3D0xdda78f60, uap=3D0xdda78f68, =
 l=3D0xd0c44a80,
 >   sy=3D<optimized out>) at src/sys/sys/syscallvar.h:85
 > #24 syscall (frame=3D0xdda78fa8)
 >   at src/sys/arch/x86/x86/syscall.c:156
 > #25 0xc01005c6 in Xsyscall ()
 > #26 0xdda78fa8 in ?? ()
 > Backtrace stopped: previous frame inner to this frame (corrupt stack?)

 We have two vnodes involved here:  0xcc33d218 gets revoked as it is the
 controlling tty and 0xcc37fdc4 gets revoked as it is an aliased device.

 0xcc33d218 is closed and dead.

 0xcc37fdc4 has "v_mount == NULL", "v_specnode != NULL" and "v_data != NULL"
 which can only happen during vnode creation after ptyfs_loadvnode() called
 spec_node_init() and before vcache_get() calls vfs_insmntque().

 So we are revoking a partially initialized vnode and crash.
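
The field combination described above can be written down as a small
predicate.  This is only a userland sketch for illustration, not kernel
code: "struct model_vnode" and "model_partially_initialized" are
hypothetical names, and the struct keeps just the three fields mentioned
in the analysis.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct vnode. */
struct model_vnode {
	void *v_mount;		/* NULL until the vnode is attached to a mount */
	void *v_specnode;	/* non-NULL once device-node state is attached */
	void *v_data;		/* non-NULL once file-system private data is set */
};

/*
 * True for the state seen in the crash dump: device state and private
 * data are present, but the vnode has no mount yet.
 */
static bool
model_partially_initialized(const struct model_vnode *vp)
{
	return vp->v_mount == NULL &&
	    vp->v_specnode != NULL &&
	    vp->v_data != NULL;
}
```

A vnode for which this predicate holds has no mount to hand to
vcache_remove(), which matches the mp=0x0 argument in frame #12.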

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/49171 CVS commit: src/sys/kern
Date: Fri, 3 Oct 2014 14:45:38 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Fri Oct  3 14:45:38 UTC 2014

 Modified Files:
 	src/sys/kern: vfs_vnode.c

 Log Message:
 When creating a vnode with vcache_get() mark the vnode VI_CHANGING until
 it is fully initialised.  It may be on the specnode list before it is
 fully initialised and revoking it then would panic.

 Should prevent the panic from PR kern/49171 (panic when closing a pty).


 To generate a diff of this commit:
 cvs rdiff -u -r1.38 -r1.39 src/sys/kern/vfs_vnode.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
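
The idea in the log message above can be modelled in userland roughly as
follows.  This is a hedged sketch, not the committed change: the
"model_" names and the flag value are hypothetical, and having the
revoke path simply refuse a vnode that is still marked as changing is an
assumption made to keep the model small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_VI_CHANGING 0x1	/* hypothetical flag value */

/* Hypothetical, simplified stand-in for the kernel's struct vnode. */
struct model_vnode {
	unsigned v_iflag;	/* internal flags */
	void *v_mount;		/* NULL until attached to a mount */
	bool on_spec_list;	/* reachable by a device-alias revoke */
	bool reclaimed;
};

/* Creation, first half: visible on the specnode list, still changing. */
static void
model_create_begin(struct model_vnode *vp)
{
	vp->v_iflag = MODEL_VI_CHANGING;
	vp->v_mount = NULL;
	vp->on_spec_list = true;
	vp->reclaimed = false;
}

/* Creation, second half: attach the mount, then clear the flag. */
static void
model_create_finish(struct model_vnode *vp, void *mp)
{
	vp->v_mount = mp;
	vp->v_iflag &= ~MODEL_VI_CHANGING;
}

/* Revoke: returns false (caller must retry) while the vnode is changing. */
static bool
model_revoke(struct model_vnode *vp)
{
	if (vp->v_iflag & MODEL_VI_CHANGING)
		return false;
	assert(vp->v_mount != NULL);	/* reclaim needs a valid mount */
	vp->reclaimed = true;
	vp->on_spec_list = false;
	return true;
}
```

In this model a revoke that arrives between model_create_begin() and
model_create_finish() is turned away instead of reclaiming a vnode whose
v_mount is still NULL.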

Responsible-Changed-From-To: kern-bug-people->hannken
Responsible-Changed-By: hannken@NetBSD.org
Responsible-Changed-When: Fri, 03 Oct 2014 15:00:38 +0000
Responsible-Changed-Why:
Take.


State-Changed-From-To: open->analyzed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Fri, 03 Oct 2014 15:00:38 +0000
State-Changed-Why:
Analyzed it and committed a fix.


From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/49171: panic when closing a pty
Date: Fri, 17 Oct 2014 18:07:57 +0200

 Did you see this problem with vfs_vnode.c Rev. 1.39?

 Ok to request a pullup to NetBSD-7?

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

From: Alan Barrett <apb@cequrux.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/49171: panic when closing a pty
Date: Sat, 18 Oct 2014 20:01:05 -0700

 On Fri, 17 Oct 2014, J. Hannken-Illjes wrote:
 > Did you see this problem with vfs_vnode.c Rev. 1.39?
 >
 > Ok to request a pullup to NetBSD-7?

 No, I haven't seen this problem with vfs_vnode.c Rev. 1.39.

 The ptyfs changes which triggered the problem were made after 
 netbsd-7 was branched, so they are not in netbsd-7.  Nevertheless, 
 the fix here seems as though it would apply to netbsd-7.

 --apb (Alan Barrett)

State-Changed-From-To: analyzed->pending-pullups
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Sun, 19 Oct 2014 09:55:17 +0000
State-Changed-Why:
Pullup requested, Ticket #150


From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/49171: panic when closing a pty
Date: Sun, 19 Oct 2014 11:52:50 +0200

 On 19 Oct 2014, at 05:05, Alan Barrett <apb@cequrux.com> wrote:
 > On Fri, 17 Oct 2014, J. Hannken-Illjes wrote:
 >> Did you see this problem with vfs_vnode.c Rev. 1.39?
 >> 
 >> Ok to request a pullup to NetBSD-7?
 > 
 > No, I haven't seen this problem with vfs_vnode.c Rev. 1.39.
 > 
 > The ptyfs changes which triggered the problem were made after 
 > netbsd-7 was branched, so they are not in netbsd-7.

 These changes were pulled up with ticket #29.

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/49171 CVS commit: [netbsd-7] src/sys/kern
Date: Sun, 19 Oct 2014 10:02:59 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sun Oct 19 10:02:59 UTC 2014

 Modified Files:
 	src/sys/kern [netbsd-7]: vfs_vnode.c

 Log Message:
 Pull up following revision(s) (requested by hannken in ticket #150):
 	sys/kern/vfs_vnode.c: revision 1.39
 When creating a vnode with vcache_get() mark the vnode VI_CHANGING until
 it is fully initialised.  It may be on the specnode list before it is
 fully initialised and revoking it then would panic.
 Should prevent the panic from PR kern/49171 (panic when closing a pty).


 To generate a diff of this commit:
 cvs rdiff -u -r1.37 -r1.37.2.1 src/sys/kern/vfs_vnode.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Sun, 19 Oct 2014 10:38:12 +0000
State-Changed-Why:
Pulled up to NetBSD-7.


>Unformatted:
