NetBSD Problem Report #38889

From www@NetBSD.org  Sun Jun  8 13:53:21 2008
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 113C363B91E
	for <gnats-bugs@gnats.netbsd.org>; Sun,  8 Jun 2008 13:53:21 +0000 (UTC)
Message-Id: <20080608135320.7C9F263B8BC@narn.NetBSD.org>
Date: Sun,  8 Jun 2008 13:53:20 +0000 (UTC)
From: ad@netbsd.org
Reply-To: ad@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: Crash on open/mmap/close of block device
X-Send-Pr-Version: www-1.0

>Number:         38889
>Category:       kern
>Synopsis:       Crash on open/mmap/close of block device
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    yamt
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jun 08 13:55:00 +0000 2008
>Last-Modified:  Mon Jun 11 21:30:02 +0000 2012
>Originator:     Andrew Doran
>Release:        4.99.64
>Organization:
The NetBSD Project
>Environment:
i386
>Description:
trap type 6 code 0 eip c04edb13 cs 8 eflags 10246 cr2 1008 ilevel 0
kernel: supervisor trap page fault, code=0
Stopped in pid 315.1 (a.out) at netbsd:spec_close+0x53: cmpl    0x8(%eax),%esi
db{4}> bt
spec_close(d0325a28,20002,d0325a3c,c04e3018,d003a010,c0819640,d003a010,1,ce1e6c0
0,1) at netbsd:spec_close+0x53
VOP_CLOSE(d003a010,1,ce1e6c00,d0398f28,0,cf65d080,d0325a7c,c04e3902,d003a010,1) a
t netbsd:VOP_CLOSE+0x6c
vn_close(d003a010,1,ce1e6c00,1,d0398f28,d0398f3c,d0325acc,c045760d,cf65d080,0) a
t netbsd:vn_close+0x4e
vn_closefile(cf65d080,0,d0325acc,c0456c32,d03486c0,0,d0398f28,0,d0398f28,c0a8c66
6) at netbsd:vn_closefile+0x22
closef(cf65d080,3,ffffffff,d02f1179,d0325b00,d03579ac,cff84e08,d02fb758,ce1e6c00
,cf65d080) at netbsd:closef+0x5d
fd_close(3,0,d0325bc8,c0470011,d02f100c,ffffffff,14,1,d02f1004,d0325bc8) at netb
sd:fd_close+0x124
fd_free(d02f1004,0,d0325bc8,ffffffff,cf654400,1,0,d02f1018,0,d0398f28) at netbsd
:fd_free+0x98
exit1(d035b9c0,8b,d0325bfc,c047c86c,c4f58000,1,1,c04772e5,d035bb4c,3f) at netbsd
:exit1+0x1aa
sigexit(d035b9c0,b,b,0,c50fdf90,0,d035bb3c,0,0,0) at netbsd:sigexit+0x1cc
postsig(b,d0325d00,0,0,1,d0381070,b,d0325d00,d035b9c0,d0325d30) at netbsd:postsi
g+0xfd
lwp_userret(d035b9c0,d0325d00,1,0,d035b9c0,bbafc000,cf65d080,0,d035b9c0,c0a8a91c
) at netbsd:lwp_userret+0x148
trap() at netbsd:trap+0x95d
--- trap (number 6) ---
0x8048815:
db{4}> show vnode d003a010
OBJECT 0xd003a010: locked=0, pgops=0xc0816020, npages=0, refs=2

VNODE flags 2000038<MAPPED,MPSAFE,LOCKSWORK,SOFTDEP>
mp 0xcf347600 numoutput 0 size 0x0 writesize 0x0
data 0xd003b360 writecount 0 holdcnt 0
tag VT_UFS(1) type VBLK(3) mount 0xcf347600 typedata 0xcfe93ef0
v_lock 0xd003a0b0 v_vnlock 0xd003a0b0

db{4}> whatis 0xcfe93ef0
0xcfe93ef0 is 0xcfe93ef0+0 in POOL 'kmem-20' (allocated)
0xcfe93ef0 is 0xcfe93dc0+304 in VMEM 'kmem' (allocated)
0xcfe93ef0 is 0xcfe92000+7920 from VMMAP 0xc0b250a0
db{4}> x/Lx 0xcfe93ef0,20
0xcfe93ef0:     d002f0bd    1000        0           10000       1000        0

vp->v_specnode->sn_dev == 0x1000, which is crazy. In decimal it is
4096, which is the size that the attached test program tries to
mmap(). Perhaps a pointer is being misused?

The next link (sn_next) is garbage:

db{4}> whatis d002f0bd
0xd002f0bd is 0xd002f000+189 in POOL 'kvakernel' (allocated)
0xd002f0bd is 0xd002f0bc+1 in POOL 'vnodepl' (allocated)
0xd002f0bd is 0xcffff000+196797 from VMMAP 0xc0b250a0
db{4}> show vnode d002f0bd
OBJECT 0xd002f0bd: locked=0, pgops=0xc08160, npages=0, refs=-1610612736

VNODE flags 30020000<LAYER>
mp 0xcf3476 numoutput 0 size 0x0 writesize 0x0
data 0xd0031d writecount 0 holdcnt 0
tag UNKNOWN(0) type UNKNOWN(16777216) mount 0xcf3476 typedata 0x4cfead9
v_lock 0xd002f15d v_vnlock 0xf8d002f1

So it appears the entire specnode_t has been corrupted.

>How-To-Repeat:
#include <unistd.h>
#include <sys/types.h>
#include <sys/fcntl.h>
#include <sys/mman.h>

main()
{
        char *p;
        int fd;


        fd = open("/dev/wd0g", O_RDONLY);
        if (fd < 0)
                err(1, "open");
        p = mmap(NULL, 4096, PROT_READ, MAP_FILE, fd, 0);
        if (p == MAP_FAILED)
                err(1, "mmap");
        printf("first byte %d\n", *p);
        if (close(fd) < 0)
                err(1, "close");
        printf("and again %d\n", *p);
}

>Fix:
Not known.

>Release-Note:

>Audit-Trail:
From: "Jukka Ruohonen" <jruoho@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/38889 CVS commit: src/tests/lib/libc/sys
Date: Tue, 6 Mar 2012 11:02:56 +0000

 Module Name:	src
 Committed By:	jruoho
 Date:		Tue Mar  6 11:02:56 UTC 2012

 Modified Files:
 	src/tests/lib/libc/sys: t_mmap.c

 Log Message:
 A test case for serious PR kern/38889: crash on open/mmap/close of block
 device. The test case is skipped for the time being as it replicates the
 panic described in the PR (tested on NetBSD/amd64 6.0 BETA).


 To generate a diff of this commit:
 cvs rdiff -u -r1.2 -r1.3 src/tests/lib/libc/sys/t_mmap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/38889
Date: Wed, 16 May 2012 21:40:41 +0200

 This still happens in -current.

 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/38889
Date: Sun, 20 May 2012 21:23:42 +0200

 The specnode is apparently overwritten by uvm_ra_request(), which is called
 from uvm_get (sys/uvm/uvm_vnode.c:174):

         if ((access_type & VM_PROT_WRITE) == 0 && (flags & PGO_LOCKED) == 0) {
                 vn_ra_allocctx(vp);
                 uvm_ra_request(vp->v_ractx, advice, uobj, offset,
                     *npagesp << PAGE_SHIFT);
         }

 While vn_ra_allcoctx() tests for vp->v_type == VREG, uvm_ra_request does
 not. In this case v_ractx (overlaid via a union) is the same pointer as
 v_specnode, and uvm_ra_request has no chance to check for the vnode type
 any more.

 It is not obvious to me if the damage happens right in uvm_ra_request or
 later during the read handling, but this seems to be the start of it all.
 The test program does not trigger the bug any more, if the access to the 
 mmaped pages is commented out, and I verified nothing in miscfs/specfs is
 overwriting the value inside the specnode.


 Chuck?

 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/38889
Date: Sun, 20 May 2012 21:27:51 +0200

 Forgot to mention: the obvious fix, avoiding the readahead handling
 if != VREG, does avoid the original problem, but now faulting-in of the
 mmaped page does not work any more.

 Martin

Responsible-Changed-From-To: kern-bug-people->yamt
Responsible-Changed-By: yamt@NetBSD.org
Responsible-Changed-When: Wed, 23 May 2012 13:28:29 +0000
Responsible-Changed-Why:
i'll look


From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: uebayasi@tombi.co.jp
Cc: netbsd-bugs@NetBSD.org, gnats-bugs@NetBSD.org
Subject: Re: kern/38889
Date: Thu, 24 May 2012 03:00:02 +0000 (UTC)

 hi,

 > I wonder if we can completely disallow mmap'ing block devices?  I suspect
 > it has never worked nor needed by anyone - otherwise why such a serious
 > problem has been open for a long time!

 i don't know.  but it might be useful for applications which want to
 do mmap+send for zero copy.  eg. an iscsi target

 please keep gnats-bugs@NetBSD.org in Cc: so that it's filed in gnats.

 YAMAMOTO Takashi

From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, ad@netbsd.org
Subject: Re: kern/38889
Date: Thu, 24 May 2012 06:01:23 +0000 (UTC)

 hi,

 >  Forgot to mention: the obvious fix, avoiding the readahead handling
 >  if != VREG, does avoid the original problem, but now faulting-in of the
 >  mmaped page does not work any more.

 a guess: as VBLK's v_size is usually 0, genfs_getpages treats
 any requests as past EOF and bails out.

 YAMAMOTO Takashi

 >  
 >  Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/38889
Date: Thu, 24 May 2012 12:38:21 +0200

 Indeed, genfs_getpages() returns EINVAL due to read past EOF.

 Can we just skip that for VBLK?
 We'll need to fix the offset calculations (at least for endoffset) then.
 Should we just skip the "min" part, or set memeof to the maximum off_t
 value?

 Martin

From: Masao Uebayashi <uebayasi@tombi.co.jp>
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
Cc: netbsd-bugs@NetBSD.org, gnats-bugs@NetBSD.org
Subject: Re: kern/38889
Date: Fri, 25 May 2012 08:57:48 +0900

 On Thu, May 24, 2012 at 03:00:02AM +0000, YAMAMOTO Takashi wrote:
 > hi,
 > 
 > > I wonder if we can completely disallow mmap'ing block devices?  I suspect
 > > it has never worked nor needed by anyone - otherwise why such a serious
 > > problem has been open for a long time!
 > 
 > i don't know.  but it might be useful for applications which want to
 > do mmap+send for zero copy.  eg. an iscsi target

 Understood, but that could be properly designed/implamented later.  I
 think it's better to just disallow mmap(bdev) on netbsd-6.

 (I also think that genfs is a wrong place to handle mmap(bdev).  IIUC, bdev's
 data has nothing to do with its hosting mount/filesystem.  Handling VREG and
 VBLK in one place sounds very odd to me.)

 > 
 > please keep gnats-bugs@NetBSD.org in Cc: so that it's filed in gnats.
 > 
 > YAMAMOTO Takashi

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/38889
Date: Fri, 25 May 2012 11:32:07 +0200

 --bg08WKrSYDhXBjb5
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 We can at least prevent the crash for -6. If I understood yamt
 correctly, this is correct anyway, only needs more changes in genfs.

 Should we commit this as a interim step and pull it up?

 Martin

 --bg08WKrSYDhXBjb5
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename=patch

 Index: uvm_vnode.c
 ===================================================================
 RCS file: /cvsroot/src/sys/uvm/uvm_vnode.c,v
 retrieving revision 1.97
 diff -c -u -r1.97 uvm_vnode.c
 --- uvm_vnode.c	6 Sep 2011 16:41:55 -0000	1.97
 +++ uvm_vnode.c	25 May 2012 09:29:47 -0000
 @@ -170,7 +170,8 @@

  	UVMHIST_LOG(ubchist, "vp %p off 0x%x", vp, (int)offset, 0,0);

 -	if ((access_type & VM_PROT_WRITE) == 0 && (flags & PGO_LOCKED) == 0) {
 +	if (vp->v_type == VREG && (access_type & VM_PROT_WRITE) == 0
 +	    && (flags & PGO_LOCKED) == 0) {
  		vn_ra_allocctx(vp);
  		uvm_ra_request(vp->v_ractx, advice, uobj, offset,
  		    *npagesp << PAGE_SHIFT);

 --bg08WKrSYDhXBjb5--

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, yamt@NetBSD.org, gnats-admin@netbsd.org, 
	netbsd-bugs@netbsd.org, ad@netbsd.org
Cc: 
Subject: Re: kern/38889
Date: Fri, 25 May 2012 11:49:07 -0400

 On May 25,  9:35am, martin@duskware.de (Martin Husemann) wrote:
 -- Subject: Re: kern/38889

 |  We can at least prevent the crash for -6. If I understood yamt
 |  correctly, this is correct anyway, only needs more changes in genfs.
 |  
 |  Should we commit this as a interim step and pull it up?

 I think so.

 christos

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/38889 CVS commit: src/sys/uvm
Date: Fri, 1 Jun 2012 14:52:49 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Fri Jun  1 14:52:49 UTC 2012

 Modified Files:
 	src/sys/uvm: uvm_vnode.c

 Log Message:
 Only use generic readahead on VREG vnodes, the space used to store the
 context is not valid on other types.
 Prevents the crash reported in PR kern/38889, but does not fix the
 mmap of block devices, more work is needed (no size on VBLK vnodes).


 To generate a diff of this commit:
 cvs rdiff -u -r1.97 -r1.98 src/sys/uvm/uvm_vnode.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Jeff Rizzo" <riz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/38889 CVS commit: [netbsd-6] src
Date: Mon, 11 Jun 2012 21:25:03 +0000

 Module Name:	src
 Committed By:	riz
 Date:		Mon Jun 11 21:25:02 UTC 2012

 Modified Files:
 	src/sys/uvm [netbsd-6]: uvm_vnode.c
 	src/tests/lib/libc/sys [netbsd-6]: t_mmap.c

 Log Message:
 Pull up following revision(s) (requested by martin in ticket #301):
 	sys/uvm/uvm_vnode.c: revision 1.98
 	tests/lib/libc/sys/t_mmap.c: revision 1.3
 	tests/lib/libc/sys/t_mmap.c: revision 1.4
 	tests/lib/libc/sys/t_mmap.c: revision 1.5
 	tests/lib/libc/sys/t_mmap.c: revision 1.6
 Only use generic readahead on VREG vnodes, the space used to store the
 context is not valid on other types.
 Prevents the crash reported in PR kern/38889, but does not fix the
 mmap of block devices, more work is needed (no size on VBLK vnodes).
 Do not skip the block device mmap test, as it does not crash
 the kernel any more. Mark it as expected failure instead.
 mmap_block:
 do not use a hardcoded block device list, but query the kernel for attached
 disks instead, then try to mmap the raw partition.
 Use atf_tc_skip().
 A test case for serious PR kern/38889: crash on open/mmap/close of block
 device. The test case is skipped for the time being as it replicates the
 panic described in the PR (tested on NetBSD/amd64 6.0 BETA).


 To generate a diff of this commit:
 cvs rdiff -u -r1.97 -r1.97.8.1 src/sys/uvm/uvm_vnode.c
 cvs rdiff -u -r1.2 -r1.2.4.1 src/tests/lib/libc/sys/t_mmap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.