NetBSD Problem Report #47514

From tsugutomo.enami@jp.sony.com  Wed Jan 30 02:57:05 2013
Return-Path: <tsugutomo.enami@jp.sony.com>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id B7EEC63EC52
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 30 Jan 2013 02:57:05 +0000 (UTC)
Message-Id: <tkrr4l3ml8j.fsf@sigxcpu.sm.sony.co.jp>
Date: Wed, 30 Jan 2013 11:57:00 +0900
From: tsugutomo.enami@jp.sony.com
To: gnats-bugs@gnats.NetBSD.org
Subject: Multiple dump -X triggers kernel panic in fss_ioctl

>Number:         47514
>Category:       kern
>Synopsis:       Multiple dump -X triggers kernel panic in fss_ioctl
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    hannken
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jan 30 03:00:01 +0000 2013
>Closed-Date:    Mon Feb 11 08:49:52 +0000 2013
>Last-Modified:  Sun Jun 09 16:25:00 +0000 2013
>Originator:     enami tsugutomo
>Release:        NetBSD 6.0_STABLE
>Organization:
>Environment:
System: NetBSD rplaca.sm.sony.co.jp 6.0_STABLE NetBSD 6.0_STABLE (GENERIC) #2: Mon Jan 7 16:53:59 JST 2013 enami@sigfpe.sm.sony.co.jp:/home/enami/src/netbsd-6/obj.amd64/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:

I recently updated amanda from pkgsrc (from a version a few years old),
and the kernel has been panicking since then.  It looks like the amanda
in pkgsrc gained the ability to use dump -X, when possible, last
summer.

Here is the panic message and stacktrace (copied by hand):

uvm_fault(0xfffffe80bda3bd40, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff804bf1af cs 8 rflags 10283 cr2 8 cpl 0 rsp fffffe8006d59820
kernel: page fault trap, code=0
Stopped in pid 1713.1 (dump) at netbsd:mutex_vector_enter+0x80: movq 18(%r15), %rax
db{0}> bt
mutex_vector_enter() at netbsd:mutex_vector_enter+0x80
fss_ioctl() at netbsd:fss_ioctl+0xed
VOP_IOCTL() at netbsd:VOP_IOCTL+0x3b
vn_ioctl() at netbsd:vn_ioctl+0x76
sys_ioctl() at netbsd:sys_ioctl+0x13c
syscall() at netbsd:syscall+0xc4
db{0}>

The value of %r15 is fffffffffffffff0
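
As a cross-check, this register value is consistent with the fault
address in the trap line.  A minimal sketch, assuming the displacement
in "18(%r15)" is printed in hexadecimal; the function and constant
names are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of "disp(%reg)": base plus displacement, modulo 2^64. */
uint64_t
effective_address(uint64_t base, uint64_t disp)
{
	return base + disp;	/* unsigned addition wraps, as on the CPU */
}

/* Values copied from the panic message above. */
const uint64_t reported_r15 = UINT64_C(0xfffffffffffffff0);
const uint64_t reported_cr2 = UINT64_C(0x8);

/* Assumption: the "18" in "movq 18(%r15), %rax" means 0x18. */
void
check_reported_fault(void)
{
	assert(effective_address(reported_r15, 0x18) == reported_cr2);
}
```

The sum wraps around to 0x8, which matches the cr2 value in the trap
line.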

With my amanda configuration, up to 8 dumps run in parallel.
The system has two CPUs.

>How-To-Repeat:

Install amanda from pkgsrc and set it up to run multiple dumps in parallel.

>Fix:

I guess there is a race condition between fss_open and fss_close.
Here is a possible sequence of events:

    One process calls fss_open while another process is in
    fss_close (possible since the device driver is marked MPSAFE).
    In fss_close, no lock is held while control is between
    mutex_exit(&sc->sc_slock) and fss_ioctl(dev, FSSIOCCLR...), for
    example.  So fss_open may return successfully during that window.
    fss_close then detaches the device (destroying the mutexes and
    freeing the softc) before the process that opened the fss
    device issues the FSSIOCSET ioctl.  When that ioctl is issued
    later, it triggers the kernel panic.

The value of %r15 may indicate a destroyed mutex.
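
The suspected interleaving can be replayed step by step.  This is only
a single-threaded model of the story above, not driver code: the
structure and function names are hypothetical, and detaching is reduced
to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; the names are not taken from fss.c. */
struct unit_model {
	bool attached;	/* softc and its mutexes exist */
	int open_count;	/* opens that have not been closed yet */
};

/* close, first half: drop a reference, then release the per-unit lock. */
static bool
close_begin(struct unit_model *u)
{
	assert(u->open_count > 0);
	return --u->open_count == 0;	/* true: this closer will detach */
}

/* close, second half: detach the unit (mutexes destroyed, softc freed). */
static void
close_finish(struct unit_model *u)
{
	u->attached = false;
}

/* open: succeeds on an existing unit, attaching one if necessary. */
static void
open_unit(struct unit_model *u)
{
	u->attached = true;
	u->open_count++;
}

/* ioctl: false means it would touch a detached unit (the panic case). */
static bool
ioctl_is_safe(const struct unit_model *u)
{
	return u->attached;
}

/*
 * Process A closes the last reference while process B opens the device.
 * With open_in_window set, B's open lands between the two halves of A's
 * close; otherwise it runs after the close has completed.
 */
bool
replay(bool open_in_window)
{
	struct unit_model u = { .attached = true, .open_count = 1 };
	bool last = close_begin(&u);

	if (open_in_window)
		open_unit(&u);		/* B slips into the unlocked window */
	if (last)
		close_finish(&u);	/* A detaches, unaware of B */
	if (!open_in_window)
		open_unit(&u);		/* B opens a freshly attached unit */
	return ioctl_is_safe(&u);	/* B's later ioctl */
}
```

In this model replay(true) returns false (the ioctl hits a detached
unit), while replay(false) returns true.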

>Release-Note:

>Audit-Trail:
From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/47514: Multiple dump -X triggers kernel panic in fss_ioctl
Date: Wed, 30 Jan 2013 11:35:12 +0100

 Please try the attached patch.  If you are not able to build a kernel
 please drop me a note containing the output of "uname -a".

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)


 Index: fss.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/fss.c,v
 retrieving revision 1.83
 diff -p -u -2 -r1.83 fss.c
 --- fss.c	28 Jul 2012 16:14:17 -0000	1.83
 +++ fss.c	30 Jan 2013 10:33:15 -0000
 @@ -224,4 +224,5 @@ fss_close(dev_t dev, int flags, int mode
  	error = 0;

 +	mutex_enter(&fss_device_lock);
  restart:
  	mutex_enter(&sc->sc_slock);
 @@ -229,4 +230,5 @@ restart:
  		sc->sc_flags &= ~mflag;
  		mutex_exit(&sc->sc_slock);
 +		mutex_exit(&fss_device_lock);
  		return 0;
  	}
 @@ -240,10 +242,7 @@ restart:
  	if ((sc->sc_flags & FSS_ACTIVE) != 0) {
  		mutex_exit(&sc->sc_slock);
 +		mutex_exit(&fss_device_lock);
  		return error;
  	}
 -	if (! mutex_tryenter(&fss_device_lock)) {
 -		mutex_exit(&sc->sc_slock);
 -		goto restart;
 -	}

  	KASSERT((sc->sc_flags & FSS_ACTIVE) == 0);
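
The lock order introduced by this patch can be exercised in a userland
model.  The sketch below is an illustration under stated assumptions,
not the driver code: device_lock and unit_lock stand in for
fss_device_lock and sc->sc_slock, every other name is hypothetical, and
it assumes that the open path also serializes on the device lock, which
the diff does not show.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland stand-ins; none of these names come from fss.c. */
static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t unit_lock = PTHREAD_MUTEX_INITIALIZER;
static bool attached;	/* unit exists; guarded by device_lock */
static int open_count;	/* guarded by unit_lock */
static int stale_uses;	/* uses that found no unit; guarded by device_lock */

static void
model_open(void)
{
	pthread_mutex_lock(&device_lock);
	attached = true;			/* attach on first open */
	pthread_mutex_lock(&unit_lock);
	open_count++;
	pthread_mutex_unlock(&unit_lock);
	pthread_mutex_unlock(&device_lock);
}

static void
model_use(void)
{
	pthread_mutex_lock(&device_lock);
	if (!attached)
		stale_uses++;			/* would be the panic case */
	pthread_mutex_unlock(&device_lock);
}

static void
model_close(void)
{
	bool last;

	pthread_mutex_lock(&device_lock);	/* taken first, as in the patch */
	pthread_mutex_lock(&unit_lock);
	assert(open_count > 0);
	last = (--open_count == 0);
	pthread_mutex_unlock(&unit_lock);
	/* unit_lock is dropped, but device_lock still keeps openers out. */
	if (last)
		attached = false;		/* detach */
	pthread_mutex_unlock(&device_lock);
}

static void *
worker(void *arg)
{
	int iterations = *(int *)arg;

	for (int i = 0; i < iterations; i++) {
		model_open();
		model_use();
		model_close();
	}
	return NULL;
}

/* Run up to 16 workers; return how many uses hit a detached unit. */
int
run_model(int nthreads, int iterations)
{
	pthread_t tid[16];
	int started = 0;

	for (int i = 0; i < nthreads && i < 16; i++) {
		if (pthread_create(&tid[started], NULL, worker, &iterations) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	return stale_uses;
}
```

Because a closer holds device_lock from before it drops its reference
until after the detach, no opener can observe the unit in between, so
run_model() is expected to return 0 in this model.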


From: tsugutomo.enami@jp.sony.com
To: <gnats-bugs@netbsd.org>
Cc: <kern-bug-people@netbsd.org>, <gnats-admin@netbsd.org>,
        <netbsd-bugs@netbsd.org>
Subject: Re: kern/47514: Multiple dump -X triggers kernel panic in fss_ioctl
Date: Fri, 01 Feb 2013 15:07:53 +0900

 "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de> writes:

 >  --Apple-Mail=_B5B3A17D-6C5A-418D-87AF-705EBFBC1375
 >  Content-Transfer-Encoding: 7bit
 >  Content-Type: text/plain;
 >  	charset=us-ascii
 >  
 >  Please try the attached patch.  If you are not able to build a kernel
 >  please drop me a note containing the output of "uname -a".

 I've applied the patch to my netbsd-6 working directory and it looks
 like the system survived at least the nightly dump of last night.

 Thanks.

 enami.

From: "J. Hannken-Illjes" <hannken@eis.cs.tu-bs.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/47514: Multiple dump -X triggers kernel panic in fss_ioctl
Date: Fri, 1 Feb 2013 09:01:18 +0100

 On Feb 1, 2013, at 7:10 AM, tsugutomo.enami@jp.sony.com wrote:

 > I've applied the patch to my netbsd-6 working directory and it looks
 > like the system survived at least the nightly dump of last night.

 Will commit in a few days then ...

 --
 J. Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

From: "Juergen Hannken-Illjes" <hannken@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/47514 CVS commit: src/sys/dev
Date: Wed, 6 Feb 2013 09:29:47 +0000

 Module Name:	src
 Committed By:	hannken
 Date:		Wed Feb  6 09:29:46 UTC 2013

 Modified Files:
 	src/sys/dev: fss.c

 Log Message:
 Take fss_device_lock first when closing a fss device.

 Fixes PR kern/47514 (Multiple dump -X triggers kernel panic in fss_ioctl)


 To generate a diff of this commit:
 cvs rdiff -u -r1.83 -r1.84 src/sys/dev/fss.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Responsible-Changed-From-To: kern-bug-people->hannken
Responsible-Changed-By: hannken@NetBSD.org
Responsible-Changed-When: Fri, 08 Feb 2013 10:01:54 +0000
Responsible-Changed-Why:
Take.


State-Changed-From-To: open->pending-pullups
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Fri, 08 Feb 2013 10:01:54 +0000
State-Changed-Why:
Fixed in tree -- pullup requested.


From: "Jeff Rizzo" <riz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/47514 CVS commit: [netbsd-6] src/sys/dev
Date: Sun, 10 Feb 2013 23:57:26 +0000

 Module Name:	src
 Committed By:	riz
 Date:		Sun Feb 10 23:57:26 UTC 2013

 Modified Files:
 	src/sys/dev [netbsd-6]: fss.c

 Log Message:
 Pull up following revision(s) (requested by hannken in ticket #808):
 	sys/dev/fss.c: revision 1.84
 Take fss_device_lock first when closing a fss device.
 Fixes PR kern/47514 (Multiple dump -X triggers kernel panic in fss_ioctl)


 To generate a diff of this commit:
 cvs rdiff -u -r1.81.4.1 -r1.81.4.2 src/sys/dev/fss.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Jeff Rizzo" <riz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/47514 CVS commit: [netbsd-6-0] src/sys/dev
Date: Sun, 10 Feb 2013 23:57:38 +0000

 Module Name:	src
 Committed By:	riz
 Date:		Sun Feb 10 23:57:38 UTC 2013

 Modified Files:
 	src/sys/dev [netbsd-6-0]: fss.c

 Log Message:
 Pull up following revision(s) (requested by hannken in ticket #808):
 	sys/dev/fss.c: revision 1.84
 Take fss_device_lock first when closing a fss device.
 Fixes PR kern/47514 (Multiple dump -X triggers kernel panic in fss_ioctl)


 To generate a diff of this commit:
 cvs rdiff -u -r1.81.4.1 -r1.81.4.1.4.1 src/sys/dev/fss.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: hannken@NetBSD.org
State-Changed-When: Mon, 11 Feb 2013 08:49:52 +0000
State-Changed-Why:
Pulled up.


From: "SAITOH Masanobu" <msaitoh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/47514 CVS commit: [netbsd-5] src/sys/dev
Date: Sun, 9 Jun 2013 11:29:43 +0000

 Module Name:	src
 Committed By:	msaitoh
 Date:		Sun Jun  9 11:29:43 UTC 2013

 Modified Files:
 	src/sys/dev [netbsd-5]: fss.c

 Log Message:
 Pull up following revision(s) (requested by gdt in ticket #1853):
 	sys/dev/fss.c: revision 1.84
 Take fss_device_lock first when closing a fss device.
 Fixes PR kern/47514 (Multiple dump -X triggers kernel panic in fss_ioctl)


 To generate a diff of this commit:
 cvs rdiff -u -r1.60.4.6 -r1.60.4.7 src/sys/dev/fss.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "SAITOH Masanobu" <msaitoh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/47514 CVS commit: [netbsd-5-1] src/sys/dev
Date: Sun, 9 Jun 2013 16:18:57 +0000

 Module Name:	src
 Committed By:	msaitoh
 Date:		Sun Jun  9 16:18:57 UTC 2013

 Modified Files:
 	src/sys/dev [netbsd-5-1]: fss.c

 Log Message:
 Pull up following revision(s) (requested by gdt in ticket #1853):
 	sys/dev/fss.c: revision 1.84
 Take fss_device_lock first when closing a fss device.
 Fixes PR kern/47514 (Multiple dump -X triggers kernel panic in fss_ioctl)


 To generate a diff of this commit:
 cvs rdiff -u -r1.60.4.3 -r1.60.4.3.2.1 src/sys/dev/fss.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "SAITOH Masanobu" <msaitoh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/47514 CVS commit: [netbsd-5-2] src/sys/dev
Date: Sun, 9 Jun 2013 16:20:29 +0000

 Module Name:	src
 Committed By:	msaitoh
 Date:		Sun Jun  9 16:20:29 UTC 2013

 Modified Files:
 	src/sys/dev [netbsd-5-2]: fss.c

 Log Message:
 Pull up following revision(s) (requested by gdt in ticket #1853):
 	sys/dev/fss.c: revision 1.84
 Take fss_device_lock first when closing a fss device.
 Fixes PR kern/47514 (Multiple dump -X triggers kernel panic in fss_ioctl)


 To generate a diff of this commit:
 cvs rdiff -u -r1.60.4.6 -r1.60.4.6.2.1 src/sys/dev/fss.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:


 Pullup request to -5.

 The following script provokes the crash around 10% of the time (with 4
 snapshot devices).  With the fix, 100 runs are fine.

 ----------------------------------------
 #!/bin/sh

 fssconfig -l

 for fs in / /usr /n1; do
    for lev in 0 1 2 3 4; do
 	dump  $lev -f - -XS $fs > /dev/null &
    done
 done

 sleep 1
 pkill dump

 sleep 1
 fssconfig -l
 ----------------------------------------
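
To put the "100 runs are fine" observation in perspective, here is a
small sketch of how likely 100 clean runs would be if the bug were
still present.  It assumes the runs are independent and that each one
crashes with probability 0.10; both are simplifications, and the
function name is illustrative only.

```c
#include <assert.h>

/* Probability that `runs` independent trials all avoid an event of probability p. */
double
all_clean_probability(double p, int runs)
{
	double q = 1.0;

	assert(p >= 0.0 && p <= 1.0 && runs >= 0);
	for (int i = 0; i < runs; i++)
		q *= 1.0 - p;
	return q;
}
```

all_clean_probability(0.10, 100) is about 2.7e-5, so under these
assumptions 100 clean runs would be very unlikely with the bug still
present.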
