NetBSD Problem Report #45948

From sborrill@precedence.co.uk  Wed Feb  8 08:19:05 2012
Return-Path: <sborrill@precedence.co.uk>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 4837563BD87
	for <gnats-bugs@gnats.NetBSD.org>; Wed,  8 Feb 2012 08:19:05 +0000 (UTC)
Message-Id: <201202080819.q188J0Ej011913@precedence.co.uk>
Date: Wed, 8 Feb 2012 08:19:00 GMT
From: netbsd@precedence.co.uk
Reply-To: netbsd@precedence.co.uk
To: gnats-bugs@gnats.NetBSD.org
Subject: dk(4) on raid(4) panic on halt with netbsd-5
X-Send-Pr-Version: 3.95

>Number:         45948
>Category:       kern
>Synopsis:       dk(4) on raid(4) will panic with lock error if mounted when halting (netbsd-5)
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Feb 08 08:20:00 +0000 2012
>Closed-Date:    Sat Dec 19 02:33:35 +0000 2015
>Last-Modified:  Sat Dec 19 02:33:35 +0000 2015
>Originator:     Stephen Borrill
>Release:        NetBSD 5.1_STABLE
>Organization:

>Environment:


System: NetBSD  5.1_STABLE NetBSD 5.1_STABLE (DEBUG) #0: Thu Feb  2 17:13:56 GMT 2012 root@builder.internal.precedence.co.uk:/usr/obj/5.0/i386/sys/arch/i386/compile/DEBUG i386
Architecture: i386
Machine: i386
>Description:
On a netbsd-5 system, if a GPT partition table is created on a RAIDframe device
(the RAID type does not matter) and a wedge is added for a partition, halting the
machine while the wedge is mounted causes a panic. There is no panic if the wedge
is not mounted.

raid0: RAID Level 1
raid0: Components: /dev/wd1a component1[**FAILED**]
raid0: Total Sectors: 10485632 (5119 MB)
dk0 at raid0: 077b35b2-4d9c-11e1-9d54-525400123456
dk0: 10485535 blocks at 64, type: ffs
# mount /dev/dk0 /mnt
# mount
/dev/wd0a on / type ffs (local)
kernfs on /kern type kernfs (local)
ptyfs on /dev/pts type ptyfs (local)
procfs on /proc type procfs (local)
/dev/dk0 on /mnt type ffs (local)
# sysctl -w ddb.onpanic=1
ddb.onpanic: 0 -> 1
# halt -p
Feb  2 16:21:39  halt: halted by root
Feb  2 16:21:39  syslogd: Exiting on signal 15
syncing disks... done
unmounting /mnt (/dev/dk0)...Mutex error: lockdebug_wantlock: locking against myself

lock address : 0x00000000c12dcd1c type     :     sleep/adaptive
initialized  : 0x00000000c05066de
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  1
current cpu  :                  0 last held:                  0
current lwp  : 0x00000000cb292320 last held: 0x00000000cb292320
last locked  : 0x00000000c04a4eb5 unlocked : 0x00000000c04a50b8
owner field  : 0x00000000cb292320 wait/spin:                0/0

Turnstile chain at 0xc0c62b20.
=> No active turnstile for this lock.

panic: LOCKDEBUG
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip c05dc2dc cs 8 eflags 246 cr2 bb91c004 ilevel 0
Stopped in pid 284.1 (halt) at  netbsd:breakpoint+0x4:  popl    %ebp
db{0}> bt
breakpoint(c0b5f79e,cb3bd7a8,c0b9a580,c051205f,0,1,0,0,cb3bd7a8,8) at netbsd:breakpoint+0x4
panic(c0b16344,c0b6b914,c0901f97,c0afdb1d,9340,1292320,0,c12dcd1c,0,0) at netbsd:panic+0x1b0
lockdebug_abort1(c0afdb1d,1,0,0,ca70a180,c0c6ba20,0,c0b9a698,c0c6ba20,0) at netbsd:lockdebug_abort1+0xbb
mutex_vector_enter(c12dcd1c,0,0,4,0,c1209828,c1209864,c04dec5a,a8,a8) at netbsd:mutex_vector_enter+0x394
dkwedge_del(cb3bd8b0,cb33eb00,10,c050aaef,c0c640cc,c0c60c20,cb3bd8dc,306b64,c0c640cc,c0c60c20) at netbsd:dkwedge_del+0x198
dkwedge_delall(c1209828,c0b84280,0,c08d41c0,1203,cb292320,cb3bd9dc,c0504f74,1203,3) at netbsd:dkwedge_delall+0x61
raidclose(1203,3,6000,cb292320,6000,3,6,3,cb624234,0) at netbsd:raidclose+0x12f
bdev_close(1203,3,6000,cb292320,0,0,cb3bda2c,1203,6000,0) at netbsd:bdev_close+0x84
spec_close(cb3bda38,20002,cb3bda4c,c055e5c8,cb624234,c09031a0,cb624234,3,ffffffff,3) at netbsd:spec_close+0x24b
VOP_CLOSE(cb624234,3,ffffffff,c12dcc00,c12dcc00,0,cb3bda9c,c04a4f67,cb624234,3) at netbsd:VOP_CLOSE+0x6c
vn_close(cb624234,3,ffffffff,c043cd8f,0,cb292320,0,c0900240,a800,cb292320) at netbsd:vn_close+0x4e
dkclose(a800,3,6000,cb292320,6000,3,6,3,cb4925d0,0) at netbsd:dkclose+0xe7
bdev_close(a800,3,6000,cb292320,0,0,cb3bdb1c,a800,6000,0) at netbsd:bdev_close+0x84
spec_close(cb3bdb28,20002,cb3bdb3c,c055e5c8,cb4925d0,c09031a0,cb4925d0,3,ffffffff,c12cf000) at netbsd:spec_close+0x24b
VOP_CLOSE(cb4925d0,3,ffffffff,0,0,c12cd7cc,c12cd780,cb37f6d4,cb37f6d4,cb37f6f8) at netbsd:VOP_CLOSE+0x6c
ffs_unmount(cb37f6d4,80000,0,0,0,0,cb3bdbbc,c055c49f,cb37f6d4,80000) at netbsd:ffs_unmount+0x1f4
VFS_UNMOUNT(cb37f6d4,80000,ca3a6cc0,0,1000,c0549cba,1,cb37f6d4,cb37f7cc,cb2ae000) at netbsd:VFS_UNMOUNT+0x26
dounmount(cb37f6d4,80000,cb292320,0,cb3bdbf8,cb292320,0,cb292320,cb3bdd00,c0b95fe0) at netbsd:dounmount+0x13f
vfs_unmountall(cb292320,0,0,c04c530d,ca38a63c,808,cb3bdc2c,c05e2dfb,0,cb292320) at netbsd:vfs_unmountall+0x86
vfs_shutdown(0,cb292320,0,0,cb3bdd00,0,cb3bdcdc,c04fed94,808,0) at netbsd:vfs_shutdown+0x8d
cpu_reboot(808,0,0,0,0,0,cb3bdc9c,c05c9d52,23,cb3bdcc0) at netbsd:cpu_reboot+0x13b
sys_reboot(cb292320,cb3bdd00,cb3bdd28,cb3bdd40,c05c9d00,ca3bcf60,1,808,0,bfbfeb28) at netbsd:sys_reboot+0x74
syscall(cb3bdd48,b3,ab,1f,1f,1,d,bfbfeb28,2,256) at netbsd:syscall+0xc8
db{0}> x 0x00000000c12dcd1c
0xc12dcd1c:     cb292324

(gdb) list *(0x00000000c05066de)
0xc05066de is in disk_init (/usr/src/5.0/sys/kern/subr_disk.c:195).
190             mutex_init(&diskp->dk_rawlock, MUTEX_DEFAULT, IPL_NONE);
191             mutex_init(&diskp->dk_openlock, MUTEX_DEFAULT, IPL_NONE);
192             LIST_INIT(&diskp->dk_wedges);
193             diskp->dk_nwedges = 0;
194             diskp->dk_labelsector = LABELSECTOR;
195             disk_blocksize(diskp, DEV_BSIZE);
196             diskp->dk_name = name;
197             diskp->dk_driver = driver;
(gdb) list *(0x00000000c04a4eb5)
0xc04a4eb5 is in dkclose (/usr/src/5.0/sys/dev/dkwedge/dk.c:973).
968
969             KASSERT(sc->sc_dk.dk_openmask != 0);
970
971             mutex_enter(&sc->sc_dk.dk_openlock);
972
973             if (fmt == S_IFCHR)
974                     sc->sc_dk.dk_copenmask &= ~1;
975             else
976                     sc->sc_dk.dk_bopenmask &= ~1;
977             sc->sc_dk.dk_openmask =
(gdb) list *(0x00000000c04a50b8)
0xc04a50b8 is in dkopen (/usr/src/5.0/sys/dev/dkwedge/dk.c:954).
949             sc->sc_dk.dk_openmask =
950                 sc->sc_dk.dk_copenmask | sc->sc_dk.dk_bopenmask;
951
952      popen_fail:
953             mutex_exit(&sc->sc_parent->dk_rawlock);
954             mutex_exit(&sc->sc_dk.dk_openlock);
955             return (error);
956     }
957
958     /*

>How-To-Repeat:
# qemu-img create -f qcow wd0.fs 5G
# qemu-img create -f qcow wd1.fs 5G
# qemu -hda wd0.fs -cdrom i386cd.iso
*install minimal NetBSD and halt*
# qemu -hda wd0.fs -hdb wd1.fs -boot c
*Login as root*
# cat > raid0.conf
START array
1 2 0

START disks
/dev/wd1a
absent

START layout
128 1 1 1

START queue
fifo 100
^D
# raidctl -C raid0.conf raid0
# raidctl -i raid0
# gpt create raid0
# gpt add -t ufs -b 64 raid0
Partition added, use:
        dkctl raid0 addwedge <wedgename> 64 10485535 <type>
to create a wedge for it
# dkctl raid0d addwedge dk0 64 10485535 ufs
dk0 created successfully.
# newfs -O2 -f 4096 -b 32768 -I /dev/rdk0
/dev/rdk0: 5119.9MB (10485528 sectors) block size 32768, fragment size 4096
        using 7 cylinder groups of 731.44MB, 23406 blks, 46080 inodes.
super-block backups (for fsck_ffs -b #) at:
192, 1498176, 2996160, 4494144, 5992128, 7490112, 8988096,
# mount /dev/dk0 /mnt
# halt -p
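
As a cross-check of the numbers in the transcript above, the wedge size
suggested by gpt(8) can be derived from the RAID set size. This is only a
sketch: it assumes the default GPT layout with 512-byte sectors, where the
backup entry table (32 sectors) and backup header (1 sector) occupy the end
of the disk. The variable names are illustrative and not taken from any tool.

```shell
# Values copied from the report.
total_sectors=10485632   # "raid0: Total Sectors: 10485632"
wedge_start=64           # "gpt add -t ufs -b 64 raid0"

# Assumption: default GPT layout on 512-byte sectors, i.e. a 32-sector
# backup entry table plus a 1-sector backup header at the end of the disk.
backup_gpt_sectors=33

# Sectors left for the partition between its start and the backup GPT.
wedge_size=$(( total_sectors - backup_gpt_sectors - wedge_start ))
echo "$wedge_size"   # matches the 10485535 passed to dkctl addwedge
```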
>Fix:
        None known at the time of filing; see the Audit-Trail for a workaround.

>Release-Note:

>Audit-Trail:
From: Stephen Borrill <netbsd@precedence.co.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/45948: dk(4) on raid(4) panic on halt with netbsd-5
Date: Mon, 5 Mar 2012 16:41:52 +0000 (GMT)

 Work around is to ensure that filesystems on wedges are 
 explicitly unmounted.

 So something like the following in /etc/rc.shutdown.local:

 fs=`mount | awk '{if ($1 ~ "^/dev/dk[0-9]") print $3}'`
 for f in $fs
 do
  	echo "Unmounting $f"
  	umount $f
 done

 There may be processes with open files on the filesystem, which will
 prevent it from being unmounted. You may want to add # KEYWORD: shutdown
 to the relevant rc.d scripts so that those services are stopped at shutdown.

 In my case, this is only likely to be samba and istgt, 
 so I stop those in the rc.shutdown.local script:

 fs=`mount | awk '{if ($1 ~ "^/dev/dk[0-9]") print $3}'`
 if [ -n "$fs" ]; then
          for srv in smbd istgt
          do
                  if /etc/rc.d/$srv status > /dev/null; then
                          /etc/rc.d/$srv stop
                  fi
          done

          for f in $fs
          do
                  echo "Unmounting $f"
                  umount $f
          done
 fi
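
 The filter used in the scripts above can also be written as a small
 function, which makes it possible to check it against captured mount(8)
 output before relying on it at shutdown. This is a minimal sketch, not the
 submitter's script: the function names wedge_mountpoints and unmount_wedges
 are hypothetical, and it assumes mount points contain no whitespace.

```shell
# Hypothetical helper (not part of the original report): read mount(8)-style
# lines ("<device> on <dir> type ...") on stdin and print the mount point of
# every filesystem whose device is a wedge (/dev/dkN).
wedge_mountpoints() {
    awk '$1 ~ /^\/dev\/dk[0-9]/ { print $3 }'
}

# Hypothetical wrapper: unmount each wedge-backed filesystem individually
# and report any that cannot be unmounted (for example because it is busy).
unmount_wedges() {
    mount | wedge_mountpoints | while read -r dir; do
        echo "Unmounting $dir"
        umount "$dir" || echo "umount $dir failed (filesystem busy?)" >&2
    done
}
```

 Only the functions are defined here; calling unmount_wedges from
 /etc/rc.shutdown.local would perform the actual unmounts.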

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/45948: dk(4) on raid(4) panic on halt with netbsd-5
Date: Mon, 5 Mar 2012 17:32:55 +0000

 On Mon, Mar 05, 2012 at 04:45:02PM +0000, Stephen Borrill wrote:
  >  Work around is to ensure that filesystems on wedges are 
  >  explicitly unmounted.

 The code that -6 and -current have to unmount things in order isn't in
 -5, right? So this problem only affects -5?

 -- 
 David A. Holland
 dholland@netbsd.org

From: Stephen Borrill <netbsd@precedence.co.uk>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/45948: dk(4) on raid(4) panic on halt with netbsd-5
Date: Mon, 5 Mar 2012 19:24:07 +0000 (GMT)

 On Mon, 5 Mar 2012, David Holland wrote:
 > On Mon, Mar 05, 2012 at 04:45:02PM +0000, Stephen Borrill wrote:
 >  >  Work around is to ensure that filesystems on wedges are
 >  >  explicitly unmounted.
 >
 > The code that -6 and -current have to unmount things in order isn't in
 > -5, right? So this problem only affects -5?

 Right - it is tested as working correctly in -6 and -current.

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/45948: dk(4) on raid(4) panic on halt with netbsd-5
Date: Mon, 5 Mar 2012 19:36:16 +0000

 On Mon, Mar 05, 2012 at 07:25:04PM +0000, Stephen Borrill wrote:
  >  > The code that -6 and -current have to unmount things in order isn't in
  >  > -5, right? So this problem only affects -5?
  >  
  >  Right - it is tested as working correctly in -6 and -current.

 I have so tagged it, thanks.

 -- 
 David A. Holland
 dholland@netbsd.org

State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 19 Dec 2015 02:33:35 +0000
State-Changed-Why:
Problem only affected -5 and -5 is now EOL.


>Unformatted:
